Hindawi Publishing Corporation
Advances in Bioinformatics
Volume 2014, Article ID 424873, 13 pages
http://dx.doi.org/10.1155/2014/424873

Research Article
Comparative Genomics of Ten Solanaceous Plastomes

Harpreet Kaur,1 Bhupinder Pal Singh,1 Harpreet Singh,2 and Avinash Kaur Nagpal1

1 Department of Botanical and Environmental Sciences, Guru Nanak Dev University, Amritsar 143005, India
2 Department of Bioinformatics, Hans Raj Mahila Maha Vidyalaya, Jalandhar 144008, India

Correspondence should be addressed to Avinash Kaur Nagpal; avnagpal@yahoo.co.in

Received 26 August 2014; Accepted 14 October 2014; Published 17 November 2014

Academic Editor: Paul Harrison

Copyright © 2014 Harpreet Kaur et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Availability of complete plastid genomes of ten solanaceous species, Atropa belladonna, Capsicum annuum, Datura stramonium, Nicotiana sylvestris, Nicotiana tabacum, Nicotiana tomentosiformis, Nicotiana undulata, Solanum bulbocastanum, Solanum lycopersicum, and Solanum tuberosum, provided us with an opportunity to conduct their in silico comparative analysis in depth. The size of the complete chloroplast genomes and of the LSC and SSC regions of the three species of Solanum is comparatively smaller than that of any other species studied to date (exception: SSC region of A. belladonna). The AT content of coding regions was found to be lower than that of noncoding regions. A duplicate copy of the trnH gene in C. annuum and two alternative tRNA genes for proline in D. stramonium were observed for the first time in this analysis. Further, homology search revealed the presence of an rps19 pseudogene and infA genes in A. belladonna and D. stramonium, a region identical to the rps19 pseudogene in C. annuum, and orthologues of the sprA gene in another six species. Among the eighteen intron-containing genes, 3 genes have two introns and 15 genes have one intron. The longest insertion was found in the accD gene of C. annuum. Phylogenetic analysis using concatenated protein-coding sequences gave two clades, one for Nicotiana species and another for Solanum, Capsicum, Atropa, and Datura.

1. Introduction

Chloroplasts are essential organelles within plant cells that possess the enzymatic machinery for photosynthesis, which provides essential energy to plants. Besides photosynthesis, chloroplasts are also involved in the biosynthesis of fatty acids, amino acids, pigments, and vitamins [1, 2]. Despite enormous divergence in whole plant form and habitat, chloroplast structure and function have remained remarkably conserved, which might be due to intense evolutionary selection pressures associated with the functional requirements of photosynthesis [3–7]. The chloroplast genome is a reduced genome derived from a cyanobacterial ancestor that was captured early in the evolution of the eukaryotic cell [8, 9]. Among the three genomes of the plant cell, the plastome is the most gene dense, with more than 100 genes in a genome of only 120 to 210 kb [10]. In the last two decades, the nucleotide sequences of a large number of plastid genomes have been published, leading to a better understanding of their organization and evolution [2, 11, 12]. Currently, about 470 eukaryotic chloroplast genomes have been sequenced completely (http://www.ncbi.nlm.nih.gov/genomes/GenomesHome.cgi?taxid=2759&hopt=html), with the best representation from flowering plants.

Most land plant chloroplast genomes are composed of a single circular chromosome with a quadripartite structure, which includes two copies of an inverted repeat (IR) region that separate the large and small single copy regions (LSC and SSC). Genes of the chloroplast genomes of higher plants can be divided into three broad categories [13, 14]. The first comprises genetic system genes encoding rRNAs, tRNAs, ribosomal proteins, and RNA polymerase subunits. The second category comprises genes for photosynthesis, which encode subunits of the two photosystems, the cytochrome b6f complex, and the ATP synthase. Open reading frames (orfs) of unknown function constitute the third category. Besides these, there are some other genes coding for different kinds of proteins, including infA, matK, clpP, cemA, accD, and ccsA. Although overall chloroplast genome organization is highly conserved among taxa, structural rearrangements due to inversions have been reported in different taxa like Campanulaceae [15], Cyatheaceae [16], Fabaceae [17],



Funariaceae [18], Geraniaceae [19], Onagraceae [20], and Poaceae [21, 22]. Besides structural rearrangements, sequence polymorphisms have also been reported in some cereals [23, 24] and Oenothera species [20]. These studies revealed that highly divergent sequences were concentrated in specific regions called "hotspots." Such sequence polymorphisms have been used to derive phylogenetic relationships among species.

Solanaceae is an important family of dicots comprising more than 3000 species placed within about 90 genera. It is an ethnobotanical family that is extensively utilized by humans and has recently become a model for comparative and evolutionary genomics research. Few efforts have been made to study the variations in chloroplast genomes of the Solanaceae family using in silico tools. Most of these attempts have concentrated on comparing a newly sequenced chloroplast genome with the available complete chloroplast genomes from some members of this family [25–29]. The availability of complete nucleotide sequences of the plastid genomes of ten solanaceous species, Atropa belladonna (NC_004561.1 [30]), Capsicum annuum (NC_018552.1 [29]), Datura stramonium (NC_018117.1, Li et al. (unpublished)), Nicotiana sylvestris (NC_007500.1 [28]), Nicotiana tabacum (NC_001879.2 [31]), Nicotiana tomentosiformis (NC_007602.1 [28]), Nicotiana undulata (NC_016068.1 [32]), Solanum bulbocastanum (NC_007943.1 [26]), Solanum lycopersicum (NC_007898.2 [27]), and Solanum tuberosum (NC_008096.2 [33]), provided us with an opportunity to conduct their in silico comparative analysis in depth. Hence, the present study is an attempt to compare the genome organization, structure, and coding capacity of the chloroplast genomes of ten solanaceous species. The study focuses on length mutations, intron-containing genes, grouping of genes in different identity classes based on pairwise comparison of individual genes, and InDel analysis of divergent genes.

2. Materials and Methods

2.1. Sequence Analysis. Whole chloroplast genome sequences, as well as individual gene and protein sequences, of the ten solanaceous species were obtained from the "Organelle Genome Resources" section of NCBI in GenBank as well as FASTA format. Sequence regions corresponding to various genomic features, including genes, exons, introns, and CDS, were extracted from the GenBank files using the Extractfeat, Extractseq, and Featcopy programs from the Jemboss package. AT percentage for different genomic regions was calculated using the Wordcount and Union programs from the Jemboss package. Pairwise comparison of gene sequences was done using the NCBI BLAST program, and multiple sequence alignment of nucleotide as well as protein sequences was done using ClustalW. Alignments of protein sequences for some of the genes were manually edited in correspondence with the InDels observed in alignments of their nucleotide sequences.
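The AT-percentage step can be illustrated with a minimal Python sketch. This is not the Jemboss Wordcount/Union workflow used in the study; the function name `at_percent` and the toy sequence are assumptions introduced only for illustration, and ambiguous bases (such as N) are simply ignored here.

```python
def at_percent(sequence: str) -> float:
    """Return the percentage of A and T among the unambiguous bases (A, C, G, T)."""
    sequence = sequence.upper()
    # Count only the four standard bases; anything else (N, gaps) is skipped.
    counts = {base: sequence.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * (counts["A"] + counts["T"]) / total


# Toy sequence (hypothetical, not taken from any plastome): 6 of 8 bases are A or T.
print(at_percent("ATGCATTA"))  # 75.0
```

The same function could be applied separately to the coding, noncoding, LSC, SSC, and IR sequences once they have been extracted.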

2.2. Phylogenetic Analysis of Concatenated Protein-Coding Genes. A total of 75 protein-coding genes of the plastomes of the ten solanaceous species and two outgroup species (Daucus carota and Coffea arabica) were selected for phylogenetic analysis from the total of 79 classified protein-coding genes, excluding accD, rpl20, ycf1, and ycf15. Ycf15 was excluded due to its absence from the plastomes of both outgroup species chosen, while the other three were not included in the phylogenetic analysis due to their high levels of variation. A multiple sequence alignment of each gene was obtained using ClustalW (https://www.ebi.ac.uk/Tools/msa/clustalw2/). These alignments were then concatenated using standalone BIOEDIT version 7.2.5 (http://www.mbio.ncsu.edu/bioedit/bioedit.html), and a maximum likelihood phylogenetic tree with 500 bootstrap iterations was constructed using PhyML v3.0 (http://www.atgc-montpellier.fr/phyml/). A graphical view of the tree was generated using Archaeopteryx 0.988 SR (https://sites.google.com/site/cmzmasek/home/software/archaeopteryx).
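The concatenation step can be sketched in Python as follows. The study used BIOEDIT for this; the function `concatenate_alignments`, its input layout (one dictionary per gene, mapping taxon name to aligned sequence), and the toy data are assumptions made for illustration only.

```python
def concatenate_alignments(gene_alignments, taxa):
    """Join per-gene alignments end to end into one sequence per taxon.

    gene_alignments: list of dicts mapping taxon name -> aligned sequence.
    taxa: taxon names that must be present in every gene alignment.
    """
    parts = {taxon: [] for taxon in taxa}
    for index, alignment in enumerate(gene_alignments):
        missing = [t for t in taxa if t not in alignment]
        if missing:
            raise ValueError(f"alignment {index} lacks taxa: {missing}")
        # All rows of one alignment must have the same number of columns.
        widths = {len(alignment[t]) for t in taxa}
        if len(widths) != 1:
            raise ValueError(f"alignment {index} has rows of unequal length")
        for taxon in taxa:
            parts[taxon].append(alignment[taxon])
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy alignments for two hypothetical genes and two taxa.
toy = [
    {"NTA": "ATG-", "SLY": "ATGC"},
    {"NTA": "TTA", "SLY": "TTG"},
]
print(concatenate_alignments(toy, ["NTA", "SLY"]))
```

Requiring every taxon in every alignment mirrors the decision to exclude ycf15, which is absent from both outgroup plastomes, rather than padding it with gaps.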

3. Results and Discussion

3.1. Comparison of Properties of Chloroplast Genomes. A comparison of the plastid genomes of the ten solanaceous species with respect to genome size (size of the complete plastid genome and of the LSC, SSC, and IR regions); percentage of coding regions, introns, and intergenic regions; and AT content of the overall plastid genome as well as of coding and noncoding regions is presented in Table 1. The total plastid genome size ranged from 155296 bp (S. tuberosum) to 156781 bp (C. annuum). The large size of the plastome of C. annuum can be attributed to its large LSC region as compared to other species. On the contrary, the SSC region of C. annuum was the smallest among the species compared. The largest IR region was found in A. belladonna. Among the four Nicotiana species studied, N. sylvestris and N. tabacum were more similar to each other than to the plastome of any other species studied with respect to the size of the complete genome (a difference of only 2 bp) and of the LSC, SSC, and IR regions. However, the percent coding region was slightly higher in N. sylvestris (61.49%) than in N. tabacum (61.12%). The size of the complete chloroplast genome and of the LSC and SSC regions of the three species of Solanum is comparatively smaller than that of any other species studied, except for the SSC region of A. belladonna, which was smaller (18008 bp). However, the IR region of the Solanum species is larger than that of the Nicotiana species. The coding region percentage was higher in Nicotiana species than in all other species, with a maximum for N. undulata (63.12%) and a minimum for S. tuberosum (58.45%). A maximum of 12.8% of the plastome consisted of introns in S. bulbocastanum, whereas the minimum intron percentage (11.62%) was observed for D. stramonium. The maximum percentage of intergenic region (29.19%) was observed in D. stramonium and the minimum (24.19%) in N. undulata. The AT content of noncoding regions was higher than that of coding regions for all ten species studied. Similarly, protein-coding regions showed a higher content of AT base pairs than RNA-coding genes, which can be explained by the requirement of more GC base pairs for proper folding of highly structured ribosomal RNAs and tRNAs [13–27]. Comparison of AT content in the LSC, SSC, and IR regions reveals that AT content was the highest in SSC regions and the lowest in IR regions. Some earlier studies have also shown a similar distribution of AT content in the LSC, SSC, and IR regions, with the lowest AT content in the IR region and the highest AT content in the SSC region [2, 27, 34, 35]. The low AT content of IR regions reflects the low AT content of the four ribosomal RNA genes in this region.

Table 1: Properties of the solanaceous chloroplast genomes.

Property                  ABE            CAN            DST            NSY            NTA            NTO            NUN            SBU            SLY            STU
Genome size (bp)          156687         156781         155871         155941         155943         155745         155863         155371         155461         155296
LSC (bp)                  86869          87366          86297          86684          86686          86392          86633          85785          85882          85737
LSC coordinates*          1–86869        1–87366        1–86297        1–86684        1–86686        1–86392        1–86633        1–85785        1–85882        1–85737
IRB (bp)                  25905          25783          25563          25342          25343          25429          25331          25588          25608          25593
IRB coordinates*          86870–112774   87367–113149   86298–111860   86685–112026   86687–112029   86393–111821   86634–111964   85786–111373   85883–111490   85738–111330
SSC (bp)                  18008          17849          18448          18573          18571          18495          18568          18381          18363          18373
SSC coordinates*          112775–130782  113150–130998  111861–130308  112027–130599  112030–130600  111822–130316  111965–130532  111374–129754  111491–129853  111331–129703
IRA (bp)                  25905          25783          25563          25342          25343          25429          25331          25588          25608          25593
IRA coordinates*          130783–156687  130999–156781  130309–155871  130600–155941  130601–155943  130317–155745  130533–155863  129755–155342  129854–155461  129704–155296
Coding regions (%)        58.89          58.50          59.19          61.49          61.12          61.58          63.12          58.52          58.91          58.45
Introns (%)               12.51          12.71          11.62          12.70          12.70          12.68          12.69          12.82          12.47          12.49
Intergenic regions (%)    28.60          28.79          29.19          25.81          26.18          25.73          24.19          28.66          28.62          29.06
AT content (%)
  Overall                 62.44          62.27          62.12          62.15          62.15          62.21          62.12          62.12          62.14          62.12
  Coding regions          59.86          59.68          59.65          59.85          59.79          59.79          59.70          59.61          59.65          59.59
  Noncoding regions       66.13          65.93          65.72          65.84          65.87          66.09          66.27          65.66          65.71          65.68
  tRNAs                   47.70          47.38          47.08          47.06          47.05          47.10          47.08          47.12          47.01          47.06
  rRNAs                   44.64          44.73          44.63          44.64          44.64          44.64          44.64          44.66          44.66          44.65
  Protein-coding genes    62.01          61.83          61.79          61.91          61.86          61.84          61.68          61.76          61.80          61.74
  LSC                     64.37          64.25          64.04          64.05          64.05          64.12          64.01          63.99          64.01          63.99
  SSC                     68.35          67.99          67.72          67.94          67.93          68.03          67.87          67.87          67.97          67.91
  IR                      57.14          56.94          56.87          56.78          56.78          56.84          56.78          56.93          56.91          56.90

ABE: Atropa belladonna; CAN: Capsicum annuum; DST: Datura stramonium; NSY: Nicotiana sylvestris; NTA: Nicotiana tabacum; NTO: Nicotiana tomentosiformis; NUN: Nicotiana undulata; SBU: Solanum bulbocastanum; SLY: Solanum lycopersicum; STU: Solanum tuberosum. LSC: large single copy region; SSC: small single copy region; IR: inverted repeat region.
*Start and end position of nucleotide in the genome.
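The relation between region coordinates, region lengths, and total genome size in Table 1 can be checked with a short Python sketch. The coordinates below are the A. belladonna values from Table 1; the helper names `region_lengths` and `is_contiguous` are assumptions introduced for illustration and are not part of the original analysis.

```python
# A. belladonna region coordinates (1-based, inclusive) as listed in Table 1.
ABE_REGIONS = [
    ("LSC", 1, 86869),
    ("IRB", 86870, 112774),
    ("SSC", 112775, 130782),
    ("IRA", 130783, 156687),
]


def region_lengths(regions):
    """Length of each region from inclusive start and end coordinates."""
    return {name: end - start + 1 for name, start, end in regions}


def is_contiguous(regions):
    """True if each region starts one base after the previous region ends."""
    return all(prev[2] + 1 == nxt[1] for prev, nxt in zip(regions, regions[1:]))


lengths = region_lengths(ABE_REGIONS)
total = sum(lengths.values())
print(lengths)  # {'LSC': 86869, 'IRB': 25905, 'SSC': 18008, 'IRA': 25905}
print(total, is_contiguous(ABE_REGIONS))  # 156687 True
```

For A. belladonna the four regions tile the genome exactly, and the two IR copies have equal length, as expected for a quadripartite plastome.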

3.2. Gene Content of Solanaceous Chloroplast Genomes. The genes present in different regions of the plastid genomes are highly conserved except for several open reading frames [6, 26, 36]. There are typically 111 genes, 5 hypothetical chloroplast reading frames (ycfs), and a few open reading frames (orfs). Some of our unique findings are discussed below.

(i) The trnP-GGG gene, which codes for a tRNA for proline, was annotated only in D. stramonium, whereas its alternative, trnP-UGG, was annotated in all species including D. stramonium (NC_018117.1, Li et al. (unpublished)). We searched all the species for a similar sequence by BLAST, but no similar sequence was found in any other species. An additional region was annotated as a trnH-coding gene only in C. annuum; in all other species this region was reported to be part of the ycf2 gene, as it also is in C. annuum. These observations indicate the presence of a duplicate copy of the trnH gene sequence in C. annuum and of two alternative tRNA genes coding for proline in D. stramonium. However, no other evidence was found in databases that this particular region codes for trnH.

(ii) The rps19 pseudogene was reported in three species, namely, N. tomentosiformis, S. bulbocastanum, and S. tuberosum. All other species were searched for a similar pseudogene using the BLAST pairwise algorithm, which confirmed the presence of the rps19 pseudogene in three further species, namely, A. belladonna, C. annuum, and D. stramonium. The presence of the pseudogene may be attributed to the expansion of IRB into the LSC region. In three species, namely, N. sylvestris, N. tabacum, and N. undulata, the rps19 pseudogene was found to be absent.

(iii) infA, a pseudogene in all species except A. belladonna, D. stramonium, and S. lycopersicum, is a protein-coding gene in S. bulbocastanum. Homology search with the infA sequence from S. bulbocastanum against the plastomes of A. belladonna and D. stramonium revealed an identical sequence in both species.

(iv) The sprA gene has been annotated for N. sylvestris, N. tomentosiformis, S. lycopersicum, and S. tuberosum. Identical orthologous gene sequences were found in A. belladonna, C. annuum, D. stramonium, N. tabacum, N. undulata, and S. bulbocastanum using BLAST search.

3.3. Split Genes. A total of eighteen split genes have been reported. The sizes of exons and introns for these genes in all the solanaceous species studied are summarized in Table 2. The rps12 gene is divided such that its 5′-end exon is located in the LSC region, whereas the second and third exons are located in the IR region. Maturation of the RNA transcript requires a trans-splicing mechanism between exon 1 and exon 2 [34, 37]. Among the eighteen intron-containing genes, ycf3, clpP, and rps12 contain two introns, whereas the other 15 genes contain only one intron. As per Kim and Lee [34], the trnL-UAA intron belongs to the self-splicing group I introns, whereas all other introns belong to group II. Generally, exon sizes were conserved and variability was observed in the intron regions; however, ndhB was found to be highly conserved for both exons and introns.

3.4. Pairwise Comparison of Plastid Genes of Solanaceae and InDel Analyses. Pairwise comparison of the nucleotide sequences of individual genes (45 combinations) for 116 genes was also performed to classify genes based on percent identity. Supplementary Table 1 (Supplementary Material available online at http://dx.doi.org/10.1155/2014/424873) shows the grouping of genes in different clusters based on percent identity in pairwise comparison. Genes which showed 100% identity in a comparison were considered highly conserved, and genes showing less than 95% identity at least once in the comparisons were considered highly divergent. These highly divergent genes were further explored at the nucleotide as well as the protein level to probe the variations in detail. A total of 11 highly divergent genes were found, whereas the number of highly conserved genes varied from 26 (for the species pair N. tomentosiformis and S. lycopersicum) to 107 (for the species pair N. sylvestris and N. tabacum). Most of the tRNA genes were found to be highly conserved. Genes accD, cemA, clpP, ndhA, rpl32, rpl36, rps16, sprA, trnA-UGC, trnL-UAA, and ycf1 were found to be highly diverged.
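The classification rules described above can be sketched in Python. The species codes follow the abbreviations used in the tables; the function names and, in particular, the identity values in `toy_table` are hypothetical and serve only to show how the 100% and 95% thresholds would be applied.

```python
from itertools import combinations

SPECIES = ["ABE", "CAN", "DST", "NSY", "NTA", "NTO", "NUN", "SBU", "SLY", "STU"]
# Ten species taken two at a time give the 45 pairwise combinations.
PAIRS = list(combinations(SPECIES, 2))


def is_highly_divergent(identities, threshold=95.0):
    """A gene is highly divergent if any pairwise identity falls below the threshold."""
    return any(value < threshold for value in identities)


def count_highly_conserved(identity_table, pair):
    """Number of genes showing 100% identity for one species pair."""
    return sum(1 for per_pair in identity_table.values() if per_pair.get(pair) == 100.0)


# Hypothetical percent identities for two made-up genes and two species pairs.
toy_table = {
    "geneA": {("NSY", "NTA"): 100.0, ("NTO", "SLY"): 99.1},
    "geneB": {("NSY", "NTA"): 100.0, ("NTO", "SLY"): 93.4},
}

print(len(PAIRS))  # 45
print(is_highly_divergent(toy_table["geneB"].values()))  # True
print(count_highly_conserved(toy_table, ("NSY", "NTA")))  # 2
```

Note that divergence is a property of a gene across all comparisons, whereas the conserved-gene count is reported per species pair, which is why it ranges from 26 to 107 in the study.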

Tables 3 and 4 summarize the InDels observed in nucleotide and amino acid sequences, respectively. Partial multiple sequence alignments of 9 genes and 5 proteins are shown in Figures 1 and 2, respectively. The longest insertion, of 141 bp, was observed in the accD gene sequence of C. annuum. Since the genes clpP, ndhA, rps16, and trnL-UAA contain introns, it was important to examine whether these InDels were present in exon or intron regions. All the InDels reported in ndhA and trnL-UAA were present in introns, whereas in the case of clpP, InDel 24 was located in an exon of the gene. Similarly, the first and last InDels of rps16 were present in exons of the gene. Keeping in view the observations on the number and length of InDels in the nucleotide and protein sequences of the genes under consideration, the variation for individual genes is discussed below.

(1) accD. A total of four InDels were observed in the accD gene, as depicted in Figure 1. Interestingly, an insertion of 24 bp was present in all Nicotiana species and D. stramonium, followed by an insertion of 9 bp in all Solanum species, indicating stronger sequence conservation at the genus level. These insertions were also reported by Chung et al. [25]. A 141 bp insertion was observed specifically in C. annuum, which has also been reported by Jo et al. [29] and confirmed by RT-PCR. Similarly, a species-specific deletion of 6 bp was found in D. stramonium. All these InDels were also reflected in the corresponding

Table 2: The lengths of introns and exons for the split genes of ten solanaceous species.

Gene (region)   Exon/intron  ABE   CAN   DST   NSY   NTA   NTO   NUN   SBU   SLY   STU
trnK-UUU (LSC)  Exon I       37    37    37    37    37    37    37    37    37    37
                Intron I     2519  2500  2506  2526  2526  2526  2521  2501  2514  2512
                Exon II      36    35    35    35    35    35    35    35    35    35
rps16 (LSC)     Exon I       40    40    40    40    40    40    40    40    40    40
                Intron I     822   865   866   860   860   860   859   855   864   855
                Exon II      227   227   227   218   218   218   218   227   227   227
trnG-UCC (LSC)  Exon I       23    23    23    23    23    23    23    23    23    23
                Intron I     692   692   694   692   692   690   691   701   695   692
                Exon II      48    48    48    48    48    48    48    37    48    48
atpF (LSC)      Exon I       145   145   145   145   145   145   145   144   144   145
                Intron I     715   693   700   695   695   692   692   693   686   693
                Exon II      410   410   410   410   410   410   410   411   411   410
rpoC1 (LSC)     Exon I       432   453   453   453   453   432   453   453   453   453
                Intron I     737   742   737   737   737   709   733   737   737   737
                Exon II      1614  1614  1614  1614  1614  1614  1623  1614  1614  1614
ycf3 (LSC)      Exon I       124   124   124   124   124   124   124   124   124   124
                Intron I     739   742   740   739   738   731   735   730   729   727
                Exon II      230   230   230   230   230   230   230   230   230   230
                Intron II    763   744   753   783   783   779   781   750   750   750
                Exon III     153   153   159   153   153   153   153   153   153   153
trnL-UAA (LSC)  Exon I       35    35    35    35    35    35    35    37    35    35
                Intron I     497   426   501   503   503   497   498   502   497   497
                Exon II      50    50    50    50    50    50    50    50    50    50
trnV-UAC (LSC)  Exon I       38    38    38    38    38    38    38    38    38    38
                Intron I     572   575   569   571   571   572   573   569   571   571
                Exon II      35    35    37    35    35    35    35    37    35    35
rps12*          Exon I       114   114   114   114   114   114   114   114   114   114
                Intron I     -     -     -     -     -     -     -     -     -     -
                Exon II      232   232   232   232   232   232   232   232   232   232
                Intron II    535   536   536   536   536   536   536   536   536   536
                Exon III     26    26    26    26    26    26    26    26    26    26
clpP (LSC)      Exon I       71    71    71    71    71    71    71    71    71    71
                Intron I     799   811   792   807   807   789   789   789   798   789
                Exon II      292   292   292   292   292   292   292   292   292   292
                Intron II    622   626   624   637   637   634   631   625   617   620
                Exon III     228   228   234   228   228   228   228   234   258   234
petB (LSC)      Exon I       6     6     6     6     6     6     6     6     6     6
                Intron I     759   755   746   753   753   753   753   747   747   747
                Exon II      642   642   642   642   642   642   642   642   642   642
petD (LSC)      Exon I       8     8     9     8     8     8     8     6     8     8
                Intron I     742   742   748   742   742   742   742   739   738   739
                Exon II      475   475   474   475   475   475   475   477   475   475
rpl16 (LSC)     Exon I       9     9     9     9     9     9     9     9     9     9
                Intron I     1019  1026  1025  1020  1020  1021  1020  1014  1018  1014
                Exon II      396   396   396   396   396   396   396   396   396   396
rpl2 (IR)       Exon I       391   391   393   391   391   391   391   390   391   391
                Intron I     664   665   669   666   666   666   666   666   666   666
                Exon II      434   434   429   434   434   434   434   435   434   434
ndhB (IR)       Exon I       777   777   777   777   777   777   777   777   777   777
                Intron I     679   679   679   679   679   679   679   679   679   679
                Exon II      756   756   756   756   756   756   756   756   756   756
trnI-GAU (IR)   Exon I       37    37    42    37    37    37    37    42    37    37
                Intron I     717   722   717   707   707   716   716   717   722   722
                Exon II      34    35    35    35    35    35    35    35    35    35
trnA-UGC (IR)   Exon I       38    38    38    38    38    38    38    38    38    38
                Intron I     681   811   811   709   709   709   709   811   811   811
                Exon II      35    35    35    35    35    35    35    35    35    35
ndhA (SSC)      Exon I       553   553   552   553   553   553   553   552   553   553
                Intron I     1150  1157  1154  1148  1148  1149  1148  1158  1133  1158
                Exon II      539   539   537   539   539   539   539   540   539   539

*rps12 is a divided gene: the 3′-rps12 is located in the IR region, while the 5′-rps12 is located in the LSC region.
ABE: Atropa belladonna; CAN: Capsicum annuum; DST: Datura stramonium; NSY: Nicotiana sylvestris; NTA: Nicotiana tabacum; NTO: Nicotiana tomentosiformis; NUN: Nicotiana undulata; SBU: Solanum bulbocastanum; SLY: Solanum lycopersicum; STU: Solanum tuberosum.
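The contrast between conserved and variable introns in Table 2 can be made concrete with a small Python sketch. The intron lengths are copied from the ndhB and trnA-UGC rows of Table 2 (species order ABE to STU); the helper `length_spread` is an assumption introduced here for illustration.

```python
# Intron I lengths (bp) from Table 2, in the order ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, STU.
INTRON_LENGTHS = {
    "ndhB": [679, 679, 679, 679, 679, 679, 679, 679, 679, 679],
    "trnA-UGC": [681, 811, 811, 709, 709, 709, 709, 811, 811, 811],
}


def length_spread(lengths):
    """Difference between the longest and shortest copy across species."""
    return max(lengths) - min(lengths)


for gene, lengths in INTRON_LENGTHS.items():
    print(gene, length_spread(lengths))
# ndhB 0
# trnA-UGC 130
```

The spread of 0 bp for ndhB matches its description as highly conserved, while the trnA-UGC intron differs by up to 130 bp between species, which agrees with the upper bound of the 102–130 bp InDel listed for this gene in Table 3.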

Table 3: InDels in nucleotide sequences of 9 genes of ten solanaceous plastid genomes.

S. number  Gene(a,b,c)  Total number of InDels  InDel length (bp)
1          accD(a)      4                       24, 9, 141, 6
2          clpP(a)      24                      8(I), 14(I), 13(I), 7(I), 1(I), 2-3(I), 7(I), 1–7(I), 3(I), 2(I), 3(I), 1–7(I), 1–3(I), 1(I), 1(I), 1(I), 1–5(I), 4–7(I), 1(I), 9(I), 1-2(I), 3(I), 5(I), 24–30
3          ndhA(b)      14                      9(I), 5-6(I), 3(I), 1(I), 9(I), 3(I), 4(I), 1–4(I), 1-2(I), 1–23(I), 1-2(I), 2(I), 1(I), 3(I)
4          rpl32(b)     2                       2-3, 4
5          rps16(a)     11                      1–38, 9(I), 1(I), 1(I), 5(I), 1-2(I), 5(I), 4(I), 6(I), 1(I), 9
6          sprA(b)      2                       109, 7
7          trnA-UGC(c)  1                       102–130
8          trnL-UAA(a)  4                       1, 6, 71, 4
9          ycf1(b)      31                      3, 18, 18, 21, 6, 6, 48, 9, 6, 6, 42, 3, 6, 30, 3, 15, 12–39, 18, 6, 9–36, 6, 6, 6, 9, 9, 12, 6, 6, 6, 57, 6

(a,b,c) Location in different regions: (a) LSC, (b) SSC, and (c) IR. (I): InDels present in introns.

Table 4: InDels in amino acid sequences of 5 proteins of ten solanaceous plastid genomes.

S. number  Protein  Total number of InDels  InDel length (aa)
1          accD     4                       8, 3, 47, 2
2          clpP     2                       2, 10
3          rpl32    1                       1-2
4          rps16    1                       3
5          ycf1     29                      1, 6, 6, 7, 2, 2, 7, 3, 2, 2, 14, 1–10, 1, 5, 4–13, 6, 2, 3–12, 2, 2, 2, 3, 3, 4, 2, 2, 2, 19, 2
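The correspondence between the nucleotide InDels of Table 3 and the amino acid InDels of Table 4 can be sketched in Python for accD, whose InDels all lie in coding sequence. The function name `indel_in_codons` is an assumption introduced for illustration; it simply divides an in-frame InDel length by three.

```python
# accD nucleotide InDel lengths (bp) from Table 3.
ACCD_NUCLEOTIDE_INDELS = [24, 9, 141, 6]


def indel_in_codons(length_bp):
    """Return the InDel length in codons, or None if it is not a multiple of three."""
    if length_bp % 3 != 0:
        return None  # such an InDel would shift the reading frame
    return length_bp // 3


accd_amino_acid_indels = [indel_in_codons(n) for n in ACCD_NUCLEOTIDE_INDELS]
print(accd_amino_acid_indels)  # [8, 3, 47, 2]
```

The result matches the accD row of Table 4, so each accD InDel adds or removes whole codons and leaves the reading frame intact.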

[Figure 1 (alignment image not reproduced): partial multiple sequence alignments of divergent genes, with panels labeled sprA, accD, and ndhA among others, for the ten species in the order ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, STU; InDels are numbered within each gene.]

ATCT---------TCGGATCT---------TCGGATCT---------TCGGATTT---------TCGGATTT---------TCGGATTT---------TCGGATTT---------TCAGATCT---------TAGGATCTAATATATCTTAGGATCT---------TAGG

GTCT-GTAGGTCTTGTAGGTCTTGTAGGTCT-GTAGGTCT-GTAGGTCT-GTAGGTCT-GTAGATCTTGTAGATCTTGTAGATCTTGTAG

TTTAAATTGTTTA-ATTGTTTAAATTGTTTTTATTGTTTTTATTGTTTTTATTGTTTTTATTGTTTTAATTGTTTTAATTGTTTTAATTG

CCAAATCGATTTTCCGA-----TTTTCCAAATCGATTTTCCAAATCCATTTTCCAAATCCATTTTCCAAATCCATTTTCCAAATCCATTTTCCGA-----TTTTCCGA-----TTTTCCGA-----TTTT

AAAAA-GACTTAAAAAGGACTTAAAAA-GATTTAAAAAAGACTTAAAAAAGACTTAAAAA-GACTTAAAAAAGACTTAAAA--GACTTAAAA--GACTTAAAA--GACTT

rps16

InDel 1 InDel 2 InDel 3 InDel 4 InDel 5 InDel 6

ABECANDSTNSYNTANTONUNSBUSLYSTU

TTTT---------CTTCCTTTTTTCTTTTTTT-----------------------TTTTTTTTCTTCCTTTTTTCTTTTTTTTTTTTTTTTTCTTACTTTTTTCTTTTTTTGTTTTTTTTTCTTACTTTTTTCTTTTTTTGTTTTTTTT-CTTACTTTTTTCTTTTTTTTTTTTTTTT-CTTACTTTTTTCTTTTTTTGTTTTTTTTTTCTTCCTTTTTTCTTTTTTTGTTTTTT----------------------GTTTTTTTTTCTTCCTTTTTTCTTTTTTTGTTT

TTTTTGTT TTT--ATT TTTTTATT TTT--ATT TTT--ATT TTTTTATT TTT--ATT TTTT-ATT TTTT-ATT TTTT-ATT

TTTATTCTCTT--TCTCTT--TCTCTT--TCTCTT--TCTCTT--TTTCTT--TCTCTT--TCTCTT--TCTCTT--TCT

TTTTCGTTTT-CGTTTT-CGTTTT-CGTTTT-CGTTTT-CGTTTT-CGTTTT-CGTTTT-CGTTTT-CGT

TTT---ATTTTT---ATTTTT---ATTTTT---ATTTTT---ATTTTT---ATTTTT---ATTTTTTTTATTTTTTTTATTTTTTTTATT

TTTTTTT---GTACTTTTTTTTTGGTACTTTTTTTT--GTACTTTTTTT---GTACTTTTTTT---GTACTTTTTTT---GTACTTTTTTT---GTACTTTTTTT---GTACTTTTTTT---GTACTTTTTTT---GTAC

TAAGTAA TAAGTAATAA----TAAGTAA TAAGTAA TAAGTAA TAAGTAA TAAGTAA TAAGTAA TAAGTAA

InDel 10 InDel 11 InDel 12 InDel 13 InDel 14 InDel 1 InDel 2

ABECANDSTNSYNTANTONUNSBUSLYSTU

rpl32

TAAA--------TTGATAAAACGATAAATTGATAAA--------TTAATAAA--------TTTATAAA--------TTTATAAA--------TTTATAAA--------TTTATAAA--------TTGATAAA--------TTGATAAA--------TTGA

GAATAATTCAATTAGAATAAGAA--------------TAAGAA--------------TAAGAAAAATTCAATTAGAATAAGAAAAATTCAATTAGAATAAGAA--------------TAAGAA--------------TAAGAA--------------TAAGAA--------------TAAGAA--------------TAA

TCTATG-------------ATTG TCTATG-------------ATTGTCTATG-------------ATTGTCTATG-------------ATTGTCTATG-------------ATTGTCTATG-------------ATTGTCTATG-------------ATTG TCTATG-------------ATTGTCTATGTAAGATATCTATGATTG TCTATG-------------ATTG

GTAA-------AGGGAGTAATTTGTAAAGAGAGGAA-------AAAGAGTAA-------AGAGAGTAA-------AGAGAGTAA-------AGAGAGTAA-------AGAGAGTAA-------AGAGAGTAA-------AGAGAGTAA-------AGAGA

AAAAGAAAAAGAAAAAGAAAAAGAAAAAGAAAA-GAAAA-GAAAAGAAAAAAAAAAAGAA

--TACA--TACA--TAAA--TACA--TACA--TACA--TACA

---TACA

AAAAAAAAAAAAAAAAAAAAAAAAAAAATACAAA---TACA

clpP

InDel 1 InDel 2 InDel 3 InDel 4 InDel 5

ABECANDSTNSYNTANTONUNSBUSLYSTU

InDel 6

GGGGGGGGGTCAGGAGGTCAGGAGGTCAGGAGGTCAGGGGGGGG

TCGAAAATTCGAAAATTGGAAAATCGAAA TGGAAATGGAAAATCGAAA TAGAAAATCGAAA TGGAAAATCGAAA TGGAAAATCGAAA TAGTGGAAAATCGAAA

TTTTTTTTATTGTTTTTTTTATTGTTTTTTTTATTGTTTTTTTTATTGTTTTTTTT

ATTGTTTTTTTTTT

ATTGTTTTTTTTTT

CTTTTTCTTTTTCTTTTTCTTTTTCTTTTTCTTTTTCTTTTTCTTTTT

TTCTTATT

AATAGAAAGGAGGGGAATAGAAAGGAGGGGAATAGAAAGGAGGGGAATAGAAAGGAGGGG

AAGGAGGGGAATAGAAAGGAGGGGAATAGAAAGGGGGGGAATAGAAAGGAGGGG

GGGAATAGAAAGGAGGGG

ATCATACTGGAAAATC

ATGATGATGATAATGATGATAATG

TTTTTTTTTTTTTT

TTTTTCAAATTTTTCAAATTTTTCAAATTTTTCAAA

TTTGTTTTTGTTTTTGGTTTTGTTTTTTTTGTTTTTTTGTTTTTGTTTTTGTT

Figure 1 Continued

8 Advances in Bioinformatics

AAA-GGAAAA-GGAAAA-GGAAAA-GGAAAA-GGAAAA-GGAAAAAGGAAAA-GGAAAA-GGAAAA-GGA

AAA------TGA AAA------TGA AAA------TGA AAAATCAAATGA AAAATCAAATGA AAA------TGA AAA------TGA AAA------TGA AAA------TGA AAA------TGA

ATCCGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCC-----------------------------------------------------------------------CTGAT ATCCGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGACTCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCTGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCTGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTATTTTTTCTATAAAAAATGGAAGAGTTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTAGTTTTTCTATAAAAAATCGAAGAATTGGTGTGAATCCATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTAGTTTTTCTATAAAAAATCGAAGAATTGGTGTGAATCCATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTAGTTTTTCTATAAAAAATCGAAGAATTGGTGTGAATCCATTCTACATTGAAGAAAGAATCGAATATTCATTGAT

TTT----GAATTT----GAATTTTTTTGAATTT----GAATTT----GAATTT----GAATTT----GAATTT----GAATTT----GAATTT----GAA

tRNA-Leu(UAA)

InDel 1 InDel 2 InDel 3 InDel 4

ABE CANDSTNSYNTANTONUNSBUSLYSTU

---ATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTTGTGATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTT

GTAC------------------CTTGGTACATTCGATCTAATAAGTACCTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTG

AAGC------------------CTCA AAGCTAAGAAACTGAAAGAGGCCTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA

AAAG---------------------

AAAGGGTGGAAAGTGAG

GTAGAAATAGAAACTGGTAGAA------ACTGGTAGAAATAGAAACTGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACTGGTAGAAATTGAAACTGGTAGAAATAGAAACTG

ycf1

InDel 1 InDel 2 InDel 3 InDel 4 InDel 5

ABECANDSTNSYNTANTONUNSBUSLYSTU

CTTCTCCTTCCCTCTTCTCCTTCCCTCTTC------CCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCT

TTTT------------------------------------------------TCGG TTTTCAAGAGGGATCCACTGAAGAAGATCCTTATCCTTCTCCTTCCCCTTTTTCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG

AACAATATTAATACTAGAAAAATATTAATACTAGAACAATATTAATACTAGAATAATATTAATACTAGAATAATATTAATACTAGAACA---------CTAGAACAATATTAATACTAAAACAATATTAATACTAGAACAATATTAATACTAGAACAATATTAATACTAG

TAA------CACTAATAATAACACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CAC

AGA------CCTAGA------CCTAAA------ATTAGA------CCTAGA------CCTAGA------ACTAGA------CCTAGA------CCTAGAACAAGACCTAGA------CCT

InDel 6 InDel 7 InDel 8 InDel 9 InDel 10

ABECANDSTNSYNTANTONUNSBUSLYSTU

TAAA------------------------------------------AGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGTAAGAACG TCAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGCG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGACCAGGGAAGAGTG TAAATGTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGCG TAAATTTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGCG TAAATTTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGTG

CTGA---G CTGATGAG CTGC---G CTGA---T CTGA---T CTAA---T CTGA---T CTGA---G CTGA---G CTGA---G

GGTCAAAGAGACCAACGAGCCCGACGATACCAAAGATACCAAAGATACCAAAGATACCAAAGAG------GAG------GAG------GA

GA------------------------------GGAAGA------------------------------GGAAGA------------------------------GGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGAGACTGATACCGCAGATCAAGATCAAACAAAGGAAGA------------------------------AGAAGA------------------------------GGAAGA------------------------------GGAA

AAGCAAAAAGCAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAG

CTCT---------------AAAACTTTTATAACTATAACTCGAAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAA

InDel 11 InDel 13 InDel 14 InDel 15 InDel 16

ABECANDSTNSYNTANTONUNSBUSLYSTU

CGTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAA---------------------------------------ATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCC CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAAGTCCTAATAAAACAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAACTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAACTCGAATCG

TCAAGAAAAAATTAATAATAAAAAAA TCAAGAAAAAATTAATAATAAAAACA TCAAGAAAAAATAAAAAAAAAAAAAA TCAAGAAAAAATTAATAATCAAAAAA TCAAGAAAAAATTAATAATCAAAAAA TCAA------------------AAAA TCAAGAAAAAATTAATAATCAAAAAA TCAAGAAAAAATTACTAATAAAAAAA TCAAGAAAAAATTAATAATAAAAAAA TCAAGAAAAAATTACTAATAAAAAAA

TTAT------TTCGACTTTATCTTTATTTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTATCTTTATTTCGACTTTATGTTTATTTCGACTTTATCTTTATTTCGACT

CTAAAAAAGTCACTTTATAATATT---------AGTAATAAAAA ATAAAAAAGTCACTTTATAATATTTATAATATTAGTAAGAAAAA ATAA------------------------------------AAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGCCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA

InDel 17 InDel 18 InDel 19 InDel 20

ABECANDSTNSYNTANTONUNSBUSLYSTU

AATATAAATTTAAAATAGAAACTTAAAATCGAAATTTAAAAT------TTAAAAT------TTAAAAT------TTAAAAT------TTAAAATAGAAACTTAAAATAGAAACTTAAAATAGAAACTTAA

CTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTTGAAGTTCAAGCTTT------CAAG

ATAT------ATAGACATCTACATATAGAGAT------AGAGATAT------ATAGATAT------ATAGATAT------ATAGATAT------ATAGAGAT------ATAGAGAT------AGAGAGAT------ATAG

ATAT---------TTTG ATATAGAAAATATTTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG

GAGAAAATTGATAAAAAAT---------AAAAGAGAAAATTGACAAAAGATAAAATTGATAAAAGATAAAATTGATAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAA

AACATTAATAAAAACCAAAATATCAATAAAAACAACAAC------------CAAAACATCAATAAAAACCAAAACATCAATAAAAACCAAAACA------------AAAACATCAATAAAAACAAAAAC------------CAAAAC------------CAAAAC------------CAA

AAACAAAACTT

AACTT

------CTT

AAAAAAAACTTAAACAAACCTTAAACAAAACTTAAACAAAACTTAAACAAAACTTAAACAAAAACAAAACTTAAAAACAAAACTT

InDel 21 InDel 22 InDel 23 InDel 24 InDel 25 InDel 26 InDel 27

ABECANDSTNSYNTANTONUNSBUSLYSTU

TAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAAAATAAAGAATTAAA------GAAT

TCAAGGAGAAAGAGTCAAGGAAAAAGAGTCAAAGAGAAAGAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAA------AGAGTCAA------AGAGTCAA------AGAG

AAAGAATTTTGATGAATCCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAA---------------------------------------------------------TCAT AAAGAATTTTGATGAATTCATTCTGCAACCGCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAATCATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT

TAAA------GAGATAAA------GATATAATTATAACGAGATAAA------GAGATAAA------GAGATAAA------GAGATAAA------GAGATAAA------GATATAAA------GATATAAA------GATA

InDel 28 InDel 29 InDel 30 InDel 31

ABECANDSTNSYNTANTONUNSBUSLYSTU

--AGATAATAATTGCATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATAGAACTCTCTTCGCTTTTCTGATTGATATATAAAATA-------------------------------------------------------------------------------------------------------------ATA--AGATAATAATTGAATAATTTAACGATTTACCTTTCATATTTGATATTGATTCGCTCACCAATCAATACGTAATGGAACTCTATTCGCTTTTCTGATTGATATAGAAAATA--AGATAATAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--AGATAATAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA----ATC-TAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--------TAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--------TAATTGAATAATTGAACGATTTACCTTTCCTATTTGATATTGATTAGCTCACCAATCCATATGTAACGGAACTCTATTCGCTTTTCTGATTGATATATAAAATATTCTTCAATTATTATCTAATTGAACAATTTCCCTTTCCTCTTTGATATTGATTAGCTCACCAATCCATATATAACATAACTCTATTCGCTTTTCTGATTGATATAGAAAATA--AGATAATAATTGAATAATTGAACGATTTACCTTTCCTATTTGATATTGATTAGCTCACCAATCCATATGTAACGGAACTCTATTCGCTTTTCTGATTGATATATAAAATA

InDel 1

ABECANDSTNSYNTANTONUNSBUSLYSTU

ATT----------------------------------------------------------------------------------------------------------------------------------ACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAAGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAAGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACC

tRNA-Ala(UGC)

InDel 1

ABECANDSTNSYNTANTONUNSBUSLYSTU

InDel 12

AAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGAGGAAAGTGAGGAAGAAAGAGAT

AGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGAT

GAAGAAAAAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGAT

Figure 1 Partial multiple sequence alignment of accD clpP ndhA rpl32 rps16 sprA tRNA-Ala (UGC) tRNA-Leu(UAA) and ycf1 genesequences of ten solanaceous species showing location of InDels indicated by hyphens


Figure 2: Partial multiple sequence alignment of amino acid sequences of genes, namely, accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala(UGC), tRNA-Leu(UAA), and ycf1, of ten solanaceous species, showing the location of InDels (indicated by hyphens). Sequence labels used in the alignment: ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU.

protein sequences, as shown in Figure 2. The accD gene has been reported to be one of the most variable plastid genes and is probably under diversifying selection [26].

(2) clpP. In the clpP gene, InDels were found both in intron and in exon regions. Two major consequences were observed for the InDels in the exon regions. An insertion of 6 bp in S. bulbocastanum and S. tuberosum and of 30 bp in S. lycopersicum at the 3′ end (exon 3) of the clpP gene shifted the stop codon by 6, 6, and 30 bp downstream in the respective species compared to other species of the Solanaceae family, increasing the length of the coding sequence and the protein product (Figures 1 and 2). An interesting feature was observed as InDel 1 in the protein sequence, corresponding to insertion of a repeat of “I” amino acids in D. stramonium, making the exon 3 region longer by 6 bp. This region, however, corresponds to the end of intron 2 in the clpP gene in all other species. Since the D. stramonium chloroplast genome has been sequenced recently, this observation needs to be experimentally validated.
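The relation between the clpP insertion sizes and the longer protein product can be checked with simple codon arithmetic. The sketch below is illustrative only: the helper name `codons_gained` is hypothetical, and it assumes each insertion lies entirely upstream of the stop codon and introduces no new stop codon.

```python
def codons_gained(insertion_bp):
    """Return the number of whole codons added by an insertion,
    or None if the insertion length is not a multiple of 3 (frame is broken)."""
    if insertion_bp % 3 != 0:
        return None
    return insertion_bp // 3

# Insertion sizes (bp) at the 3' end of clpP, as reported in the text.
clpP_insertions = {
    "S. bulbocastanum": 6,
    "S. tuberosum": 6,
    "S. lycopersicum": 30,
}

# Expected number of extra residues under the assumptions stated above.
extra_residues = {sp: codons_gained(bp) for sp, bp in clpP_insertions.items()}
for species, extra in extra_residues.items():
    print(species, extra)
```

Under these assumptions, the 6 bp insertions correspond to two additional residues and the 30 bp insertion to ten additional residues.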

(3) ndhA. All the InDels found in ndhA were present in introns, while the protein-coding regions (exons) were highly conserved. This indicates high diversifying selection on the intronic region of this gene. Out of the total 14 InDels, most were observed with respect to C. annuum (InDels 1, 2, 5, 7, and 10). InDel 10 was shared by C. annuum and S. lycopersicum in full and by C. annuum and A. belladonna in part.
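As a rough illustration of how InDels can be located in an aligned sequence and assigned to introns or exons, the following sketch scans one alignment row for runs of hyphens and labels each run by overlap with exon coordinates. The function names, the toy sequence, and the exon coordinates are all hypothetical and are not taken from the ndhA alignment; any gap run that does not overlap an exon is simply labelled as intronic, which assumes the row covers a single gene.

```python
import re

def gap_runs(aligned_seq):
    """Return (start, length) for each run of '-' in one aligned sequence (0-based)."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"-+", aligned_seq)]

def classify_runs(runs, exon_ranges):
    """Label a gap run 'exon' if it overlaps any half-open exon range, else 'intron'."""
    labels = []
    for start, length in runs:
        end = start + length
        in_exon = any(start < e_end and end > e_start for e_start, e_end in exon_ranges)
        labels.append("exon" if in_exon else "intron")
    return labels

# Toy alignment row: exon, gapped intron, exon (coordinates are hypothetical).
row = "ATGGCA" + "-" * 4 + "TTGACC" + "-" * 2 + "GGATAA"
exons = [(0, 6), (18, 24)]

runs = gap_runs(row)
labels = classify_runs(runs, exons)
print(list(zip(runs, labels)))
```

In this toy row both gap runs fall between the two exon ranges, which mirrors the pattern described for ndhA, where all InDels lie in introns.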

(4) rpl32. In rpl32, an insertion of 1 bp in D. stramonium and of 3 bp in C. annuum was found in the 3′ region of the gene, while a deletion of 4 bp was observed in D. stramonium. The insertion of 3 bp in C. annuum only altered the length of the protein, making it longer by 1 amino acid. However, the small insertion of 1 bp in D. stramonium proved to be a frameshift mutation resulting in three changes in the amino acid sequence near the C-terminus. Moreover, the deletion of 4 bp at the 3′ end resulted in premature termination of protein synthesis. The frameshift mutation and the 3′ end deletion together reduced the gene product length by 1 amino acid. As the C-terminal of the amino acid chain is well conserved in all the other species, the effect of the above-mentioned variations needs to be validated experimentally.
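The contrast between an in-frame 3 bp insertion and a 1 bp frameshift can be illustrated with a toy coding sequence. The sketch below is not the rpl32 sequence; the sequences and the `translate` helper are made up for illustration, and the codon table is the standard genetic code, whose amino acid assignments are assumed here to apply to plastid genes.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(cds):
    """Translate codon by codon from position 0, stopping at the first stop codon.
    Trailing bases that do not form a complete codon are ignored."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Toy reference CDS (hypothetical): ATG AAA GGT CTG ACT TAA
reference = "ATGAAAGGTCTGACTTAA"
# In-frame 3 bp insertion just before the stop codon: one extra residue.
in_frame = reference[:15] + "GAA" + reference[15:]
# 1 bp insertion after the second codon: every downstream codon is re-read.
frameshift = reference[:6] + "C" + reference[6:]

for name, seq in [("reference", reference), ("in_frame", in_frame), ("frameshift", frameshift)]:
    print(name, translate(seq))
```

In the toy case the in-frame insertion only lengthens the product, whereas the single-base insertion changes every residue downstream of the insertion point and reads through the original stop codon.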

(5) rps16. In rps16 also, InDels were observed in introns as well as exons. Five of the major insertions in the intron regions were species specific. Insertions of 38 bp (InDel 1), 9 bp (InDel 2), 5 bp (InDel 7), 4 bp (InDel 8), and 6 bp (InDel 9) were observed in A. belladonna, S. lycopersicum, D. stramonium, and C. annuum. A deletion of 5 bp was observed in all the three Solanum species and C. annuum. A deletion of 9 bp was observed in all Nicotiana species, resulting in an amino acid change (P to S) and shortening of the protein by three amino acids in the C-terminal region. A similar deletion has also been observed by Kahlau et al. [27] and was suggested to be functionally neutral.

(6) sprA. The sprA gene has been reported as a stable noncoding RNA of unknown function. This gene has been suggested to influence 16S rRNA maturation [38, 39]. In many species, this gene seems to be present as a remnant and shows large variations in its 5′ region. The largest deletion, of 109 bp, was observed in C. annuum. The rest of this gene appears to be more conserved, with a deletion towards the 3′ end in all Nicotiana species and A. belladonna. The manner in which this gene functions and the consequences of the above-mentioned variations are yet to be investigated experimentally.

(7) trnA-UGC. In this particular gene, a long deletion of 102 bp was observed in all Nicotiana species. Interestingly, this deletion was further extended to 130 bp in both directions in A. belladonna. These deletions were found in the intron region and so are unlikely to have any negative impact on gene product function.

(8) trnL-UAA. The trend of variation in trnL-UAA was similar to that in ndhA, as all InDels were observed in introns. The longest species-specific deletion (InDel 3) was observed in C. annuum, whereas a short insertion of four nucleotides, a repeat of “T”, was observed specifically in D. stramonium. Another insertion of 6 bp was observed in two Nicotiana species, that is, N. sylvestris and N. tabacum.

(9) ycf1. Many InDels (3′ region) were found in the fastest evolving gene, that is, the ycf1 gene. Most of the InDels were found to be species specific. The largest number of InDels (InDels 2, 3, 5, 7, 9, 16, 17, 19, 23, 24, 25, and 30) was observed in C. annuum, followed by D. stramonium (InDels 4, 6, 20, 26, and 31), N. tomentosiformis (InDels 8, 17, and 18), and S. lycopersicum (InDels 16, 22, and 28). Two genus-specific InDels (InDels 14 and 21) were observed in all the four Nicotiana species. However, InDel 19 was also present in D. stramonium. Another genus-specific InDel (InDel 29) was observed in Solanum species. Most of the InDels altered the length of the gene product, with the maximum length of 1906 amino acids (aa) observed in C. annuum and the minimum of 1873 aa observed in D. stramonium. Among the Solanum species, the length of the protein (1887 aa) was conserved between S. bulbocastanum and S. tuberosum. However, S. lycopersicum had an amino acid sequence of 1891 aa, larger by 4 aa than that of the other two species of the same genus. Among the four Nicotiana species, the ycf1 gene product length was conserved among N. sylvestris, N. tabacum, and N. undulata, which had protein lengths of 1901 aa, 1902 aa, and 1901 aa, respectively. However, N. tomentosiformis was observed to be the most variable member of the genus Nicotiana, having a protein length of 1892 aa.
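The ycf1 protein lengths quoted above can be tabulated to make the within-genus and overall differences explicit. The sketch below uses only the lengths stated in the text (no length is given here for A. belladonna, so it is omitted); the helper name `spread_by_genus` is hypothetical.

```python
# ycf1 protein lengths (aa) as reported in the text.
ycf1_lengths = {
    "Capsicum annuum": 1906,
    "Datura stramonium": 1873,
    "Solanum bulbocastanum": 1887,
    "Solanum tuberosum": 1887,
    "Solanum lycopersicum": 1891,
    "Nicotiana sylvestris": 1901,
    "Nicotiana tabacum": 1902,
    "Nicotiana undulata": 1901,
    "Nicotiana tomentosiformis": 1892,
}

def spread_by_genus(lengths):
    """Return {genus: max length - min length} using the first word of each name as genus."""
    by_genus = {}
    for species, length in lengths.items():
        genus = species.split()[0]
        by_genus.setdefault(genus, []).append(length)
    return {genus: max(vals) - min(vals) for genus, vals in by_genus.items()}

genus_spread = spread_by_genus(ycf1_lengths)
overall_spread = max(ycf1_lengths.values()) - min(ycf1_lengths.values())
print(genus_spread, overall_spread)
```

With these values, the spread is 4 aa within Solanum, 10 aa within Nicotiana, and 33 aa across all listed species.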

3.5. Phylogenetic Analysis of Solanaceous Plastomes. Evolutionary relationships between diverse plant species have been analyzed using several plastome markers, including matK and rbcL (genes) or trnH-psbA and trnL-trnF (intergenic regions), due to sequence conservation among plant taxa blended with suitable variation [40, 41]. However, determination of phylogeny based on single gene sequences may be inaccurate [42]. Availability of complete chloroplast sequences for many species has made it possible to use many individual genes or concatenated gene sequences to deduce phylogenetic relationships among taxa [42–44].

Efforts have been made to carry out phylogenetic analysis of solanaceous species using complete plastome sequences by Moore et al. [44] and Jansen et al. [45]. Evolutionary positions of Capsicum and Datura in Solanaceae have been determined using a single or a few plastid genes [46, 47]. Recently, concatenated protein-coding gene sequences from completely sequenced plastomes were used to obtain reasonable phylogenetic relationships for solanaceous species [29]. In the present investigation, we used a similar approach to analyze the phylogenetic relationships of ten solanaceous species with completely sequenced plastomes. Individual multiple sequence alignments were concatenated for maximum likelihood phylogenetic tree generation. As depicted in Figure 3, taxa were divided into two clades with 100% bootstrap support out of 500 replicates. The first clade consisted of four Nicotiana species, while the species of Solanum, Capsicum, Atropa, and Datura were included in the second clade. These results are in line with previous phylogenetic analyses using concatenated protein-coding gene sequences as well as phylogenetic analyses using plastid ndhF and trnL-F sequences [29, 47]. However, in an analysis of 13 ORFs of solanaceous plastomes, a different arrangement was shown in which Atropa was separated from Solanum and grouped together with Nicotiana [25].
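The concatenation step mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function name, the gene names, and the toy alignments are hypothetical, and every gene alignment is assumed to contain one row per taxon.

```python
def concatenate_alignments(gene_alignments, taxa):
    """Join per-gene alignments into one concatenated row per taxon.

    gene_alignments maps gene name -> {taxon: aligned sequence}.
    Rows within a gene must have equal length, otherwise columns would misalign.
    """
    parts = {taxon: [] for taxon in taxa}
    for gene, rows in gene_alignments.items():
        lengths = {len(rows[taxon]) for taxon in taxa}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: aligned rows differ in length")
        for taxon in taxa:
            parts[taxon].append(rows[taxon])
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}

# Toy input: two hypothetical gene alignments for three taxa.
taxa = ["NTA", "SLY", "CAN"]
toy_alignments = {
    "geneA": {"NTA": "ATGGC-", "SLY": "ATGGCA", "CAN": "ATGGCA"},
    "geneB": {"NTA": "TTGA", "SLY": "TT-A", "CAN": "TTGA"},
}

matrix = concatenate_alignments(toy_alignments, taxa)
for taxon in taxa:
    print(taxon, matrix[taxon])
```

The resulting rows all have the same length, so the concatenated matrix can be passed to a tree-building program of choice; the length check guards against joining alignments whose columns would no longer correspond.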

4. Conclusions

The analyses of complete plastid genomes of ten solanaceous species revealed overall similarity in terms of gene content and organization. The sizes of LSC, SSC, and IR regions were found to be somewhat conserved among species, but a significant variation was found between genera. Most of the coding regions were well conserved. However, many of the features in a few genes were observed to be typical of a particular genus and even species, and these can be used as molecular markers to investigate genetic diversity and evolution. These typical variations can be further utilized to develop more sophisticated DNA barcoding based techniques. The ten solanaceous species are divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from coding regions of plastomes. The first clade consisted of four Nicotiana species and the second clade consisted of species of Solanum, Capsicum, Atropa, and Datura.

Figure 3: Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species. Tip labels in the tree: CAR, DCA, NTO, NUN, NSY, NTA, ABE, DST, CAN, SLY, SBU, and STU.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors are thankful to the University Grants Commission, New Delhi, for providing the MANF fellowship to Harpreet Kaur.

References

[1] M. G. Bausher, N. D. Singh, S.-B. Lee, R. K. Jansen, and H. Daniell, "The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var. 'Ridge Pineapple': organization and phylogenetic relationships to other angiosperms," BMC Plant Biology, vol. 6, no. 1, article 21, 2006.

[2] Z. Y. Hu, W. Hua, S. M. Huang, and H. Z. Wang, "Complete chloroplast genome sequence of rapeseed (Brassica napus L.) and its evolutionary implications," Genetic Resources and Crop Evolution, vol. 58, no. 6, pp. 875–887, 2011.

[3] J. D. Palmer, "Comparative organization of chloroplast genomes," Annual Review of Genetics, vol. 19, no. 1, pp. 325–354, 1985.

[4] R. G. Olmstead and J. D. Palmer, "Chloroplast DNA systematics: a review of methods and data analysis," The American Journal of Botany, vol. 81, no. 9, pp. 1205–1224, 1994.

[5] R. A. Bungard, "Photosynthetic evolution in parasitic plants: insight from the chloroplast genome," BioEssays, vol. 26, no. 3, pp. 235–247, 2004.

[6] L. A. Raubeson and R. K. Jansen, "Chloroplast genomes of plants," in Diversity and Evolution of Plants-Genotypic and Phenotypic Variation in Higher Plants, R. Henry, Ed., pp. 45–68, CABI Publishing, Wallingford, UK, 2005.

[7] R. C. Haberle, H. M. Fourcade, J. L. Boore, and R. K. Jansen, "Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes," Journal of Molecular Evolution, vol. 66, no. 4, pp. 350–361, 2008.

[8] W. Martin and R. G. Herrmann, "Gene transfer from organelles to the nucleus: how much, what happens, and why?" Plant Physiology, vol. 118, no. 1, pp. 9–17, 1998.

[9] H. L. Race, R. G. Herrmann, and W. Martin, "Why have organelles retained genomes?" Trends in Genetics, vol. 15, no. 9, pp. 364–370, 1999.

[10] T. Wakasugi, T. Tsudzuki, and M. Sugiura, "The genomics of land plant chloroplasts: gene content and alteration of genomic information by RNA editing," Photosynthesis Research, vol. 70, no. 1, pp. 107–118, 2001.

[11] Y.-K. Kim, C.-W. Park, and K.-J. Kim, "Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications," Molecules and Cells, vol. 27, no. 3, pp. 365–381, 2009.

[12] A. Khan, I. A. Khan, H. Asif, and M. K. Azim, "Current trends in chloroplast genome research," African Journal of Biotechnology, vol. 9, no. 24, pp. 3494–3500, 2010.

[13] H. Shimada and M. Sugiura, "Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes," Nucleic Acids Research, vol. 19, no. 5, pp. 983–995, 1991.

[14] M. Sugiura, "The chloroplast genome," Plant Molecular Biology, vol. 19, no. 1, pp. 149–168, 1992.

[15] M. E. Cosner, R. K. Jansen, J. D. Palmer, and S. R. Downie, "The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families," Current Genetics, vol. 31, no. 5, pp. 419–429, 1997.

[16] L. Gao, X. Yi, Y.-X. Yang, Y.-J. Su, and T. Wang, "Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes," BMC Evolutionary Biology, vol. 9, no. 1, article 130, 2009.


[17] T. Kato, T. Kaneko, S. Sato, Y. Nakamura, and S. Tabata, "Complete structure of the chloroplast genome of a legume, Lotus japonicus," DNA Research, vol. 7, no. 6, pp. 323–330, 2000.

[18] B. Goffinet, N. J. Wickett, O. Werner, R. M. Ros, A. J. Shaw, and C. J. Cox, "Distribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta)," Annals of Botany, vol. 99, no. 4, pp. 747–753, 2007.

[19] J. D. Palmer, J. M. Nugent, and L. A. Herbon, "Unusual structure of geranium chloroplast DNA: a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families," Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 3, pp. 769–773, 1987.

[20] S. Greiner, X. Wang, U. Rauwolf, et al., "The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution," Nucleic Acids Research, vol. 36, no. 7, pp. 2366–2378, 2008.

[21] J. J. Doyle, J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson, "Chloroplast DNA inversions and the origin of the grass family (Poaceae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 16, pp. 7722–7726, 1992.

[22] F. A. Michelangeli, J. I. Davis, and D. W. Stevenson, "Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes," The American Journal of Botany, vol. 90, no. 1, pp. 93–106, 2003.

[23] E. Bortiri, D. Coleman-Derr, G. R. Lazo, O. D. Anderson, and Y. Q. Gu, "The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes," BMC Research Notes, vol. 1, article 61, 2008.

[24] C. H. Leseberg and M. R. Duvall, "The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals," Journal of Molecular Evolution, vol. 69, no. 4, pp. 311–318, 2009.

[25] H. J. Chung, J. D. Jung, H. W. Park, et al., "The complete chloroplast genome sequences of Solanum tuberosum and comparative analysis with Solanaceae species identified the presence of a 241-bp deletion in cultivated potato chloroplast DNA sequence," Plant Cell Reports, vol. 25, no. 12, pp. 1369–1379, 2006.

[26] H. Daniell, S.-B. Lee, J. Grevich, et al., "Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes," Theoretical and Applied Genetics, vol. 112, no. 8, pp. 1503–1518, 2006.

[27] S. Kahlau, S. Aspinall, J. C. Gray, and R. Bock, "Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes," Journal of Molecular Evolution, vol. 63, no. 2, pp. 194–207, 2006.

[28] M. Yukawa, T. Tsudzuki, and M. Sugiura, "The chloroplast genome of Nicotiana sylvestris and Nicotiana tomentosiformis: complete sequencing confirms that the Nicotiana sylvestris progenitor is the maternal genome donor of Nicotiana tabacum," Molecular Genetics and Genomics, vol. 275, no. 4, pp. 367–373, 2006.

[29] Y. D. Jo, J. Park, J. Kim, et al., "Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome," Plant Cell Reports, vol. 30, no. 2, pp. 217–229, 2011.

[30] C. Schmitz-Linneweber, R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier, "The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation," Molecular Biology and Evolution, vol. 19, no. 9, pp. 1602–1612, 2002.

[31] M. Kunnimalaiyaan and B. L. Nielsen, "Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA," Nucleic Acids Research, vol. 25, no. 18, pp. 3681–3686, 1997.

[32] G. Thyssen, Z. Svab, and P. Maliga, "Cell-to-cell movement of plastids in plants," Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 7, pp. 2439–2443, 2012.

[33] D. Gargano, A. Vezzi, N. Scotti, et al., "The complete nucleotide sequence genome of potato (Solanum tuberosum cv. Desiree) chloroplast DNA," in Book of Abstracts of the 2nd Solanaceae Genome Workshop, p. 107, 2005.

[34] K.-J. Kim and H.-L. Lee, "Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants," DNA Research, vol. 11, no. 4, pp. 247–261, 2004.

[35] D. A. Steane, "Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae)," DNA Research, vol. 12, no. 3, pp. 215–220, 2005.

[36] J. D. Palmer, "Cell culture and somatic cell genetics of plants," in The Molecular Biology of Plastids, R. G. Hermann, Ed., pp. 5–53, Springer, Vienna, Austria, 1991.

[37] M. Sugiura, K. Shinozaki, M. Tanaka, et al., "Split genes and cis/trans splicing in tobacco chloroplasts," in Plant Molecular Biology, D. von Wettstein and N. H. Chua, Eds., pp. 65–76, Plenum Press, New York, NY, USA, 1987.

[38] A. Vera and M. Sugiura, "A novel RNA gene in the tobacco plastid genome: its possible role in the maturation of 16S rRNA," The EMBO Journal, vol. 13, no. 9, pp. 2211–2217, 1994.

[39] M. Sugita, Z. Svab, P. Maliga, and M. Sugiura, "Targeted deletion of sprA from the tobacco plastid genome indicates that the encoded small RNA is not essential for pre-16S rRNA maturation in plastids," Molecular and General Genetics, vol. 257, no. 1, pp. 23–27, 1997.

[40] R. Lahaye, M. van der Bank, D. Bogarin, et al., "DNA barcoding the floras of biodiversity hotspots," Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 8, pp. 2923–2928, 2008.

[41] P. Taberlet, L. Gielly, G. Pautou, and J. Bouvet, "Universal primers for amplification of three non-coding regions of chloroplast DNA," Plant Molecular Biology, vol. 17, no. 5, pp. 1105–1109, 1991.

[42] X. Guo, S. Castillo-Ramírez, V. González, et al., "Rapid evolutionary change of common bean (Phaseolus vulgaris L.) plastome and the genomic diversification of legume chloroplasts," BMC Genomics, vol. 8, article 228, 2007.

[43] R. K. Jansen, C. Kaittanis, C. Saski, et al., "Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids," BMC Evolutionary Biology, vol. 6, no. 1, article 32, 2006.

[44] M. J. Moore, P. S. Soltis, C. D. Bell, J. G. Burleigh, and D. E. Soltis, "Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots," Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4623–4628, 2010.

[45] R. K. Jansen, Z. Cai, L. A. Raubeson, et al., "Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns," Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 49, pp. 19369–19374, 2007.

[46] L. Bohs and R. G. Olmstead, "Phylogenetic relationships in Solanum (Solanaceae) based on ndhF sequences," Systematic Botany, vol. 22, no. 1, pp. 5–17, 1997.

[47] R. G. Olmstead, L. Bohs, H. A. Migid, E. Santiago-Valentin, V. F. Garcia, and S. M. Collier, "A molecular phylogeny of the Solanaceae," Taxon, vol. 57, no. 4, pp. 1159–1181, 2008.


Funariaceae [18], Geraniaceae [19], Onagraceae [20], and Poaceae [21, 22]. Besides structural rearrangements, sequence polymorphisms have also been reported in some cereals [23, 24] and Oenothera species [20]. These studies revealed that highly divergent sequences were concentrated in specific regions called "hotspots." Such sequence polymorphisms have been used to derive phylogenetic relationships among species.

Solanaceae is an important family of dicots comprising more than 3000 species placed within about 90 genera. It is an ethnobotanical family that is extensively utilized by humans and has recently become a model for comparative and evolutionary genomics research. Few efforts have been made to study the variations in chloroplast genomes of the Solanaceae family using in silico tools. Most of these attempts have concentrated on comparison of a newly sequenced chloroplast genome with the available complete chloroplast genomes from some members of this family [25–29]. The availability of complete nucleotide sequences of plastid genomes of ten solanaceous species, Atropa belladonna (NC_004561.1 [30]), Capsicum annuum (NC_018552.1 [29]), Datura stramonium (NC_018117.1, Li et al. (unpublished)), Nicotiana sylvestris (NC_007500.1 [28]), Nicotiana tabacum (NC_001879.2 [31]), Nicotiana tomentosiformis (NC_007602.1 [28]), Nicotiana undulata (NC_016068.1 [32]), Solanum bulbocastanum (NC_007943.1 [26]), Solanum lycopersicum (NC_007898.2 [27]), and Solanum tuberosum (NC_008096.2 [33]), provided us with an opportunity to conduct their in silico comparative analysis in depth. Hence, the present study is an attempt to compare the genome organization, structure, and coding capacity of chloroplast genomes of ten solanaceous species. The study focuses on length mutations, intron-containing genes, grouping of genes in different identity classes based on pairwise comparison of individual genes, and InDel analysis of divergent genes.

2. Materials and Methods

2.1. Sequence Analysis. Whole chloroplast genome sequences as well as individual gene and protein sequences of ten solanaceous species were obtained from the "Organelle Genome Resources" section of NCBI in GenBank as well as in FASTA format. Sequence regions corresponding to various genomic features, including genes, exons, introns, and CDS, were specifically extracted from the GenBank files using the Extractfeat, Extractseq, and Featcopy programs from the Jemboss package. AT percentage for different genomic regions was calculated using the Wordcount and Union programs from the Jemboss package. Pairwise comparison of gene sequences was done using the NCBI BLAST program, and multiple sequence alignment of nucleotide as well as protein sequences was done using ClustalW. Alignments of protein sequences for some of the genes were manually edited in correspondence to InDels observed in alignments of their nucleotide sequences.
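As a rough illustration of the AT-percentage calculation mentioned above, the following sketch computes AT content for a sequence string in plain Python. It is not the Jemboss Wordcount/Union workflow used in the study; the function name `at_percent` and the toy sequence are assumptions made purely for illustration.

```python
def at_percent(seq: str) -> float:
    """Return the percentage of A and T among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc  # ambiguous symbols such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * at / total


# Hypothetical toy sequence, not taken from any plastome
toy_sequence = "ATGCATTTAA"
print(f"{at_percent(toy_sequence):.2f}")
```

In practice the same function could be applied separately to the concatenated coding, noncoding, LSC, SSC, and IR sequences to obtain region-wise values of the kind reported in Table 1.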

2.2. Phylogenetic Analysis of Concatenated Protein-Coding Genes. Seventy-five protein-coding genes of plastomes of ten solanaceous species and two outgroup species (Daucus carota and Coffea arabica) were selected for phylogenetic analysis from the total of 79 classified protein-coding genes, excluding accD, rpl20, ycf1, and ycf15. ycf15 was excluded due to its absence from the plastomes of both outgroup species chosen, while the other three were not included in the phylogenetic analysis due to their high levels of variation. A multiple sequence alignment of each gene was obtained using ClustalW (https://www.ebi.ac.uk/Tools/msa/clustalw2/). These alignments were then concatenated using standalone BIOEDIT version 7.2.5 (http://www.mbio.ncsu.edu/bioedit/bioedit.html), and a maximum likelihood phylogenetic tree with 500 bootstrap iterations was constructed using PhyML v3.0 (http://www.atgc-montpellier.fr/phyml/). A graphical view of the tree was generated using Archaeopteryx 0.988 SR (https://sites.google.com/site/cmzmasek/home/software/archaeopteryx).
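The concatenation step can be sketched in a few lines of Python. This is only a minimal illustration of joining per-gene alignments into one supermatrix; the study itself used BIOEDIT, and the function name, the dictionary layout (taxon code to aligned sequence), and the toy alignments below are all assumptions.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene alignments (taxon -> aligned sequence) end to end.

    Every alignment must contain the same taxa, and all sequences within
    one alignment must share the same aligned length.
    """
    if not alignments:
        return {}
    taxa = sorted(alignments[0])
    parts = {taxon: [] for taxon in taxa}
    for index, alignment in enumerate(alignments):
        if sorted(alignment) != taxa:
            raise ValueError(f"alignment {index} has a different taxon set")
        if len({len(seq) for seq in alignment.values()}) != 1:
            raise ValueError(f"alignment {index} has unequal sequence lengths")
        for taxon in taxa:
            parts[taxon].append(alignment[taxon])
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical toy alignments for two genes and three taxa
gene_a = {"NTA": "ATG-CA", "STU": "ATGTCA", "DCA": "ATG-CG"}
gene_b = {"NTA": "GGA", "STU": "GGT", "DCA": "G-A"}
supermatrix = concatenate_alignments([gene_a, gene_b])
print(supermatrix["NTA"])
```

Checking the taxon set and aligned length before joining guards against the most common concatenation error, a gene missing from one species, which is the reason ycf15 was excluded above.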

3. Results and Discussion

3.1. Comparison of Properties of Chloroplast Genomes. A comparison of the plastid genomes of ten solanaceous species with respect to genome size (size of the complete plastid genome and of the LSC, SSC, and IR regions), percentage of coding regions, introns, and intergenic regions, and AT content of the overall plastid genomes as well as of coding and noncoding regions is presented in Table 1. The total plastid genome size ranged from 155296 bp (S. tuberosum) to 156781 bp (C. annuum). The large size of the plastome of C. annuum can be attributed to its large LSC region as compared to other species. On the contrary, the SSC region of C. annuum was the smallest among the species studied. The largest IR region was found in A. belladonna. Among the four Nicotiana species studied, N. sylvestris and N. tabacum were almost identical to each other with respect to the size of the complete genome (difference of only 2 bp) and of the LSC, SSC, and IR regions, compared with the plastome of any other species studied. However, the percentage of coding regions was slightly higher for N. sylvestris (61.49%) than for N. tabacum (61.12%). The size of the complete chloroplast genome and of the LSC and SSC regions of the three species of Solanum is comparatively smaller than that of any other species studied, except for A. belladonna, whose SSC region was smaller (18008 bp). However, the IR region of Solanum species is larger than that of Nicotiana species. The coding region percentage was found to be higher in Nicotiana species than in all other species, with a maximum for N. undulata (63.12%) and a minimum for S. tuberosum (58.45%). A maximum of 12.8% of the plastome was shown to be introns for S. bulbocastanum, whereas the minimum intron percentage (11.62%) was observed for D. stramonium. The maximum percentage (29.19%) of intergenic regions was observed in D. stramonium and the minimum (24.19%) was observed in N. undulata. The AT content of noncoding regions was found to be higher than that of coding regions for all ten species studied. Similarly, protein-coding regions showed a higher content of AT base pairs than RNA coding genes, which can be explained by the requirement of more GC base pairs for proper folding of highly structured ribosomal RNAs and tRNAs [13–27]. Comparison of AT content in LSC, SSC, and IR regions reveals that AT content was the highest in SSC


Table 1: Properties of the solanaceous chloroplast genomes.

| Property | ABE | CAN | DST | NSY | NTA | NTO | NUN | SBU | SLY | STU |
|---|---|---|---|---|---|---|---|---|---|---|
| Genome size (bp) | 156687 | 156781 | 155871 | 155941 | 155943 | 155745 | 155863 | 155371 | 155461 | 155296 |
| LSC (bp) (coordinates)* | 86869 (1–86869) | 87366 (1–87366) | 86297 (1–86297) | 86684 (1–86684) | 86686 (1–86686) | 86392 (1–86392) | 86633 (1–86633) | 85785 (1–85785) | 85882 (1–85882) | 85737 (1–85737) |
| IRB (bp) (coordinates)* | 25905 (86870–112774) | 25783 (87367–113149) | 25563 (86298–111860) | 25342 (86685–112026) | 25343 (86687–112029) | 25429 (86393–111821) | 25331 (86634–111964) | 25588 (85786–111373) | 25608 (85883–111490) | 25593 (85738–111330) |
| SSC (bp) (coordinates)* | 18008 (112775–130782) | 17849 (113150–130998) | 18448 (111861–130308) | 18573 (112027–130599) | 18571 (112030–130600) | 18495 (111822–130316) | 18568 (111965–130532) | 18381 (111374–129754) | 18363 (111491–129853) | 18373 (111331–129703) |
| IRA (bp) (coordinates)* | 25905 (130783–156687) | 25783 (130999–156781) | 25563 (130309–155871) | 25342 (130600–155941) | 25343 (130601–155943) | 25429 (130317–155745) | 25331 (130533–155863) | 25588 (129755–155342) | 25608 (129854–155461) | 25593 (129704–155296) |
| Coding regions (%) | 58.89 | 58.50 | 59.19 | 61.49 | 61.12 | 61.58 | 63.12 | 58.52 | 58.91 | 58.45 |
| Introns (%) | 12.51 | 12.71 | 11.62 | 12.70 | 12.70 | 12.68 | 12.69 | 12.82 | 12.47 | 12.49 |
| Intergenic regions (%) | 28.60 | 28.79 | 29.19 | 25.81 | 26.18 | 25.73 | 24.19 | 28.66 | 28.62 | 29.06 |
| AT content (%): overall | 62.44 | 62.27 | 62.12 | 62.15 | 62.15 | 62.21 | 62.12 | 62.12 | 62.14 | 62.12 |
| AT content (%): coding regions | 59.86 | 59.68 | 59.65 | 59.85 | 59.79 | 59.79 | 59.70 | 59.61 | 59.65 | 59.59 |
| AT content (%): noncoding regions | 66.13 | 65.93 | 65.72 | 65.84 | 65.87 | 66.09 | 66.27 | 65.66 | 65.71 | 65.68 |
| AT content (%): tRNAs | 47.70 | 47.38 | 47.08 | 47.06 | 47.05 | 47.10 | 47.08 | 47.12 | 47.01 | 47.06 |
| AT content (%): rRNAs | 44.64 | 44.73 | 44.63 | 44.64 | 44.64 | 44.64 | 44.64 | 44.66 | 44.66 | 44.65 |
| AT content (%): protein-coding genes | 62.01 | 61.83 | 61.79 | 61.91 | 61.86 | 61.84 | 61.68 | 61.76 | 61.80 | 61.74 |
| AT content (%): LSC | 64.37 | 64.25 | 64.04 | 64.05 | 64.05 | 64.12 | 64.01 | 63.99 | 64.01 | 63.99 |
| AT content (%): SSC | 68.35 | 67.99 | 67.72 | 67.94 | 67.93 | 68.03 | 67.87 | 67.87 | 67.97 | 67.91 |
| AT content (%): IR | 57.14 | 56.94 | 56.87 | 56.78 | 56.78 | 56.84 | 56.78 | 56.93 | 56.91 | 56.90 |

ABE: Atropa belladonna, CAN: Capsicum annuum, DST: Datura stramonium, NSY: Nicotiana sylvestris, NTA: Nicotiana tabacum, NTO: Nicotiana tomentosiformis, NUN: Nicotiana undulata, SBU: Solanum bulbocastanum, SLY: Solanum lycopersicum, and STU: Solanum tuberosum. LSC: large single copy region, SSC: small single copy region, and IR: inverted repeat region.
*Start and end position of nucleotide in the genome.


regions and the lowest in IR regions. Some earlier studies have also shown a similar distribution of AT content in LSC, SSC, and IR regions, with the lowest AT content in the IR region and the highest AT content in the SSC region [2, 27, 34, 35]. The low AT content of IR regions reflects the low AT content of the four ribosomal RNA genes in this region.
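The region sizes and coordinates reported in Table 1 can be cross-checked with simple arithmetic: with 1-based inclusive coordinates, a region's length is end minus start plus one, and the four regions of the quadripartite structure should add up to the genome size. The short sketch below does this for N. tabacum using the Table 1 values; the helper name `region_length` is an assumption for illustration.

```python
def region_length(start: int, end: int) -> int:
    """Length of a region given 1-based inclusive start and end coordinates."""
    return end - start + 1


# Coordinates for N. tabacum (NTA) as listed in Table 1
nta_regions = {
    "LSC": (1, 86686),
    "IRB": (86687, 112029),
    "SSC": (112030, 130600),
    "IRA": (130601, 155943),
}

lengths = {name: region_length(*coords) for name, coords in nta_regions.items()}
total = sum(lengths.values())
print(lengths)
print(total)  # should equal the reported genome size of 155943 bp
```

The same check applied to the other columns is a quick way to spot transcription errors in region boundaries.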

3.2. Gene Content of Solanaceous Chloroplast Genomes. The genes present in different regions of the plastid genomes are highly conserved except for several open reading frames [6, 26, 36]. There are typically 111 genes, 5 hypothetical chloroplast reading frames (ycfs), and a few open reading frames (orfs). Some of our unique findings are discussed below.

(i) The trnP-GGG gene, which codes for a tRNA for proline, was annotated only in D. stramonium, whereas its alternative, trnP-UGG, was annotated in all species including D. stramonium (NC_018117.1, Li et al. (unpublished)). We mined all the species for a similar sequence by BLAST search, but no similar sequence was found in any other species. An additional region was annotated as a trnH coding gene only in C. annuum. In all other species, and in C. annuum as well, this region was reported to be part of the ycf2 gene. These observations indicate the presence of a duplicate copy of the trnH gene sequence in C. annuum and two alternative tRNA genes coding for proline in D. stramonium. However, no other evidence was found in databases that this particular region codes for trnH.

(ii) The rps19 pseudogene was reported in three species, namely, N. tomentosiformis, S. bulbocastanum, and S. tuberosum. All other species were mined for a similar pseudogene using the BLAST pairwise algorithm, which confirmed the presence of the rps19 pseudogene in three further species, namely, A. belladonna, C. annuum, and D. stramonium. The presence of the pseudogene may be attributed to the expansion of IRB into the LSC region. In three species, namely, N. sylvestris, N. tabacum, and N. undulata, the rps19 pseudogene was found to be absent.

(iii) infA, which is a pseudogene in all species except A. belladonna, D. stramonium, and S. lycopersicum, is a protein-coding gene in S. bulbocastanum. A homology search with the infA sequence from S. bulbocastanum against the plastomes of A. belladonna and D. stramonium revealed an identical sequence in both species.

(iv) The sprA gene has been annotated for N. sylvestris, N. tomentosiformis, S. lycopersicum, and S. tuberosum. Its identical orthologous gene sequences were found in A. belladonna, C. annuum, D. stramonium, N. tabacum, N. undulata, and S. bulbocastanum using BLAST search.
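Where the findings above report an identical sequence in another plastome, the idea can be illustrated with an exact-match search on both strands. This is a deliberately simplified sketch and not a substitute for BLAST, which performs heuristic local alignment and tolerates mismatches and gaps; the function names and the toy sequences are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_exact(query: str, genome: str):
    """Return (strand, 1-based start) for every exact occurrence of query."""
    query, genome = query.upper(), genome.upper()
    hits = []
    for strand, pattern in (("+", query), ("-", reverse_complement(query))):
        if strand == "-" and pattern == query:
            continue  # palindromic query: avoid reporting the same site twice
        pos = genome.find(pattern)
        while pos != -1:
            hits.append((strand, pos + 1))
            pos = genome.find(pattern, pos + 1)
    return hits


# Hypothetical toy sequences, for illustration only
toy_genome = "TTGACCATGAAACGTTTCATGG"
hits = find_exact("CCATGAAA", toy_genome)
print(hits)
```

Searching the reverse complement matters for plastomes because genes in the two inverted repeat copies lie on opposite strands.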

3.3. Split Genes. A total of eighteen split genes have been reported. The sizes of exons and introns for these genes in all the solanaceous species studied are summarized in Table 2. The rps12 gene is divided such that its 5′ end exon is located in the LSC region, whereas the second and third exons are located in the IR region. Maturation of the RNA transcript requires a trans-splicing mechanism between exon 1 and exon 2 [34, 37]. Among the eighteen intron-containing genes, ycf3, clpP, and rps12 contain two introns, whereas the other 15 genes contain only one intron. As per Kim and Lee [34], the trnL-UAA gene intron belongs to the self-splicing group I introns, whereas all other introns belong to group II. Generally, the size of exons was conserved and variability was observed in the intron regions; however, ndhB was found to be highly conserved for both exons and introns.

3.4. Pairwise Comparison of Plastid Genes of Solanaceae and InDel Analyses. Pairwise comparison of the nucleotide sequences of individual genes (45 species-pair combinations) for 116 genes was also performed to classify genes based on percent identity. Supplementary Table 1 (Supplementary Material available online at http://dx.doi.org/10.1155/2014/424873) shows the grouping of genes in different clusters based on percent identity in pairwise comparison. Genes which showed 100% identity in a comparison were considered highly conserved, and genes showing less than 95% identity at least once in the comparisons were considered highly divergent. These highly divergent genes were further explored at the nucleotide as well as the protein level to probe the variations in detail. A total of 11 highly divergent genes were found, whereas the number of highly conserved genes varied from 26 (for the species pair N. tomentosiformis and S. lycopersicum) to 107 (for the species pair N. sylvestris and N. tabacum). Most of the tRNA genes were found to be highly conserved. The genes accD, cemA, clpP, ndhA, rpl32, rpl36, rps16, sprA, trnA-UGC, trnL-UAA, and ycf1 were found to be highly diverged.
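The classification rule described above can be sketched as follows. Ten species give 10 x 9 / 2 = 45 unordered pairs; a gene counts as highly divergent if its identity drops below 95% in at least one pair, and as highly conserved for a given pair if its identity there is 100%. The function names and the toy identity values are assumptions for illustration, not values from Supplementary Table 1.

```python
from itertools import combinations

SPECIES = ["ABE", "CAN", "DST", "NSY", "NTA", "NTO", "NUN", "SBU", "SLY", "STU"]


def species_pairs(species):
    """All unordered species pairs used for pairwise comparison."""
    return list(combinations(species, 2))


def is_highly_divergent(identities, threshold=95.0):
    """True if percent identity falls below the threshold in at least one pair."""
    return any(value < threshold for value in identities)


def conserved_genes(identity_by_gene):
    """Genes with 100% identity for one species pair (gene -> percent identity)."""
    return sorted(gene for gene, value in identity_by_gene.items() if value == 100.0)


# Hypothetical identities for a single species pair
toy_pair = {"geneA": 100.0, "geneB": 98.7, "geneC": 100.0}
print(len(species_pairs(SPECIES)))
print(conserved_genes(toy_pair))
```

Note that the two labels are defined at different levels: divergence is a property of a gene across all pairs, while the conserved-gene counts (26 to 107) are reported per species pair.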

Tables 3 and 4 summarize the InDels observed in nucleotide and amino acid sequences, respectively. Partial multiple sequence alignments of 9 genes and 5 proteins are shown in Figures 1 and 2, respectively. The longest insertion, of 141 bp, was observed in the accD gene sequence of C. annuum. Since the genes clpP, ndhA, rps16, and trnL-UAA contain introns, it was important to examine whether these InDels were present in exon or intron regions. It was found that all the InDels reported in ndhA and trnL-UAA were present in introns, whereas in the case of clpP, InDel 24 was located in an exon of the gene. Similarly, the first and last InDels of the rps16 gene were present in exons of the gene. Keeping in view the observations on the number and length of InDels in nucleotide and protein sequences of the genes under consideration, the variation for individual genes is discussed below.
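How InDels can be read off a multiple sequence alignment is sketched below: each run of gap characters in an aligned row marks a stretch that is absent from that sequence but present in at least one other. This is an illustrative helper only, since the InDels in this study were identified from ClustalW alignments; the function names and the toy row are assumptions.

```python
import re


def gap_runs(aligned_seq: str):
    """Return (1-based start column, length) for each run of '-' in an aligned row."""
    return [(m.start() + 1, len(m.group())) for m in re.finditer(r"-+", aligned_seq)]


def preserves_frame(length_bp: int) -> bool:
    """An InDel inside an exon keeps the reading frame only if its length is a multiple of 3."""
    return length_bp % 3 == 0


# Hypothetical aligned row, for illustration only
toy_row = "CATTC---ATAG--T"
runs = gap_runs(toy_row)
print(runs)
print([preserves_frame(length) for _, length in runs])
```

Whether a gap run falls in an exon or an intron then follows from comparing its columns with the annotated exon boundaries, which is the distinction drawn above for clpP, ndhA, rps16, and trnL-UAA.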

(1) accD. A total of four InDels were observed in the accD gene, as depicted in Figure 1. Interestingly, an insertion of 24 bp was present in all Nicotiana species and D. stramonium, followed by an insertion of 9 bp in all Solanum species, indicating stronger sequence conservation at the genus level. These insertions were also reported by Chung et al. [25]. A 141 bp insertion was observed specifically in C. annuum, which has also been reported by Jo et al. [29] and confirmed by RT-PCR. Similarly, a species-specific deletion of 6 bp was found in D. stramonium. All these InDels were also reflected in the corresponding


Table 2: The lengths of introns and exons for the split genes of ten solanaceous species.

| Gene (region) | Exon/intron | ABE | CAN | DST | NSY | NTA | NTO | NUN | SBU | SLY | STU |
|---|---|---|---|---|---|---|---|---|---|---|---|
| trnK-UUU (LSC) | Exon I | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 |
| | Intron I | 2519 | 2500 | 2506 | 2526 | 2526 | 2526 | 2521 | 2501 | 2514 | 2512 |
| | Exon II | 36 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 |
| rps16 (LSC) | Exon I | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
| | Intron I | 822 | 865 | 866 | 860 | 860 | 860 | 859 | 855 | 864 | 855 |
| | Exon II | 227 | 227 | 227 | 218 | 218 | 218 | 218 | 227 | 227 | 227 |
| trnG-UCC (LSC) | Exon I | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 |
| | Intron I | 692 | 692 | 694 | 692 | 692 | 690 | 691 | 701 | 695 | 692 |
| | Exon II | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 37 | 48 | 48 |
| atpF (LSC) | Exon I | 145 | 145 | 145 | 145 | 145 | 145 | 145 | 144 | 144 | 145 |
| | Intron I | 715 | 693 | 700 | 695 | 695 | 692 | 692 | 693 | 686 | 693 |
| | Exon II | 410 | 410 | 410 | 410 | 410 | 410 | 410 | 411 | 411 | 410 |
| rpoC1 (LSC) | Exon I | 432 | 453 | 453 | 453 | 453 | 432 | 453 | 453 | 453 | 453 |
| | Intron I | 737 | 742 | 737 | 737 | 737 | 709 | 733 | 737 | 737 | 737 |
| | Exon II | 1614 | 1614 | 1614 | 1614 | 1614 | 1614 | 1623 | 1614 | 1614 | 1614 |
| ycf3 (LSC) | Exon I | 124 | 124 | 124 | 124 | 124 | 124 | 124 | 124 | 124 | 124 |
| | Intron I | 739 | 742 | 740 | 739 | 738 | 731 | 735 | 730 | 729 | 727 |
| | Exon II | 230 | 230 | 230 | 230 | 230 | 230 | 230 | 230 | 230 | 230 |
| | Intron II | 763 | 744 | 753 | 783 | 783 | 779 | 781 | 750 | 750 | 750 |
| | Exon III | 153 | 153 | 159 | 153 | 153 | 153 | 153 | 153 | 153 | 153 |
| trnL-UAA (LSC) | Exon I | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 37 | 35 | 35 |
| | Intron I | 497 | 426 | 501 | 503 | 503 | 497 | 498 | 502 | 497 | 497 |
| | Exon II | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 |
| trnV-UAC (LSC) | Exon I | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 |
| | Intron I | 572 | 575 | 569 | 571 | 571 | 572 | 573 | 569 | 571 | 571 |
| | Exon II | 35 | 35 | 37 | 35 | 35 | 35 | 35 | 37 | 35 | 35 |
| rps12* | Exon I | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 |
| | Intron I | - | - | - | - | - | - | - | - | - | - |
| | Exon II | 232 | 232 | 232 | 232 | 232 | 232 | 232 | 232 | 232 | 232 |
| | Intron II | 535 | 536 | 536 | 536 | 536 | 536 | 536 | 536 | 536 | 536 |
| | Exon III | 26 | 26 | 26 | 26 | 26 | 26 | 26 | 26 | 26 | 26 |
| clpP (LSC) | Exon I | 71 | 71 | 71 | 71 | 71 | 71 | 71 | 71 | 71 | 71 |
| | Intron I | 799 | 811 | 792 | 807 | 807 | 789 | 789 | 789 | 798 | 789 |
| | Exon II | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 |
| | Intron II | 622 | 626 | 624 | 637 | 637 | 634 | 631 | 625 | 617 | 620 |
| | Exon III | 228 | 228 | 234 | 228 | 228 | 228 | 228 | 234 | 258 | 234 |
| petB (LSC) | Exon I | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| | Intron I | 759 | 755 | 746 | 753 | 753 | 753 | 753 | 747 | 747 | 747 |
| | Exon II | 642 | 642 | 642 | 642 | 642 | 642 | 642 | 642 | 642 | 642 |
| petD (LSC) | Exon I | 8 | 8 | 9 | 8 | 8 | 8 | 8 | 6 | 8 | 8 |
| | Intron I | 742 | 742 | 748 | 742 | 742 | 742 | 742 | 739 | 738 | 739 |
| | Exon II | 475 | 475 | 474 | 475 | 475 | 475 | 475 | 477 | 475 | 475 |
| rpl16 (LSC) | Exon I | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| | Intron I | 1019 | 1026 | 1025 | 1020 | 1020 | 1021 | 1020 | 1014 | 1018 | 1014 |
| | Exon II | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 |
| rpl2 (IR) | Exon I | 391 | 391 | 393 | 391 | 391 | 391 | 391 | 390 | 391 | 391 |
| | Intron I | 664 | 665 | 669 | 666 | 666 | 666 | 666 | 666 | 666 | 666 |
| | Exon II | 434 | 434 | 429 | 434 | 434 | 434 | 434 | 435 | 434 | 434 |
| ndhB (IR) | Exon I | 777 | 777 | 777 | 777 | 777 | 777 | 777 | 777 | 777 | 777 |
| | Intron I | 679 | 679 | 679 | 679 | 679 | 679 | 679 | 679 | 679 | 679 |
| | Exon II | 756 | 756 | 756 | 756 | 756 | 756 | 756 | 756 | 756 | 756 |
| trnI-GAU (IR) | Exon I | 37 | 37 | 42 | 37 | 37 | 37 | 37 | 42 | 37 | 37 |
| | Intron I | 717 | 722 | 717 | 707 | 707 | 716 | 716 | 717 | 722 | 722 |
| | Exon II | 34 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 |
| trnA-UGC (IR) | Exon I | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 |
| | Intron I | 681 | 811 | 811 | 709 | 709 | 709 | 709 | 811 | 811 | 811 |
| | Exon II | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 |
| ndhA (SSC) | Exon I | 553 | 553 | 552 | 553 | 553 | 553 | 553 | 552 | 553 | 553 |
| | Intron I | 1150 | 1157 | 1154 | 1148 | 1148 | 1149 | 1148 | 1158 | 1133 | 1158 |
| | Exon II | 539 | 539 | 537 | 539 | 539 | 539 | 539 | 540 | 539 | 539 |

*rps12 is a divided gene: the 3′-rps12 is located in the IR region, while the 5′-rps12 is located in the LSC region.
ABE: Atropa belladonna, CAN: Capsicum annuum, DST: Datura stramonium, NSY: Nicotiana sylvestris, NTA: Nicotiana tabacum, NTO: Nicotiana tomentosiformis, NUN: Nicotiana undulata, SBU: Solanum bulbocastanum, SLY: Solanum lycopersicum, and STU: Solanum tuberosum.

Table 3: InDels in nucleotide sequences of 9 genes of ten solanaceous plastid genomes.

| S. number | Gene (location) | Total number of InDels | InDel length (bp) |
|---|---|---|---|
| 1 | accD (a) | 4 | 24, 9, 141, 6 |
| 2 | clpP (a) | 24 | 8(I), 14(I), 13(I), 7(I), 1(I), 2–3(I), 7(I), 1–7(I), 3(I), 2(I), 3(I), 1–7(I), 1–3(I), 1(I), 1(I), 1(I), 1–5(I), 4–7(I), 1(I), 9(I), 1–2(I), 3(I), 5(I), 24–30 |
| 3 | ndhA (b) | 14 | 9(I), 5–6(I), 3(I), 1(I), 9(I), 3(I), 4(I), 1–4(I), 1–2(I), 1–23(I), 1–2(I), 2(I), 1(I), 3(I) |
| 4 | rpl32 (b) | 2 | 2–3, 4 |
| 5 | rps16 (a) | 11 | 1–38, 9(I), 1(I), 1(I), 5(I), 1–2(I), 5(I), 4(I), 6(I), 1(I), 9 |
| 6 | sprA (b) | 2 | 109, 7 |
| 7 | trnA-UGC (c) | 1 | 102–130 |
| 8 | trnL-UAA (a) | 4 | 1, 6, 71, 4 |
| 9 | ycf1 (b) | 31 | 3, 18, 18, 21, 6, 6, 48, 9, 6, 6, 42, 3, 6, 30, 3, 15, 12–39, 18, 6, 9–36, 6, 6, 6, 9, 9, 12, 6, 6, 6, 57, 6 |

Location in different regions: (a) LSC, (b) SSC, and (c) IR. (I): InDels present in introns.

Table 4: InDels in amino acid sequences of 5 proteins of ten solanaceous plastid genomes.

| S. number | Protein | Total number of InDels | InDel length (amino acids) |
|---|---|---|---|
| 1 | accD | 4 | 8, 3, 47, 2 |
| 2 | clpP | 2 | 2, 10 |
| 3 | rpl32 | 1 | 1–2 |
| 4 | rps16 | 1 | 3 |
| 5 | ycf1 | 29 | 1, 6, 6, 7, 2, 2, 7, 3, 2, 2, 14, 1–10, 1, 5, 4–13, 6, 2, 3–12, 2, 2, 2, 3, 3, 4, 2, 2, 2, 19, 2 |
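The accD rows of Tables 3 and 4 are related by a simple division: an in-frame InDel of n bp in a coding sequence corresponds to n/3 amino acids. The sketch below checks this for the four accD InDels; the function name is an assumption, and the check applies only to InDels that lie in exons and are multiples of three.

```python
def indel_bp_to_residues(length_bp: int) -> int:
    """Convert an in-frame coding InDel length in bp to a length in amino acids."""
    if length_bp % 3 != 0:
        raise ValueError("InDel length is not a multiple of three (frame-shifting)")
    return length_bp // 3


accd_nucleotide_indels = [24, 9, 141, 6]  # accD row of Table 3 (bp)
accd_protein_indels = [8, 3, 47, 2]       # accD row of Table 4 (amino acids)

converted = [indel_bp_to_residues(n) for n in accd_nucleotide_indels]
print(converted)
print(converted == accd_protein_indels)
```

The same conversion explains the two clpP entries in Table 4, since the exon III lengths in Table 2 differ by 6 bp and 30 bp among species.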


Figure 1: Partial multiple sequence alignment of accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala(UGC), tRNA-Leu(UAA), and ycf1 gene sequences of ten solanaceous species showing location of InDels indicated by hyphens.


Figure 2: Partial multiple sequence alignment of amino acid sequences of genes, namely, accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala(UGC), tRNA-Leu(UAA), and ycf1, of ten solanaceous species showing location of InDels indicated by hyphens.

protein sequences, as shown in Figure 2. The accD gene has been reported to be one of the most variable plastid genes and is probably under diversifying selection [26].
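The correspondence between nucleotide-level and protein-level InDels can be checked with simple arithmetic: an InDel whose length is a multiple of three adds or removes whole codons without shifting the reading frame. The following is a minimal sketch, not the authors' pipeline. The accD values are taken from Tables 3 and 4 and the clpP values from the text; the function and variable names are illustrative assumptions.

```python
# InDel lengths for accD as listed in Table 3 (nucleotides) and Table 4 (protein).
ACCD_NUCLEOTIDE_INDELS_BP = [24, 9, 141, 6]
ACCD_PROTEIN_INDELS = [8, 3, 47, 2]

# 3' insertions in clpP exon 3 described in the text
# (S. bulbocastanum and S. tuberosum: 6 bp; S. lycopersicum: 30 bp).
CLPP_EXON3_INSERTIONS_BP = [6, 30]


def codons_affected(length_bp):
    """Return the number of whole codons in an in-frame InDel, or None.

    None signals that the length is not a multiple of three, i.e. the
    InDel would shift the reading frame (hypothetical helper).
    """
    if length_bp % 3 != 0:
        return None
    return length_bp // 3


accd_predicted = [codons_affected(n) for n in ACCD_NUCLEOTIDE_INDELS_BP]
clpp_predicted = [codons_affected(n) for n in CLPP_EXON3_INSERTIONS_BP]

print("accD:", accd_predicted, "matches Table 4:", accd_predicted == ACCD_PROTEIN_INDELS)
print("clpP exon 3 extension (codons):", clpp_predicted)
```

Under these assumptions, every accD InDel is in frame, which is consistent with the protein-level InDel lengths reported for this gene.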

(2) clpP. In the clpP gene, InDels were found both in intron and in exon regions. Two major consequences were observed in the InDels in the exon regions. An insertion of 6 bp in S. bulbocastanum and S. tuberosum and of 30 bp in S. lycopersicum at the 3′ end (exon 3) of the clpP gene resulted in shifting of the stop codon by 6, 6, and 30 bp downstream in the respective species compared to other species of the Solanaceae family, increasing the length of the coding sequence and the protein product (Figures 1 and 2). An interesting feature was observed as InDel 1 in the protein sequence, corresponding to insertion of a repeat of "I" amino acids in D. stramonium, making the exon 3 region longer by 6 bp. This region, however, corresponds to the end of intron 2 in the clpP gene in all other species. Since the D. stramonium chloroplast genome has been sequenced recently, this observation needs to be experimentally validated.

(3) ndhA. All the InDels found in ndhA were present in introns, while the protein-coding regions (exons) were highly conserved. This indicates high diversifying selection on the intronic region of this gene. Out of the total 14 InDels, most were observed with respect to C. annuum (InDels 1, 2, 5, 7, and 10). InDel 10 was observed to be shared by C. annuum and S. lycopersicum in full and by C. annuum and A. belladonna in part.

(4) rpl32. In rpl32, an insertion of 1 bp in D. stramonium and of 3 bp in C. annuum was found in the 3′ region of the gene, while a deletion of 4 bp was observed in D. stramonium. The insertion of 3 bp in C. annuum only altered the length of the protein by making it longer by 1 amino acid. However, the small insertion of 1 bp in D. stramonium proved to be a frameshift mutation resulting in three changes in the amino acid sequence near the C-terminus. Moreover, the deletion of 4 bp at the 3′ end resulted in premature termination of protein synthesis. The frameshift mutation and the 3′ end deletion finally reduced the gene product length by 1 amino acid. As the C-terminal of the amino acid chain is well conserved in all the other species, the effect of the above-mentioned variations needs to be validated experimentally.
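The different consequences of a 3 bp and a 1 bp insertion can be illustrated on a toy coding sequence. The sketch below is illustrative only: the reference sequence, the insertion position, and the helper names are assumptions and are not taken from rpl32. The codon table is the standard amino acid assignment, which the plastid code (NCBI translation table 11) shares.

```python
from itertools import product

# Standard amino acid assignments, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate from position 0 until a stop codon or the last whole codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def insert(dna, position, fragment):
    """Return dna with fragment inserted before the given 0-based position."""
    return dna[:position] + fragment + dna[position:]


# Hypothetical toy coding sequence (not from any plastome).
reference = "ATGAAACGTCTGGATTTCTAA"

in_frame = insert(reference, 9, "GGT")   # 3 bp: adds one codon
frameshift = insert(reference, 9, "A")   # 1 bp: shifts every downstream codon

print(translate(reference))   # MKRLDF
print(translate(in_frame))    # MKRGLDF
print(translate(frameshift))  # MKRTGFL
```

In this toy case the 3 bp insertion lengthens the product by exactly one residue and leaves the others unchanged, whereas the 1 bp insertion changes every residue downstream of the insertion point and removes the original stop codon from the reading frame.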

(5) rps16. In rps16 also, InDels were observed in introns as well as exons. Five of the major insertions in the intron regions were species specific. Insertions of 38 bp (InDel 1), 9 bp (InDel 2), 5 bp (InDel 7), 4 bp (InDel 8), and 6 bp (InDel 9) were observed in A. belladonna, S. lycopersicum, D. stramonium, and C. annuum. A deletion of 5 bp was observed in all the three Solanum species and C. annuum. A deletion of 9 bp was observed in all Nicotiana species, resulting in an amino acid change (P to S) and shortening of the protein by three amino acids in the C-terminal region. A similar deletion has also been observed by Kahlau et al. [27] and was suggested to be functionally neutral.

(6) sprA. The sprA gene has been reported as a stable noncoding RNA of unknown function. This gene has been suggested to influence 16S rRNA maturation [38, 39]. In many species, this gene seems to be present as a remnant and shows large variations in its 5′ region. The largest deletion of 109 bp was observed in C. annuum. The rest of this gene appears to be more conserved, with a deletion towards the 3′ end in all Nicotiana species and A. belladonna. The manner in which this gene functions and the consequences of the above-mentioned variations are yet to be investigated experimentally.

(7) trnA-UGC. In this particular gene, a long deletion of 102 bp was observed in all Nicotiana species. Interestingly, this deletion was further extended to 130 bp in both directions in A. belladonna. These deletions were found in the intron region and so are unlikely to have any negative impact on gene product function.

(8) trnL-UAA. The trend of variation in trnL-UAA was similar to that in ndhA, as all InDels were observed in introns. The longest species-specific deletion (InDel 3) was observed in C. annuum, whereas a short insertion of four nucleotides, a repeat of "T", was observed specifically in D. stramonium. Another insertion of 6 bp was observed in two Nicotiana species, that is, N. sylvestris and N. tabacum.
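InDels such as those described for the genes above appear in an alignment as runs of hyphens in the rows of the species that lack the corresponding bases. A minimal sketch of locating such runs is given here. The three aligned rows and their labels are hypothetical, and the helper name is an assumption, not a tool used in this study.

```python
import re


def gap_runs(aligned_row):
    """Return (start, length) for each run of '-' in one aligned row (0-based)."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"-+", aligned_row)]


# Hypothetical toy alignment; all rows have the same aligned length.
alignment = {
    "species_A": "ATTG----CCGTA",
    "species_B": "ATTGTTTTCCGTA",
    "species_C": "ATTG----CC-TA",
}

for name, row in alignment.items():
    print(name, gap_runs(row))
```

A gap run shared at the same columns by several rows (here species_A and species_C at column 4) corresponds to an InDel shared by those taxa, while a run found in a single row corresponds to a species-specific InDel.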

(9) ycf1. Many InDels (3′ region) were found in the fastest evolving gene, that is, the ycf1 gene. Most of the InDels were found to be species specific. The maximum number of InDels (InDels 2, 3, 5, 7, 9, 16, 17, 19, 23, 24, 25, and 30) was observed in C. annuum, followed by D. stramonium (InDels 4, 6, 20, 26, and 31), N. tomentosiformis (InDels 8, 17, and 18), and S. lycopersicum (InDels 16, 22, and 28). Two genus-specific InDels (InDels 14 and 21) were observed in all the four Nicotiana species. However, InDel 19 was also present in D. stramonium. Another genus-specific InDel (InDel 29) was observed in Solanum species. Most of the InDels altered the length of the gene product, with the maximum length of 1906 amino acids (aa) observed in C. annuum and the minimum of 1873 aa observed in D. stramonium. Among the Solanum species, the length of the protein (1887 aa) was conserved between S. bulbocastanum and S. tuberosum. However, S. lycopersicum had a sequence of 1891 aa, longer by 4 aa than that of the other two species of the same genus. Among the four Nicotiana species, the ycf1 gene product length was conserved among N. sylvestris, N. tabacum, and N. undulata, having protein lengths of 1901 aa, 1902 aa, and 1901 aa, respectively. However, N. tomentosiformis was observed to be the most variable member of the genus Nicotiana, having a protein length of 1892 aa.

3.5. Phylogenetic Analysis of Solanaceous Plastomes. Evolutionary relationships between diverse plant species have been analyzed using several plastome markers, including matK and rbcL (genes) or trnH-psbA and trnL-trnF (intergenic regions), due to sequence conservation among plant taxa blended with suitable variation [40, 41]. However, determination of phylogeny based on single gene sequences may be inaccurate [42]. Availability of complete chloroplast sequences for many species has made it possible to use many individual genes or concatenated gene sequences to deduce phylogenetic relationships among taxa [42–44].

Efforts have been made to carry out phylogenetic analysis of solanaceous species using complete plastome sequences by Moore et al. [44] and Jansen et al. [45]. Evolutionary positions of Capsicum and Datura in Solanaceae have been determined using a single or a few plastid genes [46, 47]. Recently, concatenated protein-coding gene sequences from completely sequenced plastomes were used to obtain reasonable phylogenetic relationships for solanaceous species [29]. In the present investigation, we used a similar approach to analyze the phylogenetic relationships of ten solanaceous species with completely sequenced plastomes. Individual multiple sequence alignments were concatenated for maximum likelihood phylogenetic tree generation. As depicted in Figure 3, taxa were divided into two clades with a bootstrap value of 500 (100%). The first clade consisted of the four Nicotiana species, while the species of Solanum, Capsicum, Atropa, and Datura were included in the second clade. These results are in line with previous phylogenetic analyses using concatenated protein-coding gene sequences as well as phylogenetic analyses using plastid ndhF and trnL-F sequences [29, 47]. However, an analysis of 13 orfs of solanaceous plastomes showed a different arrangement, in which Atropa was separated from Solanum and grouped together with Nicotiana [25].
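The concatenation step mentioned above can be illustrated with a short sketch. The paper does not state which software performed the concatenation, so the function name `concatenate_alignments`, the dictionary layout, and the toy sequences are assumptions made purely for illustration; the resulting supermatrix would then be handed to a maximum likelihood program.

```python
from typing import Dict, List


def concatenate_alignments(gene_alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene alignments (taxon -> aligned sequence) into one supermatrix."""
    taxa = sorted(set().union(*(aln.keys() for aln in gene_alignments)))
    parts = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon missing from one gene alignment is padded with gaps.
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy alignments with invented sequences (not data from the paper)
gene_a = {"NTA": "ATGGCA", "SLY": "ATGGCT", "CAN": "ATG-CT"}
gene_b = {"NTA": "TTTAA", "SLY": "TTCAA", "CAN": "TTCAA"}

concatenated = concatenate_alignments([gene_a, gene_b])
for taxon, seq in concatenated.items():
    print(taxon, seq)
```

Every taxon ends up with a sequence of the same total length, which is the property a tree-building program expects from a concatenated alignment.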

4. Conclusions

The analyses of complete plastid genomes of ten solanaceous species revealed overall similarity in terms of gene content and organization. The sizes of LSC, SSC, and IR regions were found to be somewhat conserved among species, but significant variation was found between genera. Most of the coding regions were well conserved. However, many features in a few genes were observed to be typical of a particular genus or even species, and these can be used as molecular markers to investigate genetic diversity and evolution. These typical variations can be further utilized to develop more sophisticated DNA barcoding based techniques. The ten solanaceous species are divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from coding regions of the plastomes. The first clade consisted of the four Nicotiana species and the second clade consisted of species of Solanum, Capsicum, Atropa, and Datura.

Figure 3: Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species. [Tree image not reproduced; tip labels: CAR, DCA, NTO, NUN, NSY, NTA, ABE, DST, CAN, SLY, SBU, STU.]

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors are thankful to the University Grants Commission, New Delhi, for providing a MANF fellowship to Harpreet Kaur.

References

[1] M. G. Bausher, N. D. Singh, S.-B. Lee, R. K. Jansen, and H. Daniell, "The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var. 'Ridge Pineapple': organization and phylogenetic relationships to other angiosperms," BMC Plant Biology, vol. 6, no. 1, article 21, 2006.

[2] Z. Y. Hu, W. Hua, S. M. Huang, and H. Z. Wang, "Complete chloroplast genome sequence of rapeseed (Brassica napus L.) and its evolutionary implications," Genetic Resources and Crop Evolution, vol. 58, no. 6, pp. 875–887, 2011.

[3] J. D. Palmer, "Comparative organization of chloroplast genomes," Annual Review of Genetics, vol. 19, no. 1, pp. 325–354, 1985.

[4] R. G. Olmstead and J. D. Palmer, "Chloroplast DNA systematics: a review of methods and data analysis," The American Journal of Botany, vol. 81, no. 9, pp. 1205–1224, 1994.

[5] R. A. Bungard, "Photosynthetic evolution in parasitic plants: insight from the chloroplast genome," BioEssays, vol. 26, no. 3, pp. 235–247, 2004.

[6] L. A. Raubeson and R. K. Jansen, "Chloroplast genomes of plants," in Diversity and Evolution of Plants-Genotypic and Phenotypic Variation in Higher Plants, R. Henry, Ed., pp. 45–68, CABI Publishing, Wallingford, UK, 2005.

[7] R. C. Haberle, H. M. Fourcade, J. L. Boore, and R. K. Jansen, "Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes," Journal of Molecular Evolution, vol. 66, no. 4, pp. 350–361, 2008.

[8] W. Martin and R. G. Herrmann, "Gene transfer from organelles to the nucleus: how much, what happens, and why?" Plant Physiology, vol. 118, no. 1, pp. 9–17, 1998.

[9] H. L. Race, R. G. Herrmann, and W. Martin, "Why have organelles retained genomes?" Trends in Genetics, vol. 15, no. 9, pp. 364–370, 1999.

[10] T. Wakasugi, T. Tsudzuki, and M. Sugiura, "The genomics of land plant chloroplasts: gene content and alteration of genomic information by RNA editing," Photosynthesis Research, vol. 70, no. 1, pp. 107–118, 2001.

[11] Y.-K. Kim, C.-W. Park, and K.-J. Kim, "Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications," Molecules and Cells, vol. 27, no. 3, pp. 365–381, 2009.

[12] A. Khan, I. A. Khan, H. Asif, and M. K. Azim, "Current trends in chloroplast genome research," African Journal of Biotechnology, vol. 9, no. 24, pp. 3494–3500, 2010.

[13] H. Shimada and M. Sugiura, "Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes," Nucleic Acids Research, vol. 19, no. 5, pp. 983–995, 1991.

[14] M. Sugiura, "The chloroplast genome," Plant Molecular Biology, vol. 19, no. 1, pp. 149–168, 1992.

[15] M. E. Cosner, R. K. Jansen, J. D. Palmer, and S. R. Downie, "The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families," Current Genetics, vol. 31, no. 5, pp. 419–429, 1997.

[16] L. Gao, X. Yi, Y.-X. Yang, Y.-J. Su, and T. Wang, "Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes," BMC Evolutionary Biology, vol. 9, no. 1, article 130, 2009.


[17] T. Kato, T. Kaneko, S. Sato, Y. Nakamura, and S. Tabata, "Complete structure of the chloroplast genome of a legume, Lotus japonicus," DNA Research, vol. 7, no. 6, pp. 323–330, 2000.

[18] B. Goffinet, N. J. Wickett, O. Werner, R. M. Ros, A. J. Shaw, and C. J. Cox, "Distribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta)," Annals of Botany, vol. 99, no. 4, pp. 747–753, 2007.

[19] J. D. Palmer, J. M. Nugent, and L. A. Herbon, "Unusual structure of geranium chloroplast DNA: a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families," Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 3, pp. 769–773, 1987.

[20] S. Greiner, X. Wang, U. Rauwolf et al., "The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution," Nucleic Acids Research, vol. 36, no. 7, pp. 2366–2378, 2008.

[21] J. J. Doyle, J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson, "Chloroplast DNA inversions and the origin of the grass family (Poaceae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 16, pp. 7722–7726, 1992.

[22] F. A. Michelangeli, J. I. Davis, and D. W. Stevenson, "Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes," The American Journal of Botany, vol. 90, no. 1, pp. 93–106, 2003.

[23] E. Bortiri, D. Coleman-Derr, G. R. Lazo, O. D. Anderson, and Y. Q. Gu, "The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes," BMC Research Notes, vol. 1, article 61, 2008.

[24] C. H. Leseberg and M. R. Duvall, "The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals," Journal of Molecular Evolution, vol. 69, no. 4, pp. 311–318, 2009.

[25] H. J. Chung, J. D. Jung, H. W. Park et al., "The complete chloroplast genome sequences of Solanum tuberosum and comparative analysis with Solanaceae species identified the presence of a 241-bp deletion in cultivated potato chloroplast DNA sequence," Plant Cell Reports, vol. 25, no. 12, pp. 1369–1379, 2006.

[26] H. Daniell, S.-B. Lee, J. Grevich et al., "Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes," Theoretical and Applied Genetics, vol. 112, no. 8, pp. 1503–1518, 2006.

[27] S. Kahlau, S. Aspinall, J. C. Gray, and R. Bock, "Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes," Journal of Molecular Evolution, vol. 63, no. 2, pp. 194–207, 2006.

[28] M. Yukawa, T. Tsudzuki, and M. Sugiura, "The chloroplast genome of Nicotiana sylvestris and Nicotiana tomentosiformis: complete sequencing confirms that the Nicotiana sylvestris progenitor is the maternal genome donor of Nicotiana tabacum," Molecular Genetics and Genomics, vol. 275, no. 4, pp. 367–373, 2006.

[29] Y. D. Jo, J. Park, J. Kim et al., "Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome," Plant Cell Reports, vol. 30, no. 2, pp. 217–229, 2011.

[30] C. Schmitz-Linneweber, R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier, "The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation," Molecular Biology and Evolution, vol. 19, no. 9, pp. 1602–1612, 2002.

[31] M. Kunnimalaiyaan and B. L. Nielsen, "Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA," Nucleic Acids Research, vol. 25, no. 18, pp. 3681–3686, 1997.

[32] G. Thyssen, Z. Svab, and P. Maliga, "Cell-to-cell movement of plastids in plants," Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 7, pp. 2439–2443, 2012.

[33] D. Gargano, A. Vezzi, N. Scotti et al., "The complete nucleotide sequence genome of potato (Solanum tuberosum cv. Desiree) chloroplast DNA," in Book of Abstracts of the 2nd Solanaceae Genome Workshop, p. 107, 2005.

[34] K.-J. Kim and H.-L. Lee, "Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants," DNA Research, vol. 11, no. 4, pp. 247–261, 2004.

[35] D. A. Steane, "Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae)," DNA Research, vol. 12, no. 3, pp. 215–220, 2005.

[36] J. D. Palmer, "Cell culture and somatic cell genetics of plants," in The Molecular Biology of Plastids, R. G. Hermann, Ed., pp. 5–53, Springer, Vienna, Austria, 1991.

[37] M. Sugiura, K. Shinozaki, M. Tanaka et al., "Split genes and cis/trans splicing in tobacco chloroplasts," in Plant Molecular Biology, D. von Wettstein and N. H. Chua, Eds., pp. 65–76, Plenum Press, New York, NY, USA, 1987.

[38] A. Vera and M. Sugiura, "A novel RNA gene in the tobacco plastid genome: its possible role in the maturation of 16S rRNA," The EMBO Journal, vol. 13, no. 9, pp. 2211–2217, 1994.

[39] M. Sugita, Z. Svab, P. Maliga, and M. Sugiura, "Targeted deletion of sprA from the tobacco plastid genome indicates that the encoded small RNA is not essential for pre-16S rRNA maturation in plastids," Molecular and General Genetics, vol. 257, no. 1, pp. 23–27, 1997.

[40] R. Lahaye, M. van der Bank, D. Bogarin et al., "DNA barcoding the floras of biodiversity hotspots," Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 8, pp. 2923–2928, 2008.

[41] P. Taberlet, L. Gielly, G. Pautou, and J. Bouvet, "Universal primers for amplification of three non-coding regions of chloroplast DNA," Plant Molecular Biology, vol. 17, no. 5, pp. 1105–1109, 1991.

[42] X. Guo, S. Castillo-Ramírez, V. Gonzalez et al., "Rapid evolutionary change of common bean (Phaseolus vulgaris L) plastome and the genomic diversification of legume chloroplasts," BMC Genomics, vol. 8, article 228, 2007.

[43] R. K. Jansen, C. Kaittanis, C. Saski et al., "Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids," BMC Evolutionary Biology, vol. 6, no. 1, article 32, 2006.

[44] M. J. Moore, P. S. Soltis, C. D. Bell, J. G. Burleigh, and D. E. Soltis, "Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots," Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4623–4628, 2010.

[45] R. K. Jansen, Z. Cai, L. A. Raubeson et al., "Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns," Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 49, pp. 19369–19374, 2007.

[46] L. Bohs and R. G. Olmstead, "Phylogenetic relationships in Solanum (Solanaceae) based on ndhF sequences," Systematic Botany, vol. 22, no. 1, pp. 5–17, 1997.

[47] R. G. Olmstead, L. Bohs, H. A. Migid, E. Santiago-Valentin, V. F. Garcia, and S. M. Collier, "A molecular phylogeny of the Solanaceae," Taxon, vol. 57, no. 4, pp. 1159–1181, 2008.


Table 1: Properties of the solanaceous chloroplast genomes.

Property                  ABE            CAN            DST            NSY            NTA            NTO            NUN            SBU            SLY            STU
Genome size (bp)          156687         156781         155871         155941         155943         155745         155863         155371         155461         155296
LSC (bp)                  86869          87366          86297          86684          86686          86392          86633          85785          85882          85737
LSC coordinates*          1–86869        1–87366        1–86297        1–86684        1–86686        1–86392        1–86633        1–85785        1–85882        1–85737
IRB (bp)                  25905          25783          25563          25342          25343          25429          25331          25588          25608          25593
IRB coordinates*          86870–112774   87367–113149   86298–111860   86685–112026   86687–112029   86393–111821   86634–111964   85786–111373   85883–111490   85738–111330
SSC (bp)                  18008          17849          18448          18573          18571          18495          18568          18381          18363          18373
SSC coordinates*          112775–130782  113150–130998  111861–130308  112027–130599  112030–130600  111822–130316  111965–130532  111374–129754  111491–129853  111331–129703
IRA (bp)                  25905          25783          25563          25342          25343          25429          25331          25588          25608          25593
IRA coordinates*          130783–156687  130999–156781  130309–155871  130600–155941  130601–155943  130317–155745  130533–155863  129755–155342  129854–155461  129704–155296
Coding regions (%)        58.89          58.50          59.19          61.49          61.12          61.58          63.12          58.52          58.91          58.45
Introns (%)               12.51          12.71          11.62          12.70          12.70          12.68          12.69          12.82          12.47          12.49
Intergenic regions (%)    28.60          28.79          29.19          25.81          26.18          25.73          24.19          28.66          28.62          29.06
AT content (%)
  Overall                 62.44          62.27          62.12          62.15          62.15          62.21          62.12          62.12          62.14          62.12
  Coding regions          59.86          59.68          59.65          59.85          59.79          59.79          59.70          59.61          59.65          59.59
  Noncoding regions       66.13          65.93          65.72          65.84          65.87          66.09          66.27          65.66          65.71          65.68
  tRNAs                   47.70          47.38          47.08          47.06          47.05          47.10          47.08          47.12          47.01          47.06
  rRNAs                   44.64          44.73          44.63          44.64          44.64          44.64          44.64          44.66          44.66          44.65
  Protein-coding genes    62.01          61.83          61.79          61.91          61.86          61.84          61.68          61.76          61.80          61.74
  LSC                     64.37          64.25          64.04          64.05          64.05          64.12          64.01          63.99          64.01          63.99
  SSC                     68.35          67.99          67.72          67.94          67.93          68.03          67.87          67.87          67.97          67.91
  IR                      57.14          56.94          56.87          56.78          56.78          56.84          56.78          56.93          56.91          56.90

ABE: Atropa belladonna, CAN: Capsicum annuum, DST: Datura stramonium, NSY: Nicotiana sylvestris, NTA: Nicotiana tabacum, NTO: Nicotiana tomentosiformis, NUN: Nicotiana undulata, SBU: Solanum bulbocastanum, SLY: Solanum lycopersicum, STU: Solanum tuberosum, LSC: large single copy region, SSC: small single copy region, and IR: inverted repeat region.
*Start and end position of nucleotide in the genome.
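The coordinates in Table 1 follow directly from the region lengths when the regions are laid out in the order LSC, IRB, SSC, IRA. The sketch below, with a hypothetical helper `region_coordinates`, recomputes the 1-based coordinates for four of the species and checks that the four regions add up to the reported genome size; it uses only values read from Table 1 and is a consistency check, not the authors' procedure.

```python
# (LSC, IRB, SSC, IRA) lengths in bp and reported genome size, read from Table 1
regions = {
    "ABE": ((86869, 25905, 18008, 25905), 156687),
    "CAN": ((87366, 25783, 17849, 25783), 156781),
    "NTA": ((86686, 25343, 18571, 25343), 155943),
    "STU": ((85737, 25593, 18373, 25593), 155296),
}


def region_coordinates(lengths):
    """Return 1-based inclusive (start, end) pairs for consecutive regions."""
    coords, start = [], 1
    for length in lengths:
        coords.append((start, start + length - 1))
        start += length
    return coords


for species, (lengths, genome_size) in regions.items():
    coords = region_coordinates(lengths)
    # The last region must end at the reported genome size.
    assert coords[-1][1] == sum(lengths) == genome_size
    print(species, coords)
```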


regions and the lowest in IR regions. Some earlier studies have also shown a similar distribution of AT content in LSC, SSC, and IR regions, with the lowest AT content in the IR region and the highest AT content in the SSC region [2, 27, 34, 35]. The low AT content of IR regions reflects the low AT content of the four ribosomal RNA genes in this region.
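As a minimal sketch of how region-wise AT content can be computed from a genome sequence and 1-based region coordinates, consider the following. The function names and the toy sequence are assumptions for illustration, and the choice to ignore ambiguous bases is ours; the paper does not describe its implementation.

```python
def at_content(seq: str) -> float:
    """Percentage of A and T among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("no unambiguous bases in sequence")
    return 100.0 * (seq.count("A") + seq.count("T")) / counted


def region_at_content(genome: str, regions: dict) -> dict:
    """AT content per region, given 1-based inclusive (start, end) coordinates."""
    return {name: at_content(genome[start - 1:end])
            for name, (start, end) in regions.items()}


# Toy genome with invented sequence (not data from the paper)
toy_genome = "ATATATGCAT" + "GCGCATGCGC" + "AAATTTAATT"
toy_regions = {"LSC": (1, 10), "IR": (11, 20), "SSC": (21, 30)}
print(region_at_content(toy_genome, toy_regions))
```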

3.2. Gene Content of Solanaceous Chloroplast Genomes. The genes present in different regions of the plastid genomes are highly conserved except for several open reading frames [6, 26, 36]. There are typically 111 genes, 5 hypothetical chloroplast reading frames (ycfs), and a few open reading frames (orfs). Some of our unique findings are discussed below.

(i) The trnP-GGG gene, which codes for a proline tRNA, was annotated only in D. stramonium, whereas its alternative, trnP-UGG, was annotated in all species including D. stramonium (NC_018117.1, Li et al. (unpublished)). We mined all the species for a similar sequence by BLAST search, but no similar sequence was found in any other species. A second region annotated as a trnH gene was reported only in C. annuum. In all other species this region was reported as part of the ycf2 gene, as it also is in C. annuum. These observations indicate the presence of a duplicate copy of the trnH gene sequence in C. annuum and of two alternative tRNA genes coding for proline in D. stramonium. However, no other evidence was found in databases that this particular region codes for trnH.

(ii) The rps19 pseudogene was reported in three species, namely, N. tomentosiformis, S. bulbocastanum, and S. tuberosum. All other species were mined for a similar pseudogene using the BLAST pairwise algorithm, which confirmed the presence of the rps19 pseudogene in three further species, namely, A. belladonna, C. annuum, and D. stramonium. The presence of the pseudogene may be attributed to the expansion of IRB into the LSC region. In three species, namely, N. sylvestris, N. tabacum, and N. undulata, the rps19 pseudogene was found to be absent.

(iii) infA, a pseudogene for all species except A. belladonna, D. stramonium, and S. lycopersicum, is a protein-coding gene for S. bulbocastanum. A homology search with the infA sequence from S. bulbocastanum against the plastomes of A. belladonna and D. stramonium revealed an identical sequence in both species.

(iv) The sprA gene has been annotated for N. sylvestris, N. tomentosiformis, S. lycopersicum, and S. tuberosum. Identical orthologous gene sequences were found in A. belladonna, C. annuum, D. stramonium, N. tabacum, N. undulata, and S. bulbocastanum using BLAST search.

3.3. Split Genes. A total of eighteen split genes have been reported. The sizes of exons and introns for these genes in all the solanaceous species studied are summarized in Table 2. The rps12 gene is divided such that its 5′ end exon is located in the LSC region, whereas the second and third exons are located in the IR region. Maturation of the RNA transcript requires a trans-splicing mechanism between exon 1 and exon 2 [34, 37]. Among the eighteen intron-containing genes, ycf3, clpP, and rps12 contain two introns, whereas the other 15 genes contain only one intron. As per Kim and Lee [34], the trnL-UAA intron is a self-splicing group I intron, whereas all other introns belong to group II. Generally, exon sizes were conserved and variability was observed in the intron regions; however, ndhB was found to be highly conserved for both exons and introns.

3.4. Pairwise Comparison of Plastid Genes of Solanaceae and InDel Analyses. Pairwise comparison of the nucleotide sequences of individual genes (45 combinations) for 116 genes was also performed to classify genes based on percent identity. Supplementary Table 1 (Supplementary Material available online at http://dx.doi.org/10.1155/2014/424873) shows the grouping of genes into different clusters based on percent identity in pairwise comparison. Genes which showed 100% identity in a comparison were considered highly conserved, and genes showing less than 95% identity at least once in the comparisons were considered highly divergent. These highly divergent genes were further explored at the nucleotide as well as the protein level to probe the variations in detail. A total of 11 highly divergent genes were found, whereas the number of highly conserved genes varied from 26 (for the species pair N. tomentosiformis and S. lycopersicum) to 107 (for the species pair N. sylvestris and N. tabacum). Most of the tRNA genes were found to be highly conserved. Genes accD, cemA, clpP, ndhA, rpl32, rpl36, rps16, sprA, trnA-UGC, trnL-UAA, and ycf1 were found to be highly divergent.
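The classification rule above can be sketched in a few lines. Ten species give 45 unordered pairs, and a gene is flagged as highly divergent if any pair falls below 95% identity. The way percent identity is computed here (identical columns over aligned columns, with a gap against a base counted as a mismatch) is an assumption, since the paper does not give the exact formula; the function names and toy sequences are likewise illustrative.

```python
from itertools import combinations

SPECIES = ["ABE", "CAN", "DST", "NSY", "NTA", "NTO", "NUN", "SBU", "SLY", "STU"]
PAIRS = list(combinations(SPECIES, 2))  # 10 choose 2 = 45 species pairs


def percent_identity(a: str, b: str) -> float:
    """Percent identical columns; columns that are gaps in both are skipped."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def identity_table(aligned: dict) -> dict:
    """Percent identity for every pair of taxa in one gene alignment."""
    return {(s, t): percent_identity(aligned[s], aligned[t])
            for s, t in combinations(sorted(aligned), 2)}


def is_highly_divergent(aligned: dict, threshold: float = 95.0) -> bool:
    """True if at least one pair falls below the threshold."""
    return any(pid < threshold for pid in identity_table(aligned).values())


# Toy alignment with invented sequences (not data from the paper)
toy_gene = {
    "NSY": "ATGGCATTTAAGGCCTTAGA",
    "NTA": "ATGGCATTTAAGGCCTTAGA",
    "CAN": "ATGGCTTTTAA---CTTAGA",
}
print(len(PAIRS))
print(identity_table(toy_gene))
print(is_highly_divergent(toy_gene))
```

In this toy case the two Nicotiana sequences are identical, while the comparison with the third sequence drops below the threshold, so the gene would be placed in the highly divergent group.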

Tables 3 and 4 summarize the InDels observed in nucleotide and amino acid sequences, respectively. Partial multiple sequence alignments of 9 genes and 5 proteins are shown in Figures 1 and 2, respectively. The longest insertion, of 141 bp, was observed in the accD gene sequence of C. annuum. Since genes clpP, ndhA, rps16, and trnL-UAA contain introns, it was important to examine whether these InDels were present in exon or intron regions. It was found that all the InDels reported in ndhA and trnL-UAA were present in introns, whereas in the case of clpP, InDel 24 was located in an exon of the gene. Similarly, the first and last InDels of gene rps16 were present in exons of the gene. Keeping in view the observations on the number and length of InDels in nucleotide and protein sequences of the genes under consideration, the variation for individual genes is discussed below.

(1) accD. A total of four InDels were observed in the accD gene, as depicted in Figure 1. Interestingly, an insertion of 24 bp was present in all Nicotiana species and D. stramonium, followed by an insertion of 9 bp in all Solanum species, indicating stronger sequence conservation at the genus level. These insertions were also reported by Chung et al. [25]. A 141 bp insertion was observed specifically in C. annuum, which has also been reported by Jo et al. [29] and confirmed by RT-PCR. Similarly, a species-specific deletion of 6 bp was found in D. stramonium. All these InDels were also reflected in the corresponding


Table 2: The lengths (bp) of introns and exons for the split genes of ten solanaceous species.

Gene (region)    Exon/intron ABE   CAN   DST   NSY   NTA   NTO   NUN   SBU   SLY   STU
trnK-UUU (LSC)   Exon I      37    37    37    37    37    37    37    37    37    37
                 Intron I    2519  2500  2506  2526  2526  2526  2521  2501  2514  2512
                 Exon II     36    35    35    35    35    35    35    35    35    35
rps16 (LSC)      Exon I      40    40    40    40    40    40    40    40    40    40
                 Intron I    822   865   866   860   860   860   859   855   864   855
                 Exon II     227   227   227   218   218   218   218   227   227   227
trnG-UCC (LSC)   Exon I      23    23    23    23    23    23    23    23    23    23
                 Intron I    692   692   694   692   692   690   691   701   695   692
                 Exon II     48    48    48    48    48    48    48    37    48    48
atpF (LSC)       Exon I      145   145   145   145   145   145   145   144   144   145
                 Intron I    715   693   700   695   695   692   692   693   686   693
                 Exon II     410   410   410   410   410   410   410   411   411   410
rpoC1 (LSC)      Exon I      432   453   453   453   453   432   453   453   453   453
                 Intron I    737   742   737   737   737   709   733   737   737   737
                 Exon II     1614  1614  1614  1614  1614  1614  1623  1614  1614  1614
ycf3 (LSC)       Exon I      124   124   124   124   124   124   124   124   124   124
                 Intron I    739   742   740   739   738   731   735   730   729   727
                 Exon II     230   230   230   230   230   230   230   230   230   230
                 Intron II   763   744   753   783   783   779   781   750   750   750
                 Exon III    153   153   159   153   153   153   153   153   153   153
trnL-UAA (LSC)   Exon I      35    35    35    35    35    35    35    37    35    35
                 Intron I    497   426   501   503   503   497   498   502   497   497
                 Exon II     50    50    50    50    50    50    50    50    50    50
trnV-UAC (LSC)   Exon I      38    38    38    38    38    38    38    38    38    38
                 Intron I    572   575   569   571   571   572   573   569   571   571
                 Exon II     35    35    37    35    35    35    35    37    35    35
rps12*           Exon I      114   114   114   114   114   114   114   114   114   114
                 Intron I    -     -     -     -     -     -     -     -     -     -
                 Exon II     232   232   232   232   232   232   232   232   232   232
                 Intron II   535   536   536   536   536   536   536   536   536   536
                 Exon III    26    26    26    26    26    26    26    26    26    26
clpP (LSC)       Exon I      71    71    71    71    71    71    71    71    71    71
                 Intron I    799   811   792   807   807   789   789   789   798   789
                 Exon II     292   292   292   292   292   292   292   292   292   292
                 Intron II   622   626   624   637   637   634   631   625   617   620
                 Exon III    228   228   234   228   228   228   228   234   258   234
petB (LSC)       Exon I      6     6     6     6     6     6     6     6     6     6
                 Intron I    759   755   746   753   753   753   753   747   747   747
                 Exon II     642   642   642   642   642   642   642   642   642   642
petD (LSC)       Exon I      8     8     9     8     8     8     8     6     8     8
                 Intron I    742   742   748   742   742   742   742   739   738   739
                 Exon II     475   475   474   475   475   475   475   477   475   475
rpl16 (LSC)      Exon I      9     9     9     9     9     9     9     9     9     9
                 Intron I    1019  1026  1025  1020  1020  1021  1020  1014  1018  1014
                 Exon II     396   396   396   396   396   396   396   396   396   396
rpl2 (IR)        Exon I      391   391   393   391   391   391   391   390   391   391
                 Intron I    664   665   669   666   666   666   666   666   666   666
                 Exon II     434   434   429   434   434   434   434   435   434   434
ndhB (IR)        Exon I      777   777   777   777   777   777   777   777   777   777
                 Intron I    679   679   679   679   679   679   679   679   679   679
                 Exon II     756   756   756   756   756   756   756   756   756   756
trnI-GAU (IR)    Exon I      37    37    42    37    37    37    37    42    37    37
                 Intron I    717   722   717   707   707   716   716   717   722   722
                 Exon II     34    35    35    35    35    35    35    35    35    35
trnA-UGC (IR)    Exon I      38    38    38    38    38    38    38    38    38    38
                 Intron I    681   811   811   709   709   709   709   811   811   811
                 Exon II     35    35    35    35    35    35    35    35    35    35
ndhA (SSC)       Exon I      553   553   552   553   553   553   553   552   553   553
                 Intron I    1150  1157  1154  1148  1148  1149  1148  1158  1133  1158
                 Exon II     539   539   537   539   539   539   539   540   539   539

*rps12 is a divided gene: the 3′-rps12 is located in the IR region, whereas the 5′-rps12 is located in the LSC region.
ABE: Atropa belladonna, CAN: Capsicum annuum, DST: Datura stramonium, NSY: Nicotiana sylvestris, NTA: Nicotiana tabacum, NTO: Nicotiana tomentosiformis, NUN: Nicotiana undulata, SBU: Solanum bulbocastanum, SLY: Solanum lycopersicum, and STU: Solanum tuberosum.
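One simple use of Table 2 is to check that the exons of a split protein-coding gene, once joined, form a whole number of codons. The sketch below does this for a few A. belladonna (ABE) entries taken from the table; the helper name `spliced_length` is hypothetical, and whether the listed exon lengths include the stop codon is an assumption that the table itself does not settle.

```python
# Exon lengths (bp) for A. belladonna (ABE column of Table 2)
exons_abe = {
    "ycf3": (124, 230, 153),
    "clpP": (71, 292, 228),
    "rps12": (114, 232, 26),
    "rpl2": (391, 434),
    "ndhB": (777, 756),
}


def spliced_length(exon_lengths):
    """Return (total bp, whole codons, leftover bases) for joined exons."""
    total = sum(exon_lengths)
    codons, remainder = divmod(total, 3)
    return total, codons, remainder


for gene, exon_lengths in exons_abe.items():
    total, codons, remainder = spliced_length(exon_lengths)
    print(f"{gene}: {total} bp, {codons} codons, remainder {remainder}")
```

For these five genes the remainder is zero in every case, which is what an intact reading frame requires.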

Table 3: InDels in nucleotide sequences of 9 genes of ten solanaceous plastid genomes.

S. number  Gene      Region  Total number of InDels  InDel length (bp)
1          accD      LSC     4                       24, 9, 141, 6
2          clpP      LSC     24                      8(I), 14(I), 13(I), 7(I), 1(I), 2–3(I), 7(I), 1–7(I), 3(I), 2(I), 3(I), 1–7(I), 1–3(I), 1(I), 1(I), 1(I), 1–5(I), 4–7(I), 1(I), 9(I), 1–2(I), 3(I), 5(I), 24–30
3          ndhA      SSC     14                      9(I), 5–6(I), 3(I), 1(I), 9(I), 3(I), 4(I), 1–4(I), 1–2(I), 1–23(I), 1–2(I), 2(I), 1(I), 3(I)
4          rpl32     SSC     2                       2–3, 4
5          rps16     LSC     11                      1–38, 9(I), 1(I), 1(I), 5(I), 1–2(I), 5(I), 4(I), 6(I), 1(I), 9
6          sprA      SSC     2                       109, 7
7          trnA-UGC  IR      1                       102–130
8          trnL-UAA  LSC     4                       1, 6, 71, 4
9          ycf1      SSC     31                      3, 18, 18, 21, 6, 6, 48, 9, 6, 6, 42, 3, 6, 30, 3, 15, 12–39, 18, 6, 9–36, 6, 6, 6, 9, 9, 12, 6, 6, 6, 57, 6

Region: location of the gene (LSC, SSC, or IR). (I): InDel present in an intron.

Table 4: InDels in amino acid sequences of 5 proteins of ten solanaceous plastid genomes.

S. number  Protein  Total number of InDels  InDel length (aa)
1          accD     4                       8, 3, 47, 2
2          clpP     2                       2, 10
3          rpl32    1                       1–2
4          rps16    1                       3
5          ycf1     29                      1, 6, 6, 7, 2, 2, 7, 3, 2, 2, 14, 1–10, 1, 5, 4–13, 6, 2, 3–12, 2, 2, 2, 3, 3, 4, 2, 2, 2, 19, 2
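The accD rows of Tables 3 and 4 are related by a simple rule: an InDel in a coding region whose length is a multiple of three adds or removes whole codons and leaves the reading frame intact. A minimal sketch, using a hypothetical helper `indel_effect` and the accD values from the two tables, is given below.

```python
def indel_effect(length_bp: int):
    """Classify a coding-region InDel by its length in base pairs."""
    codons, remainder = divmod(length_bp, 3)
    if remainder == 0:
        return ("in-frame", codons)   # whole codons gained or lost
    return ("frameshift", None)       # reading frame would be disrupted


accd_indels_bp = [24, 9, 141, 6]  # accD row of Table 3
accd_indels_aa = [8, 3, 47, 2]    # accD row of Table 4

for bp, aa in zip(accd_indels_bp, accd_indels_aa):
    kind, codons = indel_effect(bp)
    print(bp, kind, codons, codons == aa)
```

All four accD InDels are in-frame, and the codon counts match the amino acid lengths listed for the accD protein.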


[Figure 1, partial multiple sequence alignment panels (including sprA, accD, and ndhA): alignment blocks not reproduced. Rows in each panel follow the species order ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, STU, and columns are labelled by InDel number.]

ATCT---------TCGGATCT---------TCGGATCT---------TCGGATTT---------TCGGATTT---------TCGGATTT---------TCGGATTT---------TCAGATCT---------TAGGATCTAATATATCTTAGGATCT---------TAGG

GTCT-GTAGGTCTTGTAGGTCTTGTAGGTCT-GTAGGTCT-GTAGGTCT-GTAGGTCT-GTAGATCTTGTAGATCTTGTAGATCTTGTAG

TTTAAATTGTTTA-ATTGTTTAAATTGTTTTTATTGTTTTTATTGTTTTTATTGTTTTTATTGTTTTAATTGTTTTAATTGTTTTAATTG

CCAAATCGATTTTCCGA-----TTTTCCAAATCGATTTTCCAAATCCATTTTCCAAATCCATTTTCCAAATCCATTTTCCAAATCCATTTTCCGA-----TTTTCCGA-----TTTTCCGA-----TTTT

AAAAA-GACTTAAAAAGGACTTAAAAA-GATTTAAAAAAGACTTAAAAAAGACTTAAAAA-GACTTAAAAAAGACTTAAAA--GACTTAAAA--GACTTAAAA--GACTT

rps16

InDel 1 InDel 2 InDel 3 InDel 4 InDel 5 InDel 6

ABECANDSTNSYNTANTONUNSBUSLYSTU

TTTT---------CTTCCTTTTTTCTTTTTTT-----------------------TTTTTTTTCTTCCTTTTTTCTTTTTTTTTTTTTTTTTCTTACTTTTTTCTTTTTTTGTTTTTTTTTCTTACTTTTTTCTTTTTTTGTTTTTTTT-CTTACTTTTTTCTTTTTTTTTTTTTTTT-CTTACTTTTTTCTTTTTTTGTTTTTTTTTTCTTCCTTTTTTCTTTTTTTGTTTTTT----------------------GTTTTTTTTTCTTCCTTTTTTCTTTTTTTGTTT

TTTTTGTT TTT--ATT TTTTTATT TTT--ATT TTT--ATT TTTTTATT TTT--ATT TTTT-ATT TTTT-ATT TTTT-ATT

TTTATTCTCTT--TCTCTT--TCTCTT--TCTCTT--TCTCTT--TTTCTT--TCTCTT--TCTCTT--TCTCTT--TCT

TTTTCGTTTT-CGTTTT-CGTTTT-CGTTTT-CGTTTT-CGTTTT-CGTTTT-CGTTTT-CGTTTT-CGT

TTT---ATTTTT---ATTTTT---ATTTTT---ATTTTT---ATTTTT---ATTTTT---ATTTTTTTTATTTTTTTTATTTTTTTTATT

TTTTTTT---GTACTTTTTTTTTGGTACTTTTTTTT--GTACTTTTTTT---GTACTTTTTTT---GTACTTTTTTT---GTACTTTTTTT---GTACTTTTTTT---GTACTTTTTTT---GTACTTTTTTT---GTAC

TAAGTAA TAAGTAATAA----TAAGTAA TAAGTAA TAAGTAA TAAGTAA TAAGTAA TAAGTAA TAAGTAA

InDel 10 InDel 11 InDel 12 InDel 13 InDel 14 InDel 1 InDel 2

ABECANDSTNSYNTANTONUNSBUSLYSTU

rpl32

TAAA--------TTGATAAAACGATAAATTGATAAA--------TTAATAAA--------TTTATAAA--------TTTATAAA--------TTTATAAA--------TTTATAAA--------TTGATAAA--------TTGATAAA--------TTGA

GAATAATTCAATTAGAATAAGAA--------------TAAGAA--------------TAAGAAAAATTCAATTAGAATAAGAAAAATTCAATTAGAATAAGAA--------------TAAGAA--------------TAAGAA--------------TAAGAA--------------TAAGAA--------------TAA

TCTATG-------------ATTG TCTATG-------------ATTGTCTATG-------------ATTGTCTATG-------------ATTGTCTATG-------------ATTGTCTATG-------------ATTGTCTATG-------------ATTG TCTATG-------------ATTGTCTATGTAAGATATCTATGATTG TCTATG-------------ATTG

GTAA-------AGGGAGTAATTTGTAAAGAGAGGAA-------AAAGAGTAA-------AGAGAGTAA-------AGAGAGTAA-------AGAGAGTAA-------AGAGAGTAA-------AGAGAGTAA-------AGAGAGTAA-------AGAGA

AAAAGAAAAAGAAAAAGAAAAAGAAAAAGAAAA-GAAAA-GAAAAGAAAAAAAAAAAGAA

--TACA--TACA--TAAA--TACA--TACA--TACA--TACA

---TACA

AAAAAAAAAAAAAAAAAAAAAAAAAAAATACAAA---TACA

clpP

InDel 1 InDel 2 InDel 3 InDel 4 InDel 5

ABECANDSTNSYNTANTONUNSBUSLYSTU

InDel 6

GGGGGGGGGTCAGGAGGTCAGGAGGTCAGGAGGTCAGGGGGGGG

TCGAAAATTCGAAAATTGGAAAATCGAAA TGGAAATGGAAAATCGAAA TAGAAAATCGAAA TGGAAAATCGAAA TGGAAAATCGAAA TAGTGGAAAATCGAAA

TTTTTTTTATTGTTTTTTTTATTGTTTTTTTTATTGTTTTTTTTATTGTTTTTTTT

ATTGTTTTTTTTTT

ATTGTTTTTTTTTT

CTTTTTCTTTTTCTTTTTCTTTTTCTTTTTCTTTTTCTTTTTCTTTTT

TTCTTATT

AATAGAAAGGAGGGGAATAGAAAGGAGGGGAATAGAAAGGAGGGGAATAGAAAGGAGGGG

AAGGAGGGGAATAGAAAGGAGGGGAATAGAAAGGGGGGGAATAGAAAGGAGGGG

GGGAATAGAAAGGAGGGG

ATCATACTGGAAAATC

ATGATGATGATAATGATGATAATG

TTTTTTTTTTTTTT

TTTTTCAAATTTTTCAAATTTTTCAAATTTTTCAAA

TTTGTTTTTGTTTTTGGTTTTGTTTTTTTTGTTTTTTTGTTTTTGTTTTTGTT

Figure 1 Continued

8 Advances in Bioinformatics

AAA-GGAAAA-GGAAAA-GGAAAA-GGAAAA-GGAAAA-GGAAAAAGGAAAA-GGAAAA-GGAAAA-GGA

AAA------TGA AAA------TGA AAA------TGA AAAATCAAATGA AAAATCAAATGA AAA------TGA AAA------TGA AAA------TGA AAA------TGA AAA------TGA

ATCCGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCC-----------------------------------------------------------------------CTGAT ATCCGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGACTCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCTGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCTGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTATTTTTTCTATAAAAAATGGAAGAGTTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTAGTTTTTCTATAAAAAATCGAAGAATTGGTGTGAATCCATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTAGTTTTTCTATAAAAAATCGAAGAATTGGTGTGAATCCATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTAGTTTTTCTATAAAAAATCGAAGAATTGGTGTGAATCCATTCTACATTGAAGAAAGAATCGAATATTCATTGAT

TTT----GAATTT----GAATTTTTTTGAATTT----GAATTT----GAATTT----GAATTT----GAATTT----GAATTT----GAATTT----GAA

tRNA-Leu(UAA)

InDel 1 InDel 2 InDel 3 InDel 4

ABE CANDSTNSYNTANTONUNSBUSLYSTU

---ATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTTGTGATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTT

GTAC------------------CTTGGTACATTCGATCTAATAAGTACCTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTG

AAGC------------------CTCA AAGCTAAGAAACTGAAAGAGGCCTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA

AAAG---------------------

AAAGGGTGGAAAGTGAG

GTAGAAATAGAAACTGGTAGAA------ACTGGTAGAAATAGAAACTGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACTGGTAGAAATTGAAACTGGTAGAAATAGAAACTG

ycf1

InDel 1 InDel 2 InDel 3 InDel 4 InDel 5

ABECANDSTNSYNTANTONUNSBUSLYSTU

CTTCTCCTTCCCTCTTCTCCTTCCCTCTTC------CCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCT

TTTT------------------------------------------------TCGG TTTTCAAGAGGGATCCACTGAAGAAGATCCTTATCCTTCTCCTTCCCCTTTTTCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG

AACAATATTAATACTAGAAAAATATTAATACTAGAACAATATTAATACTAGAATAATATTAATACTAGAATAATATTAATACTAGAACA---------CTAGAACAATATTAATACTAAAACAATATTAATACTAGAACAATATTAATACTAGAACAATATTAATACTAG

TAA------CACTAATAATAACACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CAC

AGA------CCTAGA------CCTAAA------ATTAGA------CCTAGA------CCTAGA------ACTAGA------CCTAGA------CCTAGAACAAGACCTAGA------CCT

InDel 6 InDel 7 InDel 8 InDel 9 InDel 10

ABECANDSTNSYNTANTONUNSBUSLYSTU

TAAA------------------------------------------AGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGTAAGAACG TCAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGCG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGACCAGGGAAGAGTG TAAATGTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGCG TAAATTTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGCG TAAATTTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGTG

CTGA---G CTGATGAG CTGC---G CTGA---T CTGA---T CTAA---T CTGA---T CTGA---G CTGA---G CTGA---G

GGTCAAAGAGACCAACGAGCCCGACGATACCAAAGATACCAAAGATACCAAAGATACCAAAGAG------GAG------GAG------GA

GA------------------------------GGAAGA------------------------------GGAAGA------------------------------GGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGAGACTGATACCGCAGATCAAGATCAAACAAAGGAAGA------------------------------AGAAGA------------------------------GGAAGA------------------------------GGAA

AAGCAAAAAGCAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAG

CTCT---------------AAAACTTTTATAACTATAACTCGAAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAA

InDel 11 InDel 13 InDel 14 InDel 15 InDel 16

ABECANDSTNSYNTANTONUNSBUSLYSTU

CGTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAA---------------------------------------ATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCC CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAAGTCCTAATAAAACAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAACTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAACTCGAATCG

TCAAGAAAAAATTAATAATAAAAAAA TCAAGAAAAAATTAATAATAAAAACA TCAAGAAAAAATAAAAAAAAAAAAAA TCAAGAAAAAATTAATAATCAAAAAA TCAAGAAAAAATTAATAATCAAAAAA TCAA------------------AAAA TCAAGAAAAAATTAATAATCAAAAAA TCAAGAAAAAATTACTAATAAAAAAA TCAAGAAAAAATTAATAATAAAAAAA TCAAGAAAAAATTACTAATAAAAAAA

TTAT------TTCGACTTTATCTTTATTTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTATCTTTATTTCGACTTTATGTTTATTTCGACTTTATCTTTATTTCGACT

CTAAAAAAGTCACTTTATAATATT---------AGTAATAAAAA ATAAAAAAGTCACTTTATAATATTTATAATATTAGTAAGAAAAA ATAA------------------------------------AAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGCCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA

InDel 17 InDel 18 InDel 19 InDel 20

ABECANDSTNSYNTANTONUNSBUSLYSTU

AATATAAATTTAAAATAGAAACTTAAAATCGAAATTTAAAAT------TTAAAAT------TTAAAAT------TTAAAAT------TTAAAATAGAAACTTAAAATAGAAACTTAAAATAGAAACTTAA

CTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTTGAAGTTCAAGCTTT------CAAG

ATAT------ATAGACATCTACATATAGAGAT------AGAGATAT------ATAGATAT------ATAGATAT------ATAGATAT------ATAGAGAT------ATAGAGAT------AGAGAGAT------ATAG

ATAT---------TTTG ATATAGAAAATATTTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG

GAGAAAATTGATAAAAAAT---------AAAAGAGAAAATTGACAAAAGATAAAATTGATAAAAGATAAAATTGATAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAA

AACATTAATAAAAACCAAAATATCAATAAAAACAACAAC------------CAAAACATCAATAAAAACCAAAACATCAATAAAAACCAAAACA------------AAAACATCAATAAAAACAAAAAC------------CAAAAC------------CAAAAC------------CAA

AAACAAAACTT

AACTT

------CTT

AAAAAAAACTTAAACAAACCTTAAACAAAACTTAAACAAAACTTAAACAAAACTTAAACAAAAACAAAACTTAAAAACAAAACTT

InDel 21 InDel 22 InDel 23 InDel 24 InDel 25 InDel 26 InDel 27

ABECANDSTNSYNTANTONUNSBUSLYSTU

TAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAAAATAAAGAATTAAA------GAAT

TCAAGGAGAAAGAGTCAAGGAAAAAGAGTCAAAGAGAAAGAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAA------AGAGTCAA------AGAGTCAA------AGAG

AAAGAATTTTGATGAATCCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAA---------------------------------------------------------TCAT AAAGAATTTTGATGAATTCATTCTGCAACCGCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAATCATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT

TAAA------GAGATAAA------GATATAATTATAACGAGATAAA------GAGATAAA------GAGATAAA------GAGATAAA------GAGATAAA------GATATAAA------GATATAAA------GATA

InDel 28 InDel 29 InDel 30 InDel 31

ABECANDSTNSYNTANTONUNSBUSLYSTU

--AGATAATAATTGCATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATAGAACTCTCTTCGCTTTTCTGATTGATATATAAAATA-------------------------------------------------------------------------------------------------------------ATA--AGATAATAATTGAATAATTTAACGATTTACCTTTCATATTTGATATTGATTCGCTCACCAATCAATACGTAATGGAACTCTATTCGCTTTTCTGATTGATATAGAAAATA--AGATAATAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--AGATAATAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA----ATC-TAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--------TAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--------TAATTGAATAATTGAACGATTTACCTTTCCTATTTGATATTGATTAGCTCACCAATCCATATGTAACGGAACTCTATTCGCTTTTCTGATTGATATATAAAATATTCTTCAATTATTATCTAATTGAACAATTTCCCTTTCCTCTTTGATATTGATTAGCTCACCAATCCATATATAACATAACTCTATTCGCTTTTCTGATTGATATAGAAAATA--AGATAATAATTGAATAATTGAACGATTTACCTTTCCTATTTGATATTGATTAGCTCACCAATCCATATGTAACGGAACTCTATTCGCTTTTCTGATTGATATATAAAATA

InDel 1

ABECANDSTNSYNTANTONUNSBUSLYSTU

ATT----------------------------------------------------------------------------------------------------------------------------------ACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAAGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAAGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACC

tRNA-Ala(UGC)

InDel 1

ABECANDSTNSYNTANTONUNSBUSLYSTU

InDel 12

AAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGAGGAAAGTGAGGAAGAAAGAGAT

AGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGAT

GAAGAAAAAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGAT

Figure 1 Partial multiple sequence alignment of accD clpP ndhA rpl32 rps16 sprA tRNA-Ala (UGC) tRNA-Leu(UAA) and ycf1 genesequences of ten solanaceous species showing location of InDels indicated by hyphens


[Figure 2 (alignment image not reproduced): amino acid alignment panels titled accD, ycf1, clpP, rpl32, and rps16. Rows are labelled ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU, and InDels are numbered within each gene.]

Figure 2: Partial multiple sequence alignment of amino acid sequences of genes, namely, accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala(UGC), tRNA-Leu(UAA), and ycf1, of ten solanaceous species showing the location of InDels (indicated by hyphens).

protein sequences, as shown in Figure 2. The accD gene has been reported to be one of the most variable plastid genes and is probably under diversifying selection [26].

(2) clpP. In the clpP gene, InDels were found in both intron and exon regions. Two major consequences were observed for the InDels in the exon regions. An insertion of 6 bp in S. bulbocastanum and S. tuberosum and of 30 bp in S. lycopersicum at the 3′ end (exon 3) of the clpP gene shifted the stop codon downstream by 6, 6, and 30 bp in the respective species compared to other species of the Solanaceae family, increasing the length of the coding sequence and of the protein product (Figures 1 and 2). An interesting feature was observed as InDel 1 in the protein sequence, corresponding to the insertion of a repeat of “I” amino acids in D. stramonium, which makes the exon 3 region longer by 6 bp. This region, however, corresponds to the end of intron 2 of the clpP gene in all other species. Since the D. stramonium chloroplast genome has been sequenced recently, this observation needs to be experimentally validated.

(3) ndhA. All the InDels found in ndhA were present in introns, while the protein-coding regions (exons) were highly conserved. This indicates high diversifying selection on the intronic region of this gene. Of the 14 InDels in total, most were observed with respect to C. annuum (InDels 1, 2, 5, 7, and 10). InDel 10 was shared by C. annuum and S. lycopersicum in full and by C. annuum and A. belladonna in part.
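The bookkeeping described above, in which each InDel is attributed to the species that carry it, can be sketched in a few lines of Python. This is a minimal illustration and not the procedure used in this study: the function names gap_runs and indel_carriers and the three-row toy alignment are hypothetical, and gap runs are grouped only when they have the same start column and the same length.

```python
import re

def gap_runs(aligned):
    """Return (start, length) for each run of '-' in one aligned row."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"-+", aligned)]

def indel_carriers(alignment):
    """Map each gap run (start, length) to the sorted labels of rows carrying it."""
    carriers = {}
    for label, row in alignment.items():
        for run in gap_runs(row):
            carriers.setdefault(run, []).append(label)
    return {run: sorted(labels) for run, labels in carriers.items()}

# Hypothetical toy alignment (not taken from any plastome).
toy = {
    "sp1": "CATGT---ATGAT",
    "sp2": "CATGT---ATGAT",
    "sp3": "CATGTCTTATGAT",
}
result = indel_carriers(toy)
```

A gap run carried by a single row would correspond to a species-specific InDel in the sense used in the text. InDels that are shared only in part, as reported for InDel 10, would need an overlap test instead of the exact match used here.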

(4) rpl32. In rpl32, an insertion of 1 bp in D. stramonium and of 3 bp in C. annuum was found in the 3′ region of the gene, while a deletion of 4 bp was observed in D. stramonium. The insertion of 3 bp in C. annuum only altered the length of the protein, making it longer by 1 amino acid. However, the small insertion of 1 bp in D. stramonium proved to be a frameshift mutation, resulting in three changes in the amino acid sequence near the C-terminus. Moreover, the deletion of 4 bp at the 3′ end resulted in premature termination of protein synthesis. Together, the frameshift mutation and the 3′ end deletion reduced the gene product length by 1 amino acid. As the C-terminus of the amino acid chain is well conserved in all the other species, the effect of the above-mentioned variations needs to be validated experimentally.
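The contrast drawn above between a 3 bp insertion (one extra amino acid, reading frame preserved) and a 1 bp insertion (frameshift) can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the 18-nucleotide toy sequence is hypothetical and is not rpl32, the helper translate is not from the paper, and the standard codon table is used.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(cds):
    """Translate a coding sequence up to (not including) the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical toy coding sequence: M K R L E followed by a stop codon.
reference = "ATGAAACGTCTGGAATAA"
in_frame = reference[:6] + "GCT" + reference[6:]   # 3 bp insertion
frameshift = reference[:6] + "T" + reference[6:]   # 1 bp insertion

ref_protein = translate(reference)
in_frame_protein = translate(in_frame)
frameshift_protein = translate(frameshift)
```

In this toy case the 3 bp insertion adds a single residue and leaves the rest of the protein unchanged, whereas the 1 bp insertion changes every residue downstream of the insertion point and removes the original stop codon from the reading frame.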

(5) rps16. In rps16, InDels were also observed in introns as well as exons. Five of the major insertions in the intron regions were species specific. Insertions of 38 bp (InDel 1), 9 bp (InDel 2), 5 bp (InDel 7), 4 bp (InDel 8), and 6 bp (InDel 9) were observed in A. belladonna, S. lycopersicum, D. stramonium, and C. annuum. A deletion of 5 bp was observed in all three Solanum species and C. annuum. A deletion of 9 bp was observed in all Nicotiana species, resulting in an amino acid change (P to S) and shortening of the protein by three amino acids in the C-terminal region. A similar deletion was also observed by Kahlau et al. [27] and was suggested to be functionally neutral.

(6) sprA. The sprA gene has been reported as a stable noncoding RNA of unknown function. This gene has been suggested to influence 16S rRNA maturation [38, 39]. In many species, this gene seems to be present as a remnant and shows large variations in its 5′ region. The largest deletion, of 109 bp, was observed in C. annuum. The rest of this gene appears to be more conserved, with a deletion towards the 3′ end in all Nicotiana species and A. belladonna. The manner in which this gene functions and the consequences of the above-mentioned variations are yet to be investigated experimentally.

(7) trnA-UGC. In this particular gene, a long deletion of 102 bp was observed in all Nicotiana species. Interestingly, this deletion was further extended to 130 bp in both directions in A. belladonna. These deletions were found in the intron region and so are unlikely to have any negative impact on gene product function.

(8) trnL-UAA. The trend of variation in trnL-UAA was similar to that in ndhA, as all InDels were observed in introns. The longest species-specific deletion (InDel 3) was observed in C. annuum, whereas a short insertion of four nucleotides, a repeat of “T”, was observed specifically in D. stramonium. Another insertion of 6 bp was observed in two Nicotiana species, namely, N. sylvestris and N. tabacum.

(9) ycf1. Many InDels (3′ region) were found in ycf1, the fastest evolving gene. Most of the InDels were found to be species specific. The largest number of InDels (InDels 2, 3, 5, 7, 9, 16, 17, 19, 23, 24, 25, and 30) was observed in C. annuum, followed by D. stramonium (InDels 4, 6, 20, 26, and 31), N. tomentosiformis (InDels 8, 17, and 18), and S. lycopersicum (InDels 16, 22, and 28). Two genus-specific InDels (InDels 14 and 21) were observed in all four Nicotiana species. However, InDel 19 was also present in D. stramonium. Another genus-specific InDel (InDel 29) was observed in the Solanum species. Most of the InDels altered the length of the gene product, with the maximum length of 1906 amino acids (aa) observed in C. annuum and the minimum of 1873 aa observed in D. stramonium. Among the Solanum species, the length of the protein (1887 aa) was conserved between S. bulbocastanum and S. tuberosum. However, S. lycopersicum had an amino acid sequence of 1891 aa, longer by 4 aa than those of the other two species of the same genus. Among the four Nicotiana species, the ycf1 gene product length was conserved among N. sylvestris, N. tabacum, and N. undulata, with protein lengths of 1901 aa, 1902 aa, and 1901 aa, respectively. However, N. tomentosiformis was observed to be the most variable member of the genus Nicotiana, with a protein length of 1892 aa.
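The length comparisons in the paragraph above reduce to simple arithmetic, which the following Python sketch makes explicit. The lengths are those reported in the text (no value is given there for A. belladonna, so it is omitted). The helper spread and the use of the abbreviated genus prefix to select species are assumptions made for illustration.

```python
# ycf1 protein lengths (aa) as reported in the text; A. belladonna is not listed there.
YCF1_LENGTHS = {
    "C. annuum": 1906,
    "D. stramonium": 1873,
    "S. bulbocastanum": 1887,
    "S. tuberosum": 1887,
    "S. lycopersicum": 1891,
    "N. sylvestris": 1901,
    "N. tabacum": 1902,
    "N. undulata": 1901,
    "N. tomentosiformis": 1892,
}

def spread(lengths, prefix=""):
    """Return max minus min length among species whose name starts with prefix."""
    values = [n for name, n in lengths.items() if name.startswith(prefix)]
    return max(values) - min(values)

overall = spread(YCF1_LENGTHS)           # 1906 - 1873
solanum = spread(YCF1_LENGTHS, "S. ")    # 1891 - 1887
nicotiana = spread(YCF1_LENGTHS, "N. ")  # 1902 - 1892
```

Under these reported values, the overall difference between the longest and shortest ycf1 products is 33 aa, against 4 aa within Solanum and 10 aa within Nicotiana.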

3.5. Phylogenetic Analysis of Solanaceous Plastomes. Evolutionary relationships between diverse plant species have been analyzed using several plastome markers, including matK and rbcL (genes) or trnH-psbA and trnL-trnF (intergenic regions), owing to sequence conservation among plant taxa blended with suitable variation [40, 41]. However, determination of phylogeny based on single gene sequences may be inaccurate [42]. The availability of complete chloroplast sequences for many species has made it possible to use many individual genes or concatenated gene sequences to deduce phylogenetic relationships among taxa [42–44].

Efforts to carry out phylogenetic analysis of solanaceous species using complete plastome sequences have been made by Moore et al. [44] and Jansen et al. [45]. Evolutionary positions of Capsicum and Datura in Solanaceae have been determined using a single or a few plastid genes [46, 47]. Recently, concatenated protein-coding gene sequences from completely sequenced plastomes were used to obtain reasonable phylogenetic relationships for solanaceous species [29]. In the present investigation, we used a similar approach to analyze the phylogenetic relationships of ten solanaceous species with completely sequenced plastomes. Individual multiple sequence alignments were concatenated for maximum likelihood phylogenetic tree generation. As depicted in Figure 3, the taxa were divided into two clades with 100% bootstrap support (a value of 500). The first clade consisted of the four Nicotiana species, while the species of Solanum, Capsicum, Atropa, and Datura were included in the second clade. These results are in line with previous phylogenetic analyses using concatenated protein-coding gene sequences as well as phylogenetic analyses using plastid ndhF and trnL-F sequences [29, 47]. However, in an analysis of 13 ORFs of solanaceous plastomes, a different arrangement was shown, in which Atropa was separated from Solanum and grouped together with Nicotiana [25].
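The concatenation step mentioned above, in which individual gene alignments are joined into a single alignment per taxon before tree building, can be sketched as follows. This is a minimal illustration under stated assumptions and not the pipeline used in this study: the function concatenate_alignments, the gene names, and the toy sequences are hypothetical, and padding a missing taxon with gaps is one possible convention among several.

```python
def concatenate_alignments(gene_alignments, taxa):
    """Join per-gene aligned sequences into one concatenated row per taxon."""
    supermatrix = {taxon: [] for taxon in taxa}
    for gene, rows in gene_alignments.items():
        lengths = {len(seq) for seq in rows.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: rows have unequal aligned lengths")
        width = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon absent from a gene alignment is padded with gaps.
            supermatrix[taxon].append(rows.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in supermatrix.items()}

# Hypothetical toy input: two gene alignments, three taxa.
toy_genes = {
    "geneA": {"sp1": "ATG-CA", "sp2": "ATGGCA", "sp3": "ATG-CA"},
    "geneB": {"sp1": "TTAA", "sp2": "TTGA"},  # sp3 deliberately missing
}
taxa = ["sp1", "sp2", "sp3"]
supermatrix = concatenate_alignments(toy_genes, taxa)
```

Every row of the result has the same length, which is what tree-building programs generally expect from a concatenated alignment. The length check guards against concatenating sequences that were not aligned gene by gene beforehand.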

4. Conclusions

The analyses of complete plastid genomes of ten solanaceous species revealed overall similarity in terms of gene content and organization. The sizes of the LSC, SSC, and IR regions were found to be somewhat conserved among species, but significant variation was found between genera. Most of the coding regions were well conserved. However, many of the features in a few genes were observed to be typical of a particular genus, or even species, and can be used as molecular markers to investigate genetic diversity and evolution. These typical variations can be further utilized to develop more sophisticated DNA barcoding based techniques. The ten solanaceous species were divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from coding regions of plastomes. The first clade consisted of four Nicotiana species and the second clade consisted of species of Solanum, Capsicum, Atropa, and Datura.

[Figure 3 (tree image not reproduced): tip labels CAR, DCA, NTO, NUN, NSY, NTA, ABE, DST, CAN, SLY, SBU, and STU, with branch lengths and bootstrap values shown on the tree.]

Figure 3: Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors are thankful to University Grants Commission, New Delhi, for providing MANF fellowship to Harpreet Kaur.

References

[1] M. G. Bausher, N. D. Singh, S.-B. Lee, R. K. Jansen, and H. Daniell, “The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var ‘Ridge Pineapple’: organization and phylogenetic relationships to other angiosperms,” BMC Plant Biology, vol. 6, no. 1, article 21, 2006.

[2] Z. Y. Hu, W. Hua, S. M. Huang, and H. Z. Wang, “Complete chloroplast genome sequence of rapeseed (Brassica napus L.) and its evolutionary implications,” Genetic Resources and Crop Evolution, vol. 58, no. 6, pp. 875–887, 2011.

[3] J. D. Palmer, “Comparative organization of chloroplast genomes,” Annual Review of Genetics, vol. 19, no. 1, pp. 325–354, 1985.

[4] R. G. Olmstead and J. D. Palmer, “Chloroplast DNA systematics: a review of methods and data analysis,” The American Journal of Botany, vol. 81, no. 9, pp. 1205–1224, 1994.

[5] R. A. Bungard, “Photosynthetic evolution in parasitic plants: insight from the chloroplast genome,” BioEssays, vol. 26, no. 3, pp. 235–247, 2004.

[6] L. A. Raubeson and R. K. Jansen, “Chloroplast genomes of plants,” in Diversity and Evolution of Plants-Genotypic and Phenotypic Variation in Higher Plants, R. Henry, Ed., pp. 45–68, CABI Publishing, Wallingford, UK, 2005.

[7] R. C. Haberle, H. M. Fourcade, J. L. Boore, and R. K. Jansen, “Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes,” Journal of Molecular Evolution, vol. 66, no. 4, pp. 350–361, 2008.

[8] W. Martin and R. G. Herrmann, “Gene transfer from organelles to the nucleus: how much, what happens, and why?” Plant Physiology, vol. 118, no. 1, pp. 9–17, 1998.

[9] H. L. Race, R. G. Herrmann, and W. Martin, “Why have organelles retained genomes?” Trends in Genetics, vol. 15, no. 9, pp. 364–370, 1999.

[10] T. Wakasugi, T. Tsudzuki, and M. Sugiura, “The genomics of land plant chloroplasts: gene content and alteration of genomic information by RNA editing,” Photosynthesis Research, vol. 70, no. 1, pp. 107–118, 2001.

[11] Y.-K. Kim, C.-W. Park, and K.-J. Kim, “Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications,” Molecules and Cells, vol. 27, no. 3, pp. 365–381, 2009.

[12] A. Khan, I. A. Khan, H. Asif, and M. K. Azim, “Current trends in chloroplast genome research,” African Journal of Biotechnology, vol. 9, no. 24, pp. 3494–3500, 2010.

[13] H. Shimada and M. Sugiura, “Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes,” Nucleic Acids Research, vol. 19, no. 5, pp. 983–995, 1991.

[14] M. Sugiura, “The chloroplast genome,” Plant Molecular Biology, vol. 19, no. 1, pp. 149–168, 1992.

[15] M. E. Cosner, R. K. Jansen, J. D. Palmer, and S. R. Downie, “The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families,” Current Genetics, vol. 31, no. 5, pp. 419–429, 1997.

[16] L. Gao, X. Yi, Y.-X. Yang, Y.-J. Su, and T. Wang, “Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes,” BMC Evolutionary Biology, vol. 9, no. 1, article 130, 2009.


[17] T. Kato, T. Kaneko, S. Sato, Y. Nakamura, and S. Tabata, “Complete structure of the chloroplast genome of a legume, Lotus japonicus,” DNA Research, vol. 7, no. 6, pp. 323–330, 2000.

[18] B. Goffinet, N. J. Wickett, O. Werner, R. M. Ros, A. J. Shaw, and C. J. Cox, “Distribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta),” Annals of Botany, vol. 99, no. 4, pp. 747–753, 2007.

[19] J. D. Palmer, J. M. Nugent, and L. A. Herbon, “Unusual structure of geranium chloroplast DNA: a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families,” Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 3, pp. 769–773, 1987.

[20] S. Greiner, X. Wang, U. Rauwolf et al., “The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution,” Nucleic Acids Research, vol. 36, no. 7, pp. 2366–2378, 2008.

[21] J. J. Doyle, J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson, “Chloroplast DNA inversions and the origin of the grass family (Poaceae),” Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 16, pp. 7722–7726, 1992.

[22] F. A. Michelangeli, J. I. Davis, and D. W. Stevenson, “Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes,” The American Journal of Botany, vol. 90, no. 1, pp. 93–106, 2003.

[23] E. Bortiri, D. Coleman-Derr, G. R. Lazo, O. D. Anderson, and Y. Q. Gu, “The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes,” BMC Research Notes, vol. 1, article 61, 2008.

[24] C. H. Leseberg and M. R. Duvall, “The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals,” Journal of Molecular Evolution, vol. 69, no. 4, pp. 311–318, 2009.

[25] H. J. Chung, J. D. Jung, H. W. Park et al., “The complete chloroplast genome sequences of Solanum tuberosum and comparative analysis with Solanaceae species identified the presence of a 241-bp deletion in cultivated potato chloroplast DNA sequence,” Plant Cell Reports, vol. 25, no. 12, pp. 1369–1379, 2006.

[26] H. Daniell, S.-B. Lee, J. Grevich et al., “Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes,” Theoretical and Applied Genetics, vol. 112, no. 8, pp. 1503–1518, 2006.

[27] S. Kahlau, S. Aspinall, J. C. Gray, and R. Bock, “Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes,” Journal of Molecular Evolution, vol. 63, no. 2, pp. 194–207, 2006.

[28] M. Yukawa, T. Tsudzuki, and M. Sugiura, “The chloroplast genome of Nicotiana sylvestris and Nicotiana tomentosiformis: complete sequencing confirms that the Nicotiana sylvestris progenitor is the maternal genome donor of Nicotiana tabacum,” Molecular Genetics and Genomics, vol. 275, no. 4, pp. 367–373, 2006.

[29] Y. D. Jo, J. Park, J. Kim et al., “Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome,” Plant Cell Reports, vol. 30, no. 2, pp. 217–229, 2011.

[30] C. Schmitz-Linneweber, R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier, “The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation,” Molecular Biology and Evolution, vol. 19, no. 9, pp. 1602–1612, 2002.

[31] M. Kunnimalaiyaan and B. L. Nielsen, “Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA,” Nucleic Acids Research, vol. 25, no. 18, pp. 3681–3686, 1997.

[32] G. Thyssen, Z. Svab, and P. Maliga, “Cell-to-cell movement of plastids in plants,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 7, pp. 2439–2443, 2012.

[33] D. Gargano, A. Vezzi, N. Scotti et al., “The complete nucleotide sequence genome of potato (Solanum tuberosum cv Desiree) chloroplast DNA,” in Book of Abstracts of the 2nd Solanaceae Genome Workshop, p. 107, 2005.

[34] K.-J. Kim and H.-L. Lee, “Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants,” DNA Research, vol. 11, no. 4, pp. 247–261, 2004.

[35] D. A. Steane, “Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae),” DNA Research, vol. 12, no. 3, pp. 215–220, 2005.

[36] J. D. Palmer, “Cell culture and somatic cell genetics of plants,” in The Molecular Biology of Plastids, R. G. Hermann, Ed., pp. 5–53, Springer, Vienna, Austria, 1991.

[37] M. Sugiura, K. Shinozaki, M. Tanaka et al., “Split genes and cis/trans splicing in tobacco chloroplasts,” in Plant Molecular Biology, D. von Wettstein and N. H. Chua, Eds., pp. 65–76, Plenum Press, New York, NY, USA, 1987.

[38] A. Vera and M. Sugiura, “A novel RNA gene in the tobacco plastid genome: its possible role in the maturation of 16S rRNA,” The EMBO Journal, vol. 13, no. 9, pp. 2211–2217, 1994.

[39] M. Sugita, Z. Svab, P. Maliga, and M. Sugiura, “Targeted deletion of sprA from the tobacco plastid genome indicates that the encoded small RNA is not essential for pre-16S rRNA maturation in plastids,” Molecular and General Genetics, vol. 257, no. 1, pp. 23–27, 1997.

[40] R. Lahaye, M. van der Bank, D. Bogarin et al., “DNA barcoding the floras of biodiversity hotspots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 8, pp. 2923–2928, 2008.

[41] P. Taberlet, L. Gielly, G. Pautou, and J. Bouvet, “Universal primers for amplification of three non-coding regions of chloroplast DNA,” Plant Molecular Biology, vol. 17, no. 5, pp. 1105–1109, 1991.

[42] X. Guo, S. Castillo-Ramírez, V. Gonzalez et al., “Rapid evolutionary change of common bean (Phaseolus vulgaris L.) plastome and the genomic diversification of legume chloroplasts,” BMC Genomics, vol. 8, article 228, 2007.

[43] R. K. Jansen, C. Kaittanis, C. Saski et al., “Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids,” BMC Evolutionary Biology, vol. 6, no. 1, article 32, 2006.

[44] M. J. Moore, P. S. Soltis, C. D. Bell, J. G. Burleigh, and D. E. Soltis, “Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4623–4628, 2010.

[45] R. K. Jansen, Z. Cai, L. A. Raubeson et al., “Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 49, pp. 19369–19374, 2007.

[46] L. Bohs and R. G. Olmstead, “Phylogenetic relationships in Solanum (Solanaceae) based on ndhF sequences,” Systematic Botany, vol. 22, no. 1, pp. 5–17, 1997.

[47] R. G. Olmstead, L. Bohs, H. A. Migid, E. Santiago-Valentin, V. F. Garcia, and S. M. Collier, “A molecular phylogeny of the Solanaceae,” Taxon, vol. 57, no. 4, pp. 1159–1181, 2008.



regions and the lowest in IR regions. Some earlier studies have also shown a similar distribution of AT content in the LSC, SSC, and IR regions, with the lowest AT content in the IR region and the highest AT content in the SSC region [2, 27, 34, 35]. The low AT content of IR regions reflects low AT content in the four ribosomal RNA genes in this region.
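The regional comparison above rests on a simple per-region calculation. The following is a minimal Python sketch of it, assuming each region is available as a plain nucleotide string; the function name `at_content` and the three short toy sequences are hypothetical and are not taken from the plastomes analysed here.

```python
def at_content(seq: str) -> float:
    """Return the percentage of A and T among the unambiguous bases of seq."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc  # ambiguous symbols such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * at / total


# Hypothetical toy sequences standing in for the three region types.
regions = {
    "LSC": "ATTATAGCATTA",
    "SSC": "ATTTTAAATTAG",
    "IR": "GCGCATGCCGAT",
}

for name, seq in regions.items():
    print(f"{name}: {at_content(seq):.1f}% AT")
```

Ignoring ambiguous bases in the denominator is a design choice of this sketch; the original analysis may have counted them differently.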

3.2. Gene Content of Solanaceous Chloroplast Genomes. The genes present in different regions of the plastid genomes are highly conserved except for several open reading frames [6, 26, 36]. There are typically 111 genes, 5 hypothetical chloroplast reading frames (ycfs), and a few open reading frames (orfs). Some of our unique findings are discussed below.

(i) The trnP-GGG gene, which codes for a proline tRNA, was annotated only in D. stramonium, whereas its alternative, trnP-UGG, was annotated in all species, including D. stramonium (NC_018117.1, Li et al. (unpublished)). We mined all the species for a similar sequence by BLAST search, but no similar sequence was found in any other species. A region was annotated as a trnH gene only in C. annuum; in all other species this region was reported to be part of the ycf2 gene, as it also is in C. annuum. These observations indicate the presence of a duplicate copy of the trnH gene sequence in C. annuum and of two alternative tRNA genes coding for proline in D. stramonium. However, no other evidence was found in databases for this particular region coding for trnH.

(ii) An rps19 pseudogene was reported in three species, namely, N. tomentosiformis, S. bulbocastanum, and S. tuberosum. All other species were mined for a similar pseudogene using the BLAST pairwise algorithm, which confirmed the presence of the rps19 pseudogene in three more species, namely, A. belladonna, C. annuum, and D. stramonium. The presence of the pseudogene may be attributed to the expansion of IRB into the LSC region. In three species, namely, N. sylvestris, N. tabacum, and N. undulata, the rps19 pseudogene was found to be absent.

(iii) infA, a pseudogene for all species except A. belladonna, D. stramonium, and S. lycopersicum, is a protein-coding gene for S. bulbocastanum. Homology search with the infA sequence from S. bulbocastanum against the plastomes of A. belladonna and D. stramonium revealed an identical sequence in both species.

(iv) The sprA gene has been annotated for N. sylvestris, N. tomentosiformis, S. lycopersicum, and S. tuberosum. Identical orthologous gene sequences were found in A. belladonna, C. annuum, D. stramonium, N. tabacum, N. undulata, and S. bulbocastanum using BLAST search.
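Several of the findings above come from mining unannotated plastomes for a gene sequence annotated elsewhere. The sketch below is a deliberately simplified stand-in for that step, not BLAST itself: it reports only exact copies of a query on either strand, whereas BLAST also tolerates mismatches and gaps. The function names and the toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_exact_hits(query: str, genome: str):
    """Return (strand, 0-based start) for every exact copy of query in genome."""
    query, genome = query.upper(), genome.upper()
    hits = []
    for strand, probe in (("+", query), ("-", reverse_complement(query))):
        start = genome.find(probe)
        while start != -1:
            hits.append((strand, start))
            start = genome.find(probe, start + 1)  # allow overlapping hits
    return hits


# Toy example: one copy on each strand of a made-up "plastome" fragment.
toy_genome = "CCATGAAAGGTTTCATCC"
print(find_exact_hits("ATGAAA", toy_genome))
```

A query that equals its own reverse complement would be reported once per strand at the same position; a real search would also need to score near-identical matches, which is what BLAST provides.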

3.3. Split Genes. A total of eighteen split genes have been reported. The sizes of exons and introns for these genes in all the solanaceous species studied are summarized in Table 2. The rps12 gene is divided such that its 5′ end exon is located in the LSC region, whereas the second and third exons are located in the IR region. Maturation of the RNA transcript requires a trans-splicing mechanism between exon 1 and exon 2 [34, 37]. Among the eighteen intron-containing genes, ycf3, clpP, and rps12 contain two introns, whereas the other 15 genes contain only one intron. As per Kim and Lee [34], the trnL-UAA gene intron belongs to the self-splicing group I introns, whereas all other introns belong to group II. Generally, the size of exons was conserved and variability was observed in the intron regions; however, ndhB was found to be highly conserved for both exons and introns.
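The contrast between conserved exons and variable introns can be checked directly from the lengths in Table 2. The sketch below is illustrative only: it copies the ndhB and trnA-UGC rows of Table 2 and reports the spread (largest minus smallest length) across the ten species. The helper name `length_spread` is hypothetical.

```python
# Lengths in bp copied from Table 2, in the species order
# ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, STU.
table2_rows = {
    ("ndhB", "Exon I"): [777] * 10,
    ("ndhB", "Intron I"): [679] * 10,
    ("ndhB", "Exon II"): [756] * 10,
    ("trnA-UGC", "Exon I"): [38] * 10,
    ("trnA-UGC", "Intron I"): [681, 811, 811, 709, 709, 709, 709, 811, 811, 811],
    ("trnA-UGC", "Exon II"): [35] * 10,
}


def length_spread(lengths):
    """Difference between the longest and shortest copy of one exon or intron."""
    return max(lengths) - min(lengths)


for (gene, part), lengths in table2_rows.items():
    print(f"{gene:9s} {part:9s} spread = {length_spread(lengths)} bp")
```

For these rows the spread is zero for every ndhB element and for the trnA-UGC exons, while the trnA-UGC intron differs by up to 130 bp, which agrees with the upper end of the 102–130 bp InDel listed for trnA-UGC in Table 3.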

3.4. Pairwise Comparison of Plastid Genes of Solanaceae and InDel Analyses. Pairwise comparison of the nucleotide sequences of individual genes (45 combinations) for 116 genes was also performed to classify genes based on percent identity. Supplementary Table 1 (Supplementary Material available online at http://dx.doi.org/10.1155/2014/424873) shows the grouping of genes into different clusters based on percent identity in pairwise comparison. Genes which showed 100% identity in a comparison were considered as highly conserved, and genes showing less than 95% identity at least once in the comparison were considered as highly divergent. These highly divergent genes were further explored at the nucleotide as well as the protein level to probe the variations in detail. A total of 11 highly divergent genes were found, whereas the number of highly conserved genes varied from 26 (for the species pair N. tomentosiformis and S. lycopersicum) to 107 (for the species pair N. sylvestris and N. tabacum). Most of the tRNA genes were found to be highly conserved. Genes accD, cemA, clpP, ndhA, rpl32, rpl36, rps16, sprA, trnA-UGC, trnL-UAA, and ycf1 were found to be highly diverged.
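The classification rule in this subsection can be sketched in a few lines of Python. The sketch assumes that the sequences of one gene have already been aligned to equal length and that a gap opposite a base counts as a mismatch; how identity was actually computed in this study is not stated here, so both points are assumptions. All function names and the toy alignment are hypothetical.

```python
from itertools import combinations

SPECIES = ["ABE", "CAN", "DST", "NSY", "NTA", "NTO", "NUN", "SBU", "SLY", "STU"]
PAIRS = list(combinations(SPECIES, 2))  # 10 choose 2 = 45 species pairs


def percent_identity(a: str, b: str) -> float:
    """Percent of identical columns; columns that are gaps in both are skipped."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def pairwise_identities(aligned: dict) -> dict:
    """Map each species pair to the percent identity of one aligned gene."""
    return {(s, t): percent_identity(aligned[s], aligned[t])
            for s, t in combinations(sorted(aligned), 2)}


def is_highly_divergent(identities: dict, threshold: float = 95.0) -> bool:
    """True if the gene falls below the threshold in at least one pair."""
    return any(value < threshold for value in identities.values())


def conserved_pairs(identities: dict) -> list:
    """Species pairs in which the gene is 100% identical."""
    return [pair for pair, value in identities.items() if value == 100.0]


# Toy alignment of one made-up gene in three species.
toy = {
    "NSY": "ATGGCTAAAGGTCTTACCGA",
    "NTA": "ATGGCTAAAGGTCTTACCGA",
    "CAN": "ATGGCTAAA---CTTACCGT",
}
ids = pairwise_identities(toy)
print(len(PAIRS), is_highly_divergent(ids), conserved_pairs(ids))
```

Counting, for each species pair, the genes returned by `conserved_pairs` would give per-pair totals of the kind reported above (26 to 107), while `is_highly_divergent` applies the per-gene 95% rule.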

Tables 3 and 4 summarize the InDels observed in nucleotide and amino acid sequences, respectively. Partial multiple sequence alignments of 9 genes and 5 proteins are shown in Figures 1 and 2, respectively. The longest insertion, of 141 bp, was observed in the accD gene sequence of C. annuum. Since the genes clpP, ndhA, rps16, and trnL-UAA contain introns, it was important to examine whether these InDels were present in exon or intron regions. It was found that all the InDels reported in ndhA and trnL-UAA were present in introns, whereas in the case of clpP, InDel 24 was located in an exon of the gene. Similarly, the first and last InDels of the rps16 gene were present in exons of the gene. Keeping in view the observations on the number and length of InDels in the nucleotide and protein sequences of the genes under consideration, the variation for individual genes is discussed below.
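Deciding whether an InDel lies in an exon or an intron amounts to locating gap runs in an aligned sequence and comparing their positions with the exon boundaries. A minimal sketch follows, assuming exon boundaries are given as half-open ranges of alignment columns; the coordinates and the toy aligned sequence are hypothetical, and an InDel is classified by its first column only.

```python
import re


def gap_runs(aligned_seq: str):
    """Return (start column, length) for each run of '-' in an aligned sequence."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"-+", aligned_seq)]


def region_of(column: int, exon_ranges) -> str:
    """'exon' if the column lies in any half-open [start, end) range, else 'intron'."""
    return "exon" if any(s <= column < e for s, e in exon_ranges) else "intron"


# Hypothetical split gene: exons occupy alignment columns 0-9 and 18-24.
exons = [(0, 10), (18, 25)]
aligned = "ATGGC---TTAAGG------CCTGA"

for start, length in gap_runs(aligned):
    print(f"InDel of {length} bp at column {start}: {region_of(start, exons)}")
```

An InDel that spans an exon-intron boundary would need a more careful rule than the start-column test used here.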

(1) accD. A total of four InDels were observed in the accD gene, as depicted in Figure 1. Interestingly, an insertion of 24 bp was present in all Nicotiana species and D. stramonium, followed by an insertion of 9 bp in all Solanum species, indicating stronger sequence conservation at the genus level. These insertions were also reported by Chung et al. [25]. A 141 bp insertion was observed specifically in C. annuum, which has also been reported by Jo et al. [29] and confirmed by RT-PCR. Similarly, a species-specific deletion of 6 bp was found in D. stramonium. All these InDels were also reflected in the corresponding


Table 2: The lengths of introns and exons for the split genes of ten solanaceous species.

Gene (region)    Exon/intron  ABE   CAN   DST   NSY   NTA   NTO   NUN   SBU   SLY   STU
trnK-UUU (LSC)   Exon I       37    37    37    37    37    37    37    37    37    37
                 Intron I     2519  2500  2506  2526  2526  2526  2521  2501  2514  2512
                 Exon II      36    35    35    35    35    35    35    35    35    35
rps16 (LSC)      Exon I       40    40    40    40    40    40    40    40    40    40
                 Intron I     822   865   866   860   860   860   859   855   864   855
                 Exon II      227   227   227   218   218   218   218   227   227   227
trnG-UCC (LSC)   Exon I       23    23    23    23    23    23    23    23    23    23
                 Intron I     692   692   694   692   692   690   691   701   695   692
                 Exon II      48    48    48    48    48    48    48    37    48    48
atpF (LSC)       Exon I       145   145   145   145   145   145   145   144   144   145
                 Intron I     715   693   700   695   695   692   692   693   686   693
                 Exon II      410   410   410   410   410   410   410   411   411   410
rpoC1 (LSC)      Exon I       432   453   453   453   453   432   453   453   453   453
                 Intron I     737   742   737   737   737   709   733   737   737   737
                 Exon II      1614  1614  1614  1614  1614  1614  1623  1614  1614  1614
ycf3 (LSC)       Exon I       124   124   124   124   124   124   124   124   124   124
                 Intron I     739   742   740   739   738   731   735   730   729   727
                 Exon II      230   230   230   230   230   230   230   230   230   230
                 Intron II    763   744   753   783   783   779   781   750   750   750
                 Exon III     153   153   159   153   153   153   153   153   153   153
trnL-UAA (LSC)   Exon I       35    35    35    35    35    35    35    37    35    35
                 Intron I     497   426   501   503   503   497   498   502   497   497
                 Exon II      50    50    50    50    50    50    50    50    50    50
trnV-UAC (LSC)   Exon I       38    38    38    38    38    38    38    38    38    38
                 Intron I     572   575   569   571   571   572   573   569   571   571
                 Exon II      35    35    37    35    35    35    35    37    35    35
rps12*           Exon I       114   114   114   114   114   114   114   114   114   114
                 Intron I     -     -     -     -     -     -     -     -     -     -
                 Exon II      232   232   232   232   232   232   232   232   232   232
                 Intron II    535   536   536   536   536   536   536   536   536   536
                 Exon III     26    26    26    26    26    26    26    26    26    26
clpP (LSC)       Exon I       71    71    71    71    71    71    71    71    71    71
                 Intron I     799   811   792   807   807   789   789   789   798   789
                 Exon II      292   292   292   292   292   292   292   292   292   292
                 Intron II    622   626   624   637   637   634   631   625   617   620
                 Exon III     228   228   234   228   228   228   228   234   258   234
petB (LSC)       Exon I       6     6     6     6     6     6     6     6     6     6
                 Intron I     759   755   746   753   753   753   753   747   747   747
                 Exon II      642   642   642   642   642   642   642   642   642   642
petD (LSC)       Exon I       8     8     9     8     8     8     8     6     8     8
                 Intron I     742   742   748   742   742   742   742   739   738   739
                 Exon II      475   475   474   475   475   475   475   477   475   475
rpl16 (LSC)      Exon I       9     9     9     9     9     9     9     9     9     9
                 Intron I     1019  1026  1025  1020  1020  1021  1020  1014  1018  1014
                 Exon II      396   396   396   396   396   396   396   396   396   396
rpl2 (IR)        Exon I       391   391   393   391   391   391   391   390   391   391
                 Intron I     664   665   669   666   666   666   666   666   666   666
                 Exon II      434   434   429   434   434   434   434   435   434   434
ndhB (IR)        Exon I       777   777   777   777   777   777   777   777   777   777
                 Intron I     679   679   679   679   679   679   679   679   679   679
                 Exon II      756   756   756   756   756   756   756   756   756   756
trnI-GAU (IR)    Exon I       37    37    42    37    37    37    37    42    37    37
                 Intron I     717   722   717   707   707   716   716   717   722   722
                 Exon II      34    35    35    35    35    35    35    35    35    35
trnA-UGC (IR)    Exon I       38    38    38    38    38    38    38    38    38    38
                 Intron I     681   811   811   709   709   709   709   811   811   811
                 Exon II      35    35    35    35    35    35    35    35    35    35
ndhA (SSC)       Exon I       553   553   552   553   553   553   553   552   553   553
                 Intron I     1150  1157  1154  1148  1148  1149  1148  1158  1133  1158
                 Exon II      539   539   537   539   539   539   539   540   539   539

*rps12 is a divided gene: the 3′-rps12 is located in the IR region, while the 5′-rps12 is located in the LSC region.
ABE: Atropa belladonna; CAN: Capsicum annuum; DST: Datura stramonium; NSY: Nicotiana sylvestris; NTA: Nicotiana tabacum; NTO: Nicotiana tomentosiformis; NUN: Nicotiana undulata; SBU: Solanum bulbocastanum; SLY: Solanum lycopersicum; STU: Solanum tuberosum.

Table 3: InDels in nucleotide sequences of 9 genes of ten solanaceous plastid genomes.

S number  Gene^a,b,c  Total number of InDels  InDel length (bp)
1         accD^a      4                       24, 9, 141, 6
2         clpP^a      24                      8(I), 14(I), 13(I), 7(I), 1(I), 2–3(I), 7(I), 1–7(I), 3(I), 2(I), 3(I), 1–7(I), 1–3(I), 1(I), 1(I), 1(I), 1–5(I), 4–7(I), 1(I), 9(I), 1–2(I), 3(I), 5(I), 24–30
3         ndhA^b      14                      9(I), 5–6(I), 3(I), 1(I), 9(I), 3(I), 4(I), 1–4(I), 1–2(I), 1–23(I), 1–2(I), 2(I), 1(I), 3(I)
4         rpl32^b     2                       2–3, 4
5         rps16^a     11                      1–38, 9(I), 1(I), 1(I), 5(I), 1–2(I), 5(I), 4(I), 6(I), 1(I), 9
6         sprA^b      2                       109, 7
7         trnA-UGC^c  1                       102–130
8         trnL-UAA^a  4                       1, 6, 71, 4
9         ycf1^b      31                      3, 18, 18, 21, 6, 6, 48, 9, 6, 6, 42, 3, 6, 30, 3, 15, 12–39, 18, 6, 9–36, 6, 6, 6, 9, 9, 12, 6, 6, 6, 57, 6

a,b,c: Location in different regions: a, LSC; b, SSC; and c, IR. (I): InDels present in introns.

Table 4: InDels in amino acid sequences of 5 proteins of ten solanaceous plastid genomes.

S number  Protein  Total number of InDels  InDel length (amino acids)
1         accD     4                       8, 3, 47, 2
2         clpP     2                       2, 10
3         rpl32    1                       1–2
4         rps16    1                       3
5         ycf1     29                      1, 6, 6, 7, 2, 2, 7, 3, 2, 2, 14, 1–10, 1, 5, 4–13, 6, 2, 3–12, 2, 2, 2, 3, 3, 4, 2, 2, 2, 19, 2
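The accD rows of Tables 3 and 4 are related by simple arithmetic: an in-frame InDel of n bp in a coding region adds or removes n/3 residues. The short Python check below illustrates this with the accD values from the two tables, assuming the protein InDel lengths are counted in amino acid residues; the helper name `residues_affected` is hypothetical.

```python
accd_indels_bp = [24, 9, 141, 6]  # accD row of Table 3 (nucleotides)
accd_indels_aa = [8, 3, 47, 2]    # accD row of Table 4 (protein)


def residues_affected(length_bp: int) -> int:
    """Number of residues gained or lost by an in-frame InDel of length_bp."""
    if length_bp % 3 != 0:
        raise ValueError("length is not a multiple of 3 (frameshift)")
    return length_bp // 3


converted = [residues_affected(n) for n in accd_indels_bp]
print(converted, converted == accd_indels_aa)
```

The same conversion links the 9 bp exonic InDel of rps16 in Table 3 to the single 3-residue InDel listed for rps16 in Table 4.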


[Figure 1, continued over two pages: partial multiple sequence alignments marking the numbered InDels in sprA, accD, ndhA, rps16, rpl32, clpP, and tRNA-Leu(UAA), with one row per species (ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU). Alignment blocks omitted.]

AAAG---------------------

AAAGGGTGGAAAGTGAG

GTAGAAATAGAAACTGGTAGAA------ACTGGTAGAAATAGAAACTGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACTGGTAGAAATTGAAACTGGTAGAAATAGAAACTG

ycf1

InDel 1 InDel 2 InDel 3 InDel 4 InDel 5

ABECANDSTNSYNTANTONUNSBUSLYSTU

CTTCTCCTTCCCTCTTCTCCTTCCCTCTTC------CCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCT

TTTT------------------------------------------------TCGG TTTTCAAGAGGGATCCACTGAAGAAGATCCTTATCCTTCTCCTTCCCCTTTTTCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG

AACAATATTAATACTAGAAAAATATTAATACTAGAACAATATTAATACTAGAATAATATTAATACTAGAATAATATTAATACTAGAACA---------CTAGAACAATATTAATACTAAAACAATATTAATACTAGAACAATATTAATACTAGAACAATATTAATACTAG

TAA------CACTAATAATAACACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CAC

AGA------CCTAGA------CCTAAA------ATTAGA------CCTAGA------CCTAGA------ACTAGA------CCTAGA------CCTAGAACAAGACCTAGA------CCT

InDel 6 InDel 7 InDel 8 InDel 9 InDel 10

ABECANDSTNSYNTANTONUNSBUSLYSTU

TAAA------------------------------------------AGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGTAAGAACG TCAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGCG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGACCAGGGAAGAGTG TAAATGTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGCG TAAATTTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGCG TAAATTTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGTG

CTGA---G CTGATGAG CTGC---G CTGA---T CTGA---T CTAA---T CTGA---T CTGA---G CTGA---G CTGA---G

GGTCAAAGAGACCAACGAGCCCGACGATACCAAAGATACCAAAGATACCAAAGATACCAAAGAG------GAG------GAG------GA

GA------------------------------GGAAGA------------------------------GGAAGA------------------------------GGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGAGACTGATACCGCAGATCAAGATCAAACAAAGGAAGA------------------------------AGAAGA------------------------------GGAAGA------------------------------GGAA

AAGCAAAAAGCAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAG

CTCT---------------AAAACTTTTATAACTATAACTCGAAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAA

InDel 11 InDel 13 InDel 14 InDel 15 InDel 16

ABECANDSTNSYNTANTONUNSBUSLYSTU

CGTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAA---------------------------------------ATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCC CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAAGTCCTAATAAAACAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAACTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAACTCGAATCG

TCAAGAAAAAATTAATAATAAAAAAA TCAAGAAAAAATTAATAATAAAAACA TCAAGAAAAAATAAAAAAAAAAAAAA TCAAGAAAAAATTAATAATCAAAAAA TCAAGAAAAAATTAATAATCAAAAAA TCAA------------------AAAA TCAAGAAAAAATTAATAATCAAAAAA TCAAGAAAAAATTACTAATAAAAAAA TCAAGAAAAAATTAATAATAAAAAAA TCAAGAAAAAATTACTAATAAAAAAA

TTAT------TTCGACTTTATCTTTATTTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTATCTTTATTTCGACTTTATGTTTATTTCGACTTTATCTTTATTTCGACT

CTAAAAAAGTCACTTTATAATATT---------AGTAATAAAAA ATAAAAAAGTCACTTTATAATATTTATAATATTAGTAAGAAAAA ATAA------------------------------------AAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGCCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA

InDel 17 InDel 18 InDel 19 InDel 20

ABECANDSTNSYNTANTONUNSBUSLYSTU

AATATAAATTTAAAATAGAAACTTAAAATCGAAATTTAAAAT------TTAAAAT------TTAAAAT------TTAAAAT------TTAAAATAGAAACTTAAAATAGAAACTTAAAATAGAAACTTAA

CTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTTGAAGTTCAAGCTTT------CAAG

ATAT------ATAGACATCTACATATAGAGAT------AGAGATAT------ATAGATAT------ATAGATAT------ATAGATAT------ATAGAGAT------ATAGAGAT------AGAGAGAT------ATAG

ATAT---------TTTG ATATAGAAAATATTTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG

GAGAAAATTGATAAAAAAT---------AAAAGAGAAAATTGACAAAAGATAAAATTGATAAAAGATAAAATTGATAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAA

AACATTAATAAAAACCAAAATATCAATAAAAACAACAAC------------CAAAACATCAATAAAAACCAAAACATCAATAAAAACCAAAACA------------AAAACATCAATAAAAACAAAAAC------------CAAAAC------------CAAAAC------------CAA

AAACAAAACTT

AACTT

------CTT

AAAAAAAACTTAAACAAACCTTAAACAAAACTTAAACAAAACTTAAACAAAACTTAAACAAAAACAAAACTTAAAAACAAAACTT

InDel 21 InDel 22 InDel 23 InDel 24 InDel 25 InDel 26 InDel 27

ABECANDSTNSYNTANTONUNSBUSLYSTU

TAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAAAATAAAGAATTAAA------GAAT

TCAAGGAGAAAGAGTCAAGGAAAAAGAGTCAAAGAGAAAGAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAA------AGAGTCAA------AGAGTCAA------AGAG

AAAGAATTTTGATGAATCCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAA---------------------------------------------------------TCAT AAAGAATTTTGATGAATTCATTCTGCAACCGCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAATCATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT

TAAA------GAGATAAA------GATATAATTATAACGAGATAAA------GAGATAAA------GAGATAAA------GAGATAAA------GAGATAAA------GATATAAA------GATATAAA------GATA

InDel 28 InDel 29 InDel 30 InDel 31

ABECANDSTNSYNTANTONUNSBUSLYSTU

--AGATAATAATTGCATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATAGAACTCTCTTCGCTTTTCTGATTGATATATAAAATA-------------------------------------------------------------------------------------------------------------ATA--AGATAATAATTGAATAATTTAACGATTTACCTTTCATATTTGATATTGATTCGCTCACCAATCAATACGTAATGGAACTCTATTCGCTTTTCTGATTGATATAGAAAATA--AGATAATAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--AGATAATAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA----ATC-TAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--------TAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--------TAATTGAATAATTGAACGATTTACCTTTCCTATTTGATATTGATTAGCTCACCAATCCATATGTAACGGAACTCTATTCGCTTTTCTGATTGATATATAAAATATTCTTCAATTATTATCTAATTGAACAATTTCCCTTTCCTCTTTGATATTGATTAGCTCACCAATCCATATATAACATAACTCTATTCGCTTTTCTGATTGATATAGAAAATA--AGATAATAATTGAATAATTGAACGATTTACCTTTCCTATTTGATATTGATTAGCTCACCAATCCATATGTAACGGAACTCTATTCGCTTTTCTGATTGATATATAAAATA

InDel 1

ABECANDSTNSYNTANTONUNSBUSLYSTU

ATT----------------------------------------------------------------------------------------------------------------------------------ACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAAGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAAGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACC

tRNA-Ala(UGC)

InDel 1

ABECANDSTNSYNTANTONUNSBUSLYSTU

InDel 12

AAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGAGGAAAGTGAGGAAGAAAGAGAT

AGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGAT

GAAGAAAAAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGAT

Figure 1 Partial multiple sequence alignment of accD clpP ndhA rpl32 rps16 sprA tRNA-Ala (UGC) tRNA-Leu(UAA) and ycf1 genesequences of ten solanaceous species showing location of InDels indicated by hyphens


Figure 2: Partial multiple sequence alignment of amino acid sequences of genes, namely, accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala(UGC), tRNA-Leu(UAA), and ycf1, of ten solanaceous species (row labels: ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU), showing the location of InDels, indicated by hyphens.

protein sequences, as shown in Figure 2. The accD gene has been reported to be one of the most variable plastid genes and is probably under diversifying selection [26].

(2) clpP. In the clpP gene, InDels were found in both intron and exon regions. The InDels in the exon regions had two major consequences. An insertion of 6 bp in S. bulbocastanum and S. tuberosum and of 30 bp in S. lycopersicum at the 3′ end (exon 3) of the clpP gene shifted the stop codon downstream by 6, 6, and 30 bp, respectively, compared to the other species of the Solanaceae family, increasing the length of the coding sequence and of the protein product (Figures 1 and 2). An interesting feature was observed as InDel 1 in the protein sequence, corresponding to insertion of a repeat of “I” amino acids in D. stramonium, which makes the exon 3 region longer by 6 bp. This region, however, corresponds to the end of intron 2 of the clpP gene in all other species. Since the D. stramonium chloroplast genome has been sequenced only recently, this observation needs to be experimentally validated.
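The InDel bookkeeping described for clpP can be illustrated with a minimal sketch. It assumes a gapped alignment is held as a dictionary of equal-length strings; the taxon labels and sequences below are hypothetical toy data, not the plastome alignment, and the function names are introduced here for illustration only. Gap runs whose length is a multiple of three leave the reading frame intact when they fall inside an exon.

```python
import re

def gap_runs(aligned):
    """Return (start, length) for each run of '-' in one aligned sequence."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(r"-+", aligned)]

def describe_indels(alignment):
    """Map each taxon to its gap runs, flagging runs that preserve the reading frame."""
    report = {}
    for taxon, seq in alignment.items():
        report[taxon] = [
            {"start": start, "length": length, "in_frame": length % 3 == 0}
            for start, length in gap_runs(seq)
        ]
    return report

# Hypothetical toy alignment (labels and sequences are illustrative only).
toy = {
    "sp1": "ATGAAA------TGA",  # 6-column gap: length divisible by 3
    "sp2": "ATGAAAATCAAATGA",  # no gaps
    "sp3": "ATGA-AATCAAATGA",  # 1-column gap: would disturb the frame in an exon
}
report = describe_indels(toy)
for taxon, runs in report.items():
    print(taxon, runs)
```

Coordinates here are alignment columns; mapping them back to intron or exon positions would additionally require the gene annotation, which this sketch does not model.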

(3) ndhA. All the InDels found in ndhA were present in introns, while the protein-coding regions (exons) were highly conserved. This indicates high diversifying selection on the intronic region of this gene. Out of the total of 14 InDels, most were observed with respect to C. annuum (InDels 1, 2, 5, 7, and 10). InDel 10 was observed to be shared by C. annuum and S. lycopersicum in full and by C. annuum and A. belladonna in part.

(4) rpl32. In rpl32, an insertion of 1 bp in D. stramonium and of 3 bp in C. annuum was found in the 3′ region of the gene, while a deletion of 4 bp was observed in D. stramonium. The 3 bp insertion in C. annuum only altered the length of the protein, making it longer by 1 amino acid. However, the small 1 bp insertion in D. stramonium proved to be a frameshift mutation, resulting in three changes in the amino acid sequence near the C-terminus. Moreover, the deletion of 4 bp at the 3′ end resulted in premature termination of protein synthesis. Together, the frameshift mutation and the 3′ end deletion reduced the length of the gene product by 1 amino acid. As the C-terminus of the amino acid chain is well conserved in all the other species, the effect of the above-mentioned variations needs to be validated experimentally.
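The contrast between an in-frame 3 bp insertion and a 1 bp frameshift can be sketched as follows. This is an illustration under stated assumptions: the coding sequence is a made-up toy, not rpl32, and the standard genetic code is used (its codon-to-amino-acid assignments match those commonly applied to plastid genes). The `translate` helper and variable names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(cds):
    """Translate from position 0 until the first stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical toy coding sequence (not taken from any plastome).
reference = "ATGAAACGTCTGGAATAA"
in_frame = reference[:6] + "GGT" + reference[6:]   # 3 bp insertion
frameshift = reference[:6] + "A" + reference[6:]   # 1 bp insertion

print(translate(reference))   # MKRLE
print(translate(in_frame))    # MKGRLE: one extra residue, downstream unchanged
print(translate(frameshift))  # MKTSGI: every residue after the insertion differs
```

In the toy frameshift the original stop codon is no longer read in frame, which is why a second change near the 3′ end (such as the 4 bp deletion reported above) can determine where translation finally terminates.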

(5) rps16. In rps16 also, InDels were observed in introns as well as exons. Five of the major insertions in the intron regions were species specific. Insertions of 38 bp (InDel 1), 9 bp (InDel 2), 5 bp (InDel 7), 4 bp (InDel 8), and 6 bp (InDel 9) were observed in A. belladonna, S. lycopersicum, D. stramonium, and C. annuum. A deletion of 5 bp was observed in all three Solanum species and C. annuum. A deletion of 9 bp was observed in all Nicotiana species, resulting in an amino acid change (P to S) and shortening of the protein by three amino acids in the C-terminal region. A similar deletion has also been observed by Kahlau et al. [27] and was suggested to be functionally neutral.

(6) sprA. The sprA gene has been reported as a stable noncoding RNA of unknown function. This gene has been suggested to influence 16S rRNA maturation [38, 39]. In many species, this gene seems to be present as a remnant and shows large variations in its 5′ region. The largest deletion, of 109 bp, was observed in C. annuum. The rest of this gene appears to be more conserved, with a deletion towards the 3′ end in all Nicotiana species and A. belladonna. The manner in which this gene functions and the consequences of the above-mentioned variations are yet to be investigated experimentally.

(7) trnA-UGC. In this particular gene, a long deletion of 102 bp was observed in all Nicotiana species. Interestingly, this deletion was further extended to 130 bp in both directions in A. belladonna. These deletions were found in the intron region and so are unlikely to have any negative impact on gene product function.

(8) trnL-UAA. The trend of variation in trnL-UAA was similar to that in ndhA, as all InDels were observed in introns. The longest species-specific deletion (InDel 3) was observed in C. annuum, whereas a short insertion of four nucleotides, a repeat of “T”, was observed specifically in D. stramonium. Another insertion, of 6 bp, was observed in two Nicotiana species, that is, N. sylvestris and N. tabacum.

(9) ycf1. Many InDels (3′ region) were found in the fastest evolving gene, that is, the ycf1 gene. Most of the InDels were found to be species specific. The maximum number of InDels (InDels 2, 3, 5, 7, 9, 16, 17, 19, 23, 24, 25, and 30) was observed in C. annuum, followed by D. stramonium (InDels 4, 6, 20, 26, and 31), N. tomentosiformis (InDels 8, 17, and 18), and S. lycopersicum (InDels 16, 22, and 28). Two genus-specific InDels (InDels 14 and 21) were observed in all four Nicotiana species. However, InDel 19 was also present in D. stramonium. Another genus-specific InDel (InDel 29) was observed in Solanum species. Most of the InDels altered the length of the gene product, with the maximum length of 1906 amino acids (aa) observed in C. annuum and the minimum of 1873 aa observed in D. stramonium. Among the Solanum species, the length of the protein (1887 aa) was conserved between S. bulbocastanum and S. tuberosum. However, S. lycopersicum had an amino acid sequence of 1891 aa, longer by 4 aa than those of the other two species of the same genus. Among the four Nicotiana species, the ycf1 gene product length was conserved among N. sylvestris, N. tabacum, and N. undulata, with protein lengths of 1901 aa, 1902 aa, and 1901 aa, respectively. However, N. tomentosiformis was observed to be the most variable member of the genus Nicotiana, with a protein length of 1892 aa.
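The ycf1 protein lengths quoted above can be tabulated per genus with a short sketch. The lengths are the values stated in the text (A. belladonna is omitted because its length is not given there); the helper name `spread_by_genus` is introduced here for illustration.

```python
# ycf1 protein lengths (aa) as stated in the text above.
ycf1_aa = {
    "Capsicum annuum": 1906,
    "Datura stramonium": 1873,
    "Solanum bulbocastanum": 1887,
    "Solanum tuberosum": 1887,
    "Solanum lycopersicum": 1891,
    "Nicotiana sylvestris": 1901,
    "Nicotiana tabacum": 1902,
    "Nicotiana undulata": 1901,
    "Nicotiana tomentosiformis": 1892,
}

def spread_by_genus(lengths):
    """Return {genus: (min, max, max - min)} for the given species lengths."""
    by_genus = {}
    for species, n in lengths.items():
        genus = species.split()[0]
        by_genus.setdefault(genus, []).append(n)
    return {g: (min(v), max(v), max(v) - min(v)) for g, v in by_genus.items()}

spread = spread_by_genus(ycf1_aa)
overall = max(ycf1_aa.values()) - min(ycf1_aa.values())

for genus, (lo, hi, diff) in spread.items():
    print(f"{genus}: {lo}-{hi} aa (spread {diff})")
print(f"overall spread: {overall} aa")
```

With these figures the spread within Solanum is 4 aa and within Nicotiana 10 aa, against 33 aa across all nine listed species.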

3.5. Phylogenetic Analysis of Solanaceous Plastomes. Evolutionary relationships between diverse plant species have been analyzed using several plastome markers, including matK and rbcL (genes) or trnH-psbA and trnL-trnF (intergenic regions), owing to sequence conservation among plant taxa blended with suitable variation [40, 41]. However, determination of phylogeny based on single gene sequences may be inaccurate [42]. Availability of complete chloroplast sequences for many species has made it possible to use many individual genes or concatenated gene sequences to deduce phylogenetic relationships among taxa [42–44].

Efforts have been made to carry out phylogenetic analysis of solanaceous species using complete plastome sequences by Moore et al. [44] and Jansen et al. [45]. Evolutionary positions of Capsicum and Datura in Solanaceae have been determined using a single or a few plastid genes [46, 47]. Recently, concatenated protein-coding gene sequences from completely sequenced plastomes were used to obtain reasonable phylogenetic relationships for solanaceous species [29]. In the present investigation, we used a similar approach to analyze the phylogenetic relationships of ten solanaceous species with completely sequenced plastomes. Individual multiple sequence alignments were concatenated for maximum likelihood phylogenetic tree generation. As depicted in Figure 3, taxa were divided into two clades with 100% bootstrap support from 500 replicates. The first clade consisted of four Nicotiana species, while the species of Solanum, Capsicum, Atropa, and Datura were included in the second clade. These results are in line with previous phylogenetic analyses using concatenated protein-coding gene sequences as well as phylogenetic analyses using plastid ndhF and trnL-F sequences [29, 47]. However, in an analysis of 13 ORFs of solanaceous plastomes, a different arrangement was shown, in which Atropa was separated from Solanum and grouped together with Nicotiana [25].
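The concatenation and resampling steps mentioned above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the study: the taxon labels and per-gene alignments are hypothetical toy data, the function names are introduced here, and the maximum likelihood tree inference itself is not shown (it would be run on each replicate by a dedicated phylogenetics program).

```python
import random

def concatenate(gene_alignments, taxa):
    """Join per-gene alignments into one supermatrix; absent taxa are padded with gaps."""
    parts = {t: [] for t in taxa}
    for aln in gene_alignments:
        width = len(next(iter(aln.values())))
        for t in taxa:
            parts[t].append(aln.get(t, "-" * width))
    return {t: "".join(p) for t, p in parts.items()}

def bootstrap_replicate(matrix, rng):
    """Resample alignment columns with replacement, keeping the same columns for every taxon."""
    n = len(next(iter(matrix.values())))
    cols = [rng.randrange(n) for _ in range(n)]
    return {t: "".join(seq[c] for c in cols) for t, seq in matrix.items()}

# Hypothetical toy data: two genes, three taxa, taxon C missing from the first gene.
genes = [
    {"A": "ATG-", "B": "ATGC"},
    {"A": "TTA", "B": "TTG", "C": "TTA"},
]
taxa = ["A", "B", "C"]

matrix = concatenate(genes, taxa)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_replicate(matrix, rng) for _ in range(500)]
print(matrix)
```

A clade's bootstrap support is then the fraction of replicate trees in which that clade appears; a value of 500 out of 500 corresponds to 100%.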

Figure 3: Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species (tip labels: CAR, DCA, NTO, NUN, NSY, NTA, ABE, DST, CAN, SLY, SBU, and STU).

4. Conclusions

The analyses of complete plastid genomes of ten solanaceous species revealed overall similarity in terms of gene content and organization. The sizes of the LSC, SSC, and IR regions were found to be somewhat conserved among species, but significant variation was found between genera. Most of the coding regions were well conserved. However, many features in a few genes were observed to be typical of a particular genus, or even species, and can be used as molecular markers to investigate genetic diversity and evolution. These typical variations can be further utilized to develop more sophisticated DNA barcoding based techniques. The ten solanaceous species were divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from the coding regions of the plastomes. The first clade consisted of four Nicotiana species and the second clade consisted of species of Solanum, Capsicum, Atropa, and Datura.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors are thankful to the University Grants Commission, New Delhi, for providing an MANF fellowship to Harpreet Kaur.

References

[1] M. G. Bausher, N. D. Singh, S.-B. Lee, R. K. Jansen, and H. Daniell, “The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var. ‘Ridge Pineapple’: organization and phylogenetic relationships to other angiosperms,” BMC Plant Biology, vol. 6, no. 1, article 21, 2006.

[2] Z. Y. Hu, W. Hua, S. M. Huang, and H. Z. Wang, “Complete chloroplast genome sequence of rapeseed (Brassica napus L.) and its evolutionary implications,” Genetic Resources and Crop Evolution, vol. 58, no. 6, pp. 875–887, 2011.

[3] J. D. Palmer, “Comparative organization of chloroplast genomes,” Annual Review of Genetics, vol. 19, no. 1, pp. 325–354, 1985.

[4] R. G. Olmstead and J. D. Palmer, “Chloroplast DNA systematics: a review of methods and data analysis,” The American Journal of Botany, vol. 81, no. 9, pp. 1205–1224, 1994.

[5] R. A. Bungard, “Photosynthetic evolution in parasitic plants: insight from the chloroplast genome,” BioEssays, vol. 26, no. 3, pp. 235–247, 2004.

[6] L. A. Raubeson and R. K. Jansen, “Chloroplast genomes of plants,” in Diversity and Evolution of Plants-Genotypic and Phenotypic Variation in Higher Plants, R. Henry, Ed., pp. 45–68, CABI Publishing, Wallingford, UK, 2005.

[7] R. C. Haberle, H. M. Fourcade, J. L. Boore, and R. K. Jansen, “Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes,” Journal of Molecular Evolution, vol. 66, no. 4, pp. 350–361, 2008.

[8] W. Martin and R. G. Herrmann, “Gene transfer from organelles to the nucleus: how much, what happens, and why?” Plant Physiology, vol. 118, no. 1, pp. 9–17, 1998.

[9] H. L. Race, R. G. Herrmann, and W. Martin, “Why have organelles retained genomes?” Trends in Genetics, vol. 15, no. 9, pp. 364–370, 1999.

[10] T. Wakasugi, T. Tsudzuki, and M. Sugiura, “The genomics of land plant chloroplasts: gene content and alteration of genomic information by RNA editing,” Photosynthesis Research, vol. 70, no. 1, pp. 107–118, 2001.

[11] Y.-K. Kim, C.-W. Park, and K.-J. Kim, “Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications,” Molecules and Cells, vol. 27, no. 3, pp. 365–381, 2009.

[12] A. Khan, I. A. Khan, H. Asif, and M. K. Azim, “Current trends in chloroplast genome research,” African Journal of Biotechnology, vol. 9, no. 24, pp. 3494–3500, 2010.

[13] H. Shimada and M. Sugiura, “Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes,” Nucleic Acids Research, vol. 19, no. 5, pp. 983–995, 1991.

[14] M. Sugiura, “The chloroplast genome,” Plant Molecular Biology, vol. 19, no. 1, pp. 149–168, 1992.

[15] M. E. Cosner, R. K. Jansen, J. D. Palmer, and S. R. Downie, “The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families,” Current Genetics, vol. 31, no. 5, pp. 419–429, 1997.

[16] L. Gao, X. Yi, Y.-X. Yang, Y.-J. Su, and T. Wang, “Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes,” BMC Evolutionary Biology, vol. 9, no. 1, article 130, 2009.


[17] T Kato T Kaneko S Sato Y Nakamura and S TabataldquoComplete structure of the chloroplast genome of a legumeLotus japonicusrdquoDNA Research vol 7 no 6 pp 323ndash330 2000

[18] B Goffinet N J Wickett O Werner R M Ros A J Shaw andC J Cox ldquoDistribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta)rdquoAnnals of Botany vol 99 no 4 pp 747ndash753 2007

[19] J D Palmer JMNugent and L AHerbon ldquoUnusual structureof geranium chloroplast DNA a triple-sized inverted repeatextensive gene duplications multiple inversions and two repeatfamiliesrdquo Proceedings of the National Academy of Sciences of theUnited States of America vol 84 no 3 pp 769ndash773 1987

[20] S Greiner XWang U Rauwolf et al ldquoThe complete nucleotidesequences of the five genetically distinct plastid genomes ofOenothera subsection Oenothera I Sequence evaluation andplastome evolutionrdquo Nucleic Acids Research vol 36 no 7 pp2366ndash2378 2008

[21] J J Doyle J I Davis R J Soreng D Garvin and M JAnderson ldquoChloroplast DNA inversions and the origin of thegrass family (Poaceae)rdquo Proceedings of the National Academy ofSciences of the United States of America vol 89 no 16 pp 7722ndash7726 1992

[22] F A Michelangeli J I Davis and D W Stevenson ldquoPhy-logenetic relationships among Poaceae and related fami-lies as inferred from morphology inversions in the plastidgenome and sequence data from the mitochondrial and plastidgenomesrdquoTheAmerican Journal of Botany vol 90 no 1 pp 93ndash106 2003

[23] E Bortiri D Coleman-Derr G R Lazo O D Andersonand Y Q Gu ldquoThe complete chloroplast genome sequence ofBrachypodium distachyon sequence comparison and phyloge-netic analysis of eight grass plastomesrdquo BMC Research Notesvol 1 article 61 2008

[24] C H Leseberg and M R Duvall ldquoThe complete chloroplastgenome ofCoix lacryma-jobi and a comparative molecular evo-lutionary analysis of plastomes in cerealsrdquo Journal of MolecularEvolution vol 69 no 4 pp 311ndash318 2009

[25] H J Chung J D Jung H W Park et al ldquoThe completechloroplast genome sequences of Solanum tuberosum andcomparative analysis with Solanaceae species identified thepresence of a 241-bp deletion in cultivated potato chloroplastDNA sequencerdquoPlant Cell Reports vol 25 no 12 pp 1369ndash13792006

[26] H Daniell S-B Lee J Grevich et al ldquoComplete chloro-plast genome sequences of Solanum bulbocastanum Solanumlycopersicum and comparative analyses with other Solanaceaegenomesrdquo Theoretical and Applied Genetics vol 112 no 8 pp1503ndash1518 2006

[27] S. Kahlau, S. Aspinall, J. C. Gray, and R. Bock, “Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes,” Journal of Molecular Evolution, vol. 63, no. 2, pp. 194–207, 2006.

[28] M. Yukawa, T. Tsudzuki, and M. Sugiura, “The chloroplast genome of Nicotiana sylvestris and Nicotiana tomentosiformis: complete sequencing confirms that the Nicotiana sylvestris progenitor is the maternal genome donor of Nicotiana tabacum,” Molecular Genetics and Genomics, vol. 275, no. 4, pp. 367–373, 2006.

[29] Y. D. Jo, J. Park, J. Kim et al., “Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome,” Plant Cell Reports, vol. 30, no. 2, pp. 217–229, 2011.

[30] C. Schmitz-Linneweber, R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier, “The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation,” Molecular Biology and Evolution, vol. 19, no. 9, pp. 1602–1612, 2002.

[31] M. Kunnimalaiyaan and B. L. Nielsen, “Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA,” Nucleic Acids Research, vol. 25, no. 18, pp. 3681–3686, 1997.

[32] G. Thyssen, Z. Svab, and P. Maliga, “Cell-to-cell movement of plastids in plants,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 7, pp. 2439–2443, 2012.

[33] D. Gargano, A. Vezzi, N. Scotti et al., “The complete nucleotide sequence genome of potato (Solanum tuberosum cv. Desiree) chloroplast DNA,” in Book of Abstracts of the 2nd Solanaceae Genome Workshop, p. 107, 2005.

[34] K.-J. Kim and H.-L. Lee, “Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants,” DNA Research, vol. 11, no. 4, pp. 247–261, 2004.

[35] D. A. Steane, “Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae),” DNA Research, vol. 12, no. 3, pp. 215–220, 2005.

[36] J. D. Palmer, “Cell culture and somatic cell genetics of plants,” in The Molecular Biology of Plastids, R. G. Hermann, Ed., pp. 5–53, Springer, Vienna, Austria, 1991.

[37] M. Sugiura, K. Shinozaki, M. Tanaka et al., “Split genes and cis/trans splicing in tobacco chloroplasts,” in Plant Molecular Biology, D. von Wettstein and N. H. Chua, Eds., pp. 65–76, Plenum Press, New York, NY, USA, 1987.

[38] A. Vera and M. Sugiura, “A novel RNA gene in the tobacco plastid genome: its possible role in the maturation of 16S rRNA,” The EMBO Journal, vol. 13, no. 9, pp. 2211–2217, 1994.

[39] M. Sugita, Z. Svab, P. Maliga, and M. Sugiura, “Targeted deletion of sprA from the tobacco plastid genome indicates that the encoded small RNA is not essential for pre-16S rRNA maturation in plastids,” Molecular and General Genetics, vol. 257, no. 1, pp. 23–27, 1997.

[40] R. Lahaye, M. van der Bank, D. Bogarin et al., “DNA barcoding the floras of biodiversity hotspots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 8, pp. 2923–2928, 2008.

[41] P. Taberlet, L. Gielly, G. Pautou, and J. Bouvet, “Universal primers for amplification of three non-coding regions of chloroplast DNA,” Plant Molecular Biology, vol. 17, no. 5, pp. 1105–1109, 1991.

[42] X. Guo, S. Castillo-Ramírez, V. González et al., “Rapid evolutionary change of common bean (Phaseolus vulgaris L.) plastome and the genomic diversification of legume chloroplasts,” BMC Genomics, vol. 8, article 228, 2007.

[43] R. K. Jansen, C. Kaittanis, C. Saski et al., “Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids,” BMC Evolutionary Biology, vol. 6, no. 1, article 32, 2006.

[44] M. J. Moore, P. S. Soltis, C. D. Bell, J. G. Burleigh, and D. E. Soltis, “Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4623–4628, 2010.

[45] R. K. Jansen, Z. Cai, L. A. Raubeson et al., “Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 49, pp. 19369–19374, 2007.

[46] L. Bohs and R. G. Olmstead, “Phylogenetic relationships in Solanum (Solanaceae) based on ndhF sequences,” Systematic Botany, vol. 22, no. 1, pp. 5–17, 1997.

[47] R. G. Olmstead, L. Bohs, H. A. Migid, E. Santiago-Valentin, V. F. Garcia, and S. M. Collier, “A molecular phylogeny of the Solanaceae,” Taxon, vol. 57, no. 4, pp. 1159–1181, 2008.


Table 2: The lengths of introns and exons for the split genes of ten solanaceous species.

Gene (region)   Exon/intron ABE   CAN   DST   NSY   NTA   NTO   NUN   SBU   SLY   STU
trnK-UUU (LSC)  Exon I      37    37    37    37    37    37    37    37    37    37
                Intron I    2519  2500  2506  2526  2526  2526  2521  2501  2514  2512
                Exon II     36    35    35    35    35    35    35    35    35    35
rps16 (LSC)     Exon I      40    40    40    40    40    40    40    40    40    40
                Intron I    822   865   866   860   860   860   859   855   864   855
                Exon II     227   227   227   218   218   218   218   227   227   227
trnG-UCC (LSC)  Exon I      23    23    23    23    23    23    23    23    23    23
                Intron I    692   692   694   692   692   690   691   701   695   692
                Exon II     48    48    48    48    48    48    48    37    48    48
atpF (LSC)      Exon I      145   145   145   145   145   145   145   144   144   145
                Intron I    715   693   700   695   695   692   692   693   686   693
                Exon II     410   410   410   410   410   410   410   411   411   410
rpoC1 (LSC)     Exon I      432   453   453   453   453   432   453   453   453   453
                Intron I    737   742   737   737   737   709   733   737   737   737
                Exon II     1614  1614  1614  1614  1614  1614  1623  1614  1614  1614
ycf3 (LSC)      Exon I      124   124   124   124   124   124   124   124   124   124
                Intron I    739   742   740   739   738   731   735   730   729   727
                Exon II     230   230   230   230   230   230   230   230   230   230
                Intron II   763   744   753   783   783   779   781   750   750   750
                Exon III    153   153   159   153   153   153   153   153   153   153
trnL-UAA (LSC)  Exon I      35    35    35    35    35    35    35    37    35    35
                Intron I    497   426   501   503   503   497   498   502   497   497
                Exon II     50    50    50    50    50    50    50    50    50    50
trnV-UAC (LSC)  Exon I      38    38    38    38    38    38    38    38    38    38
                Intron I    572   575   569   571   571   572   573   569   571   571
                Exon II     35    35    37    35    35    35    35    37    35    35
rps12*          Exon I      114   114   114   114   114   114   114   114   114   114
                Intron I    -     -     -     -     -     -     -     -     -     -
                Exon II     232   232   232   232   232   232   232   232   232   232
                Intron II   535   536   536   536   536   536   536   536   536   536
                Exon III    26    26    26    26    26    26    26    26    26    26
clpP (LSC)      Exon I      71    71    71    71    71    71    71    71    71    71
                Intron I    799   811   792   807   807   789   789   789   798   789
                Exon II     292   292   292   292   292   292   292   292   292   292
                Intron II   622   626   624   637   637   634   631   625   617   620
                Exon III    228   228   234   228   228   228   228   234   258   234
petB (LSC)      Exon I      6     6     6     6     6     6     6     6     6     6
                Intron I    759   755   746   753   753   753   753   747   747   747
                Exon II     642   642   642   642   642   642   642   642   642   642
petD (LSC)      Exon I      8     8     9     8     8     8     8     6     8     8
                Intron I    742   742   748   742   742   742   742   739   738   739
                Exon II     475   475   474   475   475   475   475   477   475   475
rpl16 (LSC)     Exon I      9     9     9     9     9     9     9     9     9     9
                Intron I    1019  1026  1025  1020  1020  1021  1020  1014  1018  1014
                Exon II     396   396   396   396   396   396   396   396   396   396
rpl2 (IR)       Exon I      391   391   393   391   391   391   391   390   391   391
                Intron I    664   665   669   666   666   666   666   666   666   666
                Exon II     434   434   429   434   434   434   434   435   434   434
ndhB (IR)       Exon I      777   777   777   777   777   777   777   777   777   777
                Intron I    679   679   679   679   679   679   679   679   679   679
                Exon II     756   756   756   756   756   756   756   756   756   756
trnI-GAU (IR)   Exon I      37    37    42    37    37    37    37    42    37    37
                Intron I    717   722   717   707   707   716   716   717   722   722
                Exon II     34    35    35    35    35    35    35    35    35    35
trnA-UGC (IR)   Exon I      38    38    38    38    38    38    38    38    38    38
                Intron I    681   811   811   709   709   709   709   811   811   811
                Exon II     35    35    35    35    35    35    35    35    35    35
ndhA (SSC)      Exon I      553   553   552   553   553   553   553   552   553   553
                Intron I    1150  1157  1154  1148  1148  1149  1148  1158  1133  1158
                Exon II     539   539   537   539   539   539   539   540   539   539

*rps12 is a divided gene: the 3′-rps12 is located in the IR region, while the 5′-rps12 is located in the LSC region. ABE: Atropa belladonna; CAN: Capsicum annuum; DST: Datura stramonium; NSY: Nicotiana sylvestris; NTA: Nicotiana tabacum; NTO: Nicotiana tomentosiformis; NUN: Nicotiana undulata; SBU: Solanum bulbocastanum; SLY: Solanum lycopersicum; STU: Solanum tuberosum.
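As a cross-check on Table 2, the trnA-UGC intron lengths can be compared with the single trnA-UGC InDel of 102–130 bp listed in Table 3. The sketch below is only an illustration, not the procedure used in this study. It assumes that the differences in intron length come entirely from that one InDel, and the function name `deficit_from_longest` is hypothetical.

```python
# Intron I lengths (bp) of trnA-UGC, copied from Table 2.
TRNA_UGC_INTRON = {
    "ABE": 681, "CAN": 811, "DST": 811, "NSY": 709, "NTA": 709,
    "NTO": 709, "NUN": 709, "SBU": 811, "SLY": 811, "STU": 811,
}


def deficit_from_longest(lengths):
    """Return, per species, how many bp shorter the intron is than the longest one."""
    longest = max(lengths.values())
    return {species: longest - n for species, n in lengths.items()}


deficits = deficit_from_longest(TRNA_UGC_INTRON)

# Distinct non-zero deficits, i.e. the apparent deletion sizes.
nonzero = sorted({d for d in deficits.values() if d > 0})
print(nonzero)  # [102, 130]
```

Under this assumption, the two deficits (102 bp in the four Nicotiana species and 130 bp in A. belladonna) coincide with the two ends of the 102–130 bp range given for trnA-UGC in Table 3.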

Table 3: InDels in nucleotide sequences of 9 genes of ten solanaceous plastid genomes.

S number  Gene(a,b,c)   Total number of InDels  InDel length (bp)
1         accD(a)       4                       24, 9, 141, 6
2         clpP(a)       24                      8(I), 14(I), 13(I), 7(I), 1(I), 2-3(I), 7(I), 1–7(I), 3(I), 2(I), 3(I), 1–7(I), 1–3(I), 1(I), 1(I), 1(I), 1–5(I), 4–7(I), 1(I), 9(I), 1-2(I), 3(I), 5(I), 24–30
3         ndhA(b)       14                      9(I), 5-6(I), 3(I), 1(I), 9(I), 3(I), 4(I), 1–4(I), 1-2(I), 1–23(I), 1-2(I), 2(I), 1(I), 3(I)
4         rpl32(b)      2                       2-3, 4
5         rps16(a)      11                      1–38, 9(I), 1(I), 1(I), 5(I), 1-2(I), 5(I), 4(I), 6(I), 1(I), 9
6         sprA(b)       2                       109, 7
7         trnA-UGC(c)   1                       102–130
8         trnL-UAA(a)   4                       1, 6, 71, 4
9         ycf1(b)       31                      3, 18, 18, 21, 6, 6, 48, 9, 6, 6, 42, 3, 6, 30, 3, 15, 12–39, 18, 6, 9–36, 6, 6, 6, 9, 9, 12, 6, 6, 6, 57, 6

(a,b,c) Location in different regions: (a) LSC, (b) SSC, and (c) IR. (I): InDels present in introns.

Table 4: InDels in amino acid sequences of 5 proteins of ten solanaceous plastid genomes.

S number  Protein  Total number of InDels  InDel length (amino acids)
1         accD     4                       8, 3, 47, 2
2         clpP     2                       2, 10
3         rpl32    1                       1-2
4         rps16    1                       3
5         ycf1     29                      1, 6, 6, 7, 2, 2, 7, 3, 2, 2, 14, 1–10, 1, 5, 4–13, 6, 2, 3–12, 2, 2, 2, 3, 3, 4, 2, 2, 2, 19, 2
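The accD rows of Tables 3 and 4 are consistent with every accD InDel being in-frame: each nucleotide length is a multiple of three and maps to the listed number of residues. The sketch below makes that check explicit. It is an illustration only, and the helper name `codons_affected` is hypothetical; it does not describe how the tables were produced.

```python
def codons_affected(indel_bp):
    """Convert an in-frame InDel length in bp to the number of residues it adds or removes."""
    if indel_bp % 3 != 0:
        raise ValueError(f"{indel_bp} bp is not a multiple of 3 (not in-frame)")
    return indel_bp // 3


ACCD_NT = [24, 9, 141, 6]  # accD InDel lengths in bp, from Table 3
ACCD_AA = [8, 3, 47, 2]    # accD InDel lengths in residues, from Table 4

converted = [codons_affected(n) for n in ACCD_NT]
print(converted == ACCD_AA)  # True
```

The same conversion does not apply to InDels marked (I) in Table 3, since those lie in introns and are not translated.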

Figure 1: Partial multiple sequence alignment of accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala (UGC), tRNA-Leu (UAA), and ycf1 gene sequences of ten solanaceous species, showing the location of InDels (indicated by hyphens).


Figure 2: Partial multiple sequence alignment of amino acid sequences of genes, namely, accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala(UGC), tRNA-Leu(UAA), and ycf1, of ten solanaceous species, showing the location of InDels (indicated by hyphens). Panels: accD (InDels 1 to 4), ycf1 (InDels 1 to 29), clpP (InDels 1 and 2), rpl32 (InDel 1), and rps16 (InDel 1). Rows in each panel follow the order ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU.

protein sequences, as shown in Figure 2. The accD gene has been reported to be one of the most variable plastid genes and is probably under diversifying selection [26].

(2) clpP. In the clpP gene, InDels were found in both intron and exon regions. The InDels in the exon regions had two major consequences. First, an insertion of 6 bp in S. bulbocastanum and S. tuberosum and of 30 bp in S. lycopersicum at the 3′ end (exon 3) of the clpP gene shifted the stop codon downstream by 6, 6, and 30 bp, respectively, relative to the other species of the Solanaceae family, increasing the length of the coding sequence and of the protein product (Figures 1 and 2). Second, InDel 1 in the protein sequence corresponds to the insertion of a repeat of “I” amino acids in D. stramonium, which makes the exon 3 region longer by 6 bp. This region, however, corresponds to the end of intron 2 of the clpP gene in all other species. Since the D. stramonium chloroplast genome has been sequenced only recently, this observation needs to be experimentally validated.
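How such an InDel appears in an alignment can be illustrated with a minimal Python sketch. The rows below are toy data patterned on the clpP InDel 1 described above (a two-residue “II” insertion restricted to D. stramonium). The residues, the reduced taxon set, and the helper names `gap_runs` and `indel_carriers` are assumptions made for illustration and are not taken from the published alignment.

```python
from itertools import groupby

# Toy amino acid rows (assumed, for illustration only): hyphens mark the InDel.
alignment = {
    "ABE": "PHA--RV",
    "CAN": "PHA--RV",
    "DST": "PHAIIRV",
    "SLY": "PHA--RV",
}

def gap_runs(row):
    """Return (start, length) for each run of hyphens in one aligned row (0-based)."""
    runs, pos = [], 0
    for is_gap, group in groupby(row, key=lambda ch: ch == "-"):
        n = len(list(group))
        if is_gap:
            runs.append((pos, n))
        pos += n
    return runs

def indel_carriers(aln):
    """Map each gap run to the set of taxa that have residues (no gaps) in those columns."""
    result = {}
    for row in aln.values():
        for run in gap_runs(row):
            result.setdefault(run, set())
    for (start, length), taxa in result.items():
        for taxon, row in aln.items():
            if "-" not in row[start:start + length]:
                taxa.add(taxon)
    return result

for (start, length), taxa in sorted(indel_carriers(alignment).items()):
    print(f"columns {start}-{start + length - 1}: {length} aa ({3 * length} bp), present in {sorted(taxa)}")
```

For the toy rows, the sketch reports a single two-column InDel carried only by DST, which matches the 6 bp (two-codon) difference described in the text.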

(3) ndhA. All the InDels found in ndhA were present in introns, while the protein-coding regions (exons) were highly conserved. This indicates high diversifying selection on the intronic region of this gene. Of the 14 InDels, most were observed with respect to C. annuum (InDels 1, 2, 5, 7, and 10). InDel 10 was shared by C. annuum and S. lycopersicum in full and by C. annuum and A. belladonna in part.

(4) rpl32. In rpl32, an insertion of 1 bp in D. stramonium and of 3 bp in C. annuum was found in the 3′ region of the gene, while a deletion of 4 bp was observed in D. stramonium. The 3 bp insertion in C. annuum only altered the length of the protein, making it longer by 1 amino acid. However, the 1 bp insertion in D. stramonium proved to be a frameshift mutation, resulting in three changes in the amino acid sequence near the C-terminus. Moreover, the deletion of 4 bp at the 3′ end resulted in premature termination of protein synthesis. Together, the frameshift mutation and the 3′ end deletion reduced the length of the gene product by 1 amino acid. As the C-terminus of the amino acid chain is well conserved in all the other species, the effect of the abovementioned variations needs to be validated experimentally.
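The arithmetic behind these statements can be sketched in Python, assuming only that a net length change divisible by three leaves the reading frame intact. The function name `frame_effect` is hypothetical; the InDel sizes are those given in the text.

```python
def frame_effect(net_bp):
    """Classify a signed net length change (bp) within a coding region.

    Returns (label, codon_change); codon_change is None for a frameshift.
    """
    if net_bp == 0:
        return ("unchanged", 0)
    codons, remainder = divmod(abs(net_bp), 3)
    if remainder != 0:
        return ("frameshift", None)
    return ("in-frame", codons if net_bp > 0 else -codons)

# InDel sizes reported for the 3' region of rpl32
# (insertions positive, deletions negative).
rpl32_events = {
    "C. annuum": [+3],
    "D. stramonium": [+1, -4],
}

for species, events in rpl32_events.items():
    for bp in events:
        print(species, f"{bp:+d} bp ->", frame_effect(bp))
    net = sum(events)
    print(species, f"net {net:+d} bp ->", frame_effect(net))
```

Under this assumption, each D. stramonium event on its own shifts the frame, while their net change of -3 bp is consistent with the reported shortening of the product by one amino acid.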

(5) rps16. In rps16 also, InDels were observed in introns as well as exons. Five of the major insertions in the intron regions were species specific: insertions of 38 bp (InDel 1), 9 bp (InDel 2), 5 bp (InDel 7), 4 bp (InDel 8), and 6 bp (InDel 9) were observed in A. belladonna, S. lycopersicum, D. stramonium, and C. annuum. A deletion of 5 bp was observed in all three Solanum species and C. annuum. A deletion of 9 bp was observed in all Nicotiana species, resulting in an amino acid change (P to S) and shortening of the protein by three amino acids in the C-terminal region. A similar deletion was also observed by Kahlau et al. [27] and was suggested to be functionally neutral.

(6) sprA. The sprA gene has been reported to encode a stable noncoding RNA of unknown function and has been suggested to influence 16S rRNA maturation [38, 39]. In many species, this gene seems to be present as a remnant and shows large variations in its 5′ region. The largest deletion, of 109 bp, was observed in C. annuum. The rest of the gene appears to be more conserved, with a deletion towards the 3′ end in all Nicotiana species and A. belladonna. The manner in which this gene functions and the consequences of the abovementioned variations are yet to be investigated experimentally.

(7) trnA-UGC. In this gene, a long deletion of 102 bp was observed in all Nicotiana species. Interestingly, this deletion was further extended to 130 bp, in both directions, in A. belladonna. These deletions were found in the intron region and so are unlikely to have any negative impact on gene product function.
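Both deletion sizes can be recovered from the trnA-UGC intron I lengths listed in Table 2. The short Python sketch below assumes that the longest intron (811 bp) represents the undeleted state, which is an illustrative choice rather than something stated in the analysis.

```python
# Intron I lengths (bp) of trnA-UGC, as listed in Table 2.
intron_len = {
    "ABE": 681, "CAN": 811, "DST": 811,
    "NSY": 709, "NTA": 709, "NTO": 709, "NUN": 709,
    "SBU": 811, "SLY": 811, "STU": 811,
}

# Assumption: the longest intron is taken as the reference (undeleted) length.
reference = max(intron_len.values())

# Deletion size per species, keeping only species shorter than the reference.
deletion = {sp: reference - n for sp, n in intron_len.items() if n < reference}
print(deletion)
```

The differences come out as 102 bp for the four Nicotiana species and 130 bp for A. belladonna, in agreement with the values quoted above.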

(8) trnL-UAA. The trend of variation in trnL-UAA was similar to that in ndhA, as all InDels were observed in introns. The longest species-specific deletion (InDel 3) was observed in C. annuum, whereas a short insertion of four nucleotides, a repeat of “T”, was observed specifically in D. stramonium. Another insertion, of 6 bp, was observed in two Nicotiana species, namely, N. sylvestris and N. tabacum.

(9) ycf1. Many InDels (3′ region) were found in ycf1, the fastest evolving gene. Most of the InDels were species specific. The maximum number of InDels (InDels 2, 3, 5, 7, 9, 16, 17, 19, 23, 24, 25, and 30) was observed in C. annuum, followed by D. stramonium (InDels 4, 6, 20, 26, and 31), N. tomentosiformis (InDels 8, 17, and 18), and S. lycopersicum (InDels 16, 22, and 28). Two genus-specific InDels (InDels 14 and 21) were observed in all four Nicotiana species. However, InDel 19 was also present in D. stramonium. Another genus-specific InDel (InDel 29) was observed in Solanum species. Most of the InDels altered the length of the gene product, with the maximum length of 1906 amino acids (aa) observed in C. annuum and the minimum of 1873 aa observed in D. stramonium. Among the Solanum species, the length of the protein (1887 aa) was conserved between S. bulbocastanum and S. tuberosum. However, S. lycopersicum had an amino acid sequence of 1891 aa, longer by 4 aa than those of the other two species of the same genus. Among the four Nicotiana species, the ycf1 gene product length was conserved among N. sylvestris, N. tabacum, and N. undulata, with protein lengths of 1901 aa, 1902 aa, and 1901 aa, respectively. However, N. tomentosiformis was observed to be the most variable member of the genus Nicotiana, with a protein length of 1892 aa.
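The length comparisons in this paragraph can be reproduced with a small Python sketch. The values are the ycf1 protein lengths stated above; A. belladonna is left out because its length is not given in this passage, and the variable names are illustrative.

```python
# ycf1 protein lengths (amino acids) as stated in the text.
ycf1_aa = {
    "CAN": 1906, "DST": 1873,
    "NSY": 1901, "NTA": 1902, "NTO": 1892, "NUN": 1901,
    "SBU": 1887, "SLY": 1891, "STU": 1887,
}

# Genus of each species code (from the abbreviations used in the tables).
genus_of = {
    "CAN": "Capsicum", "DST": "Datura",
    "NSY": "Nicotiana", "NTA": "Nicotiana", "NTO": "Nicotiana", "NUN": "Nicotiana",
    "SBU": "Solanum", "SLY": "Solanum", "STU": "Solanum",
}

longest = max(ycf1_aa, key=ycf1_aa.get)
shortest = min(ycf1_aa, key=ycf1_aa.get)
print(longest, ycf1_aa[longest], shortest, ycf1_aa[shortest])

# Within-genus spread: difference between the longest and shortest product.
by_genus = {}
for code, length in ycf1_aa.items():
    by_genus.setdefault(genus_of[code], []).append(length)
spread = {genus: max(v) - min(v) for genus, v in by_genus.items()}
print(spread)
```

The spread is 4 aa within Solanum and 10 aa within Nicotiana, the latter driven entirely by N. tomentosiformis.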

3.5. Phylogenetic Analysis of Solanaceous Plastomes. Evolutionary relationships between diverse plant species have been analyzed using several plastome markers, including matK and rbcL (genes) and trnH-psbA and trnL-trnF (intergenic regions), owing to their sequence conservation among plant taxa blended with suitable variation [40, 41]. However, determination of phylogeny based on single gene sequences may be inaccurate [42]. The availability of complete chloroplast sequences for many species has made it possible to use many individual genes or concatenated gene sequences to deduce phylogenetic relationships among taxa [42–44].

Efforts have been made to carry out phylogenetic analysis of solanaceous species using complete plastome sequences by Moore et al. [44] and Jansen et al. [45]. The evolutionary positions of Capsicum and Datura in Solanaceae have been determined using a single or a few plastid genes [46, 47]. Recently, concatenated protein-coding gene sequences from completely sequenced plastomes were used to obtain reasonable phylogenetic relationships for solanaceous species [29]. In the present investigation, we used a similar approach to analyze the phylogenetic relationships among ten solanaceous species with completely sequenced plastomes. Individual multiple sequence alignments were concatenated for maximum likelihood phylogenetic tree generation. As depicted in Figure 3, the taxa were divided into two clades with 100% bootstrap support (500 replicates). The first clade consisted of the four Nicotiana species, while the species of Solanum, Capsicum, Atropa, and Datura were included in the second clade. These results are in line with previous phylogenetic analyses using concatenated protein-coding gene sequences as well as phylogenetic analyses using plastid ndhF and trnL-F sequences [29, 47]. However, an analysis of 13 ORFs of solanaceous plastomes showed a different arrangement, in which Atropa was separated from Solanum and grouped together with Nicotiana [25].
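The concatenation step can be sketched in plain Python as follows. This is a minimal illustration and not the pipeline used in the study: the gene names, taxa, and sequences are made-up toy data, the function `concatenate_alignments` is hypothetical, and padding absent taxa with gaps is an assumption.

```python
def concatenate_alignments(gene_alignments, missing="-"):
    """Join per-gene alignments (dict: taxon -> aligned sequence) into one supermatrix.

    Returns (matrix, partitions), where partitions lists (gene, first, last)
    with 1-based column coordinates of each gene in the concatenated alignment.
    """
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    pieces = {t: [] for t in taxa}
    partitions, start = [], 0
    for gene, aln in gene_alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: rows differ in length, so this is not an alignment")
        width = lengths.pop()
        for t in taxa:
            # A taxon missing from this gene is padded with gap characters.
            pieces[t].append(aln.get(t, missing * width))
        partitions.append((gene, start + 1, start + width))
        start += width
    matrix = {t: "".join(parts) for t, parts in pieces.items()}
    return matrix, partitions

# Toy input (assumed): two short gene alignments, one of them lacking a taxon.
genes = {
    "geneA": {"NTA": "ATGGCA", "SLY": "ATGGCT", "CAN": "ATG---"},
    "geneB": {"NTA": "TTGA", "SLY": "TTGA"},
}

matrix, partitions = concatenate_alignments(genes)
for taxon, seq in matrix.items():
    print(taxon, seq)
print(partitions)
```

Recording the partition boundaries keeps the option of assigning per-gene substitution models in whichever tree-building program is used downstream.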

4. Conclusions

The analyses of complete plastid genomes of ten solanaceous species revealed overall similarity in terms of gene content and organization. The sizes of the LSC, SSC, and IR regions were found to be somewhat conserved among species, but significant variation was found between genera. Most of the coding regions were well conserved. However, many features in a few genes were observed to be typical of a particular genus, or even species, and can be used as molecular markers to investigate genetic diversity and evolution. These typical variations can be further utilized to develop more sophisticated DNA barcoding-based techniques. The ten solanaceous species were divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from the coding regions of the plastomes. The first clade consisted of the four Nicotiana species, and the second clade consisted of species of Solanum, Capsicum, Atropa, and Datura.

Figure 3: Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species. Tip labels: CAR, DCA, NTO, NUN, NSY, NTA, ABE, DST, CAN, SLY, SBU, and STU.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors are thankful to the University Grants Commission, New Delhi, for providing a MANF fellowship to Harpreet Kaur.

References

[1] M. G. Bausher, N. D. Singh, S.-B. Lee, R. K. Jansen, and H. Daniell, “The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var. ‘Ridge Pineapple’: organization and phylogenetic relationships to other angiosperms,” BMC Plant Biology, vol. 6, no. 1, article 21, 2006.

[2] Z. Y. Hu, W. Hua, S. M. Huang, and H. Z. Wang, “Complete chloroplast genome sequence of rapeseed (Brassica napus L.) and its evolutionary implications,” Genetic Resources and Crop Evolution, vol. 58, no. 6, pp. 875–887, 2011.

[3] J. D. Palmer, “Comparative organization of chloroplast genomes,” Annual Review of Genetics, vol. 19, no. 1, pp. 325–354, 1985.

[4] R. G. Olmstead and J. D. Palmer, “Chloroplast DNA systematics: a review of methods and data analysis,” The American Journal of Botany, vol. 81, no. 9, pp. 1205–1224, 1994.

[5] R. A. Bungard, “Photosynthetic evolution in parasitic plants: insight from the chloroplast genome,” BioEssays, vol. 26, no. 3, pp. 235–247, 2004.

[6] L. A. Raubeson and R. K. Jansen, “Chloroplast genomes of plants,” in Diversity and Evolution of Plants-Genotypic and Phenotypic Variation in Higher Plants, R. Henry, Ed., pp. 45–68, CABI Publishing, Wallingford, UK, 2005.

[7] R. C. Haberle, H. M. Fourcade, J. L. Boore, and R. K. Jansen, “Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes,” Journal of Molecular Evolution, vol. 66, no. 4, pp. 350–361, 2008.

[8] W. Martin and R. G. Herrmann, “Gene transfer from organelles to the nucleus: how much, what happens, and why?” Plant Physiology, vol. 118, no. 1, pp. 9–17, 1998.

[9] H. L. Race, R. G. Herrmann, and W. Martin, “Why have organelles retained genomes?” Trends in Genetics, vol. 15, no. 9, pp. 364–370, 1999.

[10] T. Wakasugi, T. Tsudzuki, and M. Sugiura, “The genomics of land plant chloroplasts: gene content and alteration of genomic information by RNA editing,” Photosynthesis Research, vol. 70, no. 1, pp. 107–118, 2001.

[11] Y.-K. Kim, C.-W. Park, and K.-J. Kim, “Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications,” Molecules and Cells, vol. 27, no. 3, pp. 365–381, 2009.

[12] A. Khan, I. A. Khan, H. Asif, and M. K. Azim, “Current trends in chloroplast genome research,” African Journal of Biotechnology, vol. 9, no. 24, pp. 3494–3500, 2010.

[13] H. Shimada and M. Sugiura, “Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes,” Nucleic Acids Research, vol. 19, no. 5, pp. 983–995, 1991.

[14] M. Sugiura, “The chloroplast genome,” Plant Molecular Biology, vol. 19, no. 1, pp. 149–168, 1992.

[15] M. E. Cosner, R. K. Jansen, J. D. Palmer, and S. R. Downie, “The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families,” Current Genetics, vol. 31, no. 5, pp. 419–429, 1997.

[16] L. Gao, X. Yi, Y.-X. Yang, Y.-J. Su, and T. Wang, “Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes,” BMC Evolutionary Biology, vol. 9, no. 1, article 130, 2009.


[17] T. Kato, T. Kaneko, S. Sato, Y. Nakamura, and S. Tabata, “Complete structure of the chloroplast genome of a legume, Lotus japonicus,” DNA Research, vol. 7, no. 6, pp. 323–330, 2000.

[18] B. Goffinet, N. J. Wickett, O. Werner, R. M. Ros, A. J. Shaw, and C. J. Cox, “Distribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta),” Annals of Botany, vol. 99, no. 4, pp. 747–753, 2007.

[19] J. D. Palmer, J. M. Nugent, and L. A. Herbon, “Unusual structure of geranium chloroplast DNA: a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families,” Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 3, pp. 769–773, 1987.

[20] S. Greiner, X. Wang, U. Rauwolf et al., “The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution,” Nucleic Acids Research, vol. 36, no. 7, pp. 2366–2378, 2008.

[21] J. J. Doyle, J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson, “Chloroplast DNA inversions and the origin of the grass family (Poaceae),” Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 16, pp. 7722–7726, 1992.

[22] F. A. Michelangeli, J. I. Davis, and D. W. Stevenson, “Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes,” The American Journal of Botany, vol. 90, no. 1, pp. 93–106, 2003.

[23] E. Bortiri, D. Coleman-Derr, G. R. Lazo, O. D. Anderson, and Y. Q. Gu, “The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes,” BMC Research Notes, vol. 1, article 61, 2008.

[24] C. H. Leseberg and M. R. Duvall, “The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals,” Journal of Molecular Evolution, vol. 69, no. 4, pp. 311–318, 2009.

[25] H. J. Chung, J. D. Jung, H. W. Park et al., “The complete chloroplast genome sequences of Solanum tuberosum and comparative analysis with Solanaceae species identified the presence of a 241-bp deletion in cultivated potato chloroplast DNA sequence,” Plant Cell Reports, vol. 25, no. 12, pp. 1369–1379, 2006.

[26] H. Daniell, S.-B. Lee, J. Grevich et al., “Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes,” Theoretical and Applied Genetics, vol. 112, no. 8, pp. 1503–1518, 2006.

[27] S. Kahlau, S. Aspinall, J. C. Gray, and R. Bock, “Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes,” Journal of Molecular Evolution, vol. 63, no. 2, pp. 194–207, 2006.

[28] M. Yukawa, T. Tsudzuki, and M. Sugiura, “The chloroplast genome of Nicotiana sylvestris and Nicotiana tomentosiformis: complete sequencing confirms that the Nicotiana sylvestris progenitor is the maternal genome donor of Nicotiana tabacum,” Molecular Genetics and Genomics, vol. 275, no. 4, pp. 367–373, 2006.

[29] Y. D. Jo, J. Park, J. Kim et al., “Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome,” Plant Cell Reports, vol. 30, no. 2, pp. 217–229, 2011.

[30] C. Schmitz-Linneweber, R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier, “The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation,” Molecular Biology and Evolution, vol. 19, no. 9, pp. 1602–1612, 2002.

[31] M. Kunnimalaiyaan and B. L. Nielsen, “Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA,” Nucleic Acids Research, vol. 25, no. 18, pp. 3681–3686, 1997.

[32] G. Thyssen, Z. Svab, and P. Maliga, “Cell-to-cell movement of plastids in plants,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 7, pp. 2439–2443, 2012.

[33] D. Gargano, A. Vezzi, N. Scotti et al., “The complete nucleotide sequence genome of potato (Solanum tuberosum cv. Desiree) chloroplast DNA,” in Book of Abstracts of the 2nd Solanaceae Genome Workshop, p. 107, 2005.

[34] K.-J. Kim and H.-L. Lee, “Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants,” DNA Research, vol. 11, no. 4, pp. 247–261, 2004.

[35] D. A. Steane, “Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae),” DNA Research, vol. 12, no. 3, pp. 215–220, 2005.

[36] J. D. Palmer, “Cell culture and somatic cell genetics of plants,” in The Molecular Biology of Plastids, R. G. Hermann, Ed., pp. 5–53, Springer, Vienna, Austria, 1991.

[37] M. Sugiura, K. Shinozaki, M. Tanaka et al., “Split genes and cis/trans splicing in tobacco chloroplasts,” in Plant Molecular Biology, D. von Wettstein and N. H. Chua, Eds., pp. 65–76, Plenum Press, New York, NY, USA, 1987.

[38] A. Vera and M. Sugiura, “A novel RNA gene in the tobacco plastid genome: its possible role in the maturation of 16S rRNA,” The EMBO Journal, vol. 13, no. 9, pp. 2211–2217, 1994.

[39] M. Sugita, Z. Svab, P. Maliga, and M. Sugiura, “Targeted deletion of sprA from the tobacco plastid genome indicates that the encoded small RNA is not essential for pre-16S rRNA maturation in plastids,” Molecular and General Genetics, vol. 257, no. 1, pp. 23–27, 1997.

[40] R. Lahaye, M. van der Bank, D. Bogarin et al., “DNA barcoding the floras of biodiversity hotspots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 8, pp. 2923–2928, 2008.

[41] P. Taberlet, L. Gielly, G. Pautou, and J. Bouvet, “Universal primers for amplification of three non-coding regions of chloroplast DNA,” Plant Molecular Biology, vol. 17, no. 5, pp. 1105–1109, 1991.

[42] X. Guo, S. Castillo-Ramírez, V. Gonzalez et al., “Rapid evolutionary change of common bean (Phaseolus vulgaris L.) plastome and the genomic diversification of legume chloroplasts,” BMC Genomics, vol. 8, article 228, 2007.

[43] R. K. Jansen, C. Kaittanis, C. Saski et al., “Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids,” BMC Evolutionary Biology, vol. 6, no. 1, article 32, 2006.

[44] M. J. Moore, P. S. Soltis, C. D. Bell, J. G. Burleigh, and D. E. Soltis, “Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4623–4628, 2010.

[45] R. K. Jansen, Z. Cai, L. A. Raubeson et al., “Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 49, pp. 19369–19374, 2007.

[46] L. Bohs and R. G. Olmstead, “Phylogenetic relationships in Solanum (Solanaceae) based on ndhF sequences,” Systematic Botany, vol. 22, no. 1, pp. 5–17, 1997.

[47] R. G. Olmstead, L. Bohs, H. A. Migid, E. Santiago-Valentin, V. F. Garcia, and S. M. Collier, “A molecular phylogeny of the Solanaceae,” Taxon, vol. 57, no. 4, pp. 1159–1181, 2008.


Table 2: Continued.

| Gene (region) | Exon/intron | ABE | CAN | DST | NSY | NTA | NTO | NUN | SBU | SLY | STU |
|---|---|---|---|---|---|---|---|---|---|---|---|
| rpl16 (LSC) | Exon I | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| | Intron I | 1019 | 1026 | 1025 | 1020 | 1020 | 1021 | 1020 | 1014 | 1018 | 1014 |
| | Exon II | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 |
| rpl2 (IR) | Exon I | 391 | 391 | 393 | 391 | 391 | 391 | 391 | 390 | 391 | 391 |
| | Intron I | 664 | 665 | 669 | 666 | 666 | 666 | 666 | 666 | 666 | 666 |
| | Exon II | 434 | 434 | 429 | 434 | 434 | 434 | 434 | 435 | 434 | 434 |
| ndhB (IR) | Exon I | 777 | 777 | 777 | 777 | 777 | 777 | 777 | 777 | 777 | 777 |
| | Intron I | 679 | 679 | 679 | 679 | 679 | 679 | 679 | 679 | 679 | 679 |
| | Exon II | 756 | 756 | 756 | 756 | 756 | 756 | 756 | 756 | 756 | 756 |
| trnI-GAU (IR) | Exon I | 37 | 37 | 42 | 37 | 37 | 37 | 37 | 42 | 37 | 37 |
| | Intron I | 717 | 722 | 717 | 707 | 707 | 716 | 716 | 717 | 722 | 722 |
| | Exon II | 34 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 |
| trnA-UGC (IR) | Exon I | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 |
| | Intron I | 681 | 811 | 811 | 709 | 709 | 709 | 709 | 811 | 811 | 811 |
| | Exon II | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 |
| ndhA (SSC) | Exon I | 553 | 553 | 552 | 553 | 553 | 553 | 553 | 552 | 553 | 553 |
| | Intron I | 1150 | 1157 | 1154 | 1148 | 1148 | 1149 | 1148 | 1158 | 1133 | 1158 |
| | Exon II | 539 | 539 | 537 | 539 | 539 | 539 | 539 | 540 | 539 | 539 |

*The rps12 gene is a divided gene: 3′-rps12 is located in the IR region, while 5′-rps12 is located in the LSC region.
ABE: Atropa belladonna; CAN: Capsicum annuum; DST: Datura stramonium; NSY: Nicotiana sylvestris; NTA: Nicotiana tabacum; NTO: Nicotiana tomentosiformis; NUN: Nicotiana undulata; SBU: Solanum bulbocastanum; SLY: Solanum lycopersicum; STU: Solanum tuberosum.

Table 3: InDels in nucleotide sequences of 9 genes of ten solanaceous plastid genomes.

| S. number | Gene (a, b, c) | Total number of InDels | InDel length (bp) |
|---|---|---|---|
| 1 | accD (a) | 4 | 24, 9, 141, 6 |
| 2 | clpP (a) | 24 | 8(I), 14(I), 13(I), 7(I), 1(I), 2-3(I), 7(I), 1–7(I), 3(I), 2(I), 3(I), 1–7(I), 1–3(I), 1(I), 1(I), 1(I), 1–5(I), 4–7(I), 1(I), 9(I), 1-2(I), 3(I), 5(I), 24–30 |
| 3 | ndhA (b) | 14 | 9(I), 5-6(I), 3(I), 1(I), 9(I), 3(I), 4(I), 1–4(I), 1-2(I), 1–23(I), 1-2(I), 2(I), 1(I), 3(I) |
| 4 | rpl32 (b) | 2 | 2-3, 4 |
| 5 | rps16 (a) | 11 | 1–38, 9(I), 1(I), 1(I), 5(I), 1-2(I), 5(I), 4(I), 6(I), 1(I), 9 |
| 6 | sprA (b) | 2 | 109, 7 |
| 7 | trnA-UGC (c) | 1 | 102–130 |
| 8 | trnL-UAA (a) | 4 | 1, 6, 71, 4 |
| 9 | ycf1 (b) | 31 | 3, 18, 18, 21, 6, 6, 48, 9, 6, 6, 42, 3, 6, 30, 3, 15, 12–39, 18, 6, 9–36, 6, 6, 6, 9, 9, 12, 6, 6, 6, 57, 6 |

a, b, c: location in different regions: (a) LSC, (b) SSC, and (c) IR. I: InDels present in introns.
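The “(I)” annotation in Table 3 makes it straightforward to separate intronic from exonic InDels programmatically. The Python sketch below parses the length entries for three of the intron-containing genes; the entries are transcribed from the table with ASCII hyphens for ranges, and the regular expression and function names are illustrative assumptions.

```python
import re

# InDel length entries transcribed from Table 3 (ranges written with "-").
table3 = {
    "clpP": "8(I) 14(I) 13(I) 7(I) 1(I) 2-3(I) 7(I) 1-7(I) 3(I) 2(I) 3(I) 1-7(I) "
            "1-3(I) 1(I) 1(I) 1(I) 1-5(I) 4-7(I) 1(I) 9(I) 1-2(I) 3(I) 5(I) 24-30",
    "ndhA": "9(I) 5-6(I) 3(I) 1(I) 9(I) 3(I) 4(I) 1-4(I) 1-2(I) 1-23(I) 1-2(I) 2(I) 1(I) 3(I)",
    "rps16": "1-38 9(I) 1(I) 1(I) 5(I) 1-2(I) 5(I) 4(I) 6(I) 1(I) 9",
}

# One entry: a length or a length range, optionally tagged "(I)" for intronic.
ENTRY = re.compile(r"^(\d+)(?:-(\d+))?(\(I\))?$")

def parse_entry(token):
    """Return (min_bp, max_bp, is_intronic) for one table entry."""
    match = ENTRY.match(token)
    if match is None:
        raise ValueError(f"unrecognized entry: {token!r}")
    low = int(match.group(1))
    high = int(match.group(2)) if match.group(2) else low
    return low, high, match.group(3) is not None

def summarize(cell):
    """Count total, intronic, and exonic InDels and report the longest one."""
    entries = [parse_entry(tok) for tok in cell.split()]
    intronic = sum(1 for _, _, in_intron in entries if in_intron)
    return {
        "total": len(entries),
        "intronic": intronic,
        "exonic": len(entries) - intronic,
        "longest_bp": max(high for _, high, _ in entries),
    }

for gene, cell in table3.items():
    print(gene, summarize(cell))
```

For ndhA, every entry carries the intron tag, in line with the statement that all of its InDels lie in introns.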

Table 4: InDels in amino acid sequences of 5 proteins of ten solanaceous plastid genomes.

| S. number | Protein | Total number of InDels | InDel length (aa) |
|---|---|---|---|
| 1 | accD | 4 | 8, 3, 47, 2 |
| 2 | clpP | 2 | 2, 10 |
| 3 | rpl32 | 1 | 1-2 |
| 4 | rps16 | 1 | 3 |
| 5 | ycf1 | 29 | 1, 6, 6, 7, 2, 2, 7, 3, 2, 2, 14, 1–10, 1, 5, 4–13, 6, 2, 3–12, 2, 2, 2, 3, 3, 4, 2, 2, 2, 19, 2 |
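As a consistency check between the two tables, the accD entries can be compared directly: each nucleotide InDel length in Table 3 should equal three times the corresponding amino acid length in Table 4. The Python sketch below assumes that the entries pair up by position, which the tables do not state explicitly; the helper name `codons_spanned` is hypothetical.

```python
# accD InDel lengths as listed in Table 3 (bp) and Table 4 (amino acids).
# Pairing the entries by position is an assumption about how the tables correspond.
accd_bp = [24, 9, 141, 6]
accd_aa = [8, 3, 47, 2]

def codons_spanned(length_bp):
    """Return the number of whole codons in an InDel, or None if not a multiple of 3."""
    codons, remainder = divmod(length_bp, 3)
    return codons if remainder == 0 else None

converted = [codons_spanned(bp) for bp in accd_bp]
print(converted)
print(converted == accd_aa)
```

All four accD InDels are whole multiples of three, so none of them disturbs the reading frame.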

Figure 1 (alignment panels, including sprA and accD, with InDels labeled; rows in the order ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU).

AAA---AAAAAATATAAAAAATATAAAAAATATAAAAAATATAAAAAATATAAAAAATATAAAAAATATCAAAAATATCAAAAATATCAA

AAAA--GAAAAAA--GAAAAAAAAGAAAAAA--GAAAAAA--GAAAAAA--GAAAAAA--GAAAAAA--GAAAAAA--GAAAAAA--GAA

TTT---GAATTT---GAATTT---GAATTTTTTGAATTTTTTGAATTT---GAATTT---GAATTT---GAATTT---GAATTT---GAA

ATT- --CAAA---CAAA--CAAA--CAAA--CAAA

ATTGTTTTTTTT TTTT-CAAA

ATTGTTTTT-

ATT-------

GTT

-GTT

CTT-

AAAAAAAAATATAAAAAGAAAAAAAAAAAAAAAAAAAAAAAAAAT-ATAAT-AAAAAGAT

InDel 7 InDel 8 InDel 9 InDel 10 InDel 11 InDel 12 InDel 13 InDel 14 InDel 15 InDel 16

ABECANDSTNSYNTANTONUNSBUSLYSTU

CTTT---------CTACTTTATAGAAACTCTACTTT---------CTACTTT---------CTACTTT---------CTACTTT---------CTACTTT---------CTACTTG---------CTACTTT---------CTACTTT---------CTA

TATA-----TTT TATTTTATATTT TAT------TTT TAT------TTT TAT------TTT TAT------TTT TAT------TTT TAT------TTT TAT------TTT TAT------TTT

TTTATTTTTTTAATTTTTCATTTTTTCATTTTTTCATTTTTTCATTTTTTCATTTTTTTATTTTTTT---TTTTTATTTT

TTTATTCTTTCTTCTTC-TTCTTA-TTCTTA-TTCTTA-TTCTTA-TTCTTC-TTCTTC-TTCTTC-TTC

TAT---------GGAGTTTTATCAACTTTATGGGGTTTTAT---------GGAGTTTTAT---------GGAGTTTTAT---------GGAGTTTTAT---------GAAGTTTTAT---------GGAGTTTTAT---------GGATTTTTAT---------GGAGTTTTAT---------GGAGTTT

GTT---GAAGTT---GAAGTT---GAAGTT---GAAGTT---GAAGTT---GAAGTT---GAAGTTTTTGAAGTTTTTGAAGTTTTTGAA

AAG----CCCAAGAAAGGCCAAG----GCCAAG----GCCAAG----GCCAAG----GCCAAG----GCCAAG----GCCAAG----ACCAAG----GCC

ATAAATCGAAATAAATCGAAATAAATCGAAATA----GAAATA----GAAATA----GAAATA----GAAAAAA-TCGAAAAAA-TCGAAAAAA-TCGAA

AGGG-ATG GGG--ATG AGGG-ATG GGGG-ATG GGGG-ATG GGGG-ATG GGGGGATG AGGG-ATG AGGG-ATG AGGG-ATG

ndhA

InDel 1 InDel 2 InDel 3 InDel 4 InDel 5 InDel 6 InDel 7 InDel 8 InDel 9

ABECANDSTNSYNTANTONUNSBUSLYSTU

TGGT--------------------------------------AATGTGGTTCTGTTTCACTAGAACCCCTCGCTTTTTGTTGTCTTGGAATGTGGTTCTGTTTCACTAGAACCCCTCGTTTTTTGTTGTCTTGGAATGTGGTTCTGTTTCATTAGAACCCCTC-TTTTTTGTTGTCTTGGAATGTGGTTCTGTTTCATTAGAACCCCTC-TTTTTTGTTGTCTTGGAATGTGGTTCTGTTTCACTAGAACCCCTCTTTTTTTGTTGTCTTGGAATGTGGTTCTGTTTCATTAGAACCCCT--TTTTTTGTTGTCTTGGAATGTGGTTCTGTTTCACTAGAACCCCTCGTTTTTTGTTGTCTTGGAATGTGGTTCTGTTTCACTAGAACCCCTCGTTTTTTGTTGTCTTGGAATGGGGTTCTGTTTCACTAGAACCCCTCGTTTTTTGTTGTCTTGGAATG

ATCT---------TCGGATCT---------TCGGATCT---------TCGGATTT---------TCGGATTT---------TCGGATTT---------TCGGATTT---------TCAGATCT---------TAGGATCTAATATATCTTAGGATCT---------TAGG

GTCT-GTAGGTCTTGTAGGTCTTGTAGGTCT-GTAGGTCT-GTAGGTCT-GTAGGTCT-GTAGATCTTGTAGATCTTGTAGATCTTGTAG

TTTAAATTGTTTA-ATTGTTTAAATTGTTTTTATTGTTTTTATTGTTTTTATTGTTTTTATTGTTTTAATTGTTTTAATTGTTTTAATTG

CCAAATCGATTTTCCGA-----TTTTCCAAATCGATTTTCCAAATCCATTTTCCAAATCCATTTTCCAAATCCATTTTCCAAATCCATTTTCCGA-----TTTTCCGA-----TTTTCCGA-----TTTT

AAAAA-GACTTAAAAAGGACTTAAAAA-GATTTAAAAAAGACTTAAAAAAGACTTAAAAA-GACTTAAAAAAGACTTAAAA--GACTTAAAA--GACTTAAAA--GACTT

rps16

InDel 1 InDel 2 InDel 3 InDel 4 InDel 5 InDel 6

ABECANDSTNSYNTANTONUNSBUSLYSTU

TTTT---------CTTCCTTTTTTCTTTTTTT-----------------------TTTTTTTTCTTCCTTTTTTCTTTTTTTTTTTTTTTTTCTTACTTTTTTCTTTTTTTGTTTTTTTTTCTTACTTTTTTCTTTTTTTGTTTTTTTT-CTTACTTTTTTCTTTTTTTTTTTTTTTT-CTTACTTTTTTCTTTTTTTGTTTTTTTTTTCTTCCTTTTTTCTTTTTTTGTTTTTT----------------------GTTTTTTTTTCTTCCTTTTTTCTTTTTTTGTTT

TTTTTGTT TTT--ATT TTTTTATT TTT--ATT TTT--ATT TTTTTATT TTT--ATT TTTT-ATT TTTT-ATT TTTT-ATT

TTTATTCTCTT--TCTCTT--TCTCTT--TCTCTT--TCTCTT--TTTCTT--TCTCTT--TCTCTT--TCTCTT--TCT

TTTTCGTTTT-CGTTTT-CGTTTT-CGTTTT-CGTTTT-CGTTTT-CGTTTT-CGTTTT-CGTTTT-CGT

TTT---ATTTTT---ATTTTT---ATTTTT---ATTTTT---ATTTTT---ATTTTT---ATTTTTTTTATTTTTTTTATTTTTTTTATT

TTTTTTT---GTACTTTTTTTTTGGTACTTTTTTTT--GTACTTTTTTT---GTACTTTTTTT---GTACTTTTTTT---GTACTTTTTTT---GTACTTTTTTT---GTACTTTTTTT---GTACTTTTTTT---GTAC

TAAGTAA TAAGTAATAA----TAAGTAA TAAGTAA TAAGTAA TAAGTAA TAAGTAA TAAGTAA TAAGTAA

InDel 10 InDel 11 InDel 12 InDel 13 InDel 14 InDel 1 InDel 2

ABECANDSTNSYNTANTONUNSBUSLYSTU

rpl32

TAAA--------TTGATAAAACGATAAATTGATAAA--------TTAATAAA--------TTTATAAA--------TTTATAAA--------TTTATAAA--------TTTATAAA--------TTGATAAA--------TTGATAAA--------TTGA

GAATAATTCAATTAGAATAAGAA--------------TAAGAA--------------TAAGAAAAATTCAATTAGAATAAGAAAAATTCAATTAGAATAAGAA--------------TAAGAA--------------TAAGAA--------------TAAGAA--------------TAAGAA--------------TAA

TCTATG-------------ATTG TCTATG-------------ATTGTCTATG-------------ATTGTCTATG-------------ATTGTCTATG-------------ATTGTCTATG-------------ATTGTCTATG-------------ATTG TCTATG-------------ATTGTCTATGTAAGATATCTATGATTG TCTATG-------------ATTG

GTAA-------AGGGAGTAATTTGTAAAGAGAGGAA-------AAAGAGTAA-------AGAGAGTAA-------AGAGAGTAA-------AGAGAGTAA-------AGAGAGTAA-------AGAGAGTAA-------AGAGAGTAA-------AGAGA

AAAAGAAAAAGAAAAAGAAAAAGAAAAAGAAAA-GAAAA-GAAAAGAAAAAAAAAAAGAA

--TACA--TACA--TAAA--TACA--TACA--TACA--TACA

---TACA

AAAAAAAAAAAAAAAAAAAAAAAAAAAATACAAA---TACA

clpP

InDel 1 InDel 2 InDel 3 InDel 4 InDel 5

ABECANDSTNSYNTANTONUNSBUSLYSTU

InDel 6

GGGGGGGGGTCAGGAGGTCAGGAGGTCAGGAGGTCAGGGGGGGG

TCGAAAATTCGAAAATTGGAAAATCGAAA TGGAAATGGAAAATCGAAA TAGAAAATCGAAA TGGAAAATCGAAA TGGAAAATCGAAA TAGTGGAAAATCGAAA

TTTTTTTTATTGTTTTTTTTATTGTTTTTTTTATTGTTTTTTTTATTGTTTTTTTT

ATTGTTTTTTTTTT

ATTGTTTTTTTTTT

CTTTTTCTTTTTCTTTTTCTTTTTCTTTTTCTTTTTCTTTTTCTTTTT

TTCTTATT

AATAGAAAGGAGGGGAATAGAAAGGAGGGGAATAGAAAGGAGGGGAATAGAAAGGAGGGG

AAGGAGGGGAATAGAAAGGAGGGGAATAGAAAGGGGGGGAATAGAAAGGAGGGG

GGGAATAGAAAGGAGGGG

ATCATACTGGAAAATC

ATGATGATGATAATGATGATAATG

TTTTTTTTTTTTTT

TTTTTCAAATTTTTCAAATTTTTCAAATTTTTCAAA

TTTGTTTTTGTTTTTGGTTTTGTTTTTTTTGTTTTTTTGTTTTTGTTTTTGTT

Figure 1 Continued

8 Advances in Bioinformatics

AAA-GGAAAA-GGAAAA-GGAAAA-GGAAAA-GGAAAA-GGAAAAAGGAAAA-GGAAAA-GGAAAA-GGA

AAA------TGA AAA------TGA AAA------TGA AAAATCAAATGA AAAATCAAATGA AAA------TGA AAA------TGA AAA------TGA AAA------TGA AAA------TGA

ATCCGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCC-----------------------------------------------------------------------CTGAT ATCCGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGACTCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCTGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCTGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTATTTTTTCTATAAAAAATGGAAGAGTTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTAGTTTTTCTATAAAAAATCGAAGAATTGGTGTGAATCCATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTAGTTTTTCTATAAAAAATCGAAGAATTGGTGTGAATCCATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTAGTTTTTCTATAAAAAATCGAAGAATTGGTGTGAATCCATTCTACATTGAAGAAAGAATCGAATATTCATTGAT

TTT----GAATTT----GAATTTTTTTGAATTT----GAATTT----GAATTT----GAATTT----GAATTT----GAATTT----GAATTT----GAA

tRNA-Leu(UAA)

InDel 1 InDel 2 InDel 3 InDel 4

ABE CANDSTNSYNTANTONUNSBUSLYSTU

---ATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTTGTGATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTT

GTAC------------------CTTGGTACATTCGATCTAATAAGTACCTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTG

AAGC------------------CTCA AAGCTAAGAAACTGAAAGAGGCCTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA

AAAG---------------------

AAAGGGTGGAAAGTGAG

GTAGAAATAGAAACTGGTAGAA------ACTGGTAGAAATAGAAACTGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACTGGTAGAAATTGAAACTGGTAGAAATAGAAACTG

ycf1

InDel 1 InDel 2 InDel 3 InDel 4 InDel 5

ABECANDSTNSYNTANTONUNSBUSLYSTU

CTTCTCCTTCCCTCTTCTCCTTCCCTCTTC------CCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCT

TTTT------------------------------------------------TCGG TTTTCAAGAGGGATCCACTGAAGAAGATCCTTATCCTTCTCCTTCCCCTTTTTCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG

AACAATATTAATACTAGAAAAATATTAATACTAGAACAATATTAATACTAGAATAATATTAATACTAGAATAATATTAATACTAGAACA---------CTAGAACAATATTAATACTAAAACAATATTAATACTAGAACAATATTAATACTAGAACAATATTAATACTAG

TAA------CACTAATAATAACACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CAC

AGA------CCTAGA------CCTAAA------ATTAGA------CCTAGA------CCTAGA------ACTAGA------CCTAGA------CCTAGAACAAGACCTAGA------CCT

InDel 6 InDel 7 InDel 8 InDel 9 InDel 10

ABECANDSTNSYNTANTONUNSBUSLYSTU

TAAA------------------------------------------AGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGTAAGAACG TCAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGCG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGACCAGGGAAGAGTG TAAATGTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGCG TAAATTTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGCG TAAATTTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGTG

CTGA---G CTGATGAG CTGC---G CTGA---T CTGA---T CTAA---T CTGA---T CTGA---G CTGA---G CTGA---G

GGTCAAAGAGACCAACGAGCCCGACGATACCAAAGATACCAAAGATACCAAAGATACCAAAGAG------GAG------GAG------GA

GA------------------------------GGAAGA------------------------------GGAAGA------------------------------GGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGAGACTGATACCGCAGATCAAGATCAAACAAAGGAAGA------------------------------AGAAGA------------------------------GGAAGA------------------------------GGAA

AAGCAAAAAGCAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAG

CTCT---------------AAAACTTTTATAACTATAACTCGAAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAA

InDel 11 InDel 13 InDel 14 InDel 15 InDel 16

ABECANDSTNSYNTANTONUNSBUSLYSTU

CGTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAA---------------------------------------ATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCC CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAAGTCCTAATAAAACAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAACTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAACTCGAATCG

TCAAGAAAAAATTAATAATAAAAAAA TCAAGAAAAAATTAATAATAAAAACA TCAAGAAAAAATAAAAAAAAAAAAAA TCAAGAAAAAATTAATAATCAAAAAA TCAAGAAAAAATTAATAATCAAAAAA TCAA------------------AAAA TCAAGAAAAAATTAATAATCAAAAAA TCAAGAAAAAATTACTAATAAAAAAA TCAAGAAAAAATTAATAATAAAAAAA TCAAGAAAAAATTACTAATAAAAAAA

TTAT------TTCGACTTTATCTTTATTTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTATCTTTATTTCGACTTTATGTTTATTTCGACTTTATCTTTATTTCGACT

CTAAAAAAGTCACTTTATAATATT---------AGTAATAAAAA ATAAAAAAGTCACTTTATAATATTTATAATATTAGTAAGAAAAA ATAA------------------------------------AAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGCCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA

InDel 17 InDel 18 InDel 19 InDel 20

ABECANDSTNSYNTANTONUNSBUSLYSTU

AATATAAATTTAAAATAGAAACTTAAAATCGAAATTTAAAAT------TTAAAAT------TTAAAAT------TTAAAAT------TTAAAATAGAAACTTAAAATAGAAACTTAAAATAGAAACTTAA

CTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTTGAAGTTCAAGCTTT------CAAG

ATAT------ATAGACATCTACATATAGAGAT------AGAGATAT------ATAGATAT------ATAGATAT------ATAGATAT------ATAGAGAT------ATAGAGAT------AGAGAGAT------ATAG

ATAT---------TTTG ATATAGAAAATATTTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG

GAGAAAATTGATAAAAAAT---------AAAAGAGAAAATTGACAAAAGATAAAATTGATAAAAGATAAAATTGATAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAA

AACATTAATAAAAACCAAAATATCAATAAAAACAACAAC------------CAAAACATCAATAAAAACCAAAACATCAATAAAAACCAAAACA------------AAAACATCAATAAAAACAAAAAC------------CAAAAC------------CAAAAC------------CAA

AAACAAAACTT

AACTT

------CTT

AAAAAAAACTTAAACAAACCTTAAACAAAACTTAAACAAAACTTAAACAAAACTTAAACAAAAACAAAACTTAAAAACAAAACTT

InDel 21 InDel 22 InDel 23 InDel 24 InDel 25 InDel 26 InDel 27

ABECANDSTNSYNTANTONUNSBUSLYSTU

TAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAAAATAAAGAATTAAA------GAAT

TCAAGGAGAAAGAGTCAAGGAAAAAGAGTCAAAGAGAAAGAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAA------AGAGTCAA------AGAGTCAA------AGAG

AAAGAATTTTGATGAATCCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAA---------------------------------------------------------TCAT AAAGAATTTTGATGAATTCATTCTGCAACCGCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAATCATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT

TAAA------GAGATAAA------GATATAATTATAACGAGATAAA------GAGATAAA------GAGATAAA------GAGATAAA------GAGATAAA------GATATAAA------GATATAAA------GATA

InDel 28 InDel 29 InDel 30 InDel 31

ABECANDSTNSYNTANTONUNSBUSLYSTU

--AGATAATAATTGCATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATAGAACTCTCTTCGCTTTTCTGATTGATATATAAAATA-------------------------------------------------------------------------------------------------------------ATA--AGATAATAATTGAATAATTTAACGATTTACCTTTCATATTTGATATTGATTCGCTCACCAATCAATACGTAATGGAACTCTATTCGCTTTTCTGATTGATATAGAAAATA--AGATAATAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--AGATAATAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA----ATC-TAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--------TAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--------TAATTGAATAATTGAACGATTTACCTTTCCTATTTGATATTGATTAGCTCACCAATCCATATGTAACGGAACTCTATTCGCTTTTCTGATTGATATATAAAATATTCTTCAATTATTATCTAATTGAACAATTTCCCTTTCCTCTTTGATATTGATTAGCTCACCAATCCATATATAACATAACTCTATTCGCTTTTCTGATTGATATAGAAAATA--AGATAATAATTGAATAATTGAACGATTTACCTTTCCTATTTGATATTGATTAGCTCACCAATCCATATGTAACGGAACTCTATTCGCTTTTCTGATTGATATATAAAATA

InDel 1

ABECANDSTNSYNTANTONUNSBUSLYSTU

ATT----------------------------------------------------------------------------------------------------------------------------------ACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAAGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAAGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACC

tRNA-Ala(UGC)

InDel 1

ABECANDSTNSYNTANTONUNSBUSLYSTU

InDel 12

AAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGAGGAAAGTGAGGAAGAAAGAGAT

AGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGAT

GAAGAAAAAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGAT

Figure 1 Partial multiple sequence alignment of accD clpP ndhA rpl32 rps16 sprA tRNA-Ala (UGC) tRNA-Leu(UAA) and ycf1 genesequences of ten solanaceous species showing location of InDels indicated by hyphens


Figure 2: Partial multiple sequence alignment of amino acid sequences of genes, namely, accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala(UGC), tRNA-Leu(UAA), and ycf1, of ten solanaceous species, showing the location of InDels (indicated by hyphens). Species labels in the figure: ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, STU.

protein sequences, as shown in Figure 2. The accD gene has been reported to be one of the most variable plastid genes and is probably under diversifying selection [26].

(2) clpP. In the clpP gene, InDels were found in both intron and exon regions. The InDels in the exon regions had two major consequences. An insertion of 6 bp in S. bulbocastanum and S. tuberosum and of 30 bp in S. lycopersicum at the 3′ end (exon 3) of the clpP gene shifted the stop codon downstream by 6, 6, and 30 bp in the respective species compared with the other species of the Solanaceae family, increasing the length of the coding sequence and of the protein product (Figures 1 and 2). An interesting feature was observed as InDel 1 in the protein sequence, corresponding to the insertion of a repeat of "I" amino acids in D. stramonium, which makes the exon 3 region longer by 6 bp. This region, however, corresponds to the end of intron 2 of the clpP gene in all other species. Since the D. stramonium chloroplast genome has been sequenced only recently, this observation needs to be experimentally validated.

(3) ndhA. All the InDels found in ndhA were present in introns, while the protein-coding regions (exons) were highly conserved. This indicates high diversifying selection on the intronic region of this gene. Of the 14 InDels in total, most were observed with respect to C. annuum (InDels 1, 2, 5, 7, and 10). InDel 10 was shared by C. annuum and S. lycopersicum in full and by C. annuum and A. belladonna in part.
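The InDels discussed in this section are read off the alignment as runs of hyphens. A minimal sketch of how such gap runs could be located in a single aligned row is given below. The function name `gap_runs` and the two toy rows are illustrative assumptions; they are not the authors' pipeline and are not taken from the paper's alignment.

```python
import re

def gap_runs(aligned_row):
    """Return (start, length) for every run of '-' in one aligned sequence row.

    Positions are 0-based alignment columns.
    """
    return [(m.start(), len(m.group())) for m in re.finditer(r"-+", aligned_row)]

# Hypothetical aligned rows of equal width (illustration only).
rows = {
    "species_A": "TTT---GAATTC",
    "species_B": "TTTTTTGAA-TC",
}

for name, row in rows.items():
    print(name, gap_runs(row))
```

Comparing the gap runs of each row against an exon/intron annotation of the same alignment columns would then indicate whether a given InDel falls in a coding or a noncoding region.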

(4) rpl32. In rpl32, an insertion of 1 bp in D. stramonium and of 3 bp in C. annuum was found in the 3′ region of the gene, while a deletion of 4 bp was observed in D. stramonium. The insertion of 3 bp in C. annuum only altered the length of the protein, making it longer by 1 amino acid. However, the small insertion of 1 bp in D. stramonium proved to be a frameshift mutation, resulting in three changes in the amino acid sequence near the C-terminus. Moreover, the deletion of 4 bp at the 3′ end resulted in premature termination of protein synthesis. Together, the frameshift mutation and the 3′ end deletion reduced the gene product length by 1 amino acid. As the C-terminus of the amino acid chain is well conserved in all the other species, the effect of the above-mentioned variations needs to be validated experimentally.
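The contrast between an in-frame 3 bp insertion and a 1 bp frameshift insertion can be illustrated with a short translation sketch. The sequences below are hypothetical toy examples, not the rpl32 sequences of any species, and the helper `translate` is an assumption made for illustration. The codon table is the standard genetic code written in the usual TCAG order.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (first base slowest).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(cds):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical toy coding sequence (illustration only).
reference = "ATGAAACGTCTGGGTTAA"
in_frame = reference[:9] + "GCT" + reference[9:]   # 3 bp insertion, frame kept
frameshift = reference[:6] + "A" + reference[6:]    # 1 bp insertion, frame lost

print(translate(reference))    # MKRLG
print(translate(in_frame))     # MKRALG
print(translate(frameshift))   # MKTSGL
```

In the toy case the 3 bp insertion adds one residue and leaves the rest unchanged, whereas the 1 bp insertion changes every residue downstream of the insertion and removes the original stop codon from the reading frame.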

(5) rps16. In rps16 also, InDels were observed in introns as well as exons. Five of the major insertions in the intron regions were species specific. Insertions of 38 bp (InDel 1), 9 bp (InDel 2), 5 bp (InDel 7), 4 bp (InDel 8), and 6 bp (InDel 9) were observed in A. belladonna, S. lycopersicum, D. stramonium, and C. annuum. A deletion of 5 bp was observed in all three Solanum species and in C. annuum. A deletion of 9 bp was observed in all Nicotiana species, resulting in an amino acid change (P to S) and shortening of the protein by three amino acids in the C-terminal region. A similar deletion has also been observed by Kahlau et al. [27] and was suggested to be functionally neutral.

(6) sprA. The sprA gene has been reported as a stable noncoding RNA of unknown function. This gene has been suggested to influence 16S rRNA maturation [38, 39]. In many species, this gene seems to be present as a remnant and shows large variations in its 5′ region. The largest deletion, of 109 bp, was observed in C. annuum. The rest of this gene appears to be more conserved, with a deletion towards the 3′ end in all Nicotiana species and A. belladonna. The manner in which this gene functions and the consequences of the above-mentioned variations are yet to be investigated experimentally.

(7) trnA-UGC. In this particular gene, a long deletion of 102 bp was observed in all Nicotiana species. Interestingly, this deletion was further extended to 130 bp in both directions in A. belladonna. These deletions were found in the intron region and so are unlikely to have any negative impact on gene product function.

(8) trnL-UAA. The trend of variation in trnL-UAA was similar to that in ndhA, as all InDels were observed in introns. The longest species-specific deletion (InDel 3) was observed in C. annuum, whereas a short insertion of four nucleotides, a repeat of "T", was observed specifically in D. stramonium. Another insertion of 6 bp was observed in two Nicotiana species, that is, N. sylvestris and N. tabacum.

(9) ycf1. Many InDels (3′ region) were found in the fastest evolving gene, that is, the ycf1 gene. Most of the InDels were found to be species specific. The maximum number of InDels (InDels 2, 3, 5, 7, 9, 16, 17, 19, 23, 24, 25, and 30) was observed in C. annuum, followed by D. stramonium (InDels 4, 6, 20, 26, and 31), N. tomentosiformis (InDels 8, 17, and 18), and S. lycopersicum (InDels 16, 22, and 28). Two genus-specific InDels (InDels 14 and 21) were observed in all four Nicotiana species. However, InDel 19 was also present in D. stramonium. Another genus-specific InDel (InDel 29) was observed in Solanum species. Most of the InDels altered the length of the gene product, with the maximum length of 1906 amino acids (aa) observed in C. annuum and the minimum of 1873 aa observed in D. stramonium. Among the Solanum species, the length of the protein (1887 aa) was conserved between S. bulbocastanum and S. tuberosum. However, S. lycopersicum had an amino acid sequence of 1891 aa, larger by 4 aa than that of the other two species of the same genus. Among the four Nicotiana species, the ycf1 gene product length was nearly conserved among N. sylvestris, N. tabacum, and N. undulata, which have protein lengths of 1901 aa, 1902 aa, and 1901 aa, respectively. However, N. tomentosiformis was observed to be the most variable member of the genus Nicotiana, with a protein length of 1892 aa.
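The length comparisons above can be tabulated directly. The sketch below uses only the ycf1 protein lengths stated in the text (no value is given there for A. belladonna, so it is omitted); the helper name `genus_spread` is an illustrative assumption.

```python
# ycf1 protein lengths (aa) as stated in the text; A. belladonna is not given there.
ycf1_aa = {
    "Capsicum annuum": 1906,
    "Datura stramonium": 1873,
    "Solanum bulbocastanum": 1887,
    "Solanum tuberosum": 1887,
    "Solanum lycopersicum": 1891,
    "Nicotiana sylvestris": 1901,
    "Nicotiana tabacum": 1902,
    "Nicotiana undulata": 1901,
    "Nicotiana tomentosiformis": 1892,
}

def genus_spread(lengths):
    """Map each genus to (min, max, max - min) of its protein lengths."""
    by_genus = {}
    for species, n in lengths.items():
        by_genus.setdefault(species.split()[0], []).append(n)
    return {g: (min(v), max(v), max(v) - min(v)) for g, v in by_genus.items()}

overall_spread = max(ycf1_aa.values()) - min(ycf1_aa.values())
print("overall spread:", overall_spread, "aa")
for genus, (lo, hi, diff) in genus_spread(ycf1_aa).items():
    print(genus, lo, hi, diff)
```

With these values the overall spread is 33 aa, the spread within Solanum is 4 aa, and the spread within Nicotiana is 10 aa, the latter driven by N. tomentosiformis.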

3.5. Phylogenetic Analysis of Solanaceous Plastomes. Evolutionary relationships between diverse plant species have been analyzed using several plastome markers, including matK and rbcL (genes) or trnH-psbA and trnL-trnF (intergenic regions), owing to sequence conservation among plant taxa blended with suitable variation [40, 41]. However, determination of phylogeny based on single gene sequences may be inaccurate [42]. The availability of complete chloroplast sequences for many species has made it possible to use many individual genes or concatenated gene sequences to deduce phylogenetic relationships among taxa [42–44].

Efforts have been made to carry out phylogenetic analysis of solanaceous species using complete plastome sequences by Moore et al. [44] and Jansen et al. [45]. Evolutionary positions of Capsicum and Datura in Solanaceae have been determined using a single or a few plastid genes [46, 47]. Recently, concatenated protein-coding gene sequences from completely sequenced plastomes were used to obtain reasonable phylogenetic relationships for solanaceous species [29]. In the present investigation, we used a similar approach to analyze the phylogenetic relationships of ten solanaceous species with completely sequenced plastomes. Individual multiple sequence alignments were concatenated for maximum likelihood phylogenetic tree generation. As depicted in Figure 3, taxa were divided into two clades with a bootstrap value of 100% (500 replicates). The first clade consisted of the four Nicotiana species, while the species of Solanum, Capsicum, Atropa, and Datura were included in the second clade. These results are in line with previous phylogenetic analyses using concatenated protein-coding gene sequences as well as phylogenetic analyses using plastid ndhF and trnL-F sequences [29, 47]. However, in an analysis of 13 ORFs of solanaceous plastomes, a different arrangement was shown in which Atropa was separated from Solanum and grouped together with Nicotiana [25].
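The concatenation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' actual procedure or software: the function name `concatenate_alignments`, the gene names, and the toy sequences are hypothetical, and padding a taxon that lacks a gene with gap characters is one possible convention among several.

```python
def concatenate_alignments(gene_alignments, missing="-"):
    """Join per-gene alignments into one supermatrix.

    gene_alignments: dict mapping gene name -> {taxon: aligned sequence}.
    A taxon absent from a gene is padded with `missing` to that gene's width.
    """
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    parts = {t: [] for t in taxa}
    for gene, aln in gene_alignments.items():
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        width = widths.pop()
        for t in taxa:
            parts[t].append(aln.get(t, missing * width))
    return {t: "".join(p) for t, p in parts.items()}

# Hypothetical per-gene alignments (illustration only).
genes = {
    "geneA": {"NTA": "ATG-CA", "SLY": "ATGGCA"},
    "geneB": {"NTA": "TTAA", "SLY": "TT-A"},
}
supermatrix = concatenate_alignments(genes)
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

The resulting supermatrix, one row per taxon with all genes joined in a fixed order, is the kind of input a maximum likelihood tree-building program would then take.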

4. Conclusions

The analyses of complete plastid genomes of ten solanaceousspecies revealed overall similarity in terms of the gene contentand organization The sizes of LSC SSC and IR regions were


Figure 3: Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species. (Tip labels in the original figure: CAR, DCA, NTO, NUN, NSY, NTA, ABE, DST, CAN, SLY, SBU, STU.)

found to be somewhat conserved among species, but significant variation was found between genera. Most of the coding regions were well conserved. However, many features in a few genes were observed to be typical of a particular genus or even species, and these can be used as molecular markers to investigate genetic diversity and evolution. These typical variations can be further utilized to develop more sophisticated DNA barcoding based techniques. The ten solanaceous species are divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from coding regions of plastomes. The first clade consisted of four Nicotiana species, and the second clade consisted of species of Solanum, Capsicum, Atropa, and Datura.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors are thankful to University Grants Commission, New Delhi, for providing MANF fellowship to Harpreet Kaur.

References

[1] M. G. Bausher, N. D. Singh, S.-B. Lee, R. K. Jansen, and H. Daniell, “The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var. ‘Ridge Pineapple’: organization and phylogenetic relationships to other angiosperms,” BMC Plant Biology, vol. 6, no. 1, article 21, 2006.

[2] Z. Y. Hu, W. Hua, S. M. Huang, and H. Z. Wang, “Complete chloroplast genome sequence of rapeseed (Brassica napus L.) and its evolutionary implications,” Genetic Resources and Crop Evolution, vol. 58, no. 6, pp. 875–887, 2011.

[3] J. D. Palmer, “Comparative organization of chloroplast genomes,” Annual Review of Genetics, vol. 19, no. 1, pp. 325–354, 1985.

[4] R. G. Olmstead and J. D. Palmer, “Chloroplast DNA systematics: a review of methods and data analysis,” The American Journal of Botany, vol. 81, no. 9, pp. 1205–1224, 1994.

[5] R. A. Bungard, “Photosynthetic evolution in parasitic plants: insight from the chloroplast genome,” BioEssays, vol. 26, no. 3, pp. 235–247, 2004.

[6] L. A. Raubeson and R. K. Jansen, “Chloroplast genomes of plants,” in Diversity and Evolution of Plants-Genotypic and Phenotypic Variation in Higher Plants, R. Henry, Ed., pp. 45–68, CABI Publishing, Wallingford, UK, 2005.

[7] R. C. Haberle, H. M. Fourcade, J. L. Boore, and R. K. Jansen, “Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes,” Journal of Molecular Evolution, vol. 66, no. 4, pp. 350–361, 2008.

[8] W. Martin and R. G. Herrmann, “Gene transfer from organelles to the nucleus: how much, what happens, and why?” Plant Physiology, vol. 118, no. 1, pp. 9–17, 1998.

[9] H. L. Race, R. G. Herrmann, and W. Martin, “Why have organelles retained genomes?” Trends in Genetics, vol. 15, no. 9, pp. 364–370, 1999.

[10] T. Wakasugi, T. Tsudzuki, and M. Sugiura, “The genomics of land plant chloroplasts: gene content and alteration of genomic information by RNA editing,” Photosynthesis Research, vol. 70, no. 1, pp. 107–118, 2001.

[11] Y.-K. Kim, C.-W. Park, and K.-J. Kim, “Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications,” Molecules and Cells, vol. 27, no. 3, pp. 365–381, 2009.

[12] A. Khan, I. A. Khan, H. Asif, and M. K. Azim, “Current trends in chloroplast genome research,” African Journal of Biotechnology, vol. 9, no. 24, pp. 3494–3500, 2010.

[13] H. Shimada and M. Sugiura, “Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes,” Nucleic Acids Research, vol. 19, no. 5, pp. 983–995, 1991.

[14] M. Sugiura, “The chloroplast genome,” Plant Molecular Biology, vol. 19, no. 1, pp. 149–168, 1992.

[15] M. E. Cosner, R. K. Jansen, J. D. Palmer, and S. R. Downie, “The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families,” Current Genetics, vol. 31, no. 5, pp. 419–429, 1997.

[16] L. Gao, X. Yi, Y.-X. Yang, Y.-J. Su, and T. Wang, “Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes,” BMC Evolutionary Biology, vol. 9, no. 1, article 130, 2009.


[17] T. Kato, T. Kaneko, S. Sato, Y. Nakamura, and S. Tabata, “Complete structure of the chloroplast genome of a legume, Lotus japonicus,” DNA Research, vol. 7, no. 6, pp. 323–330, 2000.

[18] B. Goffinet, N. J. Wickett, O. Werner, R. M. Ros, A. J. Shaw, and C. J. Cox, “Distribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta),” Annals of Botany, vol. 99, no. 4, pp. 747–753, 2007.

[19] J. D. Palmer, J. M. Nugent, and L. A. Herbon, “Unusual structure of geranium chloroplast DNA: a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families,” Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 3, pp. 769–773, 1987.

[20] S. Greiner, X. Wang, U. Rauwolf et al., “The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution,” Nucleic Acids Research, vol. 36, no. 7, pp. 2366–2378, 2008.

[21] J. J. Doyle, J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson, “Chloroplast DNA inversions and the origin of the grass family (Poaceae),” Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 16, pp. 7722–7726, 1992.

[22] F. A. Michelangeli, J. I. Davis, and D. W. Stevenson, “Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes,” The American Journal of Botany, vol. 90, no. 1, pp. 93–106, 2003.

[23] E. Bortiri, D. Coleman-Derr, G. R. Lazo, O. D. Anderson, and Y. Q. Gu, “The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes,” BMC Research Notes, vol. 1, article 61, 2008.

[24] C. H. Leseberg and M. R. Duvall, “The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals,” Journal of Molecular Evolution, vol. 69, no. 4, pp. 311–318, 2009.

[25] H. J. Chung, J. D. Jung, H. W. Park et al., “The complete chloroplast genome sequences of Solanum tuberosum and comparative analysis with Solanaceae species identified the presence of a 241-bp deletion in cultivated potato chloroplast DNA sequence,” Plant Cell Reports, vol. 25, no. 12, pp. 1369–1379, 2006.

[26] H. Daniell, S.-B. Lee, J. Grevich et al., “Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes,” Theoretical and Applied Genetics, vol. 112, no. 8, pp. 1503–1518, 2006.

[27] S. Kahlau, S. Aspinall, J. C. Gray, and R. Bock, “Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes,” Journal of Molecular Evolution, vol. 63, no. 2, pp. 194–207, 2006.

[28] M. Yukawa, T. Tsudzuki, and M. Sugiura, “The chloroplast genome of Nicotiana sylvestris and Nicotiana tomentosiformis: complete sequencing confirms that the Nicotiana sylvestris progenitor is the maternal genome donor of Nicotiana tabacum,” Molecular Genetics and Genomics, vol. 275, no. 4, pp. 367–373, 2006.

[29] Y. D. Jo, J. Park, J. Kim et al., “Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome,” Plant Cell Reports, vol. 30, no. 2, pp. 217–229, 2011.

[30] C. Schmitz-Linneweber, R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier, “The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation,” Molecular Biology and Evolution, vol. 19, no. 9, pp. 1602–1612, 2002.

[31] M. Kunnimalaiyaan and B. L. Nielsen, “Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA,” Nucleic Acids Research, vol. 25, no. 18, pp. 3681–3686, 1997.

[32] G. Thyssen, Z. Svab, and P. Maliga, “Cell-to-cell movement of plastids in plants,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 7, pp. 2439–2443, 2012.

[33] D. Gargano, A. Vezzi, N. Scotti et al., “The complete nucleotide sequence genome of potato (Solanum tuberosum cv. Desiree) chloroplast DNA,” in Book of Abstracts of the 2nd Solanaceae Genome Workshop, p. 107, 2005.

[34] K.-J. Kim and H.-L. Lee, “Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants,” DNA Research, vol. 11, no. 4, pp. 247–261, 2004.

[35] D. A. Steane, “Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae),” DNA Research, vol. 12, no. 3, pp. 215–220, 2005.

[36] J. D. Palmer, “Cell culture and somatic cell genetics of plants,” in The Molecular Biology of Plastids, R. G. Hermann, Ed., pp. 5–53, Springer, Vienna, Austria, 1991.

[37] M. Sugiura, K. Shinozaki, M. Tanaka et al., “Split genes and cis/trans splicing in tobacco chloroplasts,” in Plant Molecular Biology, D. von Wettstein and N. H. Chua, Eds., pp. 65–76, Plenum Press, New York, NY, USA, 1987.

[38] A. Vera and M. Sugiura, “A novel RNA gene in the tobacco plastid genome: its possible role in the maturation of 16S rRNA,” The EMBO Journal, vol. 13, no. 9, pp. 2211–2217, 1994.

[39] M. Sugita, Z. Svab, P. Maliga, and M. Sugiura, “Targeted deletion of sprA from the tobacco plastid genome indicates that the encoded small RNA is not essential for pre-16S rRNA maturation in plastids,” Molecular and General Genetics, vol. 257, no. 1, pp. 23–27, 1997.

[40] R. Lahaye, M. van der Bank, D. Bogarin et al., “DNA barcoding the floras of biodiversity hotspots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 8, pp. 2923–2928, 2008.

[41] P. Taberlet, L. Gielly, G. Pautou, and J. Bouvet, “Universal primers for amplification of three non-coding regions of chloroplast DNA,” Plant Molecular Biology, vol. 17, no. 5, pp. 1105–1109, 1991.

[42] X. Guo, S. Castillo-Ramírez, V. González et al., “Rapid evolutionary change of common bean (Phaseolus vulgaris L.) plastome and the genomic diversification of legume chloroplasts,” BMC Genomics, vol. 8, article 228, 2007.

[43] R. K. Jansen, C. Kaittanis, C. Saski et al., “Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids,” BMC Evolutionary Biology, vol. 6, no. 1, article 32, 2006.

[44] M. J. Moore, P. S. Soltis, C. D. Bell, J. G. Burleigh, and D. E. Soltis, “Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4623–4628, 2010.

[45] R. K. Jansen, Z. Cai, L. A. Raubeson et al., “Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 49, pp. 19369–19374, 2007.

[46] L. Bohs and R. G. Olmstead, “Phylogenetic relationships in Solanum (Solanaceae) based on ndhF sequences,” Systematic Botany, vol. 22, no. 1, pp. 5–17, 1997.

[47] R. G. Olmstead, L. Bohs, H. A. Migid, E. Santiago-Valentin, V. F. Garcia, and S. M. Collier, “A molecular phylogeny of the Solanaceae,” Taxon, vol. 57, no. 4, pp. 1159–1181, 2008.


Figure 1: Continued. (Sequence alignment panels showing InDels, with panel labels SprA, accD, ndhA, rps16, rpl32, and clpP; rows in each panel are ordered ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, STU.)


(Sequence alignment panels, continued: InDels in tRNA-Leu(UAA), ycf1, and additional regions; rows in each panel are ordered ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, STU.)

tRNA-Ala(UGC)

InDel 1

ABECANDSTNSYNTANTONUNSBUSLYSTU

InDel 12

AAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGAGGAAAGTGAGGAAGAAAGAGAT

AGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGAT

GAAGAAAAAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGAT

Figure 1 Partial multiple sequence alignment of accD clpP ndhA rpl32 rps16 sprA tRNA-Ala (UGC) tRNA-Leu(UAA) and ycf1 genesequences of ten solanaceous species showing location of InDels indicated by hyphens


Figure 2: Partial multiple sequence alignment of amino acid sequences of genes, namely, accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala(UGC), tRNA-Leu(UAA), and ycf1, of ten solanaceous species showing location of InDels (indicated by hyphens). Species labels: ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU.

protein sequences as shown in Figure 2. The accD gene has been reported to be one of the most variable plastid genes and is probably under diversifying selection [26].

(2) clpP. In the clpP gene, InDels were found in both intron and exon regions. Two major consequences were observed for the InDels in the exon regions. An insertion of 6 bp in S. bulbocastanum and S. tuberosum and of 30 bp in S. lycopersicum at the 3′ end (exon 3) of the clpP gene shifted the stop codon downstream by 6, 6, and 30 bp in the respective species compared to other species of the Solanaceae family, increasing the length of the coding sequence and the protein product (Figures 1 and 2). An interesting feature was observed as InDel 1 in the protein sequence, corresponding to insertion of a repeat of “I” amino acids in D. stramonium, making the exon 3 region longer by 6 bp. This region, however, corresponds to the end of intron 2 in the clpP gene in all other species. Since the D. stramonium chloroplast genome has been sequenced recently, this observation needs to be experimentally validated.

(3) ndhA. All the InDels found in ndhA were present in introns, while the protein-coding regions (exons) were highly conserved. This indicates high diversifying selection on the intronic region of this gene. Out of the total 14 InDels, most were observed with respect to C. annuum (InDels 1, 2, 5, 7, and 10). InDel 10 was observed to be shared by C. annuum and S. lycopersicum in full and by C. annuum and A. belladonna in part.
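
As a minimal illustration of how InDels of this kind can be read off an alignment, the following Python sketch scans aligned sequences for runs of hyphens. The function name find_gap_runs and the labels species_A and species_B are hypothetical, and the two short sequences are toy strings in the style of Figure 1 rather than data from this analysis; the sketch does not represent the software used in this study.

```python
import re

def find_gap_runs(aligned_seq):
    """Return (start, length) for each run of '-' in one aligned sequence.

    Positions are 0-based alignment columns (an assumption of this sketch).
    """
    return [(m.start(), m.end() - m.start())
            for m in re.finditer(r"-+", aligned_seq)]

# Toy two-row alignment (hypothetical labels and sequences).
alignment = {
    "species_A": "AAA------TGA",  # carries a 6-column gap
    "species_B": "AAAATCAAATGA",  # no gap
}

for name, seq in alignment.items():
    print(name, find_gap_runs(seq))
```

A gap run in one row corresponds to a deletion in that species or, equivalently, an insertion in the others; deciding between the two requires outgroup or phylogenetic information that this sketch does not use.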

(4) rpl32. In rpl32, an insertion of 1 bp in D. stramonium and of 3 bp in C. annuum was found in the 3′ region of the gene, while a deletion of 4 bp was observed in D. stramonium. The insertion of 3 bp in C. annuum only altered the length of the protein, making it longer by 1 amino acid. However, the small insertion of 1 bp in D. stramonium proved to be a frameshift mutation, resulting in three changes in the amino acid sequence near the C-terminus. Moreover, the deletion of 4 bp at the 3′ end resulted in premature termination of protein synthesis. Together, the frameshift mutation and the 3′ end deletion reduced the gene product length by 1 amino acid. As the C-terminus of the amino acid chain is well conserved in all the other species, the effect of the abovementioned variations needs to be validated experimentally.
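
The length bookkeeping behind these rpl32 observations can be sketched in Python as follows. The function net_length_change is a hypothetical helper introduced only for illustration; it sums signed InDel lengths and ignores where stop codons actually fall, so it is a consistency check on the reported lengths rather than a model of translation.

```python
def net_length_change(indels_bp):
    """Summarize signed InDel lengths (insertions > 0, deletions < 0).

    Returns (net change in bp, whether the net change keeps the reading
    frame, net change in whole codons or None if the frame is shifted).
    """
    net_bp = sum(indels_bp)
    in_frame = net_bp % 3 == 0
    codons = net_bp // 3 if in_frame else None
    return net_bp, in_frame, codons

# C. annuum: a single 3 bp insertion.
print(net_length_change([+3]))
# D. stramonium: 1 bp insertion combined with a 4 bp deletion.
print(net_length_change([+1, -4]))
# A 1 bp insertion on its own shifts the reading frame.
print(net_length_change([+1]))
```

Under these assumptions the 3 bp insertion adds one codon, while the 1 bp insertion and 4 bp deletion together remove 3 bp, which agrees with the reported product being shorter by 1 amino acid.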

(5) rps16. In rps16 also, InDels were observed in introns as well as exons. Five of the major insertions in the intron regions were species specific. Insertions of 38 bp (InDel 1), 9 bp (InDel 2), 5 bp (InDel 7), 4 bp (InDel 8), and 6 bp (InDel 9) were observed in A. belladonna, S. lycopersicum, D. stramonium, and C. annuum. A deletion of 5 bp was observed in all three Solanum species and C. annuum. A deletion of 9 bp was observed in all Nicotiana species, resulting in an amino acid change (P to S) and shortening of the protein by three amino acids in the C-terminal region. A similar deletion has also been observed by Kahlau et al. [27] and was suggested to be functionally neutral.

(6) sprA. The sprA gene has been reported as a stable noncoding RNA of unknown function. This gene has been suggested to influence 16S rRNA maturation [38, 39]. In many species, this gene seems to be present as a remnant and shows large variations in its 5′ region. The largest deletion, of 109 bp, was observed in C. annuum. The rest of this gene appears to be more conserved, with a deletion towards the 3′ end in all Nicotiana species and A. belladonna. The manner in which this gene functions and the consequences of the abovementioned variations are yet to be investigated experimentally.

(7) trnA-UGC. In this particular gene, a long deletion of 102 bp was observed in all Nicotiana species. Interestingly, this deletion was further extended to 130 bp in both directions in A. belladonna. These deletions were found in the intron region and so are unlikely to have any negative impact on gene product function.

(8) trnL-UAA. The trend of variation in trnL-UAA was similar to that in ndhA, as all InDels were observed in introns. The longest species-specific deletion (InDel 3) was observed in C. annuum, whereas a short insertion of four nucleotides, a repeat of “T”, was observed specifically in D. stramonium. Another insertion of 6 bp was observed in two Nicotiana species, that is, N. sylvestris and N. tabacum.

(9) ycf1. Many InDels (3′ region) were found in the fastest evolving gene, that is, the ycf1 gene. Most of the InDels were found to be species specific. The largest number of InDels (InDels 2, 3, 5, 7, 9, 16, 17, 19, 23, 24, 25, and 30) was observed in C. annuum, followed by D. stramonium (InDels 4, 6, 20, 26, and 31), N. tomentosiformis (InDels 8, 17, and 18), and S. lycopersicum (InDels 16, 22, and 28). Two genus-specific InDels (InDels 14 and 21) were observed in all four Nicotiana species. However, InDel 19 was also present in D. stramonium. Another genus-specific InDel (InDel 29) was observed in Solanum species. Most of the InDels altered the length of the gene product, with the maximum length of 1906 amino acids (aa) observed in C. annuum and the minimum of 1873 aa observed in D. stramonium. Among the Solanum species, the length of the protein (1887 aa) was conserved between S. bulbocastanum and S. tuberosum. However, S. lycopersicum had an amino acid sequence of 1891 aa, longer by 4 aa than those of the other two species of the same genus. Among the four Nicotiana species, the ycf1 gene product length was conserved among N. sylvestris, N. tabacum, and N. undulata, with protein lengths of 1901 aa, 1902 aa, and 1901 aa, respectively. However, N. tomentosiformis was observed to be the most variable member of the genus Nicotiana, with a protein length of 1892 aa.
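
The ycf1 protein lengths quoted above can be tabulated and compared with a few lines of Python. This is only a worked arithmetic sketch: the dictionary holds the nine lengths stated in the text (no length is given here for A. belladonna, so it is omitted), and the helper genus_spread, which selects species by their abbreviated genus prefix, is a hypothetical convenience.

```python
# ycf1 protein lengths (aa) as stated in the text.
ycf1_aa = {
    "C. annuum": 1906,
    "D. stramonium": 1873,
    "S. bulbocastanum": 1887,
    "S. tuberosum": 1887,
    "S. lycopersicum": 1891,
    "N. sylvestris": 1901,
    "N. tabacum": 1902,
    "N. undulata": 1901,
    "N. tomentosiformis": 1892,
}

longest = max(ycf1_aa, key=ycf1_aa.get)
shortest = min(ycf1_aa, key=ycf1_aa.get)
overall_spread = ycf1_aa[longest] - ycf1_aa[shortest]

def genus_spread(prefix):
    """Difference between longest and shortest product within one genus."""
    lengths = [aa for name, aa in ycf1_aa.items() if name.startswith(prefix)]
    return max(lengths) - min(lengths)

print(longest, shortest, overall_spread)
print("Solanum:", genus_spread("S. "), "Nicotiana:", genus_spread("N. "))
```

With these values the overall spread is 33 aa, the spread within Solanum is 4 aa, and the spread within Nicotiana is 10 aa, the last being driven by N. tomentosiformis.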

3.5. Phylogenetic Analysis of Solanaceous Plastomes. Evolutionary relationships between diverse plant species have been analyzed using several plastome markers, including matK and rbcL (genes) or trnH-psbA and trnL-trnF (intergenic regions), due to sequence conservation among plant taxa blended with suitable variation [40, 41]. However, determination of phylogeny based on single gene sequences may be inaccurate [42]. Availability of complete chloroplast sequences for many species has made it possible to use many individual genes or concatenated gene sequences to deduce phylogenetic relationships among taxa [42–44].

Efforts have been made to carry out phylogenetic analysis of solanaceous species using complete plastome sequences by Moore et al. [44] and Jansen et al. [45]. Evolutionary positions of Capsicum and Datura in Solanaceae have been determined using a single or a few plastid genes [46, 47]. Recently, concatenated protein-coding gene sequences from completely sequenced plastomes were used to obtain reasonable phylogenetic relationships for solanaceous species [29]. In the present investigation, we used a similar approach to analyze the phylogenetic relationships of ten solanaceous species with completely sequenced plastomes. Individual multiple sequence alignments were concatenated for maximum likelihood phylogenetic tree generation. As depicted in Figure 3, taxa were divided into two clades with 100% bootstrap support over 500 replicates. The first clade consisted of four Nicotiana species, while the species in Solanum, Capsicum, Atropa, and Datura were included in the second clade. These results are in line with previous phylogenetic analyses using concatenated protein-coding gene sequences as well as phylogenetic analyses using plastid ndhF and trnL-F sequences [29, 47]. However, in an analysis of 13 orfs of solanaceous plastomes, a different arrangement was shown in which Atropa was separated from Solanum and was grouped together with Nicotiana [25].
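
The concatenation step mentioned above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the function concatenate_alignments, the gene names geneA and geneB, and the toy sequences are hypothetical, and padding a taxon that lacks a gene with gap characters is one common convention that is assumed here.

```python
def concatenate_alignments(gene_alignments, taxa):
    """Join per-gene alignments taxon by taxon, in the given gene order.

    gene_alignments maps gene name -> {taxon: aligned sequence}.
    A taxon missing from a gene is padded with '-' (an assumption).
    """
    parts = {taxon: [] for taxon in taxa}
    for gene, aln in gene_alignments.items():
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        width = widths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}

# Toy input: two hypothetical genes, three taxon labels.
genes = {
    "geneA": {"NTA": "ATG-CA", "SLY": "ATGGCA", "CAN": "ATGGCA"},
    "geneB": {"NTA": "TTAA", "SLY": "TT-A"},  # CAN lacks geneB here
}
supermatrix = concatenate_alignments(genes, ["NTA", "SLY", "CAN"])
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

Every row of the resulting matrix has the same length, which is what tree-building programs generally expect of a concatenated alignment.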

4. Conclusions

The analyses of complete plastid genomes of ten solanaceous species revealed overall similarity in terms of gene content and organization. The sizes of the LSC, SSC, and IR regions were found to be somewhat conserved among species, but significant variation was found between genera. Most of the coding regions were well conserved. However, many features in a few genes were observed to be typical of a particular genus or even species and can be used as molecular markers to investigate genetic diversity and evolution. These typical variations can be further utilized to develop more sophisticated DNA barcoding based techniques. The ten solanaceous species were divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from coding regions of plastomes. The first clade consisted of four Nicotiana species, and the second clade consisted of species of Solanum, Capsicum, Atropa, and Datura.

Figure 3: Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species. Tip labels: CAR, DCA, NTO, NUN, NSY, NTA, ABE, DST, CAN, SLY, SBU, and STU.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors are thankful to the University Grants Commission, New Delhi, for providing the MANF fellowship to Harpreet Kaur.

References

[1] M. G. Bausher, N. D. Singh, S.-B. Lee, R. K. Jansen, and H. Daniell, “The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var. ‘Ridge Pineapple’: organization and phylogenetic relationships to other angiosperms,” BMC Plant Biology, vol. 6, no. 1, article 21, 2006.

[2] Z. Y. Hu, W. Hua, S. M. Huang, and H. Z. Wang, “Complete chloroplast genome sequence of rapeseed (Brassica napus L.) and its evolutionary implications,” Genetic Resources and Crop Evolution, vol. 58, no. 6, pp. 875–887, 2011.

[3] J. D. Palmer, “Comparative organization of chloroplast genomes,” Annual Review of Genetics, vol. 19, no. 1, pp. 325–354, 1985.

[4] R. G. Olmstead and J. D. Palmer, “Chloroplast DNA systematics: a review of methods and data analysis,” The American Journal of Botany, vol. 81, no. 9, pp. 1205–1224, 1994.

[5] R. A. Bungard, “Photosynthetic evolution in parasitic plants: insight from the chloroplast genome,” BioEssays, vol. 26, no. 3, pp. 235–247, 2004.

[6] L. A. Raubeson and R. K. Jansen, “Chloroplast genomes of plants,” in Diversity and Evolution of Plants-Genotypic and Phenotypic Variation in Higher Plants, R. Henry, Ed., pp. 45–68, CABI Publishing, Wallingford, UK, 2005.

[7] R. C. Haberle, H. M. Fourcade, J. L. Boore, and R. K. Jansen, “Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes,” Journal of Molecular Evolution, vol. 66, no. 4, pp. 350–361, 2008.

[8] W. Martin and R. G. Herrmann, “Gene transfer from organelles to the nucleus: how much, what happens, and why?” Plant Physiology, vol. 118, no. 1, pp. 9–17, 1998.

[9] H. L. Race, R. G. Herrmann, and W. Martin, “Why have organelles retained genomes?” Trends in Genetics, vol. 15, no. 9, pp. 364–370, 1999.

[10] T. Wakasugi, T. Tsudzuki, and M. Sugiura, “The genomics of land plant chloroplasts: gene content and alteration of genomic information by RNA editing,” Photosynthesis Research, vol. 70, no. 1, pp. 107–118, 2001.

[11] Y.-K. Kim, C.-W. Park, and K.-J. Kim, “Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications,” Molecules and Cells, vol. 27, no. 3, pp. 365–381, 2009.

[12] A. Khan, I. A. Khan, H. Asif, and M. K. Azim, “Current trends in chloroplast genome research,” African Journal of Biotechnology, vol. 9, no. 24, pp. 3494–3500, 2010.

[13] H. Shimada and M. Sugiura, “Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes,” Nucleic Acids Research, vol. 19, no. 5, pp. 983–995, 1991.

[14] M. Sugiura, “The chloroplast genome,” Plant Molecular Biology, vol. 19, no. 1, pp. 149–168, 1992.

[15] M. E. Cosner, R. K. Jansen, J. D. Palmer, and S. R. Downie, “The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families,” Current Genetics, vol. 31, no. 5, pp. 419–429, 1997.

[16] L. Gao, X. Yi, Y.-X. Yang, Y.-J. Su, and T. Wang, “Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes,” BMC Evolutionary Biology, vol. 9, no. 1, article 130, 2009.


[17] T. Kato, T. Kaneko, S. Sato, Y. Nakamura, and S. Tabata, “Complete structure of the chloroplast genome of a legume, Lotus japonicus,” DNA Research, vol. 7, no. 6, pp. 323–330, 2000.

[18] B. Goffinet, N. J. Wickett, O. Werner, R. M. Ros, A. J. Shaw, and C. J. Cox, “Distribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta),” Annals of Botany, vol. 99, no. 4, pp. 747–753, 2007.

[19] J. D. Palmer, J. M. Nugent, and L. A. Herbon, “Unusual structure of geranium chloroplast DNA: a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families,” Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 3, pp. 769–773, 1987.

[20] S. Greiner, X. Wang, U. Rauwolf et al., “The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution,” Nucleic Acids Research, vol. 36, no. 7, pp. 2366–2378, 2008.

[21] J. J. Doyle, J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson, “Chloroplast DNA inversions and the origin of the grass family (Poaceae),” Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 16, pp. 7722–7726, 1992.

[22] F. A. Michelangeli, J. I. Davis, and D. W. Stevenson, “Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes,” The American Journal of Botany, vol. 90, no. 1, pp. 93–106, 2003.

[23] E. Bortiri, D. Coleman-Derr, G. R. Lazo, O. D. Anderson, and Y. Q. Gu, “The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes,” BMC Research Notes, vol. 1, article 61, 2008.

[24] C. H. Leseberg and M. R. Duvall, “The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals,” Journal of Molecular Evolution, vol. 69, no. 4, pp. 311–318, 2009.

[25] H. J. Chung, J. D. Jung, H. W. Park et al., “The complete chloroplast genome sequences of Solanum tuberosum and comparative analysis with Solanaceae species identified the presence of a 241-bp deletion in cultivated potato chloroplast DNA sequence,” Plant Cell Reports, vol. 25, no. 12, pp. 1369–1379, 2006.

[26] H. Daniell, S.-B. Lee, J. Grevich et al., “Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes,” Theoretical and Applied Genetics, vol. 112, no. 8, pp. 1503–1518, 2006.

[27] S. Kahlau, S. Aspinall, J. C. Gray, and R. Bock, “Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes,” Journal of Molecular Evolution, vol. 63, no. 2, pp. 194–207, 2006.

[28] M. Yukawa, T. Tsudzuki, and M. Sugiura, “The chloroplast genome of Nicotiana sylvestris and Nicotiana tomentosiformis: complete sequencing confirms that the Nicotiana sylvestris progenitor is the maternal genome donor of Nicotiana tabacum,” Molecular Genetics and Genomics, vol. 275, no. 4, pp. 367–373, 2006.

[29] Y. D. Jo, J. Park, J. Kim et al., “Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome,” Plant Cell Reports, vol. 30, no. 2, pp. 217–229, 2011.

[30] C. Schmitz-Linneweber, R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier, “The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation,” Molecular Biology and Evolution, vol. 19, no. 9, pp. 1602–1612, 2002.

[31] M. Kunnimalaiyaan and B. L. Nielsen, “Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA,” Nucleic Acids Research, vol. 25, no. 18, pp. 3681–3686, 1997.

[32] G. Thyssen, Z. Svab, and P. Maliga, “Cell-to-cell movement of plastids in plants,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 7, pp. 2439–2443, 2012.

[33] D. Gargano, A. Vezzi, N. Scotti et al., “The complete nucleotide sequence genome of potato (Solanum tuberosum cv. Desiree) chloroplast DNA,” in Book of Abstracts of the 2nd Solanaceae Genome Workshop, p. 107, 2005.

[34] K.-J. Kim and H.-L. Lee, “Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants,” DNA Research, vol. 11, no. 4, pp. 247–261, 2004.

[35] D. A. Steane, “Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae),” DNA Research, vol. 12, no. 3, pp. 215–220, 2005.

[36] J. D. Palmer, “Cell culture and somatic cell genetics of plants,” in The Molecular Biology of Plastids, R. G. Hermann, Ed., pp. 5–53, Springer, Vienna, Austria, 1991.

[37] M. Sugiura, K. Shinozaki, M. Tanaka et al., “Split genes and cis/trans splicing in tobacco chloroplasts,” in Plant Molecular Biology, D. von Wettstein and N. H. Chua, Eds., pp. 65–76, Plenum Press, New York, NY, USA, 1987.

[38] A. Vera and M. Sugiura, “A novel RNA gene in the tobacco plastid genome: its possible role in the maturation of 16S rRNA,” The EMBO Journal, vol. 13, no. 9, pp. 2211–2217, 1994.

[39] M. Sugita, Z. Svab, P. Maliga, and M. Sugiura, “Targeted deletion of sprA from the tobacco plastid genome indicates that the encoded small RNA is not essential for pre-16S rRNA maturation in plastids,” Molecular and General Genetics, vol. 257, no. 1, pp. 23–27, 1997.

[40] R. Lahaye, M. van der Bank, D. Bogarin et al., “DNA barcoding the floras of biodiversity hotspots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 8, pp. 2923–2928, 2008.

[41] P. Taberlet, L. Gielly, G. Pautou, and J. Bouvet, “Universal primers for amplification of three non-coding regions of chloroplast DNA,” Plant Molecular Biology, vol. 17, no. 5, pp. 1105–1109, 1991.

[42] X. Guo, S. Castillo-Ramírez, V. Gonzalez et al., “Rapid evolutionary change of common bean (Phaseolus vulgaris L.) plastome and the genomic diversification of legume chloroplasts,” BMC Genomics, vol. 8, article 228, 2007.

[43] R. K. Jansen, C. Kaittanis, C. Saski et al., “Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids,” BMC Evolutionary Biology, vol. 6, no. 1, article 32, 2006.

[44] M. J. Moore, P. S. Soltis, C. D. Bell, J. G. Burleigh, and D. E. Soltis, “Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4623–4628, 2010.

[45] R. K. Jansen, Z. Cai, L. A. Raubeson et al., “Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 49, pp. 19369–19374, 2007.

[46] L. Bohs and R. G. Olmstead, “Phylogenetic relationships in Solanum (Solanaceae) based on ndhF sequences,” Systematic Botany, vol. 22, no. 1, pp. 5–17, 1997.

[47] R. G. Olmstead, L. Bohs, H. A. Migid, E. Santiago-Valentin, V. F. Garcia, and S. M. Collier, “A molecular phylogeny of the Solanaceae,” Taxon, vol. 57, no. 4, pp. 1159–1181, 2008.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 8: Research Article Comparative Genomics of Ten …downloads.hindawi.com/journals/abi/2014/424873.pdfResearch Article Comparative Genomics of Ten Solanaceous Plastomes HarpreetKaur, 1

8 Advances in Bioinformatics

Figure 1: Partial multiple sequence alignment of accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala(UGC), tRNA-Leu(UAA), and ycf1 gene sequences of ten solanaceous species, showing the location of InDels (indicated by hyphens). Alignment rows are labeled ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU.


Figure 2: Partial multiple sequence alignment of amino acid sequences of the genes accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala(UGC), tRNA-Leu(UAA), and ycf1 of ten solanaceous species, showing the location of InDels (indicated by hyphens). Alignment rows are labeled ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU.

protein sequences, as shown in Figure 2. The accD gene has been reported to be one of the most variable plastid genes and is probably under diversifying selection [26].

(2) clpP. In the clpP gene, InDels were found in both intron and exon regions. The InDels in the exon regions had two major consequences. An insertion of 6 bp in S. bulbocastanum and S. tuberosum and of 30 bp in S. lycopersicum at the 3′ end (exon 3) of the clpP gene shifted the stop codon downstream by 6, 6, and 30 bp in the respective species compared to the other species of the Solanaceae family, increasing the length of the coding sequence and of the protein product (Figures 1 and 2). An interesting feature was observed as InDel 1 in the protein sequence, corresponding to the insertion of a repeat of “I” amino acids in D. stramonium that makes the exon 3 region longer by 6 bp. This region, however, corresponds to the end of intron 2 of the clpP gene in all other species. Since the D. stramonium chloroplast genome has been sequenced only recently, this observation needs to be experimentally validated.

(3) ndhA. All the InDels found in ndhA were present in introns, while the protein-coding regions (exons) were highly conserved. This indicates high diversifying selection on the intronic region of this gene. Of the 14 InDels in total, most were observed with respect to C. annuum (InDels 1, 2, 5, 7, and 10). InDel 10 was shared by C. annuum and S. lycopersicum in full and by C. annuum and A. belladonna in part.

(4) rpl32. In rpl32, an insertion of 1 bp in D. stramonium and of 3 bp in C. annuum was found in the 3′ region of the gene, while a deletion of 4 bp was observed in D. stramonium. The insertion of 3 bp in C. annuum only altered the length of the protein, making it longer by 1 amino acid. However, the small insertion of 1 bp in D. stramonium proved to be a frameshift mutation resulting in three changes in the amino acid sequence near the C-terminus. Moreover, the deletion of 4 bp at the 3′ end resulted in premature termination of protein synthesis. Together, the frameshift mutation and the 3′ end deletion reduced the length of the gene product by 1 amino acid. As the C-terminus of the amino acid chain is well conserved in all the other species, the effect of the above-mentioned variations needs to be validated experimentally.
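The difference between an in-frame insertion (a multiple of 3 bp) and a frameshift (here 1 bp) can be illustrated with a minimal sketch. The coding sequence below is an invented toy example, not the rpl32 sequence, and the function names are hypothetical; the standard genetic code table is assumed for translation.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption of this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def insert_bases(cds, position, bases):
    """Return cds with `bases` inserted before the 0-based `position`."""
    return cds[:position] + bases + cds[position:]


# Hypothetical toy coding sequence (not rpl32): M K R G followed by a stop codon.
reference = "ATGAAACGTGGATAA"
in_frame = insert_bases(reference, 6, "GAA")   # 3 bp: adds exactly one codon
frameshift = insert_bases(reference, 6, "T")   # 1 bp: shifts the reading frame

print(translate(reference))   # MKRG
print(translate(in_frame))    # MKERG
print(translate(frameshift))  # MKSWI
```

In the toy case, the 3 bp insertion lengthens the product by one residue and leaves the rest unchanged, whereas the 1 bp insertion changes every downstream residue and moves the original stop codon out of frame.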

(5) rps16. In rps16 also, InDels were observed in introns as well as exons. Five of the major insertions in the intron regions were species specific. Insertions of 38 bp (InDel 1), 9 bp (InDel 2), 5 bp (InDel 7), 4 bp (InDel 8), and 6 bp (InDel 9) were observed in A. belladonna, S. lycopersicum, D. stramonium, and C. annuum. A deletion of 5 bp was observed in all three Solanum species and C. annuum. A deletion of 9 bp was observed in all Nicotiana species, resulting in an amino acid change (P to S) and shortening of the protein by three amino acids in the C-terminal region. A similar deletion has also been observed by Kahlau et al. [27] and was suggested to be functionally neutral.

(6) sprA. The sprA gene has been reported as a stable noncoding RNA of unknown function. This gene has been suggested to influence 16S rRNA maturation [38, 39]. In many species, this gene seems to be present as a remnant and shows large variations in its 5′ region. The largest deletion, of 109 bp, was observed in C. annuum. The rest of this gene appears to be more conserved, with a deletion towards the 3′ end in all Nicotiana species and A. belladonna. The manner in which this gene functions and the consequences of the above-mentioned variations are yet to be investigated experimentally.

(7) trnA-UGC. In this gene, a long deletion of 102 bp was observed in all Nicotiana species. Interestingly, this deletion was extended in both directions to 130 bp in A. belladonna. These deletions were found in the intron region and so are unlikely to have any negative impact on the function of the gene product.

(8) trnL-UAA. The trend of variation in trnL-UAA was similar to that in ndhA, as all InDels were observed in introns. The longest species-specific deletion (InDel 3) was observed in C. annuum, whereas a short insertion of four nucleotides, a repeat of “T”, was observed specifically in D. stramonium. Another insertion of 6 bp was observed in two Nicotiana species, namely, N. sylvestris and N. tabacum.

(9) ycf1. Many InDels (3′ region) were found in the fastest evolving gene, that is, ycf1. Most of the InDels were species specific. The maximum number of InDels (InDels 2, 3, 5, 7, 9, 16, 17, 19, 23, 24, 25, and 30) was observed in C. annuum, followed by D. stramonium (InDels 4, 6, 20, 26, and 31), N. tomentosiformis (InDels 8, 17, and 18), and S. lycopersicum (InDels 16, 22, and 28). Two genus-specific InDels (InDels 14 and 21) were observed in all four Nicotiana species. However, InDel 19 was also present in D. stramonium. Another genus-specific InDel (InDel 29) was observed in the Solanum species. Most of the InDels altered the length of the gene product, with the maximum length of 1906 amino acids (aa) observed in C. annuum and the minimum of 1873 aa in D. stramonium. Among the Solanum species, the protein length (1887 aa) was conserved between S. bulbocastanum and S. tuberosum, whereas the S. lycopersicum protein was 1891 aa long, 4 aa longer than in the other two species of the genus. Among the four Nicotiana species, the length of the ycf1 gene product was nearly conserved in N. sylvestris, N. tabacum, and N. undulata, with protein lengths of 1901 aa, 1902 aa, and 1901 aa, respectively. However, N. tomentosiformis was the most variable member of the genus Nicotiana, with a protein length of 1892 aa.
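The distinction between species-specific and shared InDels drawn in this section can be sketched as a simple scan for runs of hyphens in aligned rows. The example below is only an illustration under stated assumptions: the alignment is invented toy data with hypothetical taxon names, and the function names are not taken from any tool used in this study.

```python
import re


def gap_runs(aligned_row):
    """Return (start, length) for every run of '-' in one aligned row."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(r"-+", aligned_row)]


def taxa_by_gap(alignment):
    """Map each gap run (start, length) to the sorted list of taxa that carry it."""
    shared = {}
    for taxon, row in alignment.items():
        for run in gap_runs(row):
            shared.setdefault(run, []).append(taxon)
    return {run: sorted(taxa) for run, taxa in shared.items()}


# Hypothetical toy alignment; names and sequences are invented for illustration.
toy_alignment = {
    "taxon_A": "ATG---CCA",
    "taxon_B": "ATGAAACCA",
    "taxon_C": "ATG---CCA",
    "taxon_D": "A-GAAACCA",
}

for run, taxa in taxa_by_gap(toy_alignment).items():
    label = "taxon specific" if len(taxa) == 1 else "shared"
    print(run, taxa, label)
```

A gap run carried by a single row is a candidate species-specific event, while one carried by several rows is shared. Whether a gap reflects a deletion in the gapped taxa or an insertion in the others cannot be decided from the alignment alone and requires comparison with related sequences.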

3.5. Phylogenetic Analysis of Solanaceous Plastomes. Evolutionary relationships between diverse plant species have been analyzed using several plastome markers, including matK and rbcL (genes) or trnH-psbA and trnL-trnF (intergenic regions), because they combine sequence conservation among plant taxa with suitable variation [40, 41]. However, determination of phylogeny based on single gene sequences may be inaccurate [42]. The availability of complete chloroplast sequences for many species has made it possible to use many individual genes or concatenated gene sequences to deduce phylogenetic relationships among taxa [42–44].

Efforts have been made to carry out phylogenetic analysis of solanaceous species using complete plastome sequences by Moore et al. [44] and Jansen et al. [45]. The evolutionary positions of Capsicum and Datura in Solanaceae have been determined using a single or a few plastid genes [46, 47]. Recently, concatenated protein-coding gene sequences from completely sequenced plastomes were used to obtain reasonable phylogenetic relationships for solanaceous species [29]. In the present investigation, we used a similar approach to analyze the phylogenetic relationships of ten solanaceous species with completely sequenced plastomes. Individual multiple sequence alignments were concatenated for maximum likelihood phylogenetic tree generation. As depicted in Figure 3, the taxa were divided into two clades with 100% bootstrap support (500 of 500 replicates). The first clade consisted of the four Nicotiana species, while the species of Solanum, Capsicum, Atropa, and Datura were included in the second clade. These results are in line with previous phylogenetic analyses using concatenated protein-coding gene sequences as well as with phylogenetic analyses using plastid ndhF and trnL-F sequences [29, 47]. However, in an analysis of 13 ORFs of solanaceous plastomes, a different arrangement was shown, in which Atropa was separated from Solanum and grouped together with Nicotiana [25].
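The concatenation of individual gene alignments described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used in this study: the gene names, taxon labels, and sequences are invented, and padding a taxon that lacks a gene with gaps is an assumption of the sketch.

```python
def concatenate_alignments(gene_alignments, taxa):
    """Join per-gene alignments into one concatenated row per taxon."""
    parts = {taxon: [] for taxon in taxa}
    for gene, alignment in gene_alignments.items():
        lengths = {len(row) for row in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: aligned rows differ in length")
        width = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon lacking this gene is padded with gap characters.
            parts[taxon].append(alignment.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical toy input: two already-aligned genes for three taxa.
toy_genes = {
    "geneA": {"taxon_1": "ATG-CA", "taxon_2": "ATGGCA", "taxon_3": "ATGGCA"},
    "geneB": {"taxon_1": "TTAA", "taxon_2": "TT-A"},  # taxon_3 lacks geneB
}
toy_taxa = ["taxon_1", "taxon_2", "taxon_3"]

supermatrix = concatenate_alignments(toy_genes, toy_taxa)
for taxon, row in supermatrix.items():
    print(taxon, row)
```

Every row of the concatenated matrix has the same length, which is what a tree-building program expects as input. Checking the row lengths of each gene before joining guards against mixing unaligned sequences into the matrix.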

4. Conclusions

The analyses of the complete plastid genomes of ten solanaceous species revealed overall similarity in terms of gene content and organization. The sizes of the LSC, SSC, and IR regions were found to be somewhat conserved among species, but significant variation was found between genera. Most of the coding regions were well conserved. However, many of the features in a few genes were observed to be typical of a particular genus or even species and can be used as molecular markers to investigate genetic diversity and evolution. These typical variations can be further utilized to develop more sophisticated DNA barcoding-based techniques. The ten solanaceous species are divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from the coding regions of the plastomes. The first clade consisted of the four Nicotiana species, and the second clade consisted of the species of Solanum, Capsicum, Atropa, and Datura.

Figure 3: Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species. Tip labels: CAR, DCA, NTO, NUN, NSY, NTA, ABE, DST, CAN, SLY, SBU, and STU.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors are thankful to the University Grants Commission, New Delhi, for providing the MANF fellowship to Harpreet Kaur.

References

[1] M. G. Bausher, N. D. Singh, S.-B. Lee, R. K. Jansen, and H. Daniell, “The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var. ‘Ridge Pineapple’: organization and phylogenetic relationships to other angiosperms,” BMC Plant Biology, vol. 6, no. 1, article 21, 2006.

[2] Z. Y. Hu, W. Hua, S. M. Huang, and H. Z. Wang, “Complete chloroplast genome sequence of rapeseed (Brassica napus L.) and its evolutionary implications,” Genetic Resources and Crop Evolution, vol. 58, no. 6, pp. 875–887, 2011.

[3] J. D. Palmer, “Comparative organization of chloroplast genomes,” Annual Review of Genetics, vol. 19, no. 1, pp. 325–354, 1985.

[4] R. G. Olmstead and J. D. Palmer, “Chloroplast DNA systematics: a review of methods and data analysis,” The American Journal of Botany, vol. 81, no. 9, pp. 1205–1224, 1994.

[5] R. A. Bungard, “Photosynthetic evolution in parasitic plants: insight from the chloroplast genome,” BioEssays, vol. 26, no. 3, pp. 235–247, 2004.

[6] L. A. Raubeson and R. K. Jansen, “Chloroplast genomes of plants,” in Diversity and Evolution of Plants-Genotypic and Phenotypic Variation in Higher Plants, R. Henry, Ed., pp. 45–68, CABI Publishing, Wallingford, UK, 2005.

[7] R. C. Haberle, H. M. Fourcade, J. L. Boore, and R. K. Jansen, “Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes,” Journal of Molecular Evolution, vol. 66, no. 4, pp. 350–361, 2008.

[8] W. Martin and R. G. Herrmann, “Gene transfer from organelles to the nucleus: how much, what happens, and why?” Plant Physiology, vol. 118, no. 1, pp. 9–17, 1998.

[9] H. L. Race, R. G. Herrmann, and W. Martin, “Why have organelles retained genomes?” Trends in Genetics, vol. 15, no. 9, pp. 364–370, 1999.

[10] T. Wakasugi, T. Tsudzuki, and M. Sugiura, “The genomics of land plant chloroplasts: gene content and alteration of genomic information by RNA editing,” Photosynthesis Research, vol. 70, no. 1, pp. 107–118, 2001.

[11] Y.-K. Kim, C.-W. Park, and K.-J. Kim, “Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications,” Molecules and Cells, vol. 27, no. 3, pp. 365–381, 2009.

[12] A. Khan, I. A. Khan, H. Asif, and M. K. Azim, “Current trends in chloroplast genome research,” African Journal of Biotechnology, vol. 9, no. 24, pp. 3494–3500, 2010.

[13] H. Shimada and M. Sugiura, “Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes,” Nucleic Acids Research, vol. 19, no. 5, pp. 983–995, 1991.

[14] M. Sugiura, “The chloroplast genome,” Plant Molecular Biology, vol. 19, no. 1, pp. 149–168, 1992.

[15] M. E. Cosner, R. K. Jansen, J. D. Palmer, and S. R. Downie, “The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families,” Current Genetics, vol. 31, no. 5, pp. 419–429, 1997.

[16] L. Gao, X. Yi, Y.-X. Yang, Y.-J. Su, and T. Wang, “Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes,” BMC Evolutionary Biology, vol. 9, no. 1, article 130, 2009.


[17] T. Kato, T. Kaneko, S. Sato, Y. Nakamura, and S. Tabata, “Complete structure of the chloroplast genome of a legume, Lotus japonicus,” DNA Research, vol. 7, no. 6, pp. 323–330, 2000.

[18] B. Goffinet, N. J. Wickett, O. Werner, R. M. Ros, A. J. Shaw, and C. J. Cox, “Distribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta),” Annals of Botany, vol. 99, no. 4, pp. 747–753, 2007.

[19] J. D. Palmer, J. M. Nugent, and L. A. Herbon, “Unusual structure of geranium chloroplast DNA: a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families,” Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 3, pp. 769–773, 1987.

[20] S. Greiner, X. Wang, U. Rauwolf et al., “The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution,” Nucleic Acids Research, vol. 36, no. 7, pp. 2366–2378, 2008.

[21] J. J. Doyle, J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson, “Chloroplast DNA inversions and the origin of the grass family (Poaceae),” Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 16, pp. 7722–7726, 1992.

[22] F. A. Michelangeli, J. I. Davis, and D. W. Stevenson, “Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes,” The American Journal of Botany, vol. 90, no. 1, pp. 93–106, 2003.

[23] E. Bortiri, D. Coleman-Derr, G. R. Lazo, O. D. Anderson, and Y. Q. Gu, “The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes,” BMC Research Notes, vol. 1, article 61, 2008.

[24] C. H. Leseberg and M. R. Duvall, “The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals,” Journal of Molecular Evolution, vol. 69, no. 4, pp. 311–318, 2009.

[25] H. J. Chung, J. D. Jung, H. W. Park et al., “The complete chloroplast genome sequences of Solanum tuberosum and comparative analysis with Solanaceae species identified the presence of a 241-bp deletion in cultivated potato chloroplast DNA sequence,” Plant Cell Reports, vol. 25, no. 12, pp. 1369–1379, 2006.

[26] H. Daniell, S.-B. Lee, J. Grevich et al., “Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes,” Theoretical and Applied Genetics, vol. 112, no. 8, pp. 1503–1518, 2006.

[27] S. Kahlau, S. Aspinall, J. C. Gray, and R. Bock, “Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes,” Journal of Molecular Evolution, vol. 63, no. 2, pp. 194–207, 2006.

[28] M. Yukawa, T. Tsudzuki, and M. Sugiura, “The chloroplast genome of Nicotiana sylvestris and Nicotiana tomentosiformis: complete sequencing confirms that the Nicotiana sylvestris progenitor is the maternal genome donor of Nicotiana tabacum,” Molecular Genetics and Genomics, vol. 275, no. 4, pp. 367–373, 2006.

[29] Y. D. Jo, J. Park, J. Kim et al., “Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome,” Plant Cell Reports, vol. 30, no. 2, pp. 217–229, 2011.

[30] C. Schmitz-Linneweber, R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier, “The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation,” Molecular Biology and Evolution, vol. 19, no. 9, pp. 1602–1612, 2002.

[31] M. Kunnimalaiyaan and B. L. Nielsen, “Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA,” Nucleic Acids Research, vol. 25, no. 18, pp. 3681–3686, 1997.

[32] G. Thyssen, Z. Svab, and P. Maliga, “Cell-to-cell movement of plastids in plants,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 7, pp. 2439–2443, 2012.

[33] D. Gargano, A. Vezzi, N. Scotti et al., “The complete nucleotide sequence genome of potato (Solanum tuberosum cv. Desiree) chloroplast DNA,” in Book of Abstracts of the 2nd Solanaceae Genome Workshop, p. 107, 2005.

[34] K.-J. Kim and H.-L. Lee, “Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants,” DNA Research, vol. 11, no. 4, pp. 247–261, 2004.

[35] D. A. Steane, “Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae),” DNA Research, vol. 12, no. 3, pp. 215–220, 2005.

[36] J. D. Palmer, “Cell culture and somatic cell genetics of plants,” in The Molecular Biology of Plastids, R. G. Hermann, Ed., pp. 5–53, Springer, Vienna, Austria, 1991.

[37] M Sugiura K Shinozaki M Tanaka et al ldquoSplit genes andcistrans splicing in tobacco chloroplastsrdquo in Plant MolecularBiology D von Wettstein and N H Chua Eds pp 65ndash76Plenum Press New York NY USA 1987

[38] A Vera and M Sugiura ldquoA novel RNA gene in the tobaccoplastid genome its possible role in thematuration of 16S rRNArdquoThe EMBO Journal vol 13 no 9 pp 2211ndash2217 1994

[39] M Sugita Z Svab P Maliga and M Sugiura ldquoTargeteddeletion of sprA from the tobacco plastid genome indicatesthat the encoded small RNA is not essential for pre-16S rRNAmaturation in plastidsrdquo Molecular and General Genetics vol257 no 1 pp 23ndash27 1997

[40] R Lahaye M van der Bank D Bogarin et al ldquoDNA barcodingthe floras of biodiversity hotspotsrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no8 pp 2923ndash2928 2008

[41] P Taberlet L Gielly G Pautou and J Bouvet ldquoUniversalprimers for amplification of three non-coding regions of chloro-plast DNArdquo PlantMolecular Biology vol 17 no 5 pp 1105ndash11091991

[42] X Guo S Castillo-Ramırez V Gonzalez et al ldquoRapid evolu-tionary change of common bean (Phaseolus vulgaris L) plas-tome and the genomic diversification of legume chloroplastsrdquoBMC Genomics vol 8 article 228 2007

[43] R K Jansen C Kaittanis C Saski et al ldquoPhylogenetic analysesof Vitis (Vitaceae) based on complete chloroplast genomesequences effects of taxon sampling and phylogenetic methodson resolving relationships among rosidsrdquo BMC EvolutionaryBiology vol 6 no 1 article 32 2006

[44] M J Moore P S Soltis C D Bell J G Burleigh and D ESoltis ldquoPhylogenetic analysis of 83 plastid genes further resolves

Advances in Bioinformatics 13

the early diversification of eudicotsrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 107 no10 pp 4623ndash4628 2010

[45] R K Jansen Z Cai L A Raubeson et al ldquoAnalysis of 81 genesfrom 64 plastid genomes resolves relationships in angiospermsand identifies genome-scale evolutionary patternsrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 104 no 49 pp 19369ndash19374 2007

[46] L Bohs and R G Olmstead ldquoPhylogenetic relationships inSolanum (Solanaceae) based on ndhF sequencesrdquo SystematicBotany vol 22 no 1 pp 5ndash17 1997

[47] R G Olmstead L Bohs H A Migid E Santiago-ValentinV F Garcia and S M Collier ldquoA molecular phylogeny of theSolanaceaerdquo Taxon vol 57 no 4 pp 1159ndash1181 2008



Figure 2: Partial multiple sequence alignment of amino acid sequences of genes, namely, accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala (UGC), tRNA-Leu (UAA), and ycf1, of ten solanaceous species, showing the location of InDels (indicated by hyphens). Sequence rows are labeled ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU.

protein sequences as shown in Figure 2. The accD gene has been reported to be one of the most variable plastid genes and is probably under diversifying selection [26].
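Since InDels are read off the alignments as runs of hyphens, the idea can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in this study: the function names `gap_runs` and `indel_columns`, the taxon labels, and the three short aligned fragments are hypothetical and introduced solely for the example.

```python
import re

def gap_runs(aligned_seq):
    """Return (start, end) pairs (0-based, end exclusive) for every run of '-'."""
    return [(m.start(), m.end()) for m in re.finditer(r"-+", aligned_seq)]

def indel_columns(alignment):
    """Map each gapped region to the list of taxa that carry the gap.

    alignment: dict of taxon label -> aligned sequence (all of equal length).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    regions = {}
    for taxon, seq in alignment.items():
        for span in gap_runs(seq):
            regions.setdefault(span, []).append(taxon)
    return regions

# Hypothetical toy alignment (labels and residues are illustrative only).
toy = {
    "taxon_A": "HFI--STL",
    "taxon_B": "NFIFISTI",
    "taxon_C": "HFI--STI",
}
print(indel_columns(toy))  # {(3, 5): ['taxon_A', 'taxon_C']}
```

A region shared by several taxa, as in this toy case, would be a candidate for a genus-level rather than a species-specific InDel.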

(2) clpP. In the clpP gene, InDels were found in both the intron and the exon regions. The InDels in the exon regions had two major consequences. First, an insertion of 6 bp in S. bulbocastanum and S. tuberosum and of 30 bp in S. lycopersicum at the 3′ end (exon 3) of the clpP gene shifted the stop codon downstream by 6, 6, and 30 bp, respectively, compared with the other species of the Solanaceae family, increasing the length of the coding sequence and of the protein product (Figures 1 and 2). Second, InDel 1 in the protein sequence corresponds to the insertion of a repeat of “I” amino acids in D. stramonium, which makes the exon 3 region longer by 6 bp. This region, however, corresponds to the end of intron 2 of the clpP gene in all other species. Since the D. stramonium chloroplast genome has been sequenced only recently, this observation needs to be validated experimentally.
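The downstream shift of the stop codon caused by an in-frame insertion can be illustrated with a toy example. The Python sketch below uses a hypothetical 12-nt coding sequence (not a real clpP sequence) and a hypothetical helper `first_stop`; it only shows that a 6-bp insertion placed just before the stop codon moves the stop 6 bp downstream and adds two residues.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_stop(cds):
    """Return the 0-based index of the first in-frame stop codon, or -1 if none."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return -1

# Hypothetical reference CDS: ATG GCT AAA TAA
reference = "ATGGCTAAATAA"
# Hypothetical variant with a 6-bp in-frame insertion just before the stop codon.
variant = reference[:9] + "GGAGAA" + reference[9:]

shift_bp = first_stop(variant) - first_stop(reference)
print(shift_bp)       # 6  (stop codon moved 6 bp downstream)
print(shift_bp // 3)  # 2  (two additional amino acids)
```

By the same arithmetic, the 30-bp insertion reported for S. lycopersicum would correspond to ten additional residues, provided the inserted sequence itself contains no in-frame stop codon.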

(3) ndhA. All the InDels found in ndhA were present in the intron, while the protein-coding regions (exons) were highly conserved. This suggests high diversifying selection on the intronic region of this gene. Out of the total of 14 InDels, most were observed with respect to C. annuum (InDels 1, 2, 5, 7, and 10). InDel 10 was shared by C. annuum and S. lycopersicum in full and by C. annuum and A. belladonna in part.
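Assigning InDels to exons or introns amounts to comparing alignment positions against the annotated exon intervals. A minimal Python sketch is given below; the function name `classify_indels`, the exon coordinates, and the InDel positions are hypothetical values chosen for illustration and do not correspond to the actual ndhA annotation.

```python
def classify_indels(exons, indel_positions):
    """Label each position as 'exon', 'intron', or 'outside'.

    exons: list of (start, end) intervals, 1-based and inclusive, in gene order.
    indel_positions: iterable of 1-based positions within the gene region.
    """
    gene_start = min(start for start, _ in exons)
    gene_end = max(end for _, end in exons)
    labels = {}
    for pos in indel_positions:
        if not gene_start <= pos <= gene_end:
            labels[pos] = "outside"
        elif any(start <= pos <= end for start, end in exons):
            labels[pos] = "exon"
        else:
            labels[pos] = "intron"
    return labels

# Hypothetical two-exon gene with a single intron between the exons.
toy_exons = [(1, 550), (1700, 2240)]
toy_positions = [120, 900, 1650, 2000, 2500]
print(classify_indels(toy_exons, toy_positions))
# {120: 'exon', 900: 'intron', 1650: 'intron', 2000: 'exon', 2500: 'outside'}
```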

(4) rpl32. In rpl32, an insertion of 1 bp in D. stramonium and of 3 bp in C. annuum was found in the 3′ region of the gene, while a deletion of 4 bp was observed in D. stramonium. The insertion of 3 bp in C. annuum only altered the length of the protein, making it longer by 1 amino acid. However, the small insertion of 1 bp in D. stramonium proved to be a frameshift mutation resulting in three changes in the amino acid sequence near the C-terminus. Moreover, the deletion of 4 bp at the 3′ end resulted in premature termination of protein synthesis. Together, the frameshift mutation and the 3′ end deletion reduced the gene product length by 1 amino acid. As the C-terminus of the amino acid chain is well conserved in all the other species, the effect of the above-mentioned variations needs to be validated experimentally.
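The contrast between a 3-bp (in-frame) insertion and a 1-bp (frameshift) insertion can be made concrete with a short translation example. The Python sketch below builds the standard genetic code and translates a hypothetical 18-nt sequence; the sequence, the inserted bases, and the function name `translate` are assumptions made for illustration and are not taken from the rpl32 sequences analyzed here.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(cds):
    """Translate a coding sequence until the first stop codon or the last full codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical reference CDS: ATG AAA CGT GGT CTT TAA
reference = "ATGAAACGTGGTCTTTAA"
in_frame = reference[:9] + "GCT" + reference[9:]   # 3-bp insertion
frameshift = reference[:9] + "A" + reference[9:]   # 1-bp insertion

print(translate(reference))   # MKRGL
print(translate(in_frame))    # MKRAGL  (one extra residue, rest unchanged)
print(translate(frameshift))  # MKRRSL  (residues downstream of the insertion change)
```

In the toy frameshift case the original stop codon is no longer read in frame, so where translation ends depends on the downstream sequence, which is consistent with the observation that such variants need experimental validation.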

(5) rps16. In rps16, InDels were also observed in introns as well as exons. Five of the major insertions in the intron regions were species specific. Insertions of 38 bp (InDel 1), 9 bp (InDel 2), 5 bp (InDel 7), 4 bp (InDel 8), and 6 bp (InDel 9) were observed among A. belladonna, S. lycopersicum, D. stramonium, and C. annuum. A deletion of 5 bp was observed in all three Solanum species and in C. annuum. A deletion of 9 bp was observed in all Nicotiana species, resulting in an amino acid change (P to S) and a shortening of the protein by three amino acids in the C-terminal region. A similar deletion was also observed by Kahlau et al. [27] and was suggested to be functionally neutral.

(6) sprA. The sprA gene has been reported to encode a stable noncoding RNA of unknown function. This gene has been suggested to influence 16S rRNA maturation [38, 39]. In many species, this gene seems to be present as a remnant and shows large variations in its 5′ region. The largest deletion, of 109 bp, was observed in C. annuum. The rest of the gene appears to be more conserved, with a deletion towards the 3′ end in all Nicotiana species and A. belladonna. The manner in which this gene functions and the consequences of the above-mentioned variations are yet to be investigated experimentally.

(7) trnA-UGC. In this gene, a long deletion of 102 bp was observed in all Nicotiana species. Interestingly, this deletion was further extended to 130 bp in both directions in A. belladonna. These deletions were found in the intron region and so are unlikely to have any negative impact on gene product function.

(8) trnL-UAA. The trend of variation in trnL-UAA was similar to that in ndhA, as all InDels were observed in introns. The longest species-specific deletion (InDel 3) was observed in C. annuum, whereas a short insertion of four nucleotides, a repeat of “T”, was observed specifically in D. stramonium. Another insertion, of 6 bp, was observed in two Nicotiana species, namely, N. sylvestris and N. tabacum.

(9) ycf1. Many InDels (3′ region) were found in ycf1, the fastest evolving gene. Most of the InDels were found to be species specific. The maximum number of InDels (InDels 2, 3, 5, 7, 9, 16, 17, 19, 23, 24, 25, and 30) was observed in C. annuum, followed by D. stramonium (InDels 4, 6, 20, 26, and 31), N. tomentosiformis (InDels 8, 17, and 18), and S. lycopersicum (InDels 16, 22, and 28). Two genus-specific InDels (InDels 14 and 21) were observed in all four Nicotiana species. However, InDel 19 was also present in D. stramonium. Another genus-specific InDel (InDel 29) was observed in the Solanum species. Most of the InDels altered the length of the gene product, with the maximum length of 1906 amino acids (aa) observed in C. annuum and the minimum of 1873 aa observed in D. stramonium. Among the Solanum species, the length of the protein (1887 aa) was conserved between S. bulbocastanum and S. tuberosum. However, the S. lycopersicum protein was 1891 aa long, 4 aa longer than in the other two species of the same genus. Among the four Nicotiana species, the ycf1 gene product length was nearly conserved among N. sylvestris, N. tabacum, and N. undulata, with protein lengths of 1901 aa, 1902 aa, and 1901 aa, respectively. However, N. tomentosiformis was observed to be the most variable member of the genus Nicotiana, with a protein length of 1892 aa.
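The ycf1 protein lengths quoted above can be summarized per genus with a few lines of Python. The lengths are copied from the paragraph above; the length for A. belladonna is not stated there and is therefore omitted. The function name `genus_ranges` is a hypothetical helper introduced only for this sketch.

```python
# ycf1 protein lengths (aa) as stated in the text; A. belladonna is not given there.
ycf1_length = {
    "Capsicum annuum": 1906,
    "Datura stramonium": 1873,
    "Solanum bulbocastanum": 1887,
    "Solanum tuberosum": 1887,
    "Solanum lycopersicum": 1891,
    "Nicotiana sylvestris": 1901,
    "Nicotiana tabacum": 1902,
    "Nicotiana undulata": 1901,
    "Nicotiana tomentosiformis": 1892,
}

def genus_ranges(lengths):
    """Return {genus: (min, max, max - min)} for species keyed as 'Genus species'."""
    by_genus = {}
    for species, n in lengths.items():
        by_genus.setdefault(species.split()[0], []).append(n)
    return {g: (min(v), max(v), max(v) - min(v)) for g, v in by_genus.items()}

ranges = genus_ranges(ycf1_length)
print(ranges["Solanum"])    # (1887, 1891, 4)
print(ranges["Nicotiana"])  # (1892, 1902, 10)
print(max(ycf1_length.values()) - min(ycf1_length.values()))  # 33
```

On these figures the within-genus spread (4 aa in Solanum, 10 aa in Nicotiana) is smaller than the 33-aa spread across the nine species listed.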

3.5. Phylogenetic Analysis of Solanaceous Plastomes. Evolutionary relationships between diverse plant species have been analyzed using several plastome markers, including matK and rbcL (genes) or trnH-psbA and trnL-trnF (intergenic regions), owing to their sequence conservation among plant taxa blended with suitable variation [40, 41]. However, determination of phylogeny based on single gene sequences may be inaccurate [42]. The availability of complete chloroplast sequences for many species has made it possible to use many individual genes or concatenated gene sequences to deduce phylogenetic relationships among taxa [42–44].

Efforts have been made to carry out phylogenetic analysis of solanaceous species using complete plastome sequences by Moore et al. [44] and Jansen et al. [45]. Evolutionary positions of Capsicum and Datura in Solanaceae have been determined using a single or a few plastid genes [46, 47]. Recently, concatenated protein-coding gene sequences from completely sequenced plastomes were used to obtain reasonable phylogenetic relationships for solanaceous species [29]. In the present investigation, we used a similar approach to analyze the phylogenetic relationships of ten solanaceous species with completely sequenced plastomes. Individual multiple sequence alignments were concatenated for maximum likelihood phylogenetic tree generation. As depicted in Figure 3, the taxa were divided into two clades with 100% bootstrap support (500 replicates). The first clade consisted of the four Nicotiana species, while the species of Solanum, Capsicum, Atropa, and Datura were included in the second clade. These results are in line with previous phylogenetic analyses using concatenated protein-coding gene sequences as well as phylogenetic analyses using plastid ndhF and trnL-F sequences [29, 47]. However, in an analysis of 13 ORFs of solanaceous plastomes, a different arrangement was shown, in which Atropa was separated from Solanum and grouped together with Nicotiana [25].
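The concatenation step described above, in which individual gene alignments are joined into one alignment per taxon, can be sketched as follows. This is a minimal illustration and not the pipeline used in this study: the function name `concatenate`, the two toy gene alignments, and the choice to pad a missing taxon with gap characters are all assumptions.

```python
def concatenate(gene_alignments, taxa):
    """Join per-gene alignments into one concatenated sequence per taxon.

    gene_alignments: list of dicts, each mapping taxon -> aligned sequence.
    taxa: ordered list of taxon labels to include.
    A taxon missing from a gene alignment is padded with '-' (an assumption).
    """
    parts = {taxon: [] for taxon in taxa}
    for alignment in gene_alignments:
        widths = {len(seq) for seq in alignment.values()}
        if len(widths) != 1:
            raise ValueError("sequences within one gene alignment must have equal length")
        width = widths.pop()
        for taxon in taxa:
            parts[taxon].append(alignment.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}

# Hypothetical toy alignments for two genes and three taxa.
gene_1 = {"NTA": "ATG-CA", "SLY": "ATGGCA", "CAN": "ATGGCT"}
gene_2 = {"NTA": "TTA", "SLY": "TTG"}  # CAN missing from this gene

supermatrix = concatenate([gene_1, gene_2], ["NTA", "SLY", "CAN"])
print(supermatrix["NTA"])  # ATG-CATTA
print(supermatrix["CAN"])  # ATGGCT---
```

The resulting concatenated alignment is the kind of input a maximum likelihood tree-building program would take; which program and substitution model to use is outside the scope of this sketch.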

4. Conclusions

The analyses of complete plastid genomes of ten solanaceous species revealed overall similarity in terms of gene content and organization. The sizes of the LSC, SSC, and IR regions were found to be somewhat conserved among species, but significant variation was found between genera. Most of the coding regions were well conserved. However, many features in a few genes were observed to be typical of a particular genus, or even species, and can be used as molecular markers to investigate genetic diversity and evolution. These typical variations can be further utilized to develop more sophisticated DNA barcoding based techniques. The ten solanaceous species were divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from the coding regions of the plastomes. The first clade consisted of the four Nicotiana species, and the second clade consisted of species of Solanum, Capsicum, Atropa, and Datura.

Figure 3: Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors are thankful to the University Grants Commission, New Delhi, for providing a MANF fellowship to Harpreet Kaur.

References

[1] M. G. Bausher, N. D. Singh, S.-B. Lee, R. K. Jansen, and H. Daniell, “The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var. ‘Ridge Pineapple’: organization and phylogenetic relationships to other angiosperms,” BMC Plant Biology, vol. 6, no. 1, article 21, 2006.

[2] Z. Y. Hu, W. Hua, S. M. Huang, and H. Z. Wang, “Complete chloroplast genome sequence of rapeseed (Brassica napus L.) and its evolutionary implications,” Genetic Resources and Crop Evolution, vol. 58, no. 6, pp. 875–887, 2011.

[3] J. D. Palmer, “Comparative organization of chloroplast genomes,” Annual Review of Genetics, vol. 19, no. 1, pp. 325–354, 1985.

[4] R. G. Olmstead and J. D. Palmer, “Chloroplast DNA systematics: a review of methods and data analysis,” The American Journal of Botany, vol. 81, no. 9, pp. 1205–1224, 1994.

[5] R. A. Bungard, “Photosynthetic evolution in parasitic plants: insight from the chloroplast genome,” BioEssays, vol. 26, no. 3, pp. 235–247, 2004.

[6] L. A. Raubeson and R. K. Jansen, “Chloroplast genomes of plants,” in Diversity and Evolution of Plants-Genotypic and Phenotypic Variation in Higher Plants, R. Henry, Ed., pp. 45–68, CABI Publishing, Wallingford, UK, 2005.

[7] R. C. Haberle, H. M. Fourcade, J. L. Boore, and R. K. Jansen, “Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes,” Journal of Molecular Evolution, vol. 66, no. 4, pp. 350–361, 2008.

[8] W. Martin and R. G. Herrmann, “Gene transfer from organelles to the nucleus: how much, what happens, and why?” Plant Physiology, vol. 118, no. 1, pp. 9–17, 1998.

[9] H. L. Race, R. G. Herrmann, and W. Martin, “Why have organelles retained genomes?” Trends in Genetics, vol. 15, no. 9, pp. 364–370, 1999.

[10] T. Wakasugi, T. Tsudzuki, and M. Sugiura, “The genomics of land plant chloroplasts: gene content and alteration of genomic information by RNA editing,” Photosynthesis Research, vol. 70, no. 1, pp. 107–118, 2001.

[11] Y.-K. Kim, C.-W. Park, and K.-J. Kim, “Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications,” Molecules and Cells, vol. 27, no. 3, pp. 365–381, 2009.

[12] A. Khan, I. A. Khan, H. Asif, and M. K. Azim, “Current trends in chloroplast genome research,” African Journal of Biotechnology, vol. 9, no. 24, pp. 3494–3500, 2010.

[13] H. Shimada and M. Sugiura, “Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes,” Nucleic Acids Research, vol. 19, no. 5, pp. 983–995, 1991.

[14] M. Sugiura, “The chloroplast genome,” Plant Molecular Biology, vol. 19, no. 1, pp. 149–168, 1992.

[15] M. E. Cosner, R. K. Jansen, J. D. Palmer, and S. R. Downie, “The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families,” Current Genetics, vol. 31, no. 5, pp. 419–429, 1997.

[16] L. Gao, X. Yi, Y.-X. Yang, Y.-J. Su, and T. Wang, “Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes,” BMC Evolutionary Biology, vol. 9, no. 1, article 130, 2009.

[17] T. Kato, T. Kaneko, S. Sato, Y. Nakamura, and S. Tabata, “Complete structure of the chloroplast genome of a legume, Lotus japonicus,” DNA Research, vol. 7, no. 6, pp. 323–330, 2000.

[18] B. Goffinet, N. J. Wickett, O. Werner, R. M. Ros, A. J. Shaw, and C. J. Cox, “Distribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta),” Annals of Botany, vol. 99, no. 4, pp. 747–753, 2007.

[19] J. D. Palmer, J. M. Nugent, and L. A. Herbon, “Unusual structure of geranium chloroplast DNA: a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families,” Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 3, pp. 769–773, 1987.

[20] S. Greiner, X. Wang, U. Rauwolf et al., “The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution,” Nucleic Acids Research, vol. 36, no. 7, pp. 2366–2378, 2008.

[21] J. J. Doyle, J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson, “Chloroplast DNA inversions and the origin of the grass family (Poaceae),” Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 16, pp. 7722–7726, 1992.

[22] F. A. Michelangeli, J. I. Davis, and D. W. Stevenson, “Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes,” The American Journal of Botany, vol. 90, no. 1, pp. 93–106, 2003.

[23] E. Bortiri, D. Coleman-Derr, G. R. Lazo, O. D. Anderson, and Y. Q. Gu, “The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes,” BMC Research Notes, vol. 1, article 61, 2008.

[24] C. H. Leseberg and M. R. Duvall, “The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals,” Journal of Molecular Evolution, vol. 69, no. 4, pp. 311–318, 2009.

[25] H. J. Chung, J. D. Jung, H. W. Park et al., “The complete chloroplast genome sequences of Solanum tuberosum and comparative analysis with Solanaceae species identified the presence of a 241-bp deletion in cultivated potato chloroplast DNA sequence,” Plant Cell Reports, vol. 25, no. 12, pp. 1369–1379, 2006.

[26] H. Daniell, S.-B. Lee, J. Grevich et al., “Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes,” Theoretical and Applied Genetics, vol. 112, no. 8, pp. 1503–1518, 2006.

[27] S. Kahlau, S. Aspinall, J. C. Gray, and R. Bock, “Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes,” Journal of Molecular Evolution, vol. 63, no. 2, pp. 194–207, 2006.

[28] M. Yukawa, T. Tsudzuki, and M. Sugiura, “The chloroplast genome of Nicotiana sylvestris and Nicotiana tomentosiformis: complete sequencing confirms that the Nicotiana sylvestris progenitor is the maternal genome donor of Nicotiana tabacum,” Molecular Genetics and Genomics, vol. 275, no. 4, pp. 367–373, 2006.

[29] Y. D. Jo, J. Park, J. Kim et al., “Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome,” Plant Cell Reports, vol. 30, no. 2, pp. 217–229, 2011.

[30] C. Schmitz-Linneweber, R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier, “The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation,” Molecular Biology and Evolution, vol. 19, no. 9, pp. 1602–1612, 2002.

[31] M. Kunnimalaiyaan and B. L. Nielsen, “Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA,” Nucleic Acids Research, vol. 25, no. 18, pp. 3681–3686, 1997.

[32] G. Thyssen, Z. Svab, and P. Maliga, “Cell-to-cell movement of plastids in plants,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 7, pp. 2439–2443, 2012.

[33] D. Gargano, A. Vezzi, N. Scotti et al., “The complete nucleotide sequence genome of potato (Solanum tuberosum cv. Desiree) chloroplast DNA,” in Book of Abstracts of the 2nd Solanaceae Genome Workshop, p. 107, 2005.

[34] K.-J. Kim and H.-L. Lee, “Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants,” DNA Research, vol. 11, no. 4, pp. 247–261, 2004.

[35] D. A. Steane, “Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae),” DNA Research, vol. 12, no. 3, pp. 215–220, 2005.

[36] J. D. Palmer, “Cell culture and somatic cell genetics of plants,” in The Molecular Biology of Plastids, R. G. Hermann, Ed., pp. 5–53, Springer, Vienna, Austria, 1991.

[37] M. Sugiura, K. Shinozaki, M. Tanaka et al., “Split genes and cis/trans splicing in tobacco chloroplasts,” in Plant Molecular Biology, D. von Wettstein and N. H. Chua, Eds., pp. 65–76, Plenum Press, New York, NY, USA, 1987.

[38] A. Vera and M. Sugiura, “A novel RNA gene in the tobacco plastid genome: its possible role in the maturation of 16S rRNA,” The EMBO Journal, vol. 13, no. 9, pp. 2211–2217, 1994.

[39] M. Sugita, Z. Svab, P. Maliga, and M. Sugiura, “Targeted deletion of sprA from the tobacco plastid genome indicates that the encoded small RNA is not essential for pre-16S rRNA maturation in plastids,” Molecular and General Genetics, vol. 257, no. 1, pp. 23–27, 1997.

[40] R. Lahaye, M. van der Bank, D. Bogarin et al., “DNA barcoding the floras of biodiversity hotspots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 8, pp. 2923–2928, 2008.

[41] P. Taberlet, L. Gielly, G. Pautou, and J. Bouvet, “Universal primers for amplification of three non-coding regions of chloroplast DNA,” Plant Molecular Biology, vol. 17, no. 5, pp. 1105–1109, 1991.

[42] X. Guo, S. Castillo-Ramírez, V. Gonzalez et al., “Rapid evolutionary change of common bean (Phaseolus vulgaris L.) plastome and the genomic diversification of legume chloroplasts,” BMC Genomics, vol. 8, article 228, 2007.

[43] R. K. Jansen, C. Kaittanis, C. Saski et al., “Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids,” BMC Evolutionary Biology, vol. 6, no. 1, article 32, 2006.

[44] M. J. Moore, P. S. Soltis, C. D. Bell, J. G. Burleigh, and D. E. Soltis, “Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4623–4628, 2010.

[45] R. K. Jansen, Z. Cai, L. A. Raubeson et al., “Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 49, pp. 19369–19374, 2007.

[46] L. Bohs and R. G. Olmstead, “Phylogenetic relationships in Solanum (Solanaceae) based on ndhF sequences,” Systematic Botany, vol. 22, no. 1, pp. 5–17, 1997.

[47] R. G. Olmstead, L. Bohs, H. A. Migid, E. Santiago-Valentin, V. F. Garcia, and S. M. Collier, “A molecular phylogeny of the Solanaceae,” Taxon, vol. 57, no. 4, pp. 1159–1181, 2008.


Page 10: Research Article Comparative Genomics of Ten …downloads.hindawi.com/journals/abi/2014/424873.pdfResearch Article Comparative Genomics of Ten Solanaceous Plastomes HarpreetKaur, 1

10 Advances in Bioinformatics

The insertion of 3 bp in C annuum only altered the lengthof the protein by making it longer by 1 amino acid Howeverthe small insertion of 1 bp in D stramonium proved to be aframeshift mutation resulting in three changes in the aminoacid sequence near the C-terminus Moreover deletion of4 bp at the 31015840 end resulted in premature termination of proteinsynthesis The frameshift mutation and the 31015840 end deletionfinally reduced the gene product length by 1 amino acid AstheC-terminal of amino acid chain is well conserved in all theother species the effect of above mentioned variations needsto be validated experimentally

(5) rps16. In rps16 also, InDels were observed in introns as well as exons. Five of the major insertions in the intron regions were species specific. Insertions of 38 bp (InDel 1), 9 bp (InDel 2), 5 bp (InDel 7), 4 bp (InDel 8), and 6 bp (InDel 9) were observed in A. belladonna, S. lycopersicum, D. stramonium, and C. annuum. A deletion of 5 bp was observed in all three Solanum species and C. annuum. A deletion of 9 bp was observed in all Nicotiana species, resulting in an amino acid change (P to S) and shortening of the protein by three amino acids in the C-terminal region. A similar deletion has also been observed by Kahlau et al. [27] and was suggested to be functionally neutral.

(6) sprA. The sprA gene has been reported as a stable noncoding RNA of unknown function. This gene has been suggested to influence 16S rRNA maturation [38, 39]. In many species, this gene seems to be present as a remnant and shows large variations in its 5′ region. The largest deletion, of 109 bp, was observed in C. annuum. The rest of this gene appears to be more conserved, with a deletion towards the 3′ end in all Nicotiana species and A. belladonna. The manner in which this gene functions and the consequences of the above-mentioned variations are yet to be investigated experimentally.

(7) trnA-UGC. In this particular gene, a long deletion of 102 bp was observed in all Nicotiana species. Interestingly, this deletion was further extended to 130 bp, in both directions, in A. belladonna. These deletions were found in the intron region and so are unlikely to have any negative impact on gene product function.

(8) trnL-UAA. The trend of variation in trnL-UAA was similar to that in ndhA, as all InDels were observed in introns. The longest species-specific deletion (InDel 3) was observed in C. annuum, whereas a short insertion of four nucleotides, a repeat of “T”, was observed specifically in D. stramonium. Another insertion of 6 bp was observed in two Nicotiana species, that is, N. sylvestris and N. tabacum.

(9) ycf1. Many InDels (3′ region) were found in the fastest evolving gene, that is, ycf1. Most of the InDels were found to be species specific. The largest number of InDels (InDels 2, 3, 5, 7, 9, 16, 17, 19, 23, 24, 25, and 30) was observed in C. annuum, followed by D. stramonium (InDels 4, 6, 20, 26, and 31), N. tomentosiformis (InDels 8, 17, and 18), and S. lycopersicum (InDels 16, 22, and 28). Two genus-specific InDels (InDels 14 and 21) were observed in all four Nicotiana species. However, InDel 19 was also present in D. stramonium. Another genus-specific InDel (InDel 29) was observed in Solanum species. Most of the InDels altered the length of the gene product, with the maximum length of 1906 amino acids (aa) observed in C. annuum and the minimum of 1873 aa observed in D. stramonium. Among the Solanum species, the length of the protein (1887 aa) was conserved between S. bulbocastanum and S. tuberosum. However, S. lycopersicum had a protein of 1891 aa, larger by 4 aa than that of the other two species of the same genus. Among the four Nicotiana species, the ycf1 gene product length was nearly conserved among N. sylvestris, N. tabacum, and N. undulata, with protein lengths of 1901 aa, 1902 aa, and 1901 aa, respectively. However, N. tomentosiformis was observed to be the most variable member of the genus Nicotiana, with a protein length of 1892 aa.
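The ycf1 protein lengths quoted above can be tabulated to make the within-genus and overall spread explicit. The sketch below uses only the lengths stated in the text (no value is given for A. belladonna in this passage, so it is omitted); the dictionary name `ycf1_length_aa` and the helper `genus_spread` are hypothetical names chosen for illustration.

```python
# ycf1 protein lengths (amino acids) as stated in the paragraph above.
ycf1_length_aa = {
    "C. annuum": 1906,
    "D. stramonium": 1873,
    "S. bulbocastanum": 1887,
    "S. tuberosum": 1887,
    "S. lycopersicum": 1891,
    "N. sylvestris": 1901,
    "N. tabacum": 1902,
    "N. undulata": 1901,
    "N. tomentosiformis": 1892,
}


def genus_spread(lengths: dict) -> dict:
    """Return max - min protein length per genus initial (C, D, S, N)."""
    by_genus = {}
    for species, aa in lengths.items():
        genus = species.split(".")[0]  # abbreviated genus initial, e.g. "N"
        by_genus.setdefault(genus, []).append(aa)
    return {genus: max(values) - min(values) for genus, values in by_genus.items()}


overall_spread = max(ycf1_length_aa.values()) - min(ycf1_length_aa.values())
print("Spread per genus (aa):", genus_spread(ycf1_length_aa))
print("Overall spread (aa):", overall_spread)
```

On these figures the spread is 4 aa within Solanum and 10 aa within Nicotiana, the latter driven by N. tomentosiformis, while the overall difference between C. annuum and D. stramonium is 33 aa.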

3.5. Phylogenetic Analysis of Solanaceous Plastomes. Evolutionary relationships between diverse plant species have been analyzed using several plastome markers, including matK and rbcL (genes) or trnH-psbA and trnL-trnF (intergenic regions), due to sequence conservation among plant taxa blended with suitable variation [40, 41]. However, determination of phylogeny based on single gene sequences may be inaccurate [42]. Availability of complete chloroplast sequences for many species has made it possible to use many individual genes or concatenated gene sequences to deduce phylogenetic relationships among taxa [42–44].

Efforts have been made to carry out phylogenetic analysis of solanaceous species using complete plastome sequences by Moore et al. [44] and Jansen et al. [45]. Evolutionary positions of Capsicum and Datura in Solanaceae have been determined using a single or a few plastid genes [46, 47]. Recently, concatenated protein-coding gene sequences from completely sequenced plastomes were used to obtain reasonable phylogenetic relationships for solanaceous species [29]. In the present investigation, we used a similar approach to analyze the phylogenetic relationships of ten solanaceous species with completely sequenced plastomes. Individual multiple sequence alignments were concatenated for maximum likelihood phylogenetic tree generation. As depicted in Figure 3, taxa were divided into two clades with 100% bootstrap support from 500 replicates. The first clade consisted of the four Nicotiana species, while the species of Solanum, Capsicum, Atropa, and Datura were included in the second clade. These results are in line with previous phylogenetic analyses using concatenated protein-coding gene sequences as well as phylogenetic analyses using plastid ndhF and trnL-F sequences [29, 47]. However, in an analysis of 13 ORFs of solanaceous plastomes, a different arrangement was shown, in which Atropa was separated from Solanum and grouped together with Nicotiana [25].
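The concatenation of individual gene alignments into a single matrix can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: each per-gene alignment is assumed to be a mapping from a taxon label to an aligned sequence, taxa missing from a gene are padded with gap characters, and the function name `concatenate_alignments` and the toy sequences are hypothetical. The resulting matrix would then be passed to a maximum likelihood program of the reader's choice.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # taxon label -> aligned sequence (gaps as '-')


def concatenate_alignments(alignments: List[Alignment], missing: str = "-") -> Alignment:
    """Join per-gene alignments end to end into one concatenated matrix."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            # Pad taxa lacking this gene so every row keeps the same length.
            parts[taxon].append(aln.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy alignments for illustration only (not real plastome data).
gene_a = {"NTA": "ATG-CC", "SLY": "ATGACC"}
gene_b = {"NTA": "TTA", "SLY": "TTG", "CAN": "TTG"}
matrix = concatenate_alignments([gene_a, gene_b])
for taxon, sequence in matrix.items():
    print(taxon, sequence)
```

Padding absent genes with gaps keeps every row the same length, which is what most tree-building programs require of an input alignment.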

4. Conclusions

The analyses of complete plastid genomes of ten solanaceous species revealed overall similarity in terms of gene content and organization. The sizes of the LSC, SSC, and IR regions were found to be somewhat conserved among species, but significant variation was found between genera. Most of the coding regions were well conserved. However, many features in a few genes were observed to be typical of a particular genus and even species; these can be used as molecular markers to investigate genetic diversity and evolution. These typical variations can be further utilized to develop more sophisticated DNA barcoding based techniques. The ten solanaceous species are divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from coding regions of plastomes. The first clade consisted of the four Nicotiana species and the second clade consisted of species of Solanum, Capsicum, Atropa, and Datura.

Figure 3: Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species. Tip labels: CAR, DCA, NTO, NUN, NSY, NTA, ABE, DST, CAN, SLY, SBU, and STU.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors are thankful to the University Grants Commission, New Delhi, for providing a MANF fellowship to Harpreet Kaur.

References

[1] M. G. Bausher, N. D. Singh, S.-B. Lee, R. K. Jansen, and H. Daniell, “The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var ‘Ridge Pineapple’: organization and phylogenetic relationships to other angiosperms,” BMC Plant Biology, vol. 6, no. 1, article 21, 2006.

[2] Z. Y. Hu, W. Hua, S. M. Huang, and H. Z. Wang, “Complete chloroplast genome sequence of rapeseed (Brassica napus L.) and its evolutionary implications,” Genetic Resources and Crop Evolution, vol. 58, no. 6, pp. 875–887, 2011.

[3] J. D. Palmer, “Comparative organization of chloroplast genomes,” Annual Review of Genetics, vol. 19, no. 1, pp. 325–354, 1985.

[4] R. G. Olmstead and J. D. Palmer, “Chloroplast DNA systematics: a review of methods and data analysis,” The American Journal of Botany, vol. 81, no. 9, pp. 1205–1224, 1994.

[5] R. A. Bungard, “Photosynthetic evolution in parasitic plants: insight from the chloroplast genome,” BioEssays, vol. 26, no. 3, pp. 235–247, 2004.

[6] L. A. Raubeson and R. K. Jansen, “Chloroplast genomes of plants,” in Diversity and Evolution of Plants-Genotypic and Phenotypic Variation in Higher Plants, R. Henry, Ed., pp. 45–68, CABI Publishing, Wallingford, UK, 2005.

[7] R. C. Haberle, H. M. Fourcade, J. L. Boore, and R. K. Jansen, “Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes,” Journal of Molecular Evolution, vol. 66, no. 4, pp. 350–361, 2008.

[8] W. Martin and R. G. Herrmann, “Gene transfer from organelles to the nucleus: how much, what happens, and why?” Plant Physiology, vol. 118, no. 1, pp. 9–17, 1998.

[9] H. L. Race, R. G. Herrmann, and W. Martin, “Why have organelles retained genomes?” Trends in Genetics, vol. 15, no. 9, pp. 364–370, 1999.

[10] T. Wakasugi, T. Tsudzuki, and M. Sugiura, “The genomics of land plant chloroplasts: gene content and alteration of genomic information by RNA editing,” Photosynthesis Research, vol. 70, no. 1, pp. 107–118, 2001.

[11] Y.-K. Kim, C.-W. Park, and K.-J. Kim, “Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications,” Molecules and Cells, vol. 27, no. 3, pp. 365–381, 2009.

[12] A. Khan, I. A. Khan, H. Asif, and M. K. Azim, “Current trends in chloroplast genome research,” African Journal of Biotechnology, vol. 9, no. 24, pp. 3494–3500, 2010.

[13] H. Shimada and M. Sugiura, “Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes,” Nucleic Acids Research, vol. 19, no. 5, pp. 983–995, 1991.

[14] M. Sugiura, “The chloroplast genome,” Plant Molecular Biology, vol. 19, no. 1, pp. 149–168, 1992.

[15] M. E. Cosner, R. K. Jansen, J. D. Palmer, and S. R. Downie, “The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families,” Current Genetics, vol. 31, no. 5, pp. 419–429, 1997.

[16] L. Gao, X. Yi, Y.-X. Yang, Y.-J. Su, and T. Wang, “Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes,” BMC Evolutionary Biology, vol. 9, no. 1, article 130, 2009.

[17] T. Kato, T. Kaneko, S. Sato, Y. Nakamura, and S. Tabata, “Complete structure of the chloroplast genome of a legume, Lotus japonicus,” DNA Research, vol. 7, no. 6, pp. 323–330, 2000.

[18] B. Goffinet, N. J. Wickett, O. Werner, R. M. Ros, A. J. Shaw, and C. J. Cox, “Distribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta),” Annals of Botany, vol. 99, no. 4, pp. 747–753, 2007.

[19] J. D. Palmer, J. M. Nugent, and L. A. Herbon, “Unusual structure of geranium chloroplast DNA: a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families,” Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 3, pp. 769–773, 1987.

[20] S. Greiner, X. Wang, U. Rauwolf et al., “The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution,” Nucleic Acids Research, vol. 36, no. 7, pp. 2366–2378, 2008.

[21] J. J. Doyle, J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson, “Chloroplast DNA inversions and the origin of the grass family (Poaceae),” Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 16, pp. 7722–7726, 1992.

[22] F. A. Michelangeli, J. I. Davis, and D. W. Stevenson, “Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes,” The American Journal of Botany, vol. 90, no. 1, pp. 93–106, 2003.

[23] E. Bortiri, D. Coleman-Derr, G. R. Lazo, O. D. Anderson, and Y. Q. Gu, “The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes,” BMC Research Notes, vol. 1, article 61, 2008.

[24] C. H. Leseberg and M. R. Duvall, “The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals,” Journal of Molecular Evolution, vol. 69, no. 4, pp. 311–318, 2009.

[25] H. J. Chung, J. D. Jung, H. W. Park et al., “The complete chloroplast genome sequences of Solanum tuberosum and comparative analysis with Solanaceae species identified the presence of a 241-bp deletion in cultivated potato chloroplast DNA sequence,” Plant Cell Reports, vol. 25, no. 12, pp. 1369–1379, 2006.

[26] H. Daniell, S.-B. Lee, J. Grevich et al., “Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes,” Theoretical and Applied Genetics, vol. 112, no. 8, pp. 1503–1518, 2006.

[27] S. Kahlau, S. Aspinall, J. C. Gray, and R. Bock, “Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes,” Journal of Molecular Evolution, vol. 63, no. 2, pp. 194–207, 2006.

[28] M. Yukawa, T. Tsudzuki, and M. Sugiura, “The chloroplast genome of Nicotiana sylvestris and Nicotiana tomentosiformis: complete sequencing confirms that the Nicotiana sylvestris progenitor is the maternal genome donor of Nicotiana tabacum,” Molecular Genetics and Genomics, vol. 275, no. 4, pp. 367–373, 2006.

[29] Y. D. Jo, J. Park, J. Kim et al., “Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome,” Plant Cell Reports, vol. 30, no. 2, pp. 217–229, 2011.

[30] C. Schmitz-Linneweber, R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier, “The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation,” Molecular Biology and Evolution, vol. 19, no. 9, pp. 1602–1612, 2002.

[31] M. Kunnimalaiyaan and B. L. Nielsen, “Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA,” Nucleic Acids Research, vol. 25, no. 18, pp. 3681–3686, 1997.

[32] G. Thyssen, Z. Svab, and P. Maliga, “Cell-to-cell movement of plastids in plants,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 7, pp. 2439–2443, 2012.

[33] D. Gargano, A. Vezzi, N. Scotti et al., “The complete nucleotide sequence genome of potato (Solanum tuberosum cv. Desiree) chloroplast DNA,” in Book of Abstracts of the 2nd Solanaceae Genome Workshop, p. 107, 2005.

[34] K.-J. Kim and H.-L. Lee, “Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants,” DNA Research, vol. 11, no. 4, pp. 247–261, 2004.

[35] D. A. Steane, “Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae),” DNA Research, vol. 12, no. 3, pp. 215–220, 2005.

[36] J. D. Palmer, “Cell culture and somatic cell genetics of plants,” in The Molecular Biology of Plastids, R. G. Hermann, Ed., pp. 5–53, Springer, Vienna, Austria, 1991.

[37] M. Sugiura, K. Shinozaki, M. Tanaka et al., “Split genes and cis/trans splicing in tobacco chloroplasts,” in Plant Molecular Biology, D. von Wettstein and N. H. Chua, Eds., pp. 65–76, Plenum Press, New York, NY, USA, 1987.

[38] A. Vera and M. Sugiura, “A novel RNA gene in the tobacco plastid genome: its possible role in the maturation of 16S rRNA,” The EMBO Journal, vol. 13, no. 9, pp. 2211–2217, 1994.

[39] M. Sugita, Z. Svab, P. Maliga, and M. Sugiura, “Targeted deletion of sprA from the tobacco plastid genome indicates that the encoded small RNA is not essential for pre-16S rRNA maturation in plastids,” Molecular and General Genetics, vol. 257, no. 1, pp. 23–27, 1997.

[40] R. Lahaye, M. van der Bank, D. Bogarin et al., “DNA barcoding the floras of biodiversity hotspots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 8, pp. 2923–2928, 2008.

[41] P. Taberlet, L. Gielly, G. Pautou, and J. Bouvet, “Universal primers for amplification of three non-coding regions of chloroplast DNA,” Plant Molecular Biology, vol. 17, no. 5, pp. 1105–1109, 1991.

[42] X. Guo, S. Castillo-Ramírez, V. González et al., “Rapid evolutionary change of common bean (Phaseolus vulgaris L.) plastome, and the genomic diversification of legume chloroplasts,” BMC Genomics, vol. 8, article 228, 2007.

[43] R. K. Jansen, C. Kaittanis, C. Saski et al., “Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids,” BMC Evolutionary Biology, vol. 6, no. 1, article 32, 2006.

[44] M. J. Moore, P. S. Soltis, C. D. Bell, J. G. Burleigh, and D. E. Soltis, “Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4623–4628, 2010.

[45] R. K. Jansen, Z. Cai, L. A. Raubeson et al., “Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 49, pp. 19369–19374, 2007.

[46] L. Bohs and R. G. Olmstead, “Phylogenetic relationships in Solanum (Solanaceae) based on ndhF sequences,” Systematic Botany, vol. 22, no. 1, pp. 5–17, 1997.

[47] R. G. Olmstead, L. Bohs, H. A. Migid, E. Santiago-Valentin, V. F. Garcia, and S. M. Collier, “A molecular phylogeny of the Solanaceae,” Taxon, vol. 57, no. 4, pp. 1159–1181, 2008.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 11: Research Article Comparative Genomics of Ten …downloads.hindawi.com/journals/abi/2014/424873.pdfResearch Article Comparative Genomics of Ten Solanaceous Plastomes HarpreetKaur, 1

Advances in Bioinformatics 11

0051 CAR

0083 DCA

0461500

0002500

0004 NTO

0403

0003 NUN

0002500

0 NSY

0 NTA

0001468

0006 ABE

0001500

0006 DST

0001500

0007 CAN

0007500

0003 SLY

0001500

0001 SBU

0001 STU

Figure 3Maximum likelihoodphylogenetic tree derived using con-catenated nucleotide sequences of 75 protein-coding genes of tensolanaceous species and two outgroup species

found to be somewhat conserved among species but a signifi-cant variation was found between genera Most of the codingregions were well conserved However many of the featuresin few genes were observed to be typical of a particular genusand even species which can be used as molecular markers toinvestigate genetic diversity and evolutionThese typical vari-ations can be further utilized to develop more sophisticatedDNA barcoding based techniques Ten solanaceous speciesare divided into two clades on the basis of Phylogeneticanalysis using concatenated alignment of gene sequencesfrom coding regions of plastomes The first clade consistedof four Nicotiana species and the second clade consisted ofspecies of Solanum Capsicum Atropa and Datura

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

The authors are thankful to University Grants CommissionNew Delhi for providing MANF fellowship to HarpreetKaur

References

[1] M G Bausher N D Singh S-B Lee R K Jansen and HDaniell ldquoThe complete chloroplast genome sequence of Citrussinensis (L) Osbeck var ldquoRidge Pineapplerdquo organization andphylogenetic relationships to other angiospermsrdquo BMC PlantBiology vol 6 no 1 article 21 2006

[2] Z Y Hu W Hua S M Huang and H Z Wang ldquoCompletechloroplast genome sequence of rapeseed (Brassica napus L)and its evolutionary implicationsrdquo Genetic Resources and CropEvolution vol 58 no 6 pp 875ndash887 2011

[3] J D Palmer ldquoComparative organization of chloroplast geno-mesrdquoAnnual Review of Genetics vol 19 no 1 pp 325ndash354 1985

[4] RGOlmstead and J D Palmer ldquoChloroplastDNA systematicsa review of methods and data analysisrdquoTheAmerican Journal ofBotany vol 81 no 9 pp 1205ndash1224 1994

[5] R A Bungard ldquoPhotosynthetic evolution in parasitic plantsInsight from the chloroplast genomerdquo BioEssays vol 26 no 3pp 235ndash247 2004

[6] L A Raubeson and R K Jansen ldquoChloroplast genomes ofplantsrdquo in Diversity and Evolution of Plants-Genotypic andPhenotypic Variation in Higher Plants R Henry Ed pp 45ndash68CABI Publishing Wallingford UK 2005

[7] R C Haberle H M Fourcade J L Boore and R K JansenldquoExtensive rearrangements in the chloroplast genome of Tra-chelium caeruleum are associated with repeats and tRNA genesrdquoJournal of Molecular Evolution vol 66 no 4 pp 350ndash361 2008

[8] WMartin and R G Herrmann ldquoGene transfer from organellesto the nucleus how much what happens and whyrdquo PlantPhysiology vol 118 no 1 pp 9ndash17 1998

[9] H L Race R G Herrmann and W Martin ldquoWhy haveorganelles retained genomesrdquo Trends in Genetics vol 15 no9 pp 364ndash370 1999

[10] T Wakasugi T Tsudzuki and M Sugiura ldquoThe genomics ofland plant chloroplasts gene content and alteration of genomicinformation by RNA editingrdquo Photosynthesis Research vol 70no 1 pp 107ndash118 2001

[11] Y-K Kim C-W Park and K-J Kim ldquoComplete chloroplastDNA sequence from a Korean endemic genus Megaleranthissaniculifolia and its evolutionary implicationsrdquo Molecules andCells vol 27 no 3 pp 365ndash381 2009

[12] A Khan I A KhanHAsif andMKAzim ldquoCurrent trends inchloroplast genome researchrdquo African Journal of Biotechnologyvol 9 no 24 pp 3494ndash3500 2010

[13] H Shimada and M Sugiura ldquoFine structural features of thechloroplast genome comparison of the sequenced chloroplastgenomesrdquo Nucleic Acids Research vol 19 no 5 pp 983ndash9951991

[14] M Sugiura ldquoThe chloroplast genomerdquo Plant Molecular Biologyvol 19 no 1 pp 149ndash168 1992

[15] M E Cosner R K Jansen J D Palmer and S R Downie ldquoThehighly rearranged chloroplast genome of Trachelium caeruleum(Campanulaceae) multiple inversions inverted repeat expan-sion and contraction transposition insertionsdeletions andseveral repeat familiesrdquo Current Genetics vol 31 no 5 pp 419ndash429 1997

[16] L Gao X Yi Y-X Yang Y-J Su and T Wang ldquoCompletechloroplast genome sequence of a tree fern Alsophila spinulosainsights into evolutionary changes in fern chloroplast genomesrdquoBMC Evolutionary Biology vol 9 no 1 article 130 2009

12 Advances in Bioinformatics

[17] T Kato T Kaneko S Sato Y Nakamura and S TabataldquoComplete structure of the chloroplast genome of a legumeLotus japonicusrdquoDNA Research vol 7 no 6 pp 323ndash330 2000

[18] B Goffinet N J Wickett O Werner R M Ros A J Shaw andC J Cox ldquoDistribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta)rdquoAnnals of Botany vol 99 no 4 pp 747ndash753 2007

[19] J D Palmer JMNugent and L AHerbon ldquoUnusual structureof geranium chloroplast DNA a triple-sized inverted repeatextensive gene duplications multiple inversions and two repeatfamiliesrdquo Proceedings of the National Academy of Sciences of theUnited States of America vol 84 no 3 pp 769ndash773 1987

[20] S Greiner XWang U Rauwolf et al ldquoThe complete nucleotidesequences of the five genetically distinct plastid genomes ofOenothera subsection Oenothera I Sequence evaluation andplastome evolutionrdquo Nucleic Acids Research vol 36 no 7 pp2366ndash2378 2008

[21] J J Doyle J I Davis R J Soreng D Garvin and M JAnderson ldquoChloroplast DNA inversions and the origin of thegrass family (Poaceae)rdquo Proceedings of the National Academy ofSciences of the United States of America vol 89 no 16 pp 7722ndash7726 1992

[22] F A Michelangeli J I Davis and D W Stevenson ldquoPhy-logenetic relationships among Poaceae and related fami-lies as inferred from morphology inversions in the plastidgenome and sequence data from the mitochondrial and plastidgenomesrdquoTheAmerican Journal of Botany vol 90 no 1 pp 93ndash106 2003

[23] E Bortiri D Coleman-Derr G R Lazo O D Andersonand Y Q Gu ldquoThe complete chloroplast genome sequence ofBrachypodium distachyon sequence comparison and phyloge-netic analysis of eight grass plastomesrdquo BMC Research Notesvol 1 article 61 2008

[24] C H Leseberg and M R Duvall ldquoThe complete chloroplastgenome ofCoix lacryma-jobi and a comparative molecular evo-lutionary analysis of plastomes in cerealsrdquo Journal of MolecularEvolution vol 69 no 4 pp 311ndash318 2009

[25] H J Chung J D Jung H W Park et al ldquoThe completechloroplast genome sequences of Solanum tuberosum andcomparative analysis with Solanaceae species identified thepresence of a 241-bp deletion in cultivated potato chloroplastDNA sequencerdquoPlant Cell Reports vol 25 no 12 pp 1369ndash13792006

[26] H Daniell S-B Lee J Grevich et al ldquoComplete chloro-plast genome sequences of Solanum bulbocastanum Solanumlycopersicum and comparative analyses with other Solanaceaegenomesrdquo Theoretical and Applied Genetics vol 112 no 8 pp1503ndash1518 2006

[27] S Kahlau S Aspinall J C Gray and R Bock ldquoSequence ofthe tomato chloroplast DNA and evolutionary comparison ofsolanaceous plastid genomesrdquo Journal of Molecular Evolutionvol 63 no 2 pp 194ndash207 2006

[28] M Yukawa T Tsudzuki and M Sugiura ldquoThe chloroplastgenome of Nicotiana sylvestris and Nicotiana tomentosiformiscomplete sequencing confirms that theNicotiana sylvestris pro-genitor is the maternal genome donor of Nicotiana tabacumrdquoMolecular Genetics and Genomics vol 275 no 4 pp 367ndash3732006

[29] Y D Jo J Park J Kim et al ldquoComplete sequencing andcomparative analyses of the pepper (Capsicum annuum L)plastome revealed high frequency of tandem repeats and large

insertiondeletions on pepper plastomerdquo Plant Cell Reports vol30 no 2 pp 217ndash229 2011

[30] C Schmitz-Linneweber R Regel T G Du H Hupfer RG Herrmann and R M Maier ldquoThe plastid chromosome ofAtropa belladonna and its comparison with that of Nicotianatabacum the role of RNA editing in generating divergencein the process of plant speciationrdquo Molecular Biology andEvolution vol 19 no 9 pp 1602ndash1612 2002

[31] M Kunnimalaiyaan and B L Nielsen ldquoFine mapping of repli-cation origins (oriA and oriB) inNicotiana tabacum chloroplastDNArdquo Nucleic Acids Research vol 25 no 18 pp 3681ndash36861997

[32] G Thyssen Z Svab and P Maliga ldquoCell-to-cell movementof plastids in plantsrdquo Proceedings of the National Academy ofSciences of the United States of America vol 109 no 7 pp 2439ndash2443 2012

[33] D Gargano A Vezzi N Scotti et al ldquoThe complete nucleotidesequence genome of potato (Solanum tuberosum cv Desiree)chloroplast DNArdquo in Book of Abstracts of the 2nd SolanaceaeGenome Workshop p 107 2005

[34] K-J Kim and H-L Lee ldquoComplete chloroplast genomesequences from Korean ginseng (Panax schinseng Nees) andcomparative analysis of sequence evolution among 17 vascularplantsrdquo DNA Research vol 11 no 4 pp 247ndash261 2004

[35] D A Steane ldquoComplete nucleotide sequence of the chloroplastgenome from the Tasmanian blue gum Eucalyptus globulus(Myrtaceae)rdquo DNA Research vol 12 no 3 pp 215ndash220 2005

[36] J D Palmer ldquoCell culture and somatic cell genetics of plantsrdquo inTheMolecular Biology of Plastids R G Hermann Ed pp 5ndash53Springer Vienna Austria 1991

[37] M Sugiura K Shinozaki M Tanaka et al ldquoSplit genes andcistrans splicing in tobacco chloroplastsrdquo in Plant MolecularBiology D von Wettstein and N H Chua Eds pp 65ndash76Plenum Press New York NY USA 1987

[38] A Vera and M Sugiura ldquoA novel RNA gene in the tobaccoplastid genome its possible role in thematuration of 16S rRNArdquoThe EMBO Journal vol 13 no 9 pp 2211ndash2217 1994

[39] M Sugita Z Svab P Maliga and M Sugiura ldquoTargeteddeletion of sprA from the tobacco plastid genome indicatesthat the encoded small RNA is not essential for pre-16S rRNAmaturation in plastidsrdquo Molecular and General Genetics vol257 no 1 pp 23ndash27 1997

[40] R Lahaye M van der Bank D Bogarin et al ldquoDNA barcodingthe floras of biodiversity hotspotsrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no8 pp 2923ndash2928 2008

[41] P Taberlet L Gielly G Pautou and J Bouvet ldquoUniversalprimers for amplification of three non-coding regions of chloro-plast DNArdquo PlantMolecular Biology vol 17 no 5 pp 1105ndash11091991

[42] X Guo S Castillo-Ramırez V Gonzalez et al ldquoRapid evolu-tionary change of common bean (Phaseolus vulgaris L) plas-tome and the genomic diversification of legume chloroplastsrdquoBMC Genomics vol 8 article 228 2007

[43] R K Jansen C Kaittanis C Saski et al ldquoPhylogenetic analysesof Vitis (Vitaceae) based on complete chloroplast genomesequences effects of taxon sampling and phylogenetic methodson resolving relationships among rosidsrdquo BMC EvolutionaryBiology vol 6 no 1 article 32 2006

[44] M J Moore P S Soltis C D Bell J G Burleigh and D ESoltis ldquoPhylogenetic analysis of 83 plastid genes further resolves

Advances in Bioinformatics 13

the early diversification of eudicotsrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 107 no10 pp 4623ndash4628 2010

[45] R K Jansen Z Cai L A Raubeson et al ldquoAnalysis of 81 genesfrom 64 plastid genomes resolves relationships in angiospermsand identifies genome-scale evolutionary patternsrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 104 no 49 pp 19369ndash19374 2007

[46] L Bohs and R G Olmstead ldquoPhylogenetic relationships inSolanum (Solanaceae) based on ndhF sequencesrdquo SystematicBotany vol 22 no 1 pp 5ndash17 1997

[47] R G Olmstead L Bohs H A Migid E Santiago-ValentinV F Garcia and S M Collier ldquoA molecular phylogeny of theSolanaceaerdquo Taxon vol 57 no 4 pp 1159ndash1181 2008


[17] T. Kato, T. Kaneko, S. Sato, Y. Nakamura, and S. Tabata, “Complete structure of the chloroplast genome of a legume, Lotus japonicus,” DNA Research, vol. 7, no. 6, pp. 323–330, 2000.

[18] B. Goffinet, N. J. Wickett, O. Werner, R. M. Ros, A. J. Shaw, and C. J. Cox, “Distribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta),” Annals of Botany, vol. 99, no. 4, pp. 747–753, 2007.

[19] J. D. Palmer, J. M. Nugent, and L. A. Herbon, “Unusual structure of geranium chloroplast DNA: a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families,” Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 3, pp. 769–773, 1987.

[20] S. Greiner, X. Wang, U. Rauwolf et al., “The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution,” Nucleic Acids Research, vol. 36, no. 7, pp. 2366–2378, 2008.

[21] J. J. Doyle, J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson, “Chloroplast DNA inversions and the origin of the grass family (Poaceae),” Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 16, pp. 7722–7726, 1992.

[22] F. A. Michelangeli, J. I. Davis, and D. W. Stevenson, “Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes,” The American Journal of Botany, vol. 90, no. 1, pp. 93–106, 2003.

[23] E. Bortiri, D. Coleman-Derr, G. R. Lazo, O. D. Anderson, and Y. Q. Gu, “The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes,” BMC Research Notes, vol. 1, article 61, 2008.

[24] C. H. Leseberg and M. R. Duvall, “The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals,” Journal of Molecular Evolution, vol. 69, no. 4, pp. 311–318, 2009.

[25] H. J. Chung, J. D. Jung, H. W. Park et al., “The complete chloroplast genome sequences of Solanum tuberosum and comparative analysis with Solanaceae species identified the presence of a 241-bp deletion in cultivated potato chloroplast DNA sequence,” Plant Cell Reports, vol. 25, no. 12, pp. 1369–1379, 2006.

[26] H. Daniell, S.-B. Lee, J. Grevich et al., “Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes,” Theoretical and Applied Genetics, vol. 112, no. 8, pp. 1503–1518, 2006.

[27] S. Kahlau, S. Aspinall, J. C. Gray, and R. Bock, “Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes,” Journal of Molecular Evolution, vol. 63, no. 2, pp. 194–207, 2006.

[28] M. Yukawa, T. Tsudzuki, and M. Sugiura, “The chloroplast genome of Nicotiana sylvestris and Nicotiana tomentosiformis: complete sequencing confirms that the Nicotiana sylvestris progenitor is the maternal genome donor of Nicotiana tabacum,” Molecular Genetics and Genomics, vol. 275, no. 4, pp. 367–373, 2006.

[29] Y. D. Jo, J. Park, J. Kim et al., “Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome,” Plant Cell Reports, vol. 30, no. 2, pp. 217–229, 2011.

[30] C. Schmitz-Linneweber, R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier, “The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation,” Molecular Biology and Evolution, vol. 19, no. 9, pp. 1602–1612, 2002.

[31] M. Kunnimalaiyaan and B. L. Nielsen, “Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA,” Nucleic Acids Research, vol. 25, no. 18, pp. 3681–3686, 1997.

[32] G. Thyssen, Z. Svab, and P. Maliga, “Cell-to-cell movement of plastids in plants,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 7, pp. 2439–2443, 2012.

[33] D. Gargano, A. Vezzi, N. Scotti et al., “The complete nucleotide sequence genome of potato (Solanum tuberosum cv. Desiree) chloroplast DNA,” in Book of Abstracts of the 2nd Solanaceae Genome Workshop, p. 107, 2005.

[34] K.-J. Kim and H.-L. Lee, “Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants,” DNA Research, vol. 11, no. 4, pp. 247–261, 2004.

[35] D. A. Steane, “Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae),” DNA Research, vol. 12, no. 3, pp. 215–220, 2005.

[36] J. D. Palmer, “Cell culture and somatic cell genetics of plants,” in The Molecular Biology of Plastids, R. G. Hermann, Ed., pp. 5–53, Springer, Vienna, Austria, 1991.

[37] M. Sugiura, K. Shinozaki, M. Tanaka et al., “Split genes and cis/trans splicing in tobacco chloroplasts,” in Plant Molecular Biology, D. von Wettstein and N. H. Chua, Eds., pp. 65–76, Plenum Press, New York, NY, USA, 1987.

[38] A. Vera and M. Sugiura, “A novel RNA gene in the tobacco plastid genome: its possible role in the maturation of 16S rRNA,” The EMBO Journal, vol. 13, no. 9, pp. 2211–2217, 1994.

[39] M. Sugita, Z. Svab, P. Maliga, and M. Sugiura, “Targeted deletion of sprA from the tobacco plastid genome indicates that the encoded small RNA is not essential for pre-16S rRNA maturation in plastids,” Molecular and General Genetics, vol. 257, no. 1, pp. 23–27, 1997.

[40] R. Lahaye, M. van der Bank, D. Bogarin et al., “DNA barcoding the floras of biodiversity hotspots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 8, pp. 2923–2928, 2008.

[41] P. Taberlet, L. Gielly, G. Pautou, and J. Bouvet, “Universal primers for amplification of three non-coding regions of chloroplast DNA,” Plant Molecular Biology, vol. 17, no. 5, pp. 1105–1109, 1991.

[42] X Guo S Castillo-Ramırez V Gonzalez et al ldquoRapid evolu-tionary change of common bean (Phaseolus vulgaris L) plas-tome and the genomic diversification of legume chloroplastsrdquoBMC Genomics vol 8 article 228 2007

[43] R K Jansen C Kaittanis C Saski et al ldquoPhylogenetic analysesof Vitis (Vitaceae) based on complete chloroplast genomesequences effects of taxon sampling and phylogenetic methodson resolving relationships among rosidsrdquo BMC EvolutionaryBiology vol 6 no 1 article 32 2006

[44] M J Moore P S Soltis C D Bell J G Burleigh and D ESoltis ldquoPhylogenetic analysis of 83 plastid genes further resolves

Advances in Bioinformatics 13

the early diversification of eudicotsrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 107 no10 pp 4623ndash4628 2010

[45] R K Jansen Z Cai L A Raubeson et al ldquoAnalysis of 81 genesfrom 64 plastid genomes resolves relationships in angiospermsand identifies genome-scale evolutionary patternsrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 104 no 49 pp 19369ndash19374 2007

[46] L Bohs and R G Olmstead ldquoPhylogenetic relationships inSolanum (Solanaceae) based on ndhF sequencesrdquo SystematicBotany vol 22 no 1 pp 5ndash17 1997

[47] R G Olmstead L Bohs H A Migid E Santiago-ValentinV F Garcia and S M Collier ldquoA molecular phylogeny of theSolanaceaerdquo Taxon vol 57 no 4 pp 1159ndash1181 2008



