Research Article
Comparative Genomics of Ten Solanaceous Plastomes
Harpreet Kaur,1 Bhupinder Pal Singh,1 Harpreet Singh,2 and Avinash Kaur Nagpal1
1Department of Botanical and Environmental Sciences, Guru Nanak Dev University, Amritsar 143005, India
2Department of Bioinformatics, Hans Raj Mahila Maha Vidyalaya, Jalandhar 144008, India
Correspondence should be addressed to Avinash Kaur Nagpal; avnagpal@yahoo.co.in
Received 26 August 2014; Accepted 14 October 2014; Published 17 November 2014
Academic Editor: Paul Harrison
Copyright © 2014 Harpreet Kaur et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Availability of complete plastid genomes of ten solanaceous species, Atropa belladonna, Capsicum annuum, Datura stramonium, Nicotiana sylvestris, Nicotiana tabacum, Nicotiana tomentosiformis, Nicotiana undulata, Solanum bulbocastanum, Solanum lycopersicum, and Solanum tuberosum, provided us with an opportunity to conduct their in silico comparative analysis in depth. The size of complete chloroplast genomes and LSC and SSC regions of three species of Solanum is comparatively smaller than that of any other species studied till date (exception: SSC region of A. belladonna). AT content of coding regions was found to be less than that of noncoding regions. A duplicate copy of trnH gene in C. annuum and two alternative tRNA genes for proline in D. stramonium were observed for the first time in this analysis. Further, homology search revealed the presence of rps19 pseudogene and infA genes in A. belladonna and D. stramonium, a region identical to rps19 pseudogene in C. annuum, and orthologues of sprA gene in another six species. Among the eighteen intron-containing genes, 3 genes have two introns and 15 genes have one intron. The longest insertion was found in accD gene in C. annuum. Phylogenetic analysis using concatenated protein coding sequences gave two clades, one for Nicotiana species and another for Solanum, Capsicum, Atropa, and Datura.
1. Introduction
Chloroplasts are essential cellular organelles within plant cells, possessing the enzymatic machinery for the process of photosynthesis, which provides essential energy to plants. Besides photosynthesis, chloroplasts are also involved in biosynthesis of fatty acids, amino acids, pigments, and vitamins [1, 2]. Despite enormous divergence in whole plant form and habitat, chloroplast structure and function have remained remarkably conserved, which might be due to intense evolutionary selection pressures associated with the functional requirements of photosynthesis [3–7]. The chloroplast genome is actually a reduced genome derived from a cyanobacterial ancestor that was captured early in the evolution of the eukaryotic cell [8, 9]. Among the three genomes of the plant cell, the plastome is the most gene dense, with more than 100 genes in a genome of only 120 to 210 kb [10]. In the last two decades, the nucleotide sequences of a large number of plastid genomes have been published, leading to better understanding of their organization and evolution [2, 11, 12]. Currently, about 470 eukaryotic chloroplast genomes have been sequenced completely (http://www.ncbi.nlm.nih.gov/genomes/GenomesHome.cgi?taxid=2759&hopt=html), with the best representation from flowering plants.
Most land plant chloroplast genomes are composed of a single circular chromosome with a quadripartite structure, which includes two copies of an inverted repeat (IR) region that separate the large and small single copy regions (LSC and SSC). Genes of chloroplast genomes of higher plants can be divided into three broad categories [13, 14]. In the first, there are genetic system genes encoding rRNAs, tRNAs, ribosomal proteins, and RNA polymerase subunits. The second category comprises genes for photosynthesis, which encode subunits of the two photosystems, the cytochrome b6f complex, and the ATP synthase. Open reading frames (orfs) of unknown function constitute the third category. Besides these, there are some other genes coding for different kinds of proteins, including infA, matK, clpP, cemA, accD, and ccsA. Although overall chloroplast genome organization is highly conserved among taxa, structural rearrangements due to inversions have been reported in different taxa such as Campanulaceae [15], Cyatheaceae [16], Fabaceae [17], Funariaceae [18], Geraniaceae [19], Onagraceae [20], and Poaceae [21, 22]. Besides structural rearrangements, sequence polymorphisms have also been reported in some cereals [23, 24] and Oenothera species [20]. These studies revealed that highly divergent sequences were concentrated in specific regions called “hotspots.” Such sequence polymorphisms have been used to derive phylogenetic relationships among species.

Hindawi Publishing Corporation, Advances in Bioinformatics, Volume 2014, Article ID 424873, 13 pages, http://dx.doi.org/10.1155/2014/424873
Solanaceae is an important family of dicots comprising more than 3000 species placed within about 90 genera. It is an ethnobotanical family that is extensively utilized by humans and has recently become a model for comparative and evolutionary genomics research. Few efforts have been made to study the variations in chloroplast genomes of the Solanaceae family using in silico tools. Most of these attempts have concentrated on comparison of a newly sequenced chloroplast genome with the available complete chloroplast genomes from some members of this family [25–29]. The availability of complete nucleotide sequences of plastid genomes of ten solanaceous species, Atropa belladonna (NC_004561.1 [30]), Capsicum annuum (NC_018552.1 [29]), Datura stramonium (NC_018117.1, Li et al. (unpublished)), Nicotiana sylvestris (NC_007500.1 [28]), Nicotiana tabacum (NC_001879.2 [31]), Nicotiana tomentosiformis (NC_007602.1 [28]), Nicotiana undulata (NC_016068.1 [32]), Solanum bulbocastanum (NC_007943.1 [26]), Solanum lycopersicum (NC_007898.2 [27]), and Solanum tuberosum (NC_008096.2 [33]), provided us with an opportunity to conduct their in silico comparative analysis in depth. Hence, the present study is an attempt to compare the genome organization, structure, and coding capacity of chloroplast genomes of ten solanaceous species. The study focuses on length mutations, intron-containing genes, grouping of genes in different identity classes based on pairwise comparison of individual genes, and InDel analysis of divergent genes.
2. Materials and Methods
2.1. Sequence Analysis. Whole chloroplast genome sequences, as well as individual gene and protein sequences, of ten solanaceous species were obtained from the “Organelle Genome Resources” section of NCBI in GenBank as well as in FASTA format. Sequence regions corresponding to various genomic features, including genes, exons, introns, and CDS, were specifically extracted from the GenBank files using the Extractfeat, Extractseq, and Featcopy programs from the Jemboss package. AT percentage for different genomic regions was calculated using the Wordcount and Union programs from the Jemboss package. Pairwise comparison of gene sequences was done using the NCBI BLAST program, and multiple sequence alignment of nucleotide as well as protein sequences was done using ClustalW. Alignments of protein sequences for some of the genes were manually edited in correspondence to InDels observed in alignments of their nucleotide sequences.
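As a rough illustration of the AT percentage calculation described above, the following Python sketch computes AT content directly from a sequence string. It is not the Jemboss Wordcount/Union workflow used in this study; the function name at_percent and the toy sequence are assumptions introduced only for illustration, and gap characters or ambiguous bases are simply ignored.

```python
def at_percent(seq: str) -> float:
    """Return the AT content (%) of a nucleotide sequence.

    Only unambiguous bases (A, C, G, T) are counted; gaps and
    ambiguity codes such as N are skipped (an assumption of this sketch).
    """
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    at_bases = sum(1 for base in counted if base in "AT")
    return 100.0 * at_bases / len(counted)


# Hypothetical toy sequence, not taken from any of the ten plastomes.
toy_sequence = "ATGACCATTTTAGGC"
print(f"AT content: {at_percent(toy_sequence):.2f}%")
```

Applied separately to the extracted coding, noncoding, LSC, SSC, and IR sequences, a function of this kind would yield the sort of per-region percentages compared in the Results.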
2.2. Phylogenetic Analysis of Concatenated Protein-Coding Genes. A total of 75 protein-coding genes of plastomes of ten solanaceous species and two outgroup species (Daucus carota and Coffea arabica) were selected for phylogenetic analysis from the total of 79 classified protein-coding genes, excluding accD, rpl20, ycf1, and ycf15. ycf15 was excluded due to its absence from the plastomes of both outgroup species chosen, while the other three were not included in the phylogenetic analysis due to their high levels of variation. Multiple sequence alignment of each gene was obtained using ClustalW (https://www.ebi.ac.uk/Tools/msa/clustalw2/). These alignments were then concatenated using standalone BIOEDIT version 7.2.5 (http://www.mbio.ncsu.edu/bioedit/bioedit.html), and a maximum likelihood phylogenetic tree with 500 bootstrap iterations was constructed using PhyML v3.0 (http://www.atgc-montpellier.fr/phyml/). A graphical view of the tree was generated using Archaeopteryx 0.988 SR (https://sites.google.com/site/cmzmasek/home/software/archaeopteryx).
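The concatenation step can be pictured with a minimal Python sketch, assuming that each gene has already been aligned and is held as a mapping from taxon label to aligned sequence. The function name, the taxon labels, and the toy alignments below are hypothetical, and the sketch stands in for, rather than reproduces, the BIOEDIT procedure used here. Genes lacking a sequence for any taxon are skipped, in the same spirit as the exclusion of ycf15.

```python
def concatenate_alignments(alignments, taxa):
    """Join per-gene alignments into one sequence per taxon.

    alignments: dict mapping gene name -> dict of taxon -> aligned sequence.
    taxa: list of taxon labels that must all be present for a gene to be used.
    Returns (genes_used, dict of taxon -> concatenated sequence).
    """
    genes_used = []
    parts = {taxon: [] for taxon in taxa}
    for gene in sorted(alignments):
        aligned = alignments[gene]
        if any(taxon not in aligned for taxon in taxa):
            continue  # gene absent from at least one taxon: leave it out
        if len({len(aligned[taxon]) for taxon in taxa}) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        genes_used.append(gene)
        for taxon in taxa:
            parts[taxon].append(aligned[taxon])
    return genes_used, {taxon: "".join(p) for taxon, p in parts.items()}


# Hypothetical toy protein alignments; "OUT" stands for an outgroup taxon.
toy_alignments = {
    "atpA": {"ABE": "MVT-", "CAN": "MVTI", "OUT": "MVSI"},
    "rbcL": {"ABE": "MSPQ", "CAN": "MSPQ", "OUT": "MSPK"},
    "ycf15": {"ABE": "MLL", "CAN": "MLL"},  # missing from the outgroup
}
genes, matrix = concatenate_alignments(toy_alignments, ["ABE", "CAN", "OUT"])
print(genes)
print(matrix)
```

Sorting the gene names fixes the column order of the concatenated matrix, so that every taxon receives the genes in the same sequence.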
3. Results and Discussion
3.1. Comparison of Properties of Chloroplast Genomes. Comparison of the properties of plastid genomes of ten solanaceous species with respect to their genome size (size of complete plastid genome and LSC, SSC, and IR regions), percent coding regions, introns, and intergenic regions, and AT content of overall plastid genomes as well as coding and noncoding regions is presented in Table 1. The total plastid genome size ranged from 155296 bp (S. tuberosum) to 156781 bp (C. annuum). The large size of the plastome of C. annuum can be attributed to its large LSC region as compared to other species. On the contrary, the size of the SSC region in C. annuum was the least as compared to other species. The largest IR region was found in A. belladonna. Among the four Nicotiana species studied, N. sylvestris and N. tabacum were almost identical to each other with respect to size of complete genome (difference of only 2 bp) or LSC, SSC, or IR regions, compared with the plastome of any other species studied. However, the percent coding region was slightly higher for N. sylvestris (61.49%) than for N. tabacum (61.12%). The size of the complete chloroplast genome and LSC and SSC regions of three species of Solanum is comparatively smaller than that of any other species studied, except for A. belladonna, where the size of the SSC region was the smallest (18008 bp). However, the size of the IR region of Solanum species is larger as compared to Nicotiana species. Coding region percentage was found to be higher in Nicotiana species as compared to all other species, with a maximum for N. undulata (63.12%) and a minimum for S. tuberosum (58.45%). A maximum of 12.8% of the plastome was shown to be introns for S. bulbocastanum, whereas the minimum intron percentage (11.62%) was observed for D. stramonium. The maximum percentage (29.19%) of intergenic region was observed in D. stramonium and the minimum (24.19%) was observed in N. undulata. The AT content of noncoding regions was found to be higher as compared to coding regions for all the ten species studied. Similarly, protein-coding regions have shown higher content of AT base pairs as compared to RNA coding genes, which can be explained by the requirement of more GC base pairs for proper folding of highly structured ribosomal RNAs and tRNAs [13–27]. Comparison of AT content in LSC, SSC, and IR regions reveals that AT content was the highest in SSC
Table 1: Properties of the solanaceous chloroplast genomes.

Property                  ABE            CAN            DST            NSY            NTA            NTO            NUN            SBU            SLY            STU
Genome size (bp)          156687         156781         155871         155941         155943         155745         155863         155371         155461         155296
LSC (bp)                  86869          87366          86297          86684          86686          86392          86633          85785          85882          85737
LSC coordinates*          1–86869        1–87366        1–86297        1–86684        1–86686        1–86392        1–86633        1–85785        1–85882        1–85737
IRB (bp)                  25905          25783          25563          25342          25343          25429          25331          25588          25608          25593
IRB coordinates*          86870–112774   87367–113149   86298–111860   86685–112026   86687–112029   86393–111821   86634–111964   85786–111373   85883–111490   85738–111330
SSC (bp)                  18008          17849          18448          18573          18571          18495          18568          18381          18363          18373
SSC coordinates*          112775–130782  113150–130998  111861–130308  112027–130599  112030–130600  111822–130316  111965–130532  111374–129754  111491–129853  111331–129703
IRA (bp)                  25905          25783          25563          25342          25343          25429          25331          25588          25608          25593
IRA coordinates*          130783–156687  130999–156781  130309–155871  130600–155941  130601–155943  130317–155745  130533–155863  129755–155342  129854–155461  129704–155296
Coding regions (%)        58.89          58.50          59.19          61.49          61.12          61.58          63.12          58.52          58.91          58.45
Introns (%)               12.51          12.71          11.62          12.70          12.70          12.68          12.69          12.82          12.47          12.49
Intergenic regions (%)    28.60          28.79          29.19          25.81          26.18          25.73          24.19          28.66          28.62          29.06
AT content (%)
  Overall                 62.44          62.27          62.12          62.15          62.15          62.21          62.12          62.12          62.14          62.12
  Coding regions          59.86          59.68          59.65          59.85          59.79          59.79          59.70          59.61          59.65          59.59
  Noncoding regions       66.13          65.93          65.72          65.84          65.87          66.09          66.27          65.66          65.71          65.68
  tRNAs                   47.70          47.38          47.08          47.06          47.05          47.10          47.08          47.12          47.01          47.06
  rRNAs                   44.64          44.73          44.63          44.64          44.64          44.64          44.64          44.66          44.66          44.65
  Protein-coding genes    62.01          61.83          61.79          61.91          61.86          61.84          61.68          61.76          61.80          61.74
  LSC                     64.37          64.25          64.04          64.05          64.05          64.12          64.01          63.99          64.01          63.99
  SSC                     68.35          67.99          67.72          67.94          67.93          68.03          67.87          67.87          67.97          67.91
  IR                      57.14          56.94          56.87          56.78          56.78          56.84          56.78          56.93          56.91          56.90

ABE: Atropa belladonna, CAN: Capsicum annuum, DST: Datura stramonium, NSY: Nicotiana sylvestris, NTA: Nicotiana tabacum, NTO: Nicotiana tomentosiformis, NUN: Nicotiana undulata, SBU: Solanum bulbocastanum, SLY: Solanum lycopersicum, STU: Solanum tuberosum, LSC: large single copy region, SSC: small single copy region, and IR: inverted repeat region.
*Start and end position of nucleotide in the genome.
regions and the lowest in IR regions. Some earlier studies have also shown a similar distribution of AT content in LSC, SSC, and IR regions, with the lowest AT content in the IR region and the highest AT content in the SSC region [2, 27, 34, 35]. The low AT content of IR regions reflects low AT content in the four ribosomal RNA genes in this region.
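The region sizes and coordinates reported in Table 1 follow from the quadripartite layout (LSC, IRB, SSC, IRA laid end to end), which can be checked with a short Python sketch. The sizes for A. belladonna and C. annuum are copied from Table 1; the function and variable names are introduced here only for illustration.

```python
# Region sizes (bp) copied from Table 1 for two of the ten species.
TABLE1_SIZES = {
    "ABE": {"genome": 156687, "LSC": 86869, "IRB": 25905, "SSC": 18008, "IRA": 25905},
    "CAN": {"genome": 156781, "LSC": 87366, "IRB": 25783, "SSC": 17849, "IRA": 25783},
}
REGION_ORDER = ("LSC", "IRB", "SSC", "IRA")


def region_coordinates(sizes):
    """Return 1-based inclusive (start, end) coordinates for each region."""
    coords = {}
    start = 1
    for region in REGION_ORDER:
        end = start + sizes[region] - 1
        coords[region] = (start, end)
        start = end + 1  # the next region begins right after this one
    return coords


for species, sizes in TABLE1_SIZES.items():
    coords = region_coordinates(sizes)
    total = sum(sizes[region] for region in REGION_ORDER)
    # The four regions should tile the genome with no gaps or overlaps.
    print(species, coords, total == sizes["genome"])
```

For both species the four region sizes sum to the reported genome size, and the derived start and end positions agree with the coordinates listed in Table 1.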
3.2. Gene Content of Solanaceous Chloroplast Genomes. The genes present in different regions of the plastid genomes are highly conserved except for several open reading frames [6, 26, 36]. There are typically 111 genes, 5 hypothetical chloroplast reading frames (ycfs), and a few open reading frames (orfs). Some of our unique findings are discussed below.
(i) The trnP-GGG gene, which codes for a tRNA for proline, was annotated only in D. stramonium, whereas its alternative, trnP-UGG, was annotated in all species, including D. stramonium (NC_018117.1, Li et al. (unpublished)). We mined all the species for a similar sequence by BLAST search, but no similar sequence was found in any other species. A region annotated as a trnH gene was reported only in C. annuum. In all other species this region was reported to be part of the ycf2 gene, as it also is in C. annuum. These observations indicate the presence of a duplicate copy of the trnH gene sequence in C. annuum and two alternative tRNA genes coding for the amino acid proline in D. stramonium. However, no other evidence was found in databases for this particular region coding for trnH.
(ii) The rps19 pseudogene was reported in three species, namely, N. tomentosiformis, S. bulbocastanum, and S. tuberosum. All other species were mined for a similar pseudogene using the BLAST pairwise algorithm, which confirmed the presence of the rps19 pseudogene in three further species, namely, A. belladonna, C. annuum, and D. stramonium. The presence of the pseudogene may be attributed to the expansion of IRB into the LSC region. In three species, namely, N. sylvestris, N. tabacum, and N. undulata, the rps19 pseudogene was found to be absent.
(iii) infA, a pseudogene in all species except A. belladonna, D. stramonium, and S. lycopersicum, is a protein-coding gene in S. bulbocastanum. Homology search with the infA sequence from S. bulbocastanum against the plastomes of A. belladonna and D. stramonium revealed an identical sequence in both species.
(iv) The sprA gene has been annotated for N. sylvestris, N. tomentosiformis, S. lycopersicum, and S. tuberosum. Its identical orthologous gene sequences were found in A. belladonna, C. annuum, D. stramonium, N. tabacum, N. undulata, and S. bulbocastanum using BLAST search.
3.3. Split Genes. A total of eighteen split genes have been reported. The sizes of exons and introns for these genes in all the solanaceous species studied are summarized in Table 2. The rps12 gene is divided such that its 5′ end exon is located in the LSC region, whereas the second and third exons are located in the IR region. Maturation of the RNA transcript requires a trans-splicing mechanism between exon 1 and exon 2 [34, 37]. Among the eighteen intron-containing genes, ycf3, clpP, and rps12 contained two introns, whereas the other 15 genes contain only one intron. As per Kim and Lee [34], the trnL-UAA gene intron belongs to the self-splicing group I introns, whereas all other introns belong to group II. Generally, the size of exons was shown to be conserved and variability was observed in the intron regions; however, ndhB was found to be highly conserved for both exons and introns.
3.4. Pairwise Comparison of Plastid Genes of Solanaceae and InDel Analyses. Pairwise comparison of nucleotide sequences of individual genes (45 combinations) for 116 genes was also performed to classify genes based on percent identity. Supplementary Table 1 (Supplementary Material available online at http://dx.doi.org/10.1155/2014/424873) shows the grouping of genes in different clusters based on percent identity in pairwise comparison. Genes which showed 100% identity in a comparison were considered highly conserved, and genes showing less than 95% identity at least once in the comparison were considered highly divergent. These highly divergent genes were further explored at nucleotide as well as at protein level to probe the variations in detail. A total of 11 highly divergent genes were found, whereas the number of highly conserved genes varied from 26 (for species pair N. tomentosiformis and S. lycopersicum) to 107 (for species pair N. sylvestris and N. tabacum). Most of the tRNA genes were found to be highly conserved. Genes accD, cemA, clpP, ndhA, rpl32, rpl36, rps16, sprA, trnA-UGC, trnL-UAA, and ycf1 were found to be highly diverged.
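The classification rule described above can be sketched in Python, assuming that percent identities from the pairwise BLAST comparisons are already available as a table. The gene names and identity values below are hypothetical placeholders, not results from this study; only the species abbreviations, the 45 species pairs, and the 100% and 95% thresholds come from the text.

```python
from itertools import combinations

SPECIES = ["ABE", "CAN", "DST", "NSY", "NTA", "NTO", "NUN", "SBU", "SLY", "STU"]
PAIRS = list(combinations(SPECIES, 2))  # the 45 pairwise combinations


def divergent_genes(identity_table, threshold=95.0):
    """Genes with identity below the threshold in at least one comparison."""
    return sorted(
        gene for gene, by_pair in identity_table.items()
        if min(by_pair.values()) < threshold
    )


def conserved_counts(identity_table):
    """For each species pair, count the genes showing 100% identity."""
    counts = {}
    for by_pair in identity_table.values():
        for pair, identity in by_pair.items():
            counts[pair] = counts.get(pair, 0) + (1 if identity == 100.0 else 0)
    return counts


# Hypothetical identities (%) for three made-up genes and three species pairs.
toy_table = {
    "gene_x": {("NSY", "NTA"): 100.0, ("NSY", "SLY"): 100.0, ("NTA", "SLY"): 100.0},
    "gene_y": {("NSY", "NTA"): 100.0, ("NSY", "SLY"): 93.5, ("NTA", "SLY"): 94.0},
    "gene_z": {("NSY", "NTA"): 100.0, ("NSY", "SLY"): 98.2, ("NTA", "SLY"): 98.0},
}
print(len(PAIRS))
print(divergent_genes(toy_table))
print(conserved_counts(toy_table))
```

In this sketch divergence is a property of a gene across all comparisons, while the count of highly conserved genes is reported per species pair, mirroring how the two categories are described in the text.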
Tables 3 and 4 summarize the InDels observed in nucleotide and amino acid sequences, respectively. Partial multiple sequence alignments of 9 genes and 5 proteins are shown in Figures 1 and 2, respectively. The longest insertion, of 141 bp, was observed in the accD gene sequence of C. annuum. Since genes clpP, ndhA, rps16, and trnL-UAA contained introns, it was important to examine whether these InDels were present in exon or intron regions. It was found that all the InDels reported in ndhA and trnL-UAA were present in introns, whereas in the case of clpP, InDel 24 was located in an exon of the gene. Similarly, the first and last InDels of gene rps16 were present in exons of the gene. Keeping in view the observations on number and length of InDels in nucleotide and protein sequences of the genes under consideration, the variation for individual genes is discussed below.
(1) accD. A total of four InDels were observed in the accD gene, as depicted in Figure 1. Interestingly, an insertion of 24 bp was present in all Nicotiana species and D. stramonium, followed by an insertion of 9 bp in all Solanum species, indicating stronger sequence conservation at genus level. These insertions were also reported by Chung et al. [25]. A 141 bp insertion was observed specifically in C. annuum, which has also been reported by Jo et al. [29] and confirmed by RT-PCR. Similarly, a species-specific deletion of 6 bp was found in D. stramonium. All these InDels were also reflected in the corresponding
Table 2: The lengths of introns and exons for the split genes of ten solanaceous species.

Gene (region)     Exon/intron ABE   CAN   DST   NSY   NTA   NTO   NUN   SBU   SLY   STU
trnK-UUU (LSC)    Exon I      37    37    37    37    37    37    37    37    37    37
                  Intron I    2519  2500  2506  2526  2526  2526  2521  2501  2514  2512
                  Exon II     36    35    35    35    35    35    35    35    35    35
rps16 (LSC)       Exon I      40    40    40    40    40    40    40    40    40    40
                  Intron I    822   865   866   860   860   860   859   855   864   855
                  Exon II     227   227   227   218   218   218   218   227   227   227
trnG-UCC (LSC)    Exon I      23    23    23    23    23    23    23    23    23    23
                  Intron I    692   692   694   692   692   690   691   701   695   692
                  Exon II     48    48    48    48    48    48    48    37    48    48
atpF (LSC)        Exon I      145   145   145   145   145   145   145   144   144   145
                  Intron I    715   693   700   695   695   692   692   693   686   693
                  Exon II     410   410   410   410   410   410   410   411   411   410
rpoC1 (LSC)       Exon I      432   453   453   453   453   432   453   453   453   453
                  Intron I    737   742   737   737   737   709   733   737   737   737
                  Exon II     1614  1614  1614  1614  1614  1614  1623  1614  1614  1614
ycf3 (LSC)        Exon I      124   124   124   124   124   124   124   124   124   124
                  Intron I    739   742   740   739   738   731   735   730   729   727
                  Exon II     230   230   230   230   230   230   230   230   230   230
                  Intron II   763   744   753   783   783   779   781   750   750   750
                  Exon III    153   153   159   153   153   153   153   153   153   153
trnL-UAA (LSC)    Exon I      35    35    35    35    35    35    35    37    35    35
                  Intron I    497   426   501   503   503   497   498   502   497   497
                  Exon II     50    50    50    50    50    50    50    50    50    50
trnV-UAC (LSC)    Exon I      38    38    38    38    38    38    38    38    38    38
                  Intron I    572   575   569   571   571   572   573   569   571   571
                  Exon II     35    35    37    35    35    35    35    37    35    35
rps12*            Exon I      114   114   114   114   114   114   114   114   114   114
                  Intron I    -     -     -     -     -     -     -     -     -     -
                  Exon II     232   232   232   232   232   232   232   232   232   232
                  Intron II   535   536   536   536   536   536   536   536   536   536
                  Exon III    26    26    26    26    26    26    26    26    26    26
clpP (LSC)        Exon I      71    71    71    71    71    71    71    71    71    71
                  Intron I    799   811   792   807   807   789   789   789   798   789
                  Exon II     292   292   292   292   292   292   292   292   292   292
                  Intron II   622   626   624   637   637   634   631   625   617   620
                  Exon III    228   228   234   228   228   228   228   234   258   234
petB (LSC)        Exon I      6     6     6     6     6     6     6     6     6     6
                  Intron I    759   755   746   753   753   753   753   747   747   747
                  Exon II     642   642   642   642   642   642   642   642   642   642
petD (LSC)        Exon I      8     8     9     8     8     8     8     6     8     8
                  Intron I    742   742   748   742   742   742   742   739   738   739
                  Exon II     475   475   474   475   475   475   475   477   475   475
rpl16 (LSC)       Exon I      9     9     9     9     9     9     9     9     9     9
                  Intron I    1019  1026  1025  1020  1020  1021  1020  1014  1018  1014
                  Exon II     396   396   396   396   396   396   396   396   396   396
rpl2 (IR)         Exon I      391   391   393   391   391   391   391   390   391   391
                  Intron I    664   665   669   666   666   666   666   666   666   666
                  Exon II     434   434   429   434   434   434   434   435   434   434
ndhB (IR)         Exon I      777   777   777   777   777   777   777   777   777   777
                  Intron I    679   679   679   679   679   679   679   679   679   679
                  Exon II     756   756   756   756   756   756   756   756   756   756
trnI-GAU (IR)     Exon I      37    37    42    37    37    37    37    42    37    37
                  Intron I    717   722   717   707   707   716   716   717   722   722
                  Exon II     34    35    35    35    35    35    35    35    35    35
trnA-UGC (IR)     Exon I      38    38    38    38    38    38    38    38    38    38
                  Intron I    681   811   811   709   709   709   709   811   811   811
                  Exon II     35    35    35    35    35    35    35    35    35    35
ndhA (SSC)        Exon I      553   553   552   553   553   553   553   552   553   553
                  Intron I    1150  1157  1154  1148  1148  1149  1148  1158  1133  1158
                  Exon II     539   539   537   539   539   539   539   540   539   539

*rps12 is a divided gene. The 3′-rps12 is located in the IR region, while the 5′-rps12 is located in the LSC region.
ABE: Atropa belladonna, CAN: Capsicum annuum, DST: Datura stramonium, NSY: Nicotiana sylvestris, NTA: Nicotiana tabacum, NTO: Nicotiana tomentosiformis, NUN: Nicotiana undulata, SBU: Solanum bulbocastanum, SLY: Solanum lycopersicum, and STU: Solanum tuberosum.
Table 3: InDels in nucleotide sequences of 9 genes of ten solanaceous plastid genomes.

S. number  Gene      Region  Total number of InDels  InDel length (bp)
1          accD      LSC     4                       24, 9, 141, 6
2          clpP      LSC     24                      8(I), 14(I), 13(I), 7(I), 1(I), 2-3(I), 7(I), 1–7(I), 3(I), 2(I), 3(I), 1–7(I), 1–3(I), 1(I), 1(I), 1(I), 1–5(I), 4–7(I), 1(I), 9(I), 1-2(I), 3(I), 5(I), 24–30
3          ndhA      SSC     14                      9(I), 5-6(I), 3(I), 1(I), 9(I), 3(I), 4(I), 1–4(I), 1-2(I), 1–23(I), 1-2(I), 2(I), 1(I), 3(I)
4          rpl32     SSC     2                       2-3, 4
5          rps16     LSC     11                      1–38, 9(I), 1(I), 1(I), 5(I), 1-2(I), 5(I), 4(I), 6(I), 1(I), 9
6          sprA      SSC     2                       109, 7
7          trnA-UGC  IR      1                       102–130
8          trnL-UAA  LSC     4                       1, 6, 71, 4
9          ycf1      SSC     31                      3, 18, 18, 21, 6, 6, 48, 9, 6, 6, 42, 3, 6, 30, 3, 15, 12–39, 18, 6, 9–36, 6, 6, 6, 9, 9, 12, 6, 6, 6, 57, 6

Region: location of the gene (LSC, SSC, or IR). (I): InDels present in introns.
Table 4: InDels in amino acid sequences of 5 proteins of ten solanaceous plastid genomes.

S. number  Protein  Total number of InDels  InDel length (amino acids)
1          accD     4                       8, 3, 47, 2
2          clpP     2                       2, 10
3          rpl32    1                       1-2
4          rps16    1                       3
5          ycf1     29                      1, 6, 6, 7, 2, 2, 7, 3, 2, 2, 14, 1–10, 1, 5, 4–13, 6, 2, 3–12, 2, 2, 2, 3, 3, 4, 2, 2, 2, 19, 2
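The relationship between the nucleotide InDel lengths in Table 3 and the amino acid InDel lengths in Table 4 can be checked with a short Python sketch. The accD values are copied from the two tables; the helper names and the toy aligned sequence are hypothetical and serve only to show how gap runs in an alignment row translate into InDel positions and lengths.

```python
import re


def gap_runs(aligned_seq):
    """Return (0-based start, length) for every run of '-' in an aligned row."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"-+", aligned_seq)]


def residues_affected(indel_bp):
    """Whole codons spanned by an in-frame InDel; None if not a multiple of 3."""
    return indel_bp // 3 if indel_bp % 3 == 0 else None


ACCD_NUCLEOTIDE_INDELS = [24, 9, 141, 6]  # Table 3, lengths in bp
ACCD_PROTEIN_INDELS = [8, 3, 47, 2]       # Table 4, lengths in amino acids

# Each accD InDel is a multiple of 3 bp, so the reading frame is preserved.
print([residues_affected(n) for n in ACCD_NUCLEOTIDE_INDELS] == ACCD_PROTEIN_INDELS)

# Hypothetical aligned row with a 3-column gap and a 1-column gap.
print(gap_runs("ATG---CCA-TT"))
```

All four accD InDels are multiples of three, which is consistent with their appearing as whole-residue InDels in the protein alignment.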
Figure 1: Partial multiple sequence alignment of nucleotide sequences showing numbered InDels in divergent genes (panels include sprA, accD, ndhA, rps16, rpl32, and clpP). Sequences in each panel are ordered ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU.
TTT----GAATTT----GAATTTTTTTGAATTT----GAATTT----GAATTT----GAATTT----GAATTT----GAATTT----GAATTT----GAA
tRNA-Leu(UAA)
InDel 1 InDel 2 InDel 3 InDel 4
ABE CANDSTNSYNTANTONUNSBUSLYSTU
---ATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTTGTGATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTT
GTAC------------------CTTGGTACATTCGATCTAATAAGTACCTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTG
AAGC------------------CTCA AAGCTAAGAAACTGAAAGAGGCCTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA
AAAG---------------------
AAAGGGTGGAAAGTGAG
GTAGAAATAGAAACTGGTAGAA------ACTGGTAGAAATAGAAACTGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACTGGTAGAAATTGAAACTGGTAGAAATAGAAACTG
ycf1
InDel 1 InDel 2 InDel 3 InDel 4 InDel 5
ABECANDSTNSYNTANTONUNSBUSLYSTU
CTTCTCCTTCCCTCTTCTCCTTCCCTCTTC------CCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCT
TTTT------------------------------------------------TCGG TTTTCAAGAGGGATCCACTGAAGAAGATCCTTATCCTTCTCCTTCCCCTTTTTCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG
AACAATATTAATACTAGAAAAATATTAATACTAGAACAATATTAATACTAGAATAATATTAATACTAGAATAATATTAATACTAGAACA---------CTAGAACAATATTAATACTAAAACAATATTAATACTAGAACAATATTAATACTAGAACAATATTAATACTAG
TAA------CACTAATAATAACACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CAC
AGA------CCTAGA------CCTAAA------ATTAGA------CCTAGA------CCTAGA------ACTAGA------CCTAGA------CCTAGAACAAGACCTAGA------CCT
InDel 6 InDel 7 InDel 8 InDel 9 InDel 10
ABECANDSTNSYNTANTONUNSBUSLYSTU
TAAA------------------------------------------AGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGTAAGAACG TCAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGCG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGACCAGGGAAGAGTG TAAATGTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGCG TAAATTTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGCG TAAATTTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGTG
CTGA---G CTGATGAG CTGC---G CTGA---T CTGA---T CTAA---T CTGA---T CTGA---G CTGA---G CTGA---G
GGTCAAAGAGACCAACGAGCCCGACGATACCAAAGATACCAAAGATACCAAAGATACCAAAGAG------GAG------GAG------GA
GA------------------------------GGAAGA------------------------------GGAAGA------------------------------GGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGAGACTGATACCGCAGATCAAGATCAAACAAAGGAAGA------------------------------AGAAGA------------------------------GGAAGA------------------------------GGAA
AAGCAAAAAGCAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAG
CTCT---------------AAAACTTTTATAACTATAACTCGAAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAA
InDel 11 InDel 13 InDel 14 InDel 15 InDel 16
ABECANDSTNSYNTANTONUNSBUSLYSTU
CGTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAA---------------------------------------ATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCC CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAAGTCCTAATAAAACAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAACTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAACTCGAATCG
TCAAGAAAAAATTAATAATAAAAAAA TCAAGAAAAAATTAATAATAAAAACA TCAAGAAAAAATAAAAAAAAAAAAAA TCAAGAAAAAATTAATAATCAAAAAA TCAAGAAAAAATTAATAATCAAAAAA TCAA------------------AAAA TCAAGAAAAAATTAATAATCAAAAAA TCAAGAAAAAATTACTAATAAAAAAA TCAAGAAAAAATTAATAATAAAAAAA TCAAGAAAAAATTACTAATAAAAAAA
TTAT------TTCGACTTTATCTTTATTTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTATCTTTATTTCGACTTTATGTTTATTTCGACTTTATCTTTATTTCGACT
CTAAAAAAGTCACTTTATAATATT---------AGTAATAAAAA ATAAAAAAGTCACTTTATAATATTTATAATATTAGTAAGAAAAA ATAA------------------------------------AAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGCCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA
InDel 17 InDel 18 InDel 19 InDel 20
ABECANDSTNSYNTANTONUNSBUSLYSTU
AATATAAATTTAAAATAGAAACTTAAAATCGAAATTTAAAAT------TTAAAAT------TTAAAAT------TTAAAAT------TTAAAATAGAAACTTAAAATAGAAACTTAAAATAGAAACTTAA
CTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTTGAAGTTCAAGCTTT------CAAG
ATAT------ATAGACATCTACATATAGAGAT------AGAGATAT------ATAGATAT------ATAGATAT------ATAGATAT------ATAGAGAT------ATAGAGAT------AGAGAGAT------ATAG
ATAT---------TTTG ATATAGAAAATATTTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG
GAGAAAATTGATAAAAAAT---------AAAAGAGAAAATTGACAAAAGATAAAATTGATAAAAGATAAAATTGATAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAA
AACATTAATAAAAACCAAAATATCAATAAAAACAACAAC------------CAAAACATCAATAAAAACCAAAACATCAATAAAAACCAAAACA------------AAAACATCAATAAAAACAAAAAC------------CAAAAC------------CAAAAC------------CAA
AAACAAAACTT
AACTT
------CTT
AAAAAAAACTTAAACAAACCTTAAACAAAACTTAAACAAAACTTAAACAAAACTTAAACAAAAACAAAACTTAAAAACAAAACTT
InDel 21 InDel 22 InDel 23 InDel 24 InDel 25 InDel 26 InDel 27
ABECANDSTNSYNTANTONUNSBUSLYSTU
TAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAAAATAAAGAATTAAA------GAAT
TCAAGGAGAAAGAGTCAAGGAAAAAGAGTCAAAGAGAAAGAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAA------AGAGTCAA------AGAGTCAA------AGAG
AAAGAATTTTGATGAATCCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAA---------------------------------------------------------TCAT AAAGAATTTTGATGAATTCATTCTGCAACCGCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAATCATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT
TAAA------GAGATAAA------GATATAATTATAACGAGATAAA------GAGATAAA------GAGATAAA------GAGATAAA------GAGATAAA------GATATAAA------GATATAAA------GATA
InDel 28 InDel 29 InDel 30 InDel 31
ABECANDSTNSYNTANTONUNSBUSLYSTU
--AGATAATAATTGCATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATAGAACTCTCTTCGCTTTTCTGATTGATATATAAAATA-------------------------------------------------------------------------------------------------------------ATA--AGATAATAATTGAATAATTTAACGATTTACCTTTCATATTTGATATTGATTCGCTCACCAATCAATACGTAATGGAACTCTATTCGCTTTTCTGATTGATATAGAAAATA--AGATAATAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--AGATAATAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA----ATC-TAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--------TAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--------TAATTGAATAATTGAACGATTTACCTTTCCTATTTGATATTGATTAGCTCACCAATCCATATGTAACGGAACTCTATTCGCTTTTCTGATTGATATATAAAATATTCTTCAATTATTATCTAATTGAACAATTTCCCTTTCCTCTTTGATATTGATTAGCTCACCAATCCATATATAACATAACTCTATTCGCTTTTCTGATTGATATAGAAAATA--AGATAATAATTGAATAATTGAACGATTTACCTTTCCTATTTGATATTGATTAGCTCACCAATCCATATGTAACGGAACTCTATTCGCTTTTCTGATTGATATATAAAATA
InDel 1
ABECANDSTNSYNTANTONUNSBUSLYSTU
ATT----------------------------------------------------------------------------------------------------------------------------------ACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAAGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAAGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACC
tRNA-Ala(UGC)
InDel 1
ABECANDSTNSYNTANTONUNSBUSLYSTU
InDel 12
AAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGAGGAAAGTGAGGAAGAAAGAGAT
AGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGAT
GAAGAAAAAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGAT
Figure 1 Partial multiple sequence alignment of accD clpP ndhA rpl32 rps16 sprA tRNA-Ala (UGC) tRNA-Leu(UAA) and ycf1 genesequences of ten solanaceous species showing location of InDels indicated by hyphens
Advances in Bioinformatics 9
Figure 2: Partial multiple sequence alignment of amino acid sequences of genes, namely, accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala(UGC), tRNA-Leu(UAA), and ycf1, of ten solanaceous species, showing the location of InDels (indicated by hyphens). [Alignment panels not reproduced; the labelled panels are accD, ycf1, clpP, rpl32, and rps16, each with numbered InDels and rows for ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU.]
protein sequences, as shown in Figure 2. The accD gene has been reported to be one of the most variable plastid genes and is probably under diversifying selection [26].
(2) clpP. In the clpP gene, InDels were found in both intron and exon regions. Two major consequences were observed for the InDels in the exon regions. An insertion of 6 bp in S. bulbocastanum and S. tuberosum and of 30 bp in S. lycopersicum at the 3′ end (exon 3) of the clpP gene shifted the stop codon downstream by 6, 6, and 30 bp in the respective species compared with the other species of the Solanaceae family, increasing the length of the coding sequence and of the protein product (Figures 1 and 2). An interesting feature was observed as InDel 1 in the protein sequence, corresponding to the insertion of a repeat of “I” amino acids in D. stramonium, which makes the exon 3 region longer by 6 bp. This region, however, corresponds to the end of intron 2 of the clpP gene in all other species. Since the D. stramonium chloroplast genome has been sequenced only recently, this observation needs to be experimentally validated.
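The stop-codon shifts reported for clpP translate directly into added residues, because each codon spans 3 bp. The following is a minimal sketch of that arithmetic and is not part of the original analysis; the function name `codons_gained` and the dictionary layout are illustrative assumptions.

```python
def codons_gained(insert_bp: int) -> int:
    """Return the number of whole codons added by an in-frame insertion."""
    if insert_bp % 3 != 0:
        raise ValueError("insertion length is not a multiple of 3 (frameshift)")
    return insert_bp // 3


# 3' end insertion sizes (bp) in exon 3 of clpP, as reported in the text.
clpp_insertions_bp = {
    "S. bulbocastanum": 6,
    "S. tuberosum": 6,
    "S. lycopersicum": 30,
}

# Extra residues expected in each protein product (hypothetical helper output).
extra_residues = {sp: codons_gained(bp) for sp, bp in clpp_insertions_bp.items()}
```

Under these assumptions, the 6 bp insertions add two residues each and the 30 bp insertion adds ten, which is consistent with the longer coding sequence and protein product described for the three Solanum species.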
(3) ndhA. All the InDels found in ndhA were located in introns, while the protein-coding regions (exons) were highly conserved. This indicates high diversifying selection on the intronic region of this gene. Of the 14 InDels in total, most were observed with respect to C. annuum (InDels 1, 2, 5, 7, and 10). InDel 10 was shared by C. annuum and S. lycopersicum in full and by C. annuum and A. belladonna in part.
(4) rpl32. In rpl32, an insertion of 1 bp in D. stramonium and of 3 bp in C. annuum was found in the 3′ region of the gene, while a deletion of 4 bp was observed in D. stramonium. The 3 bp insertion in C. annuum only altered the length of the protein, making it longer by 1 amino acid. However, the 1 bp insertion in D. stramonium proved to be a frameshift mutation, resulting in three changes in the amino acid sequence near the C-terminus. Moreover, the 4 bp deletion at the 3′ end resulted in premature termination of protein synthesis. Together, the frameshift mutation and the 3′ end deletion reduced the length of the gene product by 1 amino acid. As the C-terminus of the amino acid chain is well conserved in all the other species, the effect of the above-mentioned variations needs to be validated experimentally.
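The distinction drawn above between the C. annuum and D. stramonium changes comes down to whether a length change is a multiple of three. The sketch below illustrates only that net-length arithmetic, using the indel sizes given in the text; it does not model where the new stop codon falls, and the names `frame_effect`, `can_insertion`, `dst_insertion`, and `dst_deletion` are assumptions introduced for illustration.

```python
def frame_effect(length_change_bp: int) -> str:
    """Classify a length change (bp; negative for deletions) by its reading-frame effect."""
    return "in-frame" if length_change_bp % 3 == 0 else "frameshift"


# rpl32 length changes reported in the text (bp).
can_insertion = +3   # C. annuum
dst_insertion = +1   # D. stramonium
dst_deletion = -4    # D. stramonium, 3' end

# Net change in D. stramonium and its equivalent in whole codons.
dst_net_bp = dst_insertion + dst_deletion
dst_net_codons = dst_net_bp // 3
```

Taken separately, the 1 bp insertion and the 4 bp deletion each break the frame, but their sum (3 bp lost) corresponds to one codon, which agrees with the reported reduction of the gene product by 1 amino acid.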
(5) rps16. In rps16, InDels were likewise observed in introns as well as exons. Five of the major insertions in the intron regions were species specific: insertions of 38 bp (InDel 1), 9 bp (InDel 2), 5 bp (InDel 7), 4 bp (InDel 8), and 6 bp (InDel 9) were observed in A. belladonna, S. lycopersicum, D. stramonium, and C. annuum. A deletion of 5 bp was observed in all three Solanum species and in C. annuum. A deletion of 9 bp was observed in all Nicotiana species, resulting in an amino acid change (P to S) and a shortening of the protein by three amino acids in the C-terminal region. A similar deletion was also observed by Kahlau et al. [27] and was suggested to be functionally neutral.
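The 9 bp deletion in Nicotiana can be checked against the protein alignment. The sketch below splits one fused alignment row into per-species fragments and counts gap positions. It assumes that the row shown is the rps16 panel of Figure 2, that all fragments have equal width, and that the species order follows the figure labels (ABE to STU); the helper name `split_fused_row` is hypothetical.

```python
SPECIES = ["ABE", "CAN", "DST", "NSY", "NTA", "NTO", "NUN", "SBU", "SLY", "STU"]


def split_fused_row(row: str, labels=SPECIES) -> dict:
    """Split a fused alignment row into equal-width per-species fragments."""
    if len(row) % len(labels) != 0:
        raise ValueError("row length is not divisible by the number of species")
    width = len(row) // len(labels)
    return {lab: row[i * width:(i + 1) * width] for i, lab in enumerate(labels)}


# Row assumed to belong to the rps16 protein panel (C-terminal region).
RPS16_ROW = (
    "RPNQPKFNRPNQPKFNRPNQPKFN"          # ABE, CAN, DST
    "RPNQS---RPNQS---RPNQS---RPNQS---"  # NSY, NTA, NTO, NUN
    "RLNQPKFNRLNQPKFNRLNQPKFN"          # SBU, SLY, STU
)

fragments = split_fused_row(RPS16_ROW)
# Gap positions per species; each gap stands for one missing residue (3 bp).
missing_residues = {sp: frag.count("-") for sp, frag in fragments.items()}
```

Under these assumptions, the four Nicotiana fragments each show three gap positions and a serine where the other species carry proline, matching the P to S change and the three-residue shortening (9 bp) described above.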
(6) sprA. The sprA gene has been reported to encode a stable noncoding RNA of unknown function. This gene has been suggested to influence 16S rRNA maturation [38, 39]. In many species, this gene seems to be present as a remnant and shows large variations in its 5′ region. The largest deletion, of 109 bp, was observed in C. annuum. The rest of the gene appears to be more conserved, with a deletion towards the 3′ end in all Nicotiana species and in A. belladonna. The manner in which this gene functions and the consequences of the above-mentioned variations are yet to be investigated experimentally.
(7) trnA-UGC. In this gene, a long deletion of 102 bp was observed in all Nicotiana species. Interestingly, this deletion was extended in both directions, to 130 bp, in A. belladonna. These deletions were found in the intron region and so are unlikely to have any negative impact on the function of the gene product.
(8) trnL-UAA. The trend of variation in trnL-UAA was similar to that in ndhA, as all InDels were observed in introns. The longest species-specific deletion (InDel 3) was observed in C. annuum, whereas a short insertion of four nucleotides, a repeat of “T”, was observed specifically in D. stramonium. Another insertion, of 6 bp, was observed in two Nicotiana species, namely, N. sylvestris and N. tabacum.
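Because InDels appear in the alignments as runs of hyphens, their position and size can be read off an aligned sequence mechanically. The following is a minimal sketch of such a scan; the two 10-column fragments are illustrative examples patterned on a four-nucleotide “T” repeat, not a reproduction of the authors' procedure, and `gap_runs` is a hypothetical helper name.

```python
import re


def gap_runs(aligned: str) -> list:
    """Return (start, length) for each run of '-' in an aligned sequence (0-based start)."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(r"-+", aligned)]


# Illustrative aligned fragments: one lacking and one carrying a 4 nt "T" repeat.
without_insertion = "TTT----GAA"
with_insertion = "TTTTTTTGAA"

runs_without = gap_runs(without_insertion)
runs_with = gap_runs(with_insertion)
```

A sequence that carries the insertion shows no gap run, while every sequence lacking it shows a single run of four columns at the same position, which is how a species-specific insertion would stand out in the alignment.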
(9) ycf1. Many InDels (3′ region) were found in ycf1, the fastest-evolving gene. Most of the InDels were found to be species specific. The largest number of InDels (InDels 2, 3, 5, 7, 9, 16, 17, 19, 23, 24, 25, and 30) was observed in C. annuum, followed by D. stramonium (InDels 4, 6, 20, 26, and 31), N. tomentosiformis (InDels 8, 17, and 18), and S. lycopersicum (InDels 16, 22, and 28). Two genus-specific InDels (InDels 14 and 21) were observed in all four Nicotiana species. However, InDel 19 was also present in D. stramonium. Another genus-specific InDel (InDel 29) was observed in the Solanum species. Most of the InDels altered the length of the gene product, with a maximum length of 1906 amino acids (aa) observed in C. annuum and a minimum of 1873 aa observed in D. stramonium. Among the Solanum species, the protein length (1887 aa) was conserved between S. bulbocastanum and S. tuberosum. However, the S. lycopersicum sequence was 1891 aa, 4 aa longer than in the other two species of the genus. Among the four Nicotiana species, the length of the ycf1 gene product was nearly conserved among N. sylvestris, N. tabacum, and N. undulata, with protein lengths of 1901, 1902, and 1901 aa, respectively. However, N. tomentosiformis was the most variable member of the genus Nicotiana, with a protein length of 1892 aa.
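The ycf1 protein lengths quoted above can be summarised with a few lines of arithmetic. The sketch below uses only the nine lengths stated in the text (no length is given there for A. belladonna, so it is omitted); the variable and function names are illustrative assumptions.

```python
# ycf1 protein lengths (aa) as stated in the text.
ycf1_length_aa = {
    "C. annuum": 1906,
    "D. stramonium": 1873,
    "S. bulbocastanum": 1887,
    "S. tuberosum": 1887,
    "S. lycopersicum": 1891,
    "N. sylvestris": 1901,
    "N. tabacum": 1902,
    "N. undulata": 1901,
    "N. tomentosiformis": 1892,
}


def genus_spread(lengths: dict, genus_prefix: str) -> int:
    """Return max minus min length among species whose name starts with genus_prefix."""
    values = [v for name, v in lengths.items() if name.startswith(genus_prefix)]
    return max(values) - min(values)


longest = max(ycf1_length_aa, key=ycf1_length_aa.get)
shortest = min(ycf1_length_aa, key=ycf1_length_aa.get)
overall_range = ycf1_length_aa[longest] - ycf1_length_aa[shortest]
solanum_spread = genus_spread(ycf1_length_aa, "S. ")
nicotiana_spread = genus_spread(ycf1_length_aa, "N. ")
```

With these values, the overall range across the listed species is 33 aa, the spread within Solanum is 4 aa, and the spread within Nicotiana is 10 aa, the latter driven by N. tomentosiformis.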
3.5. Phylogenetic Analysis of Solanaceous Plastomes. Evolutionary relationships between diverse plant species have been analyzed using several plastome markers, including matK and rbcL (genes) or trnH-psbA and trnL-trnF (intergenic regions), because these combine sequence conservation among plant taxa with suitable variation [40, 41]. However, phylogenies determined from single gene sequences may be inaccurate [42]. The availability of complete chloroplast sequences for many species has made it possible to use many individual genes or concatenated gene sequences to deduce phylogenetic relationships among taxa [42–44].
Moore et al. [44] and Jansen et al. [45] have carried out phylogenetic analyses of solanaceous species using complete plastome sequences. The evolutionary positions of Capsicum and Datura in Solanaceae have been determined using a single or a few plastid genes [46, 47]. Recently, concatenated protein-coding gene sequences from completely sequenced plastomes were used to obtain reasonable phylogenetic relationships for solanaceous species [29]. In the present investigation, we used a similar approach to analyze the phylogenetic relationships of ten solanaceous species with completely sequenced plastomes. Individual multiple sequence alignments were concatenated for maximum likelihood phylogenetic tree generation. As depicted in Figure 3, the taxa were divided into two clades with 100% bootstrap support (bootstrap value of 500). The first clade consisted of the four Nicotiana species, while the species of Solanum, Capsicum, Atropa, and Datura were included in the second clade. These results are in line with previous phylogenetic analyses using concatenated protein-coding gene sequences as well as phylogenetic analyses using plastid ndhF and trnL-F sequences [29, 47]. However, an analysis of 13 ORFs of solanaceous plastomes showed a different arrangement, in which Atropa was separated from Solanum and grouped with Nicotiana [25].
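The concatenation step mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the text does not name the software used, each per-gene alignment is assumed to be a mapping from taxon label to an equal-length aligned sequence, and the gene names and sequences in the toy input are invented.

```python
def concatenate_alignments(gene_alignments: dict):
    """Concatenate per-gene alignments (gene -> {taxon: aligned sequence}) in the given order.

    Returns (supermatrix, partitions), where partitions maps each gene to its
    0-based, half-open (start, end) column range in the concatenated alignment.
    """
    taxa = None
    pieces = {}
    partitions = {}
    offset = 0
    for gene, alignment in gene_alignments.items():
        if taxa is None:
            taxa = sorted(alignment)
            pieces = {taxon: [] for taxon in taxa}
        if sorted(alignment) != taxa:
            raise ValueError(f"{gene}: taxon set differs from the first gene")
        widths = {len(seq) for seq in alignment.values()}
        if len(widths) != 1:
            raise ValueError(f"{gene}: aligned rows have unequal lengths")
        width = widths.pop()
        for taxon in taxa:
            pieces[taxon].append(alignment[taxon])
        partitions[gene] = (offset, offset + width)
        offset += width
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Toy input with hypothetical gene names and sequences.
toy = {
    "geneA": {"NTA": "ATG-CA", "SLY": "ATGGCA"},
    "geneB": {"NTA": "TTAA", "SLY": "TT-A"},
}
matrix, parts = concatenate_alignments(toy)
```

Keeping the partition boundaries alongside the concatenated matrix is a design choice that allows per-gene models or later inspection of individual genes; the resulting matrix would then be passed to whichever maximum likelihood program is used.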
Figure 3: Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species. [Tree image not reproduced; tip labels are CAR, DCA, NTO, NUN, NSY, NTA, ABE, DST, CAN, SLY, SBU, and STU.]

4. Conclusions

The analyses of complete plastid genomes of ten solanaceous species revealed overall similarity in gene content and organization. The sizes of the LSC, SSC, and IR regions were found to be somewhat conserved among species, but significant variation was found between genera. Most of the coding regions were well conserved. However, many features of a few genes were observed to be typical of a particular genus, or even species, and can be used as molecular markers to investigate genetic diversity and evolution. These typical variations can be further utilized to develop more sophisticated DNA barcoding-based techniques. The ten solanaceous species are divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from the coding regions of the plastomes. The first clade consisted of the four Nicotiana species, and the second clade consisted of species of Solanum, Capsicum, Atropa, and Datura.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The authors are thankful to the University Grants Commission, New Delhi, for providing a MANF fellowship to Harpreet Kaur.
References
[1] M. G. Bausher, N. D. Singh, S.-B. Lee, R. K. Jansen, and H. Daniell, “The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var. ‘Ridge Pineapple’: organization and phylogenetic relationships to other angiosperms,” BMC Plant Biology, vol. 6, no. 1, article 21, 2006.
[2] Z. Y. Hu, W. Hua, S. M. Huang, and H. Z. Wang, “Complete chloroplast genome sequence of rapeseed (Brassica napus L.) and its evolutionary implications,” Genetic Resources and Crop Evolution, vol. 58, no. 6, pp. 875–887, 2011.
[3] J. D. Palmer, “Comparative organization of chloroplast genomes,” Annual Review of Genetics, vol. 19, no. 1, pp. 325–354, 1985.
[4] R. G. Olmstead and J. D. Palmer, “Chloroplast DNA systematics: a review of methods and data analysis,” The American Journal of Botany, vol. 81, no. 9, pp. 1205–1224, 1994.
[5] R. A. Bungard, “Photosynthetic evolution in parasitic plants: insight from the chloroplast genome,” BioEssays, vol. 26, no. 3, pp. 235–247, 2004.
[6] L. A. Raubeson and R. K. Jansen, “Chloroplast genomes of plants,” in Diversity and Evolution of Plants-Genotypic and Phenotypic Variation in Higher Plants, R. Henry, Ed., pp. 45–68, CABI Publishing, Wallingford, UK, 2005.
[7] R. C. Haberle, H. M. Fourcade, J. L. Boore, and R. K. Jansen, “Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes,” Journal of Molecular Evolution, vol. 66, no. 4, pp. 350–361, 2008.
[8] W. Martin and R. G. Herrmann, “Gene transfer from organelles to the nucleus: how much, what happens, and why?” Plant Physiology, vol. 118, no. 1, pp. 9–17, 1998.
[9] H. L. Race, R. G. Herrmann, and W. Martin, “Why have organelles retained genomes?” Trends in Genetics, vol. 15, no. 9, pp. 364–370, 1999.
[10] T. Wakasugi, T. Tsudzuki, and M. Sugiura, “The genomics of land plant chloroplasts: gene content and alteration of genomic information by RNA editing,” Photosynthesis Research, vol. 70, no. 1, pp. 107–118, 2001.
[11] Y.-K. Kim, C.-W. Park, and K.-J. Kim, “Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications,” Molecules and Cells, vol. 27, no. 3, pp. 365–381, 2009.
[12] A. Khan, I. A. Khan, H. Asif, and M. K. Azim, “Current trends in chloroplast genome research,” African Journal of Biotechnology, vol. 9, no. 24, pp. 3494–3500, 2010.
[13] H. Shimada and M. Sugiura, “Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes,” Nucleic Acids Research, vol. 19, no. 5, pp. 983–995, 1991.
[14] M. Sugiura, “The chloroplast genome,” Plant Molecular Biology, vol. 19, no. 1, pp. 149–168, 1992.
[15] M. E. Cosner, R. K. Jansen, J. D. Palmer, and S. R. Downie, “The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families,” Current Genetics, vol. 31, no. 5, pp. 419–429, 1997.
[16] L. Gao, X. Yi, Y.-X. Yang, Y.-J. Su, and T. Wang, “Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes,” BMC Evolutionary Biology, vol. 9, no. 1, article 130, 2009.
[17] T. Kato, T. Kaneko, S. Sato, Y. Nakamura, and S. Tabata, “Complete structure of the chloroplast genome of a legume, Lotus japonicus,” DNA Research, vol. 7, no. 6, pp. 323–330, 2000.
[18] B. Goffinet, N. J. Wickett, O. Werner, R. M. Ros, A. J. Shaw, and C. J. Cox, “Distribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta),” Annals of Botany, vol. 99, no. 4, pp. 747–753, 2007.
[19] J. D. Palmer, J. M. Nugent, and L. A. Herbon, “Unusual structure of geranium chloroplast DNA: a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families,” Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 3, pp. 769–773, 1987.
[20] S. Greiner, X. Wang, U. Rauwolf, et al., “The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution,” Nucleic Acids Research, vol. 36, no. 7, pp. 2366–2378, 2008.
[21] J. J. Doyle, J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson, “Chloroplast DNA inversions and the origin of the grass family (Poaceae),” Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 16, pp. 7722–7726, 1992.
[22] F. A. Michelangeli, J. I. Davis, and D. W. Stevenson, “Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes,” The American Journal of Botany, vol. 90, no. 1, pp. 93–106, 2003.
[23] E. Bortiri, D. Coleman-Derr, G. R. Lazo, O. D. Anderson, and Y. Q. Gu, “The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes,” BMC Research Notes, vol. 1, article 61, 2008.
[24] C. H. Leseberg and M. R. Duvall, “The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals,” Journal of Molecular Evolution, vol. 69, no. 4, pp. 311–318, 2009.
[25] H J Chung J D Jung H W Park et al ldquoThe completechloroplast genome sequences of Solanum tuberosum andcomparative analysis with Solanaceae species identified thepresence of a 241-bp deletion in cultivated potato chloroplastDNA sequencerdquoPlant Cell Reports vol 25 no 12 pp 1369ndash13792006
[26] H Daniell S-B Lee J Grevich et al ldquoComplete chloro-plast genome sequences of Solanum bulbocastanum Solanumlycopersicum and comparative analyses with other Solanaceaegenomesrdquo Theoretical and Applied Genetics vol 112 no 8 pp1503ndash1518 2006
[27] S Kahlau S Aspinall J C Gray and R Bock ldquoSequence ofthe tomato chloroplast DNA and evolutionary comparison ofsolanaceous plastid genomesrdquo Journal of Molecular Evolutionvol 63 no 2 pp 194ndash207 2006
[28] M Yukawa T Tsudzuki and M Sugiura ldquoThe chloroplastgenome of Nicotiana sylvestris and Nicotiana tomentosiformiscomplete sequencing confirms that theNicotiana sylvestris pro-genitor is the maternal genome donor of Nicotiana tabacumrdquoMolecular Genetics and Genomics vol 275 no 4 pp 367ndash3732006
[29] Y. D. Jo, J. Park, J. Kim, et al., “Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome,” Plant Cell Reports, vol. 30, no. 2, pp. 217–229, 2011.
[30] C. Schmitz-Linneweber, R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier, “The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation,” Molecular Biology and Evolution, vol. 19, no. 9, pp. 1602–1612, 2002.
[31] M. Kunnimalaiyaan and B. L. Nielsen, “Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA,” Nucleic Acids Research, vol. 25, no. 18, pp. 3681–3686, 1997.
[32] G. Thyssen, Z. Svab, and P. Maliga, “Cell-to-cell movement of plastids in plants,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 7, pp. 2439–2443, 2012.
[33] D. Gargano, A. Vezzi, N. Scotti, et al., “The complete nucleotide sequence genome of potato (Solanum tuberosum cv. Desiree) chloroplast DNA,” in Book of Abstracts of the 2nd Solanaceae Genome Workshop, p. 107, 2005.
[34] K.-J. Kim and H.-L. Lee, “Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants,” DNA Research, vol. 11, no. 4, pp. 247–261, 2004.
[35] D. A. Steane, “Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae),” DNA Research, vol. 12, no. 3, pp. 215–220, 2005.
[36] J. D. Palmer, “Cell culture and somatic cell genetics of plants,” in The Molecular Biology of Plastids, R. G. Hermann, Ed., pp. 5–53, Springer, Vienna, Austria, 1991.
[37] M. Sugiura, K. Shinozaki, M. Tanaka, et al., “Split genes and cis/trans splicing in tobacco chloroplasts,” in Plant Molecular Biology, D. von Wettstein and N. H. Chua, Eds., pp. 65–76, Plenum Press, New York, NY, USA, 1987.
[38] A. Vera and M. Sugiura, “A novel RNA gene in the tobacco plastid genome: its possible role in the maturation of 16S rRNA,” The EMBO Journal, vol. 13, no. 9, pp. 2211–2217, 1994.
[39] M. Sugita, Z. Svab, P. Maliga, and M. Sugiura, “Targeted deletion of sprA from the tobacco plastid genome indicates that the encoded small RNA is not essential for pre-16S rRNA maturation in plastids,” Molecular and General Genetics, vol. 257, no. 1, pp. 23–27, 1997.
[40] R. Lahaye, M. van der Bank, D. Bogarin, et al., “DNA barcoding the floras of biodiversity hotspots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 8, pp. 2923–2928, 2008.
[41] P. Taberlet, L. Gielly, G. Pautou, and J. Bouvet, “Universal primers for amplification of three non-coding regions of chloroplast DNA,” Plant Molecular Biology, vol. 17, no. 5, pp. 1105–1109, 1991.
[42] X. Guo, S. Castillo-Ramírez, V. González, et al., “Rapid evolutionary change of common bean (Phaseolus vulgaris L.) plastome and the genomic diversification of legume chloroplasts,” BMC Genomics, vol. 8, article 228, 2007.
[43] R. K. Jansen, C. Kaittanis, C. Saski, et al., “Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids,” BMC Evolutionary Biology, vol. 6, no. 1, article 32, 2006.
[44] M. J. Moore, P. S. Soltis, C. D. Bell, J. G. Burleigh, and D. E. Soltis, “Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4623–4628, 2010.
[45] R. K. Jansen, Z. Cai, L. A. Raubeson, et al., “Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 49, pp. 19369–19374, 2007.
[46] L. Bohs and R. G. Olmstead, “Phylogenetic relationships in Solanum (Solanaceae) based on ndhF sequences,” Systematic Botany, vol. 22, no. 1, pp. 5–17, 1997.
[47] R. G. Olmstead, L. Bohs, H. A. Migid, E. Santiago-Valentin, V. F. Garcia, and S. M. Collier, “A molecular phylogeny of the Solanaceae,” Taxon, vol. 57, no. 4, pp. 1159–1181, 2008.
Funariaceae [18], Geraniaceae [19], Onagraceae [20], and Poaceae [21, 22]. Besides structural rearrangements, sequence polymorphisms have also been reported in some cereals [23, 24] and Oenothera species [20]. These studies revealed that highly divergent sequences were concentrated in specific regions called “hotspots.” Such sequence polymorphisms have been used to derive phylogenetic relationships among species.
Solanaceae is an important family of dicots comprising more than 3000 species placed within about 90 genera. It is an ethnobotanically important family that is extensively utilized by humans and has recently become a model for comparative and evolutionary genomics research. Few efforts have been made to study the variations in chloroplast genomes of the Solanaceae family using in silico tools. Most of these attempts have concentrated on comparing a newly sequenced chloroplast genome with the complete chloroplast genomes available from some members of this family [25–29]. The availability of complete nucleotide sequences of the plastid genomes of ten solanaceous species, Atropa belladonna (NC_004561.1 [30]), Capsicum annuum (NC_018552.1 [29]), Datura stramonium (NC_018117.1, Li et al. (unpublished)), Nicotiana sylvestris (NC_007500.1 [28]), Nicotiana tabacum (NC_001879.2 [31]), Nicotiana tomentosiformis (NC_007602.1 [28]), Nicotiana undulata (NC_016068.1 [32]), Solanum bulbocastanum (NC_007943.1 [26]), Solanum lycopersicum (NC_007898.2 [27]), and Solanum tuberosum (NC_008096.2 [33]), provided us with an opportunity to conduct their in silico comparative analysis in depth. Hence, the present study is an attempt to compare the genome organization, structure, and coding capacity of the chloroplast genomes of ten solanaceous species. The study focuses on length mutations, intron-containing genes, grouping of genes into different identity classes based on pairwise comparison of individual genes, and InDel analysis of divergent genes.
2. Materials and Methods
2.1. Sequence Analysis. Whole chloroplast genome sequences, as well as individual gene and protein sequences, of ten solanaceous species were obtained from the “Organelle Genome Resources” section of NCBI in GenBank as well as FASTA format. Sequence regions corresponding to various genomic features, including genes, exons, introns, and CDS, were extracted from the GenBank files using the Extractfeat, Extractseq, and Featcopy programs from the Jemboss package. AT percentage for different genomic regions was calculated using the Wordcount and Union programs from the Jemboss package. Pairwise comparison of gene sequences was done using the NCBI BLAST program, and multiple sequence alignment of nucleotide as well as protein sequences was done using ClustalW. Alignments of protein sequences for some of the genes were manually edited in correspondence with the InDels observed in the alignments of their nucleotide sequences.
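As a rough illustration of the AT-percentage calculation described above, the following Python sketch counts A and T over the unambiguous bases of a region. It is not the Jemboss Wordcount/Union workflow used in this study; the function names and the toy sequence are hypothetical, and the 1-based inclusive coordinates simply mirror GenBank-style feature locations.

```python
def at_percent(seq: str) -> float:
    """Return AT content (%) computed over unambiguous A/C/G/T bases only."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * at / total


def region_at_percent(genome: str, start: int, end: int) -> float:
    """AT content (%) of a 1-based, inclusive region (GenBank-style coordinates)."""
    return at_percent(genome[start - 1:end])


# Toy sequence for illustration only (not taken from any plastome).
toy_genome = "GGAATTCC"
toy_region_at = region_at_percent(toy_genome, 3, 6)  # region "AATT"
```

Applied to a full plastome, the same function could be called once per region (for example LSC, SSC, and IR) using the coordinates of each region.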
2.2. Phylogenetic Analysis of Concatenated Protein-Coding Genes. A total of 75 protein-coding genes from the plastomes of ten solanaceous species and two outgroup species (Daucus carota and Coffea arabica) were selected for phylogenetic analysis from the total of 79 classified protein-coding genes, excluding accD, rpl20, ycf1, and ycf15. ycf15 was excluded because it is absent from the plastomes of both outgroup species chosen, while the other three were excluded because of their high levels of variation. A multiple sequence alignment of each gene was obtained using ClustalW (https://www.ebi.ac.uk/Tools/msa/clustalw2/). These alignments were then concatenated using standalone BIOEDIT version 7.2.5 (http://www.mbio.ncsu.edu/bioedit/bioedit.html), and a maximum likelihood phylogenetic tree with 500 bootstrap iterations was constructed using PhyML v3.0 (http://www.atgc-montpellier.fr/phyml/). A graphical view of the tree was generated using Archaeopteryx 0.988 SR (https://sites.google.com/site/cmzmasek/home/software/archaeopteryx).
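The concatenation step can be sketched in Python as follows. This is only an illustration of the idea of joining per-gene alignments into one supermatrix, not the BioEdit procedure used here; the function name, the toy gene and taxon names, and the choice to pad a missing gene with gaps are assumptions (in this study, a gene absent from the outgroups was excluded instead).

```python
def concatenate_alignments(gene_alignments, taxa):
    """Join per-gene alignments ({gene: {taxon: aligned_seq}}) into one sequence per taxon."""
    supermatrix = {taxon: [] for taxon in taxa}
    for gene, alignment in gene_alignments.items():
        widths = {len(seq) for seq in alignment.values()}
        if len(widths) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        width = widths.pop()
        for taxon in taxa:
            # Assumption: a taxon lacking this gene is padded with gaps
            # so that alignment columns stay in register.
            supermatrix[taxon].append(alignment.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in supermatrix.items()}


# Toy alignments for illustration only.
toy_alignments = {
    "geneA": {"sp1": "ATG-CA", "sp2": "ATGGCA"},
    "geneB": {"sp1": "TTA", "sp2": "TTG"},
}
toy_supermatrix = concatenate_alignments(toy_alignments, ["sp1", "sp2"])
```

Genes are joined in the insertion order of the dictionary, so every taxon receives the genes in the same order and the concatenated columns remain comparable.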
3. Results and Discussion
3.1. Comparison of Properties of Chloroplast Genomes. A comparison of the plastid genomes of the ten solanaceous species with respect to genome size (size of the complete plastid genome and of the LSC, SSC, and IR regions), percent coding regions, introns, and intergenic regions, and AT content of the overall plastid genomes as well as of coding and noncoding regions is presented in Table 1. The total plastid genome size ranged from 155296 bp (S. tuberosum) to 156781 bp (C. annuum). The large size of the C. annuum plastome can be attributed to its large LSC region as compared to the other species. In contrast, the SSC region of C. annuum was the smallest among the species studied. The largest IR region was found in A. belladonna. Among the four Nicotiana species studied, N. sylvestris and N. tabacum were more similar to each other in the size of the complete genome (a difference of only 2 bp) and of the LSC, SSC, and IR regions than to the plastome of any other species studied. However, the percent coding region was slightly higher in N. sylvestris (61.49%) than in N. tabacum (61.12%). The sizes of the complete chloroplast genome and of the LSC and SSC regions of the three Solanum species are smaller than those of any other species studied, except for the SSC regions of A. belladonna (18008 bp) and C. annuum (17849 bp), which are smaller still. However, the IR region of the Solanum species is larger than that of the Nicotiana species. Coding region percentage was higher in the Nicotiana species than in all other species, with a maximum for N. undulata (63.12%) and a minimum for S. tuberosum (58.45%). Introns made up a maximum of 12.8% of the plastome in S. bulbocastanum, whereas the minimum intron percentage (11.62%) was observed in D. stramonium. The maximum percentage of intergenic region (29.19%) was observed in D. stramonium and the minimum (24.19%) in N. undulata. The AT content of noncoding regions was higher than that of coding regions in all ten species studied. Similarly, protein-coding genes showed a higher AT content than RNA-coding genes, which can be explained by the requirement for more GC base pairs for proper folding of highly structured ribosomal RNAs and tRNAs [13–27]. Comparison of AT content in LSC, SSC, and IR regions reveals that AT content was the highest in SSC
Table 1: Properties of the solanaceous chloroplast genomes.

Property                ABE            CAN            DST            NSY            NTA            NTO            NUN            SBU            SLY            STU
Genome size (bp)        156687         156781         155871         155941         155943         155745         155863         155371         155461         155296
LSC (bp)                86869          87366          86297          86684          86686          86392          86633          85785          85882          85737
LSC coordinates*        1–86869        1–87366        1–86297        1–86684        1–86686        1–86392        1–86633        1–85785        1–85882        1–85737
IRB (bp)                25905          25783          25563          25342          25343          25429          25331          25588          25608          25593
IRB coordinates*        86870–112774   87367–113149   86298–111860   86685–112026   86687–112029   86393–111821   86634–111964   85786–111373   85883–111490   85738–111330
SSC (bp)                18008          17849          18448          18573          18571          18495          18568          18381          18363          18373
SSC coordinates*        112775–130782  113150–130998  111861–130308  112027–130599  112030–130600  111822–130316  111965–130532  111374–129754  111491–129853  111331–129703
IRA (bp)                25905          25783          25563          25342          25343          25429          25331          25588          25608          25593
IRA coordinates*        130783–156687  130999–156781  130309–155871  130600–155941  130601–155943  130317–155745  130533–155863  129755–155342  129854–155461  129704–155296
Coding regions (%)      58.89          58.50          59.19          61.49          61.12          61.58          63.12          58.52          58.91          58.45
Introns (%)             12.51          12.71          11.62          12.70          12.70          12.68          12.69          12.82          12.47          12.49
Intergenic regions (%)  28.60          28.79          29.19          25.81          26.18          25.73          24.19          28.66          28.62          29.06
AT content (%)
  Overall               62.44          62.27          62.12          62.15          62.15          62.21          62.12          62.12          62.14          62.12
  Coding regions        59.86          59.68          59.65          59.85          59.79          59.79          59.70          59.61          59.65          59.59
  Noncoding regions     66.13          65.93          65.72          65.84          65.87          66.09          66.27          65.66          65.71          65.68
  tRNAs                 47.70          47.38          47.08          47.06          47.05          47.10          47.08          47.12          47.01          47.06
  rRNAs                 44.64          44.73          44.63          44.64          44.64          44.64          44.64          44.66          44.66          44.65
  Protein-coding genes  62.01          61.83          61.79          61.91          61.86          61.84          61.68          61.76          61.80          61.74
  LSC                   64.37          64.25          64.04          64.05          64.05          64.12          64.01          63.99          64.01          63.99
  SSC                   68.35          67.99          67.72          67.94          67.93          68.03          67.87          67.87          67.97          67.91
  IR                    57.14          56.94          56.87          56.78          56.78          56.84          56.78          56.93          56.91          56.90

ABE: Atropa belladonna; CAN: Capsicum annuum; DST: Datura stramonium; NSY: Nicotiana sylvestris; NTA: Nicotiana tabacum; NTO: Nicotiana tomentosiformis; NUN: Nicotiana undulata; SBU: Solanum bulbocastanum; SLY: Solanum lycopersicum; STU: Solanum tuberosum; LSC: large single copy region; SSC: small single copy region; and IR: inverted repeat region.
*Start and end position of nucleotide in the genome.
regions and the lowest in IR regions. Some earlier studies have also shown a similar distribution of AT content in LSC, SSC, and IR regions, with the lowest AT content in the IR region and the highest in the SSC region [2, 27, 34, 35]. The low AT content of the IR regions reflects the low AT content of the four ribosomal RNA genes in this region.
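Because a quadripartite plastome consists of one LSC, one SSC, and two IR copies, the region sizes in Table 1 can be cross-checked against the stated genome size (genome = LSC + SSC + 2 × IR). The Python sketch below does this with the values as printed in Table 1; the dictionary and function names are hypothetical. With these values nine species balance exactly, while the S. bulbocastanum row differs by 29 bp (155371 stated against 155342 summed), which may reflect a discrepancy in the printed coordinates rather than in the genome itself.

```python
# (genome size, LSC, IR, SSC) in bp, transcribed from Table 1.
TABLE1_SIZES = {
    "ABE": (156687, 86869, 25905, 18008),
    "CAN": (156781, 87366, 25783, 17849),
    "DST": (155871, 86297, 25563, 18448),
    "NSY": (155941, 86684, 25342, 18573),
    "NTA": (155943, 86686, 25343, 18571),
    "NTO": (155745, 86392, 25429, 18495),
    "NUN": (155863, 86633, 25331, 18568),
    "SBU": (155371, 85785, 25588, 18381),
    "SLY": (155461, 85882, 25608, 18363),
    "STU": (155296, 85737, 25593, 18373),
}


def size_discrepancies(table):
    """Return {species: stated size - (LSC + SSC + 2 * IR)} for rows that do not balance."""
    result = {}
    for species, (genome, lsc, ir, ssc) in table.items():
        difference = genome - (lsc + ssc + 2 * ir)
        if difference != 0:
            result[species] = difference
    return result


discrepancies = size_discrepancies(TABLE1_SIZES)
```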
3.2. Gene Content of Solanaceous Chloroplast Genomes. The genes present in different regions of the plastid genomes are highly conserved, except for several open reading frames [6, 26, 36]. There are typically 111 genes, 5 hypothetical chloroplast reading frames (ycfs), and a few open reading frames (orfs). Some of our unique findings are discussed below.
(i) The trnP-GGG gene, which codes for a proline tRNA, was annotated only in D. stramonium, whereas its alternative, trnP-UGG, was annotated in all species, including D. stramonium (NC_018117.1, Li et al. (unpublished)). We searched all the species for a similar sequence using BLAST, but no similar sequence was found in any other species. A trnH gene was annotated at a particular locus only in C. annuum; in all other species, and in C. annuum as well, this region was annotated as part of the ycf2 gene. These observations indicate the presence of a duplicate copy of the trnH gene sequence in C. annuum and of two alternative tRNA genes coding for proline in D. stramonium. However, no other evidence was found in databases that this particular region codes for trnH.
(ii) The rps19 pseudogene was reported in three species, namely, N. tomentosiformis, S. bulbocastanum, and S. tuberosum. All other species were searched for a similar pseudogene using pairwise BLAST, which confirmed the presence of the rps19 pseudogene in three further species, namely, A. belladonna, C. annuum, and D. stramonium. The presence of the pseudogene may be attributed to the expansion of IRB into the LSC region. In three species, namely, N. sylvestris, N. tabacum, and N. undulata, the rps19 pseudogene was found to be absent.
(iii) infA, a pseudogene in all species except A. belladonna, D. stramonium, and S. lycopersicum, is a protein-coding gene in S. bulbocastanum. A homology search with the infA sequence from S. bulbocastanum against the plastomes of A. belladonna and D. stramonium revealed an identical sequence in both species.
(iv) The sprA gene has been annotated for N. sylvestris, N. tomentosiformis, S. lycopersicum, and S. tuberosum. Identical orthologous sequences were found in A. belladonna, C. annuum, D. stramonium, N. tabacum, N. undulata, and S. bulbocastanum using BLAST search.
3.3. Split Genes. A total of eighteen split genes have been reported. The sizes of exons and introns for these genes in all the solanaceous species studied are summarized in Table 2. The rps12 gene is divided such that its 5′-end exon is located in the LSC region, whereas the second and third exons are located in the IR region. Maturation of the RNA transcript requires a trans-splicing mechanism between exon 1 and exon 2 [34, 37]. Among the eighteen intron-containing genes, ycf3, clpP, and rps12 contain two introns, whereas the other 15 genes contain only one intron. As per Kim and Lee [34], the trnL-UAA intron is a self-splicing group I intron, whereas all other introns belong to group II. Generally, exon sizes were conserved and variability was observed in the intron regions; however, ndhB was highly conserved in both exons and introns.
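For a split protein-coding gene, the coding length is the sum of its exon lengths, and an intact reading frame requires that sum to be a multiple of three. The Python sketch below applies this check to a few exon lengths transcribed from the A. belladonna column of Table 2; the dictionary and function names are hypothetical, and the check is offered as a simple sanity test on the tabulated values rather than as part of the analysis reported here.

```python
# Exon lengths (bp) for A. belladonna, transcribed from Table 2.
ABE_EXONS = {
    "ycf3": [124, 230, 153],
    "clpP": [71, 292, 228],
    "rps12": [114, 232, 26],  # exon I lies in the LSC, exons II and III in the IR
    "atpF": [145, 410],
    "petB": [6, 642],
}


def coding_length(exons):
    """Total coding length (bp) of a split gene: the sum of its exon lengths."""
    return sum(exons)


def is_in_frame(exons):
    """True if the joined exons form a whole number of codons."""
    return coding_length(exons) % 3 == 0


coding_lengths = {gene: coding_length(exons) for gene, exons in ABE_EXONS.items()}
all_in_frame = all(is_in_frame(exons) for exons in ABE_EXONS.values())
```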
3.4. Pairwise Comparison of Plastid Genes of Solanaceae and InDel Analyses. Pairwise comparison of the nucleotide sequences of individual genes (45 species-pair combinations) was also performed for 116 genes to classify genes based on percent identity. Supplementary Table 1 (Supplementary Material available online at http://dx.doi.org/10.1155/2014/424873) shows the grouping of genes into different clusters based on percent identity in pairwise comparison. Genes that showed 100% identity in a comparison were considered highly conserved, and genes showing less than 95% identity at least once in the comparisons were considered highly divergent. These highly divergent genes were further explored at the nucleotide as well as the protein level to probe the variations in detail. A total of 11 highly divergent genes were found, whereas the number of highly conserved genes varied from 26 (for the species pair N. tomentosiformis and S. lycopersicum) to 107 (for the species pair N. sylvestris and N. tabacum). Most of the tRNA genes were highly conserved. The genes accD, cemA, clpP, ndhA, rpl32, rpl36, rps16, sprA, trnA-UGC, trnL-UAA, and ycf1 were highly divergent.
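The classification rule can be written down compactly. The Python sketch below enumerates the 45 species pairs (10 choose 2) and applies the two thresholds described above; the identity values are made-up toy numbers with placeholder gene names, and the function names are hypothetical, so it illustrates the rule rather than reproducing the analysis.

```python
from itertools import combinations

SPECIES = ["ABE", "CAN", "DST", "NSY", "NTA", "NTO", "NUN", "SBU", "SLY", "STU"]
PAIRS = list(combinations(SPECIES, 2))  # 10 choose 2 = 45 species pairs


def highly_divergent_genes(identity, threshold=95.0):
    """Genes whose identity falls below `threshold` in at least one species pair."""
    return sorted(
        gene
        for gene, by_pair in identity.items()
        if any(value < threshold for value in by_pair.values())
    )


def highly_conserved_count(identity, pair):
    """Number of genes that are 100% identical for one species pair."""
    return sum(1 for by_pair in identity.values() if by_pair.get(pair) == 100.0)


# Toy, made-up percent identities (not values from this study).
toy_identity = {
    "geneA": {("NSY", "NTA"): 100.0, ("CAN", "NSY"): 99.1},
    "geneB": {("NSY", "NTA"): 100.0, ("CAN", "NSY"): 92.4},
}
toy_divergent = highly_divergent_genes(toy_identity)
toy_conserved = highly_conserved_count(toy_identity, ("NSY", "NTA"))
```

Note that "highly divergent" is a property of a gene across all pairs, whereas the count of highly conserved genes is computed separately for each species pair, which is why the latter is reported as a range.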
Tables 3 and 4 summarize the InDels observed in nucleotide and amino acid sequences, respectively. Partial multiple sequence alignments of 9 genes and 5 proteins are shown in Figures 1 and 2, respectively. The longest insertion, of 141 bp, was observed in the accD gene sequence of C. annuum. Since the genes clpP, ndhA, rps16, and trnL-UAA contain introns, it was important to examine whether these InDels were present in exon or intron regions. All the InDels reported in ndhA and trnL-UAA were present in introns, whereas in clpP, InDel 24 was located in an exon of the gene. Similarly, the first and last InDels of rps16 were present in exons of the gene. In view of the number and length of InDels observed in the nucleotide and protein sequences of the genes under consideration, the variation for individual genes is discussed below.
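One simple way to locate InDels in a multiple sequence alignment is to find maximal runs of columns in which at least one sequence carries a gap. The Python sketch below does this for a toy two-sequence alignment; the function names and the toy sequences are hypothetical, and this is an assumed approach for illustration, not the procedure by which the InDels in Tables 3 and 4 were counted.

```python
import re


def gap_runs(aligned_seq):
    """Return (start, length) of each run of '-' in one aligned sequence (0-based)."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(r"-+", aligned_seq)]


def indel_regions(alignment):
    """Return half-open (start, end) column ranges where any sequence has a gap."""
    seqs = list(alignment.values())
    width = len(seqs[0])
    if any(len(seq) != width for seq in seqs):
        raise ValueError("aligned sequences differ in length")
    regions = []
    start = None
    for col in range(width):
        has_gap = any(seq[col] == "-" for seq in seqs)
        if has_gap and start is None:
            start = col  # a new InDel region opens here
        elif not has_gap and start is not None:
            regions.append((start, col))  # region closes before this column
            start = None
    if start is not None:
        regions.append((start, width))
    return regions


# Toy alignment for illustration only.
toy_alignment = {"sp1": "TTT---CTA-G", "sp2": "TTTTTTCTAAG"}
toy_regions = indel_regions(toy_alignment)
```

A range such as "1–7" for a single InDel in Table 3 corresponds to a region in which different species carry gaps of different lengths; calling `gap_runs` on each sequence within a region would give those per-species lengths.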
(1) accD. A total of four InDels were observed in the accD gene, as depicted in Figure 1. Interestingly, an insertion of 24 bp was present in all Nicotiana species and D. stramonium, followed by an insertion of 9 bp in all Solanum species, indicating stronger sequence conservation at the genus level. These insertions were also reported by Chung et al. [25]. A 141 bp insertion was observed specifically in C. annuum; it has also been reported by Jo et al. [29] and confirmed by RT-PCR. Similarly, a species-specific deletion of 6 bp was found in D. stramonium. All these InDels were also reflected in the corresponding
Table 2: The lengths of introns and exons for the split genes of ten solanaceous species.

Gene (region)   Exon/intron ABE   CAN   DST   NSY   NTA   NTO   NUN   SBU   SLY   STU
trnK-UUU (LSC)  Exon I      37    37    37    37    37    37    37    37    37    37
                Intron I    2519  2500  2506  2526  2526  2526  2521  2501  2514  2512
                Exon II     36    35    35    35    35    35    35    35    35    35
rps16 (LSC)     Exon I      40    40    40    40    40    40    40    40    40    40
                Intron I    822   865   866   860   860   860   859   855   864   855
                Exon II     227   227   227   218   218   218   218   227   227   227
trnG-UCC (LSC)  Exon I      23    23    23    23    23    23    23    23    23    23
                Intron I    692   692   694   692   692   690   691   701   695   692
                Exon II     48    48    48    48    48    48    48    37    48    48
atpF (LSC)      Exon I      145   145   145   145   145   145   145   144   144   145
                Intron I    715   693   700   695   695   692   692   693   686   693
                Exon II     410   410   410   410   410   410   410   411   411   410
rpoC1 (LSC)     Exon I      432   453   453   453   453   432   453   453   453   453
                Intron I    737   742   737   737   737   709   733   737   737   737
                Exon II     1614  1614  1614  1614  1614  1614  1623  1614  1614  1614
ycf3 (LSC)      Exon I      124   124   124   124   124   124   124   124   124   124
                Intron I    739   742   740   739   738   731   735   730   729   727
                Exon II     230   230   230   230   230   230   230   230   230   230
                Intron II   763   744   753   783   783   779   781   750   750   750
                Exon III    153   153   159   153   153   153   153   153   153   153
trnL-UAA (LSC)  Exon I      35    35    35    35    35    35    35    37    35    35
                Intron I    497   426   501   503   503   497   498   502   497   497
                Exon II     50    50    50    50    50    50    50    50    50    50
trnV-UAC (LSC)  Exon I      38    38    38    38    38    38    38    38    38    38
                Intron I    572   575   569   571   571   572   573   569   571   571
                Exon II     35    35    37    35    35    35    35    37    35    35
rps12*          Exon I      114   114   114   114   114   114   114   114   114   114
                Intron I    -     -     -     -     -     -     -     -     -     -
                Exon II     232   232   232   232   232   232   232   232   232   232
                Intron II   535   536   536   536   536   536   536   536   536   536
                Exon III    26    26    26    26    26    26    26    26    26    26
clpP (LSC)      Exon I      71    71    71    71    71    71    71    71    71    71
                Intron I    799   811   792   807   807   789   789   789   798   789
                Exon II     292   292   292   292   292   292   292   292   292   292
                Intron II   622   626   624   637   637   634   631   625   617   620
                Exon III    228   228   234   228   228   228   228   234   258   234
petB (LSC)      Exon I      6     6     6     6     6     6     6     6     6     6
                Intron I    759   755   746   753   753   753   753   747   747   747
                Exon II     642   642   642   642   642   642   642   642   642   642
petD (LSC)      Exon I      8     8     9     8     8     8     8     6     8     8
                Intron I    742   742   748   742   742   742   742   739   738   739
                Exon II     475   475   474   475   475   475   475   477   475   475
rpl16 (LSC)     Exon I      9     9     9     9     9     9     9     9     9     9
                Intron I    1019  1026  1025  1020  1020  1021  1020  1014  1018  1014
                Exon II     396   396   396   396   396   396   396   396   396   396
rpl2 (IR)       Exon I      391   391   393   391   391   391   391   390   391   391
                Intron I    664   665   669   666   666   666   666   666   666   666
                Exon II     434   434   429   434   434   434   434   435   434   434
ndhB (IR)       Exon I      777   777   777   777   777   777   777   777   777   777
                Intron I    679   679   679   679   679   679   679   679   679   679
                Exon II     756   756   756   756   756   756   756   756   756   756
trnI-GAU (IR)   Exon I      37    37    42    37    37    37    37    42    37    37
                Intron I    717   722   717   707   707   716   716   717   722   722
                Exon II     34    35    35    35    35    35    35    35    35    35
trnA-UGC (IR)   Exon I      38    38    38    38    38    38    38    38    38    38
                Intron I    681   811   811   709   709   709   709   811   811   811
                Exon II     35    35    35    35    35    35    35    35    35    35
ndhA (SSC)      Exon I      553   553   552   553   553   553   553   552   553   553
                Intron I    1150  1157  1154  1148  1148  1149  1148  1158  1133  1158
                Exon II     539   539   537   539   539   539   539   540   539   539

*The rps12 gene is a divided gene: the 3′-rps12 is located in the IR region, while the 5′-rps12 is located in the LSC region.
ABE: Atropa belladonna; CAN: Capsicum annuum; DST: Datura stramonium; NSY: Nicotiana sylvestris; NTA: Nicotiana tabacum; NTO: Nicotiana tomentosiformis; NUN: Nicotiana undulata; SBU: Solanum bulbocastanum; SLY: Solanum lycopersicum; and STU: Solanum tuberosum.
Table 3: InDels in nucleotide sequences of 9 genes of ten solanaceous plastid genomes.

S. number  Gene (a, b, c)  Total number of InDels  InDel length (bp)
1          accD (a)        4                       24, 9, 141, 6
2          clpP (a)        24                      8(I), 14(I), 13(I), 7(I), 1(I), 2–3(I), 7(I), 1–7(I), 3(I), 2(I), 3(I), 1–7(I), 1–3(I), 1(I), 1(I), 1(I), 1–5(I), 4–7(I), 1(I), 9(I), 1–2(I), 3(I), 5(I), 24–30
3          ndhA (b)        14                      9(I), 5–6(I), 3(I), 1(I), 9(I), 3(I), 4(I), 1–4(I), 1–2(I), 1–23(I), 1–2(I), 2(I), 1(I), 3(I)
4          rpl32 (b)       2                       2–3, 4
5          rps16 (a)       11                      1–38, 9(I), 1(I), 1(I), 5(I), 1–2(I), 5(I), 4(I), 6(I), 1(I), 9
6          sprA (b)        2                       109, 7
7          trnA-UGC (c)    1                       102–130
8          trnL-UAA (a)    4                       1, 6, 71, 4
9          ycf1 (b)        31                      3, 18, 18, 21, 6, 6, 48, 9, 6, 6, 42, 3, 6, 30, 3, 15, 12–39, 18, 6, 9–36, 6, 6, 6, 9, 9, 12, 6, 6, 6, 57, 6

(a, b, c) Location in different regions: (a) LSC, (b) SSC, and (c) IR. (I): InDels present in introns.
Table 4: InDels in amino acid sequences of 5 proteins of ten solanaceous plastid genomes.

S. number  Protein  Total number of InDels  InDel length (aa)
1          accD     4                       8, 3, 47, 2
2          clpP     2                       2, 10
3          rpl32    1                       1–2
4          rps16    1                       3
5          ycf1     29                      1, 6, 6, 7, 2, 2, 7, 3, 2, 2, 14, 1–10, 1, 5, 4–13, 6, 2, 3–12, 2, 2, 2, 3, 3, 4, 2, 2, 2, 19, 2
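The nucleotide and amino acid InDel lengths can be related by simple arithmetic: an in-frame InDel of n bp within a coding region changes the protein by n/3 residues. The short Python sketch below checks this for the accD rows of Tables 3 and 4; the function and variable names are hypothetical, and the check is offered only as an illustration of how the two tables correspond.

```python
def in_frame_aa_length(nt_length):
    """Amino acid length of an in-frame InDel, or None if it would shift the frame."""
    if nt_length <= 0:
        raise ValueError("InDel length must be positive")
    return nt_length // 3 if nt_length % 3 == 0 else None


ACCD_NT_INDELS = [24, 9, 141, 6]  # accD row of Table 3 (bp)
ACCD_AA_INDELS = [8, 3, 47, 2]    # accD row of Table 4 (amino acids)

converted = [in_frame_aa_length(n) for n in ACCD_NT_INDELS]
tables_agree = converted == ACCD_AA_INDELS
```

For accD, all four nucleotide InDels are multiples of three, so each corresponds to a whole number of residues and the reading frame is preserved.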
[Figure 1: Partial multiple sequence alignments of nucleotide sequences showing the numbered InDels in the highly divergent genes. Panel labels recoverable from the extraction include sprA, accD (InDels 1–4), ndhA, and rps16; rows in each panel are ordered ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU. The alignment rows themselves are not recoverable from the extracted text.]
TAAGTAA TAAGTAATAA----TAAGTAA TAAGTAA TAAGTAA TAAGTAA TAAGTAA TAAGTAA TAAGTAA
InDel 10 InDel 11 InDel 12 InDel 13 InDel 14 InDel 1 InDel 2
ABECANDSTNSYNTANTONUNSBUSLYSTU
rpl32
TAAA--------TTGATAAAACGATAAATTGATAAA--------TTAATAAA--------TTTATAAA--------TTTATAAA--------TTTATAAA--------TTTATAAA--------TTGATAAA--------TTGATAAA--------TTGA
GAATAATTCAATTAGAATAAGAA--------------TAAGAA--------------TAAGAAAAATTCAATTAGAATAAGAAAAATTCAATTAGAATAAGAA--------------TAAGAA--------------TAAGAA--------------TAAGAA--------------TAAGAA--------------TAA
TCTATG-------------ATTG TCTATG-------------ATTGTCTATG-------------ATTGTCTATG-------------ATTGTCTATG-------------ATTGTCTATG-------------ATTGTCTATG-------------ATTG TCTATG-------------ATTGTCTATGTAAGATATCTATGATTG TCTATG-------------ATTG
GTAA-------AGGGAGTAATTTGTAAAGAGAGGAA-------AAAGAGTAA-------AGAGAGTAA-------AGAGAGTAA-------AGAGAGTAA-------AGAGAGTAA-------AGAGAGTAA-------AGAGAGTAA-------AGAGA
AAAAGAAAAAGAAAAAGAAAAAGAAAAAGAAAA-GAAAA-GAAAAGAAAAAAAAAAAGAA
--TACA--TACA--TAAA--TACA--TACA--TACA--TACA
---TACA
AAAAAAAAAAAAAAAAAAAAAAAAAAAATACAAA---TACA
clpP
InDel 1 InDel 2 InDel 3 InDel 4 InDel 5
ABECANDSTNSYNTANTONUNSBUSLYSTU
InDel 6
GGGGGGGGGTCAGGAGGTCAGGAGGTCAGGAGGTCAGGGGGGGG
TCGAAAATTCGAAAATTGGAAAATCGAAA TGGAAATGGAAAATCGAAA TAGAAAATCGAAA TGGAAAATCGAAA TGGAAAATCGAAA TAGTGGAAAATCGAAA
TTTTTTTTATTGTTTTTTTTATTGTTTTTTTTATTGTTTTTTTTATTGTTTTTTTT
ATTGTTTTTTTTTT
ATTGTTTTTTTTTT
CTTTTTCTTTTTCTTTTTCTTTTTCTTTTTCTTTTTCTTTTTCTTTTT
TTCTTATT
AATAGAAAGGAGGGGAATAGAAAGGAGGGGAATAGAAAGGAGGGGAATAGAAAGGAGGGG
AAGGAGGGGAATAGAAAGGAGGGGAATAGAAAGGGGGGGAATAGAAAGGAGGGG
GGGAATAGAAAGGAGGGG
ATCATACTGGAAAATC
ATGATGATGATAATGATGATAATG
TTTTTTTTTTTTTT
TTTTTCAAATTTTTCAAATTTTTCAAATTTTTCAAA
TTTGTTTTTGTTTTTGGTTTTGTTTTTTTTGTTTTTTTGTTTTTGTTTTTGTT
Figure 1 Continued
8 Advances in Bioinformatics
AAA-GGAAAA-GGAAAA-GGAAAA-GGAAAA-GGAAAA-GGAAAAAGGAAAA-GGAAAA-GGAAAA-GGA
AAA------TGA AAA------TGA AAA------TGA AAAATCAAATGA AAAATCAAATGA AAA------TGA AAA------TGA AAA------TGA AAA------TGA AAA------TGA
ATCCGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCC-----------------------------------------------------------------------CTGAT ATCCGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGACTCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCTGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCTGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTATTTTTTCTATAAAAAATGGAAGAGTTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTAGTTTTTCTATAAAAAATCGAAGAATTGGTGTGAATCCATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTAGTTTTTCTATAAAAAATCGAAGAATTGGTGTGAATCCATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTAGTTTTTCTATAAAAAATCGAAGAATTGGTGTGAATCCATTCTACATTGAAGAAAGAATCGAATATTCATTGAT
TTT----GAATTT----GAATTTTTTTGAATTT----GAATTT----GAATTT----GAATTT----GAATTT----GAATTT----GAATTT----GAA
tRNA-Leu(UAA)
InDel 1 InDel 2 InDel 3 InDel 4
ABE CANDSTNSYNTANTONUNSBUSLYSTU
---ATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTTGTGATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTT
GTAC------------------CTTGGTACATTCGATCTAATAAGTACCTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTG
AAGC------------------CTCA AAGCTAAGAAACTGAAAGAGGCCTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA
AAAG---------------------
AAAGGGTGGAAAGTGAG
GTAGAAATAGAAACTGGTAGAA------ACTGGTAGAAATAGAAACTGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACTGGTAGAAATTGAAACTGGTAGAAATAGAAACTG
ycf1
InDel 1 InDel 2 InDel 3 InDel 4 InDel 5
ABECANDSTNSYNTANTONUNSBUSLYSTU
CTTCTCCTTCCCTCTTCTCCTTCCCTCTTC------CCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCT
TTTT------------------------------------------------TCGG TTTTCAAGAGGGATCCACTGAAGAAGATCCTTATCCTTCTCCTTCCCCTTTTTCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG
AACAATATTAATACTAGAAAAATATTAATACTAGAACAATATTAATACTAGAATAATATTAATACTAGAATAATATTAATACTAGAACA---------CTAGAACAATATTAATACTAAAACAATATTAATACTAGAACAATATTAATACTAGAACAATATTAATACTAG
TAA------CACTAATAATAACACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CAC
AGA------CCTAGA------CCTAAA------ATTAGA------CCTAGA------CCTAGA------ACTAGA------CCTAGA------CCTAGAACAAGACCTAGA------CCT
InDel 6 InDel 7 InDel 8 InDel 9 InDel 10
ABECANDSTNSYNTANTONUNSBUSLYSTU
TAAA------------------------------------------AGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGTAAGAACG TCAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGCG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGACCAGGGAAGAGTG TAAATGTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGCG TAAATTTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGCG TAAATTTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGTG
CTGA---G CTGATGAG CTGC---G CTGA---T CTGA---T CTAA---T CTGA---T CTGA---G CTGA---G CTGA---G
GGTCAAAGAGACCAACGAGCCCGACGATACCAAAGATACCAAAGATACCAAAGATACCAAAGAG------GAG------GAG------GA
GA------------------------------GGAAGA------------------------------GGAAGA------------------------------GGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGAGACTGATACCGCAGATCAAGATCAAACAAAGGAAGA------------------------------AGAAGA------------------------------GGAAGA------------------------------GGAA
AAGCAAAAAGCAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAG
CTCT---------------AAAACTTTTATAACTATAACTCGAAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAA
InDel 11 InDel 13 InDel 14 InDel 15 InDel 16
ABECANDSTNSYNTANTONUNSBUSLYSTU
CGTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAA---------------------------------------ATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCC CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAAGTCCTAATAAAACAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAACTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAACTCGAATCG
TCAAGAAAAAATTAATAATAAAAAAA TCAAGAAAAAATTAATAATAAAAACA TCAAGAAAAAATAAAAAAAAAAAAAA TCAAGAAAAAATTAATAATCAAAAAA TCAAGAAAAAATTAATAATCAAAAAA TCAA------------------AAAA TCAAGAAAAAATTAATAATCAAAAAA TCAAGAAAAAATTACTAATAAAAAAA TCAAGAAAAAATTAATAATAAAAAAA TCAAGAAAAAATTACTAATAAAAAAA
TTAT------TTCGACTTTATCTTTATTTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTATCTTTATTTCGACTTTATGTTTATTTCGACTTTATCTTTATTTCGACT
CTAAAAAAGTCACTTTATAATATT---------AGTAATAAAAA ATAAAAAAGTCACTTTATAATATTTATAATATTAGTAAGAAAAA ATAA------------------------------------AAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGCCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA
InDel 17 InDel 18 InDel 19 InDel 20
ABECANDSTNSYNTANTONUNSBUSLYSTU
AATATAAATTTAAAATAGAAACTTAAAATCGAAATTTAAAAT------TTAAAAT------TTAAAAT------TTAAAAT------TTAAAATAGAAACTTAAAATAGAAACTTAAAATAGAAACTTAA
CTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTTGAAGTTCAAGCTTT------CAAG
ATAT------ATAGACATCTACATATAGAGAT------AGAGATAT------ATAGATAT------ATAGATAT------ATAGATAT------ATAGAGAT------ATAGAGAT------AGAGAGAT------ATAG
ATAT---------TTTG ATATAGAAAATATTTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG
GAGAAAATTGATAAAAAAT---------AAAAGAGAAAATTGACAAAAGATAAAATTGATAAAAGATAAAATTGATAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAA
AACATTAATAAAAACCAAAATATCAATAAAAACAACAAC------------CAAAACATCAATAAAAACCAAAACATCAATAAAAACCAAAACA------------AAAACATCAATAAAAACAAAAAC------------CAAAAC------------CAAAAC------------CAA
AAACAAAACTT
AACTT
------CTT
AAAAAAAACTTAAACAAACCTTAAACAAAACTTAAACAAAACTTAAACAAAACTTAAACAAAAACAAAACTTAAAAACAAAACTT
InDel 21 InDel 22 InDel 23 InDel 24 InDel 25 InDel 26 InDel 27
ABECANDSTNSYNTANTONUNSBUSLYSTU
TAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAAAATAAAGAATTAAA------GAAT
TCAAGGAGAAAGAGTCAAGGAAAAAGAGTCAAAGAGAAAGAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAA------AGAGTCAA------AGAGTCAA------AGAG
AAAGAATTTTGATGAATCCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAA---------------------------------------------------------TCAT AAAGAATTTTGATGAATTCATTCTGCAACCGCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAATCATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT
TAAA------GAGATAAA------GATATAATTATAACGAGATAAA------GAGATAAA------GAGATAAA------GAGATAAA------GAGATAAA------GATATAAA------GATATAAA------GATA
InDel 28 InDel 29 InDel 30 InDel 31
ABECANDSTNSYNTANTONUNSBUSLYSTU
--AGATAATAATTGCATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATAGAACTCTCTTCGCTTTTCTGATTGATATATAAAATA-------------------------------------------------------------------------------------------------------------ATA--AGATAATAATTGAATAATTTAACGATTTACCTTTCATATTTGATATTGATTCGCTCACCAATCAATACGTAATGGAACTCTATTCGCTTTTCTGATTGATATAGAAAATA--AGATAATAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--AGATAATAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA----ATC-TAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--------TAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--------TAATTGAATAATTGAACGATTTACCTTTCCTATTTGATATTGATTAGCTCACCAATCCATATGTAACGGAACTCTATTCGCTTTTCTGATTGATATATAAAATATTCTTCAATTATTATCTAATTGAACAATTTCCCTTTCCTCTTTGATATTGATTAGCTCACCAATCCATATATAACATAACTCTATTCGCTTTTCTGATTGATATAGAAAATA--AGATAATAATTGAATAATTGAACGATTTACCTTTCCTATTTGATATTGATTAGCTCACCAATCCATATGTAACGGAACTCTATTCGCTTTTCTGATTGATATATAAAATA
InDel 1
ABECANDSTNSYNTANTONUNSBUSLYSTU
ATT----------------------------------------------------------------------------------------------------------------------------------ACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAAGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAAGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACC
tRNA-Ala(UGC)
InDel 1
ABECANDSTNSYNTANTONUNSBUSLYSTU
InDel 12
AAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGAGGAAAGTGAGGAAGAAAGAGAT
AGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGAT
GAAGAAAAAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGAT
Figure 1 Partial multiple sequence alignment of accD clpP ndhA rpl32 rps16 sprA tRNA-Ala (UGC) tRNA-Leu(UAA) and ycf1 genesequences of ten solanaceous species showing location of InDels indicated by hyphens
Figure 2: Partial multiple sequence alignment of amino acid sequences of genes, namely, accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala (UGC), tRNA-Leu (UAA), and ycf1, of ten solanaceous species showing the location of InDels (indicated by hyphens). Panel labels in the figure: accD, ycf1, clpP, rpl32, and rps16; rows are labelled ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU.
protein sequences as shown in Figure 2. The accD gene has been reported to be one of the most variable plastid genes and is probably under diversifying selection [26].
(2) clpP. In the clpP gene, InDels were found in both intron and exon regions. The InDels in the exon regions had two major consequences. An insertion of 6 bp in S. bulbocastanum and S. tuberosum and of 30 bp in S. lycopersicum at the 3′ end (exon 3) of the clpP gene shifted the stop codon by 6, 6, and 30 bp downstream in the respective species compared to other species of the Solanaceae family, increasing the length of the coding sequence and of the protein product (Figures 1 and 2). An interesting feature was observed as InDel 1 in the protein sequence, corresponding to the insertion of a repeat of "I" amino acids in D. stramonium, making the exon 3 region longer by 6 bp. This region, however, corresponds to the end of intron 2 of the clpP gene in all other species. Since the D. stramonium chloroplast genome has been sequenced only recently, this observation needs to be experimentally validated.
(3) ndhA. All the InDels found in ndhA were present in introns, while the protein-coding regions (exons) were highly conserved. This indicates high diversifying selection on the intronic region of this gene. Out of the total of 14 InDels, most were observed with respect to C. annuum (InDels 1, 2, 5, 7, and 10). InDel 10 was observed to be shared by C. annuum and S. lycopersicum in full and by C. annuum and A. belladonna in part.
(4) rpl32. In rpl32, an insertion of 1 bp in D. stramonium and of 3 bp in C. annuum was found in the 3′ region of the gene, while a deletion of 4 bp was observed in D. stramonium.
The insertion of 3 bp in C. annuum only altered the length of the protein, making it longer by 1 amino acid. However, the small insertion of 1 bp in D. stramonium proved to be a frameshift mutation resulting in three changes in the amino acid sequence near the C-terminus. Moreover, the deletion of 4 bp at the 3′ end resulted in premature termination of protein synthesis. Together, the frameshift mutation and the 3′ end deletion reduced the gene product length by 1 amino acid. As the C-terminus of the amino acid chain is well conserved in all the other species, the effect of the above-mentioned variations needs to be validated experimentally.
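The length bookkeeping in this paragraph can be checked with a short sketch. It is an illustration only, assuming indel sizes are given as signed base-pair counts (insertions positive, deletions negative). The function name coding_effect is hypothetical and does not come from the paper, and the sketch does not model where a new stop codon would appear.

```python
def coding_effect(indels_bp):
    """Summarise signed indel sizes (bp) that fall in one coding region.

    Insertions are positive and deletions negative. This is length
    arithmetic only; it does not translate any sequence.
    """
    net = sum(indels_bp)
    return {
        "net_bp": net,
        # Any single indel whose size is not a multiple of 3 shifts the frame.
        "any_frameshift": any(size % 3 != 0 for size in indels_bp),
        # The frame is restored overall when the net change is a multiple of 3.
        "frame_restored": net % 3 == 0,
        # Net change in codons, reported only when the frame is restored.
        "net_codons": net // 3 if net % 3 == 0 else None,
    }


# rpl32 values quoted in the text.
c_annuum = coding_effect([+3])          # 3 bp insertion
d_stramonium = coding_effect([+1, -4])  # 1 bp insertion, 4 bp deletion

print(c_annuum["net_codons"])      # 1
print(d_stramonium["net_codons"])  # -1
```

Under these assumptions the 3 bp insertion adds one codon without a frameshift, while the 1 bp insertion and the 4 bp deletion together give a net change of -3 bp, which is consistent with the reported loss of one amino acid.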
(5) rps16. In rps16 also, InDels were observed in introns as well as exons. Five of the major insertions in the intron regions were species specific. Insertions of 38 bp (InDel 1), 9 bp (InDel 2), 5 bp (InDel 7), 4 bp (InDel 8), and 6 bp (InDel 9) were observed in A. belladonna, S. lycopersicum, D. stramonium, and C. annuum. A deletion of 5 bp was observed in all three Solanum species and C. annuum. A deletion of 9 bp was observed in all Nicotiana species, resulting in an amino acid change (P to S) and shortening of the protein by three amino acids in the C-terminal region. A similar deletion has also been observed by Kahlau et al. [27] and was suggested to be functionally neutral.
(6) sprA. The sprA gene has been reported as a stable noncoding RNA of unknown function. This gene has been suggested to influence 16S rRNA maturation [38, 39]. In many species this gene seems to be present as a remnant and shows large variations in its 5′ region. The largest deletion, of 109 bp, was observed in C. annuum. The rest of this gene appears to be more conserved, with a deletion towards the 3′ end in all Nicotiana species and A. belladonna. The manner in which this gene functions and the consequences of the above-mentioned variations are yet to be investigated experimentally.
(7) trnA-UGC. In this gene, a long deletion of 102 bp was observed in all Nicotiana species. Interestingly, this deletion was further extended to 130 bp in both directions in A. belladonna. These deletions were found in the intron region and so are unlikely to have any negative impact on gene product function.
(8) trnL-UAA. The trend of variation in trnL-UAA was similar to that in ndhA, as all InDels were observed in introns. The longest species-specific deletion (InDel 3) was observed in C. annuum, whereas a short insertion of four nucleotides, a repeat of "T", was observed specifically in D. stramonium. Another insertion of 6 bp was observed in two Nicotiana species, that is, N. sylvestris and N. tabacum.
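As an illustration of how a species-specific InDel can be read off an alignment block, the following minimal sketch scans one block of the trnL-UAA intron alignment (the four-nucleotide "T" repeat region shown in Figure 1). It assumes that the rows follow the figure's label order (ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, STU). The function names are hypothetical helpers, not part of the authors' workflow.

```python
from itertools import groupby

# Assumed row order, taken from the species labels used in the figures.
SPECIES = ["ABE", "CAN", "DST", "NSY", "NTA", "NTO", "NUN", "SBU", "SLY", "STU"]
# One trnL-UAA alignment block: only the third row (DST) lacks the gap.
ROWS = ["TTT----GAA", "TTT----GAA", "TTTTTTTGAA"] + ["TTT----GAA"] * 7
alignment = dict(zip(SPECIES, ROWS))


def gap_runs(seq):
    """Return (start, length) for every run of '-' in an aligned sequence."""
    runs, pos = [], 0
    for char, group in groupby(seq):
        n = len(list(group))
        if char == "-":
            runs.append((pos, n))
        pos += n
    return runs


def species_specific(aln):
    """Report gap runs in which exactly one species differs from all others."""
    carriers = {}
    for name, seq in aln.items():
        for run in gap_runs(seq):
            carriers.setdefault(run, []).append(name)
    result = {}
    for run, with_gap in carriers.items():
        without_gap = sorted(set(aln) - set(with_gap))
        if len(with_gap) == 1:
            result[run] = ("deletion", with_gap[0])
        elif len(without_gap) == 1:
            result[run] = ("insertion", without_gap[0])
    return result


print(species_specific(alignment))  # {(3, 4): ('insertion', 'DST')}
```

The sketch only compares identical gap coordinates, so overlapping or nested InDels (such as a deletion extended in one species) would need a more careful treatment.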
(9) ycf1. Many InDels (3′ region) were found in the fastest evolving gene, that is, ycf1. Most of the InDels were found to be species specific. The maximum number of InDels (InDels 2, 3, 5, 7, 9, 16, 17, 19, 23, 24, 25, and 30) was observed in C. annuum, followed by D. stramonium (InDels 4, 6, 20, 26, and 31), N. tomentosiformis (InDels 8, 17, and 18), and S. lycopersicum (InDels 16, 22, and 28). Two genus-specific InDels (InDels 14 and 21) were observed in all four Nicotiana species. However, InDel 19 was also present in D. stramonium. Another genus-specific InDel (InDel 29) was observed in Solanum species. Most of the InDels altered the length of the gene product, with the maximum length of 1906 amino acids (aa) observed in C. annuum and the minimum of 1873 aa observed in D. stramonium. Among the Solanum species, the length of the protein (1887 aa) was conserved between S. bulbocastanum and S. tuberosum. However, S. lycopersicum had an amino acid sequence of 1891 aa, longer by 4 aa than those of the other two species of the same genus. Among the four Nicotiana species, the ycf1 gene product length was nearly conserved among N. sylvestris, N. tabacum, and N. undulata, with protein lengths of 1901 aa, 1902 aa, and 1901 aa, respectively. However, N. tomentosiformis was observed to be the most variable member of the genus Nicotiana, with a protein length of 1892 aa.
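The ycf1 protein lengths quoted above can be tabulated to make the within-genus spread explicit. The sketch below uses only the lengths stated in the text (no length is given there for A. belladonna, so it is omitted). The species codes follow the figure labels, and the helper name length_spread is hypothetical.

```python
# ycf1 protein lengths (aa) as quoted in the text, keyed by figure label.
YCF1_AA = {
    "CAN": 1906, "DST": 1873,
    "NSY": 1901, "NTA": 1902, "NTO": 1892, "NUN": 1901,
    "SBU": 1887, "SLY": 1891, "STU": 1887,
}
GENUS = {
    "CAN": "Capsicum", "DST": "Datura",
    "NSY": "Nicotiana", "NTA": "Nicotiana", "NTO": "Nicotiana", "NUN": "Nicotiana",
    "SBU": "Solanum", "SLY": "Solanum", "STU": "Solanum",
}


def length_spread(lengths, genus_of):
    """Return {genus: (min_aa, max_aa, max_aa - min_aa)}."""
    by_genus = {}
    for species, aa in lengths.items():
        by_genus.setdefault(genus_of[species], []).append(aa)
    return {g: (min(v), max(v), max(v) - min(v)) for g, v in by_genus.items()}


spread = length_spread(YCF1_AA, GENUS)
overall = max(YCF1_AA.values()) - min(YCF1_AA.values())

print(spread["Solanum"])    # (1887, 1891, 4)
print(spread["Nicotiana"])  # (1892, 1902, 10)
print(overall)              # 33
```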
3.5. Phylogenetic Analysis of Solanaceous Plastomes. Evolutionary relationships between diverse plant species have been analyzed using several plastome markers, including matK and rbcL (genes) or trnH-psbA and trnL-trnF (intergenic regions), due to sequence conservation among plant taxa blended with suitable variation [40, 41]. However, determination of phylogeny based on single gene sequences may be inaccurate [42]. The availability of complete chloroplast sequences for many species has made it possible to use many individual genes or concatenated gene sequences to deduce phylogenetic relationships among taxa [42–44].
Efforts have been made to carry out phylogenetic analysis of solanaceous species using complete plastome sequences by Moore et al. [44] and Jansen et al. [45]. Evolutionary positions of Capsicum and Datura in Solanaceae have been determined using a single or a few plastid genes [46, 47]. Recently, concatenated protein-coding gene sequences from completely sequenced plastomes were used to obtain reasonable phylogenetic relationships for solanaceous species [29]. In the present investigation, we used a similar approach to analyze the phylogenetic relationships of ten solanaceous species with completely sequenced plastomes. Individual multiple sequence alignments were concatenated for maximum likelihood phylogenetic tree generation. As depicted in Figure 3, taxa were divided into two clades with 100% bootstrap support (a bootstrap value of 500). The first clade consisted of four Nicotiana species, while the species of Solanum, Capsicum, Atropa, and Datura were included in the second clade. These results are in line with previous phylogenetic analyses using concatenated protein-coding gene sequences as well as phylogenetic analyses using plastid ndhF and trnL-F sequences [29, 47]. However, in an analysis of 13 ORFs of solanaceous plastomes, a different arrangement was shown, in which Atropa was separated from Solanum and grouped together with Nicotiana [25].
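The concatenation step mentioned above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each gene alignment is held as a mapping from taxon label to aligned sequence, it pads a taxon that is missing from a gene with gaps, and the toy sequences and the name concatenate are hypothetical.

```python
def concatenate(alignments, taxa):
    """Join per-gene alignments into one supermatrix row per taxon.

    alignments: list of dicts mapping taxon -> aligned sequence, where all
                sequences within one gene have the same length.
    taxa:       taxon labels in the desired output order.
    """
    parts = {taxon: [] for taxon in taxa}
    for gene in alignments:
        widths = {len(seq) for seq in gene.values()}
        if len(widths) != 1:
            raise ValueError("sequences within a gene alignment differ in length")
        width = widths.pop()
        for taxon in taxa:
            # A taxon absent from this gene is represented by gaps only.
            parts[taxon].append(gene.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy data for illustration only (not taken from the paper).
gene_a = {"NTA": "ATG-CA", "SLY": "ATGACA"}
gene_b = {"NTA": "TTGA"}
matrix = concatenate([gene_a, gene_b], ["NTA", "SLY"])

print(matrix["NTA"])  # ATG-CATTGA
print(matrix["SLY"])  # ATGACA----
```

Keeping the gene order and the taxon order fixed is what guarantees that every column of the concatenated matrix still compares homologous positions.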
4. Conclusions
The analyses of complete plastid genomes of ten solanaceous species revealed overall similarity in terms of gene content and organization. The sizes of the LSC, SSC, and IR regions were found to be somewhat conserved among species, but significant variation was found between genera. Most of the coding regions were well conserved. However, many features in a few genes were observed to be typical of a particular genus or even species and can be used as molecular markers to investigate genetic diversity and evolution. These typical variations can be further utilized to develop more sophisticated DNA barcoding based techniques. The ten solanaceous species are divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from the coding regions of plastomes. The first clade consisted of four Nicotiana species and the second clade consisted of species of Solanum, Capsicum, Atropa, and Datura.

Figure 3: Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species. Tip labels: CAR, DCA, NTO, NUN, NSY, NTA, ABE, DST, CAN, SLY, SBU, and STU.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The authors are thankful to the University Grants Commission, New Delhi, for providing a MANF fellowship to Harpreet Kaur.
References
[1] M. G. Bausher, N. D. Singh, S.-B. Lee, R. K. Jansen, and H. Daniell, "The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var 'Ridge Pineapple': organization and phylogenetic relationships to other angiosperms," BMC Plant Biology, vol. 6, no. 1, article 21, 2006.
[2] Z. Y. Hu, W. Hua, S. M. Huang, and H. Z. Wang, "Complete chloroplast genome sequence of rapeseed (Brassica napus L.) and its evolutionary implications," Genetic Resources and Crop Evolution, vol. 58, no. 6, pp. 875–887, 2011.
[3] J. D. Palmer, "Comparative organization of chloroplast genomes," Annual Review of Genetics, vol. 19, no. 1, pp. 325–354, 1985.
[4] R. G. Olmstead and J. D. Palmer, "Chloroplast DNA systematics: a review of methods and data analysis," The American Journal of Botany, vol. 81, no. 9, pp. 1205–1224, 1994.
[5] R. A. Bungard, "Photosynthetic evolution in parasitic plants: insight from the chloroplast genome," BioEssays, vol. 26, no. 3, pp. 235–247, 2004.
[6] L. A. Raubeson and R. K. Jansen, "Chloroplast genomes of plants," in Diversity and Evolution of Plants-Genotypic and Phenotypic Variation in Higher Plants, R. Henry, Ed., pp. 45–68, CABI Publishing, Wallingford, UK, 2005.
[7] R. C. Haberle, H. M. Fourcade, J. L. Boore, and R. K. Jansen, "Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes," Journal of Molecular Evolution, vol. 66, no. 4, pp. 350–361, 2008.
[8] W. Martin and R. G. Herrmann, "Gene transfer from organelles to the nucleus: how much, what happens, and why?" Plant Physiology, vol. 118, no. 1, pp. 9–17, 1998.
[9] H. L. Race, R. G. Herrmann, and W. Martin, "Why have organelles retained genomes?" Trends in Genetics, vol. 15, no. 9, pp. 364–370, 1999.
[10] T. Wakasugi, T. Tsudzuki, and M. Sugiura, "The genomics of land plant chloroplasts: gene content and alteration of genomic information by RNA editing," Photosynthesis Research, vol. 70, no. 1, pp. 107–118, 2001.
[11] Y.-K. Kim, C.-W. Park, and K.-J. Kim, "Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications," Molecules and Cells, vol. 27, no. 3, pp. 365–381, 2009.
[12] A. Khan, I. A. Khan, H. Asif, and M. K. Azim, "Current trends in chloroplast genome research," African Journal of Biotechnology, vol. 9, no. 24, pp. 3494–3500, 2010.
[13] H. Shimada and M. Sugiura, "Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes," Nucleic Acids Research, vol. 19, no. 5, pp. 983–995, 1991.
[14] M. Sugiura, "The chloroplast genome," Plant Molecular Biology, vol. 19, no. 1, pp. 149–168, 1992.
[15] M. E. Cosner, R. K. Jansen, J. D. Palmer, and S. R. Downie, "The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families," Current Genetics, vol. 31, no. 5, pp. 419–429, 1997.
[16] L. Gao, X. Yi, Y.-X. Yang, Y.-J. Su, and T. Wang, "Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes," BMC Evolutionary Biology, vol. 9, no. 1, article 130, 2009.
[17] T. Kato, T. Kaneko, S. Sato, Y. Nakamura, and S. Tabata, “Complete structure of the chloroplast genome of a legume, Lotus japonicus,” DNA Research, vol. 7, no. 6, pp. 323–330, 2000.
[18] B. Goffinet, N. J. Wickett, O. Werner, R. M. Ros, A. J. Shaw, and C. J. Cox, “Distribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta),” Annals of Botany, vol. 99, no. 4, pp. 747–753, 2007.
[19] J. D. Palmer, J. M. Nugent, and L. A. Herbon, “Unusual structure of geranium chloroplast DNA: a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families,” Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 3, pp. 769–773, 1987.
[20] S. Greiner, X. Wang, U. Rauwolf et al., “The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution,” Nucleic Acids Research, vol. 36, no. 7, pp. 2366–2378, 2008.
[21] J. J. Doyle, J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson, “Chloroplast DNA inversions and the origin of the grass family (Poaceae),” Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 16, pp. 7722–7726, 1992.
[22] F. A. Michelangeli, J. I. Davis, and D. W. Stevenson, “Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes,” The American Journal of Botany, vol. 90, no. 1, pp. 93–106, 2003.
[23] E. Bortiri, D. Coleman-Derr, G. R. Lazo, O. D. Anderson, and Y. Q. Gu, “The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes,” BMC Research Notes, vol. 1, article 61, 2008.
[24] C. H. Leseberg and M. R. Duvall, “The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals,” Journal of Molecular Evolution, vol. 69, no. 4, pp. 311–318, 2009.
[25] H. J. Chung, J. D. Jung, H. W. Park et al., “The complete chloroplast genome sequences of Solanum tuberosum and comparative analysis with Solanaceae species identified the presence of a 241-bp deletion in cultivated potato chloroplast DNA sequence,” Plant Cell Reports, vol. 25, no. 12, pp. 1369–1379, 2006.
[26] H. Daniell, S.-B. Lee, J. Grevich et al., “Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes,” Theoretical and Applied Genetics, vol. 112, no. 8, pp. 1503–1518, 2006.
[27] S. Kahlau, S. Aspinall, J. C. Gray, and R. Bock, “Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes,” Journal of Molecular Evolution, vol. 63, no. 2, pp. 194–207, 2006.
[28] M. Yukawa, T. Tsudzuki, and M. Sugiura, “The chloroplast genome of Nicotiana sylvestris and Nicotiana tomentosiformis: complete sequencing confirms that the Nicotiana sylvestris progenitor is the maternal genome donor of Nicotiana tabacum,” Molecular Genetics and Genomics, vol. 275, no. 4, pp. 367–373, 2006.
[29] Y. D. Jo, J. Park, J. Kim et al., “Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome,” Plant Cell Reports, vol. 30, no. 2, pp. 217–229, 2011.
[30] C. Schmitz-Linneweber, R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier, “The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation,” Molecular Biology and Evolution, vol. 19, no. 9, pp. 1602–1612, 2002.
[31] M. Kunnimalaiyaan and B. L. Nielsen, “Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA,” Nucleic Acids Research, vol. 25, no. 18, pp. 3681–3686, 1997.
[32] G. Thyssen, Z. Svab, and P. Maliga, “Cell-to-cell movement of plastids in plants,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 7, pp. 2439–2443, 2012.
[33] D. Gargano, A. Vezzi, N. Scotti et al., “The complete nucleotide sequence genome of potato (Solanum tuberosum cv. Desiree) chloroplast DNA,” in Book of Abstracts of the 2nd Solanaceae Genome Workshop, p. 107, 2005.
[34] K.-J. Kim and H.-L. Lee, “Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants,” DNA Research, vol. 11, no. 4, pp. 247–261, 2004.
[35] D. A. Steane, “Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae),” DNA Research, vol. 12, no. 3, pp. 215–220, 2005.
[36] J. D. Palmer, “Cell culture and somatic cell genetics of plants,” in The Molecular Biology of Plastids, R. G. Hermann, Ed., pp. 5–53, Springer, Vienna, Austria, 1991.
[37] M. Sugiura, K. Shinozaki, M. Tanaka et al., “Split genes and cis/trans splicing in tobacco chloroplasts,” in Plant Molecular Biology, D. von Wettstein and N. H. Chua, Eds., pp. 65–76, Plenum Press, New York, NY, USA, 1987.
[38] A. Vera and M. Sugiura, “A novel RNA gene in the tobacco plastid genome: its possible role in the maturation of 16S rRNA,” The EMBO Journal, vol. 13, no. 9, pp. 2211–2217, 1994.
[39] M. Sugita, Z. Svab, P. Maliga, and M. Sugiura, “Targeted deletion of sprA from the tobacco plastid genome indicates that the encoded small RNA is not essential for pre-16S rRNA maturation in plastids,” Molecular and General Genetics, vol. 257, no. 1, pp. 23–27, 1997.
[40] R. Lahaye, M. van der Bank, D. Bogarin et al., “DNA barcoding the floras of biodiversity hotspots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 8, pp. 2923–2928, 2008.
[41] P. Taberlet, L. Gielly, G. Pautou, and J. Bouvet, “Universal primers for amplification of three non-coding regions of chloroplast DNA,” Plant Molecular Biology, vol. 17, no. 5, pp. 1105–1109, 1991.
[42] X. Guo, S. Castillo-Ramírez, V. Gonzalez et al., “Rapid evolutionary change of common bean (Phaseolus vulgaris L.) plastome and the genomic diversification of legume chloroplasts,” BMC Genomics, vol. 8, article 228, 2007.
[43] R. K. Jansen, C. Kaittanis, C. Saski et al., “Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids,” BMC Evolutionary Biology, vol. 6, no. 1, article 32, 2006.
[44] M. J. Moore, P. S. Soltis, C. D. Bell, J. G. Burleigh, and D. E. Soltis, “Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4623–4628, 2010.
[45] R. K. Jansen, Z. Cai, L. A. Raubeson et al., “Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 49, pp. 19369–19374, 2007.
[46] L. Bohs and R. G. Olmstead, “Phylogenetic relationships in Solanum (Solanaceae) based on ndhF sequences,” Systematic Botany, vol. 22, no. 1, pp. 5–17, 1997.
[47] R. G. Olmstead, L. Bohs, H. A. Migid, E. Santiago-Valentin, V. F. Garcia, and S. M. Collier, “A molecular phylogeny of the Solanaceae,” Taxon, vol. 57, no. 4, pp. 1159–1181, 2008.
Table 1: Properties of the solanaceous chloroplast genomes.

| Property | ABE | CAN | DST | NSY | NTA | NTO | NUN | SBU | SLY | STU |
|---|---|---|---|---|---|---|---|---|---|---|
| Genome size (bp) | 156687 | 156781 | 155871 | 155941 | 155943 | 155745 | 155863 | 155371 | 155461 | 155296 |
| LSC (bp) (coordinates)* | 86869 (1–86869) | 87366 (1–87366) | 86297 (1–86297) | 86684 (1–86684) | 86686 (1–86686) | 86392 (1–86392) | 86633 (1–86633) | 85785 (1–85785) | 85882 (1–85882) | 85737 (1–85737) |
| IRB (bp) (coordinates)* | 25905 (86870–112774) | 25783 (87367–113149) | 25563 (86298–111860) | 25342 (86685–112026) | 25343 (86687–112029) | 25429 (86393–111821) | 25331 (86634–111964) | 25588 (85786–111373) | 25608 (85883–111490) | 25593 (85738–111330) |
| SSC (bp) (coordinates)* | 18008 (112775–130782) | 17849 (113150–130998) | 18448 (111861–130308) | 18573 (112027–130599) | 18571 (112030–130600) | 18495 (111822–130316) | 18568 (111965–130532) | 18381 (111374–129754) | 18363 (111491–129853) | 18373 (111331–129703) |
| IRA (bp) (coordinates)* | 25905 (130783–156687) | 25783 (130999–156781) | 25563 (130309–155871) | 25342 (130600–155941) | 25343 (130601–155943) | 25429 (130317–155745) | 25331 (130533–155863) | 25588 (129755–155342) | 25608 (129854–155461) | 25593 (129704–155296) |
| Coding regions (%) | 58.89 | 58.50 | 59.19 | 61.49 | 61.12 | 61.58 | 63.12 | 58.52 | 58.91 | 58.45 |
| Introns (%) | 12.51 | 12.71 | 11.62 | 12.70 | 12.70 | 12.68 | 12.69 | 12.82 | 12.47 | 12.49 |
| Intergenic regions (%) | 28.60 | 28.79 | 29.19 | 25.81 | 26.18 | 25.73 | 24.19 | 28.66 | 28.62 | 29.06 |
| AT content (%): overall | 62.44 | 62.27 | 62.12 | 62.15 | 62.15 | 62.21 | 62.12 | 62.12 | 62.14 | 62.12 |
| AT content (%): coding regions | 59.86 | 59.68 | 59.65 | 59.85 | 59.79 | 59.79 | 59.70 | 59.61 | 59.65 | 59.59 |
| AT content (%): noncoding regions | 66.13 | 65.93 | 65.72 | 65.84 | 65.87 | 66.09 | 66.27 | 65.66 | 65.71 | 65.68 |
| AT content (%): tRNAs | 47.70 | 47.38 | 47.08 | 47.06 | 47.05 | 47.10 | 47.08 | 47.12 | 47.01 | 47.06 |
| AT content (%): rRNAs | 44.64 | 44.73 | 44.63 | 44.64 | 44.64 | 44.64 | 44.64 | 44.66 | 44.66 | 44.65 |
| AT content (%): protein-coding genes | 62.01 | 61.83 | 61.79 | 61.91 | 61.86 | 61.84 | 61.68 | 61.76 | 61.80 | 61.74 |
| AT content (%): LSC | 64.37 | 64.25 | 64.04 | 64.05 | 64.05 | 64.12 | 64.01 | 63.99 | 64.01 | 63.99 |
| AT content (%): SSC | 68.35 | 67.99 | 67.72 | 67.94 | 67.93 | 68.03 | 67.87 | 67.87 | 67.97 | 67.91 |
| AT content (%): IR | 57.14 | 56.94 | 56.87 | 56.78 | 56.78 | 56.84 | 56.78 | 56.93 | 56.91 | 56.90 |

ABE: Atropa belladonna; CAN: Capsicum annuum; DST: Datura stramonium; NSY: Nicotiana sylvestris; NTA: Nicotiana tabacum; NTO: Nicotiana tomentosiformis; NUN: Nicotiana undulata; SBU: Solanum bulbocastanum; SLY: Solanum lycopersicum; STU: Solanum tuberosum; LSC: large single copy region; SSC: small single copy region; IR: inverted repeat region.
*Start and end position of nucleotide in the genome.
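The coordinates in Table 1 can be sanity-checked by confirming that the four regions tile the genome without gaps or overlaps. The following is a minimal Python sketch of such a check, using the A. belladonna (ABE) column; the function name `check_tiling` is illustrative and is not part of the original analysis.

```python
def check_tiling(regions, genome_size):
    """Return True if 1-based inclusive regions cover 1..genome_size contiguously."""
    expected_start = 1
    for name, start, end in regions:
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start - 1 == genome_size

# A. belladonna (ABE) coordinates as listed in Table 1
abe = [
    ("LSC", 1, 86869),
    ("IRB", 86870, 112774),
    ("SSC", 112775, 130782),
    ("IRA", 130783, 156687),
]

# Region length follows from inclusive coordinates: end - start + 1
lengths = {name: end - start + 1 for name, start, end in abe}
print(lengths)
print(check_tiling(abe, 156687))
```

The same check can be repeated for each species column by swapping in its coordinates and genome size.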
regions and the lowest in IR regions. Some earlier studies have also shown a similar distribution of AT content across the LSC, SSC, and IR regions, with the lowest AT content in the IR region and the highest in the SSC region [2, 27, 34, 35]. The low AT content of the IR regions reflects the low AT content of the four ribosomal RNA genes in this region.
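Region-wise AT content of the kind compared here can be computed directly from a genome sequence and region coordinates. The sketch below is a hedged illustration only: the function names and the toy sequence are assumptions, and the original study does not state which tool was used for this calculation.

```python
def at_content(seq):
    """Percentage of A and T among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("A") + seq.count("T")) / counted

def region_at_content(genome, regions):
    """AT content per region; regions maps name -> (start, end), 1-based inclusive."""
    return {
        name: at_content(genome[start - 1:end])
        for name, (start, end) in regions.items()
    }

# Toy sequence and coordinates for illustration, not real plastome data
toy_genome = "ATATATATGCGCATAT"
toy_regions = {"LSC": (1, 8), "IR": (9, 12), "SSC": (13, 16)}
print(region_at_content(toy_genome, toy_regions))
```

With a real plastome, `toy_regions` would be replaced by the LSC, IR, and SSC coordinates of the species of interest.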
3.2. Gene Content of Solanaceous Chloroplast Genomes. The genes present in different regions of the plastid genomes are highly conserved except for several open reading frames [6, 26, 36]. There are typically 111 genes, 5 hypothetical chloroplast reading frames (ycfs), and a few open reading frames (orfs). Some of our unique findings are discussed below.
(i) The trnP-GGG gene, which codes for the tRNA for proline, was annotated only in D. stramonium, whereas its alternative, trnP-UGG, was annotated in all other species and also in D. stramonium (NC_018117.1, Li et al. (unpublished)). We mined all the species for a similar sequence by BLAST search, but no similar sequence was found in any other species. A region annotated as a trnH gene was reported only in C. annuum; in all other species this region was reported to be part of the ycf2 gene, as it also is in C. annuum. These observations indicate the presence of a duplicate copy of the trnH gene sequence in C. annuum and two alternative tRNA genes coding for proline in D. stramonium. However, no other evidence was found in databases for this particular region coding for trnH.
(ii) The rps19 pseudogene was reported in three species, namely, N. tomentosiformis, S. bulbocastanum, and S. tuberosum. All other species were mined for a similar pseudogene using the BLAST pairwise algorithm, which confirmed the presence of the rps19 pseudogene in three further species, namely, A. belladonna, C. annuum, and D. stramonium. The presence of the pseudogene may be attributed to the expansion of IRB into the LSC region. In three species, namely, N. sylvestris, N. tabacum, and N. undulata, the rps19 pseudogene was found to be absent.
(iii) infA, a pseudogene in all species except A. belladonna, D. stramonium, and S. lycopersicum, is a protein-coding gene in S. bulbocastanum. A homology search with the infA sequence from S. bulbocastanum against the plastomes of A. belladonna and D. stramonium revealed an identical sequence in both species.
(iv) The sprA gene has been annotated for N. sylvestris, N. tomentosiformis, S. lycopersicum, and S. tuberosum. Identical orthologous sequences were found in A. belladonna, C. annuum, D. stramonium, N. tabacum, N. undulata, and S. bulbocastanum using BLAST search.
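The searches in items (i) to (iv) were carried out with BLAST. As a much simpler stand-in, the sketch below only finds exact copies of a query on either strand of a genome, which is enough to illustrate the "identical sequence" case reported for infA and sprA. It is not a substitute for BLAST (it tolerates no mismatches or gaps), and the function names and toy sequences are assumptions.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]

def find_exact(query, genome):
    """Return (strand, 1-based start) for every exact occurrence of query."""
    strands = [("+", query)]
    rc = reverse_complement(query)
    if rc != query:  # avoid reporting palindromic queries twice
        strands.append(("-", rc))
    hits = []
    for strand, q in strands:
        pos = genome.find(q)
        while pos != -1:
            hits.append((strand, pos + 1))
            pos = genome.find(q, pos + 1)
    return hits

# Toy genome for illustration, not real plastome data
toy_genome = "TTGACCATGGTCAA"
print(find_exact("GACC", toy_genome))
```

For the toy genome, the query is found once on the forward strand and once as its reverse complement, mirroring how a gene in the IR can appear on both strands.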
3.3. Split Genes. A total of eighteen split genes have been reported. The sizes of exons and introns for these genes in all the solanaceous species studied are summarized in Table 2. The rps12 gene is divided such that its 5′-end exon is located in the LSC region, whereas the second and third exons are located in the IR region. Maturation of the RNA transcript requires a trans-splicing mechanism between exon 1 and exon 2 [34, 37]. Among the eighteen intron-containing genes, ycf3, clpP, and rps12 contain two introns, whereas the other 15 genes contain only one intron. As per Kim and Lee [34], the trnL-UAA gene intron belongs to the self-splicing group I introns, whereas all other introns belong to group II. Generally, the size of exons was conserved and variability was observed in the intron regions; however, ndhB was found to be highly conserved for both exons and introns.
3.4. Pairwise Comparison of Plastid Genes of Solanaceae and InDel Analyses. Pairwise comparison of the nucleotide sequences of individual genes (45 species-pair combinations) was also performed for 116 genes to classify genes based on percent identity. Supplementary Table 1 (Supplementary Material available online at http://dx.doi.org/10.1155/2014/424873) shows the grouping of genes into clusters based on percent identity in pairwise comparison. Genes that showed 100% identity in a comparison were considered highly conserved, and genes showing less than 95% identity at least once in the comparison were considered highly divergent. These highly divergent genes were further explored at the nucleotide as well as the protein level to probe the variations in detail. A total of 11 highly divergent genes were found, whereas the number of highly conserved genes varied from 26 (for the species pair N. tomentosiformis and S. lycopersicum) to 107 (for the species pair N. sylvestris and N. tabacum). Most of the tRNA genes were found to be highly conserved. Genes accD, cemA, clpP, ndhA, rpl32, rpl36, rps16, sprA, trnA-UGC, trnL-UAA, and ycf1 were found to be highly divergent.
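The classification rule described above can be sketched in a few lines of Python. Ten species give 45 unordered pairs; a gene counts as highly divergent if any pair falls below 95% identity, and a pair counts as highly conserved for that gene at 100% identity. How the authors treated gaps when computing identity is not stated in this section, so the gap handling below (a column with a gap in only one sequence counts as a mismatch) is an assumption, as are all function names and the toy alignment.

```python
from itertools import combinations

SPECIES = ["ABE", "CAN", "DST", "NSY", "NTA", "NTO", "NUN", "SBU", "SLY", "STU"]

def percent_identity(a, b):
    """Identity over aligned columns; columns gapped in both sequences are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not cols:
        return 0.0
    matches = sum(1 for x, y in cols if x == y)
    return 100.0 * matches / len(cols)

def pairwise_identities(aligned):
    """aligned maps species code -> aligned sequence of one gene."""
    return {
        (s1, s2): percent_identity(aligned[s1], aligned[s2])
        for s1, s2 in combinations(sorted(aligned), 2)
    }

def is_highly_divergent(identities, threshold=95.0):
    """True if any species pair falls below the threshold for this gene."""
    return any(value < threshold for value in identities.values())

def conserved_pairs(identities):
    """Species pairs for which this gene is 100% identical."""
    return [pair for pair, value in identities.items() if value == 100.0]

print(len(list(combinations(SPECIES, 2))))  # number of species pairs

# Toy alignment of one gene in three species, for illustration only
toy = {"ABE": "ATGC-ATGCA", "CAN": "ATGC-ATGCA", "DST": "ATGCTATGCA"}
ids = pairwise_identities(toy)
print(is_highly_divergent(ids), conserved_pairs(ids))
```

Counting, for each species pair, the genes returned by `conserved_pairs` would give per-pair totals of the kind reported in the text (26 to 107).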
Tables 3 and 4 summarize the InDels observed in nucleotide and amino acid sequences, respectively. Partial multiple sequence alignments of 9 genes and 5 proteins are shown in Figures 1 and 2, respectively. The longest insertion, of 141 bp, was observed in the accD gene sequence of C. annuum. Since the genes clpP, ndhA, rps16, and trnL-UAA contain introns, it was important to examine whether these InDels were present in exon or intron regions. All the InDels reported in ndhA and trnL-UAA were present in introns, whereas in the case of clpP, InDel 24 was located in an exon of the gene. Similarly, the first and last InDels of rps16 were present in exons of the gene. In view of the number and length of InDels observed in the nucleotide and protein sequences of the genes under consideration, the variation for individual genes is discussed below.
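Locating InDels in an alignment and deciding whether each falls in an exon or an intron can be sketched as follows. The two aligned rows are the accD fragment as it appears in the Figure 1 alignment (the 9 bp insertion present in the Solanum species); the exon intervals passed to `locate` are hypothetical alignment coordinates, and the function names are illustrative rather than taken from the original workflow.

```python
import re

def gap_runs(aligned_seq):
    """Return (1-based alignment start, length) for each run of '-' in one row."""
    return [
        (m.start() + 1, m.end() - m.start())
        for m in re.finditer(r"-+", aligned_seq)
    ]

def locate(run_start, run_len, exons):
    """'exon' if the run overlaps any exon interval (1-based inclusive), else 'intron'."""
    run_end = run_start + run_len - 1
    for ex_start, ex_end in exons:
        if run_start <= ex_end and run_end >= ex_start:
            return "exon"
    return "intron"

abe_row = "CATGT---------ATGAT"  # A. belladonna: 9 bp absent
sbu_row = "CATGTCTTACATGTATGAT"  # S. bulbocastanum: 9 bp present

print(gap_runs(abe_row))
print(gap_runs(sbu_row))

# Hypothetical exon coordinates in alignment space, for illustration only
print(locate(6, 9, [(1, 19)]))
print(locate(50, 3, [(1, 40), (100, 200)]))
```

Gene-level exon and intron boundaries for the split genes would come from the annotation, with lengths as listed in Table 2.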
(1) accD. A total of four InDels were observed in the accD gene, as depicted in Figure 1. Interestingly, an insertion of 24 bp was present in all Nicotiana species and D. stramonium, followed by an insertion of 9 bp in all Solanum species, indicating stronger sequence conservation at the genus level. These insertions were also reported by Chung et al. [25]. A 141 bp insertion was observed specifically in C. annuum; it has also been reported by Jo et al. [29] and confirmed by RT-PCR. Similarly, a species-specific deletion of 6 bp was found in D. stramonium. All these InDels were also reflected in the corresponding
Table 2: The lengths of introns and exons for the split genes of ten solanaceous species.

| Gene (region) | Exon/intron | ABE | CAN | DST | NSY | NTA | NTO | NUN | SBU | SLY | STU |
|---|---|---|---|---|---|---|---|---|---|---|---|
| trnK-UUU (LSC) | Exon I | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 |
| | Intron I | 2519 | 2500 | 2506 | 2526 | 2526 | 2526 | 2521 | 2501 | 2514 | 2512 |
| | Exon II | 36 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 |
| rps16 (LSC) | Exon I | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
| | Intron I | 822 | 865 | 866 | 860 | 860 | 860 | 859 | 855 | 864 | 855 |
| | Exon II | 227 | 227 | 227 | 218 | 218 | 218 | 218 | 227 | 227 | 227 |
| trnG-UCC (LSC) | Exon I | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 |
| | Intron I | 692 | 692 | 694 | 692 | 692 | 690 | 691 | 701 | 695 | 692 |
| | Exon II | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 37 | 48 | 48 |
| atpF (LSC) | Exon I | 145 | 145 | 145 | 145 | 145 | 145 | 145 | 144 | 144 | 145 |
| | Intron I | 715 | 693 | 700 | 695 | 695 | 692 | 692 | 693 | 686 | 693 |
| | Exon II | 410 | 410 | 410 | 410 | 410 | 410 | 410 | 411 | 411 | 410 |
| rpoC1 (LSC) | Exon I | 432 | 453 | 453 | 453 | 453 | 432 | 453 | 453 | 453 | 453 |
| | Intron I | 737 | 742 | 737 | 737 | 737 | 709 | 733 | 737 | 737 | 737 |
| | Exon II | 1614 | 1614 | 1614 | 1614 | 1614 | 1614 | 1623 | 1614 | 1614 | 1614 |
| ycf3 (LSC) | Exon I | 124 | 124 | 124 | 124 | 124 | 124 | 124 | 124 | 124 | 124 |
| | Intron I | 739 | 742 | 740 | 739 | 738 | 731 | 735 | 730 | 729 | 727 |
| | Exon II | 230 | 230 | 230 | 230 | 230 | 230 | 230 | 230 | 230 | 230 |
| | Intron II | 763 | 744 | 753 | 783 | 783 | 779 | 781 | 750 | 750 | 750 |
| | Exon III | 153 | 153 | 159 | 153 | 153 | 153 | 153 | 153 | 153 | 153 |
| trnL-UAA (LSC) | Exon I | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 37 | 35 | 35 |
| | Intron I | 497 | 426 | 501 | 503 | 503 | 497 | 498 | 502 | 497 | 497 |
| | Exon II | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 |
| trnV-UAC (LSC) | Exon I | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 |
| | Intron I | 572 | 575 | 569 | 571 | 571 | 572 | 573 | 569 | 571 | 571 |
| | Exon II | 35 | 35 | 37 | 35 | 35 | 35 | 35 | 37 | 35 | 35 |
| rps12* | Exon I | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 |
| | Intron I | - | - | - | - | - | - | - | - | - | - |
| | Exon II | 232 | 232 | 232 | 232 | 232 | 232 | 232 | 232 | 232 | 232 |
| | Intron II | 535 | 536 | 536 | 536 | 536 | 536 | 536 | 536 | 536 | 536 |
| | Exon III | 26 | 26 | 26 | 26 | 26 | 26 | 26 | 26 | 26 | 26 |
| clpP (LSC) | Exon I | 71 | 71 | 71 | 71 | 71 | 71 | 71 | 71 | 71 | 71 |
| | Intron I | 799 | 811 | 792 | 807 | 807 | 789 | 789 | 789 | 798 | 789 |
| | Exon II | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 |
| | Intron II | 622 | 626 | 624 | 637 | 637 | 634 | 631 | 625 | 617 | 620 |
| | Exon III | 228 | 228 | 234 | 228 | 228 | 228 | 228 | 234 | 258 | 234 |
| petB (LSC) | Exon I | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| | Intron I | 759 | 755 | 746 | 753 | 753 | 753 | 753 | 747 | 747 | 747 |
| | Exon II | 642 | 642 | 642 | 642 | 642 | 642 | 642 | 642 | 642 | 642 |
| petD (LSC) | Exon I | 8 | 8 | 9 | 8 | 8 | 8 | 8 | 6 | 8 | 8 |
| | Intron I | 742 | 742 | 748 | 742 | 742 | 742 | 742 | 739 | 738 | 739 |
| | Exon II | 475 | 475 | 474 | 475 | 475 | 475 | 475 | 477 | 475 | 475 |
| rpl16 (LSC) | Exon I | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| | Intron I | 1019 | 1026 | 1025 | 1020 | 1020 | 1021 | 1020 | 1014 | 1018 | 1014 |
| | Exon II | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 |
| rpl2 (IR) | Exon I | 391 | 391 | 393 | 391 | 391 | 391 | 391 | 390 | 391 | 391 |
| | Intron I | 664 | 665 | 669 | 666 | 666 | 666 | 666 | 666 | 666 | 666 |
| | Exon II | 434 | 434 | 429 | 434 | 434 | 434 | 434 | 435 | 434 | 434 |
| ndhB (IR) | Exon I | 777 | 777 | 777 | 777 | 777 | 777 | 777 | 777 | 777 | 777 |
| | Intron I | 679 | 679 | 679 | 679 | 679 | 679 | 679 | 679 | 679 | 679 |
| | Exon II | 756 | 756 | 756 | 756 | 756 | 756 | 756 | 756 | 756 | 756 |
| trnI-GAU (IR) | Exon I | 37 | 37 | 42 | 37 | 37 | 37 | 37 | 42 | 37 | 37 |
| | Intron I | 717 | 722 | 717 | 707 | 707 | 716 | 716 | 717 | 722 | 722 |
| | Exon II | 34 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 |
| trnA-UGC (IR) | Exon I | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 |
| | Intron I | 681 | 811 | 811 | 709 | 709 | 709 | 709 | 811 | 811 | 811 |
| | Exon II | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 |
| ndhA (SSC) | Exon I | 553 | 553 | 552 | 553 | 553 | 553 | 553 | 552 | 553 | 553 |
| | Intron I | 1150 | 1157 | 1154 | 1148 | 1148 | 1149 | 1148 | 1158 | 1133 | 1158 |
| | Exon II | 539 | 539 | 537 | 539 | 539 | 539 | 539 | 540 | 539 | 539 |

*The rps12 gene is a divided gene. The 3′-rps12 is located in the IR region, while the 5′-rps12 is located in the LSC region.
ABE: Atropa belladonna; CAN: Capsicum annuum; DST: Datura stramonium; NSY: Nicotiana sylvestris; NTA: Nicotiana tabacum; NTO: Nicotiana tomentosiformis; NUN: Nicotiana undulata; SBU: Solanum bulbocastanum; SLY: Solanum lycopersicum; STU: Solanum tuberosum.
Table 3: InDels in nucleotide sequences of 9 genes of ten solanaceous plastid genomes.

| S. number | Gene | Region | Total number of InDels | InDel length (bp) |
|---|---|---|---|---|
| 1 | accD | LSC | 4 | 24, 9, 141, 6 |
| 2 | clpP | LSC | 24 | 8(I), 14(I), 13(I), 7(I), 1(I), 2-3(I), 7(I), 1–7(I), 3(I), 2(I), 3(I), 1–7(I), 1–3(I), 1(I), 1(I), 1(I), 1–5(I), 4–7(I), 1(I), 9(I), 1-2(I), 3(I), 5(I), 24–30 |
| 3 | ndhA | SSC | 14 | 9(I), 5-6(I), 3(I), 1(I), 9(I), 3(I), 4(I), 1–4(I), 1-2(I), 1–23(I), 1-2(I), 2(I), 1(I), 3(I) |
| 4 | rpl32 | SSC | 2 | 2-3, 4 |
| 5 | rps16 | LSC | 11 | 1–38, 9(I), 1(I), 1(I), 5(I), 1-2(I), 5(I), 4(I), 6(I), 1(I), 9 |
| 6 | sprA | SSC | 2 | 109, 7 |
| 7 | trnA-UGC | IR | 1 | 102–130 |
| 8 | trnL-UAA | LSC | 4 | 1, 6, 71, 4 |
| 9 | ycf1 | SSC | 31 | 3, 18, 18, 21, 6, 6, 48, 9, 6, 6, 42, 3, 6, 30, 3, 15, 12–39, 18, 6, 9–36, 6, 6, 6, 9, 9, 12, 6, 6, 6, 57, 6 |

Region: location of the gene in the LSC, SSC, or IR region. (I): InDel present in an intron.

Table 4: InDels in amino acid sequences of 5 proteins of ten solanaceous plastid genomes.

| S. number | Protein | Total number of InDels | InDel length (amino acids) |
|---|---|---|---|
| 1 | accD | 4 | 8, 3, 47, 2 |
| 2 | clpP | 2 | 2, 10 |
| 3 | rpl32 | 1 | 1-2 |
| 4 | rps16 | 1 | 3 |
| 5 | ycf1 | 29 | 1, 6, 6, 7, 2, 2, 7, 3, 2, 2, 14, 1–10, 1, 5, 4–13, 6, 2, 3–12, 2, 2, 2, 3, 3, 4, 2, 2, 2, 19, 2 |
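The accD rows of Tables 3 and 4 illustrate how nucleotide and amino acid InDel lengths relate: an exonic InDel whose length is a multiple of 3 keeps the reading frame and changes the protein by one residue per codon. A small Python sketch of that conversion follows; the helper name `codon_indel` is illustrative.

```python
def codon_indel(length_bp):
    """InDel length in codons if it preserves the reading frame, else None."""
    return length_bp // 3 if length_bp % 3 == 0 else None

accd_bp = [24, 9, 141, 6]  # accD row of Table 3 (nucleotides)
print([codon_indel(n) for n in accd_bp])  # [8, 3, 47, 2], the accD row of Table 4
```

An InDel length that is not a multiple of 3 returns `None` here, flagging a potential frameshift if it lies in an exon.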
Figure 1: Partial multiple sequence alignments of InDel regions in the highly divergent genes. Panels recoverable from the extracted text: clpP, sprA, accD, ndhA, rps16, rpl32, and trnL-UAA (tRNA-Leu(UAA)), each aligned across ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU, with InDels numbered per gene. [Alignment image not reproduced.]
AAAG---------------------
AAAGGGTGGAAAGTGAG
GTAGAAATAGAAACTGGTAGAA------ACTGGTAGAAATAGAAACTGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACTGGTAGAAATTGAAACTGGTAGAAATAGAAACTG
ycf1
InDel 1 InDel 2 InDel 3 InDel 4 InDel 5
ABECANDSTNSYNTANTONUNSBUSLYSTU
CTTCTCCTTCCCTCTTCTCCTTCCCTCTTC------CCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCT
TTTT------------------------------------------------TCGG TTTTCAAGAGGGATCCACTGAAGAAGATCCTTATCCTTCTCCTTCCCCTTTTTCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG
AACAATATTAATACTAGAAAAATATTAATACTAGAACAATATTAATACTAGAATAATATTAATACTAGAATAATATTAATACTAGAACA---------CTAGAACAATATTAATACTAAAACAATATTAATACTAGAACAATATTAATACTAGAACAATATTAATACTAG
TAA------CACTAATAATAACACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CAC
AGA------CCTAGA------CCTAAA------ATTAGA------CCTAGA------CCTAGA------ACTAGA------CCTAGA------CCTAGAACAAGACCTAGA------CCT
InDel 6 InDel 7 InDel 8 InDel 9 InDel 10
ABECANDSTNSYNTANTONUNSBUSLYSTU
TAAA------------------------------------------AGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGTAAGAACG TCAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGCG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGACCAGGGAAGAGTG TAAATGTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGCG TAAATTTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGCG TAAATTTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGTG
CTGA---G CTGATGAG CTGC---G CTGA---T CTGA---T CTAA---T CTGA---T CTGA---G CTGA---G CTGA---G
GGTCAAAGAGACCAACGAGCCCGACGATACCAAAGATACCAAAGATACCAAAGATACCAAAGAG------GAG------GAG------GA
GA------------------------------GGAAGA------------------------------GGAAGA------------------------------GGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGAGACTGATACCGCAGATCAAGATCAAACAAAGGAAGA------------------------------AGAAGA------------------------------GGAAGA------------------------------GGAA
AAGCAAAAAGCAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAG
CTCT---------------AAAACTTTTATAACTATAACTCGAAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAA
InDel 11 InDel 13 InDel 14 InDel 15 InDel 16
ABECANDSTNSYNTANTONUNSBUSLYSTU
CGTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAA---------------------------------------ATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCC CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAAGTCCTAATAAAACAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAACTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAACTCGAATCG
TCAAGAAAAAATTAATAATAAAAAAA TCAAGAAAAAATTAATAATAAAAACA TCAAGAAAAAATAAAAAAAAAAAAAA TCAAGAAAAAATTAATAATCAAAAAA TCAAGAAAAAATTAATAATCAAAAAA TCAA------------------AAAA TCAAGAAAAAATTAATAATCAAAAAA TCAAGAAAAAATTACTAATAAAAAAA TCAAGAAAAAATTAATAATAAAAAAA TCAAGAAAAAATTACTAATAAAAAAA
TTAT------TTCGACTTTATCTTTATTTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTATCTTTATTTCGACTTTATGTTTATTTCGACTTTATCTTTATTTCGACT
CTAAAAAAGTCACTTTATAATATT---------AGTAATAAAAA ATAAAAAAGTCACTTTATAATATTTATAATATTAGTAAGAAAAA ATAA------------------------------------AAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGCCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA
InDel 17 InDel 18 InDel 19 InDel 20
ABECANDSTNSYNTANTONUNSBUSLYSTU
AATATAAATTTAAAATAGAAACTTAAAATCGAAATTTAAAAT------TTAAAAT------TTAAAAT------TTAAAAT------TTAAAATAGAAACTTAAAATAGAAACTTAAAATAGAAACTTAA
CTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTTGAAGTTCAAGCTTT------CAAG
ATAT------ATAGACATCTACATATAGAGAT------AGAGATAT------ATAGATAT------ATAGATAT------ATAGATAT------ATAGAGAT------ATAGAGAT------AGAGAGAT------ATAG
ATAT---------TTTG ATATAGAAAATATTTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG
GAGAAAATTGATAAAAAAT---------AAAAGAGAAAATTGACAAAAGATAAAATTGATAAAAGATAAAATTGATAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAA
AACATTAATAAAAACCAAAATATCAATAAAAACAACAAC------------CAAAACATCAATAAAAACCAAAACATCAATAAAAACCAAAACA------------AAAACATCAATAAAAACAAAAAC------------CAAAAC------------CAAAAC------------CAA
AAACAAAACTT
AACTT
------CTT
AAAAAAAACTTAAACAAACCTTAAACAAAACTTAAACAAAACTTAAACAAAACTTAAACAAAAACAAAACTTAAAAACAAAACTT
InDel 21 InDel 22 InDel 23 InDel 24 InDel 25 InDel 26 InDel 27
ABECANDSTNSYNTANTONUNSBUSLYSTU
TAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAAAATAAAGAATTAAA------GAAT
TCAAGGAGAAAGAGTCAAGGAAAAAGAGTCAAAGAGAAAGAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAA------AGAGTCAA------AGAGTCAA------AGAG
AAAGAATTTTGATGAATCCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAA---------------------------------------------------------TCAT AAAGAATTTTGATGAATTCATTCTGCAACCGCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAATCATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT
TAAA------GAGATAAA------GATATAATTATAACGAGATAAA------GAGATAAA------GAGATAAA------GAGATAAA------GAGATAAA------GATATAAA------GATATAAA------GATA
InDel 28 InDel 29 InDel 30 InDel 31
ABECANDSTNSYNTANTONUNSBUSLYSTU
--AGATAATAATTGCATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATAGAACTCTCTTCGCTTTTCTGATTGATATATAAAATA-------------------------------------------------------------------------------------------------------------ATA--AGATAATAATTGAATAATTTAACGATTTACCTTTCATATTTGATATTGATTCGCTCACCAATCAATACGTAATGGAACTCTATTCGCTTTTCTGATTGATATAGAAAATA--AGATAATAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--AGATAATAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA----ATC-TAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--------TAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--------TAATTGAATAATTGAACGATTTACCTTTCCTATTTGATATTGATTAGCTCACCAATCCATATGTAACGGAACTCTATTCGCTTTTCTGATTGATATATAAAATATTCTTCAATTATTATCTAATTGAACAATTTCCCTTTCCTCTTTGATATTGATTAGCTCACCAATCCATATATAACATAACTCTATTCGCTTTTCTGATTGATATAGAAAATA--AGATAATAATTGAATAATTGAACGATTTACCTTTCCTATTTGATATTGATTAGCTCACCAATCCATATGTAACGGAACTCTATTCGCTTTTCTGATTGATATATAAAATA
InDel 1
ABECANDSTNSYNTANTONUNSBUSLYSTU
ATT----------------------------------------------------------------------------------------------------------------------------------ACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAAGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAAGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACC
tRNA-Ala(UGC)
InDel 1
ABECANDSTNSYNTANTONUNSBUSLYSTU
InDel 12
AAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGAGGAAAGTGAGGAAGAAAGAGAT
AGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGAT
GAAGAAAAAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGAT
Figure 1 Partial multiple sequence alignment of accD clpP ndhA rpl32 rps16 sprA tRNA-Ala (UGC) tRNA-Leu(UAA) and ycf1 genesequences of ten solanaceous species showing location of InDels indicated by hyphens
Figure 2: Partial multiple sequence alignment of amino acid sequences of genes, namely, accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala (UGC), tRNA-Leu (UAA), and ycf1, of ten solanaceous species, showing the location of InDels (indicated by hyphens). Rows are labeled ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU.
protein sequences, as shown in Figure 2. The accD gene has been reported to be one of the most variable plastid genes and is probably under diversifying selection [26].
(2) clpP. In the clpP gene, InDels were found in both intron and exon regions. The InDels in the exon regions had two major consequences. First, an insertion of 6 bp in S. bulbocastanum and S. tuberosum and of 30 bp in S. lycopersicum at the 3′ end (exon 3) of clpP shifted the stop codon 6, 6, and 30 bp downstream, respectively, relative to the other species of the family Solanaceae, increasing the length of the coding sequence and of the protein product (Figures 1 and 2). Second, InDel 1 in the protein sequence corresponds to the insertion of a repeat of “I” amino acids in D. stramonium, which makes the exon 3 region longer by 6 bp. In all other species, however, this region corresponds to the end of intron 2 of the clpP gene. Since the D. stramonium chloroplast genome was sequenced only recently, this observation needs to be validated experimentally.
(3) ndhA. All the InDels found in ndhA were located in introns, while the protein-coding regions (exons) were highly conserved. This suggests strong diversifying selection on the intronic region of this gene. Of the 14 InDels in total, the largest share involved C. annuum (InDels 1, 2, 5, 7, and 10). InDel 10 was shared in full by C. annuum and S. lycopersicum and in part by C. annuum and A. belladonna.
(4) rpl32. In rpl32, insertions of 1 bp in D. stramonium and 3 bp in C. annuum were found in the 3′ region of the gene, and a deletion of 4 bp was also observed in D. stramonium. The 3 bp insertion in C. annuum only lengthened the protein by one amino acid. The 1 bp insertion in D. stramonium, however, proved to be a frameshift mutation that changes three amino acids near the C-terminus. Moreover, the 4 bp deletion at the 3′ end results in premature termination of protein synthesis. Together, the frameshift mutation and the 3′-end deletion reduce the length of the gene product by one amino acid. As the C-terminus of the amino acid chain is well conserved in all the other species, the effect of these variations needs to be validated experimentally.
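The contrast between an in-frame 3 bp insertion and a 1 bp frameshift insertion can be illustrated with a short Python sketch. The sequence below is a made-up toy example, not the actual rpl32 sequence of any species, and the helper names (`translate`, `insert_bases`) are hypothetical. The codon table is built from the standard amino acid assignments.

```python
from itertools import product

# Standard codon table, built with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate from position 0, stopping at the first stop codon.

    A trailing incomplete codon is ignored.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def insert_bases(dna: str, pos: int, bases: str) -> str:
    """Return dna with `bases` inserted before 0-based position `pos`."""
    return dna[:pos] + bases + dna[pos:]


# Toy coding sequence (hypothetical): ATG AAA GGT CTT TAA
reference = "ATGAAAGGTCTTTAA"
in_frame = insert_bases(reference, 6, "GCT")   # 3 bp insertion keeps the frame
frameshift = insert_bases(reference, 6, "A")   # 1 bp insertion shifts the frame

ref_protein = translate(reference)
in_frame_protein = translate(in_frame)
frameshift_protein = translate(frameshift)
```

In this toy case the 3 bp insertion adds exactly one residue and leaves the downstream residues unchanged, whereas the 1 bp insertion alters every residue downstream of the insertion point and removes the original stop codon from the reading frame.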
(5) rps16. In rps16, InDels were again observed in introns as well as exons. Five of the major insertions in the intron regions were species specific: insertions of 38 bp (InDel 1), 9 bp (InDel 2), 5 bp (InDel 7), 4 bp (InDel 8), and 6 bp (InDel 9) were observed among A. belladonna, S. lycopersicum, D. stramonium, and C. annuum. A deletion of 5 bp was observed in all three Solanum species and in C. annuum. A deletion of 9 bp was observed in all Nicotiana species, resulting in an amino acid change (P to S) and a shortening of the protein by three amino acids in the C-terminal region. A similar deletion has also been observed by Kahlau et al. [27] and was suggested to be functionally neutral.
(6) sprA. The sprA gene has been reported to encode a stable noncoding RNA of unknown function and has been suggested to influence 16S rRNA maturation [38, 39]. In many species this gene seems to be present as a remnant and shows large variations in its 5′ region. The largest deletion, of 109 bp, was observed in C. annuum. The rest of the gene appears to be more conserved, with a deletion towards the 3′ end in all Nicotiana species and A. belladonna. The manner in which this gene functions and the consequences of the above-mentioned variations are yet to be investigated experimentally.
(7) trnA-UGC. In this gene, a long deletion of 102 bp was observed in all Nicotiana species. Interestingly, in A. belladonna this deletion extended in both directions to 130 bp. These deletions lie in the intron region and so are unlikely to have any negative impact on the function of the gene product.
(8) trnL-UAA. The trend of variation in trnL-UAA was similar to that in ndhA, as all InDels were observed in introns. The longest species-specific deletion (InDel 3) was observed in C. annuum, whereas a short insertion of four nucleotides, a repeat of “T”, was observed specifically in D. stramonium. Another insertion, of 6 bp, was observed in two Nicotiana species, namely, N. sylvestris and N. tabacum.
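InDels of the kind described in this section appear in a multiple sequence alignment as runs of hyphens in one or more rows. The following Python sketch shows one way such runs could be located and sized. It is not the procedure used in this study; the function name `gap_runs` and the three toy rows are illustrative assumptions.

```python
import re


def gap_runs(aligned_row: str) -> list[tuple[int, int]]:
    """Return (start_column, length) for every run of '-' in one aligned row.

    Columns are 0-based positions in the alignment, not in the ungapped sequence.
    """
    return [(m.start(), len(m.group())) for m in re.finditer(r"-+", aligned_row)]


# Toy alignment (hypothetical labels): seq_b carries 6 bp that the others lack.
alignment = {
    "seq_a": "TAAA------GAAT",
    "seq_b": "TAAAAATAAAGAAT",
    "seq_c": "TAAA------GAAT",
}

indels = {name: gap_runs(row) for name, row in alignment.items()}

# Gap lengths that are multiples of 3 keep the reading frame if they fall in an exon.
in_frame = {
    name: [length % 3 == 0 for _, length in runs] for name, runs in indels.items()
}
```

A gap shared by all rows but one, as here, is read as an insertion in the ungapped row (or a deletion in the others); deciding between the two requires an outgroup or other evidence.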
(9) ycf1. Many InDels were found in the 3′ region of ycf1, the fastest evolving gene. Most of the InDels were species specific. The maximum number of InDels (InDels 2, 3, 5, 7, 9, 16, 17, 19, 23, 24, 25, and 30) was observed in C. annuum, followed by D. stramonium (InDels 4, 6, 20, 26, and 31), N. tomentosiformis (InDels 8, 17, and 18), and S. lycopersicum (InDels 16, 22, and 28). Two genus-specific InDels (InDels 14 and 21) were observed in all four Nicotiana species. However, InDel 19 was also present in D. stramonium. Another genus-specific InDel (InDel 29) was observed in the Solanum species. Most of the InDels altered the length of the gene product, with a maximum length of 1906 amino acids (aa) observed in C. annuum and a minimum of 1873 aa observed in D. stramonium. Among the Solanum species, the length of the protein (1887 aa) was conserved between S. bulbocastanum and S. tuberosum, whereas the S. lycopersicum protein, at 1891 aa, was 4 aa longer than that of the other two species of the genus. Among the four Nicotiana species, the length of the ycf1 gene product was largely conserved among N. sylvestris, N. tabacum, and N. undulata, with protein lengths of 1901 aa, 1902 aa, and 1901 aa, respectively. N. tomentosiformis, with a protein length of 1892 aa, was the most variable member of the genus Nicotiana.
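The ycf1 protein lengths quoted above can be tabulated and compared directly. The Python sketch below only restates the figures given in the text (no length is given there for A. belladonna, so it is omitted); the variable and function names are illustrative.

```python
# ycf1 protein lengths in amino acids, as reported in the text above.
ycf1_length_aa = {
    "C. annuum": 1906,
    "D. stramonium": 1873,
    "S. bulbocastanum": 1887,
    "S. tuberosum": 1887,
    "S. lycopersicum": 1891,
    "N. sylvestris": 1901,
    "N. tabacum": 1902,
    "N. undulata": 1901,
    "N. tomentosiformis": 1892,
}

longest = max(ycf1_length_aa, key=ycf1_length_aa.get)
shortest = min(ycf1_length_aa, key=ycf1_length_aa.get)
overall_spread = ycf1_length_aa[longest] - ycf1_length_aa[shortest]


def genus_spread(prefix: str) -> int:
    """Difference between the longest and shortest protein within one genus."""
    lengths = [aa for sp, aa in ycf1_length_aa.items() if sp.startswith(prefix)]
    return max(lengths) - min(lengths)


solanum_spread = genus_spread("S. ")
nicotiana_spread = genus_spread("N. ")
```

On these figures the spread within Solanum (4 aa) is smaller than within Nicotiana (10 aa), and both are well below the 33 aa spread across all listed species.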
3.5. Phylogenetic Analysis of Solanaceous Plastomes. Evolutionary relationships between diverse plant species have been analyzed using several plastome markers, including matK and rbcL (genes) and trnH-psbA and trnL-trnF (intergenic regions), because these combine sequence conservation among plant taxa with suitable variation [40, 41]. However, phylogenies based on single gene sequences may be inaccurate [42]. The availability of complete chloroplast sequences for many species has made it possible to use many individual genes or concatenated gene sequences to deduce phylogenetic relationships among taxa [42–44].
Phylogenetic analyses that include solanaceous species have been carried out using complete plastome sequences by Moore et al. [44] and Jansen et al. [45]. The evolutionary positions of Capsicum and Datura in Solanaceae have been determined using a single or a few plastid genes [46, 47]. Recently, concatenated protein-coding gene sequences from completely sequenced plastomes were used to obtain reasonable phylogenetic relationships for solanaceous species [29]. In the present investigation, we used a similar approach to analyze the phylogenetic relationships of ten solanaceous species with completely sequenced plastomes. Individual multiple sequence alignments were concatenated for maximum likelihood phylogenetic tree generation. As depicted in Figure 3, the taxa were divided into two clades, supported by a bootstrap value of 500 out of 500 replicates (100%). The first clade consisted of the four Nicotiana species, while the species of Solanum, Capsicum, Atropa, and Datura were included in the second clade. These results are in line with previous phylogenetic analyses using concatenated protein-coding gene sequences as well as phylogenetic analyses using plastid ndhF and trnL-F sequences [29, 47]. However, an analysis of 13 ORFs of solanaceous plastomes showed a different arrangement, in which Atropa was separated from Solanum and grouped together with Nicotiana [25].
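The concatenation of individual gene alignments into a single matrix for tree building can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the software used in this study: the function name `concatenate_alignments`, the gene names, and the toy sequences are hypothetical, and the taxon labels merely reuse abbreviations of the kind shown in Figure 3.

```python
def concatenate_alignments(gene_alignments: dict) -> dict:
    """Join per-gene alignments into one concatenated alignment.

    gene_alignments maps gene name -> {taxon: aligned sequence}. Only taxa
    present in every gene are kept, and genes are joined in sorted name order.
    """
    taxa = sorted(set.intersection(*(set(aln) for aln in gene_alignments.values())))
    supermatrix = {taxon: "" for taxon in taxa}
    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        # Every row of one gene alignment must have the same number of columns.
        if len({len(aln[taxon]) for taxon in taxa}) != 1:
            raise ValueError(f"rows of {gene} differ in length")
        for taxon in taxa:
            supermatrix[taxon] += aln[taxon]
    return supermatrix


# Toy per-gene alignments (hypothetical sequences).
genes = {
    "geneA": {"NTA": "ATG-AA", "SLY": "ATGCAA", "CAN": "ATGCAA"},
    "geneB": {"NTA": "TTTGG", "SLY": "TT-GG", "CAN": "TTTGG"},
}

supermatrix = concatenate_alignments(genes)
```

Joining genes in a fixed (sorted) order keeps the column positions reproducible, which matters if gene boundaries are later used to define partitions for the maximum likelihood analysis.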
4. Conclusions
The analyses of complete plastid genomes of ten solanaceous species revealed overall similarity in terms of gene content and organization. The sizes of the LSC, SSC, and IR regions were found to be somewhat conserved among species, but significant variation was found between genera. Most of the coding regions were well conserved. However, many features in a few genes were found to be typical of a particular genus or even species, and these can be used as molecular markers to investigate genetic diversity and evolution. These characteristic variations can be further utilized to develop more sophisticated DNA barcoding-based techniques. The ten solanaceous species were divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from the coding regions of the plastomes. The first clade consisted of the four Nicotiana species and the second clade consisted of species of Solanum, Capsicum, Atropa, and Datura.

Figure 3: Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species. Tip labels in the tree: CAR, DCA, NTO, NUN, NSY, NTA, ABE, DST, CAN, SLY, SBU, and STU.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The authors are thankful to the University Grants Commission, New Delhi, for providing a MANF fellowship to Harpreet Kaur.
References
[1] M. G. Bausher, N. D. Singh, S.-B. Lee, R. K. Jansen, and H. Daniell, “The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var. ‘Ridge Pineapple’: organization and phylogenetic relationships to other angiosperms,” BMC Plant Biology, vol. 6, no. 1, article 21, 2006.
[2] Z. Y. Hu, W. Hua, S. M. Huang, and H. Z. Wang, “Complete chloroplast genome sequence of rapeseed (Brassica napus L.) and its evolutionary implications,” Genetic Resources and Crop Evolution, vol. 58, no. 6, pp. 875–887, 2011.
[3] J. D. Palmer, “Comparative organization of chloroplast genomes,” Annual Review of Genetics, vol. 19, no. 1, pp. 325–354, 1985.
[4] R. G. Olmstead and J. D. Palmer, “Chloroplast DNA systematics: a review of methods and data analysis,” The American Journal of Botany, vol. 81, no. 9, pp. 1205–1224, 1994.
[5] R. A. Bungard, “Photosynthetic evolution in parasitic plants: insight from the chloroplast genome,” BioEssays, vol. 26, no. 3, pp. 235–247, 2004.
[6] L. A. Raubeson and R. K. Jansen, “Chloroplast genomes of plants,” in Diversity and Evolution of Plants-Genotypic and Phenotypic Variation in Higher Plants, R. Henry, Ed., pp. 45–68, CABI Publishing, Wallingford, UK, 2005.
[7] R. C. Haberle, H. M. Fourcade, J. L. Boore, and R. K. Jansen, “Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes,” Journal of Molecular Evolution, vol. 66, no. 4, pp. 350–361, 2008.
[8] W. Martin and R. G. Herrmann, “Gene transfer from organelles to the nucleus: how much, what happens, and why?” Plant Physiology, vol. 118, no. 1, pp. 9–17, 1998.
[9] H. L. Race, R. G. Herrmann, and W. Martin, “Why have organelles retained genomes?” Trends in Genetics, vol. 15, no. 9, pp. 364–370, 1999.
[10] T. Wakasugi, T. Tsudzuki, and M. Sugiura, “The genomics of land plant chloroplasts: gene content and alteration of genomic information by RNA editing,” Photosynthesis Research, vol. 70, no. 1, pp. 107–118, 2001.
[11] Y.-K. Kim, C.-W. Park, and K.-J. Kim, “Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications,” Molecules and Cells, vol. 27, no. 3, pp. 365–381, 2009.
[12] A. Khan, I. A. Khan, H. Asif, and M. K. Azim, “Current trends in chloroplast genome research,” African Journal of Biotechnology, vol. 9, no. 24, pp. 3494–3500, 2010.
[13] H. Shimada and M. Sugiura, “Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes,” Nucleic Acids Research, vol. 19, no. 5, pp. 983–995, 1991.
[14] M. Sugiura, “The chloroplast genome,” Plant Molecular Biology, vol. 19, no. 1, pp. 149–168, 1992.
[15] M. E. Cosner, R. K. Jansen, J. D. Palmer, and S. R. Downie, “The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families,” Current Genetics, vol. 31, no. 5, pp. 419–429, 1997.
[16] L. Gao, X. Yi, Y.-X. Yang, Y.-J. Su, and T. Wang, “Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes,” BMC Evolutionary Biology, vol. 9, no. 1, article 130, 2009.
[17] T Kato T Kaneko S Sato Y Nakamura and S TabataldquoComplete structure of the chloroplast genome of a legumeLotus japonicusrdquoDNA Research vol 7 no 6 pp 323ndash330 2000
[18] B Goffinet N J Wickett O Werner R M Ros A J Shaw andC J Cox ldquoDistribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta)rdquoAnnals of Botany vol 99 no 4 pp 747ndash753 2007
[19] J D Palmer JMNugent and L AHerbon ldquoUnusual structureof geranium chloroplast DNA a triple-sized inverted repeatextensive gene duplications multiple inversions and two repeatfamiliesrdquo Proceedings of the National Academy of Sciences of theUnited States of America vol 84 no 3 pp 769ndash773 1987
[20] S. Greiner, X. Wang, U. Rauwolf et al., “The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution,” Nucleic Acids Research, vol. 36, no. 7, pp. 2366–2378, 2008.
[21] J. J. Doyle, J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson, “Chloroplast DNA inversions and the origin of the grass family (Poaceae),” Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 16, pp. 7722–7726, 1992.
[22] F. A. Michelangeli, J. I. Davis, and D. W. Stevenson, “Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes,” The American Journal of Botany, vol. 90, no. 1, pp. 93–106, 2003.
[23] E. Bortiri, D. Coleman-Derr, G. R. Lazo, O. D. Anderson, and Y. Q. Gu, “The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes,” BMC Research Notes, vol. 1, article 61, 2008.
[24] C. H. Leseberg and M. R. Duvall, “The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals,” Journal of Molecular Evolution, vol. 69, no. 4, pp. 311–318, 2009.
[25] H. J. Chung, J. D. Jung, H. W. Park et al., “The complete chloroplast genome sequences of Solanum tuberosum and comparative analysis with Solanaceae species identified the presence of a 241-bp deletion in cultivated potato chloroplast DNA sequence,” Plant Cell Reports, vol. 25, no. 12, pp. 1369–1379, 2006.
[26] H. Daniell, S.-B. Lee, J. Grevich et al., “Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes,” Theoretical and Applied Genetics, vol. 112, no. 8, pp. 1503–1518, 2006.
[27] S. Kahlau, S. Aspinall, J. C. Gray, and R. Bock, “Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes,” Journal of Molecular Evolution, vol. 63, no. 2, pp. 194–207, 2006.
[28] M. Yukawa, T. Tsudzuki, and M. Sugiura, “The chloroplast genome of Nicotiana sylvestris and Nicotiana tomentosiformis: complete sequencing confirms that the Nicotiana sylvestris progenitor is the maternal genome donor of Nicotiana tabacum,” Molecular Genetics and Genomics, vol. 275, no. 4, pp. 367–373, 2006.
[29] Y. D. Jo, J. Park, J. Kim et al., “Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome,” Plant Cell Reports, vol. 30, no. 2, pp. 217–229, 2011.
[30] C. Schmitz-Linneweber, R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier, “The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation,” Molecular Biology and Evolution, vol. 19, no. 9, pp. 1602–1612, 2002.
[31] M. Kunnimalaiyaan and B. L. Nielsen, “Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA,” Nucleic Acids Research, vol. 25, no. 18, pp. 3681–3686, 1997.
[32] G. Thyssen, Z. Svab, and P. Maliga, “Cell-to-cell movement of plastids in plants,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 7, pp. 2439–2443, 2012.
[33] D. Gargano, A. Vezzi, N. Scotti et al., “The complete nucleotide sequence genome of potato (Solanum tuberosum cv Desiree) chloroplast DNA,” in Book of Abstracts of the 2nd Solanaceae Genome Workshop, p. 107, 2005.
[34] K.-J. Kim and H.-L. Lee, “Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants,” DNA Research, vol. 11, no. 4, pp. 247–261, 2004.
[35] D. A. Steane, “Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae),” DNA Research, vol. 12, no. 3, pp. 215–220, 2005.
[36] J. D. Palmer, “Cell culture and somatic cell genetics of plants,” in The Molecular Biology of Plastids, R. G. Hermann, Ed., pp. 5–53, Springer, Vienna, Austria, 1991.
[37] M. Sugiura, K. Shinozaki, M. Tanaka et al., “Split genes and cis/trans splicing in tobacco chloroplasts,” in Plant Molecular Biology, D. von Wettstein and N. H. Chua, Eds., pp. 65–76, Plenum Press, New York, NY, USA, 1987.
[38] A. Vera and M. Sugiura, “A novel RNA gene in the tobacco plastid genome: its possible role in the maturation of 16S rRNA,” The EMBO Journal, vol. 13, no. 9, pp. 2211–2217, 1994.
[39] M. Sugita, Z. Svab, P. Maliga, and M. Sugiura, “Targeted deletion of sprA from the tobacco plastid genome indicates that the encoded small RNA is not essential for pre-16S rRNA maturation in plastids,” Molecular and General Genetics, vol. 257, no. 1, pp. 23–27, 1997.
[40] R. Lahaye, M. van der Bank, D. Bogarin et al., “DNA barcoding the floras of biodiversity hotspots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 8, pp. 2923–2928, 2008.
[41] P. Taberlet, L. Gielly, G. Pautou, and J. Bouvet, “Universal primers for amplification of three non-coding regions of chloroplast DNA,” Plant Molecular Biology, vol. 17, no. 5, pp. 1105–1109, 1991.
[42] X. Guo, S. Castillo-Ramírez, V. Gonzalez et al., “Rapid evolutionary change of common bean (Phaseolus vulgaris L.) plastome and the genomic diversification of legume chloroplasts,” BMC Genomics, vol. 8, article 228, 2007.
[43] R. K. Jansen, C. Kaittanis, C. Saski et al., “Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids,” BMC Evolutionary Biology, vol. 6, no. 1, article 32, 2006.
[44] M. J. Moore, P. S. Soltis, C. D. Bell, J. G. Burleigh, and D. E. Soltis, “Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4623–4628, 2010.
[45] R. K. Jansen, Z. Cai, L. A. Raubeson et al., “Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 49, pp. 19369–19374, 2007.
[46] L. Bohs and R. G. Olmstead, “Phylogenetic relationships in Solanum (Solanaceae) based on ndhF sequences,” Systematic Botany, vol. 22, no. 1, pp. 5–17, 1997.
[47] R. G. Olmstead, L. Bohs, H. A. Migid, E. Santiago-Valentin, V. F. Garcia, and S. M. Collier, “A molecular phylogeny of the Solanaceae,” Taxon, vol. 57, no. 4, pp. 1159–1181, 2008.
regions and the lowest in IR regions. Some earlier studies have also shown a similar distribution of AT content in the LSC, SSC, and IR regions, with the lowest AT content in the IR region and the highest in the SSC region [2, 27, 34, 35]. The low AT content of the IR regions reflects the low AT content of the four ribosomal RNA genes located in this region.
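As an illustration of the quantity compared above, AT content can be computed per region as the share of A and T among the unambiguous bases. The sketch below is not the authors' pipeline; the function name `at_content` and the three short sequences are hypothetical toy values chosen only to show the calculation.

```python
def at_content(seq: str) -> float:
    """Return the percentage of A/T among the A, C, G, T bases of seq."""
    # Ambiguity codes such as N are ignored (an assumption of this sketch).
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * sum(b in "AT" for b in bases) / len(bases)


# Toy sequences standing in for the three plastome regions (not real data).
toy_regions = {
    "LSC": "ATTATAGCAT",
    "SSC": "ATTATTAAAC",
    "IR": "GCGCATGGAT",
}
at_by_region = {name: at_content(seq) for name, seq in toy_regions.items()}
richest = max(at_by_region, key=at_by_region.get)
poorest = min(at_by_region, key=at_by_region.get)
```

With real data, each value in `toy_regions` would be replaced by the corresponding region extracted from a plastome record.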
3.2. Gene Content of Solanaceous Chloroplast Genomes. The genes present in different regions of the plastid genomes are highly conserved except for several open reading frames [6, 26, 36]. There are typically 111 genes, 5 hypothetical chloroplast reading frames (ycfs), and a few open reading frames (orfs). Some of our unique findings are discussed below.
(i) The trnP-GGG gene, which codes for a proline tRNA, was annotated only in D. stramonium, whereas the alternative gene trnP-UGG was annotated in all species, including D. stramonium (NC_018117.1, Li et al. (unpublished)). We searched all the species for a similar sequence by BLAST, but no similar sequence was found in any other species. One region was annotated as a trnH gene only in C. annuum; in all other species, and in C. annuum as well, this region was annotated as part of the ycf2 gene. These observations indicate the presence of a duplicate copy of the trnH gene sequence in C. annuum and of two alternative tRNA genes coding for proline in D. stramonium. However, no other evidence was found in databases that this particular region codes for trnH.
(ii) An rps19 pseudogene was reported in three species, namely, N. tomentosiformis, S. bulbocastanum, and S. tuberosum. All other species were searched for a similar pseudogene using pairwise BLAST, which confirmed the presence of the rps19 pseudogene in three further species, namely, A. belladonna, C. annuum, and D. stramonium. The presence of the pseudogene may be attributed to the expansion of IRB into the LSC region. In three species, namely, N. sylvestris, N. tabacum, and N. undulata, the rps19 pseudogene was found to be absent.
(iii) infA, a pseudogene in all species except A. belladonna, D. stramonium, and S. lycopersicum, is a protein-coding gene in S. bulbocastanum. A homology search with the infA sequence from S. bulbocastanum against the plastomes of A. belladonna and D. stramonium revealed an identical sequence in both species.
(iv) The sprA gene has been annotated for N. sylvestris, N. tomentosiformis, S. lycopersicum, and S. tuberosum. Identical orthologous sequences were found in A. belladonna, C. annuum, D. stramonium, N. tabacum, N. undulata, and S. bulbocastanum using BLAST search.
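The homology searches in items (i) to (iv) were run with BLAST. As a much simpler stand-in, and only to illustrate what "mining" a plastome for an identical sequence means, the sketch below looks for exact copies of a query on both strands of a genome string. It is not BLAST and tolerates no mismatches; `reverse_complement`, `find_exact`, and the toy sequences are hypothetical names and values.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_exact(query: str, genome: str) -> list:
    """Return (strand, 0-based start) for every exact copy of query.

    Minus-strand hits are reported at the plus-strand position where the
    reverse complement of the query occurs. A palindromic query is
    reported once per strand.
    """
    hits = []
    for strand, pattern in (("+", query), ("-", reverse_complement(query))):
        start = genome.find(pattern)
        while start != -1:
            hits.append((strand, start))
            start = genome.find(pattern, start + 1)
    return hits


# Toy genome with one copy of the query on each strand (not real data).
toy_genome = "TTACGGTAAACCGTCC"
toy_hits = find_exact("ACGGT", toy_genome)
```

An empty result from such a search corresponds to the "no similar sequence was found" outcome, though only for perfectly identical copies.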
3.3. Split Genes. A total of eighteen split genes have been reported. The sizes of exons and introns for these genes in all the solanaceous species studied are summarized in Table 2. The rps12 gene is divided such that its 5′-end exon is located in the LSC region, whereas the second and third exons are located in the IR region. Maturation of the RNA transcript requires a trans-splicing mechanism between exon 1 and exon 2 [34, 37]. Among the eighteen intron-containing genes, ycf3, clpP, and rps12 contain two introns, whereas the other 15 genes contain only one intron. As per Kim and Lee [34], the trnL-UAA intron belongs to the self-splicing group I introns, whereas all other introns belong to group II. Generally, exon sizes were conserved and variability was observed in the intron regions; however, ndhB was found to be highly conserved in both its exons and its intron.
3.4. Pairwise Comparison of Plastid Genes of Solanaceae and InDel Analyses. Pairwise comparison of the nucleotide sequences of individual genes (45 species combinations) was also performed for 116 genes to classify genes based on percent identity. Supplementary Table 1 (Supplementary Material available online at http://dx.doi.org/10.1155/2014/424873) shows the grouping of genes into clusters based on percent identity in pairwise comparison. Genes which showed 100% identity in a comparison were considered highly conserved, and genes showing less than 95% identity at least once in the comparisons were considered highly divergent. These highly divergent genes were further explored at the nucleotide as well as the protein level to probe the variations in detail. A total of 11 highly divergent genes were found, whereas the number of highly conserved genes varied from 26 (for the species pair N. tomentosiformis and S. lycopersicum) to 107 (for the species pair N. sylvestris and N. tabacum). Most of the tRNA genes were found to be highly conserved. Genes accD, cemA, clpP, ndhA, rpl32, rpl36, rps16, sprA, trnA-UGC, trnL-UAA, and ycf1 were found to be highly divergent.
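The classification rule described above can be sketched as follows. Ten species give 45 unordered pairs; a gene is "highly conserved" for a pair when identity is 100% and "highly divergent" when any pair falls below 95%. The text does not state how percent identity was computed, so the column-wise definition here (gap against base counts as a mismatch) is an assumption, and all function names and the toy alignment are hypothetical.

```python
from itertools import combinations

SPECIES = ["ABE", "CAN", "DST", "NSY", "NTA", "NTO", "NUN", "SBU", "SLY", "STU"]
SPECIES_PAIRS = list(combinations(SPECIES, 2))  # 10 species -> 45 pairs


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned, equal-length sequences.

    Assumption: columns that are gaps in both sequences are skipped, and
    a gap opposite a base counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if (x, y) != ("-", "-")]
    if not columns:
        raise ValueError("no comparable columns")
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def is_highly_divergent(aligned: dict, threshold: float = 95.0) -> bool:
    """True if at least one species pair falls below the identity threshold."""
    return any(
        percent_identity(aligned[s], aligned[t]) < threshold
        for s, t in combinations(sorted(aligned), 2)
    )


def conserved_in_pair(aligned: dict, s: str, t: str) -> bool:
    """True if the gene is 100% identical between species s and t."""
    return percent_identity(aligned[s], aligned[t]) == 100.0


# Toy alignment of one hypothetical gene in three species (not real data).
toy_gene = {
    "NSY": "ACGTACGTACGTACGTACGT",
    "NTA": "ACGTACGTACGTACGTACGT",
    "CAN": "ACGTACGTAC--ACGTACGA",
}
```

Counting, for each species pair, how many genes satisfy `conserved_in_pair` would give per-pair totals of the kind reported in the text (26 to 107).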
Tables 3 and 4 summarize the InDels observed in nucleotide and amino acid sequences, respectively. Partial multiple sequence alignments of 9 genes and 5 proteins are shown in Figures 1 and 2, respectively. The longest insertion, of 141 bp, was observed in the accD gene sequence of C. annuum. Since the genes clpP, ndhA, rps16, and trnL-UAA contain introns, it was important to examine whether these InDels were present in exon or intron regions. All the InDels reported in ndhA and trnL-UAA were present in introns, whereas in the case of clpP, InDel 24 was located in an exon. Similarly, the first and last InDels of rps16 were present in exons. In view of the number and length of InDels observed in the nucleotide and protein sequences of the genes under consideration, the variation for individual genes is discussed below.
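Whether an InDel lies in an exon or an intron can be decided from cumulative segment lengths. The sketch below builds gene-relative coordinates from the clpP lengths listed for S. lycopersicum in Table 2 and looks up a position. The helper names and the queried positions are hypothetical, and the sketch assumes the segments are laid out in the order in which Table 2 lists them.

```python
def build_segments(parts):
    """Turn (label, length) pairs into (label, start, end), 1-based inclusive."""
    segments = []
    start = 1
    for label, length in parts:
        segments.append((label, start, start + length - 1))
        start += length
    return segments


def locate(segments, position):
    """Return the label of the segment that contains a gene-relative position."""
    for label, start, end in segments:
        if start <= position <= end:
            return label
    raise ValueError(f"position {position} lies outside the gene")


# clpP segment lengths for S. lycopersicum (SLY column of Table 2).
CLPP_SLY = [
    ("exon I", 71),
    ("intron I", 798),
    ("exon II", 292),
    ("intron II", 617),
    ("exon III", 258),
]
clpp_segments = build_segments(CLPP_SLY)
```

For example, `locate(clpp_segments, 100)` falls in intron I, while a position near the end of the gene falls in exon III.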
(1) accD. A total of four InDels were observed in the accD gene, as depicted in Figure 1. Interestingly, an insertion of 24 bp was present in all Nicotiana species and D. stramonium, followed by an insertion of 9 bp in all Solanum species, indicating stronger sequence conservation at the genus level. These insertions were also reported by Chung et al. [25]. A 141 bp insertion was observed specifically in C. annuum, which has also been reported by Jo et al. [29] and confirmed by RT-PCR. Similarly, a species-specific deletion of 6 bp was found in D. stramonium. All these InDels were also reflected in the corresponding
Table 2: The lengths of introns and exons for the split genes of ten solanaceous species.

Gene (region)   Exon/intron    ABE   CAN   DST   NSY   NTA   NTO   NUN   SBU   SLY   STU
trnK-UUU (LSC)  Exon I          37    37    37    37    37    37    37    37    37    37
                Intron I      2519  2500  2506  2526  2526  2526  2521  2501  2514  2512
                Exon II         36    35    35    35    35    35    35    35    35    35
rps16 (LSC)     Exon I          40    40    40    40    40    40    40    40    40    40
                Intron I       822   865   866   860   860   860   859   855   864   855
                Exon II        227   227   227   218   218   218   218   227   227   227
trnG-UCC (LSC)  Exon I          23    23    23    23    23    23    23    23    23    23
                Intron I       692   692   694   692   692   690   691   701   695   692
                Exon II         48    48    48    48    48    48    48    37    48    48
atpF (LSC)      Exon I         145   145   145   145   145   145   145   144   144   145
                Intron I       715   693   700   695   695   692   692   693   686   693
                Exon II        410   410   410   410   410   410   410   411   411   410
rpoC1 (LSC)     Exon I         432   453   453   453   453   432   453   453   453   453
                Intron I       737   742   737   737   737   709   733   737   737   737
                Exon II       1614  1614  1614  1614  1614  1614  1623  1614  1614  1614
ycf3 (LSC)      Exon I         124   124   124   124   124   124   124   124   124   124
                Intron I       739   742   740   739   738   731   735   730   729   727
                Exon II        230   230   230   230   230   230   230   230   230   230
                Intron II      763   744   753   783   783   779   781   750   750   750
                Exon III       153   153   159   153   153   153   153   153   153   153
trnL-UAA (LSC)  Exon I          35    35    35    35    35    35    35    37    35    35
                Intron I       497   426   501   503   503   497   498   502   497   497
                Exon II         50    50    50    50    50    50    50    50    50    50
trnV-UAC (LSC)  Exon I          38    38    38    38    38    38    38    38    38    38
                Intron I       572   575   569   571   571   572   573   569   571   571
                Exon II         35    35    37    35    35    35    35    37    35    35
rps12*          Exon I         114   114   114   114   114   114   114   114   114   114
                Intron I         -     -     -     -     -     -     -     -     -     -
                Exon II        232   232   232   232   232   232   232   232   232   232
                Intron II      535   536   536   536   536   536   536   536   536   536
                Exon III        26    26    26    26    26    26    26    26    26    26
clpP (LSC)      Exon I          71    71    71    71    71    71    71    71    71    71
                Intron I       799   811   792   807   807   789   789   789   798   789
                Exon II        292   292   292   292   292   292   292   292   292   292
                Intron II      622   626   624   637   637   634   631   625   617   620
                Exon III       228   228   234   228   228   228   228   234   258   234
petB (LSC)      Exon I           6     6     6     6     6     6     6     6     6     6
                Intron I       759   755   746   753   753   753   753   747   747   747
                Exon II        642   642   642   642   642   642   642   642   642   642
petD (LSC)      Exon I           8     8     9     8     8     8     8     6     8     8
                Intron I       742   742   748   742   742   742   742   739   738   739
                Exon II        475   475   474   475   475   475   475   477   475   475
rpl16 (LSC)     Exon I           9     9     9     9     9     9     9     9     9     9
                Intron I      1019  1026  1025  1020  1020  1021  1020  1014  1018  1014
                Exon II        396   396   396   396   396   396   396   396   396   396
rpl2 (IR)       Exon I         391   391   393   391   391   391   391   390   391   391
                Intron I       664   665   669   666   666   666   666   666   666   666
                Exon II        434   434   429   434   434   434   434   435   434   434
ndhB (IR)       Exon I         777   777   777   777   777   777   777   777   777   777
                Intron I       679   679   679   679   679   679   679   679   679   679
                Exon II        756   756   756   756   756   756   756   756   756   756
trnI-GAU (IR)   Exon I          37    37    42    37    37    37    37    42    37    37
                Intron I       717   722   717   707   707   716   716   717   722   722
                Exon II         34    35    35    35    35    35    35    35    35    35
trnA-UGC (IR)   Exon I          38    38    38    38    38    38    38    38    38    38
                Intron I       681   811   811   709   709   709   709   811   811   811
                Exon II         35    35    35    35    35    35    35    35    35    35
ndhA (SSC)      Exon I         553   553   552   553   553   553   553   552   553   553
                Intron I      1150  1157  1154  1148  1148  1149  1148  1158  1133  1158
                Exon II        539   539   537   539   539   539   539   540   539   539

*The rps12 gene is a divided gene: the 3′-rps12 is located in the IR region, while the 5′-rps12 is located in the LSC region.
ABE: Atropa belladonna; CAN: Capsicum annuum; DST: Datura stramonium; NSY: Nicotiana sylvestris; NTA: Nicotiana tabacum; NTO: Nicotiana tomentosiformis; NUN: Nicotiana undulata; SBU: Solanum bulbocastanum; SLY: Solanum lycopersicum; STU: Solanum tuberosum.
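The contrast between conserved and variable introns in Table 2 can be made concrete by taking the spread (maximum minus minimum) of each intron length across the ten species. The values below are copied from Table 2; the dictionary and function names are hypothetical and the sketch covers only three of the eighteen genes.

```python
# Intron I lengths (bp) from Table 2, in the species order
# ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, STU.
INTRON_LENGTHS = {
    "ndhB": [679] * 10,
    "trnA-UGC": [681, 811, 811, 709, 709, 709, 709, 811, 811, 811],
    "trnL-UAA": [497, 426, 501, 503, 503, 497, 498, 502, 497, 497],
}


def length_spread(lengths):
    """Return the difference between the longest and shortest length."""
    return max(lengths) - min(lengths)


spread = {gene: length_spread(values) for gene, values in INTRON_LENGTHS.items()}
```

A spread of zero for ndhB matches its description as highly conserved, whereas the trnA-UGC intron differs by up to 130 bp between species, in line with the 102–130 bp InDel listed for this gene in Table 3.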
Table 3: InDels in nucleotide sequences of 9 genes of ten solanaceous plastid genomes.

S. number  Gene         Total number of InDels  InDel length (bp)
1          accD(a)      4                       24, 9, 141, 6
2          clpP(a)      24                      8(I), 14(I), 13(I), 7(I), 1(I), 2-3(I), 7(I), 1–7(I), 3(I), 2(I), 3(I), 1–7(I), 1–3(I), 1(I), 1(I), 1(I), 1–5(I), 4–7(I), 1(I), 9(I), 1-2(I), 3(I), 5(I), 24–30
3          ndhA(b)      14                      9(I), 5-6(I), 3(I), 1(I), 9(I), 3(I), 4(I), 1–4(I), 1-2(I), 1–23(I), 1-2(I), 2(I), 1(I), 3(I)
4          rpl32(b)     2                       2-3, 4
5          rps16(a)     11                      1–38, 9(I), 1(I), 1(I), 5(I), 1-2(I), 5(I), 4(I), 6(I), 1(I), 9
6          sprA(b)      2                       109, 7
7          trnA-UGC(c)  1                       102–130
8          trnL-UAA(a)  4                       1, 6, 71, 4
9          ycf1(b)      31                      3, 18, 18, 21, 6, 6, 48, 9, 6, 6, 42, 3, 6, 30, 3, 15, 12–39, 18, 6, 9–36, 6, 6, 6, 9, 9, 12, 6, 6, 6, 57, 6

(a), (b), (c): location in different regions: (a) LSC, (b) SSC, and (c) IR. (I): InDels present in introns.
Table 4: InDels in amino acid sequences of 5 proteins of ten solanaceous plastid genomes.

S. number  Protein  Total number of InDels  InDel length (amino acids)
1          accD     4                       8, 3, 47, 2
2          clpP     2                       2, 10
3          rpl32    1                       1-2
4          rps16    1                       3
5          ycf1     29                      1, 6, 6, 7, 2, 2, 7, 3, 2, 2, 14, 1–10, 1, 5, 4–13, 6, 2, 3–12, 2, 2, 2, 3, 3, 4, 2, 2, 2, 19, 2
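An in-frame nucleotide InDel of n bp corresponds to n/3 amino acids, which gives a quick consistency check between Tables 3 and 4. The sketch below applies it to the four accD InDels only; the function name is hypothetical, and InDels whose length is not a multiple of three are deliberately rejected rather than rounded.

```python
def nt_to_aa_length(nt_length: int) -> int:
    """Convert an in-frame InDel length in bp to a length in amino acids."""
    quotient, remainder = divmod(nt_length, 3)
    if remainder:
        raise ValueError(f"{nt_length} bp is not a multiple of 3 (not in frame)")
    return quotient


ACCD_NT_INDELS = [24, 9, 141, 6]  # accD row of Table 3 (bp)
ACCD_AA_INDELS = [8, 3, 47, 2]    # accD row of Table 4 (amino acids)

converted = [nt_to_aa_length(n) for n in ACCD_NT_INDELS]
```

For accD the converted values agree with Table 4, so each of its four nucleotide InDels preserves the reading frame.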
[Figure 1 (continued across pages): partial multiple sequence alignments of nucleotide sequences; the alignment blocks are not recoverable from the extracted text. Panels in this part: sprA, accD, ndhA, rps16, rpl32, clpP, tRNA-Leu(UAA), and ycf1, with the InDels of each gene numbered (InDel 1, InDel 2, and so on). Rows in every panel follow the order ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, STU.]
AAAGAATTTTGATGAATCCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAA---------------------------------------------------------TCAT AAAGAATTTTGATGAATTCATTCTGCAACCGCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAATCATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT
TAAA------GAGATAAA------GATATAATTATAACGAGATAAA------GAGATAAA------GAGATAAA------GAGATAAA------GAGATAAA------GATATAAA------GATATAAA------GATA
InDel 28 InDel 29 InDel 30 InDel 31
ABECANDSTNSYNTANTONUNSBUSLYSTU
--AGATAATAATTGCATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATAGAACTCTCTTCGCTTTTCTGATTGATATATAAAATA-------------------------------------------------------------------------------------------------------------ATA--AGATAATAATTGAATAATTTAACGATTTACCTTTCATATTTGATATTGATTCGCTCACCAATCAATACGTAATGGAACTCTATTCGCTTTTCTGATTGATATAGAAAATA--AGATAATAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--AGATAATAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA----ATC-TAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--------TAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--------TAATTGAATAATTGAACGATTTACCTTTCCTATTTGATATTGATTAGCTCACCAATCCATATGTAACGGAACTCTATTCGCTTTTCTGATTGATATATAAAATATTCTTCAATTATTATCTAATTGAACAATTTCCCTTTCCTCTTTGATATTGATTAGCTCACCAATCCATATATAACATAACTCTATTCGCTTTTCTGATTGATATAGAAAATA--AGATAATAATTGAATAATTGAACGATTTACCTTTCCTATTTGATATTGATTAGCTCACCAATCCATATGTAACGGAACTCTATTCGCTTTTCTGATTGATATATAAAATA
InDel 1
ABECANDSTNSYNTANTONUNSBUSLYSTU
ATT----------------------------------------------------------------------------------------------------------------------------------ACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAAGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAAGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACC
tRNA-Ala(UGC)
InDel 1
ABECANDSTNSYNTANTONUNSBUSLYSTU
InDel 12
AAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGAGGAAAGTGAGGAAGAAAGAGAT
AGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGAT
GAAGAAAAAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGAT
Figure 1 Partial multiple sequence alignment of accD clpP ndhA rpl32 rps16 sprA tRNA-Ala (UGC) tRNA-Leu(UAA) and ycf1 genesequences of ten solanaceous species showing location of InDels indicated by hyphens
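The InDel positions marked in Figure 1 are runs of hyphens in the aligned rows. As an illustration only, the following Python sketch scans one short aligned block, transcribed from the alignment text of Figure 1, and reports which rows carry each gap run. The function names gap_runs and indel_table are hypothetical, and the assignment of rows to species assumes that the rows follow the label order ABE to STU; the sketch is not the procedure used in this study.

```python
import re

# One 8-column window transcribed from the alignment text of Figure 1.
# ASSUMPTION: rows follow the label order ABE ... STU.
SPECIES = ["ABE", "CAN", "DST", "NSY", "NTA", "NTO", "NUN", "SBU", "SLY", "STU"]
ROWS = ("CTGA---G CTGATGAG CTGC---G CTGA---T CTGA---T "
        "CTAA---T CTGA---T CTGA---G CTGA---G CTGA---G").split()
alignment = dict(zip(SPECIES, ROWS))


def gap_runs(aligned_seq):
    """Return (start, length) for every run of '-' in one aligned row (0-based start)."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"-+", aligned_seq)]


def indel_table(aln):
    """Map each gap run (start, length) to the list of rows that carry it."""
    table = {}
    for name, seq in aln.items():
        for run in gap_runs(seq):
            table.setdefault(run, []).append(name)
    return table


for (start, length), gapped in indel_table(alignment).items():
    ungapped = [name for name in alignment if name not in gapped]
    print(f"columns {start}-{start + length - 1}: {length}-column gap in "
          f"{len(gapped)} rows; ungapped in {ungapped}")
```

For this block the sketch finds a single 3-column gap shared by nine rows, with only the second row (CAN under the assumed order) ungapped.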
Figure 2: Partial multiple sequence alignment of amino acid sequences of the genes accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala (UGC), tRNA-Leu (UAA), and ycf1 of ten solanaceous species, showing the locations of InDels (indicated by hyphens). Panel labels: accD, ycf1, clpP, rpl32, and rps16. Row labels: ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU.
protein sequences, as shown in Figure 2. The accD gene has been reported to be one of the most variable plastid genes and is probably under diversifying selection [26].
(2) clpP. In the clpP gene, InDels were found in both intron and exon regions. The InDels in the exon regions had two major consequences. First, an insertion of 6 bp in S. bulbocastanum and S. tuberosum and of 30 bp in S. lycopersicum at the 3′ end (exon 3) of the clpP gene shifted the stop codon downstream by 6, 6, and 30 bp, respectively, compared with the other species of the family Solanaceae, increasing the length of the coding sequence and of the protein product (Figures 1 and 2). Second, InDel 1 in the protein sequence corresponds to the insertion of a repeat of “I” amino acids in D. stramonium, making the exon 3 region longer by 6 bp. This region, however, corresponds to the end of intron 2 of the clpP gene in all other species. Since the D. stramonium chloroplast genome has been sequenced only recently, this observation needs to be validated experimentally.
(3) ndhA. All the InDels found in ndhA were present in introns, while the protein-coding regions (exons) were highly conserved. This indicates strong diversifying selection on the intronic region of this gene. Out of the total of 14 InDels, most were observed in C. annuum (InDels 1, 2, 5, 7, and 10). InDel 10 was shared in full by C. annuum and S. lycopersicum and in part by C. annuum and A. belladonna.
(4) rpl32. In rpl32, an insertion of 1 bp in D. stramonium and of 3 bp in C. annuum was found in the 3′ region of the gene, while a deletion of 4 bp was also observed in D. stramonium. The 3 bp insertion in C. annuum only altered the length of the protein, making it longer by 1 amino acid. However, the small 1 bp insertion in D. stramonium proved to be a frameshift mutation, resulting in three changes in the amino acid sequence near the C-terminus. Moreover, the deletion of 4 bp at the 3′ end resulted in premature termination of protein synthesis. Together, the frameshift mutation and the 3′ end deletion reduced the length of the gene product by 1 amino acid. As the C-terminus of the amino acid chain is well conserved in all the other species, the effect of the above-mentioned variations needs to be validated experimentally.
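The contrast between the 3 bp and 1 bp insertions described above follows from the reading frame: an InDel whose length is a multiple of three adds or removes whole codons, whereas any other length shifts the frame downstream. The sketch below illustrates this on a made-up 21 bp coding sequence (REF is hypothetical and is not the rpl32 sequence). The codon table is the standard genetic code, whose codon assignments are shared by the bacterial and plant plastid code, and the helper names are assumptions introduced for illustration.

```python
from itertools import product

# Standard genetic code in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# HYPOTHETICAL toy coding sequence: ATG AAA GGT CTG GAA AAA TAA
REF = "ATGAAAGGTCTGGAAAAATAA"


def translate(cds):
    """Translate codon by codon from position 0 until the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def insert(seq, pos, bases):
    """Return seq with bases inserted before 0-based position pos."""
    return seq[:pos] + bases + seq[pos:]


def indel_effect(length_bp):
    """Classify an InDel by whether it preserves the reading frame."""
    return "in-frame" if length_bp % 3 == 0 else "frameshift"


print("reference      :", translate(REF))
print("3 bp insertion :", translate(insert(REF, 15, "GAT")), indel_effect(3))
print("1 bp insertion :", translate(insert(REF, 9, "T")), indel_effect(1))
```

By the same arithmetic, the +1 bp insertion and the -4 bp deletion reported for D. stramonium sum to -3 bp, which is consistent with the net loss of one amino acid.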
(5) rps16. In rps16, too, InDels were observed in introns as well as exons. Five of the major insertions in the intron regions were species specific: insertions of 38 bp (InDel 1), 9 bp (InDel 2), 5 bp (InDel 7), 4 bp (InDel 8), and 6 bp (InDel 9) were observed in A. belladonna, S. lycopersicum, D. stramonium, and C. annuum. A deletion of 5 bp was observed in all three Solanum species and C. annuum. A deletion of 9 bp was observed in all Nicotiana species, resulting in an amino acid change (P to S) and shortening of the protein by three amino acids in the C-terminal region. A similar deletion was also observed by Kahlau et al. [27] and was suggested to be functionally neutral.
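The 9 bp deletion in Nicotiana can be cross-checked against the rps16 exon lengths listed in Table 2. The sketch below is a simple arithmetic check rather than part of the original analysis: it sums the two exon lengths for each species and converts the difference from S. lycopersicum into codons. The dictionary and function names are assumptions.

```python
# rps16 exon I and exon II lengths in bp, copied from Table 2 of this article.
RPS16_EXONS = {
    "ABE": (40, 227), "CAN": (40, 227), "DST": (40, 227),
    "NSY": (40, 218), "NTA": (40, 218), "NTO": (40, 218), "NUN": (40, 218),
    "SBU": (40, 227), "SLY": (40, 227), "STU": (40, 227),
}


def coding_length(exons):
    """Total length in bp of the spliced coding sequence."""
    return sum(exons)


def codon_difference(longer_bp, shorter_bp):
    """Difference in whole codons; raises if the difference is not in frame."""
    diff = longer_bp - shorter_bp
    if diff % 3:
        raise ValueError(f"{diff} bp is not a whole number of codons")
    return diff // 3


lengths = {species: coding_length(exons) for species, exons in RPS16_EXONS.items()}
reference = lengths["SLY"]
for species, bp in lengths.items():
    print(f"{species}: {bp} bp, {codon_difference(reference, bp)} codons shorter than SLY")
```

The four Nicotiana entries come out 9 bp (three codons) shorter than the others, in agreement with the three-amino-acid shortening described above.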
(6) sprA. The sprA gene has been reported to encode a stable noncoding RNA of unknown function and has been suggested to influence 16S rRNA maturation [38, 39]. In many species this gene seems to be present as a remnant and shows large variations in its 5′ region. The largest deletion, of 109 bp, was observed in C. annuum. The rest of the gene appears to be more conserved, with a deletion towards the 3′ end in all Nicotiana species and A. belladonna. The manner in which this gene functions and the consequences of the above-mentioned variations are yet to be investigated experimentally.
(7) trnA-UGC. In this gene, a long deletion of 102 bp was observed in all Nicotiana species. Interestingly, this deletion extended in both directions, to 130 bp, in A. belladonna. These deletions were found in the intron region and so are unlikely to have any negative impact on the function of the gene product.
(8) trnL-UAA. The trend of variation in trnL-UAA was similar to that in ndhA, as all InDels were observed in introns. The longest species-specific deletion (InDel 3) was observed in C. annuum, whereas a short insertion of four nucleotides, a repeat of “T”, was observed specifically in D. stramonium. Another insertion, of 6 bp, was observed in two Nicotiana species, namely, N. sylvestris and N. tabacum.
(9) ycf1. Many InDels were found in the 3′ region of ycf1, the fastest evolving gene. Most of the InDels were species specific. The largest number of InDels (InDels 2, 3, 5, 7, 9, 16, 17, 19, 23, 24, 25, and 30) was observed in C. annuum, followed by D. stramonium (InDels 4, 6, 20, 26, and 31), N. tomentosiformis (InDels 8, 17, and 18), and S. lycopersicum (InDels 16, 22, and 28). Two genus-specific InDels (InDels 14 and 21) were observed in all four Nicotiana species. However, InDel 19 was also present in D. stramonium. Another genus-specific InDel (InDel 29) was observed in the Solanum species. Most of the InDels altered the length of the gene product, with a maximum length of 1906 amino acids (aa) observed in C. annuum and a minimum of 1873 aa observed in D. stramonium. Among the Solanum species, the protein length (1887 aa) was conserved between S. bulbocastanum and S. tuberosum, whereas the S. lycopersicum protein was 1891 aa long, 4 aa longer than in the other two species of the genus. Among the four Nicotiana species, the length of the ycf1 gene product was nearly conserved among N. sylvestris, N. tabacum, and N. undulata, with protein lengths of 1901 aa, 1902 aa, and 1901 aa, respectively. However, N. tomentosiformis was the most variable member of the genus Nicotiana, with a protein length of 1892 aa.
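The length comparisons above reduce to simple differences. As a check, the following sketch tabulates the ycf1 protein lengths quoted in this section (A. belladonna is omitted because its length is not stated here) and computes the spread within each genus. The genus assignments are read from the species abbreviations, and the names YCF1_AA, GENUS, and length_spread are hypothetical.

```python
# ycf1 protein lengths in amino acids, as quoted in the text of this section.
YCF1_AA = {
    "CAN": 1906, "DST": 1873,
    "NSY": 1901, "NTA": 1902, "NTO": 1892, "NUN": 1901,
    "SBU": 1887, "SLY": 1891, "STU": 1887,
}

# ASSUMPTION: genus inferred from the species abbreviations used in the article.
GENUS = {
    "CAN": "Capsicum", "DST": "Datura",
    "NSY": "Nicotiana", "NTA": "Nicotiana", "NTO": "Nicotiana", "NUN": "Nicotiana",
    "SBU": "Solanum", "SLY": "Solanum", "STU": "Solanum",
}


def length_spread(lengths, genus_of):
    """Return {genus: (shortest, longest, longest - shortest)} in amino acids."""
    by_genus = {}
    for species, aa in lengths.items():
        by_genus.setdefault(genus_of[species], []).append(aa)
    return {g: (min(v), max(v), max(v) - min(v)) for g, v in by_genus.items()}


for genus, (low, high, spread) in sorted(length_spread(YCF1_AA, GENUS).items()):
    print(f"{genus}: {low}-{high} aa (spread {spread} aa)")

longest = max(YCF1_AA, key=YCF1_AA.get)
shortest = min(YCF1_AA, key=YCF1_AA.get)
print(f"overall: {shortest} shortest, {longest} longest, "
      f"difference {YCF1_AA[longest] - YCF1_AA[shortest]} aa")
```

With these values the spread is 10 aa within Nicotiana and 4 aa within Solanum, and the overall difference between C. annuum and D. stramonium is 33 aa.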
3.5. Phylogenetic Analysis of Solanaceous Plastomes. Evolutionary relationships between diverse plant species have been analyzed using several plastome markers, including matK and rbcL (genes) or trnH-psbA and trnL-trnF (intergenic regions), because these sequences are conserved among plant taxa yet retain suitable variation [40, 41]. However, determination of phylogeny based on single gene sequences may be inaccurate [42]. The availability of complete chloroplast sequences for many species has made it possible to use many individual genes or concatenated gene sequences to deduce phylogenetic relationships among taxa [42–44].
Efforts have been made to carry out phylogenetic analysis of solanaceous species using complete plastome sequences by Moore et al. [44] and Jansen et al. [45]. Evolutionary positions of Capsicum and Datura in Solanaceae have been determined using a single or a few plastid genes [46, 47]. Recently, concatenated protein-coding gene sequences from completely sequenced plastomes were used to obtain reasonable phylogenetic relationships for solanaceous species [29]. In the present investigation, we used a similar approach to analyze the phylogenetic relationships of ten solanaceous species with completely sequenced plastomes. Individual multiple sequence alignments were concatenated for maximum likelihood phylogenetic tree generation. As depicted in Figure 3, the taxa were divided into two clades with 100% bootstrap support from 500 replicates. The first clade consisted of the four Nicotiana species, while the species of Solanum, Capsicum, Atropa, and Datura were included in the second clade. These results are in line with previous phylogenetic analyses using concatenated protein-coding gene sequences as well as phylogenetic analyses using plastid ndhF and trnL-F sequences [29, 47]. However, in an analysis of 13 ORFs of solanaceous plastomes, a different arrangement was shown, in which Atropa was separated from Solanum and grouped together with Nicotiana [25].
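Concatenation, as used here, means joining the per-gene alignments end to end so that each species contributes one long row to the tree-building step. A minimal sketch of that step is given below, assuming each gene alignment is held as a mapping from taxon label to aligned sequence. The gene names, the toy sequences, and the choice to pad a missing gene with gaps are illustrative assumptions and not details reported for this study.

```python
def concatenate_alignments(gene_alignments, taxa):
    """Join per-gene alignments into one supermatrix row per taxon.

    gene_alignments: list of (gene_name, {taxon: aligned_sequence}) pairs.
    taxa: ordered list of taxon labels to include.
    A taxon missing from a gene is padded with '-' (an assumption of this sketch).
    """
    parts = {taxon: [] for taxon in taxa}
    for gene, aln in gene_alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"{gene}: aligned rows differ in length")
        (width,) = widths
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# HYPOTHETICAL toy input: two short gene alignments for three taxa.
toy_genes = [
    ("geneA", {"NTA": "ATG-CA", "SLY": "ATGGCA", "CAN": "ATGGCA"}),
    ("geneB", {"NTA": "TTAA", "SLY": "TT-A"}),
]
supermatrix = concatenate_alignments(toy_genes, ["NTA", "SLY", "CAN"])
for taxon, row in supermatrix.items():
    print(f">{taxon}\n{row}")
```

Every row of the result has the same length, which is what a maximum likelihood program expects of its input alignment.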
Figure 3: Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species (tip labels: CAR, DCA, NTO, NUN, NSY, NTA, ABE, DST, CAN, SLY, SBU, and STU).

4. Conclusions

The analyses of complete plastid genomes of ten solanaceous species revealed overall similarity in terms of gene content and organization. The sizes of the LSC, SSC, and IR regions were found to be somewhat conserved among species, but significant variation was found between genera. Most of the coding regions were well conserved. However, many features of a few genes were observed to be typical of a particular genus, or even species, and can be used as molecular markers to investigate genetic diversity and evolution. These typical variations can be further utilized to develop more sophisticated DNA barcoding-based techniques. The ten solanaceous species were divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from the coding regions of the plastomes. The first clade consisted of the four Nicotiana species, and the second clade consisted of species of Solanum, Capsicum, Atropa, and Datura.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The authors are thankful to the University Grants Commission, New Delhi, for providing a MANF fellowship to Harpreet Kaur.
References
[1] M. G. Bausher, N. D. Singh, S.-B. Lee, R. K. Jansen, and H. Daniell, “The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var. ‘Ridge Pineapple’: organization and phylogenetic relationships to other angiosperms,” BMC Plant Biology, vol. 6, no. 1, article 21, 2006.
[2] Z. Y. Hu, W. Hua, S. M. Huang, and H. Z. Wang, “Complete chloroplast genome sequence of rapeseed (Brassica napus L.) and its evolutionary implications,” Genetic Resources and Crop Evolution, vol. 58, no. 6, pp. 875–887, 2011.
[3] J. D. Palmer, “Comparative organization of chloroplast genomes,” Annual Review of Genetics, vol. 19, no. 1, pp. 325–354, 1985.
[4] R. G. Olmstead and J. D. Palmer, “Chloroplast DNA systematics: a review of methods and data analysis,” The American Journal of Botany, vol. 81, no. 9, pp. 1205–1224, 1994.
[5] R. A. Bungard, “Photosynthetic evolution in parasitic plants: insight from the chloroplast genome,” BioEssays, vol. 26, no. 3, pp. 235–247, 2004.
[6] L. A. Raubeson and R. K. Jansen, “Chloroplast genomes of plants,” in Diversity and Evolution of Plants-Genotypic and Phenotypic Variation in Higher Plants, R. Henry, Ed., pp. 45–68, CABI Publishing, Wallingford, UK, 2005.
[7] R. C. Haberle, H. M. Fourcade, J. L. Boore, and R. K. Jansen, “Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes,” Journal of Molecular Evolution, vol. 66, no. 4, pp. 350–361, 2008.
[8] W. Martin and R. G. Herrmann, “Gene transfer from organelles to the nucleus: how much, what happens, and why?” Plant Physiology, vol. 118, no. 1, pp. 9–17, 1998.
[9] H. L. Race, R. G. Herrmann, and W. Martin, “Why have organelles retained genomes?” Trends in Genetics, vol. 15, no. 9, pp. 364–370, 1999.
[10] T. Wakasugi, T. Tsudzuki, and M. Sugiura, “The genomics of land plant chloroplasts: gene content and alteration of genomic information by RNA editing,” Photosynthesis Research, vol. 70, no. 1, pp. 107–118, 2001.
[11] Y.-K. Kim, C.-W. Park, and K.-J. Kim, “Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications,” Molecules and Cells, vol. 27, no. 3, pp. 365–381, 2009.
[12] A. Khan, I. A. Khan, H. Asif, and M. K. Azim, “Current trends in chloroplast genome research,” African Journal of Biotechnology, vol. 9, no. 24, pp. 3494–3500, 2010.
[13] H. Shimada and M. Sugiura, “Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes,” Nucleic Acids Research, vol. 19, no. 5, pp. 983–995, 1991.
[14] M. Sugiura, “The chloroplast genome,” Plant Molecular Biology, vol. 19, no. 1, pp. 149–168, 1992.
[15] M. E. Cosner, R. K. Jansen, J. D. Palmer, and S. R. Downie, “The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families,” Current Genetics, vol. 31, no. 5, pp. 419–429, 1997.
[16] L. Gao, X. Yi, Y.-X. Yang, Y.-J. Su, and T. Wang, “Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes,” BMC Evolutionary Biology, vol. 9, no. 1, article 130, 2009.
[17] T. Kato, T. Kaneko, S. Sato, Y. Nakamura, and S. Tabata, “Complete structure of the chloroplast genome of a legume, Lotus japonicus,” DNA Research, vol. 7, no. 6, pp. 323–330, 2000.
[18] B. Goffinet, N. J. Wickett, O. Werner, R. M. Ros, A. J. Shaw, and C. J. Cox, “Distribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta),” Annals of Botany, vol. 99, no. 4, pp. 747–753, 2007.
[19] J. D. Palmer, J. M. Nugent, and L. A. Herbon, “Unusual structure of geranium chloroplast DNA: a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families,” Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 3, pp. 769–773, 1987.
[20] S. Greiner, X. Wang, U. Rauwolf et al., “The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution,” Nucleic Acids Research, vol. 36, no. 7, pp. 2366–2378, 2008.
[21] J. J. Doyle, J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson, “Chloroplast DNA inversions and the origin of the grass family (Poaceae),” Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 16, pp. 7722–7726, 1992.
[22] F. A. Michelangeli, J. I. Davis, and D. W. Stevenson, “Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes,” The American Journal of Botany, vol. 90, no. 1, pp. 93–106, 2003.
[23] E. Bortiri, D. Coleman-Derr, G. R. Lazo, O. D. Anderson, and Y. Q. Gu, “The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes,” BMC Research Notes, vol. 1, article 61, 2008.
[24] C. H. Leseberg and M. R. Duvall, “The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals,” Journal of Molecular Evolution, vol. 69, no. 4, pp. 311–318, 2009.
[25] H. J. Chung, J. D. Jung, H. W. Park et al., “The complete chloroplast genome sequences of Solanum tuberosum and comparative analysis with Solanaceae species identified the presence of a 241-bp deletion in cultivated potato chloroplast DNA sequence,” Plant Cell Reports, vol. 25, no. 12, pp. 1369–1379, 2006.
[26] H. Daniell, S.-B. Lee, J. Grevich et al., “Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes,” Theoretical and Applied Genetics, vol. 112, no. 8, pp. 1503–1518, 2006.
[27] S. Kahlau, S. Aspinall, J. C. Gray, and R. Bock, “Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes,” Journal of Molecular Evolution, vol. 63, no. 2, pp. 194–207, 2006.
[28] M. Yukawa, T. Tsudzuki, and M. Sugiura, “The chloroplast genome of Nicotiana sylvestris and Nicotiana tomentosiformis: complete sequencing confirms that the Nicotiana sylvestris progenitor is the maternal genome donor of Nicotiana tabacum,” Molecular Genetics and Genomics, vol. 275, no. 4, pp. 367–373, 2006.
[29] Y. D. Jo, J. Park, J. Kim et al., “Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome,” Plant Cell Reports, vol. 30, no. 2, pp. 217–229, 2011.
[30] C. Schmitz-Linneweber, R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier, “The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation,” Molecular Biology and Evolution, vol. 19, no. 9, pp. 1602–1612, 2002.
[31] M. Kunnimalaiyaan and B. L. Nielsen, “Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA,” Nucleic Acids Research, vol. 25, no. 18, pp. 3681–3686, 1997.
[32] G. Thyssen, Z. Svab, and P. Maliga, “Cell-to-cell movement of plastids in plants,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 7, pp. 2439–2443, 2012.
[33] D. Gargano, A. Vezzi, N. Scotti et al., “The complete nucleotide sequence genome of potato (Solanum tuberosum cv. Desiree) chloroplast DNA,” in Book of Abstracts of the 2nd Solanaceae Genome Workshop, p. 107, 2005.
[34] K.-J. Kim and H.-L. Lee, “Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants,” DNA Research, vol. 11, no. 4, pp. 247–261, 2004.
[35] D. A. Steane, “Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae),” DNA Research, vol. 12, no. 3, pp. 215–220, 2005.
[36] J. D. Palmer, “Cell culture and somatic cell genetics of plants,” in The Molecular Biology of Plastids, R. G. Hermann, Ed., pp. 5–53, Springer, Vienna, Austria, 1991.
[37] M. Sugiura, K. Shinozaki, M. Tanaka et al., “Split genes and cis/trans splicing in tobacco chloroplasts,” in Plant Molecular Biology, D. von Wettstein and N. H. Chua, Eds., pp. 65–76, Plenum Press, New York, NY, USA, 1987.
[38] A. Vera and M. Sugiura, “A novel RNA gene in the tobacco plastid genome: its possible role in the maturation of 16S rRNA,” The EMBO Journal, vol. 13, no. 9, pp. 2211–2217, 1994.
[39] M. Sugita, Z. Svab, P. Maliga, and M. Sugiura, “Targeted deletion of sprA from the tobacco plastid genome indicates that the encoded small RNA is not essential for pre-16S rRNA maturation in plastids,” Molecular and General Genetics, vol. 257, no. 1, pp. 23–27, 1997.
[40] R. Lahaye, M. van der Bank, D. Bogarin et al., “DNA barcoding the floras of biodiversity hotspots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 8, pp. 2923–2928, 2008.
[41] P. Taberlet, L. Gielly, G. Pautou, and J. Bouvet, “Universal primers for amplification of three non-coding regions of chloroplast DNA,” Plant Molecular Biology, vol. 17, no. 5, pp. 1105–1109, 1991.
[42] X. Guo, S. Castillo-Ramírez, V. Gonzalez et al., “Rapid evolutionary change of common bean (Phaseolus vulgaris L.) plastome and the genomic diversification of legume chloroplasts,” BMC Genomics, vol. 8, article 228, 2007.
[43] R. K. Jansen, C. Kaittanis, C. Saski et al., “Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids,” BMC Evolutionary Biology, vol. 6, no. 1, article 32, 2006.
[44] M. J. Moore, P. S. Soltis, C. D. Bell, J. G. Burleigh, and D. E. Soltis, “Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4623–4628, 2010.
[45] R. K. Jansen, Z. Cai, L. A. Raubeson et al., “Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 49, pp. 19369–19374, 2007.
[46] L. Bohs and R. G. Olmstead, “Phylogenetic relationships in Solanum (Solanaceae) based on ndhF sequences,” Systematic Botany, vol. 22, no. 1, pp. 5–17, 1997.
[47] R. G. Olmstead, L. Bohs, H. A. Migid, E. Santiago-Valentin, V. F. Garcia, and S. M. Collier, “A molecular phylogeny of the Solanaceae,” Taxon, vol. 57, no. 4, pp. 1159–1181, 2008.
Table 2: The lengths (bp) of introns and exons for the split genes of ten solanaceous species.

| Gene (region) | Exon/intron | ABE | CAN | DST | NSY | NTA | NTO | NUN | SBU | SLY | STU |
|---|---|---|---|---|---|---|---|---|---|---|---|
| trnK-UUU (LSC) | Exon I | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 |
| | Intron I | 2519 | 2500 | 2506 | 2526 | 2526 | 2526 | 2521 | 2501 | 2514 | 2512 |
| | Exon II | 36 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 |
| rps16 (LSC) | Exon I | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
| | Intron I | 822 | 865 | 866 | 860 | 860 | 860 | 859 | 855 | 864 | 855 |
| | Exon II | 227 | 227 | 227 | 218 | 218 | 218 | 218 | 227 | 227 | 227 |
| trnG-UCC (LSC) | Exon I | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 |
| | Intron I | 692 | 692 | 694 | 692 | 692 | 690 | 691 | 701 | 695 | 692 |
| | Exon II | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 37 | 48 | 48 |
| atpF (LSC) | Exon I | 145 | 145 | 145 | 145 | 145 | 145 | 145 | 144 | 144 | 145 |
| | Intron I | 715 | 693 | 700 | 695 | 695 | 692 | 692 | 693 | 686 | 693 |
| | Exon II | 410 | 410 | 410 | 410 | 410 | 410 | 410 | 411 | 411 | 410 |
| rpoC1 (LSC) | Exon I | 432 | 453 | 453 | 453 | 453 | 432 | 453 | 453 | 453 | 453 |
| | Intron I | 737 | 742 | 737 | 737 | 737 | 709 | 733 | 737 | 737 | 737 |
| | Exon II | 1614 | 1614 | 1614 | 1614 | 1614 | 1614 | 1623 | 1614 | 1614 | 1614 |
| ycf3 (LSC) | Exon I | 124 | 124 | 124 | 124 | 124 | 124 | 124 | 124 | 124 | 124 |
| | Intron I | 739 | 742 | 740 | 739 | 738 | 731 | 735 | 730 | 729 | 727 |
| | Exon II | 230 | 230 | 230 | 230 | 230 | 230 | 230 | 230 | 230 | 230 |
| | Intron II | 763 | 744 | 753 | 783 | 783 | 779 | 781 | 750 | 750 | 750 |
| | Exon III | 153 | 153 | 159 | 153 | 153 | 153 | 153 | 153 | 153 | 153 |
| trnL-UAA (LSC) | Exon I | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 37 | 35 | 35 |
| | Intron I | 497 | 426 | 501 | 503 | 503 | 497 | 498 | 502 | 497 | 497 |
| | Exon II | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 |
| trnV-UAC (LSC) | Exon I | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 |
| | Intron I | 572 | 575 | 569 | 571 | 571 | 572 | 573 | 569 | 571 | 571 |
| | Exon II | 35 | 35 | 37 | 35 | 35 | 35 | 35 | 37 | 35 | 35 |
| rps12* | Exon I | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 |
| | Intron I | - | - | - | - | - | - | - | - | - | - |
| | Exon II | 232 | 232 | 232 | 232 | 232 | 232 | 232 | 232 | 232 | 232 |
| | Intron II | 535 | 536 | 536 | 536 | 536 | 536 | 536 | 536 | 536 | 536 |
| | Exon III | 26 | 26 | 26 | 26 | 26 | 26 | 26 | 26 | 26 | 26 |
| clpP (LSC) | Exon I | 71 | 71 | 71 | 71 | 71 | 71 | 71 | 71 | 71 | 71 |
| | Intron I | 799 | 811 | 792 | 807 | 807 | 789 | 789 | 789 | 798 | 789 |
| | Exon II | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 |
| | Intron II | 622 | 626 | 624 | 637 | 637 | 634 | 631 | 625 | 617 | 620 |
| | Exon III | 228 | 228 | 234 | 228 | 228 | 228 | 228 | 234 | 258 | 234 |
| petB (LSC) | Exon I | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| | Intron I | 759 | 755 | 746 | 753 | 753 | 753 | 753 | 747 | 747 | 747 |
| | Exon II | 642 | 642 | 642 | 642 | 642 | 642 | 642 | 642 | 642 | 642 |
| petD (LSC) | Exon I | 8 | 8 | 9 | 8 | 8 | 8 | 8 | 6 | 8 | 8 |
| | Intron I | 742 | 742 | 748 | 742 | 742 | 742 | 742 | 739 | 738 | 739 |
| | Exon II | 475 | 475 | 474 | 475 | 475 | 475 | 475 | 477 | 475 | 475 |
| rpl16 (LSC) | Exon I | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| | Intron I | 1019 | 1026 | 1025 | 1020 | 1020 | 1021 | 1020 | 1014 | 1018 | 1014 |
| | Exon II | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 |
| rpl2 (IR) | Exon I | 391 | 391 | 393 | 391 | 391 | 391 | 391 | 390 | 391 | 391 |
| | Intron I | 664 | 665 | 669 | 666 | 666 | 666 | 666 | 666 | 666 | 666 |
| | Exon II | 434 | 434 | 429 | 434 | 434 | 434 | 434 | 435 | 434 | 434 |
| ndhB (IR) | Exon I | 777 | 777 | 777 | 777 | 777 | 777 | 777 | 777 | 777 | 777 |
| | Intron I | 679 | 679 | 679 | 679 | 679 | 679 | 679 | 679 | 679 | 679 |
| | Exon II | 756 | 756 | 756 | 756 | 756 | 756 | 756 | 756 | 756 | 756 |
| trnI-GAU (IR) | Exon I | 37 | 37 | 42 | 37 | 37 | 37 | 37 | 42 | 37 | 37 |
| | Intron I | 717 | 722 | 717 | 707 | 707 | 716 | 716 | 717 | 722 | 722 |
| | Exon II | 34 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 |
| trnA-UGC (IR) | Exon I | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 |
| | Intron I | 681 | 811 | 811 | 709 | 709 | 709 | 709 | 811 | 811 | 811 |
| | Exon II | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 |
| ndhA (SSC) | Exon I | 553 | 553 | 552 | 553 | 553 | 553 | 553 | 552 | 553 | 553 |
| | Intron I | 1150 | 1157 | 1154 | 1148 | 1148 | 1149 | 1148 | 1158 | 1133 | 1158 |
| | Exon II | 539 | 539 | 537 | 539 | 539 | 539 | 539 | 540 | 539 | 539 |

*The rps12 gene is a divided gene: 3′-rps12 is located in the IR region, while 5′-rps12 is located in the LSC region.
ABE: Atropa belladonna; CAN: Capsicum annuum; DST: Datura stramonium; NSY: Nicotiana sylvestris; NTA: Nicotiana tabacum; NTO: Nicotiana tomentosiformis; NUN: Nicotiana undulata; SBU: Solanum bulbocastanum; SLY: Solanum lycopersicum; STU: Solanum tuberosum.
Table 3: InDels in nucleotide sequences of 9 genes of ten solanaceous plastid genomes.

| S. number | Gene | Total number of InDels | InDel length (bp) |
|---|---|---|---|
| 1 | accDᵃ | 4 | 24, 9, 141, 6 |
| 2 | clpPᵃ | 24 | 8(I), 14(I), 13(I), 7(I), 1(I), 2–3(I), 7(I), 1–7(I), 3(I), 2(I), 3(I), 1–7(I), 1–3(I), 1(I), 1(I), 1(I), 1–5(I), 4–7(I), 1(I), 9(I), 1–2(I), 3(I), 5(I), 24–30 |
| 3 | ndhAᵇ | 14 | 9(I), 5–6(I), 3(I), 1(I), 9(I), 3(I), 4(I), 1–4(I), 1–2(I), 1–23(I), 1–2(I), 2(I), 1(I), 3(I) |
| 4 | rpl32ᵇ | 2 | 2–3, 4 |
| 5 | rps16ᵃ | 11 | 1–38, 9(I), 1(I), 1(I), 5(I), 1–2(I), 5(I), 4(I), 6(I), 1(I), 9 |
| 6 | sprAᵇ | 2 | 109, 7 |
| 7 | trnA-UGCᶜ | 1 | 102–130 |
| 8 | trnL-UAAᵃ | 4 | 1, 6, 71, 4 |
| 9 | ycf1ᵇ | 31 | 3, 18, 18, 21, 6, 6, 48, 9, 6, 6, 42, 3, 6, 30, 3, 15, 12–39, 18, 6, 9–36, 6, 6, 6, 9, 9, 12, 6, 6, 6, 57, 6 |

Location in different regions: ᵃLSC, ᵇSSC, and ᶜIR. (I): InDels present in introns.
Table 4: InDels in amino acid sequences of 5 proteins of ten solanaceous plastid genomes.

| S. number | Protein | Total number of InDels | InDel length (amino acids) |
|---|---|---|---|
| 1 | accD | 4 | 8, 3, 47, 2 |
| 2 | clpP | 2 | 2, 10 |
| 3 | rpl32 | 1 | 1–2 |
| 4 | rps16 | 1 | 3 |
| 5 | ycf1 | 29 | 1, 6, 6, 7, 2, 2, 7, 3, 2, 2, 14, 1–10, 1, 5, 4–13, 6, 2, 3–12, 2, 2, 2, 3, 3, 4, 2, 2, 2, 19, 2 |
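The accD entries of Tables 3 and 4 are related by simple codon arithmetic: an in-frame exon InDel of n bp adds or removes n/3 residues. A minimal Python sketch of that check is given here; the function name `bp_to_residues` is illustrative and not part of the original analysis.

```python
def bp_to_residues(indel_bp):
    """Return the number of amino acids gained or lost by an in-frame InDel.

    Raises ValueError when the length is not a multiple of 3, because such
    an InDel shifts the reading frame instead of adding whole codons.
    """
    if indel_bp % 3 != 0:
        raise ValueError(f"{indel_bp} bp is not a whole number of codons")
    return indel_bp // 3


# accD InDel lengths as listed in Table 3 (bp) and Table 4 (amino acids).
accd_bp = [24, 9, 141, 6]
accd_aa = [8, 3, 47, 2]

converted = [bp_to_residues(n) for n in accd_bp]
print(converted)             # [8, 3, 47, 2]
print(converted == accd_aa)  # True
```

The same conversion does not apply to intronic InDels, marked (I) in Table 3, since they do not alter the coding sequence.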
Figure 1: Partial multiple sequence alignment of accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala (UGC), tRNA-Leu (UAA), and ycf1 gene sequences of ten solanaceous species, showing the location of InDels (indicated by hyphens). Each panel aligns the ten species in the order ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU, with the InDels numbered within each gene.
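Since Figure 1 marks InDels with hyphens, gap runs can be located programmatically in any aligned sequence. The Python sketch below does this for one 17-column block transcribed from the Figure 1 alignment. Assigning the rows to species assumes the figure's ABE to STU row order, and the helper name `gap_runs` is hypothetical.

```python
import re


def gap_runs(aligned_seq):
    """Return (start, length) for every run of '-' in one aligned sequence."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"-+", aligned_seq)]


# One alignment block transcribed from Figure 1 (row order assumed ABE..STU).
block = {
    "ABE": "ATCT---------TCGG",
    "CAN": "ATCT---------TCGG",
    "DST": "ATCT---------TCGG",
    "NSY": "ATTT---------TCGG",
    "NTA": "ATTT---------TCGG",
    "NTO": "ATTT---------TCGG",
    "NUN": "ATTT---------TCAG",
    "SBU": "ATCT---------TAGG",
    "SLY": "ATCTAATATATCTTAGG",
    "STU": "ATCT---------TAGG",
}

# Species without any gap carry the extra bases relative to the others.
ungapped = [species for species, seq in block.items() if not gap_runs(seq)]

print(gap_runs(block["ABE"]))  # [(4, 9)]
print(ungapped)                # ['SLY']
```

In this block a 9-column gap is present in nine rows and absent in one, which is the pattern expected for a 9 bp insertion specific to a single species.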
Figure 2: Partial multiple sequence alignment of amino acid sequences of genes, namely, accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala (UGC), tRNA-Leu (UAA), and ycf1, of ten solanaceous species, showing the location of InDels (indicated by hyphens). The panels are labelled accD, ycf1, clpP, rpl32, and rps16; each aligns the ten species in the order ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU.
protein sequences, as shown in Figure 2. The accD gene has been reported to be one of the most variable plastid genes and is probably under diversifying selection [26].
(2) clpP. In the clpP gene, InDels were found in both intron and exon regions. The exonic InDels had two major consequences. First, an insertion of 6 bp in S. bulbocastanum and S. tuberosum and of 30 bp in S. lycopersicum at the 3′ end (exon 3) of the clpP gene shifted the stop codon 6, 6, and 30 bp downstream, respectively, compared with the other species of the family Solanaceae, increasing the length of the coding sequence and of the protein product (Figures 1 and 2). Second, InDel 1 in the protein sequence corresponds to the insertion of a repeat of "I" amino acids in D. stramonium, which makes the exon 3 region longer by 6 bp. This region, however, corresponds to the end of intron 2 of the clpP gene in all other species. Since the D. stramonium chloroplast genome has been sequenced only recently, this observation needs to be validated experimentally.
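The stop-codon shift described above can be checked directly on the 3′-end fragments of clpP shown in the Figure 1 alignment. The following is a minimal sketch, not the authors' procedure: the three fragments are copied from that alignment (gaps removed), while the reading-frame offset `FRAME = 1` and the helper name `first_stop` are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_stop(seq, frame):
    """Return the 0-based index of the first in-frame stop codon, or None."""
    seq = seq.replace("-", "")  # drop alignment gaps
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None

# 3' end fragments of clpP as they appear in the Figure 1 alignment.
fragments = {
    "other species": "ATGA",
    "S. bulbocastanum / S. tuberosum": "AGGAAAATAA",
    "S. lycopersicum": "AGGAAAAGAACATGGATTTCACGCAGATCTATGA",
}

FRAME = 1  # assumption: the first base shown closes the preceding codon

ref_stop = first_stop(fragments["other species"], FRAME)
shifts = {name: first_stop(seq, FRAME) - ref_stop
          for name, seq in fragments.items()}

for name, shift in shifts.items():
    print(f"{name}: stop codon shifted {shift} bp ({shift // 3} codons)")
```

Under that frame assumption, the shifts come out as 0, 6, and 30 bp, that is, 2 and 10 additional codons, which agrees with the two clpP protein InDel lengths listed in Table 4.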
(3) ndhA. All the InDels found in ndhA were located in the intron, while the protein-coding regions (exons) were highly conserved. This indicates high diversifying selection on the intronic region of this gene. Of the 14 InDels in total, most were observed with respect to C. annuum (InDels 1, 2, 5, 7, and 10). InDel 10 was shared in full by C. annuum and S. lycopersicum and in part by C. annuum and A. belladonna.
(4) rpl32. In rpl32, an insertion of 1 bp in D. stramonium and of 3 bp in C. annuum was found in the 3′ region of the gene, while a deletion of 4 bp was observed in D. stramonium.
The 3 bp insertion in C. annuum only altered the length of the protein, making it longer by 1 amino acid. However, the small 1 bp insertion in D. stramonium proved to be a frameshift mutation, resulting in three changes in the amino acid sequence near the C-terminus. Moreover, the deletion of 4 bp at the 3′ end resulted in premature termination of protein synthesis. Together, the frameshift mutation and the 3′ end deletion reduced the length of the gene product by 1 amino acid. As the C-terminus of the amino acid chain is well conserved in all the other species, the effect of the abovementioned variations needs to be validated experimentally.
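To illustrate why a 1 bp insertion changes every downstream amino acid while a 3 bp insertion does not, the sketch below translates a short toy sequence before and after a single-base insertion. The sequence is hypothetical and is not the rpl32 sequence; the codon table is the standard amino acid assignment, and the function name `translate` is chosen only for this example.

```python
BASES = "TCAG"
# Standard amino acid assignments, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

def translate(seq):
    """Translate in frame 0 until the first stop codon or the last full codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        b1, b2, b3 = (BASES.index(b) for b in seq[i:i + 3])
        aa = AMINO[16 * b1 + 4 * b2 + b3]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical toy coding sequence (not taken from any plastome).
reference = "ATGGCTAAAGGTCGTTAA"
# Insert a single T after the third codon: every later codon is read out of frame.
frameshifted = reference[:9] + "T" + reference[9:]

ref_protein = translate(reference)
shifted_protein = translate(frameshifted)
print(ref_protein, shifted_protein)

# Net length change reported for D. stramonium: +1 bp insertion, 4 bp deletion.
net_change_aa = (1 - 4) // 3
print(net_change_aa)
```

In the toy case the first three residues are unchanged and everything after the insertion differs, and the original stop codon is no longer read in frame. The last line reproduces the arithmetic in the text: a net change of 3 bp fewer corresponds to a product shorter by 1 amino acid.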
(5) rps16. In rps16 also, InDels were observed in introns as well as exons. Five of the major insertions in the intron regions were species specific. Insertions of 38 bp (InDel 1), 9 bp (InDel 2), 5 bp (InDel 7), 4 bp (InDel 8), and 6 bp (InDel 9) were observed in A. belladonna, S. lycopersicum, D. stramonium, and C. annuum. A deletion of 5 bp was observed in all three Solanum species and C. annuum. A deletion of 9 bp was observed in all Nicotiana species, resulting in an amino acid change (P to S) and shortening of the protein by three amino acids in the C-terminal region. A similar deletion was also observed by Kahlau et al. [27] and was suggested to be functionally neutral.
(6) sprA. The sprA gene has been reported to encode a stable noncoding RNA of unknown function. This gene has been suggested to influence 16S rRNA maturation [38, 39]. In many species, this gene seems to be present as a remnant and shows large variations in its 5′ region. The largest deletion, of 109 bp, was observed in C. annuum. The rest of the gene appears to be more conserved, with a deletion towards the 3′ end in all Nicotiana species and A. belladonna. The manner in which this gene functions and the consequences of the abovementioned variations are yet to be investigated experimentally.
(7) trnA-UGC. In this gene, a long deletion of 102 bp was observed in all Nicotiana species. Interestingly, this deletion was extended to 130 bp, in both directions, in A. belladonna. These deletions lie in the intron region and so are unlikely to have any negative impact on the function of the gene product.
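The two deletion sizes quoted for trnA-UGC can be recovered from the intron lengths listed for this gene in Table 2. The sketch below is only a consistency check: the lengths are copied from Table 2, and treating the longest intron as the undeleted reference is an assumption of this example.

```python
# trnA-UGC intron I lengths (bp) per species, copied from Table 2.
intron_length = {
    "ABE": 681, "CAN": 811, "DST": 811,
    "NSY": 709, "NTA": 709, "NTO": 709, "NUN": 709,
    "SBU": 811, "SLY": 811, "STU": 811,
}

# Assumption: the longest intron carries no deletion and serves as reference.
reference_length = max(intron_length.values())
deletion = {sp: reference_length - n for sp, n in intron_length.items()}

for sp, size in deletion.items():
    print(f"{sp}: {size} bp shorter than the longest intron")
```

The differences are 102 bp for the four Nicotiana species and 130 bp for A. belladonna, matching the InDel range of 102 to 130 bp given for trnA-UGC in Table 3.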
(8) trnL-UAA. The trend of variation in trnL-UAA was similar to that in ndhA, as all InDels were observed in the intron. The longest species-specific deletion (InDel 3) was observed in C. annuum, whereas a short insertion of four nucleotides, a repeat of "T", was observed specifically in D. stramonium. Another insertion of 6 bp was observed in two Nicotiana species, namely, N. sylvestris and N. tabacum.
(9) ycf1. Many InDels (3′ region) were found in ycf1, the fastest evolving gene. Most of the InDels were species specific. The largest number of InDels (InDels 2, 3, 5, 7, 9, 16, 17, 19, 23, 24, 25, and 30) was observed in C. annuum, followed by D. stramonium (InDels 4, 6, 20, 26, and 31), N. tomentosiformis (InDels 8, 17, and 18), and S. lycopersicum (InDels 16, 22, and 28). Two genus-specific InDels (InDels 14 and 21) were observed in all four Nicotiana species. However, InDel 19 was also present in D. stramonium. Another genus-specific InDel (InDel 29) was observed in the Solanum species. Most of the InDels altered the length of the gene product, with a maximum length of 1906 amino acids (aa) in C. annuum and a minimum of 1873 aa in D. stramonium. Among the Solanum species, the protein length (1887 aa) was conserved between S. bulbocastanum and S. tuberosum. However, the S. lycopersicum protein was 1891 aa long, 4 aa longer than in the other two species of the genus. Among the four Nicotiana species, the ycf1 gene product length was nearly conserved among N. sylvestris, N. tabacum, and N. undulata, with protein lengths of 1901 aa, 1902 aa, and 1901 aa, respectively. However, N. tomentosiformis was the most variable member of the genus Nicotiana, with a protein length of 1892 aa.
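The ycf1 protein lengths quoted above can be summarized per genus with a few lines of code. This is a small sketch using only the values stated in the text (no length is given there for A. belladonna, so it is omitted); the variable names are illustrative.

```python
# ycf1 protein lengths (aa) as stated in the text.
ycf1_length_aa = {
    "CAN": 1906, "DST": 1873,
    "SBU": 1887, "SLY": 1891, "STU": 1887,
    "NSY": 1901, "NTA": 1902, "NUN": 1901, "NTO": 1892,
}
genus_of = {
    "CAN": "Capsicum", "DST": "Datura",
    "SBU": "Solanum", "SLY": "Solanum", "STU": "Solanum",
    "NSY": "Nicotiana", "NTA": "Nicotiana", "NUN": "Nicotiana", "NTO": "Nicotiana",
}

# Group lengths by genus and report the within-genus spread.
by_genus = {}
for species, length in ycf1_length_aa.items():
    by_genus.setdefault(genus_of[species], []).append(length)

spread = {genus: max(v) - min(v) for genus, v in by_genus.items()}
overall_spread = max(ycf1_length_aa.values()) - min(ycf1_length_aa.values())

print(spread)
print(overall_spread)
```

With these values the within-genus spread is 4 aa in Solanum and 10 aa in Nicotiana, against 33 aa across all nine species listed.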
3.5. Phylogenetic Analysis of Solanaceous Plastomes. Evolutionary relationships between diverse plant species have been analyzed using several plastome markers, including matK and rbcL (genes) or trnH-psbA and trnL-trnF (intergenic regions), because these sequences combine conservation among plant taxa with suitable variation [40, 41]. However, determination of phylogeny based on single gene sequences may be inaccurate [42]. The availability of complete chloroplast sequences for many species has made it possible to use many individual genes or concatenated gene sequences to deduce phylogenetic relationships among taxa [42–44].
Efforts have been made to carry out phylogenetic analysis of solanaceous species using complete plastome sequences by Moore et al. [44] and Jansen et al. [45]. Evolutionary positions of Capsicum and Datura in Solanaceae have been determined using a single or a few plastid genes [46, 47]. Recently, concatenated protein-coding gene sequences from completely sequenced plastomes were used to obtain reasonable phylogenetic relationships for solanaceous species [29]. In the present investigation, we used a similar approach to analyze the phylogenetic relationships of ten solanaceous species with completely sequenced plastomes. Individual multiple sequence alignments were concatenated for maximum likelihood phylogenetic tree generation. As depicted in Figure 3, the taxa were divided into two clades with a bootstrap value of 100% (500 replicates). The first clade consisted of the four Nicotiana species, while the species of Solanum, Capsicum, Atropa, and Datura were included in the second clade. These results are in line with previous phylogenetic analyses using concatenated protein-coding gene sequences as well as phylogenetic analyses using plastid ndhF and trnL-F sequences [29, 47]. However, in an analysis of 13 ORFs of solanaceous plastomes, a different arrangement was shown, in which Atropa was separated from Solanum and grouped together with Nicotiana [25].
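The concatenation step mentioned above, in which per-gene alignments are joined into a single supermatrix before tree inference, can be sketched as follows. This is an illustrative outline rather than the authors' pipeline: the gene names and sequences are hypothetical toy data, and padding a taxon that lacks a gene with gap characters is an assumption of this example.

```python
def concatenate(alignments, taxa):
    """Join per-gene alignments (taxon -> aligned sequence) into one supermatrix."""
    parts = {taxon: [] for taxon in taxa}
    for gene, alignment in alignments:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        width = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon missing from a gene alignment is padded with gaps.
            parts[taxon].append(alignment.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}

# Hypothetical toy alignments for three taxa and two genes.
toy_alignments = [
    ("geneA", {"NTA": "ATGGCT", "SLY": "ATGGCA", "CAN": "ATG-CA"}),
    ("geneB", {"NTA": "TTGA", "SLY": "TTGA"}),
]

supermatrix = concatenate(toy_alignments, ["NTA", "SLY", "CAN"])
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

Every row of the resulting matrix has the same length, which is what tree-building programs expect from a concatenated alignment.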
4. Conclusions
The analyses of complete plastid genomes of ten solanaceous species revealed overall similarity in terms of gene content and organization. The sizes of the LSC, SSC, and IR regions were found to be somewhat conserved among species, but significant variation was found between genera. Most of the coding regions were well conserved. However, many features in a few genes were observed to be typical of a particular genus or even species and can be used as molecular markers to investigate genetic diversity and evolution. These typical variations can be further utilized to develop more sophisticated DNA barcoding based techniques. The ten solanaceous species were divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from the coding regions of plastomes. The first clade consisted of the four Nicotiana species, and the second clade consisted of species of Solanum, Capsicum, Atropa, and Datura.

Figure 3: Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The authors are thankful to the University Grants Commission, New Delhi, for providing a MANF fellowship to Harpreet Kaur.
References
[1] M. G. Bausher, N. D. Singh, S.-B. Lee, R. K. Jansen, and H. Daniell, "The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var. 'Ridge Pineapple': organization and phylogenetic relationships to other angiosperms," BMC Plant Biology, vol. 6, no. 1, article 21, 2006.
[2] Z. Y. Hu, W. Hua, S. M. Huang, and H. Z. Wang, "Complete chloroplast genome sequence of rapeseed (Brassica napus L.) and its evolutionary implications," Genetic Resources and Crop Evolution, vol. 58, no. 6, pp. 875–887, 2011.
[3] J. D. Palmer, "Comparative organization of chloroplast genomes," Annual Review of Genetics, vol. 19, no. 1, pp. 325–354, 1985.
[4] R. G. Olmstead and J. D. Palmer, "Chloroplast DNA systematics: a review of methods and data analysis," The American Journal of Botany, vol. 81, no. 9, pp. 1205–1224, 1994.
[5] R. A. Bungard, "Photosynthetic evolution in parasitic plants: insight from the chloroplast genome," BioEssays, vol. 26, no. 3, pp. 235–247, 2004.
[6] L. A. Raubeson and R. K. Jansen, "Chloroplast genomes of plants," in Diversity and Evolution of Plants-Genotypic and Phenotypic Variation in Higher Plants, R. Henry, Ed., pp. 45–68, CABI Publishing, Wallingford, UK, 2005.
[7] R. C. Haberle, H. M. Fourcade, J. L. Boore, and R. K. Jansen, "Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes," Journal of Molecular Evolution, vol. 66, no. 4, pp. 350–361, 2008.
[8] W. Martin and R. G. Herrmann, "Gene transfer from organelles to the nucleus: how much, what happens, and why?" Plant Physiology, vol. 118, no. 1, pp. 9–17, 1998.
[9] H. L. Race, R. G. Herrmann, and W. Martin, "Why have organelles retained genomes?" Trends in Genetics, vol. 15, no. 9, pp. 364–370, 1999.
[10] T. Wakasugi, T. Tsudzuki, and M. Sugiura, "The genomics of land plant chloroplasts: gene content and alteration of genomic information by RNA editing," Photosynthesis Research, vol. 70, no. 1, pp. 107–118, 2001.
[11] Y.-K. Kim, C.-W. Park, and K.-J. Kim, "Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications," Molecules and Cells, vol. 27, no. 3, pp. 365–381, 2009.
[12] A. Khan, I. A. Khan, H. Asif, and M. K. Azim, "Current trends in chloroplast genome research," African Journal of Biotechnology, vol. 9, no. 24, pp. 3494–3500, 2010.
[13] H. Shimada and M. Sugiura, "Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes," Nucleic Acids Research, vol. 19, no. 5, pp. 983–995, 1991.
[14] M. Sugiura, "The chloroplast genome," Plant Molecular Biology, vol. 19, no. 1, pp. 149–168, 1992.
[15] M. E. Cosner, R. K. Jansen, J. D. Palmer, and S. R. Downie, "The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families," Current Genetics, vol. 31, no. 5, pp. 419–429, 1997.
[16] L. Gao, X. Yi, Y.-X. Yang, Y.-J. Su, and T. Wang, "Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes," BMC Evolutionary Biology, vol. 9, no. 1, article 130, 2009.
[17] T. Kato, T. Kaneko, S. Sato, Y. Nakamura, and S. Tabata, "Complete structure of the chloroplast genome of a legume, Lotus japonicus," DNA Research, vol. 7, no. 6, pp. 323–330, 2000.
[18] B. Goffinet, N. J. Wickett, O. Werner, R. M. Ros, A. J. Shaw, and C. J. Cox, "Distribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta)," Annals of Botany, vol. 99, no. 4, pp. 747–753, 2007.
[19] J. D. Palmer, J. M. Nugent, and L. A. Herbon, "Unusual structure of geranium chloroplast DNA: a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families," Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 3, pp. 769–773, 1987.
[20] S. Greiner, X. Wang, U. Rauwolf, et al., "The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution," Nucleic Acids Research, vol. 36, no. 7, pp. 2366–2378, 2008.
[21] J. J. Doyle, J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson, "Chloroplast DNA inversions and the origin of the grass family (Poaceae)," Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 16, pp. 7722–7726, 1992.
[22] F. A. Michelangeli, J. I. Davis, and D. W. Stevenson, "Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes," The American Journal of Botany, vol. 90, no. 1, pp. 93–106, 2003.
[23] E. Bortiri, D. Coleman-Derr, G. R. Lazo, O. D. Anderson, and Y. Q. Gu, "The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes," BMC Research Notes, vol. 1, article 61, 2008.
[24] C. H. Leseberg and M. R. Duvall, "The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals," Journal of Molecular Evolution, vol. 69, no. 4, pp. 311–318, 2009.
[25] H. J. Chung, J. D. Jung, H. W. Park, et al., "The complete chloroplast genome sequences of Solanum tuberosum and comparative analysis with Solanaceae species identified the presence of a 241-bp deletion in cultivated potato chloroplast DNA sequence," Plant Cell Reports, vol. 25, no. 12, pp. 1369–1379, 2006.
[26] H. Daniell, S.-B. Lee, J. Grevich, et al., "Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes," Theoretical and Applied Genetics, vol. 112, no. 8, pp. 1503–1518, 2006.
[27] S. Kahlau, S. Aspinall, J. C. Gray, and R. Bock, "Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes," Journal of Molecular Evolution, vol. 63, no. 2, pp. 194–207, 2006.
[28] M. Yukawa, T. Tsudzuki, and M. Sugiura, "The chloroplast genome of Nicotiana sylvestris and Nicotiana tomentosiformis: complete sequencing confirms that the Nicotiana sylvestris progenitor is the maternal genome donor of Nicotiana tabacum," Molecular Genetics and Genomics, vol. 275, no. 4, pp. 367–373, 2006.
[29] Y. D. Jo, J. Park, J. Kim, et al., "Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome," Plant Cell Reports, vol. 30, no. 2, pp. 217–229, 2011.
[30] C. Schmitz-Linneweber, R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier, "The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation," Molecular Biology and Evolution, vol. 19, no. 9, pp. 1602–1612, 2002.
[31] M. Kunnimalaiyaan and B. L. Nielsen, "Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA," Nucleic Acids Research, vol. 25, no. 18, pp. 3681–3686, 1997.
[32] G. Thyssen, Z. Svab, and P. Maliga, "Cell-to-cell movement of plastids in plants," Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 7, pp. 2439–2443, 2012.
[33] D. Gargano, A. Vezzi, N. Scotti, et al., "The complete nucleotide sequence genome of potato (Solanum tuberosum cv. Desiree) chloroplast DNA," in Book of Abstracts of the 2nd Solanaceae Genome Workshop, p. 107, 2005.
[34] K.-J. Kim and H.-L. Lee, "Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants," DNA Research, vol. 11, no. 4, pp. 247–261, 2004.
[35] D. A. Steane, "Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae)," DNA Research, vol. 12, no. 3, pp. 215–220, 2005.
[36] J. D. Palmer, "Cell culture and somatic cell genetics of plants," in The Molecular Biology of Plastids, R. G. Hermann, Ed., pp. 5–53, Springer, Vienna, Austria, 1991.
[37] M. Sugiura, K. Shinozaki, M. Tanaka, et al., "Split genes and cis/trans splicing in tobacco chloroplasts," in Plant Molecular Biology, D. von Wettstein and N. H. Chua, Eds., pp. 65–76, Plenum Press, New York, NY, USA, 1987.
[38] A. Vera and M. Sugiura, "A novel RNA gene in the tobacco plastid genome: its possible role in the maturation of 16S rRNA," The EMBO Journal, vol. 13, no. 9, pp. 2211–2217, 1994.
[39] M. Sugita, Z. Svab, P. Maliga, and M. Sugiura, "Targeted deletion of sprA from the tobacco plastid genome indicates that the encoded small RNA is not essential for pre-16S rRNA maturation in plastids," Molecular and General Genetics, vol. 257, no. 1, pp. 23–27, 1997.
[40] R. Lahaye, M. van der Bank, D. Bogarin, et al., "DNA barcoding the floras of biodiversity hotspots," Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 8, pp. 2923–2928, 2008.
[41] P. Taberlet, L. Gielly, G. Pautou, and J. Bouvet, "Universal primers for amplification of three non-coding regions of chloroplast DNA," Plant Molecular Biology, vol. 17, no. 5, pp. 1105–1109, 1991.
[42] X. Guo, S. Castillo-Ramírez, V. Gonzalez, et al., "Rapid evolutionary change of common bean (Phaseolus vulgaris L.) plastome and the genomic diversification of legume chloroplasts," BMC Genomics, vol. 8, article 228, 2007.
[43] R. K. Jansen, C. Kaittanis, C. Saski, et al., "Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids," BMC Evolutionary Biology, vol. 6, no. 1, article 32, 2006.
[44] M. J. Moore, P. S. Soltis, C. D. Bell, J. G. Burleigh, and D. E. Soltis, "Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots," Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4623–4628, 2010.
[45] R. K. Jansen, Z. Cai, L. A. Raubeson, et al., "Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns," Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 49, pp. 19369–19374, 2007.
[46] L. Bohs and R. G. Olmstead, "Phylogenetic relationships in Solanum (Solanaceae) based on ndhF sequences," Systematic Botany, vol. 22, no. 1, pp. 5–17, 1997.
[47] R. G. Olmstead, L. Bohs, H. A. Migid, E. Santiago-Valentin, V. F. Garcia, and S. M. Collier, "A molecular phylogeny of the Solanaceae," Taxon, vol. 57, no. 4, pp. 1159–1181, 2008.
Table 2: Continued.

| Gene (region) | Exon/intron | ABE | CAN | DST | NSY | NTA | NTO | NUN | SBU | SLY | STU |
|---|---|---|---|---|---|---|---|---|---|---|---|
| rpl16 (LSC) | Exon I | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| | Intron I | 1019 | 1026 | 1025 | 1020 | 1020 | 1021 | 1020 | 1014 | 1018 | 1014 |
| | Exon II | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 |
| rpl2 (IR) | Exon I | 391 | 391 | 393 | 391 | 391 | 391 | 391 | 390 | 391 | 391 |
| | Intron I | 664 | 665 | 669 | 666 | 666 | 666 | 666 | 666 | 666 | 666 |
| | Exon II | 434 | 434 | 429 | 434 | 434 | 434 | 434 | 435 | 434 | 434 |
| ndhB (IR) | Exon I | 777 | 777 | 777 | 777 | 777 | 777 | 777 | 777 | 777 | 777 |
| | Intron I | 679 | 679 | 679 | 679 | 679 | 679 | 679 | 679 | 679 | 679 |
| | Exon II | 756 | 756 | 756 | 756 | 756 | 756 | 756 | 756 | 756 | 756 |
| trnI-GAU (IR) | Exon I | 37 | 37 | 42 | 37 | 37 | 37 | 37 | 42 | 37 | 37 |
| | Intron I | 717 | 722 | 717 | 707 | 707 | 716 | 716 | 717 | 722 | 722 |
| | Exon II | 34 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 |
| trnA-UGC (IR) | Exon I | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 |
| | Intron I | 681 | 811 | 811 | 709 | 709 | 709 | 709 | 811 | 811 | 811 |
| | Exon II | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 |
| ndhA (SSC) | Exon I | 553 | 553 | 552 | 553 | 553 | 553 | 553 | 552 | 553 | 553 |
| | Intron I | 1150 | 1157 | 1154 | 1148 | 1148 | 1149 | 1148 | 1158 | 1133 | 1158 |
| | Exon II | 539 | 539 | 537 | 539 | 539 | 539 | 539 | 540 | 539 | 539 |

*The rps12 gene is a divided gene: 3′-rps12 is located in the IR region, while 5′-rps12 is located in the LSC region.
ABE: Atropa belladonna; CAN: Capsicum annuum; DST: Datura stramonium; NSY: Nicotiana sylvestris; NTA: Nicotiana tabacum; NTO: Nicotiana tomentosiformis; NUN: Nicotiana undulata; SBU: Solanum bulbocastanum; SLY: Solanum lycopersicum; STU: Solanum tuberosum.
Table 3: InDels in nucleotide sequences of 9 genes of ten solanaceous plastid genomes.

| S. number | Gene | Region | Total number of InDels | InDel length (bp) |
|---|---|---|---|---|
| 1 | accD | LSC | 4 | 24, 9, 141, 6 |
| 2 | clpP | LSC | 24 | 8(I), 14(I), 13(I), 7(I), 1(I), 2-3(I), 7(I), 1–7(I), 3(I), 2(I), 3(I), 1–7(I), 1–3(I), 1(I), 1(I), 1(I), 1–5(I), 4–7(I), 1(I), 9(I), 1-2(I), 3(I), 5(I), 24–30 |
| 3 | ndhA | SSC | 14 | 9(I), 5-6(I), 3(I), 1(I), 9(I), 3(I), 4(I), 1–4(I), 1-2(I), 1–23(I), 1-2(I), 2(I), 1(I), 3(I) |
| 4 | rpl32 | SSC | 2 | 2-3, 4 |
| 5 | rps16 | LSC | 11 | 1–38, 9(I), 1(I), 1(I), 5(I), 1-2(I), 5(I), 4(I), 6(I), 1(I), 9 |
| 6 | sprA | SSC | 2 | 109, 7 |
| 7 | trnA-UGC | IR | 1 | 102–130 |
| 8 | trnL-UAA | LSC | 4 | 1, 6, 71, 4 |
| 9 | ycf1 | SSC | 31 | 3, 18, 18, 21, 6, 6, 48, 9, 6, 6, 42, 3, 6, 30, 3, 15, 12–39, 18, 6, 9–36, 6, 6, 6, 9, 9, 12, 6, 6, 6, 57, 6 |

Region: location of the gene in the LSC, SSC, or IR region. (I): InDel present in an intron.
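The InDel length entries in Table 3 follow a simple pattern (a length or a length range, optionally tagged "(I)" for intronic), so the per-gene totals can be cross-checked mechanically. The sketch below parses three rows copied from the table; the regular expression and the function name `parse_cell` are illustrative choices, and ranges are written with plain hyphens.

```python
import re

# (total InDels, length entries) copied from Table 3 for three genes.
TABLE3 = {
    "clpP": (24, "8(I) 14(I) 13(I) 7(I) 1(I) 2-3(I) 7(I) 1-7(I) 3(I) 2(I) 3(I) "
                 "1-7(I) 1-3(I) 1(I) 1(I) 1(I) 1-5(I) 4-7(I) 1(I) 9(I) 1-2(I) "
                 "3(I) 5(I) 24-30"),
    "ndhA": (14, "9(I) 5-6(I) 3(I) 1(I) 9(I) 3(I) 4(I) 1-4(I) 1-2(I) 1-23(I) "
                 "1-2(I) 2(I) 1(I) 3(I)"),
    "rps16": (11, "1-38 9(I) 1(I) 1(I) 5(I) 1-2(I) 5(I) 4(I) 6(I) 1(I) 9"),
}

ENTRY = re.compile(r"^(\d+)(?:-(\d+))?(\(I\))?$")

def parse_cell(cell):
    """Return a list of (min_bp, max_bp, is_intronic) tuples for one table cell."""
    entries = []
    for token in cell.split():
        match = ENTRY.match(token)
        if match is None:
            raise ValueError(f"unrecognised entry: {token}")
        low = int(match.group(1))
        high = int(match.group(2) or low)
        entries.append((low, high, match.group(3) is not None))
    return entries

summary = {}
for gene, (stated_total, cell) in TABLE3.items():
    entries = parse_cell(cell)
    intronic = sum(1 for entry in entries if entry[2])
    summary[gene] = (len(entries), intronic, len(entries) - intronic)
    print(gene, "stated:", stated_total, "parsed:", summary[gene])
```

For these rows the parsed counts agree with the stated totals, and all 14 ndhA entries carry the intron tag, consistent with the statement that every ndhA InDel lies in the intron.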
Table 4: InDels in amino acid sequences of 5 proteins of ten solanaceous plastid genomes.

| S. number | Protein | Total number of InDels | InDel length (aa) |
|---|---|---|---|
| 1 | accD | 4 | 8, 3, 47, 2 |
| 2 | clpP | 2 | 2, 10 |
| 3 | rpl32 | 1 | 1-2 |
| 4 | rps16 | 1 | 3 |
| 5 | ycf1 | 29 | 1, 6, 6, 7, 2, 2, 7, 3, 2, 2, 14, 1–10, 1, 5, 4–13, 6, 2, 3–12, 2, 2, 2, 3, 3, 4, 2, 2, 2, 19, 2 |
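An in-frame InDel of n base pairs in an exon changes the protein by n/3 residues, so the exonic entries of Table 3 should map onto the protein entries of Table 4. The sketch below checks this for accD and for the 9 bp exonic deletion in rps16, using only values from the two tables; pairing the entries by position is an assumption of this example.

```python
# Exonic InDel lengths in bp (Table 3) and protein InDel lengths (Table 4).
nucleotide_indels = {"accD": [24, 9, 141, 6], "rps16": [9]}
protein_indels = {"accD": [8, 3, 47, 2], "rps16": [3]}

consistent = {}
for gene, lengths_bp in nucleotide_indels.items():
    # An in-frame InDel must be a multiple of 3 bp (one codon per residue).
    in_frame = all(n % 3 == 0 for n in lengths_bp)
    as_residues = [n // 3 for n in lengths_bp]
    consistent[gene] = in_frame and as_residues == protein_indels[gene]
    print(gene, as_residues, consistent[gene])
```

Both genes pass the check, which also shows that the protein InDel lengths are counted in amino acids rather than base pairs.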
Figure 1: Continued. Partial nucleotide sequence alignments of sprA, accD, ndhA, rps16, rpl32, and clpP with their numbered InDels; rows in each panel are ordered ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU.
8 Advances in Bioinformatics
AAA-GGAAAA-GGAAAA-GGAAAA-GGAAAA-GGAAAA-GGAAAAAGGAAAA-GGAAAA-GGAAAA-GGA
AAA------TGA AAA------TGA AAA------TGA AAAATCAAATGA AAAATCAAATGA AAA------TGA AAA------TGA AAA------TGA AAA------TGA AAA------TGA
ATCCGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCC-----------------------------------------------------------------------CTGAT ATCCGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGACTCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCTGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCTGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTATTTTTTCTATAAAAAATAGAAGAATTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTATTTTTTCTATAAAAAATGGAAGAGTTGGTGTGAATCGATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTAGTTTTTCTATAAAAAATCGAAGAATTGGTGTGAATCCATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTAGTTTTTCTATAAAAAATCGAAGAATTGGTGTGAATCCATTCTACATTGAAGAAAGAATCGAATATTCATTGAT ATCCGTAGTTTTTCTATAAAAAATCGAAGAATTGGTGTGAATCCATTCTACATTGAAGAAAGAATCGAATATTCATTGAT
TTT----GAATTT----GAATTTTTTTGAATTT----GAATTT----GAATTT----GAATTT----GAATTT----GAATTT----GAATTT----GAA
tRNA-Leu(UAA)
InDel 1 InDel 2 InDel 3 InDel 4
ABE CANDSTNSYNTANTONUNSBUSLYSTU
---ATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTTGTGATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTT---ATGATTTTT
GTAC------------------CTTGGTACATTCGATCTAATAAGTACCTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTGGTAC------------------CTTG
AAGC------------------CTCA AAGCTAAGAAACTGAAAGAGGCCTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA AAGC------------------CTCA
AAAG---------------------
AAAGGGTGGAAAGTGAG
GTAGAAATAGAAACTGGTAGAA------ACTGGTAGAAATAGAAACTGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACAGGTAGAAATAGAAACTGGTAGAAATTGAAACTGGTAGAAATAGAAACTG
ycf1
InDel 1 InDel 2 InDel 3 InDel 4 InDel 5
ABECANDSTNSYNTANTONUNSBUSLYSTU
CTTCTCCTTCCCTCTTCTCCTTCCCTCTTC------CCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCTCTTCTCCTTCCCT
TTTT------------------------------------------------TCGG TTTTCAAGAGGGATCCACTGAAGAAGATCCTTATCCTTCTCCTTCCCCTTTTTCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG TTTT------------------------------------------------TCGG
AACAATATTAATACTAGAAAAATATTAATACTAGAACAATATTAATACTAGAATAATATTAATACTAGAATAATATTAATACTAGAACA---------CTAGAACAATATTAATACTAAAACAATATTAATACTAGAACAATATTAATACTAGAACAATATTAATACTAG
TAA------CACTAATAATAACACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CACTAA------CAC
AGA------CCTAGA------CCTAAA------ATTAGA------CCTAGA------CCTAGA------ACTAGA------CCTAGA------CCTAGAACAAGACCTAGA------CCT
InDel 6 InDel 7 InDel 8 InDel 9 InDel 10
ABECANDSTNSYNTANTONUNSBUSLYSTU
TAAA------------------------------------------AGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGTAAGAACG TCAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGCG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGAACAGGGAAGAGTG TAAATTTGAAAGACCTTTCTTTATTTTCAGACCAAGACCAGGGAAGAGTG TAAATGTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGCG TAAATTTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGCG TAAATTTGAAGGGTCTTTCTTTATTTTCAGACCAAGAACAGAGAAGAGTG
CTGA---G CTGATGAG CTGC---G CTGA---T CTGA---T CTAA---T CTGA---T CTGA---G CTGA---G CTGA---G
GGTCAAAGAGACCAACGAGCCCGACGATACCAAAGATACCAAAGATACCAAAGATACCAAAGAG------GAG------GAG------GA
GA------------------------------GGAAGA------------------------------GGAAGA------------------------------GGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGATACTGATACCGCAGATCAAGATCAAACAAAGGAAGAGACTGATACCGCAGATCAAGATCAAACAAAGGAAGA------------------------------AGAAGA------------------------------GGAAGA------------------------------GGAA
AAGCAAAAAGCAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAGAAG---AAAG
CTCT---------------AAAACTTTTATAACTATAACTCGAAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAACTCT---------------AAAA
InDel 11 InDel 13 InDel 14 InDel 15 InDel 16
ABECANDSTNSYNTANTONUNSBUSLYSTU
CGTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAA---------------------------------------ATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCC CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAAGTCCTAATAAAACAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAACTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAATTCGAATCG CCTAATAAAA------------CAAATAATATTAAAAAACTCGAATCG
TCAAGAAAAAATTAATAATAAAAAAA TCAAGAAAAAATTAATAATAAAAACA TCAAGAAAAAATAAAAAAAAAAAAAA TCAAGAAAAAATTAATAATCAAAAAA TCAAGAAAAAATTAATAATCAAAAAA TCAA------------------AAAA TCAAGAAAAAATTAATAATCAAAAAA TCAAGAAAAAATTACTAATAAAAAAA TCAAGAAAAAATTAATAATAAAAAAA TCAAGAAAAAATTACTAATAAAAAAA
TTAT------TTCGACTTTATCTTTATTTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTAT------TTCGACTTTATCTTTATTTCGACTTTATGTTTATTTCGACTTTATCTTTATTTCGACT
CTAAAAAAGTCACTTTATAATATT---------AGTAATAAAAA ATAAAAAAGTCACTTTATAATATTTATAATATTAGTAAGAAAAA ATAA------------------------------------AAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGCCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA ATAAAAAAGTCACTTTATAATATT---------AGTAAGAAAAA
InDel 17 InDel 18 InDel 19 InDel 20
ABECANDSTNSYNTANTONUNSBUSLYSTU
AATATAAATTTAAAATAGAAACTTAAAATCGAAATTTAAAAT------TTAAAAT------TTAAAAT------TTAAAAT------TTAAAATAGAAACTTAAAATAGAAACTTAAAATAGAAACTTAA
CTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTT------CAAGCTTTGAAGTTCAAGCTTT------CAAG
ATAT------ATAGACATCTACATATAGAGAT------AGAGATAT------ATAGATAT------ATAGATAT------ATAGATAT------ATAGAGAT------ATAGAGAT------AGAGAGAT------ATAG
ATAT---------TTTG ATATAGAAAATATTTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG ATAT---------TTTG
GAGAAAATTGATAAAAAAT---------AAAAGAGAAAATTGACAAAAGATAAAATTGATAAAAGATAAAATTGATAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAAGATAAAATTGAGAAAA
AACATTAATAAAAACCAAAATATCAATAAAAACAACAAC------------CAAAACATCAATAAAAACCAAAACATCAATAAAAACCAAAACA------------AAAACATCAATAAAAACAAAAAC------------CAAAAC------------CAAAAC------------CAA
AAACAAAACTT
AACTT
------CTT
AAAAAAAACTTAAACAAACCTTAAACAAAACTTAAACAAAACTTAAACAAAACTTAAACAAAAACAAAACTTAAAAACAAAACTT
InDel 21 InDel 22 InDel 23 InDel 24 InDel 25 InDel 26 InDel 27
ABECANDSTNSYNTANTONUNSBUSLYSTU
TAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAA------GAATTAAAAATAAAGAATTAAA------GAAT
TCAAGGAGAAAGAGTCAAGGAAAAAGAGTCAAAGAGAAAGAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAAGGAGAAAAAGTCAA------AGAGTCAA------AGAGTCAA------AGAG
AAAGAATTTTGATGAATCCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAA---------------------------------------------------------TCAT AAAGAATTTTGATGAATTCATTCTGCAACCGCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAATCATTTTGATGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGACAAAAATCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT AAAGAATTTTGACGAATTCATTCTGCAACCTCAAACTCAAAGAATAAATACAGAAAAAACTCAT
TAAA------GAGATAAA------GATATAATTATAACGAGATAAA------GAGATAAA------GAGATAAA------GAGATAAA------GAGATAAA------GATATAAA------GATATAAA------GATA
InDel 28 InDel 29 InDel 30 InDel 31
ABECANDSTNSYNTANTONUNSBUSLYSTU
--AGATAATAATTGCATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATAGAACTCTCTTCGCTTTTCTGATTGATATATAAAATA-------------------------------------------------------------------------------------------------------------ATA--AGATAATAATTGAATAATTTAACGATTTACCTTTCATATTTGATATTGATTCGCTCACCAATCAATACGTAATGGAACTCTATTCGCTTTTCTGATTGATATAGAAAATA--AGATAATAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--AGATAATAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA----ATC-TAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--------TAATTGAATAATTTAACGATTTCCCTTTCATATTTGATATTGATTAGCTCACCAATCAATACGTAATGGAACTCGCTTCGCTTTTCTGATTGATAGATAAAATA--------TAATTGAATAATTGAACGATTTACCTTTCCTATTTGATATTGATTAGCTCACCAATCCATATGTAACGGAACTCTATTCGCTTTTCTGATTGATATATAAAATATTCTTCAATTATTATCTAATTGAACAATTTCCCTTTCCTCTTTGATATTGATTAGCTCACCAATCCATATATAACATAACTCTATTCGCTTTTCTGATTGATATAGAAAATA--AGATAATAATTGAATAATTGAACGATTTACCTTTCCTATTTGATATTGATTAGCTCACCAATCCATATGTAACGGAACTCTATTCGCTTTTCTGATTGATATATAAAATA
InDel 1
ABECANDSTNSYNTANTONUNSBUSLYSTU
ATT----------------------------------------------------------------------------------------------------------------------------------ACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAAGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAAGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTA------------------------------------------------------------------------------------------------------GTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACCATTCTTAAAACCAGCGTTCTTAAGACCAAAGAGTCGGGCGGAAGGGGGGGAAAGCCCTCCGTTCCTGGTTCTCCTGTAGTTGGATCCTCCGGAACCACAAGAATCCTTAGTTAGAATGGGATTCCAACTCAGCACC
tRNA-Ala(UGC)
InDel 1
ABECANDSTNSYNTANTONUNSBUSLYSTU
InDel 12
AAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGAGGAAAGTGAGGAAGAAAGAGAT
AGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGAT
GAAGAAAAAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGATAAAGGGTGGAAAGTGAGGAAGAAAGAGAT
Figure 1 Partial multiple sequence alignment of accD clpP ndhA rpl32 rps16 sprA tRNA-Ala (UGC) tRNA-Leu(UAA) and ycf1 genesequences of ten solanaceous species showing location of InDels indicated by hyphens
Advances in Bioinformatics 9
Figure 2: Partial multiple sequence alignment of amino acid sequences of genes, namely, accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala(UGC), tRNA-Leu(UAA), and ycf1, of ten solanaceous species, showing the location of InDels (indicated by hyphens). [Alignment panels not reproduced; the surviving panel titles are accD, ycf1, clpP, rpl32, and rps16, and rows are labelled ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU.]
protein sequences, as shown in Figure 2. The accD gene has been reported to be one of the most variable plastid genes and is probably under diversifying selection [26].
(2) clpP. In the clpP gene, InDels were found in both intron and exon regions. The InDels in the exon regions had two major consequences. First, an insertion of 6 bp in S. bulbocastanum and S. tuberosum and of 30 bp in S. lycopersicum at the 3′ end (exon 3) of the clpP gene shifted the stop codon downstream by 6, 6, and 30 bp in the respective species compared with the other species of the Solanaceae family, increasing the length of the coding sequence and of the protein product (Figures 1 and 2). Second, InDel 1 in the protein sequence corresponds to the insertion of a repeat of "I" amino acids in D. stramonium, making the exon 3 region longer by 6 bp. This region, however, corresponds to the end of intron 2 of the clpP gene in all other species. Since the D. stramonium chloroplast genome has been sequenced only recently, this observation needs to be experimentally validated.
(3) ndhA. All the InDels found in ndhA were located in introns, while the protein-coding regions (exons) were highly conserved. This indicates high diversifying selection on the intronic region of this gene. Out of the total of 14 InDels, most were observed with respect to C. annuum (InDels 1, 2, 5, 7, and 10). InDel 10 was shared in full by C. annuum and S. lycopersicum and in part by C. annuum and A. belladonna.
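Locating InDels and deciding whether they fall in introns or exons can be sketched as a scan for runs of gap characters in an aligned sequence. The following Python fragment is only an illustration: the aligned row and the exon coordinates are hypothetical (they are not taken from the ndhA alignment), and the function names `find_gap_runs` and `region_of` are assumptions introduced here.

```python
import re

def find_gap_runs(aligned_seq):
    """Return (start, length) for every run of '-' in an aligned sequence.

    Positions are 0-based alignment columns.
    """
    return [(m.start(), len(m.group())) for m in re.finditer(r"-+", aligned_seq)]

def region_of(column, exons):
    """Label an alignment column as 'exon' or 'intron'.

    `exons` is a list of half-open (start, end) column intervals; every
    column outside them is treated as intronic in this toy model.
    """
    return "exon" if any(s <= column < e for s, e in exons) else "intron"

# Hypothetical aligned row and exon layout, for illustration only.
row = "ATGGCT----TTAGGA------CCATGA"
exons = [(0, 6), (22, 28)]

for start, length in find_gap_runs(row):
    print(start, length, region_of(start, exons))
```

In this toy layout both gap runs lie between the two exon intervals, which mirrors the situation described for ndhA, where the exons carry no InDels.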
(4) rpl32. In rpl32, insertions of 1 bp in D. stramonium and 3 bp in C. annuum were found in the 3′ region of the gene, while a deletion of 4 bp was observed in D. stramonium. The 3-bp insertion in C. annuum only altered the length of the protein, making it longer by 1 amino acid. However, the 1-bp insertion in D. stramonium proved to be a frameshift mutation, resulting in three changes in the amino acid sequence near the C-terminus. Moreover, the deletion of 4 bp at the 3′ end resulted in premature termination of protein synthesis. Together, the frameshift mutation and the 3′-end deletion reduced the length of the gene product by 1 amino acid. As the C-terminus of the amino acid chain is well conserved in all the other species, the effect of the above-mentioned variations needs to be validated experimentally.
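The contrast between a 3-bp insertion, which adds one amino acid, and a 1-bp insertion, which shifts the reading frame, can be illustrated with a toy coding sequence. The sketch below is not the rpl32 sequence: the sequences are invented, the codon table is the standard genetic code (assumed here to be adequate for illustration), and `translate` is a helper written for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(cds):
    """Translate a coding sequence up to the first stop codon.

    A trailing incomplete codon is ignored.
    """
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical sequences, for illustration only.
reference = "ATGAAACGTGGATAA"
ins_3bp = reference[:6] + "GCT" + reference[6:]   # in-frame insertion
ins_1bp = reference[:6] + "A" + reference[6:]     # frameshift insertion

print(translate(reference))  # reference protein
print(translate(ins_3bp))    # one extra residue, downstream residues unchanged
print(translate(ins_1bp))    # residues after the insertion change
```

In the in-frame case the residues downstream of the insertion are preserved, whereas after the 1-bp insertion every downstream codon is read differently and the original stop codon is no longer in frame.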
(5) rps16. In rps16 also, InDels were observed in introns as well as exons. Five of the major insertions in the intron regions were species specific. Insertions of 38 bp (InDel 1), 9 bp (InDel 2), 5 bp (InDel 7), 4 bp (InDel 8), and 6 bp (InDel 9) were observed in A. belladonna, S. lycopersicum, D. stramonium, and C. annuum. A deletion of 5 bp was observed in all three Solanum species and in C. annuum. A deletion of 9 bp was observed in all Nicotiana species, resulting in an amino acid change (P to S) and a shortening of the protein by three amino acids in the C-terminal region. A similar deletion was also observed by Kahlau et al. [27] and was suggested to be functionally neutral.
(6) sprA. The sprA gene has been reported to encode a stable noncoding RNA of unknown function. This gene has been suggested to influence 16S rRNA maturation [38, 39]. In many species, this gene seems to be present as a remnant and shows large variations in its 5′ region. The largest deletion, of 109 bp, was observed in C. annuum. The rest of this gene appears to be more conserved, with a deletion towards the 3′ end in all Nicotiana species and A. belladonna. The manner in which this gene functions and the consequences of the above-mentioned variations are yet to be investigated experimentally.
(7) trnA-UGC. In this gene, a long deletion of 102 bp was observed in all Nicotiana species. Interestingly, this deletion was further extended to 130 bp, in both directions, in A. belladonna. These deletions were found in the intron region and so are unlikely to have any negative impact on gene product function.
(8) trnL-UAA. The trend of variation in trnL-UAA was similar to that in ndhA, as all InDels were observed in introns. The longest species-specific deletion (InDel 3) was observed in C. annuum, whereas a short insertion of four nucleotides, a repeat of "T", was observed specifically in D. stramonium. Another insertion, of 6 bp, was observed in two Nicotiana species, that is, N. sylvestris and N. tabacum.
(9) ycf1. Many InDels (3′ region) were found in the fastest evolving gene, that is, the ycf1 gene. Most of the InDels were found to be species specific. The maximum number of InDels (InDels 2, 3, 5, 7, 9, 16, 17, 19, 23, 24, 25, and 30) was observed in C. annuum, followed by D. stramonium (InDels 4, 6, 20, 26, and 31), N. tomentosiformis (InDels 8, 17, and 18), and S. lycopersicum (InDels 16, 22, and 28). Two genus-specific InDels (InDels 14 and 21) were observed in all four Nicotiana species. However, InDel 19 was also present in D. stramonium. Another genus-specific InDel (InDel 29) was observed in the Solanum species. Most of the InDels altered the length of the gene product, with the maximum length of 1906 amino acids (aa) observed in C. annuum and the minimum of 1873 aa observed in D. stramonium. Among the Solanum species, the length of the protein (1887 aa) was conserved between S. bulbocastanum and S. tuberosum. However, S. lycopersicum had an amino acid sequence of 1891 aa, longer by 4 aa than that of the other two species of the same genus. Among the four Nicotiana species, the ycf1 gene product length was nearly conserved among N. sylvestris, N. tabacum, and N. undulata, with protein lengths of 1901 aa, 1902 aa, and 1901 aa, respectively. However, N. tomentosiformis was observed to be the most variable member of the genus Nicotiana, with a protein length of 1892 aa.
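The distinction between species-specific, genus-specific, and shared InDels can be expressed as a simple set comparison. The sketch below assumes that the three-letter labels used in Figures 1 and 2 abbreviate the ten species named in the abstract; the carrier sets passed to `classify_indel` are hypothetical examples and are not the InDel assignments reported above.

```python
# Genus of each taxon label (mapping inferred from the species list; an assumption).
GENUS_OF = {
    "ABE": "Atropa", "CAN": "Capsicum", "DST": "Datura",
    "NSY": "Nicotiana", "NTA": "Nicotiana", "NTO": "Nicotiana", "NUN": "Nicotiana",
    "SBU": "Solanum", "SLY": "Solanum", "STU": "Solanum",
}

def classify_indel(carriers):
    """Classify an InDel by the set of taxa that carry it.

    - one carrier: 'species-specific'
    - exactly all sampled members of one genus: 'genus-specific'
    - anything else: 'shared'
    """
    carriers = set(carriers)
    if len(carriers) == 1:
        return "species-specific"
    genera = {GENUS_OF[c] for c in carriers}
    if len(genera) == 1:
        genus = next(iter(genera))
        members = {c for c, g in GENUS_OF.items() if g == genus}
        if carriers == members:
            return "genus-specific"
    return "shared"

# Hypothetical carrier sets, for illustration only.
examples = {
    "indel_a": {"CAN"},
    "indel_b": {"NSY", "NTA", "NTO", "NUN"},
    "indel_c": {"CAN", "SLY"},
}
for name, carriers in examples.items():
    print(name, classify_indel(carriers))
```

Under this definition, an InDel listed for more than one genus would be labelled "shared" rather than species specific.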
3.5. Phylogenetic Analysis of Solanaceous Plastomes. Evolutionary relationships between diverse plant species have been analyzed using several plastome markers, including matK and rbcL (genes) or trnH-psbA and trnL-trnF (intergenic regions), owing to their sequence conservation among plant taxa blended with suitable variation [40, 41]. However, determination of phylogeny based on single gene sequences may be inaccurate [42]. The availability of complete chloroplast sequences for many species has made it possible to use many individual genes or concatenated gene sequences to deduce phylogenetic relationships among taxa [42–44].
Efforts have been made to carry out phylogenetic analysis of solanaceous species using complete plastome sequences by Moore et al. [44] and Jansen et al. [45]. The evolutionary positions of Capsicum and Datura in Solanaceae have been determined using a single or a few plastid genes [46, 47]. Recently, concatenated protein-coding gene sequences from completely sequenced plastomes were used to obtain reasonable phylogenetic relationships for solanaceous species [29]. In the present investigation, we used a similar approach to analyze the phylogenetic relationships of ten solanaceous species with completely sequenced plastomes. Individual multiple sequence alignments were concatenated for maximum likelihood phylogenetic tree generation. As depicted in Figure 3, the taxa were divided into two clades with a bootstrap value of 100% (500 of 500). The first clade consisted of the four Nicotiana species, while the species of Solanum, Capsicum, Atropa, and Datura were included in the second clade. These results are in line with previous phylogenetic analyses using concatenated protein-coding gene sequences, as well as with phylogenetic analyses using plastid ndhF and trnL-F sequences [29, 47]. However, an analysis of 13 ORFs of solanaceous plastomes showed a different arrangement, in which Atropa was separated from Solanum and grouped together with Nicotiana [25].
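The concatenation step can be sketched as joining the per-gene aligned sequences of each taxon end to end. The fragment below is a minimal illustration and not the pipeline used in this study: the gene names, taxon labels, and sequences are hypothetical, and padding a taxon that is missing from one gene with gap characters is a design choice made here for the example.

```python
def concatenate_alignments(gene_alignments, taxa):
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments: dict mapping gene name -> {taxon: aligned sequence}
    taxa: ordered list of taxon labels to include
    A taxon missing from a gene is padded with '-' for that gene's width.
    """
    parts = {t: [] for t in taxa}
    for gene, aln in gene_alignments.items():
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"sequences in {gene} are not aligned to equal length")
        width = widths.pop()
        for t in taxa:
            parts[t].append(aln.get(t, "-" * width))
    return {t: "".join(p) for t, p in parts.items()}

# Hypothetical toy alignments, for illustration only.
genes = {
    "geneA": {"X": "ATG-", "Y": "ATGC"},
    "geneB": {"X": "TT", "Y": "T-"},
    "geneC": {"X": "AAA"},  # taxon Y lacks this gene
}
matrix = concatenate_alignments(genes, ["X", "Y"])
for taxon, seq in matrix.items():
    print(taxon, seq)
```

Every row of the resulting matrix has the same length, which is what tree-building programs generally expect of a concatenated alignment.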
Figure 3: Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species. [Tree not reproduced; tip labels are CAR, DCA, NTO, NUN, NSY, NTA, ABE, DST, CAN, SLY, SBU, and STU.]

4. Conclusions

The analyses of complete plastid genomes of ten solanaceous species revealed overall similarity in terms of gene content and organization. The sizes of the LSC, SSC, and IR regions were found to be somewhat conserved among species, but significant variation was found between genera. Most of the coding regions were well conserved. However, many of the features in a few genes were observed to be typical of a particular genus, and even of a particular species, and can be used as molecular markers to investigate genetic diversity and evolution. These typical variations can be further utilized to develop more sophisticated DNA barcoding based techniques. The ten solanaceous species were divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from the coding regions of the plastomes. The first clade consisted of the four Nicotiana species, and the second clade consisted of the species of Solanum, Capsicum, Atropa, and Datura.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The authors are thankful to the University Grants Commission, New Delhi, for providing a MANF fellowship to Harpreet Kaur.
References
[1] M. G. Bausher, N. D. Singh, S.-B. Lee, R. K. Jansen, and H. Daniell, “The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var. “Ridge Pineapple”: organization and phylogenetic relationships to other angiosperms,” BMC Plant Biology, vol. 6, no. 1, article 21, 2006.
[2] Z. Y. Hu, W. Hua, S. M. Huang, and H. Z. Wang, “Complete chloroplast genome sequence of rapeseed (Brassica napus L.) and its evolutionary implications,” Genetic Resources and Crop Evolution, vol. 58, no. 6, pp. 875–887, 2011.
[3] J. D. Palmer, “Comparative organization of chloroplast genomes,” Annual Review of Genetics, vol. 19, no. 1, pp. 325–354, 1985.
[4] R. G. Olmstead and J. D. Palmer, “Chloroplast DNA systematics: a review of methods and data analysis,” The American Journal of Botany, vol. 81, no. 9, pp. 1205–1224, 1994.
[5] R. A. Bungard, “Photosynthetic evolution in parasitic plants: insight from the chloroplast genome,” BioEssays, vol. 26, no. 3, pp. 235–247, 2004.
[6] L. A. Raubeson and R. K. Jansen, “Chloroplast genomes of plants,” in Diversity and Evolution of Plants-Genotypic and Phenotypic Variation in Higher Plants, R. Henry, Ed., pp. 45–68, CABI Publishing, Wallingford, UK, 2005.
[7] R. C. Haberle, H. M. Fourcade, J. L. Boore, and R. K. Jansen, “Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes,” Journal of Molecular Evolution, vol. 66, no. 4, pp. 350–361, 2008.
[8] W. Martin and R. G. Herrmann, “Gene transfer from organelles to the nucleus: how much, what happens, and why?” Plant Physiology, vol. 118, no. 1, pp. 9–17, 1998.
[9] H. L. Race, R. G. Herrmann, and W. Martin, “Why have organelles retained genomes?” Trends in Genetics, vol. 15, no. 9, pp. 364–370, 1999.
[10] T. Wakasugi, T. Tsudzuki, and M. Sugiura, “The genomics of land plant chloroplasts: gene content and alteration of genomic information by RNA editing,” Photosynthesis Research, vol. 70, no. 1, pp. 107–118, 2001.
[11] Y.-K. Kim, C.-W. Park, and K.-J. Kim, “Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications,” Molecules and Cells, vol. 27, no. 3, pp. 365–381, 2009.
[12] A. Khan, I. A. Khan, H. Asif, and M. K. Azim, “Current trends in chloroplast genome research,” African Journal of Biotechnology, vol. 9, no. 24, pp. 3494–3500, 2010.
[13] H. Shimada and M. Sugiura, “Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes,” Nucleic Acids Research, vol. 19, no. 5, pp. 983–995, 1991.
[14] M. Sugiura, “The chloroplast genome,” Plant Molecular Biology, vol. 19, no. 1, pp. 149–168, 1992.
[15] M. E. Cosner, R. K. Jansen, J. D. Palmer, and S. R. Downie, “The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families,” Current Genetics, vol. 31, no. 5, pp. 419–429, 1997.
[16] L. Gao, X. Yi, Y.-X. Yang, Y.-J. Su, and T. Wang, “Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes,” BMC Evolutionary Biology, vol. 9, no. 1, article 130, 2009.
[17] T Kato T Kaneko S Sato Y Nakamura and S TabataldquoComplete structure of the chloroplast genome of a legumeLotus japonicusrdquoDNA Research vol 7 no 6 pp 323ndash330 2000
[18] B Goffinet N J Wickett O Werner R M Ros A J Shaw andC J Cox ldquoDistribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta)rdquoAnnals of Botany vol 99 no 4 pp 747ndash753 2007
[19] J. D. Palmer, J. M. Nugent, and L. A. Herbon, “Unusual structure of geranium chloroplast DNA: a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families,” Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 3, pp. 769–773, 1987.
[20] S. Greiner, X. Wang, U. Rauwolf, et al., “The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution,” Nucleic Acids Research, vol. 36, no. 7, pp. 2366–2378, 2008.
[21] J. J. Doyle, J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson, “Chloroplast DNA inversions and the origin of the grass family (Poaceae),” Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 16, pp. 7722–7726, 1992.
[22] F. A. Michelangeli, J. I. Davis, and D. W. Stevenson, “Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes,” The American Journal of Botany, vol. 90, no. 1, pp. 93–106, 2003.
[23] E. Bortiri, D. Coleman-Derr, G. R. Lazo, O. D. Anderson, and Y. Q. Gu, “The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes,” BMC Research Notes, vol. 1, article 61, 2008.
[24] C. H. Leseberg and M. R. Duvall, “The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals,” Journal of Molecular Evolution, vol. 69, no. 4, pp. 311–318, 2009.
[25] H. J. Chung, J. D. Jung, H. W. Park, et al., “The complete chloroplast genome sequences of Solanum tuberosum and comparative analysis with Solanaceae species identified the presence of a 241-bp deletion in cultivated potato chloroplast DNA sequence,” Plant Cell Reports, vol. 25, no. 12, pp. 1369–1379, 2006.
[26] H. Daniell, S.-B. Lee, J. Grevich, et al., “Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes,” Theoretical and Applied Genetics, vol. 112, no. 8, pp. 1503–1518, 2006.
[27] S. Kahlau, S. Aspinall, J. C. Gray, and R. Bock, “Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes,” Journal of Molecular Evolution, vol. 63, no. 2, pp. 194–207, 2006.
[28] M. Yukawa, T. Tsudzuki, and M. Sugiura, “The chloroplast genome of Nicotiana sylvestris and Nicotiana tomentosiformis: complete sequencing confirms that the Nicotiana sylvestris progenitor is the maternal genome donor of Nicotiana tabacum,” Molecular Genetics and Genomics, vol. 275, no. 4, pp. 367–373, 2006.
[29] Y. D. Jo, J. Park, J. Kim, et al., “Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome,” Plant Cell Reports, vol. 30, no. 2, pp. 217–229, 2011.
[30] C. Schmitz-Linneweber, R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier, “The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation,” Molecular Biology and Evolution, vol. 19, no. 9, pp. 1602–1612, 2002.
[31] M. Kunnimalaiyaan and B. L. Nielsen, “Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA,” Nucleic Acids Research, vol. 25, no. 18, pp. 3681–3686, 1997.
[32] G. Thyssen, Z. Svab, and P. Maliga, “Cell-to-cell movement of plastids in plants,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 7, pp. 2439–2443, 2012.
[33] D. Gargano, A. Vezzi, N. Scotti, et al., “The complete nucleotide sequence genome of potato (Solanum tuberosum cv. Desiree) chloroplast DNA,” in Book of Abstracts of the 2nd Solanaceae Genome Workshop, p. 107, 2005.
[34] K.-J. Kim and H.-L. Lee, “Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants,” DNA Research, vol. 11, no. 4, pp. 247–261, 2004.
[35] D. A. Steane, “Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae),” DNA Research, vol. 12, no. 3, pp. 215–220, 2005.
[36] J. D. Palmer, “Cell culture and somatic cell genetics of plants,” in The Molecular Biology of Plastids, R. G. Hermann, Ed., pp. 5–53, Springer, Vienna, Austria, 1991.
[37] M. Sugiura, K. Shinozaki, M. Tanaka, et al., “Split genes and cis/trans splicing in tobacco chloroplasts,” in Plant Molecular Biology, D. von Wettstein and N. H. Chua, Eds., pp. 65–76, Plenum Press, New York, NY, USA, 1987.
[38] A. Vera and M. Sugiura, “A novel RNA gene in the tobacco plastid genome: its possible role in the maturation of 16S rRNA,” The EMBO Journal, vol. 13, no. 9, pp. 2211–2217, 1994.
[39] M. Sugita, Z. Svab, P. Maliga, and M. Sugiura, “Targeted deletion of sprA from the tobacco plastid genome indicates that the encoded small RNA is not essential for pre-16S rRNA maturation in plastids,” Molecular and General Genetics, vol. 257, no. 1, pp. 23–27, 1997.
[40] R. Lahaye, M. van der Bank, D. Bogarin, et al., “DNA barcoding the floras of biodiversity hotspots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 8, pp. 2923–2928, 2008.
[41] P. Taberlet, L. Gielly, G. Pautou, and J. Bouvet, “Universal primers for amplification of three non-coding regions of chloroplast DNA,” Plant Molecular Biology, vol. 17, no. 5, pp. 1105–1109, 1991.
[42] X. Guo, S. Castillo-Ramírez, V. González, et al., “Rapid evolutionary change of common bean (Phaseolus vulgaris L.) plastome and the genomic diversification of legume chloroplasts,” BMC Genomics, vol. 8, article 228, 2007.
[43] R. K. Jansen, C. Kaittanis, C. Saski, et al., “Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids,” BMC Evolutionary Biology, vol. 6, no. 1, article 32, 2006.
[44] M. J. Moore, P. S. Soltis, C. D. Bell, J. G. Burleigh, and D. E. Soltis, “Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4623–4628, 2010.
[45] R. K. Jansen, Z. Cai, L. A. Raubeson, et al., “Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 49, pp. 19369–19374, 2007.
[46] L. Bohs and R. G. Olmstead, “Phylogenetic relationships in Solanum (Solanaceae) based on ndhF sequences,” Systematic Botany, vol. 22, no. 1, pp. 5–17, 1997.
[47] R. G. Olmstead, L. Bohs, H. A. Migid, E. Santiago-Valentin, V. F. Garcia, and S. M. Collier, “A molecular phylogeny of the Solanaceae,” Taxon, vol. 57, no. 4, pp. 1159–1181, 2008.
Figure 1: Partial multiple sequence alignment of accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala(UGC), tRNA-Leu(UAA), and ycf1 gene sequences of ten solanaceous species, showing the location of InDels (indicated by hyphens). Alignment rows in each panel are labelled ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU; InDels are numbered separately within each gene panel.
Advances in Bioinformatics 9
INQEKINNKKKIINQEKINNKNKIINQEKIKKKKKIINQEKINNQKKIINQEKINNQKKIINQ------KKIINQEKINNQKKIINQEKITNKKKIINQEKINNKKKIINQEKITNKKKI
HFI--STL NFIFISTI HFI--STI HFI--STI HFI--STI HFI--STI HFI--STI HFIFISTI HFMFISTI HFIFISTI
LKKSLY---NISNKNSHIIKKSLYNIYNISKKNSHSIKN------------SHIIKKSLY---NISKKNSHIIKKSLY---NISKKNSHIIKKSLY---NISKKNSHIIKKPLY---NISKKNSHIIKKSLY---NISKKNSHIIKKSLY---NISKKNSHIIKKSLY---NISKKNSHI
QNKNINPFQNKNRNPFQNKNRNPFQNR--NPFQNR--NPFQNR--NPFQNR--NPFQNKNRNPFQNKNRNPFKNKNRNPF
NPF--QVNNPF--QVNNPF--QVSNPF--QVNNPF--QVNNPF--QVNNPF--QVNNPF--QVKNPFEVQVKNPF--QVK
LYI--ERNLHLHIERNLDR--ERNLYI--ERNLYI--ERNLYI--ERNLYI--ERNLDI--ERNLDR--ERNLDI--ERN
LDR---KYFLDRKYRKYFLDR---KYFLDR---KYFLDR---KYFLDR---KYFLDR---KYFLDR---KYFLDR---KYFLDR---KYF
IIEKIDKKGIIN---KKGIIEKIDKKGIIDKIDKKGIIDKIDKKGIIDKIEKKGIIDKIEKKGIIDKIEKKGIIDKIEKKGIIDKIEKKG
InDel 16 InDel 17 InDel 18 InDel 19 InDel 20 InDel 21 InDel 22 InDel 23
QNKNINKNQKNQNINKNNQNK----NQQNKNINKNQQNKNINKNQQNK----NKQNKNINKNKQNK----NQQNK----NQQNK----NQ
NQKQNFFNNKKNFFNQKQTFFNQKQNFFNQKQNFFNKKQNFFNKKQNFFNQKQNFFNQ--NFFNQKQNFF
SNK--KIKSNK--KIKSNK--KIKSNK--KIKSNK--KIKSNK--KIKSNK--KIKSNK--KIKSNKKIKIKSNK--KIK
PANQGERERPASQGKRERPASQRERERPASQGEKERPASQGEKERPASQGEKERPASQGEKERPPSQ--RERPPSQ--RERPPSQ--RER
DKKNFDESILQPQTQRINTDKNHFDDKN-------------------HFDDKKNFDEFILQPQTQRINTDKNHFDDKKNFDEFILQPQTQRINTDKNHFDDKKNFDEFILQPQTQRINTDKNHFDDKKNFDEFILQPQTQRINTDKNHFDDKNHFDEFILQPQTQRINTDKNHFDDKKNFDEFILQPQTQRINTEKTHFDDKKNFDEFILQPQTQRINTEKTHFGDKKNFDEFILQPQTQRINTEKTHFD
PLY--KEKPLY--KDKPLYNYNEKPLY--KEKPLY--KEKPLY--KEKPLY--KEKPLY--KDKPLY--KDKPLY--KDK
InDel 24 InDel 25 InDel 26 InDel 27 InDel 28 InDel 29
NIH--------SCSNIP--------SCSNIHSWGNPDNSSCSNIHSWRNRDNSSCSNIHSWRNRDNSSCSNIHSWRNRDNSSCSNIHSWRNRDNSSCSNIH--------SCSNIH--------SCSNIH--------SCS
YYN---SYMYYN---SYMYYN---SYMYYN---SYMYYN---SYMYYN---SYMYYN---SYMYYNSYMSYMYYNSYMSYMYYNSYMSYM
[Figure 2 alignment panels (amino acid rows not recoverable from the extracted text). Panel titles recovered: accD, ycf1, clpP, rpl32, and rps16; InDel labels 1 to 15. Rows in each panel are labelled ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU.]
Figure 2: Partial multiple sequence alignment of amino acid sequences of genes, namely, accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala(UGC), tRNA-Leu(UAA), and ycf1, of ten solanaceous species, showing location of InDels indicated by hyphens.
protein sequences as shown in Figure 2. The accD gene has been reported to be one of the most variable plastid genes and is probably under diversifying selection [26].
(2) clpP. In the clpP gene, InDels were found both in intron and in exon regions. Two major consequences were observed for the InDels in the exon regions. An insertion of 6 bp in S. bulbocastanum and S. tuberosum and of 30 bp in S. lycopersicum at the 3′ end (exon 3) of the clpP gene resulted in shifting of the stop codon by 6, 6, and 30 bp downstream in the respective species compared to other species of the Solanaceae family, increasing the length of the coding sequence and the protein product (Figures 1 and 2). An interesting feature was observed as InDel 1 in the protein sequence, corresponding to insertion of a repeat of “I” amino acids in D. stramonium, making the exon 3 region longer by 6 bp. This region, however, corresponds to the end of intron 2 in the clpP gene in all other species. Since the D. stramonium chloroplast genome has been sequenced recently, this observation needs to be experimentally validated.
(3) ndhA. All the InDels found in ndhA were present in introns, while the protein-coding regions (exons) were highly conserved. This indicates high diversifying selection on the intronic region of this gene. Out of the total 14 InDels, most were observed with respect to C. annuum (InDels 1, 2, 5, 7, and 10). InDel 10 was observed to be shared by C. annuum and S. lycopersicum in full and by C. annuum and A. belladonna in part.
(4) rpl32. In rpl32, an insertion of 1 bp in D. stramonium and of 3 bp in C. annuum was found in the 3′ region of the gene, while a deletion of 4 bp was observed in D. stramonium. The insertion of 3 bp in C. annuum only altered the length of the protein, making it longer by 1 amino acid. However, the small insertion of 1 bp in D. stramonium proved to be a frameshift mutation, resulting in three changes in the amino acid sequence near the C-terminus. Moreover, the deletion of 4 bp at the 3′ end resulted in premature termination of protein synthesis. The frameshift mutation and the 3′ end deletion finally reduced the gene product length by 1 amino acid. As the C-terminal of the amino acid chain is well conserved in all the other species, the effect of the abovementioned variations needs to be validated experimentally.
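The distinction drawn above between a 3 bp insertion (one extra amino acid) and a 1 bp insertion (frameshift) follows from the triplet reading frame. The sketch below illustrates that rule only; the function names and the toy sequence are hypothetical and are not taken from the rpl32 gene, and the InDel sizes in the loop are the ones quoted in the text.

```python
def indel_effect(length_bp):
    """Classify an InDel inside a coding region by its length alone."""
    if length_bp <= 0:
        raise ValueError("InDel length must be a positive number of base pairs")
    if length_bp % 3 == 0:
        # Whole codons are gained or lost, so the downstream frame is preserved.
        return f"in-frame ({length_bp // 3} codon(s))"
    # Any other length shifts every downstream codon.
    return "frameshift"


def codons(seq):
    """Split a sequence into complete codons, ignoring 1-2 trailing bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# InDel sizes quoted in the text for rpl32.
for species, kind, size in [
    ("C. annuum", "insertion", 3),
    ("D. stramonium", "insertion", 1),
    ("D. stramonium", "deletion", 4),
]:
    print(f"{species}: {size} bp {kind} -> {indel_effect(size)}")

# Toy (hypothetical) coding sequence: a single inserted base changes
# every codon downstream of the insertion point.
toy = "ATGGCTAAAGGT"
mutated = toy[:6] + "C" + toy[6:]
print(codons(toy))
print(codons(mutated))
```

Taken individually, the 1 bp insertion and the 4 bp deletion in D. stramonium are both frame-shifting, but their net effect is a loss of 3 bp, which is at least consistent with the reported product being one amino acid shorter.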
(5) rps16. In rps16 also, InDels were observed in introns as well as exons. Five of the major insertions in the intron regions were species specific. Insertions of 38 bp (InDel 1), 9 bp (InDel 2), 5 bp (InDel 7), 4 bp (InDel 8), and 6 bp (InDel 9) were observed in A. belladonna, S. lycopersicum, D. stramonium, and C. annuum. A deletion of 5 bp was observed in all the three Solanum species and C. annuum. A deletion of 9 bp was observed in all Nicotiana species, resulting in an amino acid change (P to S) and shortening of the protein by three amino acids in the C-terminal region. A similar deletion has also been observed by Kahlau et al. [27] and was suggested to be functionally neutral.
(6) sprA. The sprA gene has been reported as a stable noncoding RNA of unknown function. This gene has been suggested to influence 16S rRNA maturation [38, 39]. In many species, this gene seems to be present as a remnant and shows large variations in its 5′ region. The largest deletion, of 109 bp, was observed in C. annuum. The rest of this gene appears to be more conserved, with a deletion towards the 3′ end in all Nicotiana species and A. belladonna. The manner in which this gene functions and the consequences of the abovementioned variations are yet to be investigated experimentally.
(7) trnA-UGC. In this particular gene, a long deletion of 102 bp was observed in all Nicotiana species. Interestingly, this deletion was further extended to 130 bp in both directions in A. belladonna. These deletions were found in the intron region and so are unlikely to have any negative impact on gene product function.
(8) trnL-UAA. The trend of variation in trnL-UAA was similar to that in ndhA, as all InDels were observed in introns. The longest species-specific deletion (InDel 3) was observed in C. annuum, whereas a short insertion of four nucleotides, a repeat of “T”, was observed specifically in D. stramonium. Another insertion of 6 bp was observed in two Nicotiana species, that is, N. sylvestris and N. tabacum.
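Because InDels are shown in the alignments as runs of hyphens, their positions and lengths can be read off an aligned row mechanically. The following is a minimal sketch of that idea, not the procedure used in the study; `gap_runs` is a hypothetical helper, and the two rows are short fragments transcribed from the tRNA-Leu(UAA) panel of Figure 1, with the species assignment assuming that the rows follow the order of the species labels.

```python
import re


def gap_runs(aligned_row):
    """Return (start, length) for each run of '-' in one aligned row (0-based start)."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"-+", aligned_row)]


# Fragments around the four-nucleotide "T" repeat; the species assignment is an assumption.
rows = {
    "ABE": "TTT----GAA",
    "DST": "TTTTTTTGAA",
}

for species, row in rows.items():
    runs = gap_runs(row)
    if runs:
        for start, length in runs:
            print(f"{species}: gap of {length} bp starting at column {start}")
    else:
        print(f"{species}: no gap in this window (carries the inserted bases)")
```

A row without hyphens in a window where the other rows are gapped is the one carrying the insertion, which is how a species-specific insertion such as the “T” repeat shows up in the alignment.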
(9) ycf1. Many InDels (3′ region) were found in the fastest evolving gene, that is, the ycf1 gene. Most of the InDels were found to be species specific. Maximum InDels (InDels 2, 3, 5, 7, 9, 16, 17, 19, 23, 24, 25, and 30) were observed in C. annuum, followed by D. stramonium (InDels 4, 6, 20, 26, and 31), N. tomentosiformis (InDels 8, 17, and 18), and S. lycopersicum (InDels 16, 22, and 28). Two genus-specific InDels (InDels 14 and 21) were observed in all the four Nicotiana species. However, InDel 19 was also present in D. stramonium. Another genus-specific InDel (InDel 29) was observed in Solanum species. Most of the InDels altered the length of the gene product, with the maximum length of 1906 amino acids (aa) observed in C. annuum and the minimum of 1873 aa observed in D. stramonium. Among the Solanum species, the length of the protein (1887 aa) was conserved between S. bulbocastanum and S. tuberosum. However, S. lycopersicum had an amino acid sequence of 1891 aa, larger by 4 aa than those of the other two species of the same genus. Among the four Nicotiana species, the ycf1 gene product length was conserved among N. sylvestris, N. tabacum, and N. undulata, with protein lengths of 1901 aa, 1902 aa, and 1901 aa, respectively. However, N. tomentosiformis was observed to be the most variable member of the genus Nicotiana, with a protein length of 1892 aa.
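The ycf1 protein lengths quoted above can be tabulated to check the within-genus comparisons. This is only an arithmetic sketch over the values stated in the text; the dictionary layout and the helper name `spread_by_genus` are hypothetical, and A. belladonna is omitted because its ycf1 length is not given in this passage.

```python
# ycf1 protein lengths (aa) as quoted in the text.
ycf1_length_aa = {
    "Capsicum annuum": 1906,
    "Datura stramonium": 1873,
    "Solanum bulbocastanum": 1887,
    "Solanum tuberosum": 1887,
    "Solanum lycopersicum": 1891,
    "Nicotiana sylvestris": 1901,
    "Nicotiana tabacum": 1902,
    "Nicotiana undulata": 1901,
    "Nicotiana tomentosiformis": 1892,
}


def spread_by_genus(lengths):
    """Return max - min protein length for each genus."""
    by_genus = {}
    for species, aa in lengths.items():
        genus = species.split()[0]
        by_genus.setdefault(genus, []).append(aa)
    return {genus: max(vals) - min(vals) for genus, vals in by_genus.items()}


overall = max(ycf1_length_aa.values()) - min(ycf1_length_aa.values())
print("overall spread (aa):", overall)
for genus, spread in spread_by_genus(ycf1_length_aa).items():
    print(f"{genus}: spread of {spread} aa")
```

The 4 aa difference within Solanum matches the figure stated in the text, and the larger spread within Nicotiana is driven entirely by N. tomentosiformis.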
3.5. Phylogenetic Analysis of Solanaceous Plastomes. Evolutionary relationships between diverse plant species have been analyzed using several plastome markers, including matK and rbcL (genes) or trnH-psbA and trnL-trnF (intergenic regions), due to sequence conservation among plant taxa blended with suitable variation [40, 41]. However, determination of phylogeny based on single gene sequences may be inaccurate [42]. Availability of complete chloroplast sequences for many species has made it possible to use many individual genes or concatenated gene sequences to deduce phylogenetic relationships among taxa [42–44].
Efforts have been made to carry out phylogenetic analysis of solanaceous species using complete plastome sequences by Moore et al. [44] and Jansen et al. [45]. Evolutionary positions of Capsicum and Datura in Solanaceae have been determined using a single or a few plastid genes [46, 47]. Recently, concatenated protein-coding gene sequences from completely sequenced plastomes were used to obtain reasonable phylogenetic relationships for solanaceous species [29]. In the present investigation, we used a similar approach to analyze the phylogenetic relationship for ten solanaceous species with completely sequenced plastomes. Individual multiple sequence alignments were concatenated for maximum likelihood phylogenetic tree generation. As depicted in Figure 3, taxa were divided into two clades with 100% bootstrap support (a bootstrap value of 500). The first clade consisted of four Nicotiana species, while the species of Solanum, Capsicum, Atropa, and Datura were included in the second clade. These results are in line with previous phylogenetic analyses using concatenated protein-coding gene sequences as well as phylogenetic analyses using plastid ndhF and trnL-F sequences [29, 47]. However, in an analysis of 13 ORFs of solanaceous plastomes, a different arrangement was shown in which Atropa was separated from Solanum and grouped together with Nicotiana [25].
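The concatenation step described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the pipeline used in the study: the function name `concatenate_alignments`, the gene names, and the toy sequences are all hypothetical, and padding a missing taxon with gaps is one common convention rather than something the text specifies.

```python
def concatenate_alignments(gene_alignments, taxa):
    """Join per-gene alignments into one concatenated row per taxon.

    gene_alignments maps gene name -> {taxon: aligned sequence};
    all rows of one gene must have the same aligned length.
    """
    parts = {taxon: [] for taxon in taxa}
    for gene, alignment in gene_alignments.items():
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: aligned rows differ in length")
        width = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon missing from a gene is padded with gaps
            # so that the columns of later genes stay aligned.
            parts[taxon].append(alignment.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy (hypothetical) data for three of the species abbreviations used in the figures.
taxa = ["NTA", "SLY", "CAN"]
genes = {
    "geneA": {"NTA": "ATGAAA", "SLY": "ATGAAG", "CAN": "ATG---"},
    "geneB": {"NTA": "GGTTAA", "SLY": "GGCTAA"},
}

supermatrix = concatenate_alignments(genes, taxa)
for taxon, row in supermatrix.items():
    print(taxon, row)
```

The concatenated rows would then be passed to a maximum likelihood tree-building program; which program and substitution model to use is outside the scope of this sketch.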
[Figure 3 tree (branch lengths and bootstrap values not recoverable from the extracted text). Tip labels: CAR, DCA, NTO, NUN, NSY, NTA, ABE, DST, CAN, SLY, SBU, and STU.]

Figure 3: Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species.

4. Conclusions

The analyses of complete plastid genomes of ten solanaceous species revealed overall similarity in terms of gene content and organization. The sizes of the LSC, SSC, and IR regions were found to be somewhat conserved among species, but a significant variation was found between genera. Most of the coding regions were well conserved. However, many features in a few genes were found to be typical of a particular genus or even species and can be used as molecular markers to investigate genetic diversity and evolution. These typical variations can be further utilized to develop more sophisticated DNA barcoding based techniques. The ten solanaceous species are divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from coding regions of plastomes. The first clade consisted of four Nicotiana species, and the second clade consisted of species of Solanum, Capsicum, Atropa, and Datura.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The authors are thankful to University Grants Commission, New Delhi, for providing MANF fellowship to Harpreet Kaur.
References
[1] M. G. Bausher, N. D. Singh, S.-B. Lee, R. K. Jansen, and H. Daniell, “The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var. ‘Ridge Pineapple’: organization and phylogenetic relationships to other angiosperms,” BMC Plant Biology, vol. 6, no. 1, article 21, 2006.
[2] Z. Y. Hu, W. Hua, S. M. Huang, and H. Z. Wang, “Complete chloroplast genome sequence of rapeseed (Brassica napus L.) and its evolutionary implications,” Genetic Resources and Crop Evolution, vol. 58, no. 6, pp. 875–887, 2011.
[3] J. D. Palmer, “Comparative organization of chloroplast genomes,” Annual Review of Genetics, vol. 19, no. 1, pp. 325–354, 1985.
[4] R. G. Olmstead and J. D. Palmer, “Chloroplast DNA systematics: a review of methods and data analysis,” The American Journal of Botany, vol. 81, no. 9, pp. 1205–1224, 1994.
[5] R. A. Bungard, “Photosynthetic evolution in parasitic plants: insight from the chloroplast genome,” BioEssays, vol. 26, no. 3, pp. 235–247, 2004.
[6] L. A. Raubeson and R. K. Jansen, “Chloroplast genomes of plants,” in Diversity and Evolution of Plants-Genotypic and Phenotypic Variation in Higher Plants, R. Henry, Ed., pp. 45–68, CABI Publishing, Wallingford, UK, 2005.
[7] R. C. Haberle, H. M. Fourcade, J. L. Boore, and R. K. Jansen, “Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes,” Journal of Molecular Evolution, vol. 66, no. 4, pp. 350–361, 2008.
[8] W. Martin and R. G. Herrmann, “Gene transfer from organelles to the nucleus: how much, what happens, and why?” Plant Physiology, vol. 118, no. 1, pp. 9–17, 1998.
[9] H. L. Race, R. G. Herrmann, and W. Martin, “Why have organelles retained genomes?” Trends in Genetics, vol. 15, no. 9, pp. 364–370, 1999.
[10] T. Wakasugi, T. Tsudzuki, and M. Sugiura, “The genomics of land plant chloroplasts: gene content and alteration of genomic information by RNA editing,” Photosynthesis Research, vol. 70, no. 1, pp. 107–118, 2001.
[11] Y.-K. Kim, C.-W. Park, and K.-J. Kim, “Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications,” Molecules and Cells, vol. 27, no. 3, pp. 365–381, 2009.
[12] A. Khan, I. A. Khan, H. Asif, and M. K. Azim, “Current trends in chloroplast genome research,” African Journal of Biotechnology, vol. 9, no. 24, pp. 3494–3500, 2010.
[13] H. Shimada and M. Sugiura, “Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes,” Nucleic Acids Research, vol. 19, no. 5, pp. 983–995, 1991.
[14] M. Sugiura, “The chloroplast genome,” Plant Molecular Biology, vol. 19, no. 1, pp. 149–168, 1992.
[15] M. E. Cosner, R. K. Jansen, J. D. Palmer, and S. R. Downie, “The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families,” Current Genetics, vol. 31, no. 5, pp. 419–429, 1997.
[16] L. Gao, X. Yi, Y.-X. Yang, Y.-J. Su, and T. Wang, “Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes,” BMC Evolutionary Biology, vol. 9, no. 1, article 130, 2009.
[17] T. Kato, T. Kaneko, S. Sato, Y. Nakamura, and S. Tabata, “Complete structure of the chloroplast genome of a legume, Lotus japonicus,” DNA Research, vol. 7, no. 6, pp. 323–330, 2000.
[18] B. Goffinet, N. J. Wickett, O. Werner, R. M. Ros, A. J. Shaw, and C. J. Cox, “Distribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta),” Annals of Botany, vol. 99, no. 4, pp. 747–753, 2007.
[19] J. D. Palmer, J. M. Nugent, and L. A. Herbon, “Unusual structure of geranium chloroplast DNA: a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families,” Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 3, pp. 769–773, 1987.
[20] S. Greiner, X. Wang, U. Rauwolf et al., “The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution,” Nucleic Acids Research, vol. 36, no. 7, pp. 2366–2378, 2008.
[21] J. J. Doyle, J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson, “Chloroplast DNA inversions and the origin of the grass family (Poaceae),” Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 16, pp. 7722–7726, 1992.
[22] F. A. Michelangeli, J. I. Davis, and D. W. Stevenson, “Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes,” The American Journal of Botany, vol. 90, no. 1, pp. 93–106, 2003.
[23] E. Bortiri, D. Coleman-Derr, G. R. Lazo, O. D. Anderson, and Y. Q. Gu, “The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes,” BMC Research Notes, vol. 1, article 61, 2008.
[24] C. H. Leseberg and M. R. Duvall, “The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals,” Journal of Molecular Evolution, vol. 69, no. 4, pp. 311–318, 2009.
[25] H. J. Chung, J. D. Jung, H. W. Park et al., “The complete chloroplast genome sequences of Solanum tuberosum and comparative analysis with Solanaceae species identified the presence of a 241-bp deletion in cultivated potato chloroplast DNA sequence,” Plant Cell Reports, vol. 25, no. 12, pp. 1369–1379, 2006.
[26] H. Daniell, S.-B. Lee, J. Grevich et al., “Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes,” Theoretical and Applied Genetics, vol. 112, no. 8, pp. 1503–1518, 2006.
[27] S. Kahlau, S. Aspinall, J. C. Gray, and R. Bock, “Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes,” Journal of Molecular Evolution, vol. 63, no. 2, pp. 194–207, 2006.
[28] M. Yukawa, T. Tsudzuki, and M. Sugiura, “The chloroplast genome of Nicotiana sylvestris and Nicotiana tomentosiformis: complete sequencing confirms that the Nicotiana sylvestris progenitor is the maternal genome donor of Nicotiana tabacum,” Molecular Genetics and Genomics, vol. 275, no. 4, pp. 367–373, 2006.
[29] Y. D. Jo, J. Park, J. Kim et al., “Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome,” Plant Cell Reports, vol. 30, no. 2, pp. 217–229, 2011.
[30] C. Schmitz-Linneweber, R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier, “The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation,” Molecular Biology and Evolution, vol. 19, no. 9, pp. 1602–1612, 2002.
[31] M. Kunnimalaiyaan and B. L. Nielsen, “Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA,” Nucleic Acids Research, vol. 25, no. 18, pp. 3681–3686, 1997.
[32] G. Thyssen, Z. Svab, and P. Maliga, “Cell-to-cell movement of plastids in plants,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 7, pp. 2439–2443, 2012.
[33] D. Gargano, A. Vezzi, N. Scotti et al., “The complete nucleotide sequence genome of potato (Solanum tuberosum cv. Desiree) chloroplast DNA,” in Book of Abstracts of the 2nd Solanaceae Genome Workshop, p. 107, 2005.
[34] K.-J. Kim and H.-L. Lee, “Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants,” DNA Research, vol. 11, no. 4, pp. 247–261, 2004.
[35] D. A. Steane, “Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae),” DNA Research, vol. 12, no. 3, pp. 215–220, 2005.
[36] J. D. Palmer, “Cell culture and somatic cell genetics of plants,” in The Molecular Biology of Plastids, R. G. Hermann, Ed., pp. 5–53, Springer, Vienna, Austria, 1991.
[37] M. Sugiura, K. Shinozaki, M. Tanaka et al., “Split genes and cis/trans splicing in tobacco chloroplasts,” in Plant Molecular Biology, D. von Wettstein and N. H. Chua, Eds., pp. 65–76, Plenum Press, New York, NY, USA, 1987.
[38] A. Vera and M. Sugiura, “A novel RNA gene in the tobacco plastid genome: its possible role in the maturation of 16S rRNA,” The EMBO Journal, vol. 13, no. 9, pp. 2211–2217, 1994.
[39] M. Sugita, Z. Svab, P. Maliga, and M. Sugiura, “Targeted deletion of sprA from the tobacco plastid genome indicates that the encoded small RNA is not essential for pre-16S rRNA maturation in plastids,” Molecular and General Genetics, vol. 257, no. 1, pp. 23–27, 1997.
[40] R. Lahaye, M. van der Bank, D. Bogarin et al., “DNA barcoding the floras of biodiversity hotspots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 8, pp. 2923–2928, 2008.
[41] P. Taberlet, L. Gielly, G. Pautou, and J. Bouvet, “Universal primers for amplification of three non-coding regions of chloroplast DNA,” Plant Molecular Biology, vol. 17, no. 5, pp. 1105–1109, 1991.
[42] X. Guo, S. Castillo-Ramírez, V. Gonzalez et al., “Rapid evolutionary change of common bean (Phaseolus vulgaris L.) plastome and the genomic diversification of legume chloroplasts,” BMC Genomics, vol. 8, article 228, 2007.
[43] R. K. Jansen, C. Kaittanis, C. Saski et al., “Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids,” BMC Evolutionary Biology, vol. 6, no. 1, article 32, 2006.
[44] M. J. Moore, P. S. Soltis, C. D. Bell, J. G. Burleigh, and D. E. Soltis, “Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4623–4628, 2010.
[45] R. K. Jansen, Z. Cai, L. A. Raubeson et al., “Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 49, pp. 19369–19374, 2007.
[46] L. Bohs and R. G. Olmstead, “Phylogenetic relationships in Solanum (Solanaceae) based on ndhF sequences,” Systematic Botany, vol. 22, no. 1, pp. 5–17, 1997.
[47] R. G. Olmstead, L. Bohs, H. A. Migid, E. Santiago-Valentin, V. F. Garcia, and S. M. Collier, “A molecular phylogeny of the Solanaceae,” Taxon, vol. 57, no. 4, pp. 1159–1181, 2008.
[Figure 1 alignment panels (nucleotide rows not recoverable from the extracted text). Panel titles recovered: tRNA-Leu(UAA), ycf1, and tRNA-Ala(UGC); InDel labels run from 1 to 31. Rows in each panel are labelled ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU.]
Figure 1 Partial multiple sequence alignment of accD clpP ndhA rpl32 rps16 sprA tRNA-Ala (UGC) tRNA-Leu(UAA) and ycf1 genesequences of ten solanaceous species showing location of InDels indicated by hyphens
Advances in Bioinformatics 9
Figure 2: Partial multiple sequence alignment of amino acid sequences of the genes accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala (UGC), tRNA-Leu (UAA), and ycf1 of ten solanaceous species, showing the location of InDels (indicated by hyphens). Alignment panels recovered from the figure: accD, ycf1, clpP, rpl32, and rps16; rows are labelled ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU.
protein sequences, as shown in Figure 2. The accD gene has been reported to be one of the most variable plastid genes and is probably under diversifying selection [26].
(2) clpP. In the clpP gene, InDels were found in both intron and exon regions. The InDels in the exon regions had two major consequences. First, an insertion of 6 bp in S. bulbocastanum and S. tuberosum and of 30 bp in S. lycopersicum at the 3′ end (exon 3) of the clpP gene shifted the stop codon downstream by 6, 6, and 30 bp, respectively, compared to the other species of the Solanaceae family, increasing the length of the coding sequence and of the protein product (Figures 1 and 2). Second, InDel 1 in the protein sequence corresponds to an insertion of a repeat of “I” amino acids in D. stramonium, making the exon 3 region longer by 6 bp. This region, however, corresponds to the end of intron 2 of the clpP gene in all other species. Since the D. stramonium chloroplast genome has been sequenced only recently, this observation needs to be experimentally validated.
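The stop-codon shift described for clpP can be illustrated with a small sketch. The sequences below are hypothetical toy strings, not clpP sequences from any plastome, and `first_stop_offset` is a helper name introduced here only for illustration. The sketch assumes the standard stop codons (TAA, TAG, TGA) and a coding sequence that starts in frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons (assumed here)

def first_stop_offset(cds: str) -> int:
    """Return the 0-based nucleotide offset of the first in-frame stop codon, or -1."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return -1

# Toy sequences (hypothetical): the second carries a 6 bp in-frame
# insertion (AAACAG) immediately upstream of the stop codon.
reference = "ATGGCTGAATAA"             # ATG GCT GAA TAA
with_insertion = "ATGGCTGAAAAACAGTAA"  # ATG GCT GAA AAA CAG TAA

shift_bp = first_stop_offset(with_insertion) - first_stop_offset(reference)
extra_residues = shift_bp // 3

print(f"stop codon shifted by {shift_bp} bp, protein longer by {extra_residues} residues")
```

By the same arithmetic, the 6 bp and 30 bp insertions reported above would correspond to 2 and 10 additional residues, provided the insertions are in frame and introduce no earlier stop codon.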
(3) ndhA. All the InDels found in ndhA were located in introns, while the protein-coding regions (exons) were highly conserved. This indicates high diversifying selection on the intronic region of this gene. Of the 14 InDels in total, most were observed with respect to C. annuum (InDels 1, 2, 5, 7, and 10). InDel 10 was shared by C. annuum and S. lycopersicum in full and by C. annuum and A. belladonna in part.
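InDels of this kind appear in an alignment as runs of gap characters. The following is a minimal sketch of how such runs could be located in aligned rows. It is not the procedure used by the authors, and the row labels and sequences are hypothetical.

```python
import re

def gap_runs(aligned: str) -> list[tuple[int, int]]:
    """Return (start, length) for each run of '-' in one aligned sequence."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"-+", aligned)]

# Toy alignment (hypothetical labels and sequences, all rows of equal length).
alignment = {
    "taxon_a": "ATTG--CATTA",
    "taxon_b": "ATTGAACATTA",
    "taxon_c": "AT----CATTA",
}

indels = {name: gap_runs(seq) for name, seq in alignment.items()}
for name, runs in indels.items():
    print(name, runs)
```

A gap run in one row is a deletion relative to the rows that carry residues at those columns, or equivalently an insertion in those other rows; deciding between the two requires an outgroup or other evidence.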
(4) rpl32. In rpl32, an insertion of 1 bp in D. stramonium and of 3 bp in C. annuum was found in the 3′ region of the gene, while a deletion of 4 bp was observed in D. stramonium.
The insertion of 3 bp in C. annuum only altered the length of the protein, making it longer by 1 amino acid. However, the small insertion of 1 bp in D. stramonium proved to be a frameshift mutation, resulting in three changes in the amino acid sequence near the C-terminus. Moreover, the deletion of 4 bp at the 3′ end resulted in premature termination of protein synthesis. Together, the frameshift mutation and the 3′ end deletion reduced the length of the gene product by 1 amino acid. As the C-terminus of the amino acid chain is well conserved in all the other species, the effect of the above-mentioned variations needs to be validated experimentally.
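The length bookkeeping behind these rpl32 observations can be sketched as follows. This is a simplified illustration, not the authors' analysis: it only tracks signed InDel lengths and ignores where stop codons actually fall. The function names are hypothetical.

```python
def classify_indel(length_bp: int) -> str:
    """Label a signed InDel length (+ insertion, - deletion) by its effect on the reading frame."""
    return "in-frame" if length_bp % 3 == 0 else "frameshift"

def net_codon_change(events_bp):
    """Return the net change in whole codons for a list of signed InDel lengths,
    or None when the combined events leave the reading frame shifted."""
    net = sum(events_bp)
    return net // 3 if net % 3 == 0 else None

# Signed InDel lengths taken from the text above.
c_annuum = [+3]          # single 3 bp insertion
d_stramonium = [+1, -4]  # 1 bp insertion followed by a 4 bp deletion

print(classify_indel(3), net_codon_change(c_annuum))
print([classify_indel(n) for n in d_stramonium], net_codon_change(d_stramonium))
```

Under these assumptions the C. annuum insertion adds one codon, while the two D. stramonium events are each frameshifts on their own but sum to a net loss of 3 bp, that is, one codon. This agrees with the reported one-residue changes in product length, although the residues between the two events would still be read in a shifted frame.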
(5) rps16. In rps16, InDels were likewise observed in introns as well as exons. Five of the major insertions in the intron regions were species specific. Insertions of 38 bp (InDel 1), 9 bp (InDel 2), 5 bp (InDel 7), 4 bp (InDel 8), and 6 bp (InDel 9) were observed in A. belladonna, S. lycopersicum, D. stramonium, and C. annuum. A deletion of 5 bp was observed in all three Solanum species and C. annuum. A deletion of 9 bp was observed in all Nicotiana species, resulting in an amino acid change (P to S) and shortening of the protein by three amino acids in the C-terminal region. A similar deletion has also been observed by Kahlau et al. [27] and was suggested to be functionally neutral.
(6) sprA. The sprA gene has been reported to encode a stable noncoding RNA of unknown function and has been suggested to influence 16S rRNA maturation [38, 39]. In many species this gene seems to be present as a remnant and shows large variations in its 5′ region. The largest deletion, of 109 bp, was observed in C. annuum. The rest of this gene appears to be more conserved, with a deletion towards the 3′ end in all Nicotiana species and A. belladonna. The manner in which this gene functions and the consequences of the above-mentioned variations are yet to be investigated experimentally.
(7) trnA-UGC. In this gene, a long deletion of 102 bp was observed in all Nicotiana species. Interestingly, this deletion was further extended to 130 bp in both directions in A. belladonna. These deletions were found in the intron region and so are unlikely to have any negative impact on gene product function.
(8) trnL-UAA. The trend of variation in trnL-UAA was similar to that in ndhA, as all InDels were observed in introns. The longest species-specific deletion (InDel 3) was observed in C. annuum, whereas a short insertion of four nucleotides, a repeat of “T”, was observed specifically in D. stramonium. Another insertion of 6 bp was observed in two Nicotiana species, namely, N. sylvestris and N. tabacum.
(9) ycf1. Many InDels (3′ region) were found in ycf1, the fastest evolving gene. Most of the InDels were found to be species specific. The maximum number of InDels (InDels 2, 3, 5, 7, 9, 16, 17, 19, 23, 24, 25, and 30) was observed in C. annuum, followed by D. stramonium (InDels 4, 6, 20, 26, and 31), N. tomentosiformis (InDels 8, 17, and 18), and S. lycopersicum (InDels 16, 22, and 28). Two genus-specific InDels (InDels 14 and 21) were observed in all four Nicotiana species. However, InDel 19 was also present in D. stramonium. Another genus-specific InDel (InDel 29) was observed in the Solanum species. Most of the InDels altered the length of the gene product, with the maximum length of 1906 amino acids (aa) observed in C. annuum and the minimum of 1873 aa observed in D. stramonium. Among the Solanum species, the length of the protein (1887 aa) was conserved between S. bulbocastanum and S. tuberosum. However, S. lycopersicum had an amino acid sequence of 1891 aa, longer by 4 aa than that of the other two species of the same genus. Among the four Nicotiana species, the ycf1 gene product length was nearly conserved among N. sylvestris, N. tabacum, and N. undulata, with protein lengths of 1901 aa, 1902 aa, and 1901 aa, respectively. However, N. tomentosiformis was observed to be the most variable member of the genus Nicotiana, with a protein length of 1892 aa.
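The ycf1 length comparison can be recomputed directly from the figures quoted above. The sketch below uses only the nine lengths stated in the text (no length is given there for A. belladonna). The dictionary and function names are illustrative.

```python
# ycf1 protein lengths in amino acids, as stated in the text.
ycf1_length_aa = {
    "Capsicum annuum": 1906,
    "Datura stramonium": 1873,
    "Solanum bulbocastanum": 1887,
    "Solanum tuberosum": 1887,
    "Solanum lycopersicum": 1891,
    "Nicotiana sylvestris": 1901,
    "Nicotiana tabacum": 1902,
    "Nicotiana undulata": 1901,
    "Nicotiana tomentosiformis": 1892,
}

def spread_by_genus(lengths):
    """Return max - min protein length for each genus (first word of the species name)."""
    by_genus = {}
    for species, n in lengths.items():
        by_genus.setdefault(species.split()[0], []).append(n)
    return {genus: max(v) - min(v) for genus, v in by_genus.items()}

overall_spread = max(ycf1_length_aa.values()) - min(ycf1_length_aa.values())
genus_spread = spread_by_genus(ycf1_length_aa)

print("overall spread (aa):", overall_spread)
print("spread within genus (aa):", genus_spread)
```

On these numbers the overall range is 33 aa, the Solanum species differ by at most 4 aa, and the Nicotiana species differ by at most 10 aa, the latter driven by N. tomentosiformis.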
3.5. Phylogenetic Analysis of Solanaceous Plastomes. Evolutionary relationships between diverse plant species have been analyzed using several plastome markers, including matK and rbcL (genes) or trnH-psbA and trnL-trnF (intergenic regions), because these combine sequence conservation among plant taxa with suitable variation [40, 41]. However, determination of phylogeny based on single gene sequences may be inaccurate [42]. The availability of complete chloroplast sequences for many species has made it possible to use many individual genes or concatenated gene sequences to deduce phylogenetic relationships among taxa [42–44].
Efforts to carry out phylogenetic analysis of solanaceous species using complete plastome sequences have been made by Moore et al. [44] and Jansen et al. [45]. Evolutionary positions of Capsicum and Datura in Solanaceae have been determined using a single or a few plastid genes [46, 47]. Recently, concatenated protein-coding gene sequences from completely sequenced plastomes were used to obtain reasonable phylogenetic relationships for solanaceous species [29]. In the present investigation, we used a similar approach to analyze the phylogenetic relationships of ten solanaceous species with completely sequenced plastomes. Individual multiple sequence alignments were concatenated for maximum likelihood phylogenetic tree generation. As depicted in Figure 3, the taxa were divided into two clades with 100% bootstrap support (500 replicates). The first clade consisted of the four Nicotiana species, while the species of Solanum, Capsicum, Atropa, and Datura were included in the second clade. These results are in line with previous phylogenetic analyses using concatenated protein-coding gene sequences as well as phylogenetic analyses using plastid ndhF and trnL-F sequences [29, 47]. However, an analysis of 13 ORFs of solanaceous plastomes showed a different arrangement, in which Atropa was separated from Solanum and grouped together with Nicotiana [25].
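The concatenation step mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: each per-gene alignment is assumed to be a mapping from taxon label to aligned sequence, the gene names and sequences are hypothetical toy data, and only the taxon labels (ABE, NTA, STU) are borrowed from the figures.

```python
def concatenate_alignments(gene_alignments):
    """Join per-gene alignments (gene -> {taxon: aligned sequence}) into one
    supermatrix (taxon -> concatenated sequence), in the given gene order."""
    taxa = None
    for gene, aln in gene_alignments.items():
        if taxa is None:
            taxa = sorted(aln)
        elif sorted(aln) != taxa:
            raise ValueError(f"{gene}: taxon set differs from the first gene")
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError(f"{gene}: aligned rows differ in length")
    return {t: "".join(aln[t] for aln in gene_alignments.values()) for t in (taxa or [])}

# Toy per-gene alignments (hypothetical sequences).
genes = {
    "geneA": {"ABE": "ATG-CA", "NTA": "ATGGCA", "STU": "ATGGCA"},
    "geneB": {"ABE": "TTAA", "NTA": "TT-A", "STU": "TTAA"},
}

supermatrix = concatenate_alignments(genes)
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

Checking that every gene has the same taxon set and that all rows within a gene have equal length keeps the columns of the supermatrix homologous; the resulting matrix would then be passed to a tree-building program of choice.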
4. Conclusions
The analyses of complete plastid genomes of ten solanaceous species revealed overall similarity in terms of gene content and organization. The sizes of the LSC, SSC, and IR regions were found to be somewhat conserved among species, but a significant variation was found between genera. Most of the coding regions were well conserved. However, many features in a few genes were observed to be typical of a particular genus or even species, and these can be used as molecular markers to investigate genetic diversity and evolution. These typical variations can be further utilized to develop more sophisticated DNA barcoding-based techniques. The ten solanaceous species were divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from the coding regions of plastomes. The first clade consisted of four Nicotiana species, and the second clade consisted of species of Solanum, Capsicum, Atropa, and Datura.

Figure 3: Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species. Tip labels recovered from the figure: CAR, DCA, NTO, NUN, NSY, NTA, ABE, DST, CAN, SLY, SBU, and STU.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The authors are thankful to the University Grants Commission, New Delhi, for providing a MANF fellowship to Harpreet Kaur.
References
[1] M. G. Bausher, N. D. Singh, S.-B. Lee, R. K. Jansen, and H. Daniell, “The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var. ‘Ridge Pineapple’: organization and phylogenetic relationships to other angiosperms,” BMC Plant Biology, vol. 6, no. 1, article 21, 2006.
[2] Z. Y. Hu, W. Hua, S. M. Huang, and H. Z. Wang, “Complete chloroplast genome sequence of rapeseed (Brassica napus L.) and its evolutionary implications,” Genetic Resources and Crop Evolution, vol. 58, no. 6, pp. 875–887, 2011.
[3] J. D. Palmer, “Comparative organization of chloroplast genomes,” Annual Review of Genetics, vol. 19, no. 1, pp. 325–354, 1985.
[4] R. G. Olmstead and J. D. Palmer, “Chloroplast DNA systematics: a review of methods and data analysis,” The American Journal of Botany, vol. 81, no. 9, pp. 1205–1224, 1994.
[5] R. A. Bungard, “Photosynthetic evolution in parasitic plants: insight from the chloroplast genome,” BioEssays, vol. 26, no. 3, pp. 235–247, 2004.
[6] L. A. Raubeson and R. K. Jansen, “Chloroplast genomes of plants,” in Diversity and Evolution of Plants-Genotypic and Phenotypic Variation in Higher Plants, R. Henry, Ed., pp. 45–68, CABI Publishing, Wallingford, UK, 2005.
[7] R. C. Haberle, H. M. Fourcade, J. L. Boore, and R. K. Jansen, “Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes,” Journal of Molecular Evolution, vol. 66, no. 4, pp. 350–361, 2008.
[8] W. Martin and R. G. Herrmann, “Gene transfer from organelles to the nucleus: how much, what happens, and why?” Plant Physiology, vol. 118, no. 1, pp. 9–17, 1998.
[9] H. L. Race, R. G. Herrmann, and W. Martin, “Why have organelles retained genomes?” Trends in Genetics, vol. 15, no. 9, pp. 364–370, 1999.
[10] T. Wakasugi, T. Tsudzuki, and M. Sugiura, “The genomics of land plant chloroplasts: gene content and alteration of genomic information by RNA editing,” Photosynthesis Research, vol. 70, no. 1, pp. 107–118, 2001.
[11] Y.-K. Kim, C.-W. Park, and K.-J. Kim, “Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications,” Molecules and Cells, vol. 27, no. 3, pp. 365–381, 2009.
[12] A. Khan, I. A. Khan, H. Asif, and M. K. Azim, “Current trends in chloroplast genome research,” African Journal of Biotechnology, vol. 9, no. 24, pp. 3494–3500, 2010.
[13] H. Shimada and M. Sugiura, “Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes,” Nucleic Acids Research, vol. 19, no. 5, pp. 983–995, 1991.
[14] M. Sugiura, “The chloroplast genome,” Plant Molecular Biology, vol. 19, no. 1, pp. 149–168, 1992.
[15] M. E. Cosner, R. K. Jansen, J. D. Palmer, and S. R. Downie, “The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families,” Current Genetics, vol. 31, no. 5, pp. 419–429, 1997.
[16] L. Gao, X. Yi, Y.-X. Yang, Y.-J. Su, and T. Wang, “Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes,” BMC Evolutionary Biology, vol. 9, no. 1, article 130, 2009.
[17] T. Kato, T. Kaneko, S. Sato, Y. Nakamura, and S. Tabata, “Complete structure of the chloroplast genome of a legume, Lotus japonicus,” DNA Research, vol. 7, no. 6, pp. 323–330, 2000.
[18] B. Goffinet, N. J. Wickett, O. Werner, R. M. Ros, A. J. Shaw, and C. J. Cox, “Distribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta),” Annals of Botany, vol. 99, no. 4, pp. 747–753, 2007.
[19] J. D. Palmer, J. M. Nugent, and L. A. Herbon, “Unusual structure of geranium chloroplast DNA: a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families,” Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 3, pp. 769–773, 1987.
[20] S. Greiner, X. Wang, U. Rauwolf et al., “The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution,” Nucleic Acids Research, vol. 36, no. 7, pp. 2366–2378, 2008.
[21] J. J. Doyle, J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson, “Chloroplast DNA inversions and the origin of the grass family (Poaceae),” Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 16, pp. 7722–7726, 1992.
[22] F. A. Michelangeli, J. I. Davis, and D. W. Stevenson, “Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes,” The American Journal of Botany, vol. 90, no. 1, pp. 93–106, 2003.
[23] E. Bortiri, D. Coleman-Derr, G. R. Lazo, O. D. Anderson, and Y. Q. Gu, “The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes,” BMC Research Notes, vol. 1, article 61, 2008.
[24] C. H. Leseberg and M. R. Duvall, “The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals,” Journal of Molecular Evolution, vol. 69, no. 4, pp. 311–318, 2009.
[25] H. J. Chung, J. D. Jung, H. W. Park et al., “The complete chloroplast genome sequences of Solanum tuberosum and comparative analysis with Solanaceae species identified the presence of a 241-bp deletion in cultivated potato chloroplast DNA sequence,” Plant Cell Reports, vol. 25, no. 12, pp. 1369–1379, 2006.
[26] H. Daniell, S.-B. Lee, J. Grevich et al., “Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes,” Theoretical and Applied Genetics, vol. 112, no. 8, pp. 1503–1518, 2006.
[27] S. Kahlau, S. Aspinall, J. C. Gray, and R. Bock, “Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes,” Journal of Molecular Evolution, vol. 63, no. 2, pp. 194–207, 2006.
[28] M. Yukawa, T. Tsudzuki, and M. Sugiura, “The chloroplast genome of Nicotiana sylvestris and Nicotiana tomentosiformis: complete sequencing confirms that the Nicotiana sylvestris progenitor is the maternal genome donor of Nicotiana tabacum,” Molecular Genetics and Genomics, vol. 275, no. 4, pp. 367–373, 2006.
[29] Y. D. Jo, J. Park, J. Kim et al., “Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome,” Plant Cell Reports, vol. 30, no. 2, pp. 217–229, 2011.
[30] C. Schmitz-Linneweber, R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier, “The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation,” Molecular Biology and Evolution, vol. 19, no. 9, pp. 1602–1612, 2002.
[31] M. Kunnimalaiyaan and B. L. Nielsen, “Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA,” Nucleic Acids Research, vol. 25, no. 18, pp. 3681–3686, 1997.
[32] G. Thyssen, Z. Svab, and P. Maliga, “Cell-to-cell movement of plastids in plants,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 7, pp. 2439–2443, 2012.
[33] D. Gargano, A. Vezzi, N. Scotti et al., “The complete nucleotide sequence genome of potato (Solanum tuberosum cv. Desiree) chloroplast DNA,” in Book of Abstracts of the 2nd Solanaceae Genome Workshop, p. 107, 2005.
[34] K.-J. Kim and H.-L. Lee, “Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants,” DNA Research, vol. 11, no. 4, pp. 247–261, 2004.
[35] D. A. Steane, “Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae),” DNA Research, vol. 12, no. 3, pp. 215–220, 2005.
[36] J. D. Palmer, “Cell culture and somatic cell genetics of plants,” in The Molecular Biology of Plastids, R. G. Hermann, Ed., pp. 5–53, Springer, Vienna, Austria, 1991.
[37] M. Sugiura, K. Shinozaki, M. Tanaka et al., “Split genes and cis/trans splicing in tobacco chloroplasts,” in Plant Molecular Biology, D. von Wettstein and N. H. Chua, Eds., pp. 65–76, Plenum Press, New York, NY, USA, 1987.
[38] A. Vera and M. Sugiura, “A novel RNA gene in the tobacco plastid genome: its possible role in the maturation of 16S rRNA,” The EMBO Journal, vol. 13, no. 9, pp. 2211–2217, 1994.
[39] M. Sugita, Z. Svab, P. Maliga, and M. Sugiura, “Targeted deletion of sprA from the tobacco plastid genome indicates that the encoded small RNA is not essential for pre-16S rRNA maturation in plastids,” Molecular and General Genetics, vol. 257, no. 1, pp. 23–27, 1997.
[40] R. Lahaye, M. van der Bank, D. Bogarin et al., “DNA barcoding the floras of biodiversity hotspots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 8, pp. 2923–2928, 2008.
[41] P. Taberlet, L. Gielly, G. Pautou, and J. Bouvet, “Universal primers for amplification of three non-coding regions of chloroplast DNA,” Plant Molecular Biology, vol. 17, no. 5, pp. 1105–1109, 1991.
[42] X. Guo, S. Castillo-Ramírez, V. González et al., “Rapid evolutionary change of common bean (Phaseolus vulgaris L.) plastome, and the genomic diversification of legume chloroplasts,” BMC Genomics, vol. 8, article 228, 2007.
[43] R. K. Jansen, C. Kaittanis, C. Saski et al., “Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids,” BMC Evolutionary Biology, vol. 6, no. 1, article 32, 2006.
[44] M. J. Moore, P. S. Soltis, C. D. Bell, J. G. Burleigh, and D. E. Soltis, “Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4623–4628, 2010.
[45] R. K. Jansen, Z. Cai, L. A. Raubeson et al., “Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 49, pp. 19369–19374, 2007.
[46] L. Bohs and R. G. Olmstead, “Phylogenetic relationships in Solanum (Solanaceae) based on ndhF sequences,” Systematic Botany, vol. 22, no. 1, pp. 5–17, 1997.
[47] R. G. Olmstead, L. Bohs, H. A. Migid, E. Santiago-Valentin, V. F. Garcia, and S. M. Collier, “A molecular phylogeny of the Solanaceae,” Taxon, vol. 57, no. 4, pp. 1159–1181, 2008.
Advances in Bioinformatics 9
INQEKINNKKKIINQEKINNKNKIINQEKIKKKKKIINQEKINNQKKIINQEKINNQKKIINQ------KKIINQEKINNQKKIINQEKITNKKKIINQEKINNKKKIINQEKITNKKKI
HFI--STL NFIFISTI HFI--STI HFI--STI HFI--STI HFI--STI HFI--STI HFIFISTI HFMFISTI HFIFISTI
LKKSLY---NISNKNSHIIKKSLYNIYNISKKNSHSIKN------------SHIIKKSLY---NISKKNSHIIKKSLY---NISKKNSHIIKKSLY---NISKKNSHIIKKPLY---NISKKNSHIIKKSLY---NISKKNSHIIKKSLY---NISKKNSHIIKKSLY---NISKKNSHI
QNKNINPFQNKNRNPFQNKNRNPFQNR--NPFQNR--NPFQNR--NPFQNR--NPFQNKNRNPFQNKNRNPFKNKNRNPF
NPF--QVNNPF--QVNNPF--QVSNPF--QVNNPF--QVNNPF--QVNNPF--QVNNPF--QVKNPFEVQVKNPF--QVK
LYI--ERNLHLHIERNLDR--ERNLYI--ERNLYI--ERNLYI--ERNLYI--ERNLDI--ERNLDR--ERNLDI--ERN
LDR---KYFLDRKYRKYFLDR---KYFLDR---KYFLDR---KYFLDR---KYFLDR---KYFLDR---KYFLDR---KYFLDR---KYF
IIEKIDKKGIIN---KKGIIEKIDKKGIIDKIDKKGIIDKIDKKGIIDKIEKKGIIDKIEKKGIIDKIEKKGIIDKIEKKGIIDKIEKKG
InDel 16 InDel 17 InDel 18 InDel 19 InDel 20 InDel 21 InDel 22 InDel 23
QNKNINKNQKNQNINKNNQNK----NQQNKNINKNQQNKNINKNQQNK----NKQNKNINKNKQNK----NQQNK----NQQNK----NQ
NQKQNFFNNKKNFFNQKQTFFNQKQNFFNQKQNFFNKKQNFFNKKQNFFNQKQNFFNQ--NFFNQKQNFF
SNK--KIKSNK--KIKSNK--KIKSNK--KIKSNK--KIKSNK--KIKSNK--KIKSNK--KIKSNKKIKIKSNK--KIK
PANQGERERPASQGKRERPASQRERERPASQGEKERPASQGEKERPASQGEKERPASQGEKERPPSQ--RERPPSQ--RERPPSQ--RER
DKKNFDESILQPQTQRINTDKNHFDDKN-------------------HFDDKKNFDEFILQPQTQRINTDKNHFDDKKNFDEFILQPQTQRINTDKNHFDDKKNFDEFILQPQTQRINTDKNHFDDKKNFDEFILQPQTQRINTDKNHFDDKNHFDEFILQPQTQRINTDKNHFDDKKNFDEFILQPQTQRINTEKTHFDDKKNFDEFILQPQTQRINTEKTHFGDKKNFDEFILQPQTQRINTEKTHFD
PLY--KEKPLY--KDKPLYNYNEKPLY--KEKPLY--KEKPLY--KEKPLY--KEKPLY--KDKPLY--KDKPLY--KDK
InDel 24 InDel 25 InDel 26 InDel 27 InDel 28 InDel 29
NIH--------SCSNIP--------SCSNIHSWGNPDNSSCSNIHSWRNRDNSSCSNIHSWRNRDNSSCSNIHSWRNRDNSSCSNIHSWRNRDNSSCSNIH--------SCSNIH--------SCSNIH--------SCS
YYN---SYMYYN---SYMYYN---SYMYYN---SYMYYN---SYMYYN---SYMYYN---SYMYYNSYMSYMYYNSYMSYMYYNSYMSYM
SSN-----------------------------------------------DLESSNGSDLTIGSDLTIGSDLTNGSDLTIGSDLTIGSDLTNGSDLTIRESSNDLESSN-----------------------------------------------ESESSN-----------------------------------------------DLESSN-----------------------------------------------DLESSN-----------------------------------------------DLESSN-----------------------------------------------DLESSN-----------------------------------------------DLESSN-----------------------------------------------DLESSN-----------------------------------------------DLE
AFFPLNQKAFFPLNQKAFL--NQKAFFPLNQKAFFPLNQKAFFPLNQKAFFPLNQKAFFPLNQKAFFPLNQKAFFPLNQK
accD
InDel 1 InDel 2 InDel 3 InDel 4
ABECANDSTNSYNTANTONUNSBUSLYSTU
-MIFQ-MIFQ-MIFQ-MIFQMMIFQ-MIFQ-MIFQ-MIFQ-MIFQ-MIFQ
NKY------LVLNKYIRSNKYLVLNKY------LVLNKY------LVLNKY------LVLNKY------LVLNKY------LVLNKY------LVLNKY------LVLNKY------LVL
KEA------SKTKEAKKLKEASKTKEA------SKTKEA------SKTKEA------SKTKEA------SKTKEA------SKTKEA------SKTKEA------SKTKEA------SKT
Figure 2: Partial multiple sequence alignment of amino acid sequences of genes, namely, accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala (UGC), tRNA-Leu (UAA), and ycf1, of ten solanaceous species, showing the location of InDels indicated by hyphens.
[Alignment image not reproduced; the panels visible in this part include ycf1, clpP, rpl32, and rps16, and the rows are labelled ABE, CAN, DST, NSY, NTA, NTO, NUN, SBU, SLY, and STU.]
protein sequences as shown in Figure 2. The accD gene has been reported to be one of the most variable plastid genes and is probably under diversifying selection [26].
(2) clpP. In the clpP gene, InDels were found in both intron and exon regions. The InDels in the exon regions had two major consequences. First, an insertion of 6 bp in S. bulbocastanum and S. tuberosum and of 30 bp in S. lycopersicum at the 3′ end (exon 3) of the clpP gene shifted the stop codon downstream by 6, 6, and 30 bp in the respective species compared with the other species of the family Solanaceae, increasing the length of the coding sequence and of the protein product (Figures 1 and 2). Second, InDel 1 in the protein sequence corresponds to the insertion of a repeat of “I” amino acids in D. stramonium, which makes the exon 3 region longer by 6 bp. This region, however, corresponds to the end of intron 2 of the clpP gene in all other species. Since the D. stramonium chloroplast genome has been sequenced only recently, this observation needs to be experimentally validated.
(3) ndhA. All the InDels found in ndhA were present in introns, while the protein-coding regions (exons) were highly conserved. This indicates high diversifying selection on the intronic region of this gene. Out of the total of 14 InDels, most were observed with respect to C. annuum (InDels 1, 2, 5, 7, and 10). InDel 10 was shared by C. annuum and S. lycopersicum in full and by C. annuum and A. belladonna in part.
(4) rpl32. In rpl32, an insertion of 1 bp in D. stramonium and of 3 bp in C. annuum was found in the 3′ region of the gene, while a deletion of 4 bp was observed in D. stramonium.
10 Advances in Bioinformatics
The insertion of 3 bp in C. annuum altered only the length of the protein, making it longer by 1 amino acid. However, the small insertion of 1 bp in D. stramonium proved to be a frameshift mutation resulting in three changes in the amino acid sequence near the C-terminus. Moreover, the deletion of 4 bp at the 3′ end resulted in premature termination of protein synthesis. Together, the frameshift mutation and the 3′ end deletion reduced the gene product length by 1 amino acid. As the C-terminus of the amino acid chain is well conserved in all the other species, the effect of the above-mentioned variations needs to be validated experimentally.
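The contrast between an in-frame insertion and a frameshift described for rpl32 can be illustrated with a minimal Python sketch. The reference sequence below is a hypothetical toy coding sequence, not the rpl32 sequence of any species, and the helper name `translate` is an assumption made for illustration; only the standard genetic code is taken as given.

```python
# Toy illustration of in-frame versus frameshift insertions.
# REF is a hypothetical sequence, not real rpl32 data.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code: codons enumerated in T, C, A, G order map onto AMINO.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate from the first base until a stop codon or the last full codon."""
    protein = []
    for pos in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


REF = "ATGAAAGGTCTTTAA"            # hypothetical: M K G L followed by a stop codon
INS3 = REF[:6] + "GCT" + REF[6:]   # 3 bp insertion keeps the reading frame
INS1 = REF[:6] + "A" + REF[6:]     # 1 bp insertion shifts the reading frame

if __name__ == "__main__":
    for label, seq in (("reference", REF),
                       ("3 bp insertion", INS3),
                       ("1 bp insertion", INS1)):
        print(f"{label:15s} {translate(seq)}")
```

In this toy case the 3 bp insertion adds one residue and leaves the others unchanged, whereas the 1 bp insertion alters every residue downstream of the insertion point and moves the original stop codon out of frame.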
(5) rps16. In rps16 also, InDels were observed in introns as well as exons. Five of the major insertions in the intron regions were species specific: insertions of 38 bp (InDel 1), 9 bp (InDel 2), 5 bp (InDel 7), 4 bp (InDel 8), and 6 bp (InDel 9) were observed in A. belladonna, S. lycopersicum, D. stramonium, and C. annuum. A deletion of 5 bp was observed in all three Solanum species and C. annuum. A deletion of 9 bp was observed in all Nicotiana species, resulting in an amino acid change (P to S) and shortening of the protein by three amino acids in the C-terminal region. A similar deletion has also been observed by Kahlau et al. [27] and was suggested to be functionally neutral.
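The exon length changes reported above for clpP, rpl32, and rps16 can be checked with simple codon arithmetic. The following Python sketch is illustrative only: the function name `codon_effect` and the dictionary layout are assumptions, the bp values are those quoted in the text, and treating the two rpl32 changes in D. stramonium as a single net change is an interpretation rather than a result stated by the authors.

```python
def codon_effect(net_bp: int):
    """Classify a net length change (bp) as in-frame or frameshift.

    Positive values are insertions and negative values are deletions.
    Returns ("in-frame", codons_changed) or ("frameshift", None).
    """
    if net_bp % 3 == 0:
        return ("in-frame", net_bp // 3)
    return ("frameshift", None)


# Length changes in coding regions as quoted in the text (bp).
REPORTED = {
    "clpP, S. bulbocastanum": +6,
    "clpP, S. tuberosum": +6,
    "clpP, S. lycopersicum": +30,
    "rpl32, C. annuum": +3,
    "rpl32, D. stramonium (insertion)": +1,
    "rpl32, D. stramonium (deletion)": -4,
    "rpl32, D. stramonium (net, assumed additive)": +1 - 4,
    "rps16, Nicotiana species": -9,
}

if __name__ == "__main__":
    for label, bp in REPORTED.items():
        kind, codons = codon_effect(bp)
        print(f"{label:45s} {bp:+4d} bp  {kind}  {codons}")
```

Under these assumptions the 9 bp deletion in rps16 corresponds to three codons, and the net change of -3 bp in D. stramonium rpl32 corresponds to one codon, which is consistent with the shortening by three amino acids and by one amino acid, respectively, described in the text.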
(6) sprA. The sprA gene has been reported to encode a stable noncoding RNA of unknown function. This gene has been suggested to influence 16S rRNA maturation [38, 39]. In many species, this gene seems to be present as a remnant and shows large variations in its 5′ region. The largest deletion, of 109 bp, was observed in C. annuum. The rest of this gene appears to be more conserved, with a deletion towards the 3′ end in all Nicotiana species and A. belladonna. The manner in which this gene functions and the consequences of the above-mentioned variations are yet to be investigated experimentally.
(7) trnA-UGC. In this gene, a long deletion of 102 bp was observed in all Nicotiana species. Interestingly, this deletion was extended to 130 bp, in both directions, in A. belladonna. These deletions were found in the intron region and so are unlikely to have any negative impact on gene product function.
(8) trnL-UAA. The trend of variation in trnL-UAA was similar to that in ndhA, as all InDels were observed in introns. The longest species-specific deletion (InDel 3) was observed in C. annuum, whereas a short insertion of four nucleotides, a repeat of “T”, was observed specifically in D. stramonium. Another insertion, of 6 bp, was observed in two Nicotiana species, namely, N. sylvestris and N. tabacum.
(9) ycf1. Many InDels (3′ region) were found in ycf1, the fastest evolving gene. Most of the InDels were species specific. The maximum number of InDels (InDels 2, 3, 5, 7, 9, 16, 17, 19, 23, 24, 25, and 30) was observed in C. annuum, followed by D. stramonium (InDels 4, 6, 20, 26, and 31), N. tomentosiformis (InDels 8, 17, and 18), and S. lycopersicum (InDels 16, 22, and 28). Two genus-specific InDels (InDels 14 and 21) were observed in all four Nicotiana species. However, InDel 19 was also present in D. stramonium. Another genus-specific InDel (InDel 29) was observed in Solanum species. Most of the InDels altered the length of the gene product, with the maximum length of 1906 amino acids (aa) observed in C. annuum and the minimum of 1873 aa observed in D. stramonium. Among the Solanum species, the protein length (1887 aa) was conserved between S. bulbocastanum and S. tuberosum, whereas the S. lycopersicum sequence, at 1891 aa, was longer by 4 aa than those of the other two species of the genus. Among the four Nicotiana species, the ycf1 gene product length was nearly conserved among N. sylvestris, N. tabacum, and N. undulata, with protein lengths of 1901 aa, 1902 aa, and 1901 aa, respectively. However, N. tomentosiformis was the most variable member of the genus Nicotiana, with a protein length of 1892 aa.
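The ycf1 protein lengths quoted above can be tabulated and summarized per genus with a short Python sketch. The values are those given in the text; the length for A. belladonna is not stated in this passage and is therefore omitted, and the names `YCF1_LENGTH_AA` and `genus_ranges` are hypothetical.

```python
# ycf1 protein lengths (amino acids) as quoted in the text.
# A. belladonna is omitted because its length is not given in this passage.
YCF1_LENGTH_AA = {
    "Capsicum annuum": 1906,
    "Datura stramonium": 1873,
    "Solanum bulbocastanum": 1887,
    "Solanum tuberosum": 1887,
    "Solanum lycopersicum": 1891,
    "Nicotiana sylvestris": 1901,
    "Nicotiana tabacum": 1902,
    "Nicotiana undulata": 1901,
    "Nicotiana tomentosiformis": 1892,
}


def genus_ranges(lengths):
    """Return {genus: (shortest, longest)} for the given species lengths."""
    by_genus = {}
    for species, aa in lengths.items():
        genus = species.split()[0]
        by_genus.setdefault(genus, []).append(aa)
    return {genus: (min(v), max(v)) for genus, v in by_genus.items()}


if __name__ == "__main__":
    longest = max(YCF1_LENGTH_AA, key=YCF1_LENGTH_AA.get)
    shortest = min(YCF1_LENGTH_AA, key=YCF1_LENGTH_AA.get)
    print("longest:", longest, YCF1_LENGTH_AA[longest])
    print("shortest:", shortest, YCF1_LENGTH_AA[shortest])
    for genus, (low, high) in genus_ranges(YCF1_LENGTH_AA).items():
        print(f"{genus:10s} {low}-{high} aa (spread {high - low})")
```

With these values the spread within Solanum is 4 aa and the spread within Nicotiana is 10 aa, the latter driven by N. tomentosiformis.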
3.5. Phylogenetic Analysis of Solanaceous Plastomes. Evolutionary relationships between diverse plant species have been analyzed using several plastome markers, including matK and rbcL (genes) or trnH-psbA and trnL-trnF (intergenic regions), owing to sequence conservation among plant taxa combined with suitable variation [40, 41]. However, determination of phylogeny based on single gene sequences may be inaccurate [42]. The availability of complete chloroplast sequences for many species has made it possible to use many individual genes or concatenated gene sequences to deduce phylogenetic relationships among taxa [42–44].
Efforts have been made by Moore et al. [44] and Jansen et al. [45] to carry out phylogenetic analysis of solanaceous species using complete plastome sequences. Evolutionary positions of Capsicum and Datura in Solanaceae have been determined using a single or a few plastid genes [46, 47]. Recently, concatenated protein-coding gene sequences from completely sequenced plastomes were used to obtain reasonable phylogenetic relationships for solanaceous species [29]. In the present investigation, we used a similar approach to analyze the phylogenetic relationships among ten solanaceous species with completely sequenced plastomes. Individual multiple sequence alignments were concatenated for maximum likelihood phylogenetic tree generation. As depicted in Figure 3, the taxa were divided into two clades with a bootstrap value of 100% (500 of 500 replicates). The first clade consisted of the four Nicotiana species, while the species of Solanum, Capsicum, Atropa, and Datura were included in the second clade. These results are in line with previous phylogenetic analyses using concatenated protein-coding gene sequences as well as phylogenetic analyses using plastid ndhF and trnL-F sequences [29, 47]. However, in an analysis of 13 ORFs of solanaceous plastomes, a different arrangement was shown, in which Atropa was separated from Solanum and grouped together with Nicotiana [25].
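The concatenation of individual gene alignments into a single matrix, and the column resampling that underlies bootstrap support values, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the pipeline used in the study: the software used for alignment and tree inference is not described in this section, the function names `concatenate` and `bootstrap_replicate` are hypothetical, the toy sequences are invented, and the tree inference step itself is not shown. The taxon labels NTA, SLY, and CAN follow the abbreviations used in the figures.

```python
import random


def concatenate(alignments, taxa):
    """Join per-gene alignments end to end, padding absent taxa with gaps."""
    rows = {taxon: [] for taxon in taxa}
    for gene, aln in alignments.items():
        width = len(next(iter(aln.values())))
        if any(len(seq) != width for seq in aln.values()):
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


def bootstrap_replicate(matrix, rng):
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    length = len(next(iter(matrix.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[c] for c in columns)
            for taxon, seq in matrix.items()}


# Invented toy alignments; real input would be the per-gene alignments.
TOY = {
    "geneA": {"NTA": "ATGGCT", "SLY": "ATGGCA", "CAN": "ATG---"},
    "geneB": {"NTA": "TTGACC", "SLY": "TTGACT"},  # CAN absent in this toy gene
}
TAXA = ["NTA", "SLY", "CAN"]

if __name__ == "__main__":
    supermatrix = concatenate(TOY, TAXA)
    for taxon, seq in supermatrix.items():
        print(taxon, seq)
    replicate = bootstrap_replicate(supermatrix, random.Random(1))
    print({taxon: len(seq) for taxon, seq in replicate.items()})
```

In a bootstrap analysis, a tree would be inferred from each resampled matrix, and the support for a clade is the fraction of replicate trees in which that clade appears; a value of 500 out of 500 replicates corresponds to 100%.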
4. Conclusions
The analyses of complete plastid genomes of ten solanaceous species revealed overall similarity in terms of gene content and organization. The sizes of the LSC, SSC, and IR regions were found to be somewhat conserved among species, but significant variation was found between genera. Most of the coding regions were well conserved. However, many features in a few genes were observed to be typical of a particular genus or even species; these can be used as molecular markers to investigate genetic diversity and evolution. These typical variations can be further utilized to develop more sophisticated DNA barcoding based techniques. The ten solanaceous species were divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from the coding regions of plastomes. The first clade consisted of the four Nicotiana species and the second clade consisted of the species of Solanum, Capsicum, Atropa, and Datura.

Figure 3: Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species.
[Tree image not reproduced; tip labels: CAR, DCA, NTO, NUN, NSY, NTA, ABE, DST, CAN, SLY, SBU, and STU.]
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The authors are thankful to the University Grants Commission, New Delhi, for providing a MANF fellowship to Harpreet Kaur.
References
[1] M. G. Bausher, N. D. Singh, S.-B. Lee, R. K. Jansen, and H. Daniell, “The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var. ‘Ridge Pineapple’: organization and phylogenetic relationships to other angiosperms,” BMC Plant Biology, vol. 6, no. 1, article 21, 2006.
[2] Z. Y. Hu, W. Hua, S. M. Huang, and H. Z. Wang, “Complete chloroplast genome sequence of rapeseed (Brassica napus L.) and its evolutionary implications,” Genetic Resources and Crop Evolution, vol. 58, no. 6, pp. 875–887, 2011.
[3] J. D. Palmer, “Comparative organization of chloroplast genomes,” Annual Review of Genetics, vol. 19, no. 1, pp. 325–354, 1985.
[4] R. G. Olmstead and J. D. Palmer, “Chloroplast DNA systematics: a review of methods and data analysis,” The American Journal of Botany, vol. 81, no. 9, pp. 1205–1224, 1994.
[5] R. A. Bungard, “Photosynthetic evolution in parasitic plants: insight from the chloroplast genome,” BioEssays, vol. 26, no. 3, pp. 235–247, 2004.
[6] L. A. Raubeson and R. K. Jansen, “Chloroplast genomes of plants,” in Diversity and Evolution of Plants-Genotypic and Phenotypic Variation in Higher Plants, R. Henry, Ed., pp. 45–68, CABI Publishing, Wallingford, UK, 2005.
[7] R. C. Haberle, H. M. Fourcade, J. L. Boore, and R. K. Jansen, “Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes,” Journal of Molecular Evolution, vol. 66, no. 4, pp. 350–361, 2008.
[8] W. Martin and R. G. Herrmann, “Gene transfer from organelles to the nucleus: how much, what happens, and why?” Plant Physiology, vol. 118, no. 1, pp. 9–17, 1998.
[9] H. L. Race, R. G. Herrmann, and W. Martin, “Why have organelles retained genomes?” Trends in Genetics, vol. 15, no. 9, pp. 364–370, 1999.
[10] T. Wakasugi, T. Tsudzuki, and M. Sugiura, “The genomics of land plant chloroplasts: gene content and alteration of genomic information by RNA editing,” Photosynthesis Research, vol. 70, no. 1, pp. 107–118, 2001.
[11] Y.-K. Kim, C.-W. Park, and K.-J. Kim, “Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications,” Molecules and Cells, vol. 27, no. 3, pp. 365–381, 2009.
[12] A. Khan, I. A. Khan, H. Asif, and M. K. Azim, “Current trends in chloroplast genome research,” African Journal of Biotechnology, vol. 9, no. 24, pp. 3494–3500, 2010.
[13] H. Shimada and M. Sugiura, “Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes,” Nucleic Acids Research, vol. 19, no. 5, pp. 983–995, 1991.
[14] M. Sugiura, “The chloroplast genome,” Plant Molecular Biology, vol. 19, no. 1, pp. 149–168, 1992.
[15] M. E. Cosner, R. K. Jansen, J. D. Palmer, and S. R. Downie, “The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families,” Current Genetics, vol. 31, no. 5, pp. 419–429, 1997.
[16] L. Gao, X. Yi, Y.-X. Yang, Y.-J. Su, and T. Wang, “Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes,” BMC Evolutionary Biology, vol. 9, no. 1, article 130, 2009.
[17] T. Kato, T. Kaneko, S. Sato, Y. Nakamura, and S. Tabata, “Complete structure of the chloroplast genome of a legume, Lotus japonicus,” DNA Research, vol. 7, no. 6, pp. 323–330, 2000.
[18] B. Goffinet, N. J. Wickett, O. Werner, R. M. Ros, A. J. Shaw, and C. J. Cox, “Distribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta),” Annals of Botany, vol. 99, no. 4, pp. 747–753, 2007.
[19] J. D. Palmer, J. M. Nugent, and L. A. Herbon, “Unusual structure of geranium chloroplast DNA: a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families,” Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 3, pp. 769–773, 1987.
[20] S. Greiner, X. Wang, U. Rauwolf et al., “The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution,” Nucleic Acids Research, vol. 36, no. 7, pp. 2366–2378, 2008.
[21] J. J. Doyle, J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson, “Chloroplast DNA inversions and the origin of the grass family (Poaceae),” Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 16, pp. 7722–7726, 1992.
[22] F. A. Michelangeli, J. I. Davis, and D. W. Stevenson, “Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes,” The American Journal of Botany, vol. 90, no. 1, pp. 93–106, 2003.
[23] E. Bortiri, D. Coleman-Derr, G. R. Lazo, O. D. Anderson, and Y. Q. Gu, “The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes,” BMC Research Notes, vol. 1, article 61, 2008.
[24] C. H. Leseberg and M. R. Duvall, “The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals,” Journal of Molecular Evolution, vol. 69, no. 4, pp. 311–318, 2009.
[25] H. J. Chung, J. D. Jung, H. W. Park et al., “The complete chloroplast genome sequences of Solanum tuberosum and comparative analysis with Solanaceae species identified the presence of a 241-bp deletion in cultivated potato chloroplast DNA sequence,” Plant Cell Reports, vol. 25, no. 12, pp. 1369–1379, 2006.
[26] H. Daniell, S.-B. Lee, J. Grevich et al., “Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes,” Theoretical and Applied Genetics, vol. 112, no. 8, pp. 1503–1518, 2006.
[27] S. Kahlau, S. Aspinall, J. C. Gray, and R. Bock, “Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes,” Journal of Molecular Evolution, vol. 63, no. 2, pp. 194–207, 2006.
[28] M. Yukawa, T. Tsudzuki, and M. Sugiura, “The chloroplast genome of Nicotiana sylvestris and Nicotiana tomentosiformis: complete sequencing confirms that the Nicotiana sylvestris progenitor is the maternal genome donor of Nicotiana tabacum,” Molecular Genetics and Genomics, vol. 275, no. 4, pp. 367–373, 2006.
[29] Y. D. Jo, J. Park, J. Kim et al., “Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome,” Plant Cell Reports, vol. 30, no. 2, pp. 217–229, 2011.
[30] C. Schmitz-Linneweber, R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier, “The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation,” Molecular Biology and Evolution, vol. 19, no. 9, pp. 1602–1612, 2002.
[31] M. Kunnimalaiyaan and B. L. Nielsen, “Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA,” Nucleic Acids Research, vol. 25, no. 18, pp. 3681–3686, 1997.
[32] G. Thyssen, Z. Svab, and P. Maliga, “Cell-to-cell movement of plastids in plants,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 7, pp. 2439–2443, 2012.
[33] D. Gargano, A. Vezzi, N. Scotti et al., “The complete nucleotide sequence genome of potato (Solanum tuberosum cv. Desiree) chloroplast DNA,” in Book of Abstracts of the 2nd Solanaceae Genome Workshop, p. 107, 2005.
[34] K.-J. Kim and H.-L. Lee, “Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants,” DNA Research, vol. 11, no. 4, pp. 247–261, 2004.
[35] D. A. Steane, “Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae),” DNA Research, vol. 12, no. 3, pp. 215–220, 2005.
[36] J. D. Palmer, “Cell culture and somatic cell genetics of plants,” in The Molecular Biology of Plastids, R. G. Hermann, Ed., pp. 5–53, Springer, Vienna, Austria, 1991.
[37] M. Sugiura, K. Shinozaki, M. Tanaka et al., “Split genes and cis/trans splicing in tobacco chloroplasts,” in Plant Molecular Biology, D. von Wettstein and N. H. Chua, Eds., pp. 65–76, Plenum Press, New York, NY, USA, 1987.
[38] A. Vera and M. Sugiura, “A novel RNA gene in the tobacco plastid genome: its possible role in the maturation of 16S rRNA,” The EMBO Journal, vol. 13, no. 9, pp. 2211–2217, 1994.
[39] M. Sugita, Z. Svab, P. Maliga, and M. Sugiura, “Targeted deletion of sprA from the tobacco plastid genome indicates that the encoded small RNA is not essential for pre-16S rRNA maturation in plastids,” Molecular and General Genetics, vol. 257, no. 1, pp. 23–27, 1997.
[40] R. Lahaye, M. van der Bank, D. Bogarin et al., “DNA barcoding the floras of biodiversity hotspots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 8, pp. 2923–2928, 2008.
[41] P. Taberlet, L. Gielly, G. Pautou, and J. Bouvet, “Universal primers for amplification of three non-coding regions of chloroplast DNA,” Plant Molecular Biology, vol. 17, no. 5, pp. 1105–1109, 1991.
[42] X. Guo, S. Castillo-Ramírez, V. González et al., “Rapid evolutionary change of common bean (Phaseolus vulgaris L.) plastome and the genomic diversification of legume chloroplasts,” BMC Genomics, vol. 8, article 228, 2007.
[43] R. K. Jansen, C. Kaittanis, C. Saski et al., “Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids,” BMC Evolutionary Biology, vol. 6, no. 1, article 32, 2006.
[44] M. J. Moore, P. S. Soltis, C. D. Bell, J. G. Burleigh, and D. E. Soltis, “Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4623–4628, 2010.
[45] R. K. Jansen, Z. Cai, L. A. Raubeson et al., “Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 49, pp. 19369–19374, 2007.
[46] L. Bohs and R. G. Olmstead, “Phylogenetic relationships in Solanum (Solanaceae) based on ndhF sequences,” Systematic Botany, vol. 22, no. 1, pp. 5–17, 1997.
[47] R. G. Olmstead, L. Bohs, H. A. Migid, E. Santiago-Valentin, V. F. Garcia, and S. M. Collier, “A molecular phylogeny of the Solanaceae,” Taxon, vol. 57, no. 4, pp. 1159–1181, 2008.
[25] H J Chung J D Jung H W Park et al ldquoThe completechloroplast genome sequences of Solanum tuberosum andcomparative analysis with Solanaceae species identified thepresence of a 241-bp deletion in cultivated potato chloroplastDNA sequencerdquoPlant Cell Reports vol 25 no 12 pp 1369ndash13792006
[26] H Daniell S-B Lee J Grevich et al ldquoComplete chloro-plast genome sequences of Solanum bulbocastanum Solanumlycopersicum and comparative analyses with other Solanaceaegenomesrdquo Theoretical and Applied Genetics vol 112 no 8 pp1503ndash1518 2006
[27] S Kahlau S Aspinall J C Gray and R Bock ldquoSequence ofthe tomato chloroplast DNA and evolutionary comparison ofsolanaceous plastid genomesrdquo Journal of Molecular Evolutionvol 63 no 2 pp 194ndash207 2006
[28] M Yukawa T Tsudzuki and M Sugiura ldquoThe chloroplastgenome of Nicotiana sylvestris and Nicotiana tomentosiformiscomplete sequencing confirms that theNicotiana sylvestris pro-genitor is the maternal genome donor of Nicotiana tabacumrdquoMolecular Genetics and Genomics vol 275 no 4 pp 367ndash3732006
[29] Y D Jo J Park J Kim et al ldquoComplete sequencing andcomparative analyses of the pepper (Capsicum annuum L)plastome revealed high frequency of tandem repeats and large
insertiondeletions on pepper plastomerdquo Plant Cell Reports vol30 no 2 pp 217ndash229 2011
[30] C Schmitz-Linneweber R Regel T G Du H Hupfer RG Herrmann and R M Maier ldquoThe plastid chromosome ofAtropa belladonna and its comparison with that of Nicotianatabacum the role of RNA editing in generating divergencein the process of plant speciationrdquo Molecular Biology andEvolution vol 19 no 9 pp 1602ndash1612 2002
[31] M Kunnimalaiyaan and B L Nielsen ldquoFine mapping of repli-cation origins (oriA and oriB) inNicotiana tabacum chloroplastDNArdquo Nucleic Acids Research vol 25 no 18 pp 3681ndash36861997
[32] G Thyssen Z Svab and P Maliga ldquoCell-to-cell movementof plastids in plantsrdquo Proceedings of the National Academy ofSciences of the United States of America vol 109 no 7 pp 2439ndash2443 2012
[33] D Gargano A Vezzi N Scotti et al ldquoThe complete nucleotidesequence genome of potato (Solanum tuberosum cv Desiree)chloroplast DNArdquo in Book of Abstracts of the 2nd SolanaceaeGenome Workshop p 107 2005
[34] K-J Kim and H-L Lee ldquoComplete chloroplast genomesequences from Korean ginseng (Panax schinseng Nees) andcomparative analysis of sequence evolution among 17 vascularplantsrdquo DNA Research vol 11 no 4 pp 247ndash261 2004
[35] D A Steane ldquoComplete nucleotide sequence of the chloroplastgenome from the Tasmanian blue gum Eucalyptus globulus(Myrtaceae)rdquo DNA Research vol 12 no 3 pp 215ndash220 2005
[36] J D Palmer ldquoCell culture and somatic cell genetics of plantsrdquo inTheMolecular Biology of Plastids R G Hermann Ed pp 5ndash53Springer Vienna Austria 1991
[37] M Sugiura K Shinozaki M Tanaka et al ldquoSplit genes andcistrans splicing in tobacco chloroplastsrdquo in Plant MolecularBiology D von Wettstein and N H Chua Eds pp 65ndash76Plenum Press New York NY USA 1987
[38] A Vera and M Sugiura ldquoA novel RNA gene in the tobaccoplastid genome its possible role in thematuration of 16S rRNArdquoThe EMBO Journal vol 13 no 9 pp 2211ndash2217 1994
[39] M Sugita Z Svab P Maliga and M Sugiura ldquoTargeteddeletion of sprA from the tobacco plastid genome indicatesthat the encoded small RNA is not essential for pre-16S rRNAmaturation in plastidsrdquo Molecular and General Genetics vol257 no 1 pp 23ndash27 1997
[40] R Lahaye M van der Bank D Bogarin et al ldquoDNA barcodingthe floras of biodiversity hotspotsrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no8 pp 2923ndash2928 2008
[41] P Taberlet L Gielly G Pautou and J Bouvet ldquoUniversalprimers for amplification of three non-coding regions of chloro-plast DNArdquo PlantMolecular Biology vol 17 no 5 pp 1105ndash11091991
[42] X Guo S Castillo-Ramırez V Gonzalez et al ldquoRapid evolu-tionary change of common bean (Phaseolus vulgaris L) plas-tome and the genomic diversification of legume chloroplastsrdquoBMC Genomics vol 8 article 228 2007
[43] R K Jansen C Kaittanis C Saski et al ldquoPhylogenetic analysesof Vitis (Vitaceae) based on complete chloroplast genomesequences effects of taxon sampling and phylogenetic methodson resolving relationships among rosidsrdquo BMC EvolutionaryBiology vol 6 no 1 article 32 2006
[44] M J Moore P S Soltis C D Bell J G Burleigh and D ESoltis ldquoPhylogenetic analysis of 83 plastid genes further resolves
Advances in Bioinformatics 13
the early diversification of eudicotsrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 107 no10 pp 4623ndash4628 2010
[45] R K Jansen Z Cai L A Raubeson et al ldquoAnalysis of 81 genesfrom 64 plastid genomes resolves relationships in angiospermsand identifies genome-scale evolutionary patternsrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 104 no 49 pp 19369ndash19374 2007
[46] L Bohs and R G Olmstead ldquoPhylogenetic relationships inSolanum (Solanaceae) based on ndhF sequencesrdquo SystematicBotany vol 22 no 1 pp 5ndash17 1997
[47] R G Olmstead L Bohs H A Migid E Santiago-ValentinV F Garcia and S M Collier ldquoA molecular phylogeny of theSolanaceaerdquo Taxon vol 57 no 4 pp 1159ndash1181 2008
Advances in Bioinformatics 11
Figure 3: Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species. [Tree image not reproduced; tip labels: CAR, DCA, NTO, NUN, NSY, NTA, ABE, DST, CAN, SLY, SBU, and STU.]
found to be somewhat conserved among species, but significant variation was found between genera. Most of the coding regions were well conserved. However, many features in a few genes were observed to be typical of a particular genus or even species, and these can be used as molecular markers to investigate genetic diversity and evolution. These characteristic variations can be further utilized to develop more sophisticated DNA barcoding-based techniques. The ten solanaceous species were divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from the coding regions of the plastomes. The first clade consisted of the four Nicotiana species, and the second clade consisted of the species of Solanum, Capsicum, Atropa, and Datura.
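The concatenation step behind such a phylogenetic analysis can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `concatenate_alignments`, the gap-padding of genes missing from a taxon, and the toy sequences are all assumptions made for illustration; only the taxon codes (NTA, SLY, CAN) are borrowed from the tip labels of Figure 3.

```python
from typing import Dict, List


def concatenate_alignments(
    gene_alignments: List[Dict[str, str]], missing: str = "-"
) -> Dict[str, str]:
    """Join per-gene alignments (taxon -> aligned sequence) into one supermatrix.

    Hypothetical helper: a taxon lacking a gene is padded with gap characters
    so that every concatenated sequence has the same length.
    """
    # Collect every taxon that appears in at least one gene alignment.
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}

    for aln in gene_alignments:
        # All sequences of one aligned gene must share the same width.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, missing * width))

    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy data (invented sequences, not taken from the plastomes studied here).
toy = [
    {"NTA": "ATGGC", "SLY": "ATGAC"},
    {"NTA": "TTA", "SLY": "TTG", "CAN": "TTA"},
]
supermatrix = concatenate_alignments(toy)
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

The resulting dictionary could then be written out in whatever format the chosen tree-building program expects; that step is left out because the format depends on the software used.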
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The authors are thankful to the University Grants Commission, New Delhi, for providing the MANF fellowship to Harpreet Kaur.
References
[1] M. G. Bausher, N. D. Singh, S.-B. Lee, R. K. Jansen, and H. Daniell, “The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var “Ridge Pineapple”: organization and phylogenetic relationships to other angiosperms,” BMC Plant Biology, vol. 6, no. 1, article 21, 2006.
[2] Z. Y. Hu, W. Hua, S. M. Huang, and H. Z. Wang, “Complete chloroplast genome sequence of rapeseed (Brassica napus L.) and its evolutionary implications,” Genetic Resources and Crop Evolution, vol. 58, no. 6, pp. 875–887, 2011.
[3] J. D. Palmer, “Comparative organization of chloroplast genomes,” Annual Review of Genetics, vol. 19, no. 1, pp. 325–354, 1985.
[4] R. G. Olmstead and J. D. Palmer, “Chloroplast DNA systematics: a review of methods and data analysis,” The American Journal of Botany, vol. 81, no. 9, pp. 1205–1224, 1994.
[5] R. A. Bungard, “Photosynthetic evolution in parasitic plants: insight from the chloroplast genome,” BioEssays, vol. 26, no. 3, pp. 235–247, 2004.
[6] L. A. Raubeson and R. K. Jansen, “Chloroplast genomes of plants,” in Diversity and Evolution of Plants-Genotypic and Phenotypic Variation in Higher Plants, R. Henry, Ed., pp. 45–68, CABI Publishing, Wallingford, UK, 2005.
[7] R. C. Haberle, H. M. Fourcade, J. L. Boore, and R. K. Jansen, “Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes,” Journal of Molecular Evolution, vol. 66, no. 4, pp. 350–361, 2008.
[8] W. Martin and R. G. Herrmann, “Gene transfer from organelles to the nucleus: how much, what happens, and why?” Plant Physiology, vol. 118, no. 1, pp. 9–17, 1998.
[9] H. L. Race, R. G. Herrmann, and W. Martin, “Why have organelles retained genomes?” Trends in Genetics, vol. 15, no. 9, pp. 364–370, 1999.
[10] T. Wakasugi, T. Tsudzuki, and M. Sugiura, “The genomics of land plant chloroplasts: gene content and alteration of genomic information by RNA editing,” Photosynthesis Research, vol. 70, no. 1, pp. 107–118, 2001.
[11] Y.-K. Kim, C.-W. Park, and K.-J. Kim, “Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications,” Molecules and Cells, vol. 27, no. 3, pp. 365–381, 2009.
[12] A. Khan, I. A. Khan, H. Asif, and M. K. Azim, “Current trends in chloroplast genome research,” African Journal of Biotechnology, vol. 9, no. 24, pp. 3494–3500, 2010.
[13] H. Shimada and M. Sugiura, “Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes,” Nucleic Acids Research, vol. 19, no. 5, pp. 983–995, 1991.
[14] M. Sugiura, “The chloroplast genome,” Plant Molecular Biology, vol. 19, no. 1, pp. 149–168, 1992.
[15] M. E. Cosner, R. K. Jansen, J. D. Palmer, and S. R. Downie, “The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families,” Current Genetics, vol. 31, no. 5, pp. 419–429, 1997.
[16] L. Gao, X. Yi, Y.-X. Yang, Y.-J. Su, and T. Wang, “Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes,” BMC Evolutionary Biology, vol. 9, no. 1, article 130, 2009.
[17] T. Kato, T. Kaneko, S. Sato, Y. Nakamura, and S. Tabata, “Complete structure of the chloroplast genome of a legume, Lotus japonicus,” DNA Research, vol. 7, no. 6, pp. 323–330, 2000.
[18] B. Goffinet, N. J. Wickett, O. Werner, R. M. Ros, A. J. Shaw, and C. J. Cox, “Distribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta),” Annals of Botany, vol. 99, no. 4, pp. 747–753, 2007.
[19] J. D. Palmer, J. M. Nugent, and L. A. Herbon, “Unusual structure of geranium chloroplast DNA: a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families,” Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 3, pp. 769–773, 1987.
[20] S. Greiner, X. Wang, U. Rauwolf et al., “The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution,” Nucleic Acids Research, vol. 36, no. 7, pp. 2366–2378, 2008.
[21] J. J. Doyle, J. I. Davis, R. J. Soreng, D. Garvin, and M. J. Anderson, “Chloroplast DNA inversions and the origin of the grass family (Poaceae),” Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 16, pp. 7722–7726, 1992.
[22] F. A. Michelangeli, J. I. Davis, and D. W. Stevenson, “Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes,” The American Journal of Botany, vol. 90, no. 1, pp. 93–106, 2003.
[23] E. Bortiri, D. Coleman-Derr, G. R. Lazo, O. D. Anderson, and Y. Q. Gu, “The complete chloroplast genome sequence of Brachypodium distachyon: sequence comparison and phylogenetic analysis of eight grass plastomes,” BMC Research Notes, vol. 1, article 61, 2008.
[24] C. H. Leseberg and M. R. Duvall, “The complete chloroplast genome of Coix lacryma-jobi and a comparative molecular evolutionary analysis of plastomes in cereals,” Journal of Molecular Evolution, vol. 69, no. 4, pp. 311–318, 2009.
[25] H. J. Chung, J. D. Jung, H. W. Park et al., “The complete chloroplast genome sequences of Solanum tuberosum and comparative analysis with Solanaceae species identified the presence of a 241-bp deletion in cultivated potato chloroplast DNA sequence,” Plant Cell Reports, vol. 25, no. 12, pp. 1369–1379, 2006.
[26] H. Daniell, S.-B. Lee, J. Grevich et al., “Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes,” Theoretical and Applied Genetics, vol. 112, no. 8, pp. 1503–1518, 2006.
[27] S. Kahlau, S. Aspinall, J. C. Gray, and R. Bock, “Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes,” Journal of Molecular Evolution, vol. 63, no. 2, pp. 194–207, 2006.
[28] M. Yukawa, T. Tsudzuki, and M. Sugiura, “The chloroplast genome of Nicotiana sylvestris and Nicotiana tomentosiformis: complete sequencing confirms that the Nicotiana sylvestris progenitor is the maternal genome donor of Nicotiana tabacum,” Molecular Genetics and Genomics, vol. 275, no. 4, pp. 367–373, 2006.
[29] Y. D. Jo, J. Park, J. Kim et al., “Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome,” Plant Cell Reports, vol. 30, no. 2, pp. 217–229, 2011.
[30] C. Schmitz-Linneweber, R. Regel, T. G. Du, H. Hupfer, R. G. Herrmann, and R. M. Maier, “The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation,” Molecular Biology and Evolution, vol. 19, no. 9, pp. 1602–1612, 2002.
[31] M. Kunnimalaiyaan and B. L. Nielsen, “Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA,” Nucleic Acids Research, vol. 25, no. 18, pp. 3681–3686, 1997.
[32] G. Thyssen, Z. Svab, and P. Maliga, “Cell-to-cell movement of plastids in plants,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 7, pp. 2439–2443, 2012.
[33] D. Gargano, A. Vezzi, N. Scotti et al., “The complete nucleotide sequence genome of potato (Solanum tuberosum cv. Desiree) chloroplast DNA,” in Book of Abstracts of the 2nd Solanaceae Genome Workshop, p. 107, 2005.
[34] K.-J. Kim and H.-L. Lee, “Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants,” DNA Research, vol. 11, no. 4, pp. 247–261, 2004.
[35] D. A. Steane, “Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae),” DNA Research, vol. 12, no. 3, pp. 215–220, 2005.
[36] J. D. Palmer, “Cell culture and somatic cell genetics of plants,” in The Molecular Biology of Plastids, R. G. Hermann, Ed., pp. 5–53, Springer, Vienna, Austria, 1991.
[37] M. Sugiura, K. Shinozaki, M. Tanaka et al., “Split genes and cis/trans splicing in tobacco chloroplasts,” in Plant Molecular Biology, D. von Wettstein and N. H. Chua, Eds., pp. 65–76, Plenum Press, New York, NY, USA, 1987.
[38] A. Vera and M. Sugiura, “A novel RNA gene in the tobacco plastid genome: its possible role in the maturation of 16S rRNA,” The EMBO Journal, vol. 13, no. 9, pp. 2211–2217, 1994.
[39] M. Sugita, Z. Svab, P. Maliga, and M. Sugiura, “Targeted deletion of sprA from the tobacco plastid genome indicates that the encoded small RNA is not essential for pre-16S rRNA maturation in plastids,” Molecular and General Genetics, vol. 257, no. 1, pp. 23–27, 1997.
[40] R. Lahaye, M. van der Bank, D. Bogarin et al., “DNA barcoding the floras of biodiversity hotspots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 8, pp. 2923–2928, 2008.
[41] P. Taberlet, L. Gielly, G. Pautou, and J. Bouvet, “Universal primers for amplification of three non-coding regions of chloroplast DNA,” Plant Molecular Biology, vol. 17, no. 5, pp. 1105–1109, 1991.
[42] X. Guo, S. Castillo-Ramírez, V. González et al., “Rapid evolutionary change of common bean (Phaseolus vulgaris L.) plastome and the genomic diversification of legume chloroplasts,” BMC Genomics, vol. 8, article 228, 2007.
[43] R. K. Jansen, C. Kaittanis, C. Saski et al., “Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids,” BMC Evolutionary Biology, vol. 6, no. 1, article 32, 2006.
[44] M. J. Moore, P. S. Soltis, C. D. Bell, J. G. Burleigh, and D. E. Soltis, “Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4623–4628, 2010.
[45] R. K. Jansen, Z. Cai, L. A. Raubeson et al., “Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 49, pp. 19369–19374, 2007.
[46] L. Bohs and R. G. Olmstead, “Phylogenetic relationships in Solanum (Solanaceae) based on ndhF sequences,” Systematic Botany, vol. 22, no. 1, pp. 5–17, 1997.
[47] R. G. Olmstead, L. Bohs, H. A. Migid, E. Santiago-Valentin, V. F. Garcia, and S. M. Collier, “A molecular phylogeny of the Solanaceae,” Taxon, vol. 57, no. 4, pp. 1159–1181, 2008.
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
12 Advances in Bioinformatics
[17] T Kato T Kaneko S Sato Y Nakamura and S TabataldquoComplete structure of the chloroplast genome of a legumeLotus japonicusrdquoDNA Research vol 7 no 6 pp 323ndash330 2000
[18] B Goffinet N J Wickett O Werner R M Ros A J Shaw andC J Cox ldquoDistribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta)rdquoAnnals of Botany vol 99 no 4 pp 747ndash753 2007
[19] J D Palmer JMNugent and L AHerbon ldquoUnusual structureof geranium chloroplast DNA a triple-sized inverted repeatextensive gene duplications multiple inversions and two repeatfamiliesrdquo Proceedings of the National Academy of Sciences of theUnited States of America vol 84 no 3 pp 769ndash773 1987
[20] S Greiner XWang U Rauwolf et al ldquoThe complete nucleotidesequences of the five genetically distinct plastid genomes ofOenothera subsection Oenothera I Sequence evaluation andplastome evolutionrdquo Nucleic Acids Research vol 36 no 7 pp2366ndash2378 2008
[21] J J Doyle J I Davis R J Soreng D Garvin and M JAnderson ldquoChloroplast DNA inversions and the origin of thegrass family (Poaceae)rdquo Proceedings of the National Academy ofSciences of the United States of America vol 89 no 16 pp 7722ndash7726 1992
[22] F A Michelangeli J I Davis and D W Stevenson ldquoPhy-logenetic relationships among Poaceae and related fami-lies as inferred from morphology inversions in the plastidgenome and sequence data from the mitochondrial and plastidgenomesrdquoTheAmerican Journal of Botany vol 90 no 1 pp 93ndash106 2003
[23] E Bortiri D Coleman-Derr G R Lazo O D Andersonand Y Q Gu ldquoThe complete chloroplast genome sequence ofBrachypodium distachyon sequence comparison and phyloge-netic analysis of eight grass plastomesrdquo BMC Research Notesvol 1 article 61 2008
[24] C H Leseberg and M R Duvall ldquoThe complete chloroplastgenome ofCoix lacryma-jobi and a comparative molecular evo-lutionary analysis of plastomes in cerealsrdquo Journal of MolecularEvolution vol 69 no 4 pp 311ndash318 2009
[25] H J Chung J D Jung H W Park et al ldquoThe completechloroplast genome sequences of Solanum tuberosum andcomparative analysis with Solanaceae species identified thepresence of a 241-bp deletion in cultivated potato chloroplastDNA sequencerdquoPlant Cell Reports vol 25 no 12 pp 1369ndash13792006
[26] H Daniell S-B Lee J Grevich et al ldquoComplete chloro-plast genome sequences of Solanum bulbocastanum Solanumlycopersicum and comparative analyses with other Solanaceaegenomesrdquo Theoretical and Applied Genetics vol 112 no 8 pp1503ndash1518 2006
[27] S Kahlau S Aspinall J C Gray and R Bock ldquoSequence ofthe tomato chloroplast DNA and evolutionary comparison ofsolanaceous plastid genomesrdquo Journal of Molecular Evolutionvol 63 no 2 pp 194ndash207 2006
[28] M Yukawa T Tsudzuki and M Sugiura ldquoThe chloroplastgenome of Nicotiana sylvestris and Nicotiana tomentosiformiscomplete sequencing confirms that theNicotiana sylvestris pro-genitor is the maternal genome donor of Nicotiana tabacumrdquoMolecular Genetics and Genomics vol 275 no 4 pp 367ndash3732006
[29] Y D Jo J Park J Kim et al ldquoComplete sequencing andcomparative analyses of the pepper (Capsicum annuum L)plastome revealed high frequency of tandem repeats and large
insertiondeletions on pepper plastomerdquo Plant Cell Reports vol30 no 2 pp 217ndash229 2011
[30] C Schmitz-Linneweber R Regel T G Du H Hupfer RG Herrmann and R M Maier ldquoThe plastid chromosome ofAtropa belladonna and its comparison with that of Nicotianatabacum the role of RNA editing in generating divergencein the process of plant speciationrdquo Molecular Biology andEvolution vol 19 no 9 pp 1602ndash1612 2002
[31] M Kunnimalaiyaan and B L Nielsen ldquoFine mapping of repli-cation origins (oriA and oriB) inNicotiana tabacum chloroplastDNArdquo Nucleic Acids Research vol 25 no 18 pp 3681ndash36861997
[32] G Thyssen Z Svab and P Maliga ldquoCell-to-cell movementof plastids in plantsrdquo Proceedings of the National Academy ofSciences of the United States of America vol 109 no 7 pp 2439ndash2443 2012
[33] D Gargano A Vezzi N Scotti et al ldquoThe complete nucleotidesequence genome of potato (Solanum tuberosum cv Desiree)chloroplast DNArdquo in Book of Abstracts of the 2nd SolanaceaeGenome Workshop p 107 2005
[34] K-J Kim and H-L Lee ldquoComplete chloroplast genomesequences from Korean ginseng (Panax schinseng Nees) andcomparative analysis of sequence evolution among 17 vascularplantsrdquo DNA Research vol 11 no 4 pp 247ndash261 2004
[35] D A Steane ldquoComplete nucleotide sequence of the chloroplastgenome from the Tasmanian blue gum Eucalyptus globulus(Myrtaceae)rdquo DNA Research vol 12 no 3 pp 215ndash220 2005
[36] J D Palmer ldquoCell culture and somatic cell genetics of plantsrdquo inTheMolecular Biology of Plastids R G Hermann Ed pp 5ndash53Springer Vienna Austria 1991
[37] M Sugiura K Shinozaki M Tanaka et al ldquoSplit genes andcistrans splicing in tobacco chloroplastsrdquo in Plant MolecularBiology D von Wettstein and N H Chua Eds pp 65ndash76Plenum Press New York NY USA 1987
[38] A Vera and M Sugiura ldquoA novel RNA gene in the tobaccoplastid genome its possible role in thematuration of 16S rRNArdquoThe EMBO Journal vol 13 no 9 pp 2211ndash2217 1994
[39] M Sugita Z Svab P Maliga and M Sugiura ldquoTargeteddeletion of sprA from the tobacco plastid genome indicatesthat the encoded small RNA is not essential for pre-16S rRNAmaturation in plastidsrdquo Molecular and General Genetics vol257 no 1 pp 23ndash27 1997
[40] R Lahaye M van der Bank D Bogarin et al ldquoDNA barcodingthe floras of biodiversity hotspotsrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 105 no8 pp 2923ndash2928 2008
[41] P Taberlet L Gielly G Pautou and J Bouvet ldquoUniversalprimers for amplification of three non-coding regions of chloro-plast DNArdquo PlantMolecular Biology vol 17 no 5 pp 1105ndash11091991
[42] X Guo S Castillo-Ramırez V Gonzalez et al ldquoRapid evolu-tionary change of common bean (Phaseolus vulgaris L) plas-tome and the genomic diversification of legume chloroplastsrdquoBMC Genomics vol 8 article 228 2007
[43] R K Jansen C Kaittanis C Saski et al ldquoPhylogenetic analysesof Vitis (Vitaceae) based on complete chloroplast genomesequences effects of taxon sampling and phylogenetic methodson resolving relationships among rosidsrdquo BMC EvolutionaryBiology vol 6 no 1 article 32 2006
[44] M J Moore P S Soltis C D Bell J G Burleigh and D ESoltis ldquoPhylogenetic analysis of 83 plastid genes further resolves
Advances in Bioinformatics 13
the early diversification of eudicotsrdquo Proceedings of the NationalAcademy of Sciences of the United States of America vol 107 no10 pp 4623ndash4628 2010
[45] R K Jansen Z Cai L A Raubeson et al ldquoAnalysis of 81 genesfrom 64 plastid genomes resolves relationships in angiospermsand identifies genome-scale evolutionary patternsrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 104 no 49 pp 19369ndash19374 2007
[46] L Bohs and R G Olmstead ldquoPhylogenetic relationships inSolanum (Solanaceae) based on ndhF sequencesrdquo SystematicBotany vol 22 no 1 pp 5ndash17 1997
[47] R G Olmstead L Bohs H A Migid E Santiago-ValentinV F Garcia and S M Collier ldquoA molecular phylogeny of theSolanaceaerdquo Taxon vol 57 no 4 pp 1159ndash1181 2008
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
Advances in Bioinformatics 13
the early diversification of eudicots,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4623–4628, 2010.
[45] R. K. Jansen, Z. Cai, L. A. Raubeson et al., “Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns,” Proceedings of the National Academy of Sciences of the United States of America, vol. 104, no. 49, pp. 19369–19374, 2007.
[46] L. Bohs and R. G. Olmstead, “Phylogenetic relationships in Solanum (Solanaceae) based on ndhF sequences,” Systematic Botany, vol. 22, no. 1, pp. 5–17, 1997.