

Supporting Information
Janouškovec et al. 10.1073/pnas.1003335107

SI Materials and Methods
Rearrangements between the plastid genomes of CCMP3155 and P. falciparum were analyzed using GRIMM (1). Homologous genes from the large single-copy region and genes from the inverted repeats were analyzed either together (circular topology) or separately (linear topology); both approaches gave the same total of four inversions. Ribosomal RNA sequences were aligned using arb-aligner (http://www.arb-silva.de/aligner/). Amino acid sequences were aligned using MAFFT v6.240 (2). Alignments were edited using BioEdit v7.0.9 (3) and Gblocks v0.91b (4). The subset of 34 conserved plastid genes (Fig. 5) was selected based on maximum likelihood distances inferred with TREE-PUZZLE 5.2 (5) (cutoff value set to 0.82). The concatenated nuclear dataset (i; see below) was analyzed using RAxML 7.1 (6) (LG+Gamma+F model for protein genes and GTR+Gamma for rRNA genes, 1,000 bootstrap replicates) and MrBayes 3.1.2 (7) (WAG+Gamma+F and GTR+Gamma models; two Markov chains run under default priors for 2 × 10^6 generations, with the first 5 × 10^4 excluded from consensus topology reconstruction as burn-in). Plastid datasets (ii, iii, v, vi) were analyzed under the CpREV+Gamma+F empirical model in RAxML (500 bootstrap replicates) and MrBayes (two Markov chains, default priors, 5 × 10^5 generations, burn-in 5 × 10^4), and under the CAT mixture model in PhyloBayes 3.2 (8) and PhyML 3.0-CAT (9, 10) (C50 model for the best tree, C20 for the 100-replicate bootstrap analysis). The form II Rubisco dataset (iv) was analyzed using RAxML (LG+Gamma+F, 1,000 bootstrap replicates), MrBayes, and PhyloBayes (same settings as for ii, iii, v, vi). Some bioinformatic analyses were carried out on the freely available Bioportal (www.bioportal.uio.no).
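The burn-in settings quoted for the MrBayes runs can be restated as fractions of each chain. The short Python sketch below is only an arithmetic illustration; the function name `burn_in_summary` is hypothetical and is not part of MrBayes or of the original analysis.

```python
def burn_in_summary(total_generations: int, burn_in_generations: int) -> tuple[float, int]:
    """Return (fraction discarded, generations retained) for one MCMC run."""
    if not 0 <= burn_in_generations <= total_generations:
        raise ValueError("burn-in must lie between 0 and the total number of generations")
    fraction = burn_in_generations / total_generations
    retained = total_generations - burn_in_generations
    return fraction, retained


# Nuclear dataset (i): 2 x 10^6 generations, first 5 x 10^4 discarded.
nuclear_fraction, nuclear_retained = burn_in_summary(2 * 10**6, 5 * 10**4)

# Plastid datasets (ii, iii, v, vi): 5 x 10^5 generations, burn-in 5 x 10^4.
plastid_fraction, plastid_retained = burn_in_summary(5 * 10**5, 5 * 10**4)

print(f"nuclear: {nuclear_fraction:.1%} discarded, {nuclear_retained:,} generations retained")
print(f"plastid: {plastid_fraction:.1%} discarded, {plastid_retained:,} generations retained")
```

Under these settings the nuclear run discards 2.5% of its generations and the plastid runs discard 10%.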

Phylogenetic Analyses Were Conducted on the Following Datasets.
(i) Dataset of nuclear genes (6 protein + 2 rRNA genes; 7,137 positions); Fig. 1
Genes: hsp90, hsp70, alpha-tubulin, beta-tubulin, biP, eF2, SSU rRNA, LSU rRNA

(ii) Dataset of plastid genes limited to genes retained in dinoflagellate plastid genomes (11 genes, 4,212 amino acid positions); Fig. S7A
Genes: atpA, atpB, petB, petD, psaA, psaB, psbA, psbB, psbC, psbD, psbE

(iii) Dataset of plastid genes limited to the content of apicomplexan plastid genomes (23 genes, 4,438 amino acid positions); Fig. S7B
Genes: clpC, rpl2, rpl4, rpl6, rpl11, rpl14, rpl16, rpl23, rpoB, rpoC1, rpoC2, rps2, rps3, rps4, rps5, rps7, rps8, rps11, rps12, rps17, rps19, sufB, tufA

(iv) Dataset of form II Rubisco (455 amino acid positions); Fig. 3

(v) Dataset of 34 conserved plastid genes (7,599 amino acid positions); Fig. 5
Genes: acsF, atpA, atpB, atpH, atpI, clpC, petB, petD, petG, petN, psaA, psaB, psaC, psaD, psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbN, psbT, rpl11, rpl14, rpl16, rps12, rps19, rps5, sufB, tufA

(vi) Dataset of all plastid genes present in CCMP3155 or C. velia (68 genes, 15,736 amino acid positions); Fig. S8
Genes: acsF, atpA, atpB, atpH, atpI, ccs1, ccsA, clpC, petA, petB, petD, petG, petN, psaA, psaB, psaC, psaD, psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbN, psbT, psbV, rpl2, rpl3, rpl4, rpl5, rpl6, rpl11, rpl14, rpl16, rpl19, rpl20, rpl23, rpl27, rpl31, rpoA, rpoB, rpoC1, rpoC2, rps2, rps3, rps4, rps5, rps7, rps8, rps11, rps12, rps13, rps14, rps16, rps17, rps18, rps19, secA, secY, sufB, tatC, tufA, ycf3, ycf4
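As a consistency check on the plastid gene lists above, the sketch below stores datasets ii, iii, v, and vi as Python sets and compares each list length with the gene count stated in the text. The variable and function names are illustrative assumptions and are not part of any pipeline described in the source.

```python
# Gene lists copied from datasets ii, iii, v, and vi; the second item of each
# tuple is the gene count stated in the dataset description.
PLASTID_DATASETS = {
    "ii": ("atpA atpB petB petD psaA psaB psbA psbB psbC psbD psbE", 11),
    "iii": ("clpC rpl2 rpl4 rpl6 rpl11 rpl14 rpl16 rpl23 rpoB rpoC1 rpoC2 "
            "rps2 rps3 rps4 rps5 rps7 rps8 rps11 rps12 rps17 rps19 sufB tufA", 23),
    "v": ("acsF atpA atpB atpH atpI clpC petB petD petG petN psaA psaB psaC "
          "psaD psbA psbB psbC psbD psbE psbF psbH psbI psbJ psbK psbN psbT "
          "rpl11 rpl14 rpl16 rps12 rps19 rps5 sufB tufA", 34),
    "vi": ("acsF atpA atpB atpH atpI ccs1 ccsA clpC petA petB petD petG petN "
           "psaA psaB psaC psaD psbA psbB psbC psbD psbE psbF psbH psbI psbJ "
           "psbK psbN psbT psbV rpl2 rpl3 rpl4 rpl5 rpl6 rpl11 rpl14 rpl16 "
           "rpl19 rpl20 rpl23 rpl27 rpl31 rpoA rpoB rpoC1 rpoC2 rps2 rps3 "
           "rps4 rps5 rps7 rps8 rps11 rps12 rps13 rps14 rps16 rps17 rps18 "
           "rps19 secA secY sufB tatC tufA ycf3 ycf4", 68),
}


def gene_set(label: str) -> set[str]:
    """Return the genes of one dataset as a set."""
    return set(PLASTID_DATASETS[label][0].split())


def count_mismatches() -> dict[str, tuple[int, int]]:
    """Map dataset label -> (listed, stated) wherever the two counts differ."""
    mismatches = {}
    for label, (genes, stated) in PLASTID_DATASETS.items():
        listed = len(set(genes.split()))
        if listed != stated:
            mismatches[label] = (listed, stated)
    return mismatches


print("count mismatches:", count_mismatches())
print("ii within v:", gene_set("ii") <= gene_set("v"))
print("v within vi:", gene_set("v") <= gene_set("vi"))
print("iii within vi:", gene_set("iii") <= gene_set("vi"))
```

For the lists as printed, the stated counts match, dataset ii is contained in dataset v, and datasets iii and v are both contained in dataset vi.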

1. Tesler G (2002) GRIMM: Genome rearrangements web server. Bioinformatics 18:492–493.

2. Katoh K, Kuma K, Toh H, Miyata T (2005) MAFFT version 5: Improvement in accuracy of multiple sequence alignment. Nucleic Acids Res 33:511–518.

3. Hall T (1999) BioEdit: A user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41:95–98.

4. Castresana J (2000) Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol 17:540–552.

5. Schmidt HA, Strimmer K, Vingron M, von Haeseler A (2002) TREE-PUZZLE: Maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502–504.

6. Stamatakis A (2006) RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.

7. Huelsenbeck JP, Ronquist F (2001) MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17:754–755.

8. Lartillot N, Philippe H (2004) A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol 21:1095–1109.

9. Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52:696–704.

10. Quang S, Gascuel O, Lartillot N (2008) Empirical profile mixture models for phylogenetic reconstruction. Bioinformatics 24:2317–2323.

Janouškovec et al. www.pnas.org/cgi/content/short/1003335107 1 of 7


Fig. S1. Transmission electron microscopy of the CCMP3155 plastid. (A) View of a plastid lobe showing thylakoid lamellae organized in stacks of three, a feature also known from C. velia and dinoflagellates. (Scale bar, 200 nm.) (B) Characteristic pyrenoid with several thylakoid lamellae entering the site of carbon fixation, surrounded by a sheath of starch. (Scale bar, 500 nm.) (C–E) The plastid is bounded by four membranes (white arrowheads), similarly to the plastids of C. velia and apicomplexans. (Scale bars, 100 nm.)


[Figure S2 image: plastid genome maps. Map labels: Chromera velia, 119,798 bp (~121,200 bp); CCMP3155, 85,535 bp; CCMP2878. Region labels: LSC, SSC, IR1, IR2, repeat1, repeat2. Legend categories: expression; photosystems; biosynthesis; other & YCFs; ORFs; pseudogene. Individual gene and tRNA labels are not reproduced here.]

Fig. S2. Plastid genome maps of C. velia and CCMP3155. Genes on the outside are transcribed counterclockwise. All genes are colored according to the functional categories (Upper Right). An asterisk next to the gene for tRNA-Leu (UAA) indicates an intron in the anticodon triplet. Crosses in C. velia genes label pseudogenes. The plastid genome of C. velia has not been proven to map as a circle (dotted line). The relative sizes of both plastid genomes are proportional.


[Figure S3 image: panels A and B, each with size-marker labels 24.3, 48.5, 97, and 145.5.]

Fig. S3. Size estimate of the C. velia plastid genome. (A) PFGE of genomic DNA revealed a faint band at approximately 120 kb (arrow). (B) Southern hybridization of a different PFGE run. A radioactively labeled psbA plastid gene probe gave a hybridization signal at the corresponding size. We assume that the lower smear corresponds to sheared plastid DNA. The experiment was reproduced twice with the same result.

[Figure S4 image: gene-order diagram of the plastid ribosomal superoperon. Header labels: S10 + spc + alpha operon; str operon; position of fusion. Row labels: Red algae, Cryptophytes, Haptophyte, Heterokonts*, CCMP3155, Chromera, Plasmodium, Toxoplasma, Eimeria, Theileria, Babesia. Per-row gene orders are not reproduced here.]

Fig. S4. The plastid ribosomal superoperon gives evidence for the red algal origin of alveolate plastids. The superoperon originated by fusion of the S10+spc+alpha operon cluster and the str operon (Top). Genes in the superoperon are transcribed in left-to-right order, and solid horizontal lines connect neighboring genes (L = rpl and S = rps ribosomal protein genes). Diagonal lines show the transposition of rpl31 in CCMP3155 and C. velia (solid) and two additional possible transpositions in the ancestor of alveolates (dotted). The white type of the cryptophyte and haptophyte rpl36 gene indicates that it was acquired by horizontal gene replacement from a noncyanobacterial donor. The asterisk denotes further modifications of the superoperon in heterokont algae: the presence of ycf88 between rps19 and rpl22 in the diatoms Odontella sinensis, Phaeodactylum tricornutum, and Thalassiosira pseudonana, and the loss of rpl4, rpl29, and rpl18 in the pelagophytes Aureoumbra lagunensis and Aureococcus anophagefferens. Red algae: Porphyra purpurea, Porphyra yezoensis, Gracilaria tenuistipitata, Cyanidioschyzon merolae, Cyanidium caldarium; Cryptophytes: Guillardia theta, Rhodomonas salina; Haptophyte: Emiliania huxleyi; Heterokonts: Vaucheria litorea, Heterosigma akashiwo, T. pseudonana, P. tricornutum, O. sinensis, A. lagunensis, A. anophagefferens, Fucus vesiculosus, and Ectocarpus siliculosus.


[Figure S5 image: gene-order comparison of plastid gene clusters in CCMP3155, Plasmodium, Toxoplasma, and Eimeria. Cluster labels include "ribosomal superoperon" and "9.7 kb, mostly photosystem genes". Individual gene and tRNA labels are not reproduced here.]

Legend.
genes:
homologous genes unique to CCMP3155 and apicomplexans
homologous gene elsewhere in the CCMP3155 plastid genome
non-homologous genes
genes lost in apicomplexans
gene order:
conserved in other plastid genomes
putatively homologous

Fig. S5. Conserved gene order in alveolate plastid genomes. The plastid genomes of CCMP3155 and apicomplexans share several uniquely organized gene clusters (see the legend) not found in other plastid genomes. Genes lost in apicomplexans are mostly connected to photosynthetic function in the plastid of CCMP3155. ORFs in apicomplexans have no homology to other plastid genes.

[Figure S6 image: three sequence alignments with the panel labels psbC mRNA 3′UTR, psbB mRNA 3′UTR, and psaA mRNA 3′UTR. The sequences follow, one per line; row labels are listed in the order in which they appear in the image text.]

Rows labeled DNA, cDNA-4, cDNA-3, cDNA-2, cDNA-1:
TTGGGAGTAGGTTTTCACATAGGTTAGACGTGTAAACTTTTCCCATCGTTTTTTATAACAGATATTTTAACTAAAAAAATGTTATTAA
TTGGGAGTAGGTTTTCACATAGGTTAGACGTGTAAACTTTTCCCATCGTTTTTTATAACAGATATTTTAACTTTTTTTTTT
TTGGGAGTAGGTTTTCACATAGGTTAGACGTGTAAACTTTTCCCATCGTTTTTTATAACAGATATTTTAACTTTTTTTTTT
TTGGGAGTAGGTTTTCACATAGGTTAGACGTGTAAACTTTTCCCATCGTTTTTTATAACAGATATTTTAACTTTTTTTTT
TTGGGAGTAGGTTTTCACATAGGTTAGACGTGTAAACTTTTCCCATCGTTTTTTATAACAGATATTTTAATTTTTTTTTT

Rows labeled DNA, cDNA-1, cDNA-2:
TCGTATTATTAGTGTTAGGAGCTGTAAAAAGGCCCCAATCGACCTAAAGATTAATGCAAAACAACCATTACGTTTAAGTAACACTGCC
TCGTATTATTAGTGTTAGGAGCTGTAAAAAGGCCCCAATCGACCTAAAGATTAATGCAAAACAACCATTAATTTTTTTTTTT
TCGTATTATTAGTGTTAGGAGCTGTAAAAAGGCCCCAATCGACCTAAAGATTAATGCAAAACAACCATTTTTTTTTT

Rows labeled DNA and cDNA-1:
GCACAAGTAGTGCCGTCTTATAGAAGACCTTTTTTTGTTTAAGGGCTTGTCTTCTTAGGCCCTCATTCCTTTTTTTTTTTT
GCACAAGTAGTGCCGTCTTATAGAAGACCTTTTTTTGTTTAAGGGCTTGTCTTCTTAGGCCCTCATTCCATTTGAGCATTTAGTTTA

Fig. S6. Polyuridylylation of plastid transcripts in C. velia. Alignment of genomic DNA sequences of three plastid genes (psbC, psbB, and psaA) with the corresponding cDNA sequences. All cDNA clones terminate in thymidine stretches (in bold) that are absent from genomic DNA, suggesting transcript polyuridylylation in the plastid. Underlined thymidines may correspond to the 5′UTRs of the circularized transcripts. The presence of polyU tails in psbB and psbC transcripts was validated by sequencing three clones from 3′RACE products obtained with oligo-dA and gene-specific primers.
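The comparison described in this legend, a cDNA clone that matches genomic DNA and then ends in a run of thymidines absent from the genome, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used by the authors: the function names are hypothetical, the toy sequences are invented, and both sequences are assumed to start at the same aligned position.

```python
def shared_prefix_length(genomic: str, cdna: str) -> int:
    """Number of leading positions at which the two sequences agree."""
    length = 0
    for g, c in zip(genomic, cdna):
        if g != c:
            break
        length += 1
    return length


def nontemplated_t_tail(genomic: str, cdna: str) -> int:
    """Length of a 3' run of T in the cDNA that is not encoded in the genomic DNA.

    Returns 0 when the cDNA has no extra 3' sequence, or when the extra
    sequence contains anything other than T.
    """
    genomic, cdna = genomic.upper(), cdna.upper()
    tail = cdna[shared_prefix_length(genomic, cdna):]
    if tail and set(tail) == {"T"}:
        return len(tail)
    return 0


# Invented toy sequences (assumption: same 5' start, cDNA from a 3' RACE clone).
toy_genomic = "ACGTACGGATCCA"
toy_cdna = "ACGTACGGTTTTT"
print(nontemplated_t_tail(toy_genomic, toy_cdna))
```

Because the shared prefix is removed first, thymidines that are also present in the genomic sequence at the point of divergence are counted as templated and are excluded from the tail length.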


[Figure S7 image: panels A and B, phylogenetic trees with scale bars 0.5 and 0.1. Taxa: Prochlorococcus marinus MIT 9301, Synechococcus sp. CC9311, Mesostigma viride, Nephroselmis olivacea, Arabidopsis thaliana, Chlamydomonas reinhardtii, Cyanophora paradoxa, Gracilaria tenuistipitata, Porphyra purpurea, Emiliania huxleyi, Guillardia theta, Rhodomonas salina, Cyanidioschyzon merolae, Cyanidium caldarium, Phaeodactylum tricornutum, Thalassiosira pseudonana, Odontella sinensis, Heterosigma akashiwo, Vaucheria litorea, CCMP3155, Chromera velia, Eimeria tenella, Plasmodium falciparum, Toxoplasma gondii, Gonyaulax polyedra, Heterocapsa triquetra, Amphidinium carterae. Group labels: CYANOBACTERIA, GLAUCOPHYTE, GREEN ALGAE AND PLANT, RED ALGAE, HACROBIANS, HETEROKONTS, APICOMPLEXANS, DINOFLAGELLATES. Branch support values are not reproduced here.]

Fig. S7. Concatenated plastid phylogenies support the common origin of alveolate plastids. (A) Analyses of 11 plastid genes retained in dinoflagellate plastids support the monophyly of alveolate sequences, their relationship to heterokonts, and the monophyly of all chromalveolate plastids. (B) Analyses of 23 genes retained in apicoplasts support sequences of CCMP3155 and C. velia as their closest sister group. Maximum likelihood trees constructed using the CAT model (A and B) display PhyML-CAT/RAxML/MrBayes/PhyloBayes branch supports; solid circles indicate 100/100/1/1 supports. Supports ≥60/≥50/≥0.98/≥0.98 are shown as significant.
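The four-part support labels described in this legend (PhyML-CAT/RAxML/MrBayes/PhyloBayes, with thresholds ≥60/≥50/≥0.98/≥0.98) can be read programmatically. The Python sketch below is an illustration only; the function names are hypothetical, and treating "-" or an empty field as "no significant support reported" is an assumption about the figure's notation.

```python
METHODS = ("PhyML-CAT", "RAxML", "MrBayes", "PhyloBayes")
THRESHOLDS = (60.0, 50.0, 0.98, 0.98)  # taken from the Fig. S7 legend


def parse_support(label: str) -> tuple:
    """Split a label such as '97/64/-/0.98' into four floats (None where absent)."""
    parts = label.strip().split("/")
    if len(parts) != len(METHODS):
        raise ValueError(f"expected {len(METHODS)} fields, got {len(parts)}: {label!r}")
    return tuple(None if p in ("-", "") else float(p) for p in parts)


def significant_methods(label: str) -> dict:
    """Map each method name to True when its support meets the legend threshold."""
    values = parse_support(label)
    return {
        method: value is not None and value >= threshold
        for method, value, threshold in zip(METHODS, values, THRESHOLDS)
    }


print(parse_support("97/64/-/0.98"))
print(significant_methods("97/64/-/0.98"))
```

Fig. S8 lists its supports in a different order (PhyML-CAT aLRT first, on a 0 to 1 scale), so `METHODS` and `THRESHOLDS` would need to be changed before reusing this sketch for that figure.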


[Figure S8 image: phylogenetic tree with scale bar 0.2 and arrows labeled "A + H" and "RP". Taxa: Cyanophora paradoxa, Nephroselmis olivacea, Chlamydomonas reinhardtii, Arabidopsis thaliana, Mesostigma viride, Phaeodactylum tricornutum, Thalassiosira pseudonana, Odontella sinensis, Vaucheria litorea, Heterosigma akashiwo, Aureoumbra lagunensis, Aureococcus anophagefferens, Fucus vesiculosus, Ectocarpus siliculosus, Rhodomonas salina, Guillardia theta, Emiliania huxleyi, Porphyra purpurea, Gracilaria tenuistipitata, Cyanidium caldarium, Cyanidioschyzon merolae, Synechococcus sp. CC9311, Prochlorococcus marinus MIT 9301, CCMP3155, Chromera. Group labels: HETEROKONTS, RED ALGAE, HACROBIANS, GLAUCOPHYTE, GREEN ALGAE AND PLANTS, CYANOBACTERIA. Branch support values are not reproduced here.]

Fig. S8. Concatenated plastid phylogenies of all 68 genes found in the CCMP3155 plastid (excluding rpl36). The tree displays complete support for grouping alveolate (represented by CCMP3155) and heterokont plastids (A + H, arrow) and red algal plastids (RP, arrow). The dotted branch indicates the placement of the C. velia sequence, which received complete support in all analyses. The maximum likelihood tree constructed using the CpREV+Gamma+F model displays PhyML-CAT (aLRT)/RAxML/MrBayes/PhyloBayes supports over branches; solid circles indicate 1/100/1/1 supports. Only ≥0.98/≥60/≥0.98/≥0.98 supports are shown as significant.
