ORIGINAL ARTICLE
Genome characteristics of a generalist marine bacterial lineage
Ryan J Newton1, Laura E Griffin1, Kathy M Bowles1, Christof Meile1, Scott Gifford1, Carrie E Givens1, Erinn C Howard1, Eric King1, Clinton A Oakley2, Chris R Reisch3, Johanna M Rinta-Kanto1, Shalabh Sharma1, Shulei Sun1, Vanessa Varaljay3, Maria Vila-Costa1,4, Jason R Westrich5 and Mary Ann Moran1
1Department of Marine Sciences, University of Georgia, Athens, GA, USA; 2Department of Plant Biology, University of Georgia, Athens, GA, USA; 3Department of Microbiology, University of Georgia, Athens, GA, USA; 4Group of Limnology-Department of Continental Ecology, Centre d’Estudis Avancats de Blanes-CSIS, Catalunya, Spain and 5Odum School of Ecology, University of Georgia, Athens, GA, USA
Members of the marine Roseobacter lineage have been characterized as ecological generalists, suggesting that there will be challenges in assigning well-delineated ecological roles and biogeochemical functions to the taxon. To address this issue, genome sequences of 32 Roseobacter isolates were analyzed for patterns in genome characteristics, gene inventory, and individual gene/pathway distribution using three predictive frameworks: phylogenetic relatedness, lifestyle strategy and environmental origin of the isolate. For the first framework, a phylogeny containing five deeply branching clades was obtained from a concatenation of 70 conserved single-copy genes. Somewhat surprisingly, phylogenetic tree topology was not the best model for organizing genome characteristics or distribution patterns of individual genes/pathways, although it provided some predictive power. The lifestyle framework, established by grouping isolates according to evidence for heterotrophy, photoheterotrophy or autotrophy, explained more of the gene repertoire in this lineage. The environment framework had a weak predictive power for the overall genome content of each strain, but explained the distribution of several individual genes/pathways, including those related to phosphorus acquisition, chemotaxis and aromatic compound degradation. Unassembled sequences in the Global Ocean Sampling metagenomic data independently verified this global-scale geographical signal in some Roseobacter genes. The primary findings emerging from this comparative genome analysis are that members of the lineage cannot be easily collapsed into just a few ecologically differentiated clusters (that is, there are almost as many clusters as isolates); the strongest framework for predicting genome content is trophic strategy, but no single framework gives robust predictions; and previously unknown homologs to genes for H2 oxidation, proteorhodopsin-based phototrophy, xanthorhodopsin-based phototrophy, and CO2 fixation by Form IC ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) expand the possible mechanisms for energy and carbon acquisition in this remarkably versatile bacterial lineage.
The ISME Journal (2010) 4, 784–798; doi:10.1038/ismej.2009.150; published online 14 January 2010
Subject Category: integrated genomics and post-genomics approaches in microbial ecology
Keywords: genome; marine; phylogeny; Roseobacter
Introduction
Since the discovery of their abundance in marine bacterioplankton communities two decades ago (Gonzalez and Moran, 1997), members of the marine Roseobacter lineage have emerged as important model organisms for marine microbial ecology. The group spans multiple described genera (at least 45), encompasses a comparatively large sequence variation among 16S rRNA genes (up to 11%) and has a poorly resolved within-taxon phylogeny (Buchan et al., 2005; Wagner-Dobler and Biebl, 2006; Brinkhoff et al., 2008). The recent availability of genome sequences (currently, 5 closed and 27 draft) from cultured members of the Roseobacter lineage provides a detailed inventory of the metabolic and ecological capabilities of each strain (albeit limited by the accuracy of annotation), a basis for comparative analyses among strains, and a means to examine predictive frameworks for the lineage.
Genome sequences of other ocean microbes have been used to explore niches and resource partitioning within taxa. Multiple genome sequences and robust phylogenies for Prochlorococcus have revealed that the distribution of ecologically important gene systems (for example, light harvesting (Garczarek et al., 2000; Hess et al., 2001; Bibby et al., 2003) and DNA repair mechanisms (Scanlan et al., 2009)) strongly correlates with the phylogenetic structure of this genus (Rocap et al., 2003; Coleman and Chisholm, 2007). In addition, genome content differences among both Prochlorococcus and Synechococcus strains have been linked to variations in the environments from which the strains were isolated (West and Scanlan, 1999; Johnson et al., 2006; Martiny et al., 2006; Palenik et al., 2006; Dufresne et al., 2008). For members of the Vibrionaceae, small-scale differences in environmental conditions based on microenvironment and season have been shown to drive lineage adaptation (Hunt et al., 2008) and presumably genome content. The phylogenetic- and environment-based frameworks used to interpret data from studies such as these have facilitated the development of predictive community structure models for marine microbes (Follows et al., 2007; Rabouille et al., 2007).
Received 5 October 2009; revised 30 November 2009; accepted 2 December 2009; published online 14 January 2010
Correspondence: MA Moran, Department of Marine Sciences, University of Georgia, Marine Science Building, Athens, GA 30602-3636, USA. E-mail: [email protected]
The ISME Journal (2010) 4, 784–798. © 2010 International Society for Microbial Ecology. All rights reserved 1751-7362/10 $32.00
www.nature.com/ismej
Members of the Roseobacter lineage have been characterized as ecological generalists (Moran et al., 2004, 2007; Polz et al., 2006). Although the first cultured roseobacters were aerobic anoxygenic phototrophs (AAnPs) (Shiba et al., 1979), numerous heterotrophic strains have since been found (Blankenship et al., 1995; Shimada, 1995; Buchan et al., 2005). Cultured roseobacters have a surprisingly flexible suite of mechanisms for energy and carbon acquisition, including carbon monoxide and hydrogen sulfide oxidation (King, 2003; Moran et al., 2004), and anaplerotic CO2 fixation (Sorokin et al., 2003; Moran et al., 2004; Swingley et al., 2007). Together with the observation that this lineage’s genomes are large and variable, a picture of considerable trophic versatility among roseobacters has emerged (Buchan et al., 2005; Wagner-Dobler and Biebl, 2006; Brinkhoff et al., 2008).
The 32 Roseobacter genomes provide an unprecedented opportunity to examine the scope of extant gene systems and to explore various ecological and evolutionary perspectives that might distinguish functionally differentiated clusters within this lineage. In this study, we consider three theoretical ecological/evolutionary frameworks as possible predictors of the gene repertoires of the 32 Roseobacter strains. As a robust Roseobacter phylogeny has yet to emerge from rRNA gene analysis (Buchan et al., 2005; Brinkhoff et al., 2008), we first develop a well-supported phylogeny for the lineage from a concatenation of conserved, single-copy genes and within this phylogenetic structure we examine the evolutionary relationships as possible constraints on genome content and predictors of the genetic capabilities of each strain. Next, we explore lifestyle strategy (heterotroph, photoheterotroph or autotroph) as a possible driver of genome attributes (that is, imposing or releasing bacteria from constraints on genome content). Finally, we ask whether environmental conditions (defined here by the geographical location of isolation) might best explain the observed differences in genetic traits and the retention or acquisition of specific gene systems.
Materials and methods
Genome sequencing, annotation and completeness index
The 32 Roseobacter genomes publicly available as of 15 August 2008 were used in analyses (see Table 1 for genome details). Sequencing and annotation methods for Ruegeria pomeroyi DSS-3 (Moran et al., 2004; formerly Silicibacter), Ruegeria sp. TM1040, Jannaschia sp. strain CCS1 (Moran et al., 2007), Roseobacter denitrificans OCh 114 (Swingley et al., 2007) and Dinoroseobacter shibae DFL12 (Wagner-Dobler et al., 2009) are described elsewhere. The remaining genomes (Table 1) were sequenced and auto-annotated by the J Craig Venter Institute as part of the Moore Foundation Microbial Genome Sequencing Project (see http://moore.jcvi.org/moore/ for details).
In all, 5 of the 32 Roseobacter genomes have been assigned closed genome status, and we used these genomes as the basis for our genome completeness index. The protein sequences of 143 universal single-copy bacterial genes (Santos and Ochman, 2004; Santos, personal communication) were used in a BLASTp query against the five closed Roseobacter genomes; manual gene-calling based on alignment score, E-value, and contextual analysis was used to determine the presence or absence of genes in the five closed genomes. Of these 143 genes, 111 were determined to be unambiguously present in all five genomes (see Supplementary Table S1). The protein sequences of these 111 genes were then used in BLASTp analysis against the remaining 27 genomes. The percent presence of these 111 genes in a single genome constituted that genome’s completeness index (Table 1).
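The completeness index described above reduces to a simple percentage, which can be sketched as follows. This is an illustrative sketch only: the function name and the gene identifiers are hypothetical placeholders, and the BLASTp search plus manual gene-calling that decide which genes count as present are not reproduced.

```python
def completeness_index(present_genes, reference_genes):
    """Percent of the reference single-copy genes detected in one genome."""
    reference = set(reference_genes)
    if not reference:
        raise ValueError("reference gene set is empty")
    found = reference & set(present_genes)
    return 100.0 * len(found) / len(reference)


# Hypothetical identifiers standing in for the 111 reference genes.
REFERENCE = [f"gene_{i:03d}" for i in range(1, 112)]

# A made-up draft genome in which four reference genes were not detected.
draft_genome = REFERENCE[:-4]

# 107 of 111 genes present, which rounds to 96%.
print(round(completeness_index(draft_genome, REFERENCE)))
```

With 107 of 111 genes detected the index rounds to 96%, the lowest value reported for any genome in Table 1.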
Phylogenetic tree inference
Out of 111 universal single-copy genes identified in the 32 Roseobacter genomes, only genes that were completely sequenced in all genomes and had no ambiguous start/stop sites were used in phylogenetic analyses. These 70 genes (Supplementary Table S1) were concatenated and aligned with ClustalW in Geneious 4.0 (available from http://www.geneious.com) using Escherichia coli K12 substrain MG1655 as the outgroup. The alignment was imported into ARB (Ludwig et al., 2004), where it was heuristically adjusted, and a filter was created to remove all positions containing gaps in the alignment. The resultant alignment of 25 316 positions was used in subsequent phylogenetic reconstruction analyses in ARB (neighbor joining with point accepted mutation substitution matrix and 100 bootstrap runs) and in RAxML (Stamatakis et al., 2008) at CIPRES (http://www.phylo.org; maximum likelihood analysis with 200 bootstrap runs and Jones–Taylor–Thornton (JTT) substitution model). The best-fit maximum likelihood tree is reported along with bootstrap values from each phylogenetic inference method.
Table 1 Roseobacter genome characteristics
Organism | Clade^a | Isolation source (category) | Phototrophy genes^b | Genome size (Mb) | rRNA operons (16S/23S)^c | No. of contigs | Genome completeness (%)^d | G+C content (%)
Closed genome
Dinoroseobacter shibae DFL12 | 5 | Prorocentrum lima, Bay of Tokyo (E) | AAnP | 4.35 | 2/2 | 6 | 100 | 65
Jannaschia sp. CCS1 | 5 | Bodega Head, USA surface water (P) | AAnP | 4.40 | 1/1 | 2 | 100 | 62
Roseobacter denitrificans OCh 114 | 2 | Enteromorpha linza, Australia (E) | AAnP | 4.13 | 1/1 | 5 | 100 | 58
Ruegeria pomeroyi DSS-3 | 1 | Coastal Georgia, USA surface water (A) | None | 4.60 | 3/3 | 2 | 100 | 64
Ruegeria sp. TM1040 | 1 | Pfiesteria piscicida, Chesapeake Bay (E) | None | 4.15 | 5/5 | 3 | 100 | 60
Draft genome
Loktanella vestfoldensis SKA53 | 4 | North Atlantic surface water (A) | AAnP | 3.06 | 1-1p/1 | 14 | 99 | 65
Maritimibacter alkaliphilus HTCC2654 | None | Sargasso Sea 10 m water (A) | None | 4.53 | 1/1 | 46 | 99 | 64
Pelagibaca bermudensis HTCC2601 | 3 | Sargasso Sea 10 m water (A) | RuBisCO | 5.43 | 4/4 | 103 | 98 | 66
Oceanibulbus indolifex HEL-45 | 2 | North Sea 10 m water (A) | None | 4.11 | 3/3 | 105 | 100 | 59
Oceanicola batsensis HTCC2597 | 3 | Sargasso Sea 10 m water (A) | None | 4.44 | 1/1 | 23 | 99 | 66
Oceanicola granulosus HTCC2516 | 4 | Sargasso Sea 10 m water (A) | None | 4.04 | 4/4 | 85 | 100 | 70
Octadecabacter antarcticus 307 | 4 | McMurdo Sound (H) | Xanthorhodopsin | 4.89 | 2/2 | 58 | 100 | 54
Octadecabacter arcticus 238 | 4 | Offshore Deadhorse, Alaska (H) | Xanthorhodopsin | 5.39 | 2/2 | 80 | 96 | 55
Phaeobacter gallaeciensis 2.10 | 1 | Ulva lactuca, Australian waters (E) | None | 4.16 | 4/4 | 33 | 100 | 59
Phaeobacter gallaeciensis BS107 | 1 | Pecten maximus, Spanish waters (E) | None | 4.23 | 4/5 | 24 | 100 | 59
Rhodobacterales bacterium HTCC2083 | 2 | Coastal Oregon 10 m water (P) | AAnP | 4.02 | 2/2 | 20 | 99 | 53
Rhodobacterales bacterium HTCC2150 | None | Coastal Oregon, surface water (P) | None | 3.58 | 2/2 | 25 | 98 | 49
Rhodobacterales bacterium HTCC2255 | None | Coastal Oregon, 10 m water (P) | Proteorhodopsin | 4.81 | 0-1p/0 | 70 | 96 | 38
Rhodobacterales bacterium Y4I | 1 | Coastal Georgia, USA water (A) | None | 4.33 | 4/4 | 63 | 99 | 64
Roseobacter litoralis Och 149 | 2 | Seaweed (E) | AAnP | 4.68 | 1/1 | 27 | 99 | 57
Roseobacter sp. AzwK-3b | 3 | Estuary Monterey Bay water (P) | AAnP | 4.18 | 2/2 | 31 | 100 | 61
Roseobacter sp. CCS2 | 4 | Bodega Head, USA surface water (P) | AAnP | 3.50 | 1/1 | 11 | 99 | 55
Roseobacter sp. GAI101 | 2 | Coastal Georgia, USA water (A) | None | 4.25 | 4/4 | 67 | 99 | 58
Roseobacter sp. MED193 | 1 | NW Mediterranean 1 m water (A) | None | 4.65 | 1-1p/1-4p | 19 | 100 | 57
Roseobacter sp. SK209-2-6 | 1 | Arabian Sea O2-min., 267 m water (I) | None | 4.56 | 5/5 | 29 | 100 | 57
Roseovarius nubinhibens ISM | 3 | Caribbean Sea surface water (A) | None | 3.67 | 2/2 | 10 | 100 | 63
Roseovarius sp. 217 | 3 | Coastal England surface water (A) | AAnP | 4.76 | 1-1p/1-1p | 37 | 100 | 60
Roseovarius sp. TM1035 | 3 | Pfiesteria piscicida, Chesapeake Bay (E) | AAnP | 4.21 | 3/3 | 15 | 100 | 60
Ruegeria sp. R11 | 1 | Delisea pulchra pathogen, Australia (E) | None | 3.82 | 4/4 | 17 | 98 | 60
Sagittula stellata E-37 | 3 | Coastal Georgia, USA water (A) | None | 5.26 | 2/2 | 39 | 98 | 65
Sulfitobacter NAS-14.1 | 2 | Coastal Georgia, USA surface water (A) | None | 4.00 | 4/4 | 27 | 100 | 60
Sulfitobacter sp. EE-36 | 2 | North Atlantic surface water (A) | None | 3.54 | 4/4 | 15 | 100 | 60
Abbreviations: A, Atlantic Ocean; E, Eukaryote associated; H, polar ocean (high latitude; Arctic or Antarctic); I, Indian Ocean; P, Pacific Ocean.
a Defined by the phylogenetic tree shown in Figure 1.
b Genomes were defined as aerobic anoxygenic phototroph (AAnP) if the genome contained the puf operon and bacteriochlorophyll a; as none if the genome contained no light-harvesting genes; as xantho/proteorhodopsin if the genome contained the specific rhodopsin gene; and as RuBisCO if the genome contained a RuBisCO homolog and homologs to the Calvin–Benson–Bassham cycle.
c For example, 1-1p indicates that this organism has one fully sequenced rRNA gene and one partially sequenced rRNA gene.
d Genome completeness was defined by examining 111 universal genes found in all closed Roseobacter genomes (see Materials and methods).
Identification of orthologs and ecologically relevant genes
Orthologs among the 32 genomes were identified by sequential two-way reciprocal best-hit (RBH) analysis, beginning with the R. pomeroyi and Rhodobacterales HTCC2255 genome comparison and continuing by adding each of the remaining 30 genomes one at a time. The RBH Basic Local Alignment Search Tool (BLAST) thresholds were set at E-value <10⁻⁵ and amino acid identity >30%. The RBH results were subsequently compiled into a single matrix containing the distribution of all shared genes and used for the genome content comparisons described below (see Supplementary Table S2 for matrix). This relaxed ortholog definition was used because an all-way RBH requirement was unworkable for the large number of genomes, each containing gene families represented by multiple members. We tested whether the order of the sequential best-hit analysis resulted in substantial changes in the ortholog matrix or the outcome of the analyses (that is, by using a different order of adding genomes in the pair-wise RBH), and found it did not.
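A single two-way comparison from the RBH procedure described above might look like the sketch below. It assumes BLAST hits have already been parsed into tuples; the tuple layout, the function names and the use of bit score to pick the best hit are assumptions made for illustration, and the sequential addition of genomes into one matrix is not shown. Only the two thresholds come from the text.

```python
E_VALUE_MAX = 1e-5     # hits must have E-value below this (from the text)
IDENTITY_MIN = 30.0    # hits must exceed this percent amino acid identity


def best_hits(hits):
    """Return {query: subject} for the top-scoring hit that passes both thresholds.

    `hits` is an iterable of (query, subject, evalue, identity, bitscore)
    tuples; this layout is a hypothetical parsed form of BLAST output.
    """
    best = {}
    for query, subject, evalue, identity, bitscore in hits:
        if evalue >= E_VALUE_MAX or identity <= IDENTITY_MIN:
            continue  # fails the stated thresholds
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


# Made-up hits between two genomes, A and B.
a_vs_b = [
    ("a1", "b1", 1e-50, 80.0, 300.0),
    ("a1", "b2", 1e-20, 45.0, 120.0),
    ("a2", "b2", 1e-30, 55.0, 200.0),
    ("a3", "b3", 1e-3, 25.0, 40.0),   # fails both thresholds
]
b_vs_a = [
    ("b1", "a1", 1e-48, 80.0, 295.0),
    ("b2", "a1", 1e-20, 45.0, 120.0),
    ("b2", "a2", 1e-29, 55.0, 198.0),
    ("b3", "a3", 1e-3, 25.0, 40.0),
]

print(sorted(reciprocal_best_hits(a_vs_b, b_vs_a)))
```

Here a1 pairs with b1 and a2 with b2, while a3 and b3 are dropped because their only hit fails the thresholds.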
In addition to whole-genome ortholog identification, a select group of ecologically relevant genes/gene pathways was also identified using representative protein sequences of the target genes from a Roseobacter for which the gene functions had been experimentally verified. If no Roseobacter met this criterion, then a protein sequence was obtained from the closest Roseobacter relative containing the desired experimentally verified gene. All query protein sequences were used in BLASTp analysis against the Roseobacter genome database (http://www.roseobase.org). BLAST E-values, gene neighborhoods and clusters of orthologous group assignments (Tatusov et al., 2003) were manually examined and used to determine the presence or absence of these genes and pathways in each of the 32 genomes.
Classification schemes
The 32 isolate genomes were sorted into groups within each of the three frameworks. First, five deeply branching nodes in our phylogenetic inference best-fit tree were chosen to distinguish isolate groups based on shared ancestry and were designated Clades 1–5. Next, we categorized isolates into lifestyles based on their trophic status: heterotrophic, photoheterotrophic (that is, heterotrophic but likely subsidized by aerobic anoxygenic phototrophy or rhodopsin-based phototrophy) or autotrophic. Organisms were considered AAnPs based on the presence of the puf operon and genes for the synthesis of bacteriochlorophyll a; they were considered rhodopsin-supplemented photoheterotrophs based on the presence of gene orthologs for proteorhodopsin or xanthorhodopsin; and they were considered autotrophs based on the presence of RuBisCO and the Calvin–Benson–Bassham pathway. Although these designations were made from draft genome sequences for many strains, the high genome completeness index suggests they are largely correct. Finally, isolates were classified into one of five broad environmental categories based on the source of isolation: Pacific Ocean, Atlantic Ocean, Indian Ocean, polar oceans or eukaryote-associated (Table 1).
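The lifestyle rules above can be written as a small decision function. This is a sketch under stated assumptions: the function and argument names are hypothetical, the boolean inputs stand in for the manually curated gene calls, and the order in which the rules are checked is an assumption, since the text does not say how a genome meeting more than one criterion would be handled.

```python
def classify_lifestyle(has_puf_operon, has_bchl_a_synthesis,
                       has_rhodopsin, has_rubisco, has_cbb_pathway):
    """Assign a trophic category from gene presence/absence flags."""
    # Autotroph: RuBisCO together with the Calvin-Benson-Bassham pathway.
    if has_rubisco and has_cbb_pathway:
        return "autotroph"
    # AAnP: puf operon together with bacteriochlorophyll a synthesis genes.
    if has_puf_operon and has_bchl_a_synthesis:
        return "photoheterotroph (AAnP)"
    # Rhodopsin-supplemented: a proteorhodopsin or xanthorhodopsin ortholog.
    if has_rhodopsin:
        return "photoheterotroph (rhodopsin)"
    # No light-harvesting or carbon-fixation genes detected.
    return "heterotroph"


# Flag patterns matching the four categories listed in Table 1.
print(classify_lifestyle(True, True, False, False, False))
print(classify_lifestyle(False, False, True, False, False))
print(classify_lifestyle(False, False, False, True, True))
print(classify_lifestyle(False, False, False, False, False))
```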
Identification of genes in the Global Ocean Sampling
Global Ocean Sampling (GOS) sample sites, sampling procedures and sequencing methods are described elsewhere (Rusch et al., 2007; Yooseph et al., 2007). A subset of Roseobacter protein sequences representing each of the major biogeochemical pathways and processes that we examined (Supplementary Table S3) was used in a BLASTp query against the unassembled GOS data set at the Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA) (Seshadri et al., 2007). Gene hits from the GOS samples were retained for further analysis as potential orthologs at E-value cutoffs ranging between 10⁻⁸⁰ and 10⁻²⁰, depending on the gene. Paired reads were then removed before the resultant matches were used in BLAST analysis against the All Prokaryotic Proteins (P) database. Only gene matches that had a best hit to a gene in a Roseobacter genome were retained for further analysis. Finally, protein sequences from the Roseobacter-like GOS matches underwent a BLASTp query at GenBank, and were eliminated if their top alignment scores were to proteins with a different annotated function than the original query protein.
To compare gene counts between oceans, the Roseobacter-like metagenomic sequences obtained from the GOS data underwent several normalizations. Counts for functional genes retrieved at each sample location were size-normalized to the length of the recA gene from E. coli K12 substrain MG1655 to account for effects of size on the probability of sampling (Howard et al., 2008). The number of Roseobacter genome equivalents for each sample location was then calculated by averaging size-normalized Roseobacter-like gene counts of the universal single-copy genes recA and rpoB. To estimate per-cell frequency for each examined Roseobacter gene (listed in Supplementary Table S3), the sample gene counts were summed by ocean basin (Atlantic, Pacific and Indian) and divided by the number of Roseobacter genome equivalents for that basin. Only coastal and open ocean GOS sample sites were considered, with estuaries, embayments, lagoon reefs, fringing reefs, freshwater, mangroves, coral reefs, hypersaline lagoons, warm seeps and harbors excluded from the analyses (Supplementary Table S4).
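The normalization steps above can be traced with a short numerical sketch. All gene lengths and counts below are invented round numbers chosen for easy arithmetic, not values from the study, and the function names are hypothetical. Summing per-site genome equivalents to obtain the basin total is an assumption about how the per-basin denominator was formed.

```python
def size_normalize(count, gene_length, reference_length):
    """Scale a raw hit count to what a gene of reference_length would yield."""
    return count * reference_length / gene_length


def genome_equivalents(reca_count, reca_length, rpob_count, rpob_length,
                       reference_length):
    """Average the size-normalized counts of the two single-copy markers."""
    reca = size_normalize(reca_count, reca_length, reference_length)
    rpob = size_normalize(rpob_count, rpob_length, reference_length)
    return (reca + rpob) / 2.0


def per_cell_frequency(site_counts, gene_length, reference_length,
                       basin_genome_equivalents):
    """Sum normalized counts over a basin's sites, divide by genome equivalents."""
    total = sum(size_normalize(c, gene_length, reference_length)
                for c in site_counts)
    return total / basin_genome_equivalents


REF_LEN = 1000   # placeholder for the E. coli recA length (illustrative only)

# Two made-up sample sites in one basin.
site_a = genome_equivalents(40, 1000, 200, 4000, REF_LEN)   # (40 + 50) / 2
site_b = genome_equivalents(20, 1000, 120, 4000, REF_LEN)   # (20 + 30) / 2
basin_total = site_a + site_b

# A hypothetical 500-unit-long gene seen 10 and 4 times at the two sites.
print(per_cell_frequency([10, 4], 500, REF_LEN, basin_total))
```

In this toy case the gene would be estimated to occur in roughly 40% of Roseobacter cells in the basin.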
The clade distribution among ocean basins was determined using the recA gene sequence. After retrieving Roseobacter-like RecA sequences by BLASTp analysis against the GOS database at CAMERA (as described above), each individual protein sequence was used as a query sequence in a subsequent BLASTp analysis against all 32 genomes at Roseobase (http://www.roseobase.org). The Roseobacter-like RecA sequences from GOS were then assigned to a clade according to their best match among the 32 genomes, and the occurrences of each clade were summed across samples in each ocean basin.
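The final tallying step can be sketched as below, assuming the best-matching genome for each RecA sequence is already known. The function name and the hit list are hypothetical; the clade labels for the three named genomes follow Table 1, and genomes without a clade are reported under their own name.

```python
from collections import Counter


def tally_clades(best_matches, genome_to_clade):
    """Count clade assignments per ocean basin.

    `best_matches` is an iterable of (basin, best_matching_genome) pairs.
    Genomes absent from `genome_to_clade` are tallied under their own name.
    """
    tallies = {}
    for basin, genome in best_matches:
        label = genome_to_clade.get(genome, genome)
        tallies.setdefault(basin, Counter())[label] += 1
    return tallies


genome_to_clade = {
    "Ruegeria pomeroyi DSS-3": "Clade 1",
    "Sulfitobacter sp. EE-36": "Clade 2",
    "Jannaschia sp. CCS1": "Clade 5",
}

# Made-up best matches for five metagenomic RecA sequences.
hits = [
    ("Atlantic", "Ruegeria pomeroyi DSS-3"),
    ("Atlantic", "Sulfitobacter sp. EE-36"),
    ("Atlantic", "Ruegeria pomeroyi DSS-3"),
    ("Pacific", "Jannaschia sp. CCS1"),
    ("Pacific", "Rhodobacterales bacterium HTCC2255"),
]

tallies = tally_clades(hits, genome_to_clade)
for basin, counts in sorted(tallies.items()):
    print(basin, dict(counts))
```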
Statistical analyses
Patterns of ortholog distribution among the 32 genomes were evaluated using the Bray–Curtis Index of Similarity (Legendre and Legendre, 1998). The 32 genomes contained a total of 31 874 orthologs. Similarities between genomes include all orthologs in this matrix, so that both the shared presence and shared absence of a gene are taken into account in the similarity calculation. This similarity matrix was used to create a hierarchical clustering dendrogram based on complete linkage grouping (that is, furthest neighbor analysis). An analysis of similarity (ANOSIM) was used to test for significant differences among a priori assigned genome groups based on phylogenetic clade, trophic strategy or geographical isolation location. The multivariate analyses were performed using the statistical package PRIMER 5 for Windows v. 5.2.7.
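For readers unfamiliar with these statistics, the sketch below computes a Bray–Curtis similarity on presence/absence profiles and the ANOSIM R statistic from ranked dissimilarities. It uses the textbook formulas in plain Python, not PRIMER, so how the authors' software treated jointly absent genes is not reproduced; the permutation test that yields the P value is also omitted. All names and the four toy profiles are hypothetical, and every group is assumed to have at least two members.

```python
from itertools import combinations


def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity (0-100) for two presence/absence vectors."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(1 for x in a if x) + sum(1 for y in b if y)
    return 100.0 * 2 * shared / total if total else 100.0


def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def anosim_r(profiles, groups):
    """R = (mean between-group rank - mean within-group rank) / (M / 2),
    where M is the number of genome pairs."""
    pairs = list(combinations(range(len(profiles)), 2))
    dissim = [100.0 - bray_curtis_similarity(profiles[i], profiles[j])
              for i, j in pairs]
    ranks = average_ranks(dissim)
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2.0)


# Four toy genomes scored for six orthologs, in two a priori groups.
profiles = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
]
groups = ["X", "X", "Y", "Y"]

print(bray_curtis_similarity(profiles[0], profiles[1]))
print(anosim_r(profiles, groups))
```

In this toy case the two groups share no orthologs, so R reaches its maximum of 1; values near 0 indicate that the grouping explains little of the variation in gene content.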
The average nucleotide identity was obtained for two Roseobacter strain comparisons, Phaeobacter gallaeciensis 2.10 with P. gallaeciensis BS107 and D. shibae DFL12 with P. gallaeciensis 2.10, according to the method described by Goris et al. (2007). These comparisons were chosen to bracket the amount of sequence heterogeneity observed and to provide context for our ortholog similarity comparisons.
Significance of gene distributions between any two assigned groups (for example, between two clades, between two trophic strategies or between two ocean basins) was assessed with a binomial distribution d-score test (Markowitz et al., 2008).
Results and discussion
Because many of the 32 Roseobacter genomes are in draft status, we developed a completeness index based on the presence or absence of 111 universal single-copy genes. The lowest genome completeness index obtained was 96% (for Octadecabacter arcticus 238 and Rhodobacterales bacterium HTCC2255; 107 out of 111 presumed universal genes were represented), and 18 of 32 genomes had a completeness index of 100% (Table 1). We therefore considered all draft genomes to be good representations of these organisms’ gene content.
Phylogenetic inference and clade distribution
Previous phylogenetic reconstructions of the Roseobacter lineage using 16S rRNA gene relationships have led to the identification of subgroups within the lineage (Buchan et al., 2005; Brinkhoff et al., 2008). However, many of the nodes, especially those distinguishing deep branching points in these phylogenies, do not have statistical support, and therefore do not provide clear phylogenetic relationships for the members of this lineage. We took advantage of the genome sequence data to construct an alignment from the concatenation of 70 conserved single-copy genes; this alignment was subsequently used in phylogenetic tree inference (see Supplementary Table S1 for gene list and Supplementary Figure S1 for concatenated gene, 16S rRNA gene and 23S rRNA gene tree comparisons). The resultant tree topology suggested there are five deeply branching clades within the Roseobacter lineage (Figure 1). Three of the presumed roseobacters, Maritimibacter alkaliphilus HTCC2654, Rhodobacterales HTCC2150 and Rhodobacterales HTCC2255, fell outside these clades. Most members within a single genus clustered together on the tree, although the placement of two members of the genus Oceanicola into different clades suggests that a taxonomic reclassification may be needed for some isolates.
Buchan et al. (2005) identified 13 major sequence clusters based on 16S rRNA gene sequences within the Roseobacter lineage. Twelve of the 16S rRNA-based clusters can be mapped onto our 70-gene phylogeny (data not shown). Clade 1 contains the 16S rRNA gene sequence clusters RGALL, RATL and TM1040. Clade 2 contains sequence clusters ANT9093, OBULB, SPON and AS-21. Clade 3 contains sequence cluster CHAB-I-5. Clade 4 contains sequence clusters AS-26, DG1128, DC5-80-3 (RCA cluster) and OCT. Clade 5 contains no previously identified sequence clusters, and cluster NAC11-7 is not covered by any of the clades in our study.
Two of the most abundant Roseobacter 16S rRNA gene sequence clusters recovered from marine habitats do not have closely associated sequenced genomes (Buchan et al., 2005), and thus are not included in the 70-gene phylogenetic tree. The first, the DC5-80-3 or RCA cluster, has often been observed as the most abundant Roseobacter group in polar and temperate oceans (Brinkhoff et al., 2008). 16S rRNA genes from RCA distantly group with those from genomes in Clade 4 (Figure 1), a clade that harbors all the sequenced polar Roseobacter isolates thus far. A second abundant marine sequence cluster, NAC11-7, is frequently the dominant Roseobacter taxon found during phytoplankton blooms (Buchan et al., 2005; West et al., 2008). 16S rRNA genes from the NAC11-7 group did not cluster with those of any clade, and were most related to that of Rhodobacterales HTCC2255 (data not shown), which also fell outside the five clades established by the 70-gene phylogeny (Figure 1).
Roseobacter-like recA genes, a robust marker for bacterial phylogeny (Eisen, 1995), were obtained from the GOS data set by BLASTp analysis (see Materials and methods) to ascertain which isolate genomes are most representative of wild roseobacters in surface ocean water. When the set of Roseobacter-like RecA GOS sequences was used in a best-match BLASTp query against the 32 genomes, hits to all five clades were found throughout the major ocean habitats surveyed (Figure 2). In general, the distribution of clades is not remarkably different among the Atlantic, Pacific and Indian oceans (Figure 2). A large percentage of the Roseobacter RecA sequences from the GOS appear most closely related to one of the three singleton genomes (that is, not belonging to one of the five defined clades). This finding, along with the lack of genomic data for the RCA and NAC11-7 sequence clusters, suggests that representation of oceanic Roseobacter genomes could be improved with additional genome sequences.
Figure 1 A consensus maximum likelihood tree of the 32 sequenced Roseobacter genomes. The alignment for tree inference was created from a concatenation of 70 universal single-copy genes contained in each of the Roseobacter genomes and in E. coli K12, which was used as an outgroup. Bootstrap values of >50% for the maximum likelihood best-fit tree (200 iterations) and neighbor-joining tree (100 iterations) are listed at each node. The neighbor-joining bootstrap values are listed in parentheses. (*) demarcates nodes where the neighbor-joining tree did not agree with the maximum likelihood tree. Designated Clades 1–5 are listed to the right of the tree. The scale bar represents 10% sequence divergence.
Genome content related to phylogeny
We examined genome characteristics (for example, G+C content, rRNA copy number, genome size; Table 1) and gene content of the 32 roseobacters within the context of the five-clade phylogenetic framework. Whole genomic content comparisons (based on distribution patterns of 31 874 orthologs; see Materials and methods; Supplementary Table S2) indicate weak but significant genome clustering by clade (ANOSIM R = 0.410, P ≤ 0.001, three roseobacters not assigned to clades were excluded from the statistical test), with the within-clade similarity in gene repertoire for Clades 1, 2 and 3 driving this pattern (Figure 3). Generally, neither the examined genome characteristics nor the examined gene distributions segregate strongly based on phylogenetic relatedness (Table 1 and Figure 4). Some exceptions include a greater mean rRNA operon copy number for Clade 1 than for other clades (t-test, P ≤ 0.01); a strictly heterotrophic composition of Clade 1; a genetic potential for biotin synthesis in Clade 1 (vitamin synthesis in bacteria has been identified as important in bacterial–phytoplankton relationships; Croft et al., 2005; Wagner-Dobler et al., 2009); a lack of Lux-type quorum sensing genes in Clades 4 and 5; a genetic potential for H2 oxidation unique to Clade 3; the absence of sulfur oxidation genes in Clade 4 genomes; and absence of the ppk1 gene for polyphosphate biosynthesis in Clade 1 (whereas all isolates outside Clade 1 have this gene).
The lack of a strong segregation by phylogenetic assignment for genome content (Figure 3) or ecologically relevant gene systems (Figure 4) suggests the importance of gene acquisition by horizontal transfer originating either within or outside the lineage. Other evolutionary processes known to shape genome content (selective gene loss, gene duplication, gene genesis; Snel et al., 2002) are no doubt important in this lineage, but are mechanisms less likely to produce the observed patchy distribution of ecologically relevant genes in the Roseobacter isolates relative to their phylogenetic reconstruction. Although the rate of gene transfer within the Roseobacter lineage is not known, the occurrence in 30 of 32 genomes of gene transfer agent operons (Figure 4), an unusual system for moving chromosomal fragments to close relatives (Biers et al., 2008; Zhao et al., 2009), suggests a mechanism for shaping Roseobacter gene content through frequent within-lineage gene transfers.
Between-genome similarities were generally higher for Roseobacter isolates in the same genus (for example, P. gallaeciensis BS107 and P. gallaeciensis 2.10; R. denitrificans OCh 114 and Roseobacter litoralis Och 149; Figure 3) than for roseobacters belonging to different genera. Nonetheless, blurred gene content boundaries among deeply branching clades would impose a requirement of dozens of taxonomically shallow groups (for example, species level) to accurately represent Roseobacter contributions to ecosystem functions, thus making the phylogenetic framework a cumbersome approach for defining ecological subgroups.
Figure 2 Clade distribution among ocean basins calculated from BLASTp best match of Roseobacter recA genes from the Global Ocean Sampling (GOS). Atlantic n = 71, Pacific n = 48 and Indian n = 51.
Comparison of 32 Roseobacter genomesRJ Newton et al
790
The ISME Journal
Genome content related to trophic strategy
Owing to the significant versatility in mechanisms for obtaining carbon and energy previously observed for this group (Buchan et al., 2005; Moran et al., 2007), we hypothesized that an organism’s trophic strategy could impose or remove constraints on genome content. For example, the ability to use sunlight for energy generation, which is widely distributed within the Roseobacter lineage, might mitigate an organism’s energy limitations in the oligotrophic marine environment, while imposing requirements for metals and cofactors specific to phototrophy. Similarly, the ability to fix inorganic carbon might reduce an organism’s requirements for substrate transporters. If such interplay between trophic strategy and functional gene repertoire exists, then significant and predictable differences in genome content should be evident between lifestyle categories.
Thirteen of the 32 roseobacters have genes for photoheterotrophy (10 AAnPs, 3 rhodopsin-containing), whereas one has RuBisCO. The remaining 18 are considered heterotrophs here (although some may obtain energy from inorganic compounds such as CO and H2S; Moran et al., 2007) (Table 1).
Genome ortholog comparisons suggest moderate and significant differences in genome content among these groups (ANOSIM R = 0.545, P ≤ 0.001). The strength of these differences does not stem solely from the highly distinctive rhodopsin-containing genomes (Figure 3). The differences also are not solely due to the presence of light-harvesting-related genes shared by the AAnP genomes or rhodopsin-containing genomes, as removal of the rhodopsin genes and 29 genes specific for AAnP light harvesting resulted in a similar level of clustering by trophic strategy (ANOSIM R = 0.522, P ≤ 0.001). The lifestyle framework accurately predicts the gene repertoire groupings at similarity levels ≥58% (Figure 3; red dashed line), a level that captures the gene content relationships for 19 of the 32 genomes; of the three frameworks analyzed, it is the best predictor.
The majority of non-light-harvesting gene or pathway-related differences among strains can be traced to hypothetical proteins unique to the AAnPs or heterotrophs, as well as to a number of genes encoding transcriptional regulators and amino acid uptake and synthesis systems (Figure 5). Although trophic strategy was a good predictor of an isolate’s
Figure 3 Complete linkage (that is, furthest neighbor) cluster analysis illustrating the gene content similarities among the genomes. Bray–Curtis similarities between all genome pairs were calculated from a matrix containing all 31 874 genes identified in the 32 genomes. In this manner, the similarity calculation was based on both the shared presence and shared absence of genes. For context, the P. gallaeciensis 2.10 to P. gallaeciensis BS107 comparison is 87.4% similar in this analysis compared with an average nucleotide identity (ANI) (Goris et al., 2007) of 97.0%. The D. shibae DFL12 to P. gallaeciensis 2.10 comparison is 46.7% similar here compared with an ANI of 70.4%. The three framework groups of each isolate are illustrated by the shape and color pattern depicted at the tips of the cluster diagram. The phylogenetic clade framework is represented by squares; the lifestyle framework is represented by triangles; and the environment framework is represented by circles. Unique combinations of these three frameworks are illustrated with colored boxes around the names of isolates. Breaks in the three framework groupings are noted by an asterisk next to the strain name. Nodes below the dashed red line indicate groups with ≥58% similarity. [Dendrogram of the 32 genomes; axis: Similarity (20 to 100). Legend: environment (Atlantic, Indian, Pacific, Polar, Euk. Assoc.); lifestyle (Heterotroph, AAnP, Rhodopsin, RuBisCO); clade (Clade 1 to Clade 5, No Clade).]
Figure 4 A matrix depicting the presence of select genes or gene pathways in the 32 Roseobacter genomes, arranged and color-coded by clade (Clades 1 to 5). A colored box containing a dot indicates the presence of the gene/pathway. An ultrametric tree has been placed above the gene matrix for reference. Isolation source indicates the region where the Roseobacter strain was isolated and is coded as: A = Atlantic Ocean, E = Eukaryote Associated, H = polar oceans (high latitude), I = Indian Ocean and P = Pacific Ocean. Gene/pathway abbreviations are as follows: NRPS/PKS, non-ribosomal peptide synthetase/polyketide synthase; Glut/leu/phenyl/val dehydrogenase, glutamate/leucine/phenylalanine/valine dehydrogenase.
Columns (genome, isolation source): Phaeobacter gallaeciensis 2.10, E; Phaeobacter gallaeciensis BS107, E; Ruegeria sp. R11, E; Roseobacter sp. MED193, A; Roseobacter sp. SK209-2-6, I; Rhodobacterales Y4I, A; Silicibacter sp. TM1040, E; Silicibacter pomeroyi DSS-3, A; Sulfitobacter sp. NAS14.1, A; Sulfitobacter sp. EE-36, A; Roseobacter sp. GAI101, A; Oceanibulbus indolifex HEL-45, A; Roseobacter denitrificans OCh114, E; Roseobacter litoralis Och149, E; Rhodobacterales HTCC2083, P; Roseovarius sp. TM1035, E; Roseovarius sp. 217, A; Roseobacter sp. AzwK-3b, P; Roseovarius nubinhibens ISM, A; Pelagibaca bermudensis HTCC2601, A; Sagittula stellata E-37, A; Oceanicola batsensis HTCC2597, A; Loktanella vestfoldensis SKA53, A; Roseobacter sp. CCS2, P; Octadecabacter arcticus 238, H; Octadecabacter antarcticus 307, H; Oceanicola granulosus HTCC2516, A; Dinoroseobacter shibae DFL12, E; Jannaschia sp. CCS1, P; Maritimibacter alkaliphilus HTCC2654, A; Rhodobacterales HTCC2150, P; Rhodobacterales HTCC2255, P.
Rows (gene/pathway), by category:
Related to defined trophic strategy: photosynthetic center (puf); proteorhodopsin; xanthorhodopsin; CO2 fixation.
Gene transfer agents.
H2 oxidation.
Aromatics degradation: β-ketoadipate; gentisate pathway; benzoate; phenylacetic acid; homoprotocatechuate; homogentisate pathway.
Carbon monoxide utilization: high-affinity (OMP-type); low-affinity (BMS-type).
C1 compound utilization: C1 incorporation (serine); MeOH oxidation; TMA oxidation; formaldehyde oxidation; formate oxidation.
C2 compound utilization: ethylmalonyl pathway; glyoxylate shunt.
Motility, sensing and attachment: chemotactic ability; motility; Flp pilus (Type IV); Lux quorum sensing; VirB system (Type IV secretion).
Secondary metabolite production: antibiotic production (TDA); NRPS/PKS.
Vitamin related: biotin synthesis; biotin uptake; cobalamin synthesis; thiamine synthesis.
Iron related: heme uptake; siderophore uptake; Fe3+ uptake; Fe-S synthesis.
Nitrogen related: dissimilatory nitrate reduction; periplasmic dissimilatory nitrate reduction; assimilatory nitrate reduction; dissimilatory nitrite reduction; assimilatory nitrite reduction; nitric oxide reduction; nitrous oxide reduction; hydroxylamine oxidation; nitroalkane denitrification; carbamate kinase; glutamine synthetase; Glut/leu/phenyl/val dehydrogenase; urease.
Phosphorus related: alkaline phosphatase; high-affinity Pi transport; low-affinity Pi transport; phosphonate utilization; polyphosphate storage (ppk1).
Sulfur related: DMSP demethylase (dmdA); DMSP lyase (dddL); DMSP cleavage (dddD); DMSP cleavage (dddP); sulfur oxidation (sox).
gene repertoire, only a few of the ecologically relevant gene systems we examined were strictly differentiated according to this framework. The five C1 utilization pathways identified in Roseobacter genomes (serine cycle, methanol oxidation, trimethylamine oxidation, formaldehyde oxidation and formate oxidation) had a 70% occurrence rate in AAnP genomes (that is, 35 out of the 50 possible occurrences if all 10 AAnP genomes had all five pathways), but only a 42% occurrence rate in the 19 heterotrophs (only 40 out of 95 possible occurrences; d-score, P ≤ 0.01). The heterotrophs tended to have more genes for six identified aromatic degradation pathways (β-ketoadipate, gentisate, benzoate, phenylacetic acid, homoprotocatechuate and homogentisate) with a 60% occurrence rate (68 out of 114 possible occurrences) compared with 33% in AAnPs (20 of 60 possible occurrences; d-score, P ≤ 0.01). Compared with the genomes of the other groups, the rhodopsin-containing genomes shared few orthologs that distinguished them as a coherent group (data not shown).
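The occurrence rates quoted above follow from dividing the observed pathway occurrences by the number of possible occurrences (genomes × pathways). The short sketch below reproduces that arithmetic from the counts given in the text; the function name is illustrative only, and the d-score significance test is not reproduced.

```python
def occurrence_rate(observed, n_genomes, n_pathways):
    """Return (possible occurrences, percent of possible occurrences observed)."""
    possible = n_genomes * n_pathways
    return possible, 100.0 * observed / possible


# Counts taken from the text: (label, observed, genomes, pathways).
comparisons = [
    ("C1 utilization, AAnP", 35, 10, 5),
    ("C1 utilization, heterotroph", 40, 19, 5),
    ("Aromatic degradation, heterotroph", 68, 19, 6),
    ("Aromatic degradation, AAnP", 20, 10, 6),
]
for label, observed, n_genomes, n_pathways in comparisons:
    possible, rate = occurrence_rate(observed, n_genomes, n_pathways)
    print(f"{label}: {observed}/{possible} = {rate:.0f}%")
```

The rounded results match the 70%, 42%, 60% and 33% rates stated in the paragraph.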
As noted previously (Buchan et al., 2005; Moran et al., 2007) and strongly reinforced in this analysis, Roseobacter genomes exhibit a remarkably versatile suite of mechanisms for energy and carbon acquisition. Along with the presence of genes for oxidizing carbon monoxide and hydrogen sulfide (King, 2003; Moran et al., 2004), we found evidence for energy generation by H2 oxidation in Roseovarius sp. TM1035, Pelagibaca bermudensis HTCC2601 and Sagittula stellata E-37; for proteorhodopsin-based (Rhodobacterales sp. HTCC2255) and xanthorhodopsin-based (O. antarcticus 307 and O. arcticus 238) phototrophy; and for CO2 fixation, based on the presence of a Form IC RuBisCO and homologs to Calvin–Benson–Bassham cycle genes in P. bermudensis HTCC2601 (Figure 4). This emerging picture of high trophic versatility among cultured roseobacters (Buchan et al., 2005; Wagner-Dobler and Biebl, 2006; Brinkhoff et al., 2008) is in accord with recent shifts away from a perception of marine bacterioplankton communities consisting largely of canonical photosynthetic and heterotrophic cells (Karl, 2002).
The rhodopsin-containing genomes
The three rhodopsin-containing genomes harbored the most distinctive genome content of any of the isolates (Figure 3). The proteorhodopsin-containing Rhodobacterales bacterium HTCC2255 gene content was unique because it consisted of only 2197 genes, far fewer than any other genome. The two xanthorhodopsin-containing isolates, O. arcticus and O. antarcticus (which are also the only two polar ocean isolates), are clearly part of the Roseobacter lineage (Figure 1) but possess the greatest number of unique genes among all isolates (2230 genes for O. arcticus and 1822 genes for O. antarcticus; 32 isolate mean = 617 and s.d. = 437). The majority of these unique genes were annotated as phage or transposase genes, with 65 and 52 phage gene annotations and 935 and 574 transposase gene annotations for O. arcticus 238 and O. antarcticus 307, respectively; these numbers are extremely high compared with those in the other Roseobacter genomes (phage mean = 21 and s.d. = 10; transposase mean = 52 and s.d. = 38).
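One way to gauge how unusual the two Octadecabacter genomes are is to express each count as a number of standard deviations from the corresponding mean. The sketch below does this with the values quoted above. It is a rough illustration rather than a formal test: the standardization is our own framing, not an analysis reported here, and the quoted means are used as given (the unique-gene mean covers all 32 isolates, whereas the phage and transposase means refer to the other genomes).

```python
def z_score(value, mean, sd):
    """Number of standard deviations by which value exceeds mean."""
    return (value - mean) / sd


# (label, O. arcticus 238, O. antarcticus 307, mean, s.d.) as quoted in the text.
reported = [
    ("unique genes", 2230, 1822, 617, 437),
    ("phage annotations", 65, 52, 21, 10),
    ("transposase annotations", 935, 574, 52, 38),
]
for label, arcticus, antarcticus, mean, sd in reported:
    print(
        f"{label}: O. arcticus z = {z_score(arcticus, mean, sd):.1f}, "
        f"O. antarcticus z = {z_score(antarcticus, mean, sd):.1f}"
    )
```

The transposase counts stand out most sharply, lying more than ten standard deviations above the mean for both isolates.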
Genome content related to ocean environment
Environmental properties are potential drivers of marine bacterial genome evolution by selecting for niche-specific genetic capabilities. For example, the marine cyanobacteria Prochlorococcus and Synechococcus exhibit gene content patterns that correlate well with the geographical locations from
Figure 5 Relative abundance of ortholog groups that are overrepresented in the isolate genomes from a particular lifestyle strategy. An ortholog was considered overrepresented when it was ≥50% more prevalent in the genomes from one lifestyle strategy than the other. The overrepresented orthologs were grouped into functional categories whose relative percent abundance is depicted. The rhodopsin-containing and RuBisCO-containing lifestyle groupings were not considered because of the low number of genomes in these categories. [Two panels, AAnP and Heterotroph, with panel labels n = 140 and n = 122. Functional categories shown: Hypothetical, Other, Motility, Inorganic Ion, Coenzyme, Lipids, Energy, Secondary Metabolites, TRAP Transporters, Transcription, Amino Acid, Aerobic Anoxygenic Photosynthesis, Signal Transduction, Posttranslation Modification, Replication/Recombination/Repair, Cell wall/Membrane and Phenylacetic acid degradation.]
which they were isolated (Rocap et al., 2003; Palenik et al., 2006; Zwirglmaier et al., 2008; Martiny et al., 2009). For the third framework, we divided the Roseobacter strains into five broad environmental categories that resulted in the following distribution: 15 isolates from the Atlantic Ocean, 1 isolate from the Indian Ocean, 6 isolates from the Pacific Ocean, 2 isolates from the polar oceans and 8 isolates that were cultured in association with eukaryotic organisms (Table 1). We hypothesized that characteristics distinguishing major environments (for example, access to nutrients, differences in temperature) would exert a detectable influence on gene patterns in the Roseobacter genomes.
There was a significant but very weak relationship between genome content and environmental origin of the isolate (ANOSIM R = 0.296, P ≤ 0.002), although if only the Atlantic Ocean and Pacific Ocean isolates were compared, this relationship was stronger (ANOSIM R = 0.398, P ≤ 0.002). Examination of ortholog patterns suggests that the relationship is based on the genomic distribution of motility genes, with 33% occurrence in Pacific Ocean isolates compared with 73% in Atlantic Ocean isolates; chemotaxis genes, with 17% occurrence in Pacific Ocean isolates compared with 47% in Atlantic Ocean isolates; denitrification systems, with 5% in Pacific Ocean isolates compared with 29% in Atlantic Ocean isolates; phosphorus uptake systems known to function at low phosphate concentrations (alkaline phosphatases, high-affinity phosphate uptake and phosphonate uptake), with 50% occurrence (12 out of 24 possible) in Pacific Ocean isolates compared with 85% (51 out of 60 possible) for Atlantic Ocean isolates; and aromatic carbon degradation pathways (mixed ocean basin patterns depending on the specific pathway) (Figure 4, all comparisons d-score P ≤ 0.01). There were also a number of unique carbon and ion transporters, amino-acid metabolism genes, and transcription regulators restricted to each ocean basin, and a large suite of unique genes shared by the two polar isolates (data not shown). We tested whether the higher frequency of coastal strains among the Pacific isolates compared with the Atlantic (Table 1) was the basis for the apparent ocean basin pattern, but found it not to be the case whether comparing whole-genome ortholog patterns (ANOSIM R = 0.031, P = 0.23 for coastal vs open ocean isolate comparison) or individual gene systems (Supplementary Figure S2).
Geographical patterns in the GOS data set
Despite the many factors that might obscure large-scale environmental imprints (including varied isolation methods, isolation dates spanning several decades and sparse spatial coverage), the ocean basin of isolation seemingly had predictive power for the distribution of select genes/pathways among the Roseobacter genomes. To determine whether this apparent grouping of genome content by geographical origin applies broadly to populations of roseobacters in the world oceans, we probed the GOS data set for similar environmental patterns. As the other two frameworks (phylogeny and lifestyle strategy) require assembled genomes, it is not possible to test for these among the GOS Roseobacter populations.
Homologs to genes listed in Supplementary Table S3 were identified in the GOS peptide sequence database (which currently does not include polar ocean metagenomic data). They were designated as Roseobacter homologs if they had greatest similarity to a gene in a Roseobacter genome in subsequent BLASTp query analysis against all available bacterial genome sequences (the CAMERA ‘All Prokaryotic Proteins (P) database’). Many of the same patterns in gene distribution found for cultured roseobacters were evident in the metagenomic analysis (Figure 6a). Most notable was that all phosphorus acquisition systems known to function at low phosphate concentrations (alkaline phosphatases, high-affinity phosphate uptake and phosphonate uptake) were much more abundant in wild roseobacters from the Atlantic Ocean, where the mean phosphate concentration is lower, than for either the Indian or Pacific Ocean (mean phosphate concentration is 0.06 μM for the Atlantic vs 0.15 μM for the Indian vs 0.53 μM for the Pacific; see Martiny et al., 2009 for details). The phosphate uptake system (pitA), which operates at high phosphate concentrations, had the opposite pattern, being more abundant in the Indian and Pacific Oceans (Figure 6a). Recently, other studies have noted similar trends for phosphorus gene distribution in the Prochlorococcus and SAR11 lineages (Rusch et al., 2007; Martiny et al., 2009), indicating that phosphorus concentration may impart a strong selective force on marine bacterial genomes. Of particular note were Roseobacter genes encoding phosphonate uptake and assimilation, which exhibited a very large bias in distribution toward the Atlantic Ocean (Figure 6a).
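The best-match designation described above can be sketched as a simple filter: for each GOS peptide, keep the highest-scoring database hit and retain the peptide only if that hit comes from a Roseobacter genome. The example below is a minimal illustration under stated assumptions. The `Hit` record, its field names, the lineage labels and the use of bit score as the measure of "greatest similarity" are all hypothetical choices, not details given in the text.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str            # GOS peptide identifier (hypothetical)
    subject_lineage: str  # lineage label of the matched genome (hypothetical)
    bit_score: float      # assumed similarity measure


def roseobacter_homologs(hits):
    """Return queries whose single best-scoring hit is to a Roseobacter genome."""
    best = {}
    for hit in hits:
        current = best.get(hit.query)
        if current is None or hit.bit_score > current.bit_score:
            best[hit.query] = hit
    return sorted(
        query for query, hit in best.items()
        if hit.subject_lineage == "Roseobacter"
    )


# Hypothetical hits for three GOS peptides.
example_hits = [
    Hit("gos_pep_1", "Roseobacter", 310.0),
    Hit("gos_pep_1", "SAR11", 180.0),
    Hit("gos_pep_2", "Roseobacter", 95.0),
    Hit("gos_pep_2", "Vibrionaceae", 140.0),
    Hit("gos_pep_3", "Roseobacter", 220.0),
]
print(roseobacter_homologs(example_hits))
```

Only peptides whose top hit is a Roseobacter gene are retained, so a peptide with a weaker Roseobacter match (the second one here) is excluded.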
Most representative genes we examined were more prevalent in the isolate genomes than in our per-genome-equivalent calculations for the GOS samples (Figure 6b), an observation that cannot be attributed to sampling disparities as 158 Roseobacter genome equivalents were sampled in the GOS (see Materials and methods). Compared with the roseobacters represented in culture, natural Roseobacter populations in the ocean are more likely to have genes for processing DMSP and utilizing C1 carbon compounds, but less likely to have genes involved in motility, adhesion, quorum sensing, gene transfer and iron uptake (Figure 6b). The higher prevalence of selected genes in the isolate genomes compared with GOS samples may indicate that there are fewer genes per genome in wild cells, could be indicative of the differences in sampling locations between the GOS samples and the isolates, or might reflect a bias during our analysis in selecting genes previously noted in cultured Roseobacter genomes.
As the GOS samples were passed through a 0.8-μm filter before sequencing (Rusch et al., 2007), there is also poor representation of sequences from particle-associated cells.
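The per-genome-equivalent comparison in Figure 6b can be illustrated with a short calculation: the number of Roseobacter homologs of a gene is divided by the number of Roseobacter genome equivalents sampled (158 for the GOS data set), and values above 100% are shown as 100%, as stated in the Figure 6 caption. The function name and the homolog counts below are hypothetical and serve only to show the arithmetic; the way genome equivalents were estimated is described in Materials and methods and is not reproduced here.

```python
GOS_GENOME_EQUIVALENTS = 158  # Roseobacter genome equivalents in the GOS data set


def gene_percent(homolog_count, genome_equivalents, cap=100.0):
    """Gene occurrence per genome equivalent, as a percentage capped at `cap`."""
    if genome_equivalents <= 0:
        raise ValueError("genome_equivalents must be positive")
    return min(cap, 100.0 * homolog_count / genome_equivalents)


# Hypothetical homolog counts for two genes.
print(gene_percent(79, GOS_GENOME_EQUIVALENTS))   # half of genome equivalents
print(gene_percent(200, GOS_GENOME_EQUIVALENTS))  # more than one copy each
```

The cap means that genes present in more than one copy per genome equivalent cannot be distinguished from single-copy universal genes in this representation.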
Conclusions
Comparative genomic analysis of a bacterial lineage is a powerful approach for revealing ecological and evolutionary forces that influence genome content, and might form the basis for delineating ecologically differentiated clusters in nature. The substantial 16S rRNA sequence divergence within the roseobacters (11%; Buchan et al., 2005), currently spanning a minimum of 45 described genera, makes this the broadest marine bacterial lineage for which a comparative genomic analysis has yet been undertaken. This taxonomic level is consistent, however, with current methodological resolution in microbial ecology, including target groups for 16S rRNA probes and primers (Alonso-Saez et al., 2007; Lami et al., 2009), and efforts to assign taxon-specific biogeochemical roles (Alonso and Pernthaler, 2006; Mou et al., 2008; Poretsky et al., 2010).
The three predictive frameworks examined here for the Roseobacter genomes have previously been shown to correlate with the genome content in bacterial taxa, including phylogenetic relatedness in Prochlorococcus (Garczarek et al., 2000; Bibby et al., 2003; Rocap et al., 2003), environmental resource partitioning in Vibrionaceae (a lineage with similar 16S rRNA divergence as the roseobacters; Hunt et al., 2008), and trophic strategies in bacterial endosymbionts and aquatic bacterioplankton (Moran and Baumann, 2000; Lauro et al., 2009). For the Roseobacter lineage, whole-genome content analysis of the 32 genomes produced 23 genome clusters (Figure 3) representing 20 unique combinations of clade, trophic strategy and environmental source. New sequences of Roseobacter strains may well increase the number of known genome clusters, particularly because two environmentally abundant 16S rRNA clades do not yet have reference genome sequences. While all three frameworks had statistically significant predictive power, none emerged as the potential overriding force imprinting Roseobacter genome content. Although other possible explanatory frameworks might have been considered here, all but two of the 23 genome clusters have unique clade–trophy–environment assignments (Figure 3), suggesting that these three frameworks together acceptably classify most of the variability in genome content.
The finding that trophic strategy correlates better than phylogeny or environment with Roseobacter gene inventories (ANOSIM, R = 0.545 vs 0.410 vs 0.296) was not anticipated at the outset of our analysis, at least in part because it is not a correlate that has been widely examined for marine
Figure 6 (a) Ocean basin (Atlantic, Indian and Pacific) three-way comparison of Roseobacter genes/gene pathways (see Figure 4). The position of each dot indicates the relative abundance of a gene in the Global Ocean Sampling (GOS) data set, based on 61, 52 and 45 Roseobacter genome equivalents in the Atlantic, Indian and Pacific Ocean data sets, respectively. Only genes present at a frequency ≥10% of genome equivalents in any one ocean basin are depicted. Triangle vertices indicate 100% relative abundance of that particular gene in the representative ocean. Three lines creating an inverted triangle have been drawn to aid in visual interpretation and indicate a relative abundance = 50% in a single ocean basin. (b) Gene occurrence percentage of Roseobacter genome equivalents in the GOS data set (n = 158) vs isolate genomes (n = 32). Gene occurrence percentages >100% (that is, more than one copy per genome equivalent) in the GOS samples are represented as 100%. Gene descriptions are listed in Supplementary Table S2. GOS samples included in the comparison are listed in Supplementary Table S3. [Panel a: ternary plot with Atlantic, Indian and Pacific axes (10% to 90%). Panel b: scatter plot; x axis: Gene % in Ocean isolates’ Genomes (20 to 100); y axis: Gene % in GOS (Ocean) Samples (20 to 100). Points in both panels are labeled with gene names.]
bacterial genomes. Nevertheless, the past decade has uncovered remarkable flexibility in the trophic strategies of marine bacterioplankton, suggesting that acquisition of alternate mechanisms for obtaining carbon and energy may be a strong evolutionary force in the ocean. The occurrence of several distinct trophic schemes within the taxonomically broad Roseobacter lineage provided an ideal opportunity to explore whether a bacterium’s strategy for obtaining carbon and energy predicts other aspects of genome content. Differences in gene content among trophic groups were unfortunately dominated by hypothetical proteins, which provide little biological insight, although C1 and aromatic carbon oxidation genes and amino-acid transport and metabolism genes contributed to the signal. This concept of lifestyle imprinting of genome content, which has been explored in great detail for bacterial endosymbionts (for example, Moran and Baumann, 2000), may therefore also be important for understanding gene inventories of ocean microbes.
Roseobacter-like genes in the GOS data set showed significant variation in frequency across ocean basins, although only a fraction of all possible genes and gene systems appear to be shaped at this grand scale (Figure 6a). The GOS data set was also valuable for determining how well the genomes from the cultured roseobacters represent the repertoire and stoichiometry of genes in ocean-dwelling ‘wild’ roseobacters, an important perspective for assessing the relevance of this isolate-based genome analysis. Although the mismatch in frequency of some examined genes between isolates and the GOS data set suggests that the currently cultured strains may not yet provide a faithful representation of the prevalent natural Roseobacter populations, many genes and gene systems were indeed present at comparable frequencies (Figure 6b).
Overall, our analysis has firmly established roseobacters as ecological generalists, harboring large gene inventories and a remarkable suite of mechanisms by which to obtain carbon and energy. Further, this comparative analysis has illustrated that members of the lineage cannot be easily condensed into a few ecologically differentiated clusters; rather, each genome is largely unique in its assortment of genes for acquisition and transformation of carbon and nutrients. The fact that the best framework for predicting genome content is lifestyle strategy, not phylogeny, indicates that horizontal gene transfer and homologous recombination may be particularly dominant evolutionary forces in this marine bacterial lineage (possibly facilitated by an unusual gene transfer agent system that is prevalent; Biers et al., 2008; Zhao et al., 2009). Further insights into correlates of genome content, coupled with continued efforts to identify Roseobacter genes that are common in the world oceans, will better elucidate the functional roles of roseobacters in marine ecosystems.
Acknowledgements
This project was supported by grants from the Gordon and Betty Moore Foundation and the National Science Foundation (OCE0724017 and MCB0702125). We thank Dr S Santos for providing a list of universal genes.
References
Alonso C, Pernthaler J. (2006). Concentration-dependent patterns of leucine incorporation by coastal picoplankton. Appl Environ Microbiol 72: 2141–2147.
Alonso-Saez L, Balague V, Sa EL, Sanchez O, Gonzalez JM, Pinhassi J et al. (2007). Seasonality in bacterial diversity in north-west Mediterranean coastal waters: assessment through clone libraries, fingerprinting and FISH. FEMS Microbiol Ecol 60: 98–112.
Bibby TS, Mary I, Nield J, Partensky F, Barber J. (2003). Low-light-adapted Prochlorococcus species possess specific antennae for each photosystem. Nature 424: 1051–1054.
Biers EJ, Wang K, Pennington C, Belas R, Chen F, Moran MA. (2008). Occurrence and expression of gene transfer agent genes in marine bacterioplankton. Appl Environ Microbiol 74: 2933–2939.
Blankenship RE, Madigan MT, Bauer CE. (1995). Anoxygenic Photosynthetic Bacteria. Kluwer Academic Publishers: Dordrecht, Boston.
Brinkhoff T, Giebel HA, Simon M. (2008). Diversity, ecology, and genomics of the Roseobacter clade: a short overview. Arch Microbiol 189: 531–539.
Buchan A, Gonzalez JM, Moran MA. (2005). Overview of the marine Roseobacter lineage. Appl Environ Microbiol 71: 5665–5677.
Coleman ML, Chisholm SW. (2007). Code and context: Prochlorococcus as a model for cross-scale biology. Trends Microbiol 15: 398–407.
Croft MT, Lawrence AD, Raux-Deery E, Warren MJ, Smith AG. (2005). Algae acquire vitamin B12 through a symbiotic relationship with bacteria. Nature 438: 90–93.
Dufresne A, Ostrowski M, Scanlan DJ, Garczarek L, Mazard S, Palenik BP et al. (2008). Unraveling the genomic mosaic of a ubiquitous genus of marine cyanobacteria. Genome Biol 9: R90.1–R90.16.
Eisen JA. (1995). The RecA protein as a model molecule for molecular systematic studies of bacteria: Comparison of trees of RecAs and 16S rRNAs from the same species. J Mol Evol 41: 1105–1123.
Follows MJ, Dutkiewicz S, Grant S, Chisholm SW. (2007). Emergent biogeography of microbial communities in a model ocean. Science 315: 1843–1846.
Garczarek L, Hess WR, Holtzendorff J, van der Staay GWM, Partensky F. (2000). Multiplication of antenna genes as a major adaptation to low light in a marine prokaryote. Proc Natl Acad Sci USA 97: 4098–4101.
Gonzalez JM, Moran MA. (1997). Numerical dominance of a group of marine bacteria in the alpha-subclass of the class Proteobacteria in coastal seawater. Appl Environ Microbiol 63: 4237–4242.
Goris J, Konstantinidis KT, Klappenbach JA, Coenye T, Vandamme P, Tiedje JM. (2007). DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. Int J Syst Evol Microbiol 57: 81–91.
Hess WR, Rocap G, Ting CS, Larimer F, Stilwagen S,Lamerdin J et al. (2001). The photosynthetic apparatusof Prochlorococcus: insights through comparativegenomics. Photosynth Res 70: 53–71.
Howard EC, Sun SL, Biers EJ, Moran MA. (2008).Abundant and diverse bacteria involved in DMSPdegradation in marine surface waters. Environ Micro-biol 10: 2397–2410.
Hunt DE, David LA, Gevers D, Preheim SP, Alm EJ, PolzMF. (2008). Resource partitioning and sympatricdifferentiation among closely related bacterioplank-ton. Science 320: 1081–1085.
Johnson ZI, Zinser ER, Coe A, McNulty NP, WoodwardEMS, Chisholm SW. (2006). Niche partitioning amongProchlorococcus ecotypes along ocean-scale environ-mental gradients. Science 311: 1737–1740.
Karl DM. (2002). Nutrient dynamics in the deep blue sea.Trends Microbiol 10: 410–418.
King GA. (2003). Molecular and culture-based analyses ofaerobic carbon monoxide oxidizer diversity. ApplEnviron Microbiol 69: 7257–7265.
Lami R, Ghiglione J-F, Desdevises Y, West NJ, Lebaron P. (2009). Annual patterns of presence and activity of marine bacteria monitored by 16S rDNA–16S rRNA fingerprints in the coastal NW Mediterranean Sea. Aquat Microb Ecol 54: 199–210.
Lauro FM, McDougald D, Thomas T, Williams TJ, Egan S, Rice S et al. (2009). The genomic basis of trophic strategy in marine bacteria. Proc Natl Acad Sci USA 106: 15527–15533.
Legendre P, Legendre L. (1998). Numerical Ecology. Second English edn. Elsevier Science, BV, Elsevier: Amsterdam, Netherlands.
Ludwig W, Strunk O, Westram R, Richter L, Meier H, Kumar Y et al. (2004). ARB: a software environment for sequence data. Nucleic Acids Res 32: 1363–1371.
Markowitz VM, Ivanova NN, Szeto E, Palaniappan K, Chu K, Dalevi D et al. (2008). IMG/M: a data management and analysis system for metagenomes. Nucleic Acids Res 36: D534–D538.
Martiny AC, Coleman ML, Chisholm SW. (2006). Phosphate acquisition genes in Prochlorococcus ecotypes: evidence for genome-wide adaptation. Proc Natl Acad Sci USA 103: 12552–12557.
Martiny AC, Huang Y, Li WZ. (2009). Occurrence of phosphate acquisition genes in Prochlorococcus cells from different ocean regions. Environ Microbiol 11: 1340–1347.
Moran MA, Belas R, Schell MA, Gonzalez JM, Sun F, Sun S et al. (2007). Ecological genomics of marine roseobacters. Appl Environ Microbiol 73: 4559–4569.
Moran MA, Buchan A, Gonzalez JM, Heidelberg JF, Whitman WB, Kiene RP et al. (2004). Genome sequence of Silicibacter pomeroyi reveals adaptations to the marine environment. Nature 432: 910–913.
Moran NA, Baumann P. (2000). Bacterial endosymbionts in animals. Curr Opin Microbiol 3: 270–275.
Mou X, Sun S, Edwards RA, Hodson RE, Moran MA. (2008). Bacterial carbon processing by generalist species in the coastal ocean. Nature 451: 708–711.
Palenik B, Ren QH, Dupont CL, Myers GS, Heidelberg JF, Badger JH et al. (2006). Genome sequence of Synechococcus CC9311: insights into adaptation to a coastal environment. Proc Natl Acad Sci USA 103: 13555–13559.
Polz MF, Hunt DE, Preheim SP, Weinreich DM. (2006). Patterns and mechanisms of genetic and phenotypic differentiation in marine microbes. Phil Trans R Soc B 361: 2009–2021.
Poretsky RS, Sun S, Mou X, Moran MA. (2010). Transporter genes expressed by coastal bacterioplankton in response to dissolved organic carbon. Environ Microbiol. Advance online publication doi:10.1111/j.1462-2920.2009.02102.x.
Rabouille S, Edwards CA, Zehr JP. (2007). Modeling the vertical distribution of Prochlorococcus and Synechococcus in the North Pacific Subtropical Ocean. Environ Microbiol 9: 2588–2602.
Rocap G, Larimer FW, Lamerdin J, Malfatti S, Chain P, Ahlgren NA et al. (2003). Genome divergence in two Prochlorococcus ecotypes reflects oceanic niche differentiation. Nature 424: 1042–1047.
Rusch DB, Halpern A, Sutton G, Heidelberg KB, Williamson S, Yooseph S et al. (2007). The Sorcerer II Global Ocean Sampling expedition: Northwest Atlantic through Eastern Tropical Pacific. PLoS Biol 5: 398–431.
Santos SR, Ochman H. (2004). Identification and phylogenetic sorting of bacterial lineages with universally conserved genes and proteins. Environ Microbiol 6: 754–759.
Scanlan DJ, Ostrowski M, Mazard S, Dufresne A, Garczarek L, Hess WR et al. (2009). Ecological genomics of marine picocyanobacteria. Microbiol Mol Biol Rev 73: 249–299.
Seshadri R, Kravitz SA, Smarr L, Gilna P, Frazier M. (2007). CAMERA: a community resource for metagenomics. PLoS Biol 5: 394–397.
Shiba T, Simidu U, Taga N. (1979). Distribution of aerobic-bacteria which contain bacteriochlorophyll-a. Appl Environ Microbiol 38: 43–45.
Shimada K. (1995). Aerobic anoxygenic phototrophs. In: Blankenship RE, Madigan MT, Bauer CE (eds). Anoxygenic Photosynthetic Bacteria. Vol. 2. Springer Netherlands, Dordrecht; Boston: Kluwer Academic Publishers, pp 105–122.
Snel B, Bork P, Huynen MA. (2002). Genomes in flux: the evolution of archaeal and proteobacterial gene content. Genome Res 12: 17–25.
Sorokin DY, Banciu H, van Loosdrecht M, Kuenen JG. (2003). Growth physiology and competitive interaction of obligately chemolithoautotrophic, haloalkaliphilic, sulfur-oxidizing bacteria from soda lakes. Extremophiles 7: 195–203.
Stamatakis A, Hoover P, Rougemont J. (2008). A rapid bootstrap algorithm for the RAxML web servers. Syst Biol 57: 758–771.
Swingley WD, Sadekar S, Mastrian SD, Matthies HJ, Hao J, Ramos H et al. (2007). The complete genome sequence of Roseobacter denitrificans reveals a mixotrophic rather than photosynthetic metabolism. J Bacteriol 189: 683–690.
Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV et al. (2003). The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4: 1–14.
Wagner-Dobler I, Biebl H. (2006). Environmental biology of the marine Roseobacter lineage. Annu Rev Microbiol 60: 255–280.
Wagner-Dobler I, Ballhausen B, Berger M, Brinkhoff T, Buchholz I, Bunk B et al. (2009). The complete genome sequence of the algal symbiont Dinoroseobacter shibae: a hitchhiker’s guide to life in the sea. ISME J 4: 61–77.
West NJ, Obernosterer I, Zemb O, Lebaron P. (2008). Major differences of bacterial diversity and activity inside and outside of a natural iron-fertilized phytoplankton bloom in the Southern Ocean. Environ Microbiol 10: 738–756.
West NJ, Scanlan DJ. (1999). Niche-partitioning of Prochlorococcus populations in a stratified water column in the eastern North Atlantic Ocean. Appl Environ Microbiol 65: 2585–2591.
Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, Remington K et al. (2007). The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol 5: 432–466.
Zhao YL, Wang K, Budinoff C, Buchan A, Lang A, Jiao NZ et al. (2009). Gene transfer agent (GTA) genes reveal diverse and dynamic Roseobacter and Rhodobacter populations in the Chesapeake Bay. ISME J 3: 364–373.
Zwirglmaier K, Jardillier L, Ostrowski M, Mazard S, Garczarek L, Vaulot D et al. (2008). Global phylogeography of marine Synechococcus and Prochlorococcus reveals a distinct partitioning of lineages among oceanic biomes. Environ Microbiol 10: 147–161.
Supplementary Information accompanies the paper on The ISME Journal website (http://www.nature.com/ismej)