Chapter 33
Soil Metagenomic Exploration of the Rare Biosphere
Tom O. Delmont, Laure Franqueville, Samuel Jacquiod, Pascal Simonet, and Timothy M. Vogel
33.1 INTRODUCTION
Microorganisms can be considered the fundamental driving force of the biosphere and have dominated life on Earth for over 3 billion years. They have colonized all ecological niches, from caves [Pasic et al., 2010] to the stratosphere [Wainwright et al., 2003] and from deep ocean thermal vents [Huber et al., 2007] to deserts [Pointing et al., 2009] and polar snow [Larose et al., 2010; see also Vol. II]. Other forms of life are completely dependent upon these microscopic organisms. For example, humans cannot survive without the rich microbial flora inhabiting their own bodies [Turnbaugh et al., 2006]. When examining the importance of different ecosystems, soil stands out as the habitat on Earth that harbors by far the largest microbial diversity [Torsvik et al., 1990, 2002; Whitman et al., 1998; van Elsas et al., 2006; see also Chapter 9, Vol. II]. The genetic resources in a single gram of soil comprise 3 million times more sequences than the human genome, and only three grams of soil contain more bacteria than there are humans on Earth.
Soil microbial communities are known to drive major geochemical cycles [Falkowski et al., 2001], to support healthy plant growth [Ortíz-Castro et al., 2009], and to degrade organic matter and pollutants [Singh et al., 2008]. However, little is known about the vulnerability of their key functions and how they respond to human-induced environmental perturbations, such as climate change and land use. Cultivation-based approaches, although limited in scope, have already shown that soil harbors diverse antibiosis-related functions [Adesina et al., 2007], pollutant-degrading bacteria [Richard and Vogel, 1999], plant growth promoting bacteria [Leveau, 2007], bacteria resistant to heavy metals and antibiotics [Baker-Austin et al., 2006; Dcosta et al., 2007; Demaneche et al., 2008], and bacteria capable of surviving in extreme environments [Dib et al., 2008; Hery et al., 2003].

Handbook of Molecular Microbial Ecology, Volume I: Metagenomics and Complementary Approaches, First Edition. Edited by Frans J. de Bruijn. © 2011 Wiley-Blackwell. Published 2011 by John Wiley & Sons, Inc.
However, only a fraction (less than 0.5%) of the microbiota in soil has been cultivated using any of an array of techniques [Amann et al., 1995; Davis et al., 2005; Staley and Konopka, 1995], which considerably limits our understanding of these microorganisms. In order to access this tremendous biodiversity, researchers have recently developed sophisticated molecular techniques that can explore and exploit a wide range of soil biodiversity. These techniques derive their pertinence from the direct analysis and exploration of microbial community DNA (metagenomics) [Courtois et al., 2003; Demaneche et al., 2008; Ginolhac et al., 2004]. In this context, soil metagenomics is the study and exploitation of the collective genome of all organisms present in a particular soil sample [Handelsman et al., 1998]. The critical technical issue today is unbiased access to the entire soil microbial diversity.
To date, there have been only superficial, although ambitious, attempts to explore the soil metagenome. Due in part to incomplete genomic extraction techniques and incomplete sequencing, scientists today are exploring only a minute fraction of soil metagenomes. Potential errors in data interpretation due to the limited vision of these small community fractions might be driving hypotheses away from incorporating functional redundancies and the importance of the minor members of the soil community (rare biosphere). Recent advances in high-throughput cloning and new-generation sequencing (see Chapter 18, Vol. I) might improve the prospect of completely sequencing the soil metagenome and thus obtaining a more complete picture of the soil microbial community function.
33.2 SOIL METAGENOMIC LIMITS
The soil microbial community is relatively diverse [Curtis et al., 2002; Robe et al., 2003] and appears to have the highest prokaryotic diversity of any environment [van Elsas et al., 2006; Roesch et al., 2007]. One gram of soil is reported to contain up to 10 billion microorganisms and thousands of different species [Knietsch et al., 2003]. Thus, the potential access to the genomes of these microorganisms requires diverse approaches due in part to their differential location in the soil microstructure, their differences in membrane consistency, and their differences in relative numbers. In order to overcome some of these difficulties, different metagenomic associated methods have been developed. All of these methods have some type of shortcoming; however, when pooled together, they will provide greater depth than any technique by itself. First, DNA purification techniques have numerous biases that limit the quantity of the so-called metagenomic DNA [Delmont et al., 2011]. Second, the proportion of the genomes is nonuniform [Ranjard and Richaume, 2001]; thus, access to the less represented genomes is reduced statistically. In any case, these types of techniques must be applied in order to overcome cell culture limitations [Amann et al., 1995]; but because of their own limits, a complete soil metagenomic study remains an important challenge.
Classically, different approaches have been applied to estimate soil biodiversity richness. The typical method is the amplification, cloning, and sequencing of the small-subunit ribosomal RNA genes (“16S”; see Chapter 16, Vol. I). Rarefaction curves can be created based on genetic diversity and sequence proportions in order to estimate the total soil microbial diversity with (and unfortunately within) a limited sample. The 16S clone libraries constructed by Tringe et al. [2005] and Morales et al. [2009] were extrapolated to determine soil microbial diversities between 3000 and 4000 different species (based on 97% similarity of the 16S sequence for species identity) after sequencing 1700 and 5000 clones, respectively. Roesch et al. [2007] performed 16S amplification and direct pyrosequencing (without the cloning step) of a part of this gene. With this approach and 55,000 16S sequences, they evaluated the diversity to be 6000 different species based on the same 97% similarity criterion.
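The rarefaction idea described above can be illustrated with a minimal sketch. It is not the procedure used in the cited studies: it computes the expected number of species-level groups (OTUs) seen in a random subsample of n sequences, using the classical hypergeometric rarefaction formula, together with the bias-corrected Chao1 lower-bound richness estimate. The function names and the toy clone counts are assumptions made for illustration only.

```python
from math import comb

def rarefied_richness(otu_counts, n):
    """Expected number of OTUs observed in a random subsample of n sequences."""
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the number of sequences")
    # An OTU with c sequences is missed with probability
    # C(total - c, n) / C(total, n); comb() returns 0 when n > total - c.
    denom = comb(total, n)
    return sum(1 - comb(total - c, n) / denom for c in otu_counts)

def chao1(otu_counts):
    """Bias-corrected Chao1 estimate of total richness (a lower bound)."""
    observed = sum(1 for c in otu_counts if c > 0)
    singletons = sum(1 for c in otu_counts if c == 1)
    doubletons = sum(1 for c in otu_counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))

# Hypothetical clone library: a few dominant OTUs and several rare ones.
toy_counts = [50, 30, 10, 5, 2, 2, 1, 1, 1, 1]

# Points of a rarefaction curve: (subsample size, expected OTUs observed).
curve = [(n, rarefied_richness(toy_counts, n)) for n in (1, 10, 50, 103)]
estimated_total = chao1(toy_counts)
```

With uneven proportions such as these, the curve rises quickly for the dominant OTUs and flattens long before the rare ones are sampled, which is the behavior that makes extrapolation from a limited sample uncertain.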
A completely different approach based on genome reassociation kinetics measures the rate of rehybridization of metagenomic DNA. The kinetic measurements are evaluated based on different hypothetical species distributions. For example, Torsvik et al. [1990; see also Chapter 2, Vol. I] estimated soil diversity to be 10,000 different species assuming that all species were equal in number. More recently, Gans et al. [2005] estimated the diversity to be 8 × 10⁶ species in the tested soil by assuming a log-normal species number distribution.
One explanation for this wide range of predicted species numbers in soil may be the metagenomic DNA diversity represented in the 16S amplification mix used for the analyses. Most researchers amplify the 16S gene from a minor (and possibly nonrepresentative) fraction of the metagenomic DNA present in a gram of soil. The amplification mix typically contains only about 10 ng of DNA (less than 1% of the extracted DNA), which represents only about 10⁶ copies of prokaryote genomes, out of several micrograms of DNA extracted per gram of soil (the quantity depends on the soil type and sample depth). To reduce this limitation, Roesch et al. [2007] used a total of 10 μg of DNA (96 different 16S amplifications). However, the diversity estimates were not drastically different from the results obtained by Morales et al. [2009] with 5000 16S clones using one 16S variable region of the DNA extracted. These observations emphasize the existence of a much more important limit. The sequence redundancy due to nonuniform species proportions leads to the dilution of the less represented species during the first cycles of 16S amplification, thus considerably limiting the diversity obtained. This possible limitation seems more prevalent and likely than that due to the low initial amount of DNA in the PCR mix.
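The order of magnitude quoted above (about 10 ng of DNA corresponding to roughly 10⁶ genome copies) can be checked with a short calculation. The sketch below assumes an approximate molar mass of 650 g/mol per base pair of double-stranded DNA and an average prokaryotic genome of 5 Mbp; both values are illustrative assumptions (the genome size is the one implied by the sequencing figures given in Section 33.5), and the function name is hypothetical.

```python
AVOGADRO = 6.022e23      # molecules per mole
BP_MOLAR_MASS = 650.0    # g/mol per base pair of double-stranded DNA (approximate)

def genome_copies(dna_mass_ng, genome_size_bp):
    """Approximate number of genome copies contained in a mass of dsDNA."""
    mass_g = dna_mass_ng * 1e-9
    genome_mass_g = genome_size_bp * BP_MOLAR_MASS / AVOGADRO
    return mass_g / genome_mass_g

# Assumed average prokaryotic genome of 5 Mbp (illustrative value).
copies_in_pcr_mix = genome_copies(10, 5_000_000)         # 10 ng template
copies_in_10_ug = genome_copies(10_000, 5_000_000)       # 10 ug template
```

Under these assumptions, 10 ng of template holds on the order of one to two million genome copies, and 10 μg holds a thousand times more, which helps explain why increasing the template mass alone does not recover species that are diluted by highly uneven proportions.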
Another important factor that often leads to underestimated biodiversity results is the DNA extraction protocol. Most studies focus their efforts on assessing microbial diversity with only one extraction technique. The microbial diversity uncovered with different methods demonstrates the danger of such a narrow approach if the goal is to optimize access to biodiversity. DNA extraction appears to be a crucial step in soil metagenomics because it defines in part the extent of the information available. To overcome this bias, a broad DNA recovery strategy is needed [Delmont et al., 2011; see Chapters 10 and 11, Vol. II].
33.3 SOIL METAGENOMIC ADVANCES
Soil metagenomic studies have rapidly expanded since 1998 (Fig. 33.1) and a large number of tools and methods have been developed to recover and study prokaryotic DNA diversities [Delmont et al., 2011; Rajendhran and Gunasekaran, 2008; see Chapter 10, Vol. II]. The two major metagenomic advances were high-throughput cloning and new-generation sequencing approaches. These new technologies became precious tools for the discovery of new species, genes, and functions of interest. While there is a tendency to explore the metagenomics of a soil through sequencing, the clone library approach often provides better validation of newly discovered genes.

Figure 33.1 Number of publications in PubMed with the words “soil” and “metagenomic” from 2000 to 2009. [Bar chart; y-axis: number of publications, 0 to 45; x-axis: year, 2000 to 2009.]
Genomic studies stand at the vanguard of science as technological advances are providing access to the functioning of different biological systems. After the Human Genome Project (3 billion base pairs) and the Sargasso Sea marine sequencing effort (6 billion base pairs), soil exploration constitutes a new and ambitious challenge as it harbors about 10 trillion base pairs (if 10⁴ species are present) to 1000 trillion base pairs of diversity (if 10⁶ species are present) per gram of soil. Just as knowledge of the human genome promises to revolutionize medical science, the application of genomic technologies to microbial evolution and environmental biology promises to revolutionize microbiology. The soil microbial community represents a true goldmine for genes that encode novel biocatalysts involved in either biosynthetic or biodegradation processes, including the production of polyketide synthases [Ginolhac et al., 2004, 2005] and the degradation of human-made pollutants [Handelsman et al., 2002; Boubakri et al., 2006; Galvao et al., 2005; see also Sections 6 and 7, Vol. II]. In spite of new possibilities to study metagenomic DNA, only a minor fraction of soil metagenomes has been sequenced to date and only predominant species have been discovered and studied. The majority of soil genetic richness and function still remains to be discovered.
33.4 AN INTERNATIONAL PARTNERSHIP TO OPEN THE BLACK BOX
As new sequencing technologies are increasing both in sequence length and number of sequences [Shendure and Ji, 2008], massive DNA sequencing is becoming a serious possibility. As a consequence, large national and international projects were launched to sequence environments such as the ocean and the human gut. Yet, soil nucleic acid explorations have been relatively limited, in part due to the enormous soil heterogeneity, the vast biodiversity, and the difficulty of accessing this biodiversity locked up in the soil matrix. Soil microbiologists, microbial ecologists, geneticists, molecular biologists, and bioinformaticians have agreed to collaborate to provide the first complete metagenomic sequence(s) of a soil. These metagenome sequence data will constitute a “reference” to which other soils around the world could be compared. Thus, other metagenomic projects devoted to sequencing parts of different soil genomes throughout the world will be able to use this complete metagenome as a scaffold for annotation of “core” genes representing common soil microorganisms in other soils and as a basis for estimating differences in diversity, completeness, and richness between soils. The combination of metagenomics approaches and broad-scale sequencing will open a totally new era in soil microbiology with advances ranging from detection of climatic indicators and greenhouse gas production to drug discovery, as well as from correlating biodiversity and function to predicting the biosphere’s resilience to human-induced perturbation.

The soil system chosen for investigation, Park Grass, Rothamsted (UK), is a “charismatic and internationally recognized resource” [Silvertown et al., 2006]. This unique long-term ecological site includes ongoing experiments that have been running for over 150 years. The research center at Rothamsted provides a history of soil biology and chemistry as well as an archive of soil samples from detailed studies of different plot treatments (http://www.rothamsted.ac.uk/). Metadata are available concerning climate, soil use, and chemical inputs.
The success of soil metagenomics depends to a large extent on intelligent decision-making concerning sample selection, DNA extraction methods, cloning strategies, screening methods, technological advances in sequencing approaches, and data management and sharing. Recent progress in methods to capture the vast scale of genetic diversity within soil microbial communities will enable deep metagenomic sequencing [Morales et al., 2009; Delmont et al., 2011]. These advances include methods to dissect the community using DNA or cell extraction separation methods that can be optimized to detect abundant and rare members of the community. These promising new metagenomic approaches, which are currently being applied to the Rothamsted soil, rely on massively parallel high-throughput sequencing of DNA extracted from soil microorganisms. Furthermore, DNA microarrays loaded with soil-dedicated probes have been used in order to identify the diversity and relative abundance of phylogenetic and functional genes in different DNA fractions [Gebert et al., 2008; Huyghe et al., 2008; Sanguin et al., 2006; Ward et al., 2007]. Cloning of large DNA fragments from soil to create metagenomic libraries enables the examination of genetic pathways and contiguous sequences (e.g., operons). In addition, the new-generation sequencing methods, including enhanced 454 Titanium with up to 1000-bp reads, will complement the cloning-based approaches and avoid the associated cloning bias (DNA weight and cloning step). Data generated will provide insight into which soil organisms are actually involved in soil processes, especially when RNA is extracted. These approaches, coupled with novel bioinformatics methods (e.g., MG-RAST [Meyer et al., 2008; see Chapter 37, Vol. I]), provide an in-depth analysis of both generated sequence and array data. One bioinformatics limit stands out in the interpretation of these data: there are insufficient sequenced genomes or sequences that have been correctly annotated; therefore, around 50% of soil metagenome sequences cannot be used due to the lack of homologues in the databases.
33.5 THE METASOIL PROJECT
The Metasoil project, funded by the French National Research Agency, is initiating the Park Grass metagenome sequencing and cloning as well as the international consortium (Terragenome) [Vogel et al., 2009]. The first part of this project is a 100-Gbp soil metagenomic DNA sequencing effort (read length up to 500 bp) with up to 100 Titanium pyrosequencing runs. This sequencing effort corresponds to the equivalent of more than 20,000 prokaryotic genomes. Moreover, in parallel with the in-depth soil pyrosequencing, Metasoil includes the construction of a two-million-clone fosmid library (40-kb inserts) to serve the international scientific community for studying this soil metagenome and for searching for new genes of interest. The fosmid library and pyrosequencing runs will be produced using various metagenomic DNA samples extracted from the same soil, but with a panel of different methods (soil, cell, and DNA separations) to access a maximum diversity [Delmont et al., 2011]. Due to DNA extraction biases, such as uneven species proportion and soil heterogeneity, a mixed DNA acquisition strategy is being evaluated in order to enhance complete soil metagenome exploration. This strategy is based on variable metagenomic DNA separation methods and aims to increase the soil DNA diversity recovery significantly. Moreover, the strategy should standardize the relative species proportion at the nucleic acid level as much as possible and, thus, increase access to the rare biosphere.
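The scale of the effort described in this section can be made concrete with a few lines of arithmetic. The sketch below uses only the figures quoted in the text; the variable names are illustrative, and the "implied genome size" is simply what the stated equivalence works out to, not a measured value.

```python
TOTAL_SEQUENCING_BP = 100e9    # 100 Gbp of pyrosequencing (from the text)
PYROSEQUENCING_RUNS = 100      # up to 100 Titanium runs (from the text)
MAX_READ_LENGTH_BP = 500       # read length up to 500 bp (from the text)
GENOME_EQUIVALENTS = 20_000    # "more than 20,000 prokaryotic genomes"
FOSMID_CLONES = 2_000_000      # two million fosmid clones
FOSMID_INSERT_BP = 40_000      # 40-kb inserts

# Average yield each run must deliver to reach the target.
bp_per_run = TOTAL_SEQUENCING_BP / PYROSEQUENCING_RUNS

# Average genome size implied by "100 Gbp = 20,000 genomes".
implied_genome_size_bp = TOTAL_SEQUENCING_BP / GENOME_EQUIVALENTS

# Minimum number of reads if every read reached the maximum length.
min_reads = TOTAL_SEQUENCING_BP / MAX_READ_LENGTH_BP

# Total DNA cloned in the fosmid library.
fosmid_library_bp = FOSMID_CLONES * FOSMID_INSERT_BP
```

These values give about 1 Gbp per run, an implied average genome of 5 Mbp, at least 200 million reads, and 80 Gbp of cloned DNA, so the fosmid library is of the same order of magnitude as the pyrosequencing effort.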
33.6 SOIL METAGENOMIC DNA RECOVERING STRATEGY
33.6.1 Soil Fractionation
The Park Grass untreated control plot (3d) selected for the project has a surface of 249 m² (13.28 × 18.75 m). The sampling strategy consists of randomized soil sampling in different areas of the plot (horizontal sampling) and at different depths (vertical sampling). Preliminary prokaryote diversity studies were carried out using RISA (ribosomal intergenic spacer analysis) fingerprinting after extracting DNA with the MP Bead beating Fast Prep protocol [Griffiths et al., 2000] and showed a stable diversity pattern horizontally. However, significant differences were found as a function of depth (data not shown). Further experiments were done on different depth fractions (0–3 cm, 3–6 cm, and 18–21 cm) using a phylochip technology (Fig. 33.2; see Chapter 58, Vol. I).

Results show that some families are detected only in one soil fraction and that the proportion of the different families can fluctuate strongly with depth. Differences in soil depth appear to provide access to different metagenomic DNA fractions and can be helpful for increasing the diversity recovered and for limiting sequence redundancy.
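A randomized horizontal and vertical sampling plan of the kind described in this subsection could be drawn as in the following sketch. The plot dimensions come from the text, and the seven 3-cm depth layers and four horizontal areas come from the legend of Fig. 33.3; the function name, the uniform random draw of coordinates, and the seed are illustrative assumptions and do not describe the procedure actually used at Park Grass.

```python
import random

PLOT_WIDTH_M, PLOT_LENGTH_M = 13.28, 18.75   # plot 3d dimensions (from the text)

# Seven consecutive 3-cm layers between 0 and 21 cm (from the Fig. 33.3 legend).
DEPTH_FRACTIONS_CM = [(top, top + 3) for top in range(0, 21, 3)]

def draw_sampling_plan(n_points, seed=None):
    """Return (x, y, depth_layer) tuples: random points, each sampled at every depth."""
    rng = random.Random(seed)
    points = [
        (round(rng.uniform(0, PLOT_WIDTH_M), 2), round(rng.uniform(0, PLOT_LENGTH_M), 2))
        for _ in range(n_points)
    ]
    return [(x, y, depth) for (x, y) in points for depth in DEPTH_FRACTIONS_CM]

# Hypothetical plan: four horizontal positions, each cored over all seven layers.
plan = draw_sampling_plan(n_points=4, seed=1)
```

Fixing the seed makes the plan reproducible, which is convenient when the same positions must be revisited for several extraction methods.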
33.6.2 Prokaryote Cells Extraction
After soil fractionation, we compared different methods for DNA recovery. In order to obtain metagenomic DNA, cells can be lysed within the soil sample [Ogram et al., 1987; van Elsas et al., 1997] or first removed from the matrix and then lysed [Berry et al., 2003]. This second method is commonly called indirect extraction and attempts to separate prokaryotic cells (the cell ring) from eukaryotic cells via a Nycodenz density gradient [Courtois et al., 2001; Lefevre et al., 2008]. To compare the two methods, DNA was extracted with the same protocol (MP Bead beating Fast Prep [Griffiths et al., 2000]), but with direct (horizontal and vertical soil sampling) or indirect (with different centrifugation speeds during the cell ring formation) extraction approaches. The RISA profiles were then compared in a principal component analysis (PCA) (Fig. 33.3). Results demonstrate the differences between the direct and the indirect extraction methods in terms of metagenomic DNA. The RISA profiles are clearly different between the two methods, showing different peaks and intensities (data not shown). However, varying the centrifugation speed does not significantly change the DNA within the Nycodenz ring.

Figure 33.2 Relative proportion of families as a function of soil depth. The intensity corresponds to the microarray hybridization fluorescence signal. [Bar chart; x-axis: prokaryotic families; y-axis: 0 to 25; series: 0–3 cm, 3–6 cm, 18–21 cm.]
33.6.3 Prokaryote Cells Separation
Using the indirect extraction method, a prokaryote cell ring appears inside the density gradient. Interestingly, some cells stay above the ring and can be separated from the others based on their centrifugation velocity differences. The number of these cells above the Nycodenz ring was limited: only a few nanograms of DNA could be extracted, but the relative species proportions vary strongly along the gradient (Fig. 33.4).

This physical cell separation considerably increases the number of detectable species, including those that would normally go undetected, owing to their separation from the regular prokaryotic cell ring. Moreover, centrifugation speed can be modified because it changes relative species proportions in less dense fractions without affecting the cell ring.
33.6.4 Cells Lysis Stringency Fractionation
After separating soil and cells, extractable DNA was further separated using cell lysis stringency. The process varies between protocols in terms of physical, chemical, and enzymatic lyses. Experiments were carried out directly from soil, or from the cell ring, in suspension or in agarose plugs. These approaches accessed different diversities with different species proportions represented in the extracted DNA (Fig. 33.5).
Moreover, because this step is essential in any DNA extraction protocol, it produces considerable biases in soil DNA recovery. Every DNA extraction protocol is biased to some degree; therefore, the choice of one unique approach is dependent on the researcher. An alternative would be to use several different lysis methods before starting metagenomic studies [Delmont et al., 2011].

Figure 33.3 Principal component analysis (showing the first and second components) of the matrix data containing the RISA electropherograms corresponding to vertical sampling (seven fractions of three centimeters between 0 and 21 cm), horizontal sampling (fraction 3–6 cm for four distinct areas of the Park Grass 3d plot), and cell rings formed with different centrifugation speeds (1000g, 2000g, 5000g, and 9000g). [Scatter plot; axis labels: 25.9% and 15.1%; d = 10; groups: vertical, horizontal, cell-ring.]

Figure 33.4 Relative proportion of families as a function of a density gradient. The intensity corresponds to the hybridization fluorescence signals. [Bar chart; x-axis: prokaryotic families; y-axis: 0 to 35; series: Fraction 4 2000g, Fraction 1 9000g, Cell ring 9000g, Cell ring 2000g.]
33.6.5 Molecular DNA Weight Separation
Finally, metagenomic DNA (agarose plug protocol) was separated as a function of DNA molecular weight by pulsed-field gel electrophoresis (PFGE). Phylochip analyses showed some prokaryote diversity differences when extracting two bands at different points of the DNA smear (50 and 250 kbp) (Fig. 33.6).
In particular, some families were detected only in one of these two bands. These results demonstrate that nucleic diversity separation occurred during the soil metagenomic DNA migration. The diversity differences appear to be smaller than for the three other DNA separation methods (soil, cells, and lysis stringency variables), but they suggest a bias involved in fosmid library construction, as DNA needs to have a molecular length between 25 and 40 kbp.

Figure 33.5 Relative proportion of families as a function of the different lyses. The intensity corresponds to the hybridization fluorescence signals. [Bar chart; x-axis: prokaryotic families; y-axis: 0 to 40; series: MP BIO101 0–3 cm direct lysis, MP BIO101 cell ring, Gram-positive cell ring, plug lysis 1 cell ring, plug lysis 2 cell ring, plug lysis 3 cell ring.]
33.7 METAGENOME COMPARISONS
In spite of its ambitious objectives, Metasoil is just one of a multitude of environmental sequencing projects ongoing in laboratories around the world [Dinsdale et al., 2008; Willner et al., 2009]. Thus, in addition to data production, there will be future requirements for analyzing large datasets. In the future, bioinformatics will be a critical tool to describe the distribution and function of the 10³⁰ prokaryotes on Earth. However, the specific problems, biases, and limits of these comparisons need to be well understood in order to accurately assess these data. Moreover, this understanding is crucial to choose the best normalization to compare the distribution of species and functions. A nonexhaustive list of 10 variables that might limit strong metagenomic data comparisons is suggested here:
1. The metagenome diversity can vary tremendously between environments. This diversity is known to be high in sediments and soils but limited in animals and very low in some extreme environments like acid mine biofilms.
2. The average genome size changes as a function of the environment.
3. The metagenome variability at a fine scale can be important in specific environments.
4. The metagenome variability over time can be high in some environments and low in stable ecosystems.
5. The metagenomic sequencing depth is unequal due to variable efforts in different projects.
6. The DNA extraction approaches used are often different, thus leading to the sequencing of different diversities.
7. The length of sequences varies as a function of the sequencing technology used and can influence the number of functions and species detected as well as the quality of the sequence match.
8. The phylogenetic distribution of the complete sequences used for sequence annotations is biased.
9. The databases evolve over time.
10. The percentage of annotated sequences can vary greatly among metagenomes.

Figure 33.6 Relative proportion of families as a function of the DNA molecular weight. The intensity corresponds to the hybridization fluorescence signals. [Bar chart; x-axis: prokaryotic families; y-axis: 0 to 12; series: High-molecular-weight DNA, Low-molecular-weight DNA.]

Table 33.1 Five Characteristics Corresponding to Seven Metagenomes from Oceans, Soils, and Human Feces

Metagenome (MG-RAST ID)     Average Sequence Length (bp)   Number of Sequences   Number of Base Pairs   Uploading Date             Functions Annotated (%)
Ocean1 (4441573.3)          1012.13                        317180                321026307              Tue Oct 21 15:52:12 2008   72.18
Ocean2 (4441574.3)          1007.74                        368835                371688861              Tue Oct 21 15:52:47 2008   69.71
Ocean3 (4443713.3)          238.87                         217549                51966974               Wed Jun 3 16:11:04 2009    39.32
Soil1 (4445203.3)           533.8                          609811                325515132              Mon Oct 26 11:57:26 2009   38.03
Soil2 (4445205.3)           538.93                         552268                297632387              Mon Oct 26 12:06:43 2009   32.3
Human feces1 (4440825.3)    190.39                         495865                94405318               Fri May 2 13:38:05 2008    25.64
Human feces2 (4440613.3)    335.01                         302780                101434082              Thu Feb 14 10:45:23 2008   32.71
Of course, these 10 variables do not all have the same influence on metagenomic comparisons, but it is difficult to know which are critical and which are not. Moreover, some of these variables are only due to the inconsistency of the metagenomes, and thus they are difficult to avoid. In order to determine if these variables are incompatible with metagenomic comparisons, seven metagenomes were analyzed (MG-RAST online server, http://metagenomics.nmpdr.org/) and the annotated function distribution was compared (E value < 10⁻⁵). As expected, the functional characteristics of the metagenomes are clearly different (Table 33.1).
Based on either the number of base pairs, the number of sequences, or the percentage of annotated sequences, three data normalizations can be compared based on metabolic system distributions. A PCA corresponding to the functional frequency of these metagenomes was produced as a function of these normalizations (Fig. 33.7). The normalization based on the total number of annotated sequences appears to be the most consistent method of the three in order to reduce the impact of variation related to the different sequencing projects.

Figure 33.7 PCA corresponding to the functional distribution of seven metagenomes sequenced from oceans, soil, and human feces as a function of three normalization possibilities. (A) Normalization as a function of the number of base pairs sequenced; (B) normalization as a function of the number of sequences; (C) normalization as a function of the annotated sequences. [Three scatter plots; axis labels: (A) 39.88% and 31.07%, (B) 49.94% and 25.05%, (C) 43.17% and 30.79%; d = 10; points: ocean1, ocean2, ocean3, soil1, soil2, Human1, Human2.]
This successful comparison suggests that global comparisons of different ecosystems are possible and that the different biases are insignificant when compared to the large differences in metabolic systems between these ecosystems. This might not be the case when comparing metagenomes within a single ecosystem.
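The three normalizations compared in Fig. 33.7 can be sketched as follows. This is a minimal illustration rather than the procedure actually used for the chapter: the sequence and base-pair totals are copied from Table 33.1, whereas the per-system hit counts, the system names, and the function names are hypothetical. The sketch also assumes that each annotated sequence is assigned to exactly one metabolic system, so that the hit counts sum to the number of annotated sequences.

```python
# Sequencing totals copied from Table 33.1 for two of the seven metagenomes.
TOTALS = {
    "Ocean1": {"sequences": 317180, "base_pairs": 321026307},
    "Soil1": {"sequences": 609811, "base_pairs": 325515132},
}

# Hypothetical counts of annotated sequences per metabolic system
# (illustrative values only; they are not taken from the chapter).
HITS = {
    "Ocean1": {"carbohydrates": 30000, "respiration": 12000, "virulence": 8000},
    "Soil1": {"carbohydrates": 45000, "respiration": 9000, "virulence": 21000},
}


def normalize(hits, denominator):
    """Divide every count by one denominator shared by the whole metagenome."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return {system: count / denominator for system, count in hits.items()}


def normalization_variants(hits, totals):
    """Return the three normalizations, keyed by the Fig. 33.7 panel letter."""
    annotated = sum(hits.values())  # assumed total number of annotated sequences
    return {
        "A": normalize(hits, totals["base_pairs"]),  # per base pair sequenced
        "B": normalize(hits, totals["sequences"]),   # per sequence
        "C": normalize(hits, annotated),             # per annotated sequence
    }


profiles = {name: normalization_variants(HITS[name], TOTALS[name]) for name in HITS}

for name, variants in profiles.items():
    for panel, profile in variants.items():
        print(name, panel, {k: round(v, 6) for k, v in profile.items()})
```

Under these assumptions, only normalization (C) yields profiles that sum to 1 for every metagenome, so it removes differences in both sequencing effort and annotation rate before the profiles are compared.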
33.8 CONCLUSION
To fully assess the biodiversity in one soil, deep sequencing is required. The achievable sequencing depth depends in part on the DNA extraction methods and on how well rare DNA can be extracted and sequenced. An initiative to sequence the Rothamsted soil was started with a French national project (Metasoil), which will provide 100 Gbp of pyrosequences and two million fosmid clones (40-kb inserts). By pushing back the frontiers for one soil, this work will help other soil sequencing projects assess the likelihood of novel discoveries in their own soils. If the rare biosphere exists in all ecosystems as it does in soil, then these other environments will also need to be deeply sequenced, both to discover new active molecules and to support global metagenomic comparisons.
For successful deep sequencing, DNA extraction methods need to be developed for the different environments. To date, researchers have generally optimized only yield and purity [e.g., Lemarchand et al., 2005; Krasova-Wade and Neyra, 2007] when accessing metagenomic DNA. Other parameters, such as DNA molecular weight, can be optimized for specific projects (e.g., fosmid library construction) [Bertrand et al., 2005]. Nevertheless, the results presented in this chapter underline the inability of this kind of approach to access complete soil metagenomes. The international community needs to improve strategies for optimizing diversity recovery, particularly for deep sequencing metagenome projects, as we have done for Metasoil. This DNA acquisition effort will decrease the biases involved in metagenomic approaches, increase our understanding of the 10^30 prokaryotes present on Earth, and improve our rate of discovery of genes of interest.
In addition, it is important to expand metatranscriptomic projects (see also Chapters 62–65, Vol. I) to study not only the genetic potential of an environment, but also its active genes. In this way, the importance of the different ecosystems (especially soil) for different services (nutrient cycling, climate change buffering, crop production and protection, drug discovery) can be accurately assessed.
Acknowledgments
Work described in this chapter that was conducted in the authors’ laboratory has been supported by the French National Research Agency (ANR GMGE Metasoil), and TD was financed by the Rhône-Alpes region.
REFERENCES
Adesina MF, Lembke A, Costa R, Speksnijder A, Smalla K. 2007. Screening of bacterial isolates from various European soils for in vitro antagonistic activity towards Rhizoctonia solani and Fusarium oxysporum: Site-dependent composition and diversity revealed. Soil Biol. Biochem. 39:2818–2828.
Amann RI, Ludwig W, Schleifer KH. 1995. Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol. Rev. 59:143–169.
Baker-Austin C, Wright MS, Stepanauskas R, McArthur JV. 2006. Co-selection of antibiotic and metal resistance. Trends Microbiol. 14:176–182.
Berry AE, Chiocchini C, Selby T, Sosio M, Wellington EM. 2003. Isolation of high molecular weight DNA from soil for cloning into BAC vectors. FEMS Microbiol. Lett. 223:15–20.
Bertrand H, Poly F, Van VT, Lombard N, Nalin R, Vogel TM, Simonet P. 2005. High molecular weight DNA recovery from soils prerequisite for biotechnological metagenomic library construction. J. Microbiol. Methods 62:1–11.
Boubakri H, Beuf M, Simonet P, Vogel TM. 2006. Development of metagenomic DNA shuffling for the construction of a xenobiotic gene. Gene 375:87–94.
Courtois S, Frostegård A, Goransson P, Depret G, Jeannin P, Simonet P. 2001. Quantification of bacterial subgroups in soil: Comparison of DNA extracted directly from soil or from cells previously released by density gradient centrifugation. Environ. Microbiol. 3:431–439.
Courtois S, Cappellano CM, Ball M, et al. 2003. Recombinant environmental libraries provide access to microbial diversity for drug discovery from natural products. Appl. Environ. Microbiol. 69:49–55.
Curtis TP, Sloan WT, Scannell JW. 2002. Estimating prokaryotic diversity and its limits. Proc. Natl. Acad. Sci. USA 99:10494–10499.
Davis KE, Joseph SJ, Janssen PH. 2005. Effects of growth medium, inoculum size, and incubation time on culturability and isolation of soil bacteria. Appl. Environ. Microbiol. 71:826–834.
D’Costa VM, Griffiths E, Wright GD. 2007. Expanding the soil antibiotic resistome: Exploring environmental diversity. Curr. Opin. Microbiol. 10:481–489.
Delmont TO, Robe P, Cecillon S, Clark IM, Constancias F, Simonet P, Hirsch PR, Vogel TM. 2011. Accessing the soil metagenome for studies of microbial diversity. Appl. Environ. Microbiol. 77:1315–1324.
Demaneche S, Sanguin H, Pote J, Navarro E, Bernillon D, Mavingui P, et al. 2008. Antibiotic-resistant soil bacteria in transgenic plant fields. Proc. Natl. Acad. Sci. USA 105:3957–3962.
Dib J, Motok J, Zenoff VF, Ordonez O, Farias ME. 2008. Occurrence of resistance to antibiotics, UV-B, and arsenic in bacteria isolated from extreme environments in high-altitude (above 4400 m) Andean wetlands. Curr. Microbiol. 56:510–517.
Dinsdale EA, Edwards RA, Hall D, et al. 2008. Functional metagenomic profiling of nine biomes. Nature 452:629–632.
Falkowski PG. 2001. Biogeochemical cycles. Encyclopedia Biodivers. 1:437–453.
Galvao TC, Mohn WW, de Lorenzo V. 2005. Exploring the microbial biodegradation and biotransformation gene pool. Trends Biotechnol. 23:497–506.
Gans J, Wolinsky M, Dunbar J. 2005. Computational improvements reveal great bacterial diversity and high metal toxicity in soil. Science 309:1387–1390.
Gebert J, Stralis-Pavese N, Alawi M, Bodrossy L. 2008. Analysis of methanotrophic communities in landfill biofilters using diagnostic microarray. Environ. Microbiol. 10:1175–1188.
Ginolhac A, Jarrin C, Gillet B, Robe P, et al. 2004. Phylogenetic analysis of polyketide synthase I domains from soil metagenomic libraries allows selection of promising clones. Appl. Environ. Microbiol. 70:5522–5527.
Ginolhac A, et al. 2005. Type I polyketide synthases may have evolved through horizontal gene transfer. J. Mol. Evol. 60:716–725.
Griffiths RI, Whiteley AS, O’Donnell AG, Bailey MJ. 2000. Rapid method for co-extraction of DNA and RNA from natural environments for analysis of ribosomal DNA- and rRNA-based microbial community composition. Appl. Environ. Microbiol. 66:5488–5491.
Handelsman J, Rondon MR, Brady SF, Clardy J, Goodman RM. 1998. Molecular biological access to the chemistry of unknown soil microbes: A new frontier for natural products. Chem. Biol. 5:R245–R249.
Handelsman J, Wackett LP. 2002. Ecology and industrial microbiology: Microbial diversity – sustaining the Earth and industry. Curr. Opin. Microbiol. 5:237–239.
Hery M, Nazaret S, Jaffre T, Normand P, Navarro E. 2003. Adaptation to nickel spiking of bacterial communities in neocaledonian soils. Environ. Microbiol. 5:3–12.
Huber JA, Mark Welch DB, Morrison HG, Huse SM, Neal PR, Butterfield DA, Sogin ML. 2007. Microbial population structures in the deep marine biosphere. Science 318:97–100.
Huyghe A, et al. 2008. Novel microarray design strategy to study complex bacterial communities. Appl. Environ. Microbiol. 74:1876–1885.
Knietsch A, Waschkowitz T, Bowien S, Henne A, Daniel R. 2003. Metagenomes of complex microbial consortia derived from different soils as sources for novel genes conferring formation of carbonyls from short-chain polyols on Escherichia coli. J. Mol. Microbiol. Biotechnol. 5:46–56.
Krasova-Wade T, Neyra M. 2007. Optimization of DNA isolation from legume nodules. Lett. Appl. Microbiol. 45:95–99.
Larose C, Berger S, Ferrari C, Navarro E, Dommergue A, Schneider D, Vogel TM. 2010. Microbial sequences retrieved from environmental samples from seasonal Arctic snow and meltwater from Svalbard, Norway. Extremophiles 14:205–212.
Lefevre F, Robe P, Jarrin C, Ginolhac A, Zago C, Auriol D, et al. 2008. Drugs from hidden bugs: Their discovery via untapped resources. Res. Microbiol. 159:153–161.
Lemarchand K, Berthiaume F, Maynard C, Harel J, Payment P, Bayardelle P, Masson L, Brousseau R. 2005. Optimization of microbial DNA extraction and purification from raw wastewater samples for downstream pathogen detection by microarrays. J. Microbiol. Methods 63:115–126.
Leveau JHJ. 2007. The magic and menace of metagenomics: Prospects for the study of plant growth-promoting rhizobacteria. Eur. J. Plant Pathol. 119:279–300.
Meyer F, Paarmann D, D’Souza M, Olson R, Glass EM, et al. 2008. The metagenomics RAST server – A public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9:386.
Morales SE, Cosart TF, Johnson JV, Holben WE. 2009. Extensive phylogenetic analysis of a soil bacterial community illustrates extreme taxon evenness and the effects of amplicon length, degree of coverage, and DNA fractionation on classification and ecological parameters. Appl. Environ. Microbiol. 75:668–675.
Ogram A, Sayler GS, Barkay T. 1987. The extraction and purification of microbial DNA from sediments. J. Microbiol. Methods 7:57–66.
Ortíz-Castro R, Contreras-Cornejo HA, Macías-Rodríguez L, López-Bucio J. 2009. The role of microbial signals in plant growth and development. Plant Signal. Behav. 4:701–712.
Pasic L, Kovce B, Sket B, Herzog-Velikonja B. 2010. Diversity of microbial communities colonizing the walls of a Karstic cave in Slovenia. FEMS Microbiol. Ecol. 71:50–60.
Pointing SB, Chan Y, Lacap DC, Lau MC, Jurgens JA, Farrell RL. 2009. Highly specialized microbial diversity in hyper-arid polar desert. Proc. Natl. Acad. Sci. USA 106:19964–19969.
Rajendhran J, Gunasekaran P. 2008. Strategies for accessing soil metagenome for desired applications. Biotech. Adv. 26:576–590.
Ranjard L, Richaume AS. 2001. Quantitative and qualitative microscale distribution of bacteria in soil. Res. Microbiol. 152:707–716.
Richard JY, Vogel TM. 1999. Characterization of a soil bacterial consortium capable of degrading diesel fuel. Int. Biodeter. Biodegrad. 44:93–100.
Robe P, Nalin R, Capellano C, Vogel TM, Simonet P. 2003. Extraction of DNA from soil. Eur. J. Soil Biol. 39:183–190.
Roesch LL, Fulthorpe RR, Riva A, Casella G, Hadwin AKM, Kent AD, et al. 2007. Pyrosequencing enumerates and contrasts soil microbial diversity. ISME J. 1:283–290.
Sanguin H, et al. 2006. Development and validation of a prototype 16S rRNA-based taxonomic microarray for Alphaproteobacteria. Environ. Microbiol. 8:289–307.
Shendure J, Ji H. 2008. Next-generation DNA sequencing. Nat. Biotech. 26:1135–1145.
Silvertown J, et al. 2006. The Park Grass Experiment 1856–2006: Its contribution to ecology. J. Ecol. 94:801–814.
Singh S, Kang SH, Mulchandani A, Chen W. 2008. Bioremediation: Environmental clean-up through pathway engineering. Curr. Opin. Biotechnol. 19:437–444.
Staley JT, Konopka A. 1985. Measurement of in situ activities of nonphotosynthetic microorganisms in aquatic and terrestrial habitats. Annu. Rev. Microbiol. 39:321–346.
Torsvik V, Goksoyr J, Daae FL. 1990. High diversity in DNA of soil bacteria. Appl. Environ. Microbiol. 56:782–787.
Torsvik V, Ovreas L, Thingstad TF. 2002. Prokaryotic diversity – Magnitude, dynamics, and controlling factors. Science 296:1064–1066.
Tringe SG, Mering CV, Kobayashi A, Salamov AA, Chen K, Chang HW, et al. 2005. Comparative metagenomics of microbial communities. Science 308:554–557.
Turnbaugh PJ, et al. 2006. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature 444:1027–1031.
van Elsas JD, Mantynen V, Wolters AC. 1997. Soil DNA extraction and assessment of the fate of Mycobacterium chlorophenolicum strain PC-1 in different soils by 16S ribosomal gene sequence based most probable number PCR and immunofluorescence. Biol. Fertil. Soils 24:188–195.
van Elsas JD, Jansson JK, Trevors JT. 2006. Modern Soil Microbiology II. Boca Raton, FL: CRC Press.
Vogel TM, Simonet P, Jansson JK, Hirsch PR, Tiedje JM, van Elsas JD, et al. 2009. TerraGenome: A consortium for the sequencing of a soil metagenome. Nat. Rev. Microbiol. 7:2.
Wainwright M, Wickramasinghe NC, Narlikar JV, Rajaratnam P. 2003. Microorganisms cultured from stratospheric air samples obtained at 41 km. FEMS Microbiol. Lett. 218:161–165.
Ward BB, et al. 2007. Ammonia-oxidizing bacterial community composition in estuarine and oceanic environments assessed using a functional gene microarray. Environ. Microbiol. 9:2522–2538.
Whitman WB, Coleman DC, Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proc. Natl. Acad. Sci. USA 95:6578–6583.
Willner D, Thurber RV, Rohwer F. 2009. Metagenomic signatures of 86 microbial and viral metagenomes. Environ. Microbiol. 11:1752–1766.