Agricultural and Biosystems Engineering
RefSoil – Continuous Progress
in Soil Reference Database
Jinlyung Choi, Erick Cardenas,
Ryan Williams, Adina Howe
October 21, 2015 11:40 am
7th Annual Argonne Soil Metagenomics
Meeting
Soil metagenomic challenges
• The amount we know…
• Incredible microbial diversity
• Spatial heterogeneity
• Complex dynamics
• Lack of reference genomes (bacteria,
archaea, fungi)
Do we have a useful dictionary for soil?
Significance of a soil-specific reference
• Need standardized resource to
connect sequencing data at different
levels
• Integrate sequencing data towards
soil health and productivity
• Broadly enable “connecting the
dots”
Genes
Organisms
Communities
Ecosystems
HMP: A good example of curated DB
Rhizosphere / Bulk Soil
Lessons from HMP
• Take advantage of high-throughput technologies to
characterize the human microbiome across a large number of
samples
• Determine whether there are associations between changes
in the microbiome and health or disease
• Provide a standardized data resource and new
technological approaches to enable such studies to
be undertaken broadly in the scientific community
• Identify elusive organisms and build a most
wanted list
Some initial efforts
• Bacterial genomes retrieved from the GOLD database and JGI-IMG,
keeping those associated with soil habitats
• Manually curated to exclude obligate human pathogens and
extremophiles
• Databases are biased and redundant
RefSoil (2011)
Erick Cardenas, Aaron Garoutte, Adina Howe, Jim Tiedje
492 organisms
19 phyla
Organisms per phylum:
Proteobacteria, 267
Firmicutes, 92
Actinobacteria, 75
Bacteroidetes, 12
Cyanobacteria, 7
Tenericutes, 5
Acidobacteria, 5
Other, 29
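The phylum counts listed for RefSoil (2011) can be turned into shares of the database, which makes the bias easier to see: the three largest phyla account for roughly 88% of the 492 organisms. The short sketch below only restates the numbers from the slide; the variable names are illustrative.

```python
# Phylum counts as listed for RefSoil (2011) on the slide.
phylum_counts = {
    "Proteobacteria": 267,
    "Firmicutes": 92,
    "Actinobacteria": 75,
    "Bacteroidetes": 12,
    "Cyanobacteria": 7,
    "Tenericutes": 5,
    "Acidobacteria": 5,
    "Other": 29,
}

# The counts add up to the 492 organisms reported for RefSoil (2011).
total = sum(phylum_counts.values())

# Percentage of the database contributed by each phylum.
shares = {phylum: 100.0 * n / total for phylum, n in phylum_counts.items()}

# Combined share of the three largest phyla.
top_three = sum(sorted(shares.values(), reverse=True)[:3])

for phylum, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{phylum:<16}{pct:5.1f}%")
print(f"Top three phyla: {top_three:.1f}%")
```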
What is included and not included
Included
• Full genome sequence available
• Listed in RefSeq
• Associated with soil habitats
Not included
• Human pathogens
• Extremophiles
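The criteria above can be sketched as a simple filter. This is a minimal illustration, not the curation pipeline used for RefSoil (which the slides describe as manual curation); the `GenomeRecord` fields and the example records are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomeRecord:
    # Hypothetical metadata fields, invented for this sketch.
    name: str
    complete_genome: bool
    in_refseq: bool
    soil_associated: bool
    human_pathogen: bool
    extremophile: bool


def passes_criteria(rec: GenomeRecord) -> bool:
    """Apply the inclusion and exclusion rules listed on the slide."""
    included = rec.complete_genome and rec.in_refseq and rec.soil_associated
    excluded = rec.human_pathogen or rec.extremophile
    return included and not excluded


# Made-up records, for illustration only.
candidates = [
    GenomeRecord("soil isolate A", True, True, True, False, False),
    GenomeRecord("draft genome B", False, True, True, False, False),
    GenomeRecord("hot-spring isolate C", True, True, True, False, True),
]

kept = [rec.name for rec in candidates if passes_criteria(rec)]
print(kept)
```

Keeping the inclusion and exclusion checks as two separate expressions mirrors the two columns of the slide, so a new rule can be added to either side without touching the other.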
Updates
• Increase DB size:
  • Organisms: 492 → 1043
  • Phyla: 19 → 21
• Evaluate the breadth and diversity of this database and its
ability to inform soil microbiology
Proteobacteria, 428
Firmicutes, 177
Actinobacteria, 103
Cyanobacteria, 25
Bacteroidetes, 19
Euryarchaeota, 18
Spirochaetes, 8
Other, 25
What could we use it for?
• Soil-specific database to identify “who” and “what”,
linked to organisms where strains are available for
future efforts
• Making data analysis easier for soil scientists
• Identify targets for culturing efforts
RefSoil can be used to identify functional
information
Antibiotic resistance genes and
integrases found in RefSoil
Williams, Choi et al., in preparation
RefSoil can be used to identify functional
information
KEGG metabolic pathway mapping
Glycan biosynthesis
Metabolism of terpenoids and polyketides
RefSoil can be used to identify
differences in soil
[Figure: PCA of samples. PC1: 40% variance, PC2: 13% variance. Groups: CC (Corn), P (Prairie), PF (Fertilized Prairie).]
[Figure: normalized count by group (CC, P, PF) for four reference genomes:
gi|225847840|ref|NC_012438.1| Sulfurihydrogenibium azorense
gi|260752245|ref|NC_013355.1| Zymomonas mobilis subsp.
gi|407936729|ref|NC_018708.1| Acidovorax sp.
gi|347755961|ref|NC_016025.1| Candidatus Chloracidobacterium thermophilum B]
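The panels compare normalized counts per reference genome across the three groups. As a rough illustration of what such a comparison involves, the sketch below scales raw read counts by library size (counts per million) and averages them within each group. The normalization actually used for the figure is not stated on the slide, so this is an assumed stand-in, and all sample names, genome IDs, and counts here are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw read counts per sample: {genome_id: reads mapped}.
raw_counts = {
    "CC_1": {"genome_A": 120, "genome_B": 880},
    "CC_2": {"genome_A": 100, "genome_B": 900},
    "P_1": {"genome_A": 400, "genome_B": 1600},
    "PF_1": {"genome_A": 300, "genome_B": 700},
}
# Group label for each sample (CC, P, PF as in the figure legend).
groups = {"CC_1": "CC", "CC_2": "CC", "P_1": "P", "PF_1": "PF"}


def counts_per_million(sample):
    """Scale one sample's counts so that they sum to one million."""
    library_size = sum(sample.values())
    return {genome: 1e6 * n / library_size for genome, n in sample.items()}


def group_means(genome_id):
    """Mean normalized count of one genome within each group."""
    by_group = defaultdict(list)
    for sample_id, sample in raw_counts.items():
        cpm = counts_per_million(sample)
        by_group[groups[sample_id]].append(cpm[genome_id])
    return {grp: mean(vals) for grp, vals in by_group.items()}


means_a = group_means("genome_A")
print(means_a)
```

Normalizing before comparing matters because samples differ in sequencing depth; without it, `P_1` would look enriched for `genome_A` simply because it has twice as many reads.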
Defined soil-specific resources
make your analysis easier
Earth Microbiome Project (EMP)
16S rRNA amplicon project
14k samples
3k soil samples
Soil-specific tree in EMP
Soil-specific EMP
1.8 million nodes
What are we missing?
– 1) by placement in tree
Some genera missing
Phylum missing
Most genera missing
Phyla missing:
Parvarchaeota, AC1, AD3, AncK6, BHI80-139, BRC1,
Caldiserica, Caldithrix, CD12, Chlamydiae, Deferribacteres,
Elusimicrobia, FBP, FCPU426, Fibrobacteres, Fusobacteria,
GAL15, GN01, GN02, GN04, GOUTA4, H-178, Hyd24-12,
Kazan-3B-28, KSB3, LCP-89, LD1, Lentisphaerae, MVP-21,
MVS-104, NC10, NKB19, OC31, OD1, OP11, OP1, OP3,
OP8, OP9, PAUC34f, Poribacteria, SAR406, SBR1093, SC4,
SR1, TA06, TM6, TM7, TPD-58, VHS-B3-43, WPS-2, WS1,
WS2, WS3, WS4, WS5, WS6, WWE1, ZB3
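One simple way to look for gaps like these is to compare the taxa observed in soil samples against the taxa present in the reference. The sketch below does this with taxonomy labels rather than by placement in a phylogenetic tree, so it is a simplification of the approach named on the slide. The taxa are placeholders, and the "more than half" cutoff for "most genera missing" is an assumption made for this example.

```python
# Hypothetical taxonomy tables: phylum -> set of genera.
# The names are placeholders, not real RefSoil or EMP content.
observed_taxa = {
    "Phylum_A": {"Genus_A1", "Genus_A2", "Genus_A3"},
    "Phylum_B": {"Genus_B1", "Genus_B2", "Genus_B3"},
    "Phylum_C": {"Genus_C1"},
    "Phylum_D": {"Genus_D1"},
}
reference_taxa = {
    "Phylum_A": {"Genus_A1", "Genus_A2"},
    "Phylum_B": {"Genus_B1"},
    "Phylum_D": {"Genus_D1"},
}


def classify_gaps(observed, reference):
    """Label each observed phylum by how well the reference covers it."""
    labels = {}
    for phylum, genera in observed.items():
        if phylum not in reference:
            labels[phylum] = "phylum missing"
            continue
        missing = genera - reference[phylum]
        if not missing:
            labels[phylum] = "covered"
        elif len(missing) > len(genera) / 2:
            # Assumed cutoff: more than half of the observed genera absent.
            labels[phylum] = "most genera missing"
        else:
            labels[phylum] = "some genera missing"
    return labels


gaps = classify_gaps(observed_taxa, reference_taxa)
for phylum, label in gaps.items():
    print(f"{phylum}: {label}")
```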
What are we missing? – 2) by abundance
Highly abundant in soil but not listed in the reference
OTU ID         Counts    Taxonomy assignment
4457032        8007453   Verrucomicrobia.Spartobacteria.Chthoniobacterales
4471583        2937242   Verrucomicrobia.Spartobacteria.Chthoniobacterales
Novel.OTU.5    2025859   Proteobacteria.Gammaproteobacteria.Xanthomonadales
101868         1828706   Verrucomicrobia.Spartobacteria.Chthoniobacterales
Novel.OTU.22   1214770   Cyanobacteria.Chloroplast.Streptophyta
204154         994864    Acidobacteria.Chloracidobacteria.RB41
284946         903641    Bacteroidetes.Bacteroidia.Bacteroidales
807954         875988    Bacteroidetes.Saprospirae.Saprospirales
86097          868055    Acidobacteria.Chloracidobacteria.RB41
4423681        689209    Gemmatimonadetes.Gemmatimonadetes
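Summing the counts in this table by phylum shows where the largest gaps sit: the three Chthoniobacterales OTUs alone account for about 12.8 million counts. The sketch below uses only the rows shown on the slide; taking the first dot-separated field as the phylum is an assumption about how the taxonomy strings are structured.

```python
from collections import Counter

# Rows copied from the table: (OTU ID, count, taxonomy assignment).
rows = [
    ("4457032", 8007453, "Verrucomicrobia.Spartobacteria.Chthoniobacterales"),
    ("4471583", 2937242, "Verrucomicrobia.Spartobacteria.Chthoniobacterales"),
    ("Novel.OTU.5", 2025859, "Proteobacteria.Gammaproteobacteria.Xanthomonadales"),
    ("101868", 1828706, "Verrucomicrobia.Spartobacteria.Chthoniobacterales"),
    ("Novel.OTU.22", 1214770, "Cyanobacteria.Chloroplast.Streptophyta"),
    ("204154", 994864, "Acidobacteria.Chloracidobacteria.RB41"),
    ("284946", 903641, "Bacteroidetes.Bacteroidia.Bacteroidales"),
    ("807954", 875988, "Bacteroidetes.Saprospirae.Saprospirales"),
    ("86097", 868055, "Acidobacteria.Chloracidobacteria.RB41"),
    ("4423681", 689209, "Gemmatimonadetes.Gemmatimonadetes"),
]

# Total counts per phylum (first field of the taxonomy string).
phylum_totals = Counter()
for _otu_id, count, taxonomy in rows:
    phylum = taxonomy.split(".")[0]
    phylum_totals[phylum] += count

# Phyla ordered from most to least abundant among these OTUs.
ranked = phylum_totals.most_common()
for phylum, total in ranked:
    print(f"{phylum:<18}{total:>12,}")
```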
Summary
Acknowledgements