The BARCODE Data Standard as a Cross-Cultural Bridge David E. Schindel, Executive Secretary National Museum of Natural History Smithsonian Institution [email protected] ; http://www.barcoding.si.edu 202/633-0812; fax 202/633-2938

Schindel i evobio norman ok - jun 11


DESCRIPTION

DNA Barcode Data Standards presentation at the iEvoBio (Informatics for Evolutionary Biology) meeting in Norman, OK, 22 June 2011


Page 1: Schindel   i evobio norman ok - jun 11

The BARCODE Data Standard as a

Cross-Cultural Bridge

David E. Schindel, Executive Secretary
National Museum of Natural History
Smithsonian Institution
[email protected]; http://www.barcoding.si.edu
202/633-0812; fax 202/633-2938

Page 2: Schindel   i evobio norman ok - jun 11

Gaining Large Scale Through Standards

Are our data meant only for small segregated communities of practice or bigger audiences?

Accelerate progress, economies of scale:
– Re-use and new use of data, synthesis, comparative analysis
– Shared hardware and software
– Standardized protocols, easier training and technical assistance
– Applications by non-specialists (regulatory agencies, citizen scientists, K-12 classroom)

Page 3: Schindel   i evobio norman ok - jun 11

www.e-biosphere09.org

Page 4: Schindel   i evobio norman ok - jun 11

Species Identification Matters

Basic research:
– One more character set, but digital and calibrated
– Standardized yardstick for measuring variability and divergence
– Objective comparison across taxa, distance
– Links to Linnean names
– Triage by non-specialists for species discovery
– Ecology of juveniles, gut contents, fecal matter
– Shallow phylogenies showing history of community assemblages
– Subject to weaknesses of any single character (convergence, pseudogenes, introgression, etc.)

Page 5: Schindel   i evobio norman ok - jun 11

Species Identification Matters

Applied research/regulation by non-specialists:
– Agricultural pests/beneficial species
– Endangered/protected species
– Disease vectors/pathogens
– Environmental quality indicators
– Invasive species (e.g., in ballast water)
– Managing for sustainable harvesting
– Consumer protection, ensuring food quality
– Fidelity of seedbanks, culture collections

Page 6: Schindel   i evobio norman ok - jun 11


Page 7: Schindel   i evobio norman ok - jun 11
Page 8: Schindel   i evobio norman ok - jun 11

An Internal ID System for All Animals

[Figure: a typical animal cell, its mitochondrion, and the mitochondrial DNA (mtDNA). Diagram title: The Mitochondrial Genome. Labeled regions: D-Loop, H-strand, L-strand, ND1, ND2, ND3, ND4, ND4L, ND5, ND6, COI, COII, COIII, Cytochrome b, ATPase subunit 6, ATPase subunit 8, small ribosomal RNA.]

Page 9: Schindel   i evobio norman ok - jun 11

Non-COI regions for other taxa

Land plants:
– Chloroplast matK and rbcL approved Nov 09
– 70-75% resolving ability, higher in angiosperms
– Non-coding plastid and nuclear regions being explored

Fungi:
– CBOL Working Group met this week in Amsterdam
– Agreed to recommend ITS; 72% effective

Protists:
– CBOL Working Group July meeting, Berlin

Page 10: Schindel   i evobio norman ok - jun 11

How Barcoding Works

PHASE 1: Build a barcode reference library:
– Well-identified specimen
– Tissue subsample
– DNA extraction, PCR amplification
– DNA sequencing
– Data submission to GenBank

PHASE 2: Identify unknowns:
– Any unidentified juvenile, adult, fragment, product
– Tissue sample, DNA, sequencing
– Comparison with sequences in reference library
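The comparison step in Phase 2 can be illustrated with a minimal sketch. It assumes the query and reference sequences are already aligned to the same length, uses a simple uncorrected p-distance (proportion of differing sites), and reports a match only below a cutoff. The function names, the toy sequences, and the 0.02 default cutoff are all hypothetical choices for illustration; they are not taken from the slides and do not describe how GenBank or BOLD perform identification.

```python
from typing import Dict, Optional, Tuple

# Toy, made-up reference library: species label -> aligned sequence.
# These are NOT real barcode sequences.
REFERENCE_LIBRARY: Dict[str, str] = {
    "Species A": "ACGTACGTACGTACGTACGT",
    "Species B": "ACGTTCGAACGTTCGAACGT",
}


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has something other than A, C, G or T
    (gaps, ambiguity codes) are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def identify(
    query: str,
    library: Dict[str, str],
    max_distance: float = 0.02,  # hypothetical cutoff, not from the slides
) -> Tuple[Optional[str], float]:
    """Return (closest species, distance); species is None above the cutoff."""
    best_name: Optional[str] = None
    best_distance = float("inf")
    for name, reference in library.items():
        distance = p_distance(query, reference)
        if distance < best_distance:
            best_name, best_distance = name, distance
    if best_distance <= max_distance:
        return best_name, best_distance
    return None, best_distance
```

Calling `identify(sequence, REFERENCE_LIBRARY)` returns the nearest reference label only when the distance falls under the cutoff, which mirrors the idea that an unknown is identified only if the reference library contains a sufficiently close record.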

Page 11: Schindel   i evobio norman ok - jun 11

Barcode of Life Community

1,264,000 specimens already barcoded from 104,500 species

Networks, Projects, Organizations

• Promote barcoding as a global standard
• Build participation
• Working Groups
• BARCODE standard
• International Conferences
• Increase production of public BARCODE records

Page 12: Schindel   i evobio norman ok - jun 11

Barcode of Life Data Systems (BOLD)

University of Guelph

Workbench with 1.27M records, 105K species/OTUs

Page 13: Schindel   i evobio norman ok - jun 11

BARCODE Record Flow Chart

[Figure: flow chart of BARCODE records. Recoverable labels: USER, /GenBank. Key: Mirroring, Update Channel, Private Records.]

Page 14: Schindel   i evobio norman ok - jun 11

BARCODE Records in GenBank

Page 15: Schindel   i evobio norman ok - jun 11

Submission of BARCODE Records to EBI and DDBJ

Page 16: Schindel   i evobio norman ok - jun 11

iBOL Barcodes By Node

[Figure: bar chart. Vertical axis: Barcodes, 0 to 90,000. Nodes on the horizontal axis: Canada, United States, Europe, China, Australia, Mexico, South Africa, Brazil, New Zealand, Norway, Argentina, India, Costa Rica, Madagascar, Panama, Peru, Pakistan, Russia, Kenya, South Korea, Colombia, Saudi Arabia.]

Page 17: Schindel   i evobio norman ok - jun 11

BARCODE Records in INSDC

Barcode Sequence
Voucher Specimen
Species Name
Specimen Metadata
Literature (link to content or citation)

Indices - Catalogue of Life - GBIF/ECAT
Nomenclators - Zoo Record - IPNI - NameBank
Publication links - New species
Databases - Provisional sp.

Georeference
Habitat
Character sets
Images
Behavior
Other genes
Trace files
Primers

Other Databases
Phylogenetic
Pop’n Genetics
Ecological

Page 18: Schindel   i evobio norman ok - jun 11

Columns: Traditional Taxonomy; GSC Minimum Standards (MI*); Traditional GenBank

Voucher specimen ID XXX XXX
Species ID XXX X X
Identified by XXX
DNA sequence XXX XXX
Gene region XXX
Geographic origin (country, ocean) XXX X
Latitude/Longitude XXX XXX
Collection date, collector name XXX XXX
Trace files XXX XX
Primer information X XX

Page 19: Schindel   i evobio norman ok - jun 11
Page 20: Schindel   i evobio norman ok - jun 11
Page 21: Schindel   i evobio norman ok - jun 11

Linkout from GenBank to BOLD

Page 22: Schindel   i evobio norman ok - jun 11
Page 23: Schindel   i evobio norman ok - jun 11

ISBER: 13 May 2009

Linkout from GenBank to Taxonomy

Page 24: Schindel   i evobio norman ok - jun 11
Page 25: Schindel   i evobio norman ok - jun 11


Link from GenBank to Museums

Page 26: Schindel   i evobio norman ok - jun 11

Darwin Core Triplet

Structured Link to Vouchers

Institutional Acronym : Collection Code : Catalog ID

Page 27: Schindel   i evobio norman ok - jun 11

Structured Link to Vouchers

NHM:LEP:123456

personal:DHJanzen:SRNP12345
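A structured voucher link of this kind is easy to handle in software. The sketch below splits a colon-separated triplet into institutional acronym, collection code, and catalog ID, using the two example vouchers from the slide. The class and function names are hypothetical, and the strict three-part rule is an assumption made for the example; real databases may accept other forms.

```python
from typing import NamedTuple


class VoucherTriplet(NamedTuple):
    """The three parts of a Darwin Core style voucher reference."""
    institution: str   # institutional acronym, e.g. "NHM"
    collection: str    # collection code, e.g. "LEP"
    catalog_id: str    # catalog ID, e.g. "123456"


def parse_triplet(text: str) -> VoucherTriplet:
    """Split 'ACRONYM:COLLECTION:CATALOG' into its parts.

    Assumption for this sketch: exactly three non-empty, colon-separated
    fields; whitespace around each field is ignored.
    """
    parts = [part.strip() for part in text.split(":")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a three-part voucher triplet: {text!r}")
    return VoucherTriplet(*parts)


museum_voucher = parse_triplet("NHM:LEP:123456")
personal_voucher = parse_triplet("personal:DHJanzen:SRNP12345")
```

Keeping the three parts separate is what allows a sequence record to link back to a specific specimen in a specific collection, rather than to a free-text voucher note.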

Page 28: Schindel   i evobio norman ok - jun 11

NCBI’s Biorepository List

Compiled from Index Herbariorum, literature sources, GenBank submissions

6,936 records

1,177 records with non-unique acronyms

517 homonymous acronyms

374 shared by two records

143 shared by three records
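The counts on this slide can be cross-checked with a little arithmetic, assuming the 1,177 records with non-unique acronyms are exactly the records behind the 517 homonymous acronyms. The variable names below are only for illustration.

```python
# Figures as given on the slide.
total_records = 6936
shared_by_two = 374     # acronyms used by two records each
shared_by_three = 143   # acronyms used by three records each

# Number of distinct acronyms that are ambiguous.
homonymous_acronyms = shared_by_two + shared_by_three

# Number of records that carry an ambiguous acronym.
records_with_non_unique = 2 * shared_by_two + 3 * shared_by_three

# Share of the whole list affected by ambiguity.
affected_share = records_with_non_unique / total_records
```

Under that assumption the totals agree with the slide (517 acronyms, 1,177 records), and roughly one record in six carries an acronym that does not identify a single institution.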

Page 29: Schindel   i evobio norman ok - jun 11

AMNH Icelandic Institute of Natural History, Akureyri Division Akureyri Iceland

AMNH American Museum of Natural History New York USA

UNL Universidad Autónoma de Nuevo León Monterrey, Nuevo León Mexico

UNL University of Nebraska State Museum Lincoln, Nebraska USA

UNL Centro de Estratigrafia e Paleobiologia da Universidade Nova de Lisboa Monte de Caparica Portugal

ZMK Zoological Museum, Kristiania Oslo Norway

ZMK Zoologisches Museum der Universität Kiel Kiel Germany

ZMK Zoological Museum, Copenhagen Copenhagen Denmark
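Acronym collisions like these can be found mechanically by grouping registry rows by acronym. The sketch below does this for the eight rows listed on the slide; the tuple layout (acronym, institution, country) and the function name are assumptions made for the example, not the structure of the NCBI list.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# The eight rows from the slide, as (acronym, institution, country).
RECORDS: List[Tuple[str, str, str]] = [
    ("AMNH", "Icelandic Institute of Natural History, Akureyri Division", "Iceland"),
    ("AMNH", "American Museum of Natural History", "USA"),
    ("UNL", "Universidad Autónoma de Nuevo León", "Mexico"),
    ("UNL", "University of Nebraska State Museum", "USA"),
    ("UNL", "Centro de Estratigrafia e Paleobiologia da Universidade Nova de Lisboa", "Portugal"),
    ("ZMK", "Zoological Museum, Kristiania", "Norway"),
    ("ZMK", "Zoologisches Museum der Universität Kiel", "Germany"),
    ("ZMK", "Zoological Museum, Copenhagen", "Denmark"),
]


def find_homonyms(
    records: Iterable[Tuple[str, str, str]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Map each acronym used by more than one institution to its holders."""
    by_acronym: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for acronym, institution, country in records:
        by_acronym[acronym].append((institution, country))
    # Keep only the acronyms that are ambiguous.
    return {a: holders for a, holders in by_acronym.items() if len(holders) > 1}


homonyms = find_homonyms(RECORDS)
```

Every acronym returned here would make the first field of a voucher triplet ambiguous, which is the motivation for a shared registry of biorepositories.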

Page 30: Schindel   i evobio norman ok - jun 11

CBOL/GBIF/NCBI Registry of Biorepositories

www.biorepositories.org

Page 31: Schindel   i evobio norman ok - jun 11

Two Taxonomic Research Processes

[Figure: diagram. Recoverable labels:]

Collecting events, specimens
Specimen clustering
Taxon concept formation, refinement
Comparisons, concept validation
Formal naming

Accessibility:
BARCODE data release with provisional nomenclature (PLoS)
Specimen data release (GBIF)
Collaborative consensus-building of taxon concepts (CATE)
Sharing of non-BARCODE data (ScratchPads)

Page 32: Schindel   i evobio norman ok - jun 11

Long-term data curation of BARCODE records

[Figure: flow chart. Recoverable labels:]

Data records assembled in BOLD
IDs consistent with other records?
Compliant with BARCODE standards?
Data records released on INSDC
Data records published in BOLD
Community feedback
Update records (audit trail of species names retained)
CBOL control of BARCODE flag
GenBank adds BARCODE flag
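The idea of updating a record while retaining an audit trail of species names can be sketched as follows. This is a minimal illustration, not the BOLD or GenBank data model: the class, its fields, and the placeholder names ("Genus sp. A", "Genus alpha") are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BarcodeRecord:
    """A record whose species name can change without losing its history."""
    record_id: str
    species_name: str
    # Earlier names with the reason each was replaced, oldest first.
    name_history: List[Tuple[str, str]] = field(default_factory=list)

    def update_species_name(self, new_name: str, reason: str) -> None:
        """Replace the current name and keep the old one in the audit trail."""
        if new_name == self.species_name:
            return  # nothing changed, so nothing to log
        self.name_history.append((self.species_name, reason))
        self.species_name = new_name


record = BarcodeRecord(record_id="EXAMPLE-001", species_name="Genus sp. A")
record.update_species_name("Genus alpha", reason="community feedback")
```

After the update the record shows the revised name, while `name_history` still lists the provisional one, so earlier identifications remain traceable.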