DNA Barcoding Johannes Bergsten Swedish Museum of Natural History Department of Entomology E-mail: [email protected] Biodiversity Informatics Course, 14-24 September, 2009 Swedish Museum of Natural History, Image credit: Barcoding institute of ontario


Johannes Bergsten lecture on Thursday, Sept 17, 2009, for the Biodiversity Informatics Course, a Swedish Taxonomy Initiative (Svenska Artprojektet) course at the Swedish Natural History Museum, Stockholm, supported by the Swedish Species Service (ArtDatabanken) and the Swedish GBIF node.


Page 1: Johannes Bergsten Dna Barcoding

DNA Barcoding

Johannes Bergsten
Swedish Museum of Natural History
Department of Entomology
E-mail: [email protected]

Biodiversity Informatics Course, 14-24 September, 2009
Swedish Museum of Natural History, Stockholm, Sweden

Image credit: Barcoding institute of ontario

Page 2: Johannes Bergsten Dna Barcoding

How it all started in 2003

Proposed a CO1-based (~650 bp of the 5’ end) global identification system for animals, and showed the success (96.4-100%) of assigning test specimens to the correct phylum, order and species (Lepidoptera from Guelph) through a CO1 profile.

98% of congeneric species in 11 animal phyla showed >2% sequence divergence in CO1
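
The >2% divergence figure can be made concrete with a simple distance between two aligned sequences. The following is a minimal sketch, assuming an uncorrected pairwise distance (p-distance), which is not necessarily the measure used in the original study; the function name and the toy sequences are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected pairwise distance between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy 100 bp sequences (hypothetical, not real CO1 data).
seq_1 = "ACGT" * 25
mutated = list(seq_1)
mutated[10] = "A"  # G -> A
mutated[50] = "T"  # G -> T
mutated[90] = "C"  # G -> C
seq_2 = "".join(mutated)

divergence = p_distance(seq_1, seq_2)   # 3 differences over 100 sites
exceeds_threshold = divergence > 0.02   # the 2% level quoted on the slide
```

With three substitutions in 100 compared sites, the toy pair sits above the 2% level.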

Page 3: Johannes Bergsten Dna Barcoding

What is DNA Barcoding?

• A way of identifying samples to species based on a short standardised gene-region

• Keywords:

• Identify

• Samples

• Species

• Gene

• Short

• Standardised

Page 4: Johannes Bergsten Dna Barcoding

2 main uses of DNA Barcoding

• identify specimens – a global identification system

• discover new species – aid and speed up the discovery of the remaining biodiversity

Page 5: Johannes Bergsten Dna Barcoding

Credit for slide: Paul Hebert

Page 6: Johannes Bergsten Dna Barcoding

Why DNA Barcoding? - the applications

• Identification of all life stages: eggs, larvae, nymphs, pupae, adults

• Identification of fragments or products of organisms

• Identification of stomach contents, trace ecological food-chains

• Identification of cryptic look-alike species

• Food control

• Customs control

• Invasive species control

• Disease vector control

• Police

• Agriculture

• Forestry

• Conservation

• Education

• Etc

Page 7: Johannes Bergsten Dna Barcoding

Examples

Credit for slide: David E. Schindel

What is the fillet served on your plate, on a market or in a package?

What are the eggs or molt in the ballast water of ships? Are they non-native invasive species?

Page 8: Johannes Bergsten Dna Barcoding

Further examples

Illegally traded bushmeat, shark fins, skins

Do the products come from protected or banned-for-trade species?

Page 9: Johannes Bergsten Dna Barcoding

Why DNA Barcoding? The biodiversity-taxonomy crisis

• The Biodiversity crisis

• We have yet to discover and describe maybe 90% of the biodiversity

• Humans are responsible for a mass extinction that is going fast!

• Traditional taxonomy is too slow!

• Taxonomic expertise is vanishing and training new taxonomists is too expensive

• Democratizing taxonomic knowledge

Page 10: Johannes Bergsten Dna Barcoding

The crisis-illustrated

This is where we stand today!

Credit: David E. Schindel

Page 11: Johannes Bergsten Dna Barcoding

Sequencing is getting cheap

Credit for slide: David E. Schindel

Page 12: Johannes Bergsten Dna Barcoding

The Vision

Credit: iBOL

Image credit: Barcoding institute of ontario

Page 13: Johannes Bergsten Dna Barcoding

Criticism

“- Mum is this a grizzly bear or a black bear?”

“- Well Johnnie why don’t you go poke your barcoder into it and find out.”

(Cameron et al., Syst. Biol., 2006)

Page 14: Johannes Bergsten Dna Barcoding

The Barcoding Movement

• CBOL: a consortium of 200 member institutions/organizations from 50 countries that promotes and standardizes DNA Barcoding

• iBOL: an alliance of 16 nations trying to get the big bucks to do the job.

Page 15: Johannes Bergsten Dna Barcoding

The chosen gene for Metazoans

• Cytochrome Oxidase subunit I

• Mitochondrial

• Easy to amplify

• Relatively fast evolving

Credit: iBOL

Page 16: Johannes Bergsten Dna Barcoding

The chosen genes for plants

Plastid genes rbcL and matK form a 2-locus plant barcode

Page 17: Johannes Bergsten Dna Barcoding

What are you waiting for?

Credit: iBOL

Page 18: Johannes Bergsten Dna Barcoding

BOLD - project management

Page 19: Johannes Bergsten Dna Barcoding

Projects

Page 20: Johannes Bergsten Dna Barcoding
Page 21: Johannes Bergsten Dna Barcoding
Page 22: Johannes Bergsten Dna Barcoding
Page 23: Johannes Bergsten Dna Barcoding
Page 24: Johannes Bergsten Dna Barcoding
Page 25: Johannes Bergsten Dna Barcoding
Page 26: Johannes Bergsten Dna Barcoding
Page 27: Johannes Bergsten Dna Barcoding

BOLD – identification engine

Page 28: Johannes Bergsten Dna Barcoding

No match

Page 29: Johannes Bergsten Dna Barcoding
Page 30: Johannes Bergsten Dna Barcoding
Page 31: Johannes Bergsten Dna Barcoding
Page 32: Johannes Bergsten Dna Barcoding

Read Publication on BOLD

Page 33: Johannes Bergsten Dna Barcoding

DNA Barcode standards

• The standards include three components: 1) Creation of a reserved keyword (“BARCODE”). NCBI and its collaborators will add the BARCODE ‘Flag’ to new submissions that meet the standards established in consultation with CBOL. Data records that meet these criteria will be known as BARCODE records in INSDC (BRIs);

Page 34: Johannes Bergsten Dna Barcoding

Required data elements

• 2) Required data elements.

• To provide the user community with reliable, retrievable and verifiable information concerning the barcode sequence itself, the specimen from which it was obtained, and the species name that was applied by the submitter.

Page 35: Johannes Bergsten Dna Barcoding

Data on the specimen

• a) Include a link to a voucher specimen using a structured field* specified by CBOL and NCBI, and to the metadata associated with that specimen and contained in the public database of the voucher specimen’s repository.

• b) Include a link to a documented species name found in one of the sources specified by CBOL and NCBI;

• c) Include Country-Code, using the controlled vocabulary used by GenBank;

*(institution|collection|item) e.g. NHRS:ENT-LEPI:AA008745
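
The structured voucher field can be split into its three parts mechanically. The following is a minimal sketch, assuming the colon-separated three-part form shown in the example above; the class and function names are hypothetical and the sketch does not check the codes against any registry.

```python
from typing import NamedTuple


class Voucher(NamedTuple):
    institution: str
    collection: str
    item: str


def parse_voucher(field: str) -> Voucher:
    """Split an 'institution:collection:item' string into named parts."""
    parts = [part.strip() for part in field.split(":")]
    # Assumption: exactly three non-empty parts, as in the slide's example.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected institution:collection:item, got {field!r}")
    return Voucher(*parts)


voucher = parse_voucher("NHRS:ENT-LEPI:AA008745")
```

Here `voucher.institution` is `"NHRS"`, `voucher.collection` is `"ENT-LEPI"` and `voucher.item` is `"AA008745"`.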

Page 36: Johannes Bergsten Dna Barcoding

The Barcode region

• d) Come from a gene region accepted by CBOL as an effective barcode. Initially, only cytochrome c oxidase 1 is approved as a barcode region, defined relative to the mouse mitochondrial genome as the 648 bp region that starts at position 58 and stops at position 705.

• (For plants, matK and rbcL are expected to get the same status very soon)

• CBOL has procedures for applying for other gene regions to be given barcode status

Page 37: Johannes Bergsten Dna Barcoding

Quality of sequence

• e) Include at least 500 contiguous unambiguous base-pairs from bidirectional sequencing within the approved barcode region. However, if requested, GenBank could assign the BARCODE flag to records with shorter sequences;

• f) Include no more than 1% ambiguous sites for the entire submitted sequence;

• g) Include the name of the gene region used;

• h) Be associated with a trace file submitted to the NCBI Trace Archive or the Ensembl Trace Server;

• i) Include the sequences of all forward and reverse primers used. For records in which the contiguous sequence was assembled from more than one amplicon or when a cocktail of multiple primers was used for amplification, multiple sets of primer pairs must be provided. In addition, submission of the names of the forward and reverse primers with the primer sequences is strongly recommended.
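
Items e) and f) are simple enough to check automatically. The following is a minimal sketch, assuming that any character other than A, C, G or T counts as ambiguous; the function names are hypothetical, and the sketch does not verify bidirectional sequencing, the barcode region, or any of the other required elements.

```python
import re


def longest_unambiguous_run(seq: str) -> int:
    """Length of the longest stretch consisting only of A, C, G and T."""
    runs = re.findall(r"[ACGT]+", seq.upper())
    return max((len(run) for run in runs), default=0)


def ambiguous_fraction(seq: str) -> float:
    """Fraction of sites that are not A, C, G or T (assumed ambiguous)."""
    if not seq:
        raise ValueError("empty sequence")
    ambiguous = sum(1 for base in seq.upper() if base not in "ACGT")
    return ambiguous / len(seq)


def meets_length_and_ambiguity_rules(
    seq: str, min_run: int = 500, max_ambiguous: float = 0.01
) -> bool:
    """Check item e) (>= 500 contiguous unambiguous bp) and item f) (<= 1% ambiguous)."""
    return (
        longest_unambiguous_run(seq) >= min_run
        and ambiguous_fraction(seq) <= max_ambiguous
    )


# Toy sequences (hypothetical, not real barcodes).
clean = "ACGT" * 150                          # 600 unambiguous bp
interrupted = "ACGT" * 100 + "N" + "ACGT" * 60  # longest run is only 400 bp
noisy = "ACGT" * 130 + "N" * 10               # 520 bp run, but about 1.9% ambiguous

results = [meets_length_and_ambiguity_rules(s) for s in (clean, interrupted, noisy)]
```

Only the first toy sequence passes both checks; the second fails the length rule and the third fails the ambiguity rule.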

Page 38: Johannes Bergsten Dna Barcoding

Strongly recommended data elements.

• Strongly recommended data elements. The following data elements have been added to the INSDC at CBOL’s request for validation of the voucher specimen, and will be strongly recommended but not required:

• j) Latitude and longitude;

• k) Name of the identifier;

• l) Name of the collector;

• m) Date of collection

Page 39: Johannes Bergsten Dna Barcoding

Governance rules.

• 3) Governance rules. The INSDC provides an archive of records that can only be changed by the submitter. In the case of BRIs, the following modifications are implemented:

• CBOL can allow <500bp sequences to get barcode status (e.g. types, extinct spp.)

• CBOL maintains a process by which alternative gene regions can attain barcode status

• BRIs submitted via BOLD are jointly submitted by the researcher and BOLD and can be edited by both.

• CBOL can recommend the BARCODE status to be removed from sequences submitted to INSDC by an individual researcher.

• A system for attaching third-party comments, criticism and suggested corrections to BRIs will be installed.

Page 40: Johannes Bergsten Dna Barcoding

Credit for slide: David E. Schindel

Page 41: Johannes Bergsten Dna Barcoding
Page 42: Johannes Bergsten Dna Barcoding

Voucher repository linkout from GenBank

Page 43: Johannes Bergsten Dna Barcoding
Page 44: Johannes Bergsten Dna Barcoding

Linkout from GenBank to taxonomy databases

Page 45: Johannes Bergsten Dna Barcoding

BOLD linkout from GenBank

Page 46: Johannes Bergsten Dna Barcoding

Trace archives

Page 47: Johannes Bergsten Dna Barcoding

Recommended data elements

Page 48: Johannes Bergsten Dna Barcoding

How to submit data

Page 49: Johannes Bergsten Dna Barcoding

Will DNA Barcoding work?

Image credit: Barcoding institute of ontario

Page 50: Johannes Bergsten Dna Barcoding

The Barcoding gap

Barcoding rests on the idea that the genetic distance between species is larger than the variation within species.

[Figure: genetic distance axis, with 1% marked]
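
One way to test for a barcoding gap in a data set is to compare the largest within-species distance with the smallest between-species distance. The following is a minimal sketch under that assumption; the function name, the data layout and the toy distances are hypothetical.

```python
from itertools import combinations


def barcoding_gap(distances: dict, species: dict) -> tuple:
    """Return (max intraspecific, min interspecific, gap present?).

    distances: maps frozenset({id_a, id_b}) to a pairwise genetic distance.
    species:   maps specimen id to its species name.
    """
    intra, inter = [], []
    for a, b in combinations(sorted(species), 2):
        d = distances[frozenset((a, b))]
        if species[a] == species[b]:
            intra.append(d)
        else:
            inter.append(d)
    if not intra or not inter:
        raise ValueError("need both within- and between-species pairs")
    max_intra, min_inter = max(intra), min(inter)
    return max_intra, min_inter, min_inter > max_intra


# Toy data: two specimens each of two hypothetical species.
species = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}
distances = {
    frozenset(("s1", "s2")): 0.004,
    frozenset(("s3", "s4")): 0.006,
    frozenset(("s1", "s3")): 0.080,
    frozenset(("s1", "s4")): 0.085,
    frozenset(("s2", "s3")): 0.078,
    frozenset(("s2", "s4")): 0.082,
}
max_intra, min_inter, has_gap = barcoding_gap(distances, species)
```

Using the extremes rather than the means makes the check conservative: a single deeply divergent population or a single close pair of species is enough to close the gap.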

Page 51: Johannes Bergsten Dna Barcoding

Supporting data for the Barcoding Gap

Organism | Distribution | Geographical sampling | Species | Sampled | Prop. | Ind./sp. | Intrasp. var. | Intersp. div. | Id. success | Paper
Spiders | World | Local (Canada) | 40,000 | 168 | 0.0042 | 3 | 1.40% | 16.40% | 100% | Barrett & Hebert (2005)
Birds | World | Regional (N. Am.) | 9,000 | 260 | 0.028 | 2 | 0.43% | 7.93% | 100% | Hebert et al (2004)
Lepidopt. 3 sup fam | World | Local (Guelph) | 91,700 | 200 | 0.0022 | 1.7 | 0.25% | 6.80% | 100% | Hebert et al (2003)
Mayflies | World | Regional (N. Am.) | 2,500 | 80 | 0.032 | 1.9 | 1.10% | 18.10% | 99.00% | Ball et al (2005)

Differ by > an order of magnitude = Barcoding Gap

Critique: Well sampled?
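
The "order of magnitude" claim can be checked directly from the intraspecific and interspecific columns of the table. The following is a small worked calculation using only the percentages given on the slide; the variable names are hypothetical.

```python
# (organism, intraspecific variation %, interspecific divergence %) from the table
rows = [
    ("Spiders", 1.40, 16.40),
    ("Birds", 0.43, 7.93),
    ("Lepidoptera", 0.25, 6.80),
    ("Mayflies", 1.10, 18.10),
]

# Ratio of interspecific divergence to intraspecific variation per study.
ratios = {name: inter / intra for name, intra, inter in rows}
all_exceed_tenfold = all(ratio > 10 for ratio in ratios.values())
```

The ratios come out at roughly 11.7 (spiders), 18.4 (birds), 27.2 (Lepidoptera) and 16.5 (mayflies), so each study clears the tenfold mark, although the spiders do so only narrowly.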

Page 52: Johannes Bergsten Dna Barcoding

Sister species vs congeners

• Panthera leo (lion), Panthera tigris (tiger)

• Motacilla flava (yellow wagtail), Motacilla alba (white wagtail)

• Carabus nitens (guldlöpare), Carabus coriaceus (läderlöpare)

• Salix herbacea (dwarf willow), Salix caprea (goat willow)

• Agabus elongatus, A. congener, A. lapponicus, A. thomsoni, A. moestus, A. levanderi, A. clypealis, A. pseudoclypealis

• Sylvia minula (desert whitethroat), Sylvia curruca (lesser whitethroat)

• Eupeodes luniger, Eupeodes latilunulatus

• Carex rostrata (bottle sedge), Carex vesicaria (bladder sedge)

• Pipistrellus pipistrellus (common pipistrelle), Pipistrellus pygmaeus (soprano pipistrelle)

Page 53: Johannes Bergsten Dna Barcoding

Overlap in cowries

Meyer and Paulay, PLoS Biology (2006)

Page 54: Johannes Bergsten Dna Barcoding

Overlap the reality

Page 55: Johannes Bergsten Dna Barcoding

How DNA barcodes should not be used

• “It is expected that DNA barcodes will contribute to the discovery and formal recognition of new species. However, DNA barcodes should not be used as the sole criterion for description of new species, which instead require analysis of diverse data, including morphology, ecology, and behavior, as well as genetics.”

From draft conference report: Taxonomy, DNA, and the Barcode of Life, 2003

Page 56: Johannes Bergsten Dna Barcoding

How not to be used

• “We were interested to see whether Xus exemplaris would be considered a species under standard DNA barcoding protocol”

• “Using the DNA Barcoding protocol…..therefore under a 3% threshold and a 10x mean intraspecific threshold Xus exemplaris would be considered a good species.”

• “However if we use the smallest among-species divergence as recommended by Meier et al (2008) Xus exemplaris would not be considered a good species under the protocol.”
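
The quoted passages show how the verdict flips depending on which summary of among-species divergence is fed into a fixed threshold. The following is a minimal sketch of that effect; all distances are invented for illustration, the function name is hypothetical, and the exact protocol behind the elided quote is not reproduced here.

```python
from statistics import mean


def passes_thresholds(
    interspecific: float, mean_intraspecific: float, fixed_threshold: float = 0.03
) -> bool:
    """True if a divergence exceeds both the fixed 3% threshold and
    ten times the mean intraspecific variation."""
    return (
        interspecific > fixed_threshold
        and interspecific > 10 * mean_intraspecific
    )


# Hypothetical distances from the candidate species to its congeners.
among_species = [0.012, 0.045, 0.060, 0.070]
mean_intraspecific = 0.004  # hypothetical

verdict_using_mean = passes_thresholds(mean(among_species), mean_intraspecific)
verdict_using_smallest = passes_thresholds(min(among_species), mean_intraspecific)
```

With these numbers the mean divergence (about 4.7%) passes both thresholds while the smallest divergence (1.2%) passes neither, which is the kind of arbitrary outcome the slide warns against.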

Page 57: Johannes Bergsten Dna Barcoding

Barcodes are very useful for species discovery

• For poorly known groups DNA delimitation can be a good starting point for species discovery

• There are alternatives to an artificial 1, 2 or 3% sequence divergence as a threshold

• E.g. the General Mixed Yule Coalescent (GMYC) method (Pons et al, 2006)

Page 58: Johannes Bergsten Dna Barcoding

GMYC model (Pons et al, 2006)

[Figure: species labels Aulonogyrus cristatus, Aulonogyrus goudoti, Gyrinus madagascariensis, Dineutes subspinosus, Dineutes sinuosipennis, Dineutes proximus, Gyrinus ignitus, Orectogyrus cyanicollis, Orectogyrus pallidocinctus, Orectogyrus vestitus, Orectogyrus sedilloti; locality labels Andasibe, Ranomafana, Mont. D’Ambre, Antsabe; likelihood plot, P<0.01]

Page 59: Johannes Bergsten Dna Barcoding

Large inventories of the unknown