93
Next Generation Sequencing Activities at NIST NGS Workshop Mid-Atlantic Association of Forensic Science May 19, 2014 Katherine Butler Gettings, Ph.D. Research Biologist, Applied Genetics Group National Institute of Standards and Technology

Next Generation Sequencing Activities at NIST · 2020. 5. 5. · •HID-Ion Ampliseq Identity Panel (IISNP) –120 markers in a single PCR reaction –Amplified regions 33 bp to 192

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

  • Next Generation Sequencing Activities at NIST

    NGS Workshop Mid-Atlantic Association of Forensic Science

    May 19, 2014

    Katherine Butler Gettings, Ph.D. Research Biologist, Applied Genetics Group

    National Institute of Standards and Technology

  • I will mention commercial STR kit names and information, but I am in no way attempting to endorse any specific products.

    NIST Disclaimer: Certain commercial equipment, instruments and materials are identified in order to specify experimental procedures as completely as possible. In no case does such identification imply a recommendation or it imply that any of the materials, instruments or equipment identified are necessarily the best available for the purpose.

    Points of view are mine and do not necessarily represent the official position of the National Institute of Standards and Technology or the U.S. Department of Justice. Our group receives or

    has received funding from the FBI Laboratory and the National Institute of Justice.

    Disclaimer

    Mid-Atlantic Association of Forensic Science Next Generation Sequencing Workshop

    May 19, 2014

  • Outline

    Background

    NGS of Forensic DNA markers

    – STRs

    – mtDNA

    – Single Nucleotide Polymorphisms (SNPs)

    NGS on the PGM- Ampliseq workflow

    Experimental data

    – HID-Ion Ampliseq Identity Panel

    – HID-Ion Ampliseq Ancestry Panel

  • What’s in a name???

    Next-generation sequencing

    Massively parallel sequencing

    Second-generation sequencing

    Third-generation sequencing

    NGS

    High-throughput sequencing

    Next-generation genomics

    Whole-genome sequencing

  • Parallel Sequencing ‘A million capillary Sanger sequencer’

  • Parallel Sequencing

    ‘A million capillary Sanger sequencer’

    • Clonal vs population amplification

    • Shorter reads (Range 75 to 400)

    • Errors are more ‘detectable’

    • High coverage 100 – 1000 - 10,000x

    • Rely more on informatics to assemble millions of short reads

  • http://electronicsbyexamples.blogspot.com/2013/03/milestones-in-digital-electronics.html

  • http://blog.trentonsystems.com/moores-law-pushing-processor-technology-to-14-nanometers/

  • Forensic NGS Applications

    • Short Tandem Repeats (STRs)

    – PCR fragment-length polymorphisms

    • Mitochondrial DNA (mtDNA)

    – Sanger sequencing

    • Single Nucleotide Polymorphisms (SNPs)

    Capillary electrophoresis electropherogram

  • NGS of Forensic STR Loci

  • NGS of Forensic STR Loci

    D8S1179

    D21S11

    D7S820

    CSF1PO

    D3S1358

    TH01

    D13S317

    D16S539

    D2S1338

    D19S433

    vWA

    TPOX

    D18S51

    A

    D5S818

    FGA

    Sizes of largest observed alleles (not including primer binding/flanking region)

  • NGS of Forensic STR Loci

    D8S1179

    D21S11

    D7S820

    CSF1PO

    D3S1358

    TH01

    D13S317

    D16S539

    vWA

    TPOX

    D18S51

    A

    D5S818

    FGA

    Sizes of largest observed alleles (not including primer binding/flanking region) D2S1338

    D19S433

  • NGS of Forensic STR Loci

    D21S11: Individual appears homozygous by CE but different sequencing composition shown with NGS.

  • D21S11: Individual appears homozygous by CE but different sequencing composition shown with NGS.

    NGS of Forensic STR Loci

  • D21S11: Individuals appear homozygous by CE but different sequencing composition shown with NGS.

    NGS of Forensic STR Loci

  • NGS of Forensic STR Loci

  • NGS of Forensic STR Loci

  • NGS of Forensic STR Loci

  • NGS of Forensic STR Loci

    STR sequence data from 2391c, Component A:

    • Truseq Library Prep

    • MiSeq sequencing

    • STRait Razor data parsing

    • R script (NIST) data viewer

  • Forensic DNA Markers

    • Short Tandem Repeats (STRs)

    – PCR fragment-length polymorphisms

    • Mitochondrial DNA (mtDNA)

    – Sanger sequencing

    • Single Nucleotide Polymorphisms (SNPs)

    http://www.orchidcellmark.ca http://remf.dartmouth.edu/images/mammalianLungTEM/source/8.html

    Mitochondria

    www.wikipedia.org

    Maternally inherited Circular genome

  • • Increase in variants by whole genome analysis

    (based on analysis of 3 SRM samples)

    mtDNA Information

    Over ten times more variable

    per site Three times more polymorphisms than HV alone

    http://www.sas.upenn.edu/~tgschurr/labwork/labwork_text.html

  • Current Method

    • Sequence based on chromatogram

    • Consensus of one forward and one reverse

    NGS

    • Sequence based on thousands of individual reads

    • Improved sensitivity: – Mixture detection

    – Low level heteroplasmy

    mtDNA Information

  • Current Method

    • Minor peaks may not be reproducible

    • SRM 2392 9947a, 1393 G/A heteroplasmy

    NGS

    • More consistent detection of minor genotypes

    • Validation important – Variant calling thresholds

    – Characterizing noise

    mtDNA Information

  • Characterization of SRM 2392 and 2392-I Mitochondrial genome sequencing standard

    Detection of low level heteroplasmy

    0.0%

    5.0%

    10.0%

    15.0%

    20.0%

    HL60 - 2445 HL60 - 5149 9947A - 1393 9947A - 7861

    Min

    or

    Alle

    le F

    req

    ue

    ncy

    C T T A

  • 2878 1981 1571

    571

    18821

    10884

    22719

    17499

    41064 40356

    48071

    42101

    11737

    16411 17697

    11796

    0

    10000

    20000

    30000

    40000

    50000

    0.0%

    5.0%

    10.0%

    15.0%

    20.0%

    HL60 - 2445 HL60 - 5149 9947A - 1393 9947A - 7861

    Co

    verage

    M

    ino

    r A

    llele

    Fre

    qu

    en

    cy

    Characterization of SRM 2392 and 2392-I Mitochondrial genome sequencing standard

    Detection of low level heteroplasmy

    C T T A

  • SNaPshot

    http://portal.ccg.uni-koeln.de/

    Allelic Discrimination (qPCR)

    Forensic DNA Markers

    • Short Tandem Repeats (STRs)

    – PCR fragment-length polymorphisms

    • Mitochondrial DNA (mtDNA)

    – Sanger sequencing

    • Single Nucleotide Polymorphisms (SNPs)

    Sanger Sequencing

    Most methods are low throughput and/or require a lot of DNA

    NGS method can analyze many SNPs for many samples in one run

  • • IISNP-Individual

    • AISNP-Ancestry

    • LISNP-Lineage

    • PISNP-Phenotype

    SNP Information

  • • Individual Identification

    – Balancing has occurred in all populations

    – Low F statistics within (FIS) and among (FST) populations

    – High heterozygosity

    SNP Information

  • • Individual Identification Pakstis 2010, Kidd 2012

    – Panel of 45 unlinked SNPs

    – FST below ≈ 0.07

    – Avg het > 0.4

    – RMP 10-15 to 10-18

    in 44 populations

    SNP Information

  • • HID-Ion Ampliseq Identity Panel (version 2.3)

    – 90 autosomal SNPs

    – 30 Y-chromosome SNPs

    – RMP 10-35

    SNP Information

    Kidd 45

    SNPforID 52

    HID Identity Panel

  • • Ancestry Information – High Fixation Index (FST)

    – Population specific fixation has occurred

    – Low heterozygosity

    • Example – Malaria resistance SNPs in

    Sub-Saharan Africa

    SNP Information

  • • HID Ancestry Panel

    – Beta version 3.0

    – Publicly available soon

    – 170 loci

    – Derived from

    Kosoy et. al (2008): 128 SNPs Kidd et. al (2014): 55 SNPs

    SNP Information

  • • Ion Torrent Personal Genome Machine (PGM) – Launched in 2010

    • Ion Torrent sequencing: – Emulsion PCR for single copy reactors

    – Non-labeled nucleotide triphosphates

    – Flowed over a bead on a semiconductor surface

    • Hydrogen Ion detection – pH change is detected

    – No optics

    Life Tech - Ion Torrent - PGM

  • Ion Torrent PGM Workflow

    http://www.youtube.com/watch?v=MxkYa9XCvBQ

  • The PGM Instrument at NIST

    PGM Sequencer

    OneTouch 2 (Emulsion PCR)

    OneTouch ES (Enrichment)

    7 ft

  • Ampliseq Workflow

    PCR amplify

    Chew back primers

    Ligate adapters

    Emulsion PCR

    Sequencing

    One template per bead/droplet

    454 PGM “Ionogram”

    Ion Ampliseq Library Kit

    1 ng DNA input

    Ampliseq Primer Pool

    Template Kit Sequencing Kit

  • Front-End: Multiplex PCR

    • HID-Ion Ampliseq Identity Panel (IISNP) – 120 markers in a single PCR reaction – Amplified regions 33 bp to 192 bp long

    • HID-Ion Ampliseq Ancestry Panel (AISNP)

    – 170 markers in a single PCR reaction – Amplified regions 34 bp to 136 bp long

    • Small amplicons well suited to degraded or damaged DNA

    60° 4:00

    99° 2:00

    99° 0:15

    18 Cycles Time ≈ 1:30

    1 ng DNA

  • Digest Primer Regions & Ligate Adaptors

    • Enzymatic digestion removes ≈ 25 bp from ends of amplicons

    • Universal sequencing adaptors are ligated to DNA – Adaptors termed P1 and A

    • Barcoded sequencing adaptors can be used in this step

    – Sequence multiple samples in one PGM run

    P1 Adaptor

    A Adaptor

    Barcode Sequence

    PCR Fragment

    Adapted and Barcoded Sequencing Template

    Enzyme Ligase

  • Prepare Ion Sphere Particles (ISPs)

    • Libraries quantified by qPCR – Quantity of DNA going into emPCR is very important! – Goal: 10 % to 30 % template positive ISPs

    • Too much DNA polyclonal ISPs (mixed read)

    • Emulsion PCR

    – Nanoliter droplets of PCR reagents in oil – Attaches a single DNA molecule to a single ISP

    • Enrich for positive ISPs

    – Liquid handler removes non-templated ISPs – Biotinylated primer/streptavidin beads

    Ideal Non- Ideal

  • Prepare Ion Sphere Particles (ISPs)

    • Libraries quantified by qPCR – Quantity of DNA going into emPCR is very important! – Goal: 10 % to 30 % template positive ISPs

    • Too much DNA polyclonal ISPs (mixed read)

    • Emulsion PCR

    – Nanoliter droplets of PCR reagents in oil – Attaches sequencing template to the ISP

    • Enrich for positive ISPs

    – Liquid handler removes non-templated ISPs – Biotinylated primer/streptavidin beads

    Ideal Non- Ideal

    OneTouch 2

  • Prepare Ion Sphere Particles (ISPs)

    • Libraries quantified by qPCR – Quantity of DNA going into emPCR is very important! – Goal: 10 % to 30 % template positive ISPs

    • Too much DNA polyclonal ISPs (mixed read)

    • Emulsion PCR

    – Nanoliter droplets of PCR reagents in oil – Attaches sequencing template to the ISP

    • Enrich for positive ISPs

    – Liquid handler removes non-templated ISPs – Biotinylated primer/streptavidin beads

    Ideal Non- Ideal

    OneTouch 2

    ISP

    Magnetic bead w/ Streptavidin

    Biotinylated PCR product

    ISP NaOH

    OneTouch ES

  • Sequencing & Data Analysis

    • Library ISPs loaded onto chip

    • PGM runs flows & detects pH

    • Torrent Server & Torrent Suite Software – Processes pH signal into base calls

    – Displays run summary

    – Maps reads to reference genome Photo: www.lifetechnologies.com

  • Data Analysis HID SNP Genotyper Plugin

    Allele coverage histogram

    Normalized y-axis scale

    X-axis is refSNP I.D.

    Autosomal SNPs Y-SNPs

  • Data Analysis HID SNP Genotyper Plugin

    Total Coverage

    Reads for Each Base

    Coverage for Either Strand

    Strand Bias

    Genotype

    Quality Score

    Major Allele Frequency

  • HID SNP Panel Sensitivity Study

    • Dynamic range of DNA input to PCR – 1 ng is recommended – 10 ng (1 data point) – no problems were observed – 1 ng – 0.5 ng – 0.1 ng – 0.05 ng

    • Libraries were generated and pooled (n = 12) • Sequenced on PGM 318 chip (11 M wells)

    – 200 bp read chemistry

    3 Replicates

  • 0.5 ng Input DNA

    HID SNP Panel Sensitivity Study A

    ll s

    cale

    d t

    o 8

    00

    0X

    co

    vera

    ge

    90 Autosomal SNP loci, sorted from highest to lowest coverage

    0.1 ng Input DNA

    0.05 ng Input DNA

    Thresholds: analytical = 50 RFU, stochastic = 300 RFU, PHR = 0.5

    Thresholds:

    50X analytical

    300X stochastic

    50% balance

  • 0.05 ng Input DNA

    0.1 ng Input DNA

    0.5 ng Input DNA

    Identifiler® Plus amplification (29 cycle), 25 µl reaction , 3500xl electrophoresis, 1.2 kV for 8 seconds Thresholds: analytical = 50 RFU, stochastic = 200 RFU, PHR = 0.5

    All

    sca

    led

    to

    40

    00

    RFU

    HID SNP Panel Sensitivity Study

  • HID SNP Panel Sensitivity Study

    D8S1179 D21S11 D7S820 CSF1PO D3S1358 TH01 D13S317 D16S539 D2S1338 D19S433 vWA TPOX D18S51 D5S818 FGA 1 in

    0.05 ng

    1.78E+01

    1.17E+05

    1.17E+06

    0.1 ng

    4.35E+03

    6.19E+10

    6.34E+11

    0.5 ng

    5.67E+14

    rs1

    00

    55

    33

    rs1

    00

    92

    49

    1

    rs1

    01

    52

    50

    rs1

    02

    41

    16

    rs1

    03

    18

    25

    rs1

    04

    88

    71

    0

    rs1

    04

    95

    40

    7

    rs1

    05

    80

    83

    rs1

    07

    73

    76

    0

    rs1

    07

    76

    83

    9

    rs1

    10

    90

    37

    rs1

    29

    97

    45

    3

    rs1

    32

    18

    44

    0

    rs1

    33

    58

    73

    rs1

    35

    53

    66

    rs1

    35

    76

    17

    rs1

    36

    02

    88

    rs1

    38

    23

    87

    rs1

    41

    32

    12

    rs1

    45

    43

    61

    rs1

    46

    37

    29

    rs1

    49

    04

    13

    rs1

    49

    32

    32

    rs1

    49

    85

    53

    rs1

    52

    35

    37

    rs1

    52

    84

    60

    rs1

    59

    60

    6

    rs1

    73

    64

    42

    rs1

    82

    13

    80

    rs1

    87

    25

    75

    rs1

    88

    65

    10

    rs1

    97

    92

    55

    rs2

    01

    62

    76

    rs2

    04

    04

    11

    rs2

    04

    63

    61

    rs2

    07

    68

    48

    rs2

    11

    19

    80

    rs2

    14

    95

    5

    rs2

    21

    95

    6

    rs2

    26

    93

    55

    rs2

    29

    29

    72

    rs2

    34

    27

    47

    rs2

    51

    93

    4

    rs2

    83

    07

    95

    rs2

    83

    17

    00

    rs3

    21

    19

    8

    rs3

    38

    88

    2

    rs3

    54

    43

    9

    rs3

    78

    09

    62

    rs4

    28

    84

    09

    rs4

    30

    04

    6

    rs4

    36

    42

    05

    rs4

    45

    25

    1

    rs4

    53

    00

    59

    rs4

    84

    70

    34

    rs5

    60

    68

    1

    rs5

    76

    26

    1

    rs6

    44

    47

    24

    rs6

    81

    12

    38

    rs6

    95

    54

    48

    rs7

    04

    11

    58

    rs7

    17

    30

    2

    rs7

    19

    36

    6

    rs7

    22

    09

    8

    rs7

    22

    29

    0

    rs7

    27

    81

    1

    rs7

    33

    16

    4

    rs7

    35

    15

    5

    rs7

    37

    68

    1

    rs7

    40

    59

    8

    rs7

    40

    91

    0

    rs7

    52

    03

    86

    rs7

    70

    47

    70

    rs8

    26

    47

    2

    rs8

    73

    19

    6

    rs8

    76

    72

    4

    rs8

    91

    70

    0

    rs9

    01

    39

    8

    rs9

    07

    10

    0

    rs9

    14

    16

    5

    rs9

    64

    68

    1

    rs9

    87

    64

    0

    rs9

    90

    59

    77

    rs9

    93

    93

    4

    rs9

    95

    11

    71

    1 in

    0.05 ng 6.12E+15

    3.62E+18

    7.17E+21

    0.1 ng 7.58E+27

    8.87E+27

    2.88E+30

    0.5 ng 1.16E+35

    2.31E+35

    Identifiler Plus

    PGM HID SNP Panel v2.3

    Just like STR loci, some SNPs are

    consistently less robust

  • 1 in

    0.05 ng 6.12E+15

    3.62E+18

    7.17E+21

    0.1 ng 7.58E+27

    8.87E+27

    2.88E+30

    0.5 ng 1.16E+35

    2.31E+35

    1 in

    0.05 ng

    D8S1179 D3S1358 D2S1338 D19S433 vWA TPOX D18S51 FGA TH01 D5S818 D7S820 D16S539 CSF1PO D21S11 1.78E+01

    D8S1179 vWA D18S51 TH01 D5S818 D7S820 D16S539 CSF1PO D21S11 1.17E+05

    D19S433 TPOX FGA TH01 D5S818 D7S820 D16S539 CSF1PO D21S11 1.17E+06

    0.1 ng

    D2S1338 D19S433 vWA TPOX D18S51 FGA D7S820 D16S539 CSF1PO D21S11 4.35E+03

    D2S1338 D21S11 6.19E+10

    CSF1PO D21S11 6.34E+11

    0.5 ng

    5.67E+14

    HID SNP Panel Sensitivity Study

    Identifiler Plus

    PGM HID SNP Panel v2.3

    SNPs have more possible loci and better

    performance at low levels

  • HID SNP Panel Sensitivity Study

    1

    1000

    1000000

    1E+09

    1E+12

    1E+15

    1E+18

    1E+21

    1E+24

    1E+27

    1E+30

    1E+33

    1E+36

    0.05 ng 0.1 ng 0.5 ng

    PGM HID SNP Panel Identifiler Plus

    Ran

    do

    m M

    atch

    Pro

    bab

    ility

    1

    in

    HID SNPs give better RMP

    with 50 pg than ID+ gives with

    0.5 ng

  • HID SNP Panel Sensitivity Study Summary

    • Higher RMPs are expected for SNP panel compared to STRs due to many more loci

    • Under thresholds indicated, higher % SNPs produce results than STRs also

    • Better STR assays (GlobalFiler or NGS-STR) may lessen the “gap”

    • Validation needed for SNP thresholds

  • HID SNP Panel Degraded DNA Study

    Sheared genomic DNA

    →Covaris S2 Focused Ultrasonicator

    + =

    gDNA Sheared DNA

  • HID SNP Panel Degraded DNA Study

    Sheared DNA was fractionated by size range

    Blue Pippin system (3% Gel)

    Automated size selection

    1) 50 bp to 200 bp

    2) 50 bp to 150 bp

    3) 50 bp to 100 bp

    4) 50 bp to 75 bp

    5) 35 bp to 50 bp

    Five individual agarose columns

    Size fractionated fragments collected into recovery wells

    1 2 3 4 5

  • HID SNP Panel Degraded DNA Study

    Sheared DNA was fractionated by size range Agilent Bioanalyzer Trace

    Size selected sheared DNA 50 bp to 200 bp

    50 bp to 150 bp

    50 bp to 100 bp

    50 bp to 75 bp

    35 bp to 50 bp

    Input to HID Panel PCR 1 ng DNA

    Built libraries and sequenced

    Bioanalyzer Standard

    Blue Pippin Marker (65 bp)

  • HID SNP Panel Degraded DNA Study

    90 autosomal IISNPs

    HID SNP Panel

  • Fragmented, size selected < 75 bp

    Fragmented, size selected < 100 bp

    Fragmented, size selected < 150 bp

    Fragmented, size selected < 200 bp

    Fragmented, non-size selected

    Minifiler® amplification (30 cycle), 25 µl reaction, 3500xl electrophoresis, 1.2 kV for 8 seconds Thresholds: analytical = 100 RFU, PHR = 0.5; data scaled to 1000 RFU

    HID SNP Panel Degraded DNA Study

    Fragmented, size selected < 250 bp

    Performed in

    triplicate

    One rep shown

  • Fragmented, size selected < 200 bp

    Fragmented, size selected < 150 bp

    Fragmented, size selected < 100 bp

    Fragmented, size selected < 75 bp

    Fragmented, non-size selected

    PG

    M 3

    18

    Ch

    ip, a

    ll s

    cale

    d t

    o 2

    00

    0X

    co

    vera

    ge

    Fragmented, size selected < 250 bp

    90 Autosomal SNPs, sorted from smallest to largest

    Thresholds:

    50X analytical

    300X stochastic

    50% balance

  • RMP 1 in

    rs10

    055

    33

    rs10

    092

    49

    1

    rs10

    152

    50

    rs10

    241

    16

    rs10

    285

    28

    rs10

    318

    25

    rs10

    488

    71

    0

    rs10

    495

    40

    7

    rs10

    580

    83

    rs10

    773

    76

    0

    rs10

    776

    83

    9

    rs11

    090

    37

    rs12

    997

    45

    3

    rs13

    218

    44

    0

    rs13

    358

    73

    rs13

    553

    66

    rs13

    576

    17

    rs13

    602

    88

    rs13

    823

    87

    rs14

    132

    12

    rs14

    543

    61

    rs14

    637

    29

    rs14

    904

    13

    rs14

    932

    32

    rs14

    985

    53

    rs15

    235

    37

    rs15

    284

    60

    rs15

    960

    6

    rs17

    364

    42

    rs18

    213

    80

    rs18

    725

    75

    rs18

    865

    10

    rs19

    792

    55

    rs20

    162

    76

    rs20

    404

    11

    rs20

    463

    61

    rs20

    562

    77

    rs20

    768

    48

    rs21

    119

    80

    rs21

    495

    5

    rs22

    195

    6

    rs22

    693

    55

    rs22

    929

    72

    rs23

    427

    47

    rs25

    193

    4

    rs28

    307

    95

    rs28

    317

    00

    rs32

    119

    8

    rs33

    888

    2

    rs35

    443

    9

    rs37

    809

    62

    rs42

    884

    09

    rs43

    004

    6

    rs43

    642

    05

    rs44

    525

    1

    rs45

    300

    59

    rs48

    470

    34

    rs56

    068

    1

    rs57

    626

    1

    rs64

    447

    24

    rs68

    112

    38

    rs69

    554

    48

    rs70

    411

    58

    rs71

    730

    2

    rs71

    936

    6

    rs72

    209

    8

    rs72

    229

    0

    rs72

    781

    1

    rs72

    917

    2

    rs73

    316

    4

    rs73

    515

    5

    rs73

    768

    1

    rs74

    059

    8

    rs74

    091

    0

    rs75

    203

    86

    rs77

    047

    70

    rs82

    647

    2

    rs87

    319

    6

    rs87

    672

    4

    rs89

    170

    0

    rs90

    139

    8

    rs90

    710

    0

    rs91

    416

    5

    rs91

    711

    8

    rs93

    828

    3

    rs96

    468

    1

    rs98

    764

    0

    rs99

    059

    77

    rs99

    393

    4

    rs99

    511

    71

    75-1 1.17E+01

    75-2 3.14E+01

    75-3 3.14E+01

    100-1 6.83E+05

    100-2 2.92E+07

    100-3 1.38E+08

    150-1 1.05E+21

    150-2 8.75E+19

    150-3 5.08E+20

    200-1 1.96E+30

    200-2 3.25E+26

    200-3 6.61E+27

    250-1 1.75E+36

    250-2 8.62E+35

    250-3 3.47E+36

    HID SNP Panel Degraded DNA Study

    PG

    M II

    SNP

    s (9

    0)

    Min

    iFile

    r ST

    Rs

    (8)

  • RMP 1 in

    PG

    M II

    SNP

    s (9

    0)

    Min

    iFile

    r ST

    Rs

    (8)

    RMP 1 in

    HID SNP Panel Degraded DNA Study

    SNPs have MANY more possible loci

    and better performance in

    degraded samples

    75-1

    75-2

    75-3

    100-1

    100-2

    100-3

    150-1 7.08E+03

    150-2 7.08E+03

    150-3 1.26E+03

    200-1 7.77E+07

    200-2 6.94E+09

    200-3 6.94E+09

    250-1 6.94E+09

    250-2 6.94E+09

    250-3 6.94E+09

    RMP 1 in

  • 1.0E+00

    1.0E+03

    1.0E+06

    1.0E+09

    1.0E+12

    1.0E+15

    1.0E+18

    1.0E+21

    1.0E+24

    1.0E+27

    1.0E+30

    1.0E+33

    1.0E+36

    1.0E+39

    HID SNP Panel Degraded DNA Study

  • HID SNP Panel Degraded DNA Study Summary

    • SNPs and STRs show expected performance in each fraction based on amplicon size

    • Some SNPs can still amplify in degraded samples where STRs cannot

    • Due to the high number of SNPs, very high RMPs are possible

    • Better STR assays (GlobalFiler or NGS-STR) may lessen the “gap”

    • Validation needed for SNP thresholds

  • SNPs and Mixtures

    AA

    AB

    BB

    1 A B

    AA 100% 0%

    AB 50% 50%

    BB 0% 100%

    % is coverage (like PH balance)

  • 50

    55

    60

    65

    70

    75

    80

    85

    90

    95

    100

    ???

    HID SNP Panel Mixture Detection

    Maj

    or

    Alle

    le F

    req

    ue

    ncy

    90 Autosomal SNPs

    HOMOZYGOTE AA or BB

    HETEROZYGOTE AB

    Single source

    samples should be either

    50% or 100%

    One single source

    sample, major allele frequency plotted

    for 90 HID SNPs (in ascending order)

    Akin to an imbalanced STR locus

  • 50

    55

    60

    65

    70

    75

    80

    85

    90

    95

    100

    HID SNP Panel Mixture Detection

    Maj

    or

    Alle

    le F

    req

    ue

    ncy

    90 Autosomal SNPs

    HOMOZYGOTE AA or BB

    HETEROZYGOTE AB

    Single source

    samples should be either

    50% or 100%

    One single source sample in triplicate,

    major allele frequency plotted

    for 90 HID SNPs (in ascending order)

    3 SNPs give outlying values, less useful for

    mixtures

  • 50

    55

    60

    65

    70

    75

    80

    85

    90

    95

    100

    HID SNP Panel Mixture Detection

    Maj

    or

    Alle

    le F

    req

    ue

    ncy

    90 Autosomal SNPs

    HOMOZYGOTE AA or BB

    HETEROZYGOTE AB

    Single source

    samples should be either

    50% or 100%

    Two single source samples in triplicate,

    major allele frequency plotted

    for 90 HID SNPs (in ascending order)

  • 50

    55

    60

    65

    70

    75

    80

    85

    90

    95

    100

    HID SNP Panel Mixture Detection

    Maj

    or

    Alle

    le F

    req

    ue

    ncy

    90 Autosomal SNPs

    HOMOZYGOTE AA or BB

    HETEROZYGOTE AB

    Single source

    samples should be either

    50% or 100%

    Three single source samples in triplicate,

    major allele frequency plotted

    for 90 HID SNPs (in ascending order)

  • 50

    55

    60

    65

    70

    75

    80

    85

    90

    95

    100

    HID SNP Panel Mixture Detection

    Maj

    or

    Alle

    le F

    req

    ue

    ncy

    90 Autosomal SNPs

    HOMOZYGOTE AA or BB

    HETEROZYGOTE AB

    Single source

    samples should be either

    50% or 100%

    Four single source samples in triplicate,

    major allele frequency plotted

    for 90 HID SNPs (in ascending order)

  • 50

    55

    60

    65

    70

    75

    80

    85

    90

    95

    100

    HID SNP Panel Mixture Detection

    Maj

    or

    Alle

    le F

    req

    ue

    ncy

    90 Autosomal SNPs

    HOMOZYGOTE AA or BB

    HETEROZYGOTE AB

    Single source

    samples should be either

    50% or 100%

    Five single source samples in triplicate,

    major allele frequency plotted

    for 90 HID SNPs (in ascending order)

    Assessing single source outliers

    will improve mixture model

  • 1 1 1A 1A 1B 1B A B

    AA AA 2 2 0 0 100% 0%

    AA AB 2 1 0 1 75% 25%

    AB AA 1 2 1 0 75% 25%

    AA BB 2 0 0 2 50% 50%

    AB AB 1 1 1 1 50% 50%

    BB AA 0 2 2 0 50% 50%

    AB BB 1 0 1 2 25% 75%

    BB AB 0 1 2 1 25% 75%

    BB BB 0 0 2 2 0% 100%

    SNPs in 1:1 Mixtures

    0%

    10%

    20%

    30%

    40%

    50%

    60%

    70%

    80%

    90%

    100%

    1 2 3 4 5 6 7 8 9

    B

    A

    AA AA

    AA AB

    AB AA

    AA BB

    AB AB

    BB AA

    AB BB

    BB AB

    BB BB

    1 1

  • 50

    55

    60

    65

    70

    75

    80

    85

    90

    95

    100

    Maj

    or

    Alle

    le F

    req

    ue

    ncy

    90 Autosomal SNPs

    A two-person

    mixture in a 1:1 ratio should have

    frequencies at: 50%, 75%, and 100%

    SNPs in 1:1 Mixtures

    One single source

    sample, major allele frequency plotted

    for 90 HID SNPs (in ascending order)

    Theoretical distribution— Size of “bin” is related to average heterozygosity

    Genotype combinations in this bin are:

    AA:AB, AB:AA, AB:BB, BB:AB

    AA:BB, AB:AB, BB:AA

    AA:AA, BB:BB

  • 2 1 2A 1A 2B 1B A B

    AA AA 4 2 0 0 100% 0%

    AA AB 4 1 0 1 83% 17%

    AA BB 4 0 0 2 67% 33%

    AB AA 2 2 2 0 67% 33%

    AB AB 2 1 2 1 50% 50%

    AB BB 2 0 2 2 33% 67%

    BB AA 0 2 4 0 33% 67%

    BB AB 0 1 4 1 17% 83%

    BB BB 0 0 4 2 0% 100%

    AA AA

    AA AB

    AA BB

    AB AA

    AB AB

    AB BB

    BB AA

    BB AB

    BB BB

    0%

    10%

    20%

    30%

    40%

    50%

    60%

    70%

    80%

    90%

    100%

    1 2 3 4 5 6 7 8 9

    B

    A

    2 1

    SNPs in 2:1 Mixtures

  • 50

    55

    60

    65

    70

    75

    80

    85

    90

    95

    100

    Maj

    or

    Alle

    le F

    req

    ue

    ncy

    90 Autosomal SNPs

    SNPs in 2:1 Mixtures

    A two-person mixture in a 1:1

    ratio should have frequencies at: 50%, 67.5%, 82.5%, 100%

    One single source

    sample, major allele frequency plotted

    for 90 HID SNPs (in ascending order)

  • 3 1 3A 1A 3B 1B A B

    AA AA 6 2 0 0 100% 0%

    AA AB 6 1 0 1 88% 13%

    AA BB 6 0 0 2 75% 25%

    AB AA 3 2 3 0 63% 38%

    AB AB 3 1 3 1 50% 50%

    AB BB 3 0 3 2 38% 63%

    BB AA 0 2 6 0 25% 75%

    BB AB 0 1 6 1 13% 88%

    BB BB 0 0 6 2 0% 100%

    AA AA

    AA AB

    AA BB

    AB AA

    AB AB

    AB BB

    BB AA

    BB AB

    BB BB

    3 1

    0%

    10%

    20%

    30%

    40%

    50%

    60%

    70%

    80%

    90%

    100%

    1 2 3 4 5 6 7 8 9

    B

    A

    SNPs in 3:1 Mixtures

  • 50

    55

    60

    65

    70

    75

    80

    85

    90

    95

    100

    Var

    ian

    t Fr

    equ

    en

    cy

    90 Autosomal SNPs

    A two-person mixture in a 3:1

    ratio should have frequencies at:

    50%, 62.5%, 75%, 87.5% and 100%

    One single source

    sample, major allele frequency plotted

    for 90 HID SNPs (in ascending order)

    SNPs in 3:1 Mixtures

  • 50

    55

    60

    65

    70

    75

    80

    85

    90

    95

    100

    >3:1 or >2 people =

    >> “bins”

    Var

    ian

    t Fr

    equ

    en

    cy

    90 Autosomal SNPs

    SNPs in 3:1 Mixtures

    A two-person mixture in a 3:1

    ratio should have frequencies at:

    50%, 62.5%, 75%, 87.5% and 100%

    One single source sample and one 3:1

    mixed sample, major allele

    frequency plotted for 90 HID SNPs

    (in ascending order)

    Actual vs Theoretical

  • 50

    55

    60

    65

    70

    75

    80

    85

    90

    95

    100

    Var

    ian

    t Fr

    equ

    en

    cy

    90 Autosomal SNPs

    A two-person mixture in a 3:1

    ratio should have frequencies at:

    50%, 62.5%, 75%, 87.5% and 100%

    One single source sample and two 3:1

    mixed samples, major allele

    frequency plotted for 90 HID SNPs

    (in ascending order)

    SNPs in 3:1 Mixtures

  • 50

    55

    60

    65

    70

    75

    80

    85

    90

    95

    100

    Var

    ian

    t Fr

    equ

    en

    cy

    90 Autosomal SNPs

    A two-person mixture in a 3:1

    ratio should have frequencies at:

    50%, 62.5%, 75%, 87.5% and 100%

    One single source sample and three 3:1

    mixed samples, major allele

    frequency plotted for 90 HID SNPs

    (in ascending order)

    SNPs in 3:1 Mixtures

  • HID SNP Panel Mixtures Summary

    • Mixtures can be detected in SNP data based on the coverage levels at heterozygous loci

    • It may be possible to determine two-person 1:1 or 2:1 mixtures (maaaybe 3:1)

    • More than two contributors or greater than 3:1 mixtures will be difficult to distinguish

    • Need to determine which SNPs “behave”

    • Stay tuned!

  • PGM AIM Panel (beta testing)

    • Ampliseq library prep

    • 170 SNPs

    • Seldin 128

    • Kidd 55

    • Analysis plug-in integrates FROGkb

  • AIM Panel Ancestry Prediction – SRM 2391c

    SRM 2391c Component

    Gender Ethnicity

    (self declared)

    A Female Not listed

    B Male Mexican-American

    C Male Melanesian

    D Female:Male Mixed sample

    E Female Not listed

    F Male Caucasian

    • Likelihood Ratio calculations – Four categories extant in both Kidd and Seldin studies

    • Europeans, African Americans, Maya, and Han Chinese

    – Allows comparison of SNP sets’ performance – Representative of major U.S. populations

  • HID SNP Genotyper Plugin (v4.1 Beta) New Feature – Ancestry Map

    • Heatmap of highest probability of origin

  • Ancestry Prediction SRM 2391c Component A

    SRM 2391c Component

    Gender Ethnicity Kidd 55

    Prediction Seldin 128 Prediction

    A Female Not listed European 1.02 x 1033

    European 6.32 x 1066

    Kidd 55 SNPs Seldin 128 SNPs

  • Ancestry Prediction SRM 2391c Component B

    SRM 2391c Component

    Gender Ethnicity Kidd 55

    Prediction Seldin 128 Prediction

    B Male Mexican-American

    European 5.39 x 1012

    Han Chinese 1.48 x 1019

    Kidd 55 SNPs Seldin 128 SNPs

  • Ancestry Prediction SRM 2391c Component C

    SRM 2391c Component

    Gender Ethnicity Kidd 55

    Prediction Seldin 128 Prediction

    C Male Melanesian Han Chinese 1.54 x 1014

    Han Chinese 6.67 x 1028

    Kidd 55 SNPs Seldin 128 SNPs

  • Ancestry Prediction SRM 2391c Component E

    SRM 2391c Component

    Gender Ethnicity Kidd 55

    Prediction Seldin 128 Prediction

    E Female Not listed European 5.41 x 1021

    European 3.92 x 1050

    Kidd 55 SNPs Seldin 128 SNPs

  • Ancestry Prediction SRM 2391c Component F

    SRM 2391c Component

    Gender Ethnicity Kidd 55

    Prediction Seldin 128 Prediction

    F Male Caucasian European 2.35 x 1031

    European 1.16 x 1055

    Kidd 55 SNPs Seldin 128 SNPs

  • HID SNP Panel Ancestry Summary

    • 170 SNP panel containing two SNP sets that are suitable for use in U.S.

    • Plug-in integrates FROG-kb (http://frog.med.yale.edu/FrogKB/)

    • Heat maps give quick overview

    • Interpretation tools being developed

    – Combining loci

    – Choosing/combining populations

    http://frog.med.yale.edu/FrogKB/http://frog.med.yale.edu/FrogKB/

  • Conclusions

    • NGS can give more information on currently used forensic markers

    – More STRs and STR sequence info

    – Whole genome mtDNA

    • NGS facilitates genotyping of forensic SNPs

    • SNPs may help with low level & degraded samples

    • SNPs may provide ancestry (and phenotype?) information

    • Forensic NGS kits/methods are being developed

    • Many questions to answer prior to implementation

  • Acknowledgements

    Dr. Peter Vallone Group Leader

    Kevin Kiesler Research Biologist

    THANK YOU

    Funding from the FBI Biometrics Center of Excellence

    Forensic DNA Typing as a Biometric Tool

    Thermo Fisher (Life Tech):

    Nnamdi Ihuegbu

    Robert Lagace

  • Applied

    Genetics Thank you for your attention!

    Contact Info:

    [email protected]

    301-975-6401

    mailto:[email protected]