Measuring Rates of mtDNA Heteroplasmy Using a NextGen Sequencing Approach

Mitchell M. Holland, Ph.D.

Former Director, Forensic Science Program

Associate Professor, Biochem & MolBio

Penn State University, University Park, PA

www.forensics.psu.edu

NC State University

15 Sep 2015

Types of DNA in the Cell

Nucleus: Nuclear DNA

Mitochondria: Mitochondrial DNA (High Copy Number Genome)

Mitochondria are membrane-enclosed organelles distributed through the cytosol of most eukaryotic cells

Identification of Nicholas Romanov

Tsar Nicholas II

Family Reference: 5 Generations Removed

Georgij Romanov

LR = 150

LR = 375,000 when the heteroplasmy is considered

“Substitution” Rate of mtDNA

We compared DNA sequences of the two CR hyper-variable segments from close maternal relatives, from 134 independent mtDNA lineages spanning 327 generational events.

Germline Bottleneck

Family Reference: 5 Generations Removed

DGGE to Identify Heteroplasmy

Used DGGE analysis to identify the heteroplasmic sequences … including from the distant maternal relative

“Substitution” Rate of mtDNA

Ten substitutions were observed, resulting in an empirical rate of 1/33 generations, or 2.5/site/Myr. This is roughly twenty-fold higher than estimates derived from phylogenetic studies (0.118 +/- 0.031/site/Myr).

Using our empirical rate to calibrate the mtDNA molecular clock would result in an age for the mtDNA MRCA of only ~6,500 years ago, clearly incompatible with the known age of modern humans.
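The arithmetic behind these figures can be sketched as follows. The counts (10 substitutions, 327 generational events) and the two per-site rates come from the slides. The 20-year generation time and the 610-site segment length used for the per-site conversion are illustrative assumptions, not values stated here.

```python
# Sketch of the rate arithmetic; function and variable names are illustrative.

def per_generation_rate(substitutions: int, generations: int) -> float:
    """Observed substitutions per generational event."""
    return substitutions / generations


def per_site_per_myr(substitutions, generations, years_per_generation, sites):
    """Convert raw counts to substitutions per site per million years."""
    total_site_years = generations * years_per_generation * sites
    return substitutions / total_site_years * 1e6


rate = per_generation_rate(10, 327)            # counts from the slides
generations_per_substitution = 1 / rate        # about 33

# ASSUMPTION: 20 years per generation and 610 sites are hypothetical inputs.
empirical = per_site_per_myr(10, 327, years_per_generation=20, sites=610)

fold = 2.5 / 0.118                             # empirical vs phylogenetic rate

print(f"1 substitution per {generations_per_substitution:.1f} generations")
print(f"{empirical:.2f} substitutions/site/Myr under the assumed inputs")
print(f"{fold:.1f}-fold higher than the phylogenetic estimate")
```

Under these assumed inputs the conversion lands close to the quoted 2.5/site/Myr, and the ratio of the two rates is a little over twenty.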

Genetic Bottlenecks & Empirical Mutation Rates

The germline mutation rate is 0.13 mutations/site/Myr (compared to phylogenetic rate estimates of 0.118)

The number of mtDNA molecules “transmitted” to the next generation is 30-35 (human germline bottleneck)

Non-synonymous mutations showed signs of purifying selection
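To give a feel for what a bottleneck of 30-35 molecules implies for a heteroplasmic variant, the sketch below treats transmission as a single round of binomial sampling. That model is a simplifying assumption introduced here, not the method used in the cited study, and the 10% starting frequency is a hypothetical input.

```python
import math


def drift_sd(freq: float, bottleneck: int) -> float:
    """Standard deviation of the transmitted variant frequency when
    `bottleneck` molecules are sampled binomially."""
    return math.sqrt(freq * (1.0 - freq) / bottleneck)


def loss_probability(freq: float, bottleneck: int) -> float:
    """Chance that none of the sampled molecules carries the variant."""
    return (1.0 - freq) ** bottleneck


maternal_freq = 0.10  # ASSUMPTION: hypothetical 10% heteroplasmy in the mother
for n in (30, 35):    # bottleneck sizes quoted above
    print(n,
          round(drift_sd(maternal_freq, n), 3),
          round(loss_probability(maternal_freq, n), 3))
```

Even in this simplified model, a small bottleneck produces a shift of several percentage points in a single transmission, which is why maternal relatives can carry different heteroplasmic profiles.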

Proceedings of the National Academy of Sciences (2014)

Using an NGS Approach

Hypothesis: An NGS approach will allow for the routine detection and reporting of mtDNA heteroplasmy, including low level variants. Differences in heteroplasmic profiles may allow for the differentiation of maternal relatives.

Problem: Forensic labs still don’t have a suitable method for detecting and reporting mtDNA heteroplasmy. Even high levels of heteroplasmy go unreported, lowering the discrimination potential of the typing system.

Here we are in 2015, and …

Our Initial Work: 2009-2011

www.isabs.hr

www.forensics.psu.edu (Under Mitch Holland’s Research Page)

www.cmj.hr

Croat Med J (2011), 52, pp. 299-313

Using the 454 LifeSciences GS Junior Instrument & Chemistry

Sample | Sanger mtDNA Profile | Percent of Minor Heteroplasmy & Site (Sanger) | 454 GS Junior mtDNA Profile | Percent of Minor Heteroplasmy & Site (454 GS Junior)
F2 | 16069T, 16093C, 16126C, 16261T, 16274A, 16355T | 16311 – 18.4% C | 16069T, 16093C, 16126C, 16261T, 16274A, 16355T | 16093 – 3.71% T; 16261 – 1.29% C; 16311 – 20.14% C
F3 | 16069T, 16126C, 16145A, 16172C, 16261T | Not Detected | 16069T, 16126C, 16145A, 16172C, 16261T | Not Detected
F4 | No polymorphisms | Not Detected | No polymorphisms | Not Detected
F5 | 16129A, 16172C, 16223T, 16311C | Not Detected | 16129A, 16172C, 16223T, 16311C | 16129 – 0.51% G; 16311 – 0.33% T
F7, F12-13, M13-14 | 16192T, 16256T, 16270T | Not Detected | 16192T, 16256T, 16270T | 16192 – 2.64-4.50% C
F8 | 16223T, 16362C | Not Detected | 16223T, 16362C | 16223 – 1.86% C
F9 | 16356C | Not Detected | 16356C | Not Detected
F10 | 16298C | Not Detected | 16298C | 16298 – 0.45% T
F16 | 16126C, 16239T, 16294T, 16296T, 16304C | Not Detected | 16126C, 16239T, 16294T, 16296T, 16304C | Not Detected
F25 | 16343G | Not Detected | 16343G | Not Detected
F26 | 16093C | Not Detected | 16093C | Not Detected
F27 | 16172C, 16278T | Not Detected | 16172C, 16278T | Not Detected
M3 | 16355T | Not Detected | 16355T | Not Detected
M4 | 16111T | Not Detected | 16111T | 16111 – 0.52% C

Evaluated 30 individuals from 25 different mtDNA lineages

0.33% or 1/300

Table 3, Holland et al, CMJ 2011

Concordance

3.71% C/T Heteroplasmy

1.29% T/C Heteroplasmy

20.14% C/T Heteroplasmy

Sanger versus NGS Heteroplasmy Detection

[Figure panels: Sanger, NGS]

Figure 2, Holland et al, CMJ 2011

Other Examples

PGM capable of producing quality, reliable mtDNA sequence data

64 mtgenomes

<0.02% Differences from Sanger Data

Most Differences in Homopolymeric Stretches

Concordance

Sample | Sanger mtDNA Profile | Percent of Minor Heteroplasmy & Site (Sanger) | 454 GS Junior mtDNA Profile | Percent of Minor Heteroplasmy & Site (454 GS Junior)
M5 | 16114A, 16129A, 16192T, 16213A, 16223T, 16278T, 16355T, 16362C | Not Detected | 16114A, 16129A, 16192T, 16213A, 16223T, 16278T, 16355T, 16362C | 16192 – 3.18% C
M7 | 16129A, 16223T, 16264T | Not Detected | 16129A, 16223T, 16264T | Not Detected
M8 | 16224C, 16311C | Not Detected | 16224C, 16311C | Not Detected
M9 | 16301T, 16343G, 16356C | Not Detected | 16301T, 16343G, 16356C | Not Detected
M10 | 16304C | Not Detected | 16304C | 16209 – 2.62% C; 16222 – 2.30% T; 16304 – 2.99% T
M11 | 16129A, 16223T | Not Detected | 16129A, 16223T | Not Detected
M12 | 16069T, 16126C | Not Detected | 16069T, 16126C | 16126 – 1.14% T
M15 | 16093C, 16224C, 16311C | Not Detected | 16093C, 16224C, 16311C | 16093 – 3.04% T
M17 | 16126C, 16294T, 16296T | Not Detected | 16126C, 16294T, 16296T | Not Detected
M18 | 16278T, 16304C, 16311C | Not Detected | 16278T, 16304C, 16311C | 16128 – 0.52% T; 16278 – 0.77% C; 16293 – 0.77% G; 16304 – 1.00% T
M19, F22 | 16069T, 16126C, 16222T | Not Detected | 16069T, 16126C, 16222T | Not Detected

Is low level heteroplasmy reproducible?

Reproducibility

Sample M10: Sanger mtDNA profile 16304C, heteroplasmy Not Detected; 454 GS Junior mtDNA profile 16304C

Sample | Percent of Minor Heteroplasmy & Site (454 GS Junior)
M10 | 16209 – 2.62% C; 16222 – 2.30% T; 16304 – 2.99% T
M10 Replicate #1 | 16209 – 2.58% C; 16222 – 2.03% T; 16304 – 1.87% T
M10 Replicate #2 | 16209 – 2.32% C; 16222 – 2.57% T; 16304 – 0.56% T
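One simple way to summarise the three M10 measurements (the concordance result and the two replicates) is the mean and range of the minor-variant percentage at each site. The sketch below uses only the values listed above. The choice of summary statistics is an illustration, not the analysis reported in the paper.

```python
from statistics import mean

# Minor-variant percentages for sample M10:
# [concordance result, replicate #1, replicate #2]
m10_runs = {
    "16209 C": [2.62, 2.58, 2.32],
    "16222 T": [2.30, 2.03, 2.57],
    "16304 T": [2.99, 1.87, 0.56],
}

summary = {
    site: {
        "mean": round(mean(values), 2),
        "range": round(max(values) - min(values), 2),
    }
    for site, values in m10_runs.items()
}

for site, stats in summary.items():
    print(f"{site}: mean {stats['mean']}%, range {stats['range']}%")
```

All three sites were detected in every run, with the 16304 estimate varying the most between runs.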

Rate of Heteroplasmy

Data Set = 109 Individual Lineages (50 Pairs of Maternal Relatives)

Region | 0.5-1.0% Heteroplasmy | >1% Heteroplasmy | >10% Heteroplasmy
Coding Region | 69% | 50% | 14%
Control Region | 50% | 26% | 8.6%*

*Consistent with previous reports: for example, Irwin et al, J Mol Evol 2009

Things to Consider

If we agree that NGS should be employed in forensic cases then we need to better understand:

rates of heteroplasmy (per sample & per nucleotide)

transmission and drift of heteroplasmic variants

where to set reporting thresholds

how DNA damage will impact thresholds

statistical approaches when reporting heteroplasmy
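As a rough illustration of how a reporting threshold might be applied, the sketch below computes the minor allele frequency at one position from per-base read counts and reports it only when both a coverage floor and a frequency threshold are met. The function name, the read counts, the 1% threshold and the 1,000-read coverage floor are all hypothetical placeholders, not recommendations from this talk.

```python
from typing import Dict, Optional, Tuple


def call_minor_variant(
    base_counts: Dict[str, int],
    min_frequency: float = 0.01,   # ASSUMPTION: illustrative 1% threshold
    min_coverage: int = 1000,      # ASSUMPTION: illustrative coverage floor
) -> Optional[Tuple[str, float]]:
    """Return (minor base, frequency) if it passes both filters, else None."""
    coverage = sum(base_counts.values())
    if coverage < min_coverage:
        return None
    # Rank bases by read count; the second-ranked base is the minor variant.
    ranked = sorted(base_counts.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        return None
    minor_base, minor_reads = ranked[1]
    frequency = minor_reads / coverage
    if frequency < min_frequency:
        return None
    return minor_base, frequency


# Hypothetical read counts at a single position
reported = call_minor_variant({"C": 9600, "T": 380, "A": 12, "G": 8})
below_threshold = call_minor_variant({"C": 9980, "T": 20, "A": 0, "G": 0})
print(reported, below_threshold)
```

Where the threshold sits determines which of the low-level variants shown earlier would be reported, and DNA damage or sequencing error would argue for raising it.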

• mtDNA Control Region

• Buccal swabs from 550 Unrelated individuals

• European descent

• Three age groups

• 18-30, 31-50, >50 yoa

• MiSeq/Nextera XT

• Initial findings

• Haplotypes/Heteroplasmy

Quigley's Cartoons | blog | June 5, 2013 http://www.capecodtoday.com/blogs/quigley/2013/06/05/19650-swabbing-cheek

NIJ 2014-DN-BX-K022

Rate Study

http://forensics.psu.edu/research/dr.-mitchell-holland

Haplotypes

• 265 samples analyzed, thus far

• 222 different haplotypes in the dataset (84%)

• 196/265 unique haplotypes (74%)

• Consistent with previous analyses, but higher percentages due to sequence range analyzed

~72%

~63%

Shared Haplotypes

[Chart values: 16, 7, 2, 1]

Most common haplotype = 16519C, 263G, 315.1C (3%)

Shared by 8/265 individuals
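The percentages on this slide follow directly from the counts, and the shared-haplotype tallies can be cross-checked against them. In the sketch below, the group sizes assigned to the tallies 16, 7, 2 and 1 (haplotypes shared by 2, 3, 4 and 8 individuals) are an assumption. They are not labelled in the extracted slide, but they are one assignment that reconciles with the 265 samples.

```python
samples = 265
distinct_haplotypes = 222
unique_haplotypes = 196          # seen in exactly one individual

print(f"distinct:    {distinct_haplotypes / samples:.0%}")
print(f"unique:      {unique_haplotypes / samples:.0%}")
print(f"most common: {8 / samples:.0%}")

# ASSUMPTION: group size -> number of haplotypes shared by that many individuals
shared = {2: 16, 3: 7, 4: 2, 8: 1}

shared_haplotypes = sum(shared.values())
shared_individuals = sum(size * count for size, count in shared.items())

# Shared haplotypes should equal distinct minus unique,
# and every sample should be accounted for.
assert shared_haplotypes == distinct_haplotypes - unique_haplotypes
assert unique_haplotypes + shared_individuals == samples
```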

Haplogroups

[Chart: number of samples per haplogroup] H, 93; U, 42; R, 27; J, 26; T, 25; K, 23; I, M, N, V, W, X, 20; Native American (C, n=3); African (L, n=6)

Heteroplasmy

[Bar chart: Observations of Heteroplasmy (y-axis, 0 to 70) by category: No Heteroplasmy, One Site, Two Sites, Three Sites, Four Sites, Five Sites, At Least One, At Least Two, At Least Three, At Least Four, At Least Five]

Observations of Heteroplasmy

1-10% MAF; >10% MAF

13% individuals

60% individuals

NOTE: >10% means at least one site above this value

Heteroplasmy v. Age

[Bar chart: Samples with No Heteroplasmy, Normalized for Sample Set Size (y-axis, 0 to 45), for 18-30 Years of Age, 31-50 Years of Age, and >50 Years of Age; bar labels 27%, 24%, 26%]

Heteroplasmy v. Age

[Bar chart: Number of Samples (y-axis, 0 to 45) with heteroplasmy at One Site, Two Sites, Three Sites, Four Sites, or Five or More Sites, grouped by 18-30 Years of Age, 31-50 Years of Age, and >50 Years of Age]

18-30: 26% individuals; 31-50: 27% individuals; >50: 43% individuals

Heteroplasmy v. Site

Cold Spots; Hot Spots

65% in HV1, 21% in HV2, 14% Outside HV1/HV2

Likelihood Ratio

$$\mathrm{LR} = \frac{p(E_1 \mid R) \times p(E_2 \mid R)}{p(E_1 \mid R') \times p(E_2 \mid R')}$$

$p(E_1 \mid R)$ = the probability of the evidence (match between Georgij and Nicholas) given the hypothesis that the remains are those of Nicholas Romanov

$E_2$ = the co-occurrence of heteroplasmy (the second piece of evidence)

$R'$ = the hypothesis that the remains are unrelated

LR = 375,000

Increase Discrimination Potential
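Because the two pieces of evidence enter the ratio as a product, the contribution of the heteroplasmy can be backed out from the two LR values quoted in this talk (150 and 375,000). The sketch below assumes that the LR of 150 reflects the sequence match alone. The function and variable names are illustrative, and the probabilities passed to the function are hypothetical values chosen only so that each term reproduces its LR.

```python
def likelihood_ratio(p_e1_r, p_e2_r, p_e1_alt, p_e2_alt):
    """LR = [p(E1|R) * p(E2|R)] / [p(E1|R') * p(E2|R')]."""
    return (p_e1_r * p_e2_r) / (p_e1_alt * p_e2_alt)


lr_match_only = 150            # ASSUMPTION: sequence match (E1) alone
lr_with_heteroplasmy = 375_000

# The product form means the overall LR factors into one term per piece
# of evidence, so the heteroplasmy term is the quotient of the two values.
lr_heteroplasmy = lr_with_heteroplasmy / lr_match_only
print(lr_heteroplasmy)  # 2500.0

# ASSUMPTION: hypothetical probabilities that reproduce each term.
combined = likelihood_ratio(1.0, 1.0, 1 / lr_match_only, 1 / lr_heteroplasmy)
print(round(combined))
```

On this reading, taking the shared heteroplasmy into account multiplies the weight of the evidence by 2,500.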

Differentiate Between Maternal Relatives

#1098

Primary Haplotype: A263G, T16093C, C16261T, C16291C, T16311C, T16362C, T16519C

Heteroplasmy Positions: 200 A/G (3.0%), 16093T/C (12.6%)

#1100

Primary Haplotype: A263G, T16093C, C16261T, C16291C, T16311C, T16362C, T16519C

Heteroplasmy Positions: 16093C/T (3.4%)
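The comparison between these two samples can be expressed as a small sketch: the primary haplotypes are identical, so any discrimination comes from the heteroplasmic positions. The data structures and the helper name are illustrative assumptions. The values are those listed above, transcribed as they appear on the slide.

```python
primary = ["A263G", "T16093C", "C16261T", "C16291C",
           "T16311C", "T16362C", "T16519C"]

# position -> (heteroplasmic call, percentage as listed on the slide)
samples = {
    "#1098": {"haplotype": primary,
              "heteroplasmy": {200: ("A/G", 3.0), 16093: ("T/C", 12.6)}},
    "#1100": {"haplotype": primary,
              "heteroplasmy": {16093: ("C/T", 3.4)}},
}


def compare(a, b):
    """Report whether the haplotypes match and which heteroplasmic
    positions are shared or seen in only one sample."""
    het_a, het_b = set(a["heteroplasmy"]), set(b["heteroplasmy"])
    return {
        "same_haplotype": a["haplotype"] == b["haplotype"],
        "shared_positions": sorted(het_a & het_b),
        "only_in_first": sorted(het_a - het_b),
        "only_in_second": sorted(het_b - het_a),
    }


result = compare(samples["#1098"], samples["#1100"])
print(result)
```

Both samples are heteroplasmic at 16093, but only #1098 shows the variant at position 200, which is the kind of difference that could separate maternal relatives.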

Issues Still to Address: Forensic Context

Reporting mechanism for heteroplasmy

Weight of a heteroplasmic match

Impact of maternal transmission of heteroplasmic variants

Impact of drift in heteroplasmic variants at the tissue level

Thanks!!

Illumina: Cydne Holt, Kathy Stephens, Joe Valaro, Carey Davis, Dan Gheba, etc

SoftGenetics – NextGENe®: John Fosnacht, Teresa Snyder-Leiby, etc

Penn State: Kateryna Makova, Anton Nekrutenko

Mitotyping Technologies: Bob Bever, et al

Battelle Memorial Institute

National Institute of Justice (NIJ 2014-DN-BX-K022)

Eberly College of Science, Forensic Science Program

Current Research Group

Jen McElhoe, Research Associate (NIJ)

Master’s Students: Molly Rathbun (damage), Laura Wilson (D-loop val), Elena Zavala (bone extr), Jamie Gallimore (drift)

UG Students: Alyssa Duffy, Jillian Baker, Erica Pack

Walther Parson & Ann Gross

www.forensics.psu.edu

mmh20@psu.edu

Thanks for your hospitality!!
