DNA Short Tandem Repeats
Organism
Organ
Cell
Weights
• 1 kg – a bag of sugar
• 1 g – a paper clip
• 1 mg (milligram), 0.001 g – the brain of a bee
• 1 µg (microgram), 0.000001 g – the weight of a bacterium
• 1 ng (nanogram), 0.000000001 g – a millionth of a grain of salt; the recommended input to profiling
• 1 pg (picogram), 0.000000000001 g – each cell contains about 6 pg of DNA
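To connect the last two weights, a rough sketch in Python: if each cell holds about 6 pg of DNA, how many cells make up the recommended 1 ng input? The function name `cells_for_input` is made up for illustration, and the calculation assumes every cell contributes its full 6 pg.

```python
PICOGRAMS_PER_NANOGRAM = 1000  # 1 ng = 1,000 pg
DNA_PER_CELL_PG = 6            # about 6 pg of DNA per cell (figure from the list above)

def cells_for_input(input_ng: float, dna_per_cell_pg: float = DNA_PER_CELL_PG) -> float:
    """Approximate number of cells needed to supply input_ng nanograms of DNA."""
    input_pg = input_ng * PICOGRAMS_PER_NANOGRAM  # convert ng to pg
    return input_pg / dna_per_cell_pg

cells = cells_for_input(1.0)
print(f"1 ng of DNA is roughly {cells:.0f} cells' worth")
```

On these figures the recommended input corresponds to somewhere around 170 cells.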
Cells
• We lose about 30,000-40,000 skin cells an hour
• In a year, you lose about 8 lb of cells
• “Where do they all go? The dust that collects on your tables, TV, windowsills and on those picture frames that are so hard to get clean is made mostly from dead human skin cells. In other words, your house is filled with former bits of yourself.”
• About 10,000 will fit on the head of a pin
• Current DNA technology can profile one cell
Nucleus
Chromosomes
DNA
Locus
STR
Allele
Allele: 5, 3
Locus is important
FGA 3
D3 3
DNA profile
Locus   Allele
A       X, Y
D3      17, 18
vWA     18
D16     11, 12
D2      18, 24
D8      12, 14
D21     29
D18     13, 17
D19     14
THO1    9, 9.3
Heterozygote: two different alleles at a locus (e.g. D3 17, 18)
Homozygote: one allele at a locus (e.g. vWA 18)
The process
• Extraction
• Quantitation
• Amplification
• Separation
• Interpretation
• Evaluation
Amplification = Multiplication
Raw data
Single source profile
One DNA component from mother, another from father
Area of DNA tested
Names of DNA components
Why statistics?
• DNA is NOT unique
• We look at only a few areas
• We need to know the probability of finding the profile by chance (i.e. to give an idea of how many other people may have been the source of the profile)
Statistical estimates
1 in 10 = 0.1
1 in 10 x 1 in 111 x 1 in 20 = 1 in 22,200
1 in 100 x 1 in 14 x 1 in 81 = 1 in 113,400
1 in 116 x 1 in 17 x 1 in 16 = 1 in 31,552
1 in a billion
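The multiplications above can be checked with a short Python sketch. It assumes, as the slide does, that the figures for different areas are independent and can simply be multiplied; `combined_one_in` is an illustrative name, not part of any profiling software.

```python
from math import prod

# Each row holds the "1 in N" figures from one of the worked examples above.
examples = [
    [10, 111, 20],
    [100, 14, 81],
    [116, 17, 16],
]

def combined_one_in(one_in_values):
    """Multiply '1 in N' figures, assuming the areas are independent."""
    return prod(one_in_values)

for row in examples:
    parts = " x ".join(f"1 in {n}" for n in row)
    print(f"{parts} = 1 in {combined_one_in(row):,}")
```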
Probability
• Black hair 0.6
• Blue eyes 0.25
• Beard 0.01
• Gold tooth 0.001
Probability = 0.6 x 0.25 x 0.01 x 0.001
= 0.0000015
= about 1 in 666,667
Random Match Probability
R B
f 0.1 0.1
RB = 2 x 0.1 x 0.1 = 0.02 = 2 in 100 = 1 in 50
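A minimal Python sketch of the RB calculation. The factor of 2 for a two-allele (heterozygote) type matches the 0.02 result on the slide. The one-allele (homozygote) branch, p x p, is a standard Hardy-Weinberg assumption added here for completeness and does not come from the slide. The function name is illustrative.

```python
from typing import Optional

def genotype_frequency(p: float, q: Optional[float] = None) -> float:
    """Expected genotype frequency from allele frequencies.

    Heterozygote (two different alleles): 2 * p * q.
    Homozygote (q omitted): p * p (Hardy-Weinberg assumption, not from the slide).
    """
    if q is None:
        return p * p
    return 2 * p * q

rb = genotype_frequency(0.1, 0.1)
print(f"RB = {rb:.2f} = 1 in {1 / rb:.0f}")  # RB = 0.02 = 1 in 50
```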
Mixtures
RB
RY
RG
BY
BG
YG
= 6 ‘suspect’ profiles that ‘cannot be excluded’ as contributors
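The six pairings can be listed mechanically. This sketch uses Python's `itertools.combinations` and, like the slide, counts only pairs of two different alleles (no RR, BB and so on).

```python
from itertools import combinations

alleles = ["R", "B", "Y", "G"]

# Every way of choosing two different alleles from the four seen in the mixture.
pairs = ["".join(pair) for pair in combinations(alleles, 2)]

print(pairs)       # ['RB', 'RY', 'RG', 'BY', 'BG', 'YG']
print(len(pairs))  # 6
```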
How many suspects?
• With 6 possibilities at each of 15 areas
• There are 6x6x6x6x6x6x6x6x6x6x6x6x6x6x6 = 470,184,984,576
• About 470 billion suspect profiles (10 areas alone already give more than 60 million)
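A quick check of the arithmetic in Python, assuming exactly 6 possibilities at every area:

```python
possibilities_per_area = 6

total_15_areas = possibilities_per_area ** 15
total_10_areas = possibilities_per_area ** 10

print(f"15 areas: {total_15_areas:,}")  # 15 areas: 470,184,984,576
print(f"10 areas: {total_10_areas:,}")  # 10 areas: 60,466,176
```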
Alleles observed on ‘outside’
Locus   First set        Second set
D8      13               13
D21     31.2             29, 31.2, 32.2
D7      8, 10, 11        8, 10, 11, 12
CSF     10, 11, 12       11, 12
D3      16, 17, 18       16, 18
THO1    6, 9, 9.3        6, 7, 8, 9.3
D13     11, 12           11, 12, 13
D16     11, 12, 13, 14   9, 12, 13, 14
D2      17, 19, 25       17, 25
D19     13, 14           13, 14
vWA     14, 15, 16       14, 16, 18
TPOX    8, 11, 12        8, 11
D18     14, 15, 16       14, 16
D5      12, 13           12, 13
FGA     21, 22, 24, 25   20, 21, 24
Alleles observed on ‘outside’
Locus   Alleles
D8      13
D21     29, 31.2, 32.2
D7      8, 10, 11, 12
CSF     10, 11, 12
D3      16, 17, 18
THO1    6, 7, 8, 9, 9.3
D13     11, 12, 13
D16     9, 11, 12, 13, 14
D2      17, 19, 25
D19     13, 14
vWA     14, 15, 16, 18
TPOX    8, 11, 12
D18     14, 15, 16
D5      12, 13
FGA     20, 21, 22, 24, 25
No. of alleles at each locus
Locus   No. of alleles
D8      1
D21     3
D7      4
CSF     3
D3      3
THO1    5
D13     3
D16     5
D2      3
D19     2
vWA     4
TPOX    3
D18     3
D5      2
FGA     5
No of ‘suspect’ profiles
Locus   No. of alleles   No. of ‘suspect’ profiles
D8      1                1
D21     3                3
D7      4                6
CSF     3                3
D3      3                3
THO1    5                10
D13     3                3
D16     5                10
D2      3                3
D19     2                1
vWA     4                6
TPOX    3                3
D18     3                3
D5      2                1
FGA     5                10
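The second row of numbers follows from the first: it is the number of ways of picking two different alleles from those observed at a locus. The Python sketch below reproduces it. Treating a single observed allele as one possible profile mirrors the slide's count for D8 and is an inference from the numbers, not something the slide states.

```python
from math import comb

# Number of alleles observed at each locus (first row of numbers above).
allele_counts = {
    "D8": 1, "D21": 3, "D7": 4, "CSF": 3, "D3": 3,
    "THO1": 5, "D13": 3, "D16": 5, "D2": 3, "D19": 2,
    "vWA": 4, "TPOX": 3, "D18": 3, "D5": 2, "FGA": 5,
}

def suspect_profiles(n_alleles: int) -> int:
    """Pairs of different alleles; a lone allele still counts as one profile."""
    return max(1, comb(n_alleles, 2))

per_locus = {locus: suspect_profiles(n) for locus, n in allele_counts.items()}
print(list(per_locus.values()))
```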
1 x 3 x 6 x 3 x 3 x 10 x 3 x 10 x 3 x 1 x 6 x 3 x 3 x 1 x 10
= 78,732,000 ‘suspect’ profiles
D8
Adding ‘new’ alleles at D8
Locus   Alleles              No. of alleles   No. of ‘suspect’ profiles
D8      9, 11, 13, 14        4                6
D21     29, 31.2, 32.2       3                3
D7      8, 10, 11, 12        4                6
CSF     10, 11, 12           3                3
D3      16, 17, 18           3                3
THO1    6, 7, 8, 9, 9.3      5                10
D13     11, 12, 13           3                3
D16     9, 11, 12, 13, 14    5                10
D2      17, 19, 25           3                3
D19     13, 14               2                1
vWA     14, 15, 16, 18       4                6
TPOX    8, 11, 12            3                3
D18     14, 15, 16           3                3
D5      12, 13               2                1
FGA     20, 21, 22, 24, 25   5                10
472,392,000 (470m) ‘suspect’ profiles
D21
D21 ‘zoom’
Adding ‘new’ alleles at D21
Locus   Alleles                  No. of alleles   No. of ‘suspect’ profiles
D8      9, 11, 13, 14            4                6
D21     28, 29, 30, 31.2, 32.2   5                10
D7      8, 10, 11, 12            4                6
CSF     10, 11, 12               3                3
D3      16, 17, 18               3                3
THO1    6, 7, 8, 9, 9.3          5                10
D13     11, 12, 13               3                3
D16     9, 11, 12, 13, 14        5                10
D2      17, 19, 25               3                3
D19     13, 14                   2                1
vWA     14, 15, 16, 18           4                6
TPOX    8, 11, 12                3                3
D18     14, 15, 16               3                3
D5      12, 13                   2                1
FGA     20, 21, 22, 24, 25       5                10
1,574,640,000 (about 1.6 billion) ‘suspect’ profiles
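The three totals quoted so far (78,732,000, then 472,392,000 after the extra D8 alleles, then 1,574,640,000 after the extra D21 alleles) can be reproduced with a short Python sketch. The pair-counting rule is inferred from the slide's numbers, and `total_profiles` is an illustrative name.

```python
from math import comb, prod

# Alleles observed per locus before any 'new' alleles are added.
base_counts = {
    "D8": 1, "D21": 3, "D7": 4, "CSF": 3, "D3": 3,
    "THO1": 5, "D13": 3, "D16": 5, "D2": 3, "D19": 2,
    "vWA": 4, "TPOX": 3, "D18": 3, "D5": 2, "FGA": 5,
}

def total_profiles(counts: dict) -> int:
    """Multiply the per-locus numbers of allele pairs (a lone allele counts as 1)."""
    return prod(max(1, comb(n, 2)) for n in counts.values())

original = total_profiles(base_counts)
with_d8 = total_profiles({**base_counts, "D8": 4})             # D8 now has 4 alleles
with_d21 = total_profiles({**base_counts, "D8": 4, "D21": 5})  # D21 now has 5 alleles

for label, value in [("original", original), ("plus D8", with_d8), ("plus D21", with_d21)]:
    print(f"{label}: {value:,}")
```

Three extra alleles at one locus multiply the total by six, which is why a few uncertain peaks change the count so sharply.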
Alleles on inside & outside
Loci: D8 D21 CSF D3 THO1 D13 D19 TPOX D18 D5
IN: 13, 14, 31.2, 10, 16, 6, 12, 13, 14, 11, 13, 20
OUT: 13, 29, 31.2, 32.2, 10, 11, 12, 16, 17, 18, 6, 7, 8, 9, 9.3, 11, 12, 13, 13, 14, 8, 11, 12, 14, 15, 16, 20, 21, 22, 24, 25
The Likelihood Ratio = LR
LR = (Probability of this evidence if the DNA came from Mr X + unknown) / (Probability of this evidence if it came from 2 unknowns)
LR = (Probability of E given Hpros) / (Probability of E given Hdef)
“… times more likely”
e.g. LR = (1/10) / (1/100) = 0.1 / 0.01 = 10
For single source profiles, where the profile is found in 1 in ‘frequency’ people:
LR = 1 / (1/frequency) = frequency
e.g. 1/(1/10) = 10
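The two worked examples can be written as one small Python function. The name `likelihood_ratio` and its parameter names are chosen here for illustration only.

```python
def likelihood_ratio(p_e_given_hpros: float, p_e_given_hdef: float) -> float:
    """LR = P(E | Hpros) / P(E | Hdef)."""
    return p_e_given_hpros / p_e_given_hdef

# First example: 1/10 divided by 1/100.
lr_example = likelihood_ratio(1 / 10, 1 / 100)

# Single source profile seen in 1 in 10 people: the numerator is 1.
lr_single_source = likelihood_ratio(1.0, 1 / 10)

print(round(lr_example, 6), round(lr_single_source, 6))
```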
Mixtures
R B Y G
f 0.25 0.25 0.25 0.25
Mr X p(Hp) p(Hd) LR
RB 0.125 0.0469 2.67
RY 0.125 0.0469 2.67
RG 0.125 0.0469 2.67
BY 0.125 0.0469 2.67
BG 0.125 0.0469 2.67
YG 0.125 0.0469 2.67
“Mr X + unknown rather than two unknowns”
R B Y G
f 0.1 0.1 0.25 0.25
Mr X p(Hp) p(Hd) LR
RB 0.125 0.0075 16.67
RY 0.05 0.0075 6.67
RG 0.05 0.0075 6.67
BY 0.05 0.0075 6.67
BG 0.05 0.0075 6.67
YG 0.02 0.0075 2.67
“Mr X + unknown rather than two unknowns”
R B Y G
f 0.01 0.1 0.2 0.5
Mr X p(Hp) p(Hd) LR
RB 0.2 0.0012 166.67
RY 0.1 0.0012 83.33
RG 0.04 0.0012 33.33
BY 0.01 0.0012 8.33
BG 0.004 0.0012 3.33
YG 0.002 0.0012 1.67
“Mr X + unknown rather than two unknowns”
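The tables above can be reproduced with the Python sketch below. The formulas are inferred from the tabulated values and are not stated on the slides. Under Hp, the unknown person must carry the two alleles Mr X lacks, so p(Hp) = 2 x f x f for those two alleles. Under Hd, the value shown equals 12 x fR x fB x fY x fG, which is the three ways of splitting four alleles between two people multiplied by 2 x 2 for the two heterozygotes.

```python
from itertools import combinations
from math import prod

def mixture_lrs(freqs: dict) -> dict:
    """Return {Mr X type: (p_hp, p_hd, lr)} for a four-allele, two-person mixture."""
    alleles = list(freqs)
    # Two unknowns: 3 pairings x (2 x f x f) x (2 x f x f) = 12 x product of all four.
    p_hd = 12 * prod(freqs.values())
    rows = {}
    for a, b in combinations(alleles, 2):
        others = [x for x in alleles if x not in (a, b)]
        # Mr X explains a and b; the unknown must have the remaining two alleles.
        p_hp = 2 * freqs[others[0]] * freqs[others[1]]
        rows[a + b] = (p_hp, p_hd, p_hp / p_hd)
    return rows

table = mixture_lrs({"R": 0.01, "B": 0.1, "Y": 0.2, "G": 0.5})
for mr_x, (p_hp, p_hd, lr) in table.items():
    print(f"{mr_x}  {p_hp:.3f}  {p_hd:.4f}  {lr:.2f}")
```

With these formulas the LR reduces to 1 / (6 x f x f) for Mr X's own two alleles, so the rarer his alleles, the larger the LR.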
More complicated mixture
Second area (locus)
A B C D
AB
AC
AD
BC
BD
CD
= 6 ‘suspect’ profiles that ‘cannot be excluded’ as contributors
Second area only
      AB       AC       AD       BC       BD       CD
RB    88,889   44,444   88,889   22,222   44,444   22,222
RY    3,556    1,778    3,556    889      1,778    889
RG    8,889    4,444    8,889    2,222    4,444    2,222
BY    17,778   8,889    17,778   4,444    8,889    4,444
BG    44,444   22,222   44,444   11,111   22,222   11,111
YG    1,778    889      1,778    444      889      444
“X + unknown rather than two unknowns”
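One way to read the grid is that each cell is the LR for the first area multiplied by the LR for the second area. That is an assumption here, since the slides do not show the per-area values behind it. If it holds, every row should be the same multiple pattern of its last (CD) column. The Python sketch below checks this against the completed grid, allowing for rounding.

```python
columns = ["AB", "AC", "AD", "BC", "BD", "CD"]

# Values copied from the completed grid (rows: first area, columns: second area).
grid = {
    "RB": [88889, 44444, 88889, 22222, 44444, 22222],
    "RY": [3556, 1778, 3556, 889, 1778, 889],
    "RG": [8889, 4444, 8889, 2222, 4444, 2222],
    "BY": [17778, 8889, 17778, 4444, 8889, 4444],
    "BG": [44444, 22222, 44444, 11111, 22222, 11111],
    "YG": [1778, 889, 1778, 444, 889, 444],
}

def ratios_to_last_column(row):
    """Express each cell as a multiple of the row's CD value."""
    return [value / row[-1] for value in row]

expected_pattern = [4, 2, 4, 1, 2, 1]
consistent = all(
    abs(ratio - expected) < 0.02
    for row in grid.values()
    for ratio, expected in zip(ratios_to_last_column(row), expected_pattern)
)
print("Rows share one column pattern:", consistent)
```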
Stochastic variation
Examples so far assume allele calls are certain, but low template samples cause new problems because of stochastic variation.
• Stochastic variation is random variation
• Failure to reproduce results
• Leads to uncertainty
The crimestain
Standard technique
Enough sample so that no dropout is expected and peak height represents the amount of DNA present (i.e. not variable)
Low Template Sample
• Stochastic variation is random variation
• Failure to reproduce results
• Leads to uncertainty
A B C D E F G H I
A B C D E F
Dropout or dropin?
Locus   First set        Second set
D8      13               13
D21     31.2             29, 31.2, 32.2
D7      8, 10, 11        8, 10, 11, 12
CSF     10, 11, 12       11, 12
D3      16, 17, 18       16, 18
THO1    6, 9, 9.3        6, 7, 8, 9.3
D13     11, 12           11, 12, 13
D16     11, 12, 13, 14   9, 12, 13, 14
D2      17, 19, 25       17, 25
D19     13, 14           13, 14
vWA     14, 15, 16       14, 16, 18
TPOX    8, 11, 12        8, 11
D18     14, 15, 16       14, 16
D5      12, 13           12, 13
FGA     21, 22, 24, 25   20, 21, 24
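A small Python sketch that lists the alleles appearing in only one of the two sets of results, since those are the alleles the dropout or dropin question applies to. Reading the data as two sets of results for the same sample is an assumption made here. The grouping of alleles by locus is reconstructed from the slide, and the code cannot say which explanation is the right one.

```python
loci = "D8 D21 D7 CSF D3 THO1 D13 D16 D2 D19 vWA TPOX D18 D5 FGA".split()

# Alleles per locus in each set, in the same order as `loci`.
first_raw = ["13", "31.2", "8 10 11", "10 11 12", "16 17 18", "6 9 9.3", "11 12",
             "11 12 13 14", "17 19 25", "13 14", "14 15 16", "8 11 12",
             "14 15 16", "12 13", "21 22 24 25"]
second_raw = ["13", "29 31.2 32.2", "8 10 11 12", "11 12", "16 18", "6 7 8 9.3",
              "11 12 13", "9 12 13 14", "17 25", "13 14", "14 16 18", "8 11",
              "14 16", "12 13", "20 21 24"]

first = {locus: set(a.split()) for locus, a in zip(loci, first_raw)}
second = {locus: set(a.split()) for locus, a in zip(loci, second_raw)}

# Alleles that were not reproduced between the two sets.
only_in_first = {locus: first[locus] - second[locus] for locus in loci}
only_in_second = {locus: second[locus] - first[locus] for locus in loci}

for locus in loci:
    if only_in_first[locus] or only_in_second[locus]:
        print(locus, "first only:", sorted(only_in_first[locus]),
              "second only:", sorted(only_in_second[locus]))
```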
Probability of dropout and dropin
p(D) is the probability that an allele is really there but you have not detected it.
p(C) is the probability that an allele you have detected is not from the crimestain: it is contamination.
FST statistic
• FST is the programme used to calculate the LR in this case
• Statistic depends on
– Probability of dropout which is
• Usually dependent on the weight of DNA
• Which is unknown for the minor contributors
– And the validation data do not support any p(D) for any weight of DNA
– The LR being correct
Low Template Sample
• Identified by variable results, NOT the amount of DNA
• Causes problems in:
– Identifying ‘true’ sample alleles
– Using peak height information
• Inclusion/exclusion of people
• Number of contributors