7/24/2019 Introduction to Bioinformatics-5
Introduction to
Bioinformatics
Introduction to Bioinformatics.
LECTURE 5: Variation within and between species
* Chapter 5: Are Neanderthals among us?
Neandertal, Germany, 1856
Initial interpretations:
* bear skull
* pathological idiot
* Old Dutchman...
http://upload.wikimedia.org/wikipedia/commons/3/37/Neandertal_1856.jpg
LECTURE 5: INTER- AND INTRASPECIES VARIATION
5.1 Variation in DNA sequences
* Even closely related individuals differ in their genetic sequences
* Point mutations: copy errors at a certain location
* Sexual reproduction: diploid genome
Diploid chromosomes
Mitosis: diploid reproduction
Meiosis: diploid (double) → haploid (single)
* Typing error rate (very good typist): 1 error / 1K typed letters
* All our diploid cells constantly reproduce billions of letters
* Typical cell copying error rate is ~1 error / 1 Gbp
GERM LINE
Reverse time and follow your cells:
> Now you count ~10^13 cells
> One generation ago you had 2 cells (somewhere) in your parents' bodies
> Small T: T generations ago you had 2^T cells, spread over multiple ancestors
> Large T: T generations ago you counted fertile ancestors' cells
> Congratulations: you are 3.4 billion years old
Fast-forward time and follow your cells:
> Only a few cells in your reproductive organs have a chance to live on
in the next generations
> The rest, including you, will die ...
GERM LINE MUTATIONS
This potentially immortal lineage of germ cells is
called the GERM LINE
All mutations that we have accumulated were passed on along
the germ line
* Polymorphism: multiple possibilities (alleles) for a nucleotide
* Single Nucleotide Polymorphism = SNP ("snip"): point mutation
example: AAAAAAA vs AAACAAA
* Humans: SNP ≈ 1/1500 bases ≈ 0.067%
* STR = Short Tandem Repeats (microsatellites)
example: CACACACACACACACACA ...
* Transition / transversion
Purines & Pyrimidines
Transitions & Transversions
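The distinction can be made concrete with a small classifier. This is a minimal sketch; the function names `substitution_type` and `count_types` are hypothetical and only illustrate the rule that a transition stays within the purines (A, G) or within the pyrimidines (C, T), while a transversion swaps one class for the other.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(a: str, b: str) -> str:
    """Return 'none', 'transition' or 'transversion' for the change a -> b."""
    a, b = a.upper(), b.upper()
    if a == b:
        return "none"
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    # Otherwise a purine was replaced by a pyrimidine (or vice versa)
    return "transversion"

def count_types(seq1: str, seq2: str) -> dict:
    """Count transitions and transversions between two aligned, equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and of equal length")
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq1, seq2):
        kind = substitution_type(a, b)
        if kind != "none":
            counts[kind] += 1
    return counts

print(substitution_type("A", "G"))  # transition
print(substitution_type("A", "C"))  # transversion
print(count_types("ACGTACGT", "GCATACCT"))
```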
5.2 Mitochondrial DNA
* Mitochondria are inherited only via the maternal line
* Very suitable for comparing evolution: mtDNA is not reshuffled (no recombination)
H. sapiens mitochondrion
EM photograph of H. sapiens mtDNA
5.3 Variation between species
* Genetic variation accounts for morphological,
physiological and behavioral variation
* Genetic variation (i.e. distance) relates to phylogenetic
relationship
* Necessity to measure distances between sequences: a metric
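One simple distance between aligned sequences is the proportion of sites at which they differ (the p-distance, a normalized Hamming distance). The slides do not fix a particular metric at this point, so treat this as an illustrative choice; `p_distance` is a hypothetical name. It is the same quantity used as the observed distance d further on.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(seq1, seq2))
    return mismatches / len(seq1)

x, y, z = "ACGTACGT", "ACGTACGA", "TCGTACGA"
# Metric properties: d(x, x) = 0, symmetry, triangle inequality
print(p_distance(x, y), p_distance(y, z), p_distance(x, z))
```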
Substitution rate
* Mutations originate in single individuals
* Mutations can become fixed in a population
* Mutation rate: rate at which new mutations arise
* Substitution rate: rate at which a species fixes new mutations
* For neutral mutations
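Substitution rate and mutation rate
* For neutral mutations
* ρ = 2Nμ · 1/(2N) = μ: in a diploid population of N individuals, 2Nμ new mutations arise per site per generation, and each neutral one is fixed with probability 1/(2N)
* ρ = K/(2T), with K the genetic distance between two species that diverged T generations ago (both lineages accumulate substitutions)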
5.4 Estimating genetic distance
* Substitutions are independent
* Substitutions are random
* Multiple substitutions may occur
* Back-mutations mutate a nucleotide back to an earlier value
Multiple substitutions and back-mutations
conceal the real genetic distance
GACTGATCCACCTCTGATCCTTTGGAACTGATCGT
TTCTGATCCACCTCTGATCCTTTGGAACTGATCGT
TTCTGATCCACCTCTGATCCATCGGAACTGATCGT
GTCTGATCCACCTCTGATCCATTGGAACTGATCGT
observed: 2 differences → d
actual: 6 substitutions (at 4 sites) → K
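The counts for the four sequences above can be reproduced by reading them as successive generations of one lineage: the actual number of substitutions is the sum of changes between consecutive generations, while the observed distance only compares the first and the last sequence. The helper name `hamming` is illustrative.

```python
# The four sequences from the slide, read as successive generations of one lineage
path = [
    "GACTGATCCACCTCTGATCCTTTGGAACTGATCGT",
    "TTCTGATCCACCTCTGATCCTTTGGAACTGATCGT",
    "TTCTGATCCACCTCTGATCCATCGGAACTGATCGT",
    "GTCTGATCCACCTCTGATCCATTGGAACTGATCGT",
]

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

# Substitutions actually made along the path
actual = sum(hamming(a, b) for a, b in zip(path, path[1:]))
# Differences visible when only the first and last sequences are compared
observed = hamming(path[0], path[-1])
# Sites that were hit at least once
sites_hit = {i for a, b in zip(path, path[1:])
             for i, (x, y) in enumerate(zip(a, b)) if x != y}
print(observed, actual, len(sites_hit))  # 2 6 4
```

Two of the six substitutions are hidden by back-mutations (G→T→G at the first site and T→C→T at another), which is exactly why d underestimates K.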
* Saturation: on average one substitution per site
* Two random sequences of equal length will match
for approximately ¼ of their sites
* In saturation, therefore, the proportional genetic
distance is ¾
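A quick simulation illustrates the saturation limit: two independent random sequences with equal base frequencies agree at about ¼ of their sites, so their observed distance tends to ¾. The seed and sequence length are arbitrary choices for this sketch.

```python
import random

def random_seq(n: int, rng: random.Random) -> str:
    """Uniformly random DNA sequence of length n."""
    return "".join(rng.choice("ACGT") for _ in range(n))

rng = random.Random(42)  # fixed seed for reproducibility
n = 100_000
s1, s2 = random_seq(n, rng), random_seq(n, rng)
match = sum(a == b for a, b in zip(s1, s2)) / n
print(f"match = {match:.3f}, distance = {1 - match:.3f}")  # close to 0.25 and 0.75
```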
* True genetic distance (proportion): K
* Observed proportion of differences: d
* Due to back-mutations: K ≥ d
The Jukes-Cantor model: correction for multiple substitutions
Substitution probability per site per second is α
Substitution means there are 3 possible replacements,
e.g. C → {A, G, T}
Non-substitution means there is 1 possibility, e.g. C → C
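Therefore, the one-step Markov process has the following transition matrix:
$$
M_{JC} = \begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & 1-\alpha & \alpha/3 & \alpha/3 & \alpha/3 \\
C & \alpha/3 & 1-\alpha & \alpha/3 & \alpha/3 \\
G & \alpha/3 & \alpha/3 & 1-\alpha & \alpha/3 \\
T & \alpha/3 & \alpha/3 & \alpha/3 & 1-\alpha
\end{array}
$$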
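The matrix can be built directly with NumPy. This sketch (the function name `jc_matrix` is illustrative) uses the base order A, C, G, T and checks that each row is a probability distribution.

```python
import numpy as np

def jc_matrix(alpha: float) -> np.ndarray:
    """One-step Jukes-Cantor transition matrix (order A, C, G, T)."""
    M = np.full((4, 4), alpha / 3)   # each of the 3 replacements has probability alpha/3
    np.fill_diagonal(M, 1 - alpha)   # no substitution with probability 1 - alpha
    return M

M = jc_matrix(0.01)
print(M.sum(axis=1))  # every row sums to 1: the matrix is stochastic
```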
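After t generations the substitution probability is:
$$M^t = M_{JC}^t$$
Eigenvalues and eigenvectors of M_JC:
$$\lambda_1 = 1 \ \text{(multiplicity 1)}, \qquad v_1 = \tfrac{1}{\sqrt{4}}\,(1,\ 1,\ 1,\ 1)^{\mathsf T}$$
$$\lambda_{2,3,4} = 1 - \tfrac{4\alpha}{3} \ \text{(multiplicity 3)}, \qquad v_2 = \tfrac{1}{\sqrt{4}}\,(-1,\ -1,\ 1,\ 1)^{\mathsf T},\quad v_3 = \tfrac{1}{\sqrt{4}}\,(1,\ -1,\ -1,\ 1)^{\mathsf T},\quad v_4 = \tfrac{1}{\sqrt{4}}\,(1,\ -1,\ 1,\ -1)^{\mathsf T}$$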
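These eigenpairs can be verified numerically. The sketch picks an arbitrary α and checks both the eigenvalues and that the columns of V (the vectors v1 to v4 above, each scaled by 1/√4 = ½) are orthonormal eigenvectors.

```python
import numpy as np

alpha = 0.1
M = np.full((4, 4), alpha / 3)
np.fill_diagonal(M, 1 - alpha)

eigvals = np.sort(np.linalg.eigvalsh(M))  # M is symmetric, so eigvalsh applies
print(eigvals)  # three copies of 1 - 4*alpha/3, then 1

# Columns are v1..v4 (each scaled by 1/sqrt(4) = 0.5)
V = 0.5 * np.array([[1, 1, 1, 1],
                    [-1, -1, 1, 1],
                    [1, -1, -1, 1],
                    [1, -1, 1, -1]], dtype=float).T
lam = np.array([1, 1 - 4 * alpha / 3, 1 - 4 * alpha / 3, 1 - 4 * alpha / 3])
print(np.allclose(M @ V, V * lam))  # True: column i satisfies M v_i = lambda_i v_i
```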
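Spectral decomposition of M^t:
$$M_{JC}^t = \sum_i \lambda_i^t \, v_i v_i^{\mathsf T}$$
Define M^t as:
$$M_{JC}^t = \begin{pmatrix} r(t) & s(t) & s(t) & s(t) \\ s(t) & r(t) & s(t) & s(t) \\ s(t) & s(t) & r(t) & s(t) \\ s(t) & s(t) & s(t) & r(t) \end{pmatrix}$$
Therefore, the substitution probability s(t) per site after t generations is:
$$s(t) = \tfrac14 - \tfrac14\left(1 - \tfrac{4\alpha}{3}\right)^t$$
and the probability of no net change is r(t) = 1 - 3s(t) = ¼ + ¾(1 - 4α/3)^t.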
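The closed forms for s(t) and r(t) can be compared with a direct matrix power. The values of α and t below are arbitrary; the helper names are illustrative.

```python
import numpy as np

def jc_matrix(alpha: float) -> np.ndarray:
    M = np.full((4, 4), alpha / 3)
    np.fill_diagonal(M, 1 - alpha)
    return M

def s_closed(alpha: float, t: int) -> float:
    # probability that a site now shows one particular different nucleotide
    return 0.25 - 0.25 * (1 - 4 * alpha / 3) ** t

def r_closed(alpha: float, t: int) -> float:
    # probability that the site shows its original nucleotide
    return 0.25 + 0.75 * (1 - 4 * alpha / 3) ** t

alpha, t = 0.01, 200
Mt = np.linalg.matrix_power(jc_matrix(alpha), t)
print(np.isclose(Mt[0, 1], s_closed(alpha, t)), np.isclose(Mt[0, 0], r_closed(alpha, t)))
```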
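Substitution probability s(t) per site after t generations:
$$s(t) = \tfrac14 - \tfrac14\left(1 - \tfrac{4\alpha}{3}\right)^t$$
Observed genetic distance d after t generations = 3 s(t):
$$d = \tfrac34 - \tfrac34\left(1 - \tfrac{4\alpha}{3}\right)^t$$
For small α, using ln(1 - 4α/3) ≈ -4α/3:
$$t = -\frac{3}{4\alpha}\ln\left(1 - \frac43 d\right)$$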
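For small α, the observed genetic distance d thus determines the elapsed time:
$$t = -\frac{3}{4\alpha}\ln\left(1 - \frac43 d\right)$$
The actual genetic distance is of course:
$$K = \alpha t$$
So:
$$K = -\frac34\ln\left(1 - \frac43 d\right)$$
This is the Jukes-Cantor formula: it is independent of α and t.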
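As a round-trip check, the sketch below generates the expected d after t generations from the forward model and then applies the Jukes-Cantor formula; the result should be close to αt, with a small deviation from the small-α approximation. The function names and parameter values are illustrative.

```python
import math

def jukes_cantor(d: float) -> float:
    """Jukes-Cantor corrected distance K from the observed proportion of differences d."""
    if not 0 <= d < 0.75:
        raise ValueError("d must lie in [0, 0.75); at d >= 0.75 the correction is undefined")
    return -0.75 * math.log(1 - 4 * d / 3)

def observed_distance(alpha: float, t: int) -> float:
    """Forward model: expected d after t generations with per-site substitution probability alpha."""
    return 0.75 - 0.75 * (1 - 4 * alpha / 3) ** t

alpha, t = 0.001, 500
d = observed_distance(alpha, t)
print(round(d, 4), round(jukes_cantor(d), 4), alpha * t)
```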
The Jukes-Cantor formula:
$$K = -\frac34\ln\left(1 - \frac43 d\right)$$
For small d, using ln(1 + x) ≈ x: K ≈ d. So: actual distance ≈ observed distance.
For saturation, d → ¾: K → ∞. So: if the observed distance corresponds to the random-sequence distance, the actual distance becomes indeterminate.
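Tabulating the formula shows both limits: the ratio K/d stays close to 1 for small d and K grows without bound as d approaches ¾. The chosen d values are arbitrary.

```python
import math

def jukes_cantor(d: float) -> float:
    return -0.75 * math.log(1 - 4 * d / 3)

for d in (0.01, 0.1, 0.3, 0.5, 0.7, 0.74, 0.749):
    K = jukes_cantor(d)
    print(f"d = {d:.3f}  K = {K:.4f}  K/d = {K / d:.3f}")
```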
[Figure: Jukes-Cantor]
Variance in K
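If K = f(d), then to first order:
$$K - \langle K \rangle \approx \frac{dK}{dd}\,\bigl(d - \langle d \rangle\bigr)$$
So:
$$\mathrm{Var}(K) \approx \left(\frac{dK}{dd}\right)^2 \mathrm{Var}(d)$$
Generation of a sequence of length n with substitution rate d is a binomial process, with k the number of substituted sites:
$$\mathrm{Prob}(k) = \binom{n}{k} d^k (1-d)^{n-k}$$
and therefore the observed proportion d = k/n has variance Var(d) = d(1-d)/n.
Because of the Jukes-Cantor formula:
$$\frac{dK}{dd} = \frac{1}{1 - \frac43 d}$$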
Variance in K
Variance: Var(d) = d(1-d)/n
Jukes-Cantor:
$$\frac{dK}{dd} = \frac{1}{1 - \frac43 d}$$
So:
$$\mathrm{Var}(K) = \frac{d(1-d)}{n\left(1 - \frac43 d\right)^2}$$
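The variance formula can be evaluated directly to see how the uncertainty in K grows as d approaches saturation. The sequence length n = 1000 and the d values are illustrative choices.

```python
import math

def var_K(d: float, n: int) -> float:
    """Approximate variance of the Jukes-Cantor distance for n aligned sites (first-order approximation)."""
    return d * (1 - d) / (n * (1 - 4 * d / 3) ** 2)

n = 1000
for d in (0.1, 0.4, 0.7):
    print(f"d = {d}: sd(K) = {math.sqrt(var_K(d, n)):.4f}")
```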
[Figure: Var(K)]
EXAMPLE 5.4 on page 90
* Create artificial data with n = 1000: generate K·n mutations
* Count d
* With the Jukes-Cantor relation, reconstruct the estimate K(d)
* Plot K(d) against K
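A minimal sketch of this experiment, assuming the Jukes-Cantor setting (sites hit uniformly at random, repeats allowed, each substitution to one of the 3 other bases with equal probability). The function name `simulate_example` and the seed are hypothetical; plotting is left out, the loop prints the (K, d, K(d)) triples instead.

```python
import math
import random

def simulate_example(K: float, n: int = 1000, seed: int = 0):
    """Apply round(K*n) random substitutions to a random sequence; return (d, K_est)."""
    rng = random.Random(seed)
    original = [rng.choice("ACGT") for _ in range(n)]
    seq = original[:]
    for _ in range(round(K * n)):
        i = rng.randrange(n)                                     # sites hit uniformly, repeats allowed
        seq[i] = rng.choice([b for b in "ACGT" if b != seq[i]])  # one of the 3 other bases
    d = sum(a != b for a, b in zip(original, seq)) / n           # observed proportion of differences
    K_est = -0.75 * math.log(1 - 4 * d / 3) if d < 0.75 else math.inf
    return d, K_est

for K in (0.1, 0.5, 1.0, 2.0):
    d, K_est = simulate_example(K)
    print(f"K = {K:.1f}  d = {d:.3f}  K(d) = {K_est:.3f}")
```

For small K the estimate tracks K closely; for large K, d approaches ¾ and the estimate becomes noisy, in line with the variance formula above.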
[Figures: EXAMPLE 5.4 results]
The Kimura 2-parameter model
Include substitution bias in the correction factor
Transition probability (G↔A and T↔C) per site per second is α
Transversion probability (G↔T, G↔C, A↔T and A↔C) per
site per second is β
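The one-step Markov process substitution matrix now becomes:
$$
M_{K2P} = \begin{array}{c|cccc}
 & A & C & G & T \\ \hline
A & 1-\alpha-2\beta & \beta & \alpha & \beta \\
C & \beta & 1-\alpha-2\beta & \beta & \alpha \\
G & \alpha & \beta & 1-\alpha-2\beta & \beta \\
T & \beta & \alpha & \beta & 1-\alpha-2\beta
\end{array}
$$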
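The matrix follows mechanically from the two rates: α for the single transition partner of each base, β for each of its two transversion partners. This sketch (hypothetical name `k2p_matrix`, order A, C, G, T) builds it from that rule and checks that rows sum to 1.

```python
import numpy as np

BASES = "ACGT"
TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine<->purine, pyrimidine<->pyrimidine

def k2p_matrix(alpha: float, beta: float) -> np.ndarray:
    """One-step Kimura 2-parameter matrix: alpha per transition, beta per transversion."""
    M = np.empty((4, 4))
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if a == b:
                M[i, j] = 1 - alpha - 2 * beta   # no change: 1 transition + 2 transversions excluded
            elif frozenset((a, b)) in TRANSITIONS:
                M[i, j] = alpha
            else:
                M[i, j] = beta
    return M

M = k2p_matrix(0.02, 0.005)
print(M.sum(axis=1))  # each row sums to 1
```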
After t generations the substitution probability is:
$$M^t = M_{K2P}^t$$
Determine the eigenvalues {λ_i} and eigenvectors {v_i} of M^t
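For this matrix the eigenvalues work out to 1, 1 - 4β and 1 - 2α - 2β (twice). That closed form is a standard result rather than something stated on the slide, so the sketch below checks it numerically for arbitrary α and β.

```python
import numpy as np

def k2p_matrix(alpha: float, beta: float) -> np.ndarray:
    # order A, C, G, T; transitions A<->G and C<->T get alpha, transversions get beta
    M = np.full((4, 4), beta)
    M[0, 2] = M[2, 0] = M[1, 3] = M[3, 1] = alpha
    np.fill_diagonal(M, 1 - alpha - 2 * beta)
    return M

alpha, beta = 0.02, 0.005
numeric = np.sort(np.linalg.eigvalsh(k2p_matrix(alpha, beta)))  # symmetric matrix
closed = np.sort([1, 1 - 4 * beta, 1 - 2 * alpha - 2 * beta, 1 - 2 * alpha - 2 * beta])
print(numeric, np.allclose(numeric, closed))
```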
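Spectral decomposition of M^t:
$$M_{K2P}^t = \sum_i \lambda_i^t \, v_i v_i^{\mathsf T}$$
Determine the fraction of transitions per site after t generations: P(t)
Determine the fraction of transversions per site after t generations: Q(t)
Genetic distance:
$$K = -\frac12\ln(1 - 2P - Q) - \frac14\ln(1 - 2Q)$$
Fraction of substitutions: d = P + Q. With α = β (so P = d/3 and Q = 2d/3), K reduces to the Jukes-Cantor formula.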
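The Kimura distance and its Jukes-Cantor special case can be compared directly. This sketch uses hypothetical function names; the second call shows that, for the same d = 0.2, a transition-rich split of P and Q gives a larger corrected distance than Jukes-Cantor.

```python
import math

def kimura_2p(P: float, Q: float) -> float:
    """Kimura 2-parameter distance from transition (P) and transversion (Q) proportions."""
    a, b = 1 - 2 * P - Q, 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("saturated: distance undefined for these P and Q")
    return -0.5 * math.log(a) - 0.25 * math.log(b)

def jukes_cantor(d: float) -> float:
    return -0.75 * math.log(1 - 4 * d / 3)

d = 0.2
print(kimura_2p(d / 3, 2 * d / 3), jukes_cantor(d))  # equal when P = d/3, Q = 2d/3
print(kimura_2p(0.15, 0.05))                          # transition-rich example, same d = 0.2
```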
Other models for nucleotide evolution
* Different types of transitions/transversions
* Pairwise substitutions: GTR = General Time Reversible model
* Amino-acid substitution matrices
* ...
Other models for nucleotide evolution
DEFICIT:
all above models assume symmetric substitution probabilities:
prob(A→B) = prob(B→A)
Now there is strong evidence that this assumption is not true
Challenge: incorporate this in a self-consistent model
5.5 CASE STUDY: Neanderthals
* mtDNA of 206 H. sapiens from different regions
* Fragments of mtDNA of 2 H. neanderthalensis, including the original 1856 specimen
* All 208 samples from GenBank
* A homologous sequence of 800 bp of the HVR (hypervariable region) could be found in all 208 specimens
* Pairwise genetic difference → corrected with the Jukes-Cantor
formula
* d_ij is the JC-corrected genetic difference between pair (i, j)
* D = (d_ij): the distance table
MDS = Multi-Dimensional Scaling: translate distance table
D to an nD-map, here a 2D-map
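The slides do not say which MDS variant was used; the sketch below assumes classical (Torgerson) MDS, which double-centres the squared distance table and keeps the top eigenvectors. The toy distance table is made up (two tight groups of three points) and is not the Neanderthal data; in the case study D would hold the JC-corrected distances.

```python
import numpy as np

def classical_mds(D: np.ndarray, dims: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: embed a symmetric distance matrix D in `dims` dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred squared distances (Gram matrix)
    eigvals, eigvecs = np.linalg.eigh(B)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:dims]   # keep the largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[order], 0, None))  # drop negative (non-Euclidean) parts
    return eigvecs[:, order] * scale

# Hypothetical toy data: two tight groups of three "sequences" each
pts = np.array([[0, 0], [0.1, 0], [0, 0.1], [1, 1], [1.1, 1], [1, 1.1]])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
X = classical_mds(D)
D_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(np.allclose(D, D_hat))  # True: planar distances are recovered exactly
```

Because the toy points really lie in a plane, a 2D map reproduces the table exactly; with real sequence distances the 2D map is only an approximation, and well-separated groups show up as separate clusters.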
[Figure: distance map d(i, j)]
[Figure: MDS map; H. sapiens and H. neanderthalensis are well separated]
[Figure: phylogenetic tree]
END of LECTURE 5