Introduction to Bioinformatics-5

  • 7/24/2019 Introduction to Bioinformatics-5

    Introduction to Bioinformatics

    LECTURE 5: Variation within and between species

    * Chapter 5: Are Neanderthals among us?

    Neandertal, Germany, 1856

    Initial interpretations:

    * bear skull
    * pathological idiot
    * Old Dutchman...

    http://upload.wikimedia.org/wikipedia/commons/3/37/Neandertal_1856.jpg

    LECTURE 5: INTER- AND INTRASPECIES VARIATION

    5.1 Variation in DNA sequences

    * Even closely related individuals differ in their genetic sequences
    * Point mutation: a copying error at a certain location
    * Sexual reproduction: diploid genome

    [Figure: Diploid chromosomes]

    [Figure: Mitosis: diploid reproduction]

    [Figure: Meiosis: diploid (= double) → haploid (= single)]

    * Typing error rate of a very good typist: 1 error / 1K typed letters
    * All our diploid cells constantly reproduce billions of letters
    * Typical cell copying error rate is ~1 error / 1 Gbp

    GERM LINE

    Reverse time and follow your cells:

    > Now you count ~10^13 cells
    > One generation ago you had 2 cells (somewhere) in your parents' bodies
    > A small number T of generations ago you had (2^T - multiple ancestors) cells
    > A large number T of generations ago you counted (fertile ancestors) cells
    > Congratulations: you are 3.4 billion years old

    Fast-forward time and follow your cells:

    > Only a few cells in your reproductive organs have a chance to live on in the next generations
    > The rest (including you) will die ...

    GERM LINE MUTATIONS

    This potentially immortal lineage of (germ) cells is called the GERM LINE.

    All mutations that we have accumulated are en route on the germ line.

    * Polymorphism: multiple possibilities for a nucleotide (allele)
    * Single Nucleotide Polymorphism (SNP, "snip"): point mutation
      example: AAATAAA vs AAACAAA
    * Humans: SNP = 1/1500 bases = 0.067%
    * STR = Short Tandem Repeats (microsatellites)
      example: CACACACACACACACACA ...
    * Transition and transversion

    [Figure: Purines & Pyrimidines]

    [Figure: Transitions & Transversions]
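The distinction between transitions and transversions can be sketched in a few lines. This is a minimal illustration using the standard definitions (a transition stays within the purines A/G or within the pyrimidines C/T; a transversion crosses between the two classes); the function name `classify_substitution` is an assumption made for this example, not something taken from the lecture.

```python
# Purines and pyrimidines, the two chemical classes of nucleotides.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(old, new):
    """Return 'transition', 'transversion', or 'none' for a single-base change.

    Hypothetical helper for illustration only.
    """
    old, new = old.upper(), new.upper()
    if old == new:
        return "none"
    # A transition keeps the base inside its own chemical class.
    same_class = {old, new} <= PURINES or {old, new} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))  # purine -> purine
print(classify_substitution("A", "C"))  # purine -> pyrimidine
```

Of the twelve possible single-base changes, four are transitions and eight are transversions.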

    5.2 Mitochondrial DNA

    * Mitochondria are inherited only via the maternal line
    * Very suitable for comparing evolution: not reshuffled

    [Figure: H. sapiens mitochondrion]

    [Figure: EM photograph of H. sapiens mtDNA]

    5.3 Variation between species

    * Genetic variation accounts for morphological, physiological, and behavioral variation
    * Genetic variation (i.e. distance) relates to phylogenetic relationship
    * Necessity to measure distances between sequences: a metric

    Substitution rate

    * Mutations originate in single individuals
    * Mutations can become fixed in a population
    * Mutation rate: rate at which new mutations arise
    * Substitution rate: rate at which a species fixes new mutations
    * For neutral mutations, the substitution rate equals the mutation rate

    Substitution rate and mutation rate

    * For neutral mutations:
    * $\rho = 2N\mu \cdot \frac{1}{2N} = \mu$
    * $\rho = K / (2T)$

    Here $\rho$ is the substitution rate, $\mu$ the mutation rate, $N$ the population size, $K$ the genetic distance between two species, and $T$ the time since they diverged.
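The neutral-mutation argument can be checked with a small calculation. The sketch below assumes the usual reading of the formula: a diploid population of N individuals carries 2N gene copies, so 2Nμ new mutations arise per generation and each has a fixation probability of 1/(2N). The function names and the numbers are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def neutral_substitution_rate(population_size, mutation_rate):
    """Substitution rate for neutral mutations: (2 N mu) * (1 / (2 N))."""
    new_mutations = 2 * population_size * mutation_rate      # arising per generation
    fixation_probability = Fraction(1, 2 * population_size)  # per neutral mutation
    return new_mutations * fixation_probability


def substitution_rate_from_distance(k, t):
    """rho = K / (2 T): both lineages accumulate substitutions over time T."""
    return k / (2 * t)


# Hypothetical values: the population size cancels out exactly.
mu = Fraction(1, 10**8)
print(neutral_substitution_rate(1_000, mu) == mu)
print(neutral_substitution_rate(1_000_000, mu) == mu)
```

Exact fractions are used so that the cancellation of N is visible without floating-point noise.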

    5.4 Estimating genetic distance

    * Substitutions are independent
    * Substitutions are random
    * Multiple substitutions may occur
    * Back-mutations mutate a nucleotide back to an earlier value

    Multiple substitutions and back-mutations conceal the real genetic distance

    GACTGATCCACCTCTGATCCTTTGGAACTGATCGT
    TTCTGATCCACCTCTGATCCTTTGGAACTGATCGT
    TTCTGATCCACCTCTGATCCATCGGAACTGATCGT
    GTCTGATCCACCTCTGATCCATTGGAACTGATCGT

    observed: 2 (= d)
    actual: 4 (= K)
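The observed count can be reproduced by comparing the first and the last sequence of the example site by site. This is a minimal sketch; the helper name `observed_differences` is an assumption made for this illustration.

```python
def observed_differences(seq_a, seq_b):
    """Count the sites at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# First and last sequence of the example above.
first = "GACTGATCCACCTCTGATCCTTTGGAACTGATCGT"
last = "GTCTGATCCACCTCTGATCCATTGGAACTGATCGT"

count = observed_differences(first, last)
print(count, count / len(first))  # number and proportion of differing sites
```

Only the end points are compared, which is exactly why the intermediate substitutions and back-mutations stay hidden.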

    * Saturation: on average one substitution per site
    * Two random sequences of equal length will match at approximately 1/4 of their sites
    * At saturation, therefore, the proportional genetic distance is 3/4

    * True genetic distance (proportion): K
    * Observed proportion of differences: d
    * Due to back-mutations: K ≥ d


    The Jukes-Cantor model: correction for multiple substitutions

    * Substitution probability per site per second is $\alpha$
    * Substitution means there are 3 possible replacements (e.g. C → {A, G, T})
    * Non-substitution means there is 1 possibility (e.g. C → C)

    5.4 THE JUKES-CANTOR MODEL

    Therefore, the one-step Markov process has the following transition matrix:

    $$
    M_{JC} =
    \begin{array}{c|cccc}
      & A & C & G & T \\ \hline
    A & 1-\alpha & \alpha/3 & \alpha/3 & \alpha/3 \\
    C & \alpha/3 & 1-\alpha & \alpha/3 & \alpha/3 \\
    G & \alpha/3 & \alpha/3 & 1-\alpha & \alpha/3 \\
    T & \alpha/3 & \alpha/3 & \alpha/3 & 1-\alpha
    \end{array}
    $$

    After t generations the substitution probability is:

    $$M(t) = M_{JC}^{\,t}$$

    Eigenvalues and (unit-length) eigenvectors of $M_{JC}$:

    $$\lambda_1 = 1 \ \text{(multiplicity 1)}: \quad v_1 = \tfrac{1}{2}\,(1,\ 1,\ 1,\ 1)^\top$$

    $$\lambda_{2..4} = 1 - \tfrac{4\alpha}{3} \ \text{(multiplicity 3)}: \quad
    v_2 = \tfrac{1}{2}\,(-1,\ -1,\ 1,\ 1)^\top, \quad
    v_3 = \tfrac{1}{2}\,(1,\ -1,\ -1,\ 1)^\top, \quad
    v_4 = \tfrac{1}{2}\,(1,\ -1,\ 1,\ -1)^\top$$

    Spectral decomposition of M(t), with unit-length eigenvectors $v_i$:

    $$M_{JC}^{\,t} = \sum_i \lambda_i^{\,t}\, v_i v_i^\top$$

    Define M(t) as:

    $$
    M_{JC}^{\,t} =
    \begin{pmatrix}
    r(t) & s(t)/3 & s(t)/3 & s(t)/3 \\
    s(t)/3 & r(t) & s(t)/3 & s(t)/3 \\
    s(t)/3 & s(t)/3 & r(t) & s(t)/3 \\
    s(t)/3 & s(t)/3 & s(t)/3 & r(t)
    \end{pmatrix},
    \qquad r(t) = 1 - s(t)
    $$

    Therefore, the substitution probability s(t) per site after t generations is:

    $$s(t) = \tfrac{3}{4} - \tfrac{3}{4}\left(1 - \tfrac{4\alpha}{3}\right)^{t}$$
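The closed form for s(t) can be checked numerically by raising the one-step Jukes-Cantor matrix to the power t. The sketch below uses plain Python lists; the helper names and the values of `alpha` and `t` are hypothetical choices for illustration, and s(t) is read as the total probability that a site holds a different nucleotide after t steps.

```python
def jc_matrix(alpha):
    """One-step Jukes-Cantor matrix: 1 - alpha on the diagonal, alpha / 3 elsewhere."""
    return [[1 - alpha if i == j else alpha / 3 for j in range(4)] for i in range(4)]


def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def matrix_power(m, t):
    """m raised to the integer power t >= 0 by repeated multiplication."""
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for _ in range(t):
        result = matmul(result, m)
    return result


def s_closed_form(alpha, t):
    """s(t) = 3/4 - 3/4 * (1 - 4 * alpha / 3) ** t."""
    return 0.75 - 0.75 * (1 - 4 * alpha / 3) ** t


alpha, t = 0.01, 50            # hypothetical values
m_t = matrix_power(jc_matrix(alpha), t)
s_numeric = 1 - m_t[0][0]      # probability that the nucleotide has changed
print(s_numeric, s_closed_form(alpha, t))
```

The two printed numbers should agree up to floating-point rounding, and each off-diagonal entry of `m_t` should be one third of that value.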

    Substitution probability s(t) per site after t generations:

    $$s(t) = \tfrac{3}{4} - \tfrac{3}{4}\left(1 - \tfrac{4\alpha}{3}\right)^{t}$$

    Observed genetic distance d after t generations ≈ s(t):

    $$d = \tfrac{3}{4} - \tfrac{3}{4}\left(1 - \tfrac{4\alpha}{3}\right)^{t}$$

    For small $\alpha$, $\left(1 - \tfrac{4\alpha}{3}\right)^{t} \approx e^{-4\alpha t/3}$, so:

    $$\alpha t \approx -\tfrac{3}{4}\,\ln\left(1 - \tfrac{4}{3}\,d\right)$$

    For small $\alpha$ the observed genetic distance d satisfies:

    $$\alpha t \approx -\tfrac{3}{4}\,\ln\left(1 - \tfrac{4}{3}\,d\right)$$

    The actual genetic distance is of course:

    $$K = \alpha t$$

    So:

    $$K \approx -\tfrac{3}{4}\,\ln\left(1 - \tfrac{4}{3}\,d\right)$$

    This is the Jukes-Cantor formula: independent of $\alpha$ and t.

    The Jukes-Cantor formula:

    $$K \approx -\tfrac{3}{4}\,\ln\left(1 - \tfrac{4}{3}\,d\right)$$

    * For small d, using $\ln(1+x) \approx x$: $K \approx d$. So: actual distance ≈ observed distance.
    * For saturation, $d \uparrow \tfrac{3}{4}$: $K \to \infty$. So: if the observed distance corresponds to the distance between random sequences, the actual distance becomes indeterminate.
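Both limiting cases can be seen by evaluating the formula directly. This is a minimal sketch; the function name `jukes_cantor_distance` and the sample values of d are assumptions made for this example.

```python
import math


def jukes_cantor_distance(d):
    """Jukes-Cantor estimate K = -3/4 * ln(1 - 4/3 * d) for 0 <= d < 3/4."""
    if not 0 <= d < 0.75:
        raise ValueError("d must lie in [0, 0.75); at saturation K is indeterminate")
    return -0.75 * math.log(1 - 4 * d / 3)


# Small d: the correction is negligible. Near 3/4: the estimate blows up.
for d in (0.01, 0.1, 0.3, 0.6, 0.74):
    print(d, jukes_cantor_distance(d))
```

Raising an error at d ≥ 3/4 is a design choice for this sketch; it mirrors the point that the true distance cannot be recovered once the sequences look random.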

    [Figure: Jukes-Cantor]

    Variance in K

    If $K = f(d)$ then:

    $$\delta K = \frac{\partial K}{\partial d}\,\delta d$$

    So:

    $$\mathrm{Var}(K) = \left(\frac{\partial K}{\partial d}\right)^{2} \mathrm{Var}(d)$$

    Generation of a sequence of length n with substitution rate d is a binomial process:

    $$\mathrm{Prob}(k) = \binom{n}{k}\, d^{k} (1-d)^{n-k}$$

    and therefore with variance: $\mathrm{Var}(d) = d(1-d)/n$

    Because of the Jukes-Cantor formula:

    $$\frac{\partial K}{\partial d} = \frac{1}{1 - \tfrac{4}{3}\,d}$$

    Variance: $\mathrm{Var}(d) = d(1-d)/n$

    Jukes-Cantor:

    $$\frac{\partial K}{\partial d} = \frac{1}{1 - \tfrac{4}{3}\,d}$$

    So:

    $$\mathrm{Var}(K) \approx \frac{d(1-d)}{n\left(1 - \tfrac{4}{3}\,d\right)^{2}}$$
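The variance formula is easy to evaluate for concrete inputs. The sketch below is only an illustration; the function name and the sample values of d and n are hypothetical.

```python
def jukes_cantor_variance(d, n):
    """Var(K) ~ d * (1 - d) / (n * (1 - 4 * d / 3) ** 2) for sequence length n."""
    if not 0 <= d < 0.75:
        raise ValueError("d must lie in [0, 0.75)")
    return d * (1 - d) / (n * (1 - 4 * d / 3) ** 2)


# The variance grows quickly as d approaches the saturation value 3/4.
for d in (0.1, 0.3, 0.6, 0.7):
    print(d, jukes_cantor_variance(d, n=800))
```

Longer sequences reduce the variance in proportion to 1/n, but no sequence length rescues the estimate close to saturation.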

    [Figure: Var(K)]

    EXAMPLE 5.4 on page 90

    * Create artificial data with n = 1000: generate K* mutations
    * Count d
    * With the Jukes-Cantor relation, reconstruct the estimate K(d)
    * Plot K(d) against K*
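The steps above can be sketched as a small simulation. This is one possible reading of the exercise, not the textbook's own code: each mutation hits a uniformly chosen site and replaces its nucleotide by one of the three others, and the true distance is taken as K*/n. All function names, the seed, and the chosen values of K* are assumptions.

```python
import math
import random

BASES = "ACGT"


def simulate_observed_distance(n, k_star, rng):
    """Apply k_star random substitutions to a random sequence of length n.

    Returns d, the observed proportion of differing sites.
    """
    original = [rng.choice(BASES) for _ in range(n)]
    mutated = list(original)
    for _ in range(k_star):
        site = rng.randrange(n)
        # Always substitute by a different nucleotide (3 possible replacements).
        mutated[site] = rng.choice([b for b in BASES if b != mutated[site]])
    return sum(a != b for a, b in zip(original, mutated)) / n


def jukes_cantor_distance(d):
    """K(d) = -3/4 * ln(1 - 4/3 * d)."""
    return -0.75 * math.log(1 - 4 * d / 3)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 1000
for k_star in (50, 200, 500):
    d = simulate_observed_distance(n, k_star, rng)
    # True distance per site, observed distance, Jukes-Cantor estimate.
    print(k_star / n, d, jukes_cantor_distance(d))
```

Plotting the third column against the first, for instance with a plotting library of your choice, gives the comparison asked for in the last step.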


    The Kimura 2-parameter model

    Include substitution bias in the correction factor:

    * Transition probability (G↔A and T↔C) per site per second is $\alpha$
    * Transversion probability (G↔T, G↔C, A↔T, and A↔C) per site per second is $\beta$

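    5.4 THE KIMURA 2-PARAMETER MODEL

    The one-step Markov process substitution matrix now becomes:

    $$
    M_{K2P} =
    \begin{array}{c|cccc}
      & A & C & G & T \\ \hline
    A & 1-\alpha-2\beta & \beta & \alpha & \beta \\
    C & \beta & 1-\alpha-2\beta & \beta & \alpha \\
    G & \alpha & \beta & 1-\alpha-2\beta & \beta \\
    T & \beta & \alpha & \beta & 1-\alpha-2\beta
    \end{array}
    $$

    Each nucleotide has one possible transition (probability $\alpha$) and two possible transversions (probability $\beta$ each), so every row sums to 1.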
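The structure of the Kimura matrix can be generated from the purine/pyrimidine classes rather than typed in by hand. The sketch below assumes that alpha is the probability of the single possible transition and beta the probability of each of the two possible transversions; the function name and the sample values are hypothetical.

```python
BASES = "ACGT"
PURINES = {"A", "G"}


def kimura_matrix(alpha, beta):
    """One-step Kimura 2-parameter matrix with rows and columns ordered A, C, G, T."""
    matrix = []
    for old in BASES:
        row = []
        for new in BASES:
            if old == new:
                row.append(1 - alpha - 2 * beta)   # no substitution
            elif (old in PURINES) == (new in PURINES):
                row.append(alpha)                  # transition: same chemical class
            else:
                row.append(beta)                   # transversion: class changes
        matrix.append(row)
    return matrix


for row in kimura_matrix(alpha=0.02, beta=0.005):  # hypothetical values
    print(row, sum(row))
```

Setting beta equal to alpha / 2... would not recover Jukes-Cantor; that model corresponds to alpha = beta, each equal to one third of the total substitution probability.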

    After t generations the substitution probability is:

    $$M(t) = M_{K2P}^{\,t}$$

    Determine the eigenvalues $\{\lambda_i\}$ and eigenvectors $\{v_i\}$ of $M_{K2P}$.

    Spectral decomposition of M(t):

    $$M_{K2P}^{\,t} = \sum_i \lambda_i^{\,t}\, v_i v_i^\top$$

    * Determine the fraction of transitions per site after t generations: P(t)
    * Determine the fraction of transversions per site after t generations: Q(t)
    * Genetic distance: $K \approx -\tfrac{1}{2}\ln(1 - 2P - Q) - \tfrac{1}{4}\ln(1 - 2Q)$
    * Fraction of substitutions: $d = P + Q$ (as in Jukes-Cantor)
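The Kimura distance can be evaluated directly from P and Q, and it reduces to the Jukes-Cantor value when one third of the differences are transitions and two thirds are transversions. This is a minimal sketch; the function names and the sample values are assumptions.

```python
import math


def kimura_distance(p, q):
    """K = -1/2 * ln(1 - 2P - Q) - 1/4 * ln(1 - 2Q).

    p: observed fraction of transitions, q: observed fraction of transversions.
    """
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("P and Q are too large: the logarithms are undefined")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def jukes_cantor_distance(d):
    """K = -3/4 * ln(1 - 4/3 * d), for comparison."""
    return -0.75 * math.log(1 - 4 * d / 3)


# Unbiased case: P = d / 3 and Q = 2 d / 3 gives the Jukes-Cantor value.
print(kimura_distance(0.1, 0.2), jukes_cantor_distance(0.3))
# Transition-biased case with the same total d = 0.3.
print(kimura_distance(0.2, 0.1))
```

With the same total d, a strong transition bias yields a larger corrected distance than Jukes-Cantor would report.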

    Other models for nucleotide evolution

    * Different types of transitions/transversions
    * Pairwise substitutions: GTR = General Time Reversible model
    * Amino-acid substitution matrices
    * ...

    DEFICIT:

    All of the above models assume symmetric substitution probabilities:

    prob(A→T) = prob(T→A)

    There is now strong evidence that this assumption is not true.

    Challenge: incorporate this in a self-consistent model.

    5.5 CASE STUDY: Neanderthals

    * mtDNA of 206 H. sapiens from different regions
    * Fragments of mtDNA of 2 H. neanderthaliensis, including the original 1856 specimen
    * All 208 samples from GenBank
    * A homologous sequence of 800 bp of the HVR could be found in all 208 specimens

    * Pairwise genetic difference, corrected with the Jukes-Cantor formula
    * d(i,j) is the JC-corrected genetic difference between pair (i,j)
    * $d^\top = d$ (the distance table is symmetric)
    * MDS (Multi-Dimensional Scaling): translate the distance table d to an nD map, here a 2D map
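Building the distance table d(i,j) can be sketched as follows. The three short sequences are made-up toy data, not the GenBank samples of the case study, and the function names are assumptions; the resulting symmetric table is what a multidimensional scaling routine would take as input.

```python
import math


def observed_fraction(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def jukes_cantor_distance(d):
    """K = -3/4 * ln(1 - 4/3 * d)."""
    return -0.75 * math.log(1 - 4 * d / 3)


def distance_table(sequences):
    """Symmetric table of Jukes-Cantor corrected pairwise distances."""
    n = len(sequences)
    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            k = jukes_cantor_distance(observed_fraction(sequences[i], sequences[j]))
            table[i][j] = table[j][i] = k   # fill both halves: d(i,j) = d(j,i)
    return table


toy_sequences = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTTT"]  # hypothetical data
for row in distance_table(toy_sequences):
    print(row)
```

Only the upper triangle is computed; mirroring it guarantees the symmetry stated above.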

    [Figure: distance map d(i,j)]

    [Figure: MDS; H. sapiens and H. neanderthaliensis well-separated]

    [Figure: phylogenetic tree]

    END of LECTURE 5
