43
6.874/… Recitation 2 Courtesy of an MIT Teaching Assistant. 1

6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

6.874/… Recitation 2Courtesy of an MIT Teaching Assistant.

1

Page 2: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Reminders

• Pset 1 posted – due Feb 20th (no extra problem)

• Pset 2 posted – due Mar 13th

• Project teams due – Feb 25th

• Interests and background directory has been posted

• Lecture videos will be posted on MITx soon – next week?

2

Page 3: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Today

• Clustering (6.874 topic)

• Biology review

• Alignment

3

Page 4: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Courtesy of Macmillan Publishers Limited. Used with permission.

Source: Gkountela, Sofia, Ziwei Li, et al. "The Ontogeny of cKIT+ HumanPrimordial Germ Cells Proves to be a Resource for Human Germ Line

Reprogramming, Imprint Erasure and in Vitro Differentiation." Nature CellBiology 15, no. 1 (2013): 113-22.

4

Page 5: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Clustering – K-means

• Group points together based on how ‘close’ they are to each other

• Dataset of unlabelled points: 𝑋 = 𝑥1, 𝑥2, … , 𝑥𝑁 , 𝑥𝑛 ∈ 𝑅𝑑

• Assume K clusters – each is defined by a centroid 𝜇𝑘• 𝑟𝑛𝑘 = 1 if 𝑥𝑛 belongs to cluster k

• Find unknowns 𝜇𝑘 and 𝑟𝑛𝑘

5

Page 6: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

6

© Springer-Verlag. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.

Source: Hastie, Trevor, Robert Tibshirani, et al. The Elements of Statistical Learning

Springer-Verlag 2, no. 1, 2009.

.

Page 7: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

© unknown source. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.

7

Page 8: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

© unknown source. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.8

Page 9: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

© unknown source. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.9

Page 10: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

© unknown source. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.10

Page 11: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

© unknown source. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.11

Page 12: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

© unknown source. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.12

Page 13: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

© unknown source. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.13

Page 14: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

© unknown source. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.14

Page 15: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

© unknown source. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.15

Page 16: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

© unknown source. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.16

Page 17: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Courtesy of Macmillan Publishers Limited. Used with permission.

Source: D'haeseleer, Patrik. "How Does Gene Expression Clustering Work?."

Nature Biotechnology 23, no. 12 (2005): 1499-1502.

17

Page 18: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Hierarchical clustering

• Organize data in a tree• Leaves are individual

genes/species

• Path lengths between leaves are distances

• Similar points should lie in same lower subtrees

• Used to reveal evolutionary history of sequences

© unknown source. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.18

Page 19: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

19

Page 20: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Hierarchical clustering

• What dissimilarity measure should be used?• To compare individual points

• What type of linkage should be used?• To compare clusters with each other

• Where do we cut dendrogram to obtain clusters?

20

Page 21: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

© Springer-Verlag. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.

Source: Hastie, Trevor, Robert Tibshirani, et al. The Elements of Statistical Learning.Springer-Verlag 2, no. 1, 2009.

21

Page 22: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

22

© unknown source. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.

Page 23: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Complete 𝑑 𝐴, 𝐵 = max𝑥∈𝐴,𝑦∈𝐵

𝑑(𝑥, 𝑦) Maximal interclusterdissimilarity

Single 𝑑 𝐴, 𝐵 = min𝑥∈𝐴,𝑦∈𝐵

𝑑(𝑥, 𝑦) Minimal interclusterdissimilarity

Average 𝑑 𝐴, 𝐵 =

𝑥∈𝐴,𝑦∈𝐵

𝑑(𝑥, 𝑦)

𝐴 𝐵

Average interclusterdissimilarity(UPGMA)

Centroid

𝑑 𝐴, 𝐵

= 𝑑 𝑥∈𝐴

𝑥𝐴 , 𝑦∈𝐵

𝑦𝐵

Dissimilarity betweencentroids of each cluster

23

Page 24: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

© unknown source. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.24

Page 25: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Model selection

• Hierarchical clustering – cutoff tree at certain point

• K-means – how to choose K?• Balance number of clusters (# of parameters – K=n is uninformative) and the

variance of the clusters

• BIC Score – general model selection criterion

• BIC = −2 × loglikelihood + d × log(N)

• Can use to decide whether to split a cluster

• Computer BIC score of cluster and two potential child clusters – if BIC score is lower after split, do not accept split

25

Page 26: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Biclustering

• Simultaneous clustering of rows and columns of a matrix

• Bicluster – subset of rows which exhibit similar behavior across a subset of columns, or vice versa

26

Page 27: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

© IEEE. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.Source: Madeira, Sara C., and Arlindo L. Oliveira. "Biclustering Algorithms for Biological Data Analysis: A Survey." Computational Biology and Bioinformatics,IEEE/ACM Transactions on 1, no. 1 (2004): 24-45.

27

Page 28: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Courtesy of Macmillan Publishers Limited. Used with permission.

Source: Gkountela, Sofia, Ziwei Li, et al. "The Ontogeny of cKIT+ Human Primordial Germ Cells Proves to be a Resource for

Human Germ Line Reprogramming, Imprint Erasure and in Vitro Differentiation." Nature Cell Biology 15, no. 1 (2013): 113-22.28

Page 29: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Biology Review

29

Page 30: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Selection

• Negative selection (purifying/natural selection) – removal of deleterious traits

• Positive selection – increases prevalence of adaptive traits

• Thinking about selection happening at different levels• Protein level: Sequence -> Structure -> Function

• RNA level: splicing, degradation/processing (NMD)

• DNA level: DNA-protein binding sites

30

Page 31: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Synonymous/Non-synonymous mutations

• Redundancy built into the genetic code

• Synonymous – one base changes

s

Second Position

U C A G

UUU UCU UAU UGU UPhe Tyr CysUUC UCC UAC UGC CU SerUUA UCA UAA Stop UGA Stop ALeuUUG UCG UAG Stop UGG Trp G

CUU CCU CAU CGU UHisCUC CCC CAC CGC C

C Leu Pro ArgFirst CUA CCA CAA CGA A ThirdGlnPosition CUG CCG CAG CGG G Position

(5' end) (3' end)AUU ACU AAU AGU UAsn SerAUC Ile ACC AAC AGC CA ThrAUA ACA AAA AGA ALys ArgAUG Met ACG AAG AGG G

GUU GCU GAU GGU UAspGUC GCC GAC GGC CG Val Ala GlyGUA GCA GAA GGA AGluGUG GCG GAG GGG G

Image by MIT OpenCourseWare.

for another in an exon, but the resulting amino acid sequence iunchanged

• Non-synonymous – new AA

• Can affect splicing, mRNA processing - so may not be silent

31

Page 32: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Side-chain biochemistry

• Amino acids classified by properties of side chains• Grouped by general properties

• Substitutions of amino acid with another of similar chemical properties may conserve protein function

© unknown source. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.32

Page 33: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Side chain size (Trp – W)

© unknown source. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.33

Page 34: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Disulfide bond

• Important in protein folding –holds two distant portions of protein together

• Occurs between Cys residues

34

Page 35: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Next-generation sequencing

• Sequencing is always of DNA• Need to convert RNA to DNA by reverse transcription (RT)

• Illumina is current leader in the field• 8 lanes on a flow cell

• Each lane can sequence 200 million 100bp reads – 20 Gbps!

• Can sequence multiple samples per lane by barcoding

• Requires (heterogeneous) population of cells to get enough DNA for sample

• Single cell sequencing applications are becoming more common (RNAseq)

• Single molecule technologies are still being developed – PacBio

35

Page 36: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Alignment

36

Page 37: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Alignment

Biological Sequence Analysis - Durbin37

Page 38: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Local alignment example

Do a local alignment between these using PAM250 and gap penalty -2:

AWEK

FWEF

© unknown source. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.38

Page 39: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Local alignment solution

alignment: W EW E

39

Page 40: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Global alignment solution

alignment: A W E KF W E F

40

Page 41: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Global Semiglobal Local (gapped)

Penalties at edges? Yes No No

Reset to 0 instead of including

negative entries?

No No Yes

End of alignment Bottom right entry Highest score entry in bottom row or rightmost column

Highest score entry in matrix

41

Page 42: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

Reminders

• Pset 1 posted – due Feb 20th (no extra problem)

• Pset 2 posted – Due Mar 13th

• Project teams due – Feb 25th

• Interests and background directory has been posted

• Lecture videos will be posted on MITx soon – next week?

42

Page 43: 6.874/… Recitation 2 - MIT OpenCourseWare · 2020. 1. 4. · traits •Positive selection –increases prevalence of adaptive traits •Thinking about selection happening at different

MIT OpenCourseWarehttp://ocw.mit.edu

7.91J / 20.490J / 20.390J / 7.36J / 6.802 / 6.874 / HST.506 Foundations of Computational and Systems BiologySpring 2014

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.