http://cs273a.stanford.edu [Bejerano Winter 2020/21] 1
Mon, Wed 11:30 AM - 12:50, on Zoom*
Prof: Gill Bejerano
CA: Boyoung (Bo) Yoo
* Track class on Piazza
CS273A
Gill Lecture 6: Repeats II
The Human Genome Source Code
Announcements
• Homework 3 is due 2/24 11:59PM PST
• Error in the homework 3 writeup
  – In Question 2.3, the line count we gave for hg19_lung_gtex.bed was for a version that was not uniquely sorted. This file should be uniquely sorted and have 108,874 lines
  – Also, the overlapping regions (the result from bedtools intersect) should be uniquely sorted
• Reminder: midterm survey will close today at 5PM (anonymous!)
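A minimal sketch of what "uniquely sorted" could mean for a BED file, assuming it means dropping exact duplicate lines and then sorting. The function name and the sort order (chromosome name, then numeric start and end) are illustrative assumptions, not the homework's required procedure. The count of unique lines does not depend on the ordering chosen.

```python
def unique_sorted_bed_lines(lines):
    """Return the distinct, non-empty BED lines sorted by chrom, start, end."""
    records = set()
    for line in lines:
        line = line.rstrip("\n")
        if line:
            records.add(line)  # a set keeps one copy of each identical line

    def sort_key(line):
        fields = line.split("\t")
        # Assumed ordering: chromosome name, then numeric start and end.
        return (fields[0], int(fields[1]), int(fields[2]), line)

    return sorted(records, key=sort_key)


# Hypothetical toy records, not taken from hg19_lung_gtex.bed.
example = [
    "chr2\t500\t900\tgeneB\n",
    "chr1\t100\t200\tgeneA\n",
    "chr1\t100\t200\tgeneA\n",  # exact duplicate, dropped
    "chr1\t50\t80\tgeneC\n",
]
result = unique_sorted_bed_lines(example)
print(len(result))  # number of unique lines
```

Counting `len(result)` on the real file is the kind of check the corrected line count refers to.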
TTATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATACATATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTCAGTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTCCGTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACTAGCTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATGATAATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAAAAGCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAATTGTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAATTCTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGGATTTTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGATTTTGATATGCTTTGCGCCGTCAAAGTTTTGAACGATGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAATCTTTAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATGAACGAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATCATATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAAAAGAAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCAGCATTGGGCAGCTGTCTATATGAATTAGTCAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAACTTTAGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTTTCCTACGCATAA
TAAGAATAGGAGGGAATATCAAGCCAGACAATCTATCATTACATTTAAGCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAAGAGTATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATACAGCTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTACAACCAGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATATCAACACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCGTTGGTGCTTTTGTTGTTTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCATCTAGAGCATCATTCGGTATTTTCTTCTCTTTATGGCCCGTTATTAACAGAGTCGTCATGGCCATCGTTTGGTATAGTGTCCAAGCTTATATTGCGGCAACTCCCGTATCATTAATGCTGAAATCTATCTTTGGAAAAGATTTACAATGATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATAAAG
Class Topics
(0) Genome context: cells, DNA, central dogma
(1) Genome content / genome function: genes, gene regulation, epigenetics, repeats, SARS-CoV-2
(2) Genome sequencing: technologies, assembly/analysis, technology dependence
(3) Genome evolution: evolution = mutation + selection; main forces of evolution: Neutral evolution, Negative selection, Positive selection
(4) Population genomics: Human migration, paternity testing, forensics, cryptogenomics
(5) Genomics of human disease: personal genomics, GxE disease types, deep dive monogenics
(6) Comparative Genomics: Genomics of amazing animal adaptations, ultraconservation
good morning!
We are technically DONE with genome function
Biology – not that complicated!!
Functional part list
• In our genome:
  • Gene
    • Protein coding
    • Non coding / RNA genes
  • Gene regulatory elements
    • “Atomic” event: transcription factor binding site
    • Build up: promoters, enhancers, silencers, gene reg. domain
• “Around” our genome
  • Chromatin – open / closed
  • Epigenomic (and some epigenetic) marks
The Functional Genome
Type            # in genome    % of genome
genes           20,000         2-3%
ncRNA           20,000+        2%
cis elements    1,000,000      10-15%
Corollary: most of the genome is devoid of function (which we understand)
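The corollary follows from adding up the table's percentages. The sketch below does that arithmetic, assuming the three categories do not overlap, so the total is an upper bound on the share of the genome with understood function. The dictionary layout is only for illustration.

```python
# (low %, high %) of the genome, copied from the table above.
functional = {
    "genes": (2.0, 3.0),
    "ncRNA": (2.0, 2.0),
    "cis elements": (10.0, 15.0),
}

# Assumption: the categories do not overlap, so the percentages simply add.
low = sum(lo for lo, _ in functional.values())
high = sum(hi for _, hi in functional.values())

print(f"understood function: {low:.0f}-{high:.0f}% of the genome")
print(f"no understood function: {100 - high:.0f}-{100 - low:.0f}%")
```

Under these numbers roughly 80% or more of the genome has no function that we currently understand.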
Classes of Interspersed Repeats
LINE & SINE Elements
LINE & SINE Elements
SARS-CoV-2 integration into human genome?
2020
Typical RT work…
Genomic Transmission
For repeat copies to accumulate through human generations they must make it into the germline cells (eggs & sperm).
Equally true for any genomic mutation.
[Figure: egg → (cell division) → chicken → egg. A cell's genome = all its DNA; DNA strings = chromosomes. Chicken ≈ 10^13 copies (DNA) of egg (DNA).]
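To get a feel for the figure of about 10^13 DNA copies descending from one egg, the sketch below computes the minimum number of doubling rounds needed to reach that many cells. It assumes idealized synchronous doubling with no cell death, which real development does not follow, so treat the result as a lower bound only.

```python
import math

cells_in_adult = 1e13  # order-of-magnitude figure from the slide

# Each round of division at most doubles the cell count, so starting from one
# fertilized egg at least log2(N) rounds are needed to reach N cells.
min_rounds = math.ceil(math.log2(cells_in_adult))
print(min_rounds)  # 44
```

Every one of those rounds copies the whole genome, which is why replication errors, and new repeat insertions, have so many chances to occur.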
DNA Transposons
Retrovirus-like Elements
Every Genome is Different
DNA replication is imperfect – between individuals of the same species, even between the cells of an individual.
[Figure: the sequence ...ACGTACGACTGACTAGCATCGACTACGA... copied from egg to chicken, with changes (TT, CAT) marked. In “junk” regions “anything goes”; in functional regions many changes are not tolerated.]
This has bad implications – disease, and good implications – adaptation.
TE composition and assortment vary among eukaryotic genomes
[Figure: stacked bars (20% to 100%) showing the share of DNA transposons, LTR Retro., and Non-LTR Retro. in slime mold, budding yeast, fission yeast, Neurospora, Arabidopsis, rice, nematode, Drosophila, mosquito, Fugu, mouse, and human. Feschotte & Pritham 2006]
Repeats: mostly neutral
Most repeat events/instances are neutral.
I.e., a repeat instance is dropped in a new place, and joins the rest of the neutral DNA, gradually decaying over time.
Many repeat copies are “dead as a duck” on arrival at their new location (e.g., 5’ truncation).
Some instances may be active (spawn new instances) for a while, but when an active copy is hit by a mutation – the host is not affected, the instance is inactivated and decays away.
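A toy simulation of the "gradually decaying" idea: a repeat copy accumulates random point substitutions with nothing to remove them, and its identity to the original sequence drops over time. The sequence, the per-step mutation probability `mu`, and the step counts are hypothetical values chosen for illustration, not measured rates.

```python
import random

BASES = "ACGT"

def decay(seq, generations, mu, rng):
    """Apply neutral point substitutions: each base mutates with prob. mu per step."""
    seq = list(seq)
    for _ in range(generations):
        for i, base in enumerate(seq):
            if rng.random() < mu:
                # Replace the base with one of the three other bases.
                seq[i] = rng.choice([b for b in BASES if b != base])
    return "".join(seq)

def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
source = "".join(rng.choice(BASES) for _ in range(300))  # hypothetical repeat copy
for steps in (0, 50, 200, 800):
    copy = decay(source, steps, mu=0.001, rng=rng)
    print(steps, round(identity(source, copy), 2))
```

With four bases, identity drifts toward about 25%, the level expected between unrelated sequences.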
Repeat Ages
Figure from Ryan Gregory (2005)
INTERSPECIES VARIATION IN GENOME SIZE WITHIN VARIOUS GROUPS OF ORGANISMS
The amount of TE DNA correlates positively with genome size
[Figure: bar chart of genomic DNA, TE DNA, and protein-coding DNA in Mb (axis 0 to 3000) for Plasmodium, slime mold, budding yeast, fission yeast, Neurospora, Arabidopsis, Brassica, rice, maize, nematode, Drosophila, mosquito, sea squirt, zebrafish, Fugu, mouse, and human. Feschotte & Pritham 2006]
TEs
Protein-coding genes
The proportion of protein-coding genes decreases with genome size, while the proportion of TEs increases with genome size
Gregory, Nat Rev Genet 2005
Repeats: not just neutral
So far we treated all repeat proliferation events as neutral.
While the majority of them appear to be neutral, this is certainly not the case for all repeat instances.
And because there are so many repeat instances, even a small fraction of all repeats can be a big set compared to other types of elements in the genome. (E.g., 1% of ½ the genome is still a lot.)
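The "1% of ½ the genome" remark can be turned into a number. The sketch below assumes a genome of about 3 billion basepairs and treats the 1% as a purely hypothetical fraction of non-neutral repeat sequence. The variable names are illustrative.

```python
genome_bp = 3_000_000_000    # ~3 billion basepairs
repeat_fraction = 0.5        # roughly half the genome is recognizably repetitive
non_neutral_fraction = 0.01  # the hypothetical "1%" from the slide

non_neutral_bp = genome_bp * repeat_fraction * non_neutral_fraction
print(f"{non_neutral_bp / 1e6:.0f} Mb")  # 15 Mb
```

Fifteen megabases is 0.5% of the genome, which is the same order of magnitude as the 2-3% attributed to genes.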
Repeats & Retroposed Genes
Remember how LINEs reverse transcribe copies of themselves back into the genome? How they sometimes reverse transcribe SINEs “by mistake”? Well, they also grab m/ncRNAs and reverse transcribe them into the genome!
Retrogenes (“retrotranscribed”): Protein coding RNA that was reverse transcribed and inserted back into the genome. The RNA can be grabbed at any stage (partial/full transcript, before/during/after all introns are spliced).
Retroposed Genes & Pseudogenes
Pseudogenes (“dead genes”): Genomic sequences that resemble (and originated from) genes but no longer make proteins.
Retrogenes (“retrotranscribed”): Protein coding RNA that was reverse transcribed and inserted back into the genome. The RNA can be grabbed at any stage (partial/full transcript, before/during/after all introns are spliced).
Repeat Insertions Can “Break Things”
Repeat Insertions Can “Make Things”
Any Sequence Can Become Functional
Random mutation (especially in a large place like our genome) can create functional DNA elements out of neutrally evolving sequences.
So is there anything special about a piece of DNA from a repetitive origin that takes on a new function?
Regulatory elements from Mobile Elements
[Yass is a small town in New South Wales, Australia.]
Co-option event, probably due to favorable genomic context
Origins, Current Function, Cooption
Britten & Davidson Hypothesis: Repeat to Rewire!
Enhancer structure reminder
The Road to Co-Option
Transposition Event
Random Mutations
Neutral decay
Potential Co-Option States
Assembly Challenges
Inferring Phylogeny Using Repeats
[Nishihara et al, 2006]
Transposons as Genetic Engineering Tools
Human Gene Therapy
Repeats: fun conspiracy theories
1. Repeats wreak so much havoc in the genome, by inserting themselves, deleting segments between instances and more – they make the genome feel like a “rolling sea”. Maybe it is because of them that enhancers evolved to work irrespective of distance and orientation?
2. When the last active copy of a repeat dies, all instances of the repeat are now decaying. Wait long enough and they lose resemblance to each other. Look in 200 My and you would never know they belonged to the same repeat family. So… if half the genome is recognizable as repetitive now, how much of the genome originated from repeats? Most of it?
Repeats: fun conspiracy theories
3. If repeats do significantly accelerate the rate of creation of novel functional (gene/regulation) elements – how many functional elements today came from repeats (including old ones we can no longer recognize as such)? Most?
4. Is that why our genome “tolerates” these elements?
5. You make a conspiracy theory…
6. You think of ways* to solve one!
* Computationally. Evolution is mostly computational business.
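As one computational angle on theory 2, the sketch below estimates how similar two copies of a dead repeat family are expected to remain after decaying independently. It uses the Jukes-Cantor (JC69) substitution model, which is a swapped-in simplification, and the rate `MU` is an assumed, illustrative neutral substitution rate rather than a value from the lecture.

```python
import math

def expected_identity(mu_per_site_per_year, years):
    """Expected identity between two copies that each decay independently (JC69)."""
    # Both lineages accumulate substitutions, hence the factor of 2.
    d = 2 * mu_per_site_per_year * years
    # JC69 probability that a given site differs between the two copies.
    p_differ = 0.75 * (1 - math.exp(-4 * d / 3))
    return 1 - p_differ

MU = 2e-9  # assumed neutral substitution rate per site per year (illustrative)
for my in (10, 50, 200, 1000):
    print(my, "My:", round(expected_identity(MU, my * 1e6), 2))
```

As the expected identity approaches 25%, the chance level for four bases, alignment can no longer tell the copies apart from unrelated sequence. That is the regime the theory is pointing at.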
II. Simple Repeats
• Every possible motif of mono-, di-, tri- and tetranucleotide repeats is vastly overrepresented in the human genome.
• These are called microsatellites. Longer repeating units are called minisatellites. The really long ones are called satellites.
• Highly polymorphic in the human population.
• Highly heterozygous in a single individual.
• As a result microsatellites are used in paternity testing, forensics, and the inference of demographic processes.
• There is no clear definition of how many repetitions make a simple repeat, nor how imperfect the different copies can be.
• Highly variable between species: e.g., using the same search criteria the mouse & rat genomes have 2-3 times more microsatellites than the human genome. They’re also longer in mouse & rat.
AAAAAAAAA
CACACACAC
CAACAACAA
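A minimal sketch of how one might scan a sequence for perfect microsatellites like the three examples above. Since there is no agreed definition, the thresholds here (unit of 1 to 4 bp, at least 3 copies, no mismatches allowed) are arbitrary assumptions, and `find_simple_repeats` is a hypothetical helper name.

```python
import re

def find_simple_repeats(seq, max_unit=4, min_copies=3):
    """Yield (start, unit, copies) for perfect tandem repeats of a short unit."""
    # Lazy {1,max_unit}? tries the shortest repeating unit first; the
    # backreference \1 then demands at least min_copies - 1 further copies.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    for m in pattern.finditer(seq):
        unit = m.group(1)
        yield m.start(), unit, len(m.group(0)) // len(unit)

for example in ("AAAAAAAAA", "CACACACAC", "CAACAACAA"):
    print(example, list(find_simple_repeats(example)))
```

Allowing imperfect copies would need an alignment-based approach instead of a regular expression.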
DNA Replication
Simple Repeats Create Funky DNA structures
These Bumps Give The DNA Polymerase Hiccups
Expandable Repeats and Disease
Restriction Enzymes
• Restriction enzymes recognize and make a cut within specific DNA sequences, known as restriction sites.
• This is usually a 4-6 base pair palindromic sequence.
• Naturally found in different types of bacteria
• Bacteria use restriction enzymes to protect themselves from foreign DNA
• Many have been isolated and sold for use in lab work
blunt end
sticky end
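"Palindromic" here means that the site reads the same as its own reverse complement. The sketch below checks that property and locates a site in a sequence. It uses GAATTC, the EcoRI recognition site, as the example. The test sequence and function names are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]

def is_palindromic(site):
    """A restriction site is palindromic if it equals its own reverse complement."""
    return site == reverse_complement(site)

def find_sites(seq, site):
    """Return 0-based start positions of every (possibly overlapping) occurrence."""
    positions, start = [], seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

print(is_palindromic("GAATTC"))                    # True
print(find_sites("TTGAATTCAAGGAATTCC", "GAATTC"))  # [2, 11]
```

Because the site is palindromic, scanning one strand is enough: every match on the forward strand is also a match on the reverse strand.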
DNA Fingerprint Basics
DNA fragments of different sizes will be produced by a restriction enzyme that cuts at the points shown by the arrows.
DNA fragments are then separated based on size using gel electrophoresis.
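The fingerprint idea can be sketched as follows: cut positions determine fragment lengths, and the gel orders fragments by length. The molecule length and cut positions below belong to two hypothetical individuals and are invented for illustration. The helper names are likewise assumptions.

```python
def fragment_lengths(seq_length, cut_positions):
    """Lengths of the pieces left after cutting a linear molecule at the given positions."""
    cuts = sorted(set(p for p in cut_positions if 0 < p < seq_length))
    edges = [0] + cuts + [seq_length]
    return [b - a for a, b in zip(edges, edges[1:])]

def gel_order(lengths):
    """Shorter fragments migrate farther, so list longest (near the well) first."""
    return sorted(lengths, reverse=True)

# Two hypothetical individuals: the second has lost one restriction site.
person_a = fragment_lengths(1000, [150, 400, 900])
person_b = fragment_lengths(1000, [150, 900])
print(gel_order(person_a))  # [500, 250, 150, 100]
print(gel_order(person_b))  # [750, 150, 100]
```

The differing band patterns are what distinguishes the two individuals on the gel.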
DNA Fingerprinting can be used in paternity testing or murder cases.
There are Tracks for it
Interspersed vs. Simple Repeats
From an evolutionary point of view transposons and simple repeats are very different.
Different instances of the same transposon share common ancestry (but not necessarily a direct common progenitor).
Different instances of the same simple repeat most often do not.
Now you really know most everything
In the Genome:
Genes (up to 5% of genome)
  coding and non coding (exons, introns)
Gene regulation (15% of genome)
  proteins: transcription factors, chromatin remodelers, ...
  RNA genes: microRNAs, antisense, guide RNAs…
  DNA elements: TF binding sites, promoters, enhancers, ...
Repetitive sequences (50% of genome)
  Interspersed repeats (transposons that hop around)
  Simple repeats (local replication “sore spots”)
Categories are not mutually exclusive.
Function comes & goes with evolution = mutation + selection
Class Topics
(0) Genome context: cells, DNA, central dogma
(1) Genome content / genome function: genes, gene regulation, epigenetics, repeats, SARS-CoV-2
(2) Genome sequencing: technologies, assembly/analysis, technology dependence
(3) Genome evolution: evolution = mutation + selection; main forces of evolution: Neutral evolution, Negative selection, Positive selection
(4) Population genomics: Human migration, paternity testing, forensics, cryptogenomics
(5) Genomics of human disease: personal genomics, GxE disease types, deep dive monogenics
(6) Comparative Genomics: Genomics of amazing animal adaptations, ultraconservation
good day!
Human Genome Project
1990: Start
2001: Draft
2003: “Finished” (really 50 years to W&C)
$3 billion
3 billion basepairs
DNA Sequencing: Plummeting Costs
100 million species
7 billion individuals
10^13 cells in a human
We could never sequence enough
or sequence just an active portion
DNA Sequencers: Room-size to USB-size
ABI 3730xl (1st gen)
Illumina HiSeqX (2nd gen)
PacBio RS II (3rd gen)
Oxford Nanopore MinION (3rd gen)