jdazuelos
8/9/2019 L20Biol261Genomics2014F
Lecture 22, Genomics
Genetic Analyses
(1) FORWARD analysis began with classical transmission methods tracing the
inheritance of mutant alleles or chromosomes.
(2) REVERSE analysis begins with mutating a reported gene sequence (knockout), tracing its inheritance and identifying its phenotypic expression.
"I find it sometimes very difficult to tell what someone means when they talk about genes because we don't share the same definition," says developmental geneticist William Gelbart of Harvard University in Cambridge, Massachusetts.
You have already struggled with multiple definitions of a gene, based on transmission and cytological genetics. Now, using your understanding of molecular genetics, propose
a gene definition and explain why it would be consistent with the requirements of a gene (see lecture 11), practical and comprehensive.
Increasingly, forward genetics, change-of-function insertions, and gene knockouts or eventually
replacement all depend on knowing something about the sequence and location of a gene in a genome
- TODAY: Genomics and its derivatives:
Genome: a (one) complete set of an organism's genetic
information, or usually, one complete set of chromosomes
(monoploid), sometimes, nuclear DNA content.
The original definition of Genomics included:
(1) Mapping chromosomes
(2) Sequencing chromosomes and identifying genes
(3) Analyzing the functions of entire genomes
Currently genomics is divided into several fields of study:
Structural genomics - the study of genome structure
Comparative genomics - the study of genome diversity/evolution
Bioinformatics - information from sequence structure
Functional genomics - the transcriptome (the complete set of RNAs
transcribed from a genome) and the proteome (the complete set
of proteins encoded by a genome).
But first, Sequencing the Human Genome
Clone by Clone method: Map first,
sequence later (publicly funded Human Genome Project)
Shotgun Method: Sequence
first, map later (Celera corporation)
Genetic maps of chromosomes are based on recombination
frequency between markers:
Low density: about 1% recombination is a practical limit of resolution,
set by the number of individuals you have to score, for most visibly expressed genes in eukaryotic breeding studies.
Higher density genetic maps use restriction sites and gene
localization probes as landmarks.
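As a rough illustration of why the number of scored individuals limits map resolution, the sketch below computes recombination frequency as a percentage (for short distances, roughly map units). The function name and the progeny counts are hypothetical, chosen only for the example.

```python
def recombination_frequency(recombinants: int, total_progeny: int) -> float:
    """Percent of scored progeny that are recombinant between two markers."""
    if total_progeny <= 0:
        raise ValueError("total_progeny must be positive")
    return 100.0 * recombinants / total_progeny


# Hypothetical cross: 12 recombinants among 400 scored progeny.
print(recombination_frequency(12, 400))

# With only 100 progeny, the smallest non-zero frequency that can be
# observed is one recombinant in 100, i.e. 1%.
print(recombination_frequency(1, 100))
```

Resolving markers closer than about 1% would require scoring many more individuals, which is the practical limit the slide refers to.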
Genetic maps of chromosomes start with
ordinal distance based on recombination
frequency between visible markers.
Low density cytogenetic (cytological,
ideogram) maps are based on the location
of markers within or near microscopically
visible cytological features.
Low density genetic maps of chromosomes are based on
recombination frequency between visible markers.
Cytogenetic (cytological, ideogram) maps are based on the location of
markers within or near cytological features.
High density maps integrate cytological and physical maps.
Anchor markers correlate the cytogenetic maps to the physical maps. Chromosome fragments
can be identified by migration pattern (RFLP), or tagged using PCR to amplify short,
unique (200-500 bp) Sequence Tagged Sites (STS), short cDNA sequence probes
(Expressed Sequence Tags, or EST) and Short (or Simple) Sequence Length
Polymorphisms (SSLP, or short repetitive elements). These tagged fragments of known
sequence can be related to the cytogenetic map (probe with complementary sequence).
Physical maps are measured in base pairs, kilobase pairs or megabase pairs. They often
show the location of overlapping genomic fragment clones (contigs) and unique
sequences (STS).
1996-7: Anchoring the physical map:
2335 microsatellite (SSLP) sites, 16,000 STS-marked loci & RFLPs used to map
1600 human genes
[Figure: Human chromosome 1 (fig 4-20), showing restriction fragments]
Ordered Clone by Clone method:
map first, sequence later: Screen large
clones (BAC) from a chromosome
library for known sites (restriction sites, known genes, or other sequence)
to anchor the map.
HindIII digest, agarose gel
electrophoresis on fragmented clone:
Stain, characterize fragments by
migration distance.
Share a partial fragment? = overlap
(different clones share sequence)
Use overlap to orient the clone
fragments into a map
Physical maps are built by reconstructing the order of fragments cut by restriction enzymes. The
first cloning vector is usually a YAC. For example, five YACs were known to hybridize to one
chromosome band (17q2). A restriction enzyme cutting an 8-base palindromic sequence with a
low sequence probability (on average once every 4^8, or about 66,000, bases) was used to cut the chromosome
fragment. The fragments were denatured, and the 5 single-stranded radioactive YACs were hybridized to blots of the digest to visualize the target chromosome fragments. The
autoradiogram is below. Order of band fragments?
[Figure: autoradiogram, with the + and - electrodes marked. Chromosome band map of RFLP fragments 1, 2, 3. The exposed photographic image shows lanes (columns) corresponding to the same chromosome DNA tested with 5 different YAC probes.]
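The "every 66,000 bases" figure for an 8-base recognition site can be checked with a short calculation. This is a minimal sketch assuming the four bases are equally frequent and independent, which real genomes only approximate; the function name is illustrative.

```python
def expected_cut_spacing(site_length: int, alphabet_size: int = 4) -> int:
    """Average number of bases between occurrences of one specific site,
    assuming equal, independent base frequencies."""
    return alphabet_size ** site_length


# A 6-base cutter such as HindIII versus an 8-base cutter.
print(expected_cut_spacing(6))
print(expected_cut_spacing(8))
```

Under this assumption an 8-base cutter produces far fewer, much larger fragments than a 6-base cutter, which suits mapping large chromosome regions.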
Map first, sequence
later: clone by clone, or ordered clone
approach - public.
Minimum tiling path - fewest clones
necessary to get a
complete sequence
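One way to picture a minimum tiling path is as an interval-cover problem: each mapped clone spans a start and end coordinate, and the goal is the fewest clones that cover the region. The sketch below uses a standard greedy approach; it is an illustration, not the procedure the Human Genome Project actually used, and the clone names and coordinates are hypothetical.

```python
def minimum_tiling_path(clones, region_start, region_end):
    """Greedy interval cover.

    clones: list of (name, left, right) map coordinates.
    Returns the names of a smallest set of clones covering the region.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path = []
    covered = region_start
    i = 0
    while covered < region_end:
        best = None
        # Among clones starting at or before the covered point,
        # keep the one that reaches furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


# Hypothetical BAC clones with coordinates in kilobases.
clones = [
    ("BAC1", 0, 150),
    ("BAC2", 100, 220),
    ("BAC3", 120, 300),
    ("BAC4", 280, 400),
    ("BAC5", 50, 200),
]
print(minimum_tiling_path(clones, 0, 400))
```

In practice clones must share a real overlap to be joined, so a working pipeline would require a minimum overlap rather than allowing clones that merely abut.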
Whole genome Shotgun method
Overlap: the overlap
is determined directly
by sequence, not
indirectly by fragment
length.
Most of the genome is
sequenced many
(10-15) times to get the
correct overlap
Small insert clones
are prepared
directly from DNA
& sequenced
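The "10-15 times" figure is sequencing coverage (depth). As a rough sketch, coverage is total bases read divided by genome size. If reads are assumed to land uniformly at random (a Poisson model that is an assumption here, not something stated on the slide), the expected fraction of bases never sequenced is about e^(-coverage). The read counts below are hypothetical.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is sequenced."""
    return num_reads * read_length / genome_size


def expected_fraction_uncovered(depth: float) -> float:
    """Expected fraction of bases with zero reads, under a Poisson model."""
    return math.exp(-depth)


# Hypothetical: 60,000 reads of 500 bases from a 3,000,000-base genome.
depth = coverage(60_000, 500, 3_000_000)
print(depth)
print(expected_fraction_uncovered(depth))
```

Under this model, high coverage leaves very little of the sequence unsampled, although repeats can still prevent correct assembly.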
(2) Paired end reads: each
clone is primed from the two different ends of the
vector, whose sequence is known,
and the intervening sequence
is amplified by PCR,
producing an end-tagged linear sequence.
End-tagged, multiple
inserts may then be
overlapped to produce
a sequence contig
Where do the fragments overlap?
The overlap of sequences in regions of identity can be
used to make contiguous sequences.
(1) GATCTCGCCGCGTTGGAGAAGGACTACGAGGAGGTTGGCTCTGAGTCCGAC
TCTGAGTCCGACCCGTATCC
(2) ATGATGATGATGAGGATGGCGATGATGGTGACGAGTACTAG
AGGATGGCGATGATGGTGACGAGTACTAGAGGAGTCGTCGTCGTCTGGGGGCT
(3) TGATGTTCTGTGTGTCAAGGCCTGATTGATAACTGCTGCTATCCCATGATCTGCCAGTGT
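The overlaps in pairs (1) and (2) can be found mechanically by looking for the longest suffix of one read that matches a prefix of the other. The sketch below is a simplified illustration that assumes exact matches and a fixed read order; real assemblers also handle sequencing errors and reverse complements. The function names and the min_overlap parameter are illustrative.

```python
def overlap_length(left: str, right: str, min_overlap: int = 5) -> int:
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 5):
    """Join two reads into one contig, or return None if they do not overlap."""
    k = overlap_length(left, right, min_overlap)
    if k == 0:
        return None
    return left + right[k:]


# The two reads of pair (1) above.
read_a = "GATCTCGCCGCGTTGGAGAAGGACTACGAGGAGGTTGGCTCTGAGTCCGAC"
read_b = "TCTGAGTCCGACCCGTATCC"

k = overlap_length(read_a, read_b)
print(k, read_b[:k])
print(merge_reads(read_a, read_b))
```

Sequence (3) has no partner on the slide, so it would remain a separate, unjoined read.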
Problem? Repetitive sequence gaps.
Using clones containing
fragments of different sizes
(different restriction enzymes),
there will be overlap
Sequencing DNA fragments
DNA SEQUENCING - Sanger Method: A cloned fragment of DNA
is sequenced by using:
(1) a specific primer piece of DNA (oligonucleotide) to replicate the DNA from a known, pre-defined starting point.
(2) a spike of radioactive dideoxy-nucleotides (ddATP, ddCTP,
ddGTP or ddTTP) incorporated with excess normal dATP, dCTP,
dGTP, dTTP, + DNA polymerase.
The ddNTPs are randomly
incorporated and terminate
the strand elongation
(3) electrophoresis, visualize the DNA
(4) read the fragment order (by size)
(5) reconstruct the complementary original sequence
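Steps (2) to (5) can be mimicked in a few lines. This is an idealized sketch: it assumes a terminated fragment exists for every position of the newly made strand and ignores the primer's own length. The template sequence and function names are made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sanger_lanes(template: str) -> dict:
    """For each ddNTP reaction, list the lengths of the terminated fragments."""
    new_strand = reverse_complement(template)  # strand built by the polymerase
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)  # a ddNTP of this base stops synthesis here
    return lanes


def read_gel(lanes: dict) -> str:
    """Read fragments from shortest to longest, as on a gel."""
    by_length = {n: base for base, lengths in lanes.items() for n in lengths}
    return "".join(by_length[n] for n in sorted(by_length))


template = "GATTACA"                          # hypothetical cloned fragment
lanes = sanger_lanes(template)
synthesized = read_gel(lanes)                 # step (4): read order by size
recovered = reverse_complement(synthesized)   # step (5): original strand
print(lanes)
print(synthesized, recovered)
```

The sequence read from the gel is that of the newly synthesized strand, which is why the last step takes its complement to recover the original template.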
Automated Sequencing
uses fluorescent tags for
each ddNTP reaction.
The sequencing reaction
can be done in a tube, and
it is read by a light
detector
Pyrosequencing requires single-stranded DNA (template), a DNA
primer, DNA polymerase and dNTPs. Read the sequence by the
chemiluminescence, powered by ATP produced by ATP sulfurylase.
The basic techniques for sequencing entire genomes:
(1) libraries (whole genome) (2) cloning vectors (3) PCR (4) DNA
sequencing machines (5) chromosome maps (6) computers
The DNA sequence is the basis for computer-assisted analyses
Structural genomics involves the analysis of gene
sequence, gene number, order and physical nature of chromosomes.
Comparative genomics - similarity and divergence among genes with a
similar function in different species
Bioinformatics is the use of computer analysis for structural or functional genomics.
Proteomics - the study of all the proteins of an organism.
Transcriptomics - transcript studies
Functional genomics studies the function of genes:
gene expression, interactions between genes and proteins, and
between proteins
Genomics is the study of genomes in their entirety.
Most bacteria are now known by their sequence or
partial sequence; viruses can be sequenced in a day or two.
Over 100 eukaryotic genomes have been sequenced, including:
Human, Mouse, Yeast, Several fungi
Malaria parasite, Mosquito
Arabidopsis, Rice
Poplar tree
Many other species have a great deal of cDNA and gene sequences,
especially ESTs - partial sequences of cDNA clones.
93% gene similarity - many mutations are rearrangements
Bioinformatics: (broadly) computational challenges
in biology, or (narrowly) the information content of
the genome. A first objective is the identification of binding sites or functional
elements:
gene annotation.
Bioinformatics (the information content) collates multiple sources of
information, including comparative genomics (BLAST search), cDNA
sequence and ORFs, to annotate a candidate gene sequence.
Sequence information
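One of the evidence sources named above, the open reading frame (ORF), is simple enough to sketch. The example scans the three forward reading frames for an ATG followed in frame by a stop codon. It is a deliberately minimal illustration: it ignores the reverse strand and introns, and the test sequence, function name and min_codons threshold are made up for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3):
    """Return (start, end, frame) for each ATG...stop in the forward frames.

    `end` is exclusive and includes the stop codon. `min_codons` counts
    codons from ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


candidate = "CCATGAAACCCGGGTAGCC"  # hypothetical sequence
for start, end, frame in find_orfs(candidate):
    print(frame, candidate[start:end])
```

An ORF alone is weak evidence; annotation becomes more convincing when it agrees with a cDNA sequence or a BLAST hit to a known gene, as the slide indicates.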
A coding transcriptome(ics) represents that small percentage of the genetic code that is (apparently) transcribed into RNA molecules, estimated to be less than 5% of the genome in humans. Adams J. (2008) Nature Education 1:1
It now appears that the majority of the human genome is transcribed (introns and intervening sequence), and the vast majority of sequences are non-protein-coding (Frith et al., 2005). The proportion of transcribed sequences that are non-protein-coding appears to be greater in mammals compared to nematodes or Drosophila.
Frith et al., 2005. E.J.H.G. 13:894-897
The ENCODE PROJECT: identify and map all the transcribed regions of the human
genome, including regulatory regions, replication origins, DNA methylation and histone
methylation sites, etc.
The pilot project found, among other interesting findings:
On average, per coding region:
- There are 5.4 different transcripts per coding region; over half showed transcription
from both strands (exact reverse-direction complements).
- 63% of the mouse genome is transcribed, 1-2% have recognizable exons
- 41% span introns, 22% span intergenic regions.
- Majority of the human genome is transcribed
- Genome: islands of protein-coding sequence, interwoven and overlapping
- Transcription units spanning the genome
Gene definition? MicroRNA - regulatory gene?
Proteomics: the study of the proteome, the sequence and expression of all proteins
25-35,000 genes, 100,000-300,000 different proteins?
(alternative splicing, RNA editing, or alternative
transcription initiation and termination sites). Unknown fraction
Genomic sequencing has made possible a new approach to genetics called
functional genomics, which focuses on genome-wide patterns of gene expression and the mechanisms by which gene expression is coordinated
DNA microarray (or chip) - a flat surface about the size of a postage stamp with up to 100,000 distinct spots, each containing a different immobilized oligonucleotide DNA sequence, representing all the known genes in a genome or all the known cDNAs from a genome, suitable for hybridization with DNA or RNA isolated from cells growing under different conditions
Functional Genomics - study of expression patterns
[Figure labels: unknown, known]
Relate - which genes are
active in the tissue
Every yeast gene was
cloned, and samples are
spotted onto glass slides
Binding (color) intensity indicates RNA concentration, i.e. gene activity
DNA chips use synthetic DNA, oligonucleotides, that
can be spotted at a density of 10^6/cm^2, so fragments
from all human or other eukaryotic genes can be annealed to the chip.
[Figure: Transcriptional regulation of ~2500 genes showing significant changes during the first 2.75 hours of C. elegans development. Time: minutes before (-2) to minutes after (165) the 4 cell stage in development (gastrulation). Transcript abundance is shown relative to a non-dividing cell.]
A glossary of types of DNA sequence:
1. Full length cDNA - complement of the mRNA
2. Full length (eukaryotic) gene clone (exons, introns, flanking regions).
3. Restriction fragments, Restriction Fragment Length Polymorphisms
4. SNPs - Single Nucleotide Polymorphism(s)
5. PCR clone - partial gene sequence
6. Large genomic clones: BACs or YACs may have the sequence of
many adjacent genes
7. Satellite DNA - mid and highly repetitive DNA including VNTRs, mini and microsatellite DNA
8. STS (sequence tagged sites) - short unique sequences used to hybridize
to chromosomes
9. Expressed sequence tags (ESTs) - partial sequences of cDNAs used as
probes to ID chromosome locations, correlate RFLP and cytological maps.
10. SSLP (short (simple) sequence length polymorphisms) - short
repetitive sequences used to anchor a map