natalia-ronatowicz
8/17/2019 Genes, Genomes & Microarrays
http://slidepdf.com/reader/full/genes-genomes-microarrays 1/57
Genes, Genomes & Microarrays
5BBB0231: Gene Cloning & expression
Molecular approaches to neurobiology
Principle of forward/reverse genetics
The Central Dogma
genomes, genes, mRNA, proteins
Finding interesting developmental genes
microarray approaches
other methodologies
Technologies to determine gene function
Cloning genes
RT-PCR
in silico mining
In situ hybridisation
Functional Interference
Electroporation
Transgenic animals
mRNA knockdown
Watson & Crick model of DNA structure
1953
DNA structure
Double-stranded
Stabilised by base pairs (A-T; C-G) through hydrogen bonds
Antiparallel: one strand runs 5’ to 3’, the other 3’ to 5’
A = T, two hydrogen bonds
G ≡ C, three hydrogen bonds
Watson-Crick base pairing
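The pairing rule above (two hydrogen bonds per A-T pair, three per G-C pair) can be turned into a small calculation. The following is a minimal sketch; the function names and the example sequence are illustrative assumptions, not part of the lecture.

```python
# Hydrogen bonds contributed by each base when paired with its
# Watson-Crick partner: A=T has two, G≡C has three.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def count_hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding `strand` to a fully complementary partner."""
    return sum(H_BONDS[base] for base in strand.upper())


def gc_fraction(strand: str) -> float:
    """Fraction of positions that are G or C (the more strongly bonded pairs)."""
    strand = strand.upper()
    return (strand.count("G") + strand.count("C")) / len(strand)


example = "GATCCATCTAATACA"  # short illustrative sequence
print(count_hydrogen_bonds(example), round(gc_fraction(example), 2))
```

A duplex with a higher G-C fraction has more hydrogen bonds per base pair, which is one reason G-C-rich DNA generally needs more heat to separate.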
DNA structure
Right-handed helix
10 base pairs per turn
1 turn = 3.4 nm (0.34 nm per base pair)
Helix diameter = 2 nm
Hydrogen bonds in base pairs are not
covalent, and can be disrupted by heat or
chemicals
Heat separates the strands; cooling lets complementary strands re-anneal (“DNA Hybridisation”)
Chemical conditions that disrupt base pairing: alkali, low [Na+]
Cool: complementary strands re-anneal
DNA is packaged into chromosomes.
In humans: 46 chromosomes (22 pairs and XX or XY).
3.2 x 10^9 nucleotide pairs
1.8 metres of DNA in every nucleus (4 µm diameter).
If each nucleotide was 1 mm apart …
What is in the human genome?
Human = 1 trillion cells
Each cell = 3.2 x 10^9 bp @ 0.34 nm/bp
= approx 1 metre DNA per cell
= 1 trillion metres
= 1 billion km
= approx 6.7 x the distance from the Earth to the Sun (about 3.3 round trips)
= approx 1 light hour of information
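The arithmetic on this slide can be checked in a few lines. This is a rough sketch: the cell count, base pairs per cell and 0.34 nm/bp come from the slide, while the Earth-Sun distance and the speed of light are standard values supplied here as assumptions.

```python
BP_PER_CELL = 3.2e9        # base pairs per cell (from the slide)
NM_PER_BP = 0.34           # length per base pair in nm (from the slide)
CELLS = 1e12               # "1 trillion cells" (from the slide)

EARTH_SUN_KM = 1.496e8                  # assumption: mean Earth-Sun distance
LIGHT_HOUR_KM = 299_792.458 * 3600      # assumption: speed of light (km/s) x 1 hour

metres_per_cell = BP_PER_CELL * NM_PER_BP * 1e-9   # nm -> m
total_km = metres_per_cell * CELLS / 1000          # m -> km

print(f"DNA per cell:        {metres_per_cell:.2f} m")
print(f"DNA per body:        {total_km:.2e} km")
print(f"Earth-Sun distances: {total_km / EARTH_SUN_KM:.1f}")
print(f"Light hours:         {total_km / LIGHT_HOUR_KM:.2f}")
```

Without rounding the per-cell length down to 1 metre, the total comes out slightly higher than the slide's figures, but the order of magnitude (about a billion km, about one light hour) is the same.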
DNA is supercoiled. In eukaryotic cells,
extra coiling is required to allow
packaging into nucleus
“Beads on a string” appearance of nucleosomes
From genome to interactome
Primary Reasons for Sequencing the Human Genome
• Complete sequencing of all genes
• Determine intron/exon structure of all genes
• Reveal non-coding regulatory sequences
• Identify polymorphisms
• Develop methodology for sequencing other genomes
• Uncover the unexpected
How to sequence big genomes
Step 1: Make a genomic library from a DNA sample
Comparison of different DNA cloning vectors which could be used
--------------------------------------------------------------------
Vector     Host            Structure           Insert size (kb)
--------------------------------------------------------------------
Plasmids   E. coli         Circular plasmid    1–10
Cosmids    E. coli         Circular plasmid    35–45
BAC        E. coli         Circular plasmid    Up to 300
PAC        E. coli         Circular plasmid    100–300
YAC        S. cerevisiae   Linear chromosome   100–2000
--------------------------------------------------------------------
Step 3: Sub-assembly and Sequence
[Figure: YACs, BACs or PACs → plasmid subclones → sequence … G A T C C A T C T A A T A C A …]
A (BAC) View of the Human Genome Project
Current State of the Human Genome
Genebuild: 37.1 (hg19)
Known genes: 22,286
Novel genes: 34
Pseudogenes: 12,308
RNA genes: 9,922
Gene transcripts: 142,707
Base pairs: 3,272,480,987
Sequencing is still incomplete!
Centromeres are highly repetitive DNA sequences that are difficult to sequence using current technology. Millions (possibly tens of millions) of base pairs long.
Telomeres are highly repetitive: (TTAGGG)n
Vary in length between species, from a few hundred bp in ciliates to thousands of bp in vertebrates
[Figure: chromosome with telomere, centromere and telomere labelled]
The National Center for Biotechnology Information
www.ncbi.nlm.nih.gov
Holds publicly available genome sequences in a database known as GenBank, along with sequences of known and hypothetical genes.
Database lets you examine:
DNA sequence
Mutations and diseases associated with particular genes
Body tissues in which this gene is activated
Papers written about this gene
etc.
Genome Browsers at
University of California, Santa Cruz (UCSC) http://genome.ucsc.edu
and ENSEMBL http://www.ensembl.org
What other organisms have been sequenced and why?
Study evolution and diversity among species
Disease organisms (bacteria, viruses, parasites)
Vectors of disease (e.g. mosquito)
Commercial (fish and crops)
Laboratory model organisms for developmental biology, disease and genetics
What are the comparative genome sizes of humans and other organisms being studied?
organism                              estimated size    estimated gene number   average gene density       chromosome number
Homo sapiens (human)                  3.2 giga bases    ~22,000                 1 gene per 100,000 bases   46
Rattus norvegicus (rat)               2.75 giga bases   ~22,000                 1 gene per 100,000 bases   42
Mus musculus (mouse)                  2.5 giga bases    ~22,000                 1 gene per 100,000 bases   40
Tetraodon nigroviridis (pufferfish)   380 mega bases    ~20,000                 1 gene per 14,000 bases    21
Drosophila melanogaster (fruit fly)   180 mega bases    13,600                  1 gene per 9,000 bases     8
Arabidopsis thaliana (plant)          125 mega bases    25,500                  1 gene per 4,000 bases     10
Caenorhabditis elegans (roundworm)    97 mega bases     19,100                  1 gene per 5,000 bases     12
Saccharomyces cerevisiae (yeast)      12 mega bases     6,300                   1 gene per 2,000 bases     32
Escherichia coli (bacterium)          4.7 mega bases    3,200                   1 gene per 1,400 bases     1
H. influenzae (bacterium)             1.8 mega bases    1,700                   1 gene per 1,000 bases     1
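The "average gene density" column is simply genome size divided by gene number. As a rough check, the sketch below recomputes it for a few rows of the table; the variable and function names are illustrative assumptions.

```python
# (organism, genome size in bases, estimated gene number), copied from the table
GENOMES = [
    ("Homo sapiens", 3.2e9, 22_000),
    ("Caenorhabditis elegans", 97e6, 19_100),
    ("Saccharomyces cerevisiae", 12e6, 6_300),
    ("Escherichia coli", 4.7e6, 3_200),
    ("H. influenzae", 1.8e6, 1_700),
]


def bases_per_gene(size_bases: float, gene_count: int) -> float:
    """Average spacing between genes: genome size / number of genes."""
    return size_bases / gene_count


for name, size, genes in GENOMES:
    print(f"{name}: 1 gene per ~{bases_per_gene(size, genes):,.0f} bases")
```

The computed values agree with the table to within rounding for the smaller genomes. For human the result is nearer 145,000 bases per gene, which suggests the table's 100,000 is an order-of-magnitude figure.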
A Revision of Basic Gene Structure
transcription start site
promoter
Transcription factors and RNA polymerase II assemble at the promoter
What are the classic sequence signals that identify a promoter?
Down-stream or 3’
Up-stream or 5’
exons
introns
CpG
Other control sequences which interact with the promoter may be found elsewhere (to follow)
Transcript production and processing: before translation into protein, the message must be modified
[Figure: gene (5’ to 3’, exons 1 to 5, transcription start site) → transcription → primary RNA transcript (5’ to 3’) → splicing and processing → mature mRNA (5’ CAP … AAAAA 3’) → export from nucleus → translation]
Notes: stability, export and length
Genomic organisation of the β-globin gene
5' UTR
Exon 1 142 bp
Intron 1 130 bp
Exon 2 223 bp
Intron 2 850 bp
Exon 3 262 bp
3' UTR
ccagggc tgggca taaaa gtcag ggcag agcca tctatt gctt ....
ACATTTG CTTCTG ACACA ACTGT GTTCA CTAGC AACCTC AAACA GACAC C
ATGGTGCATC TGACTC CTGAG GAGAA GTCTG CCGTT ACTGCC CTGTG GGG
CAAGGTG AACGTG GATGA AGTTG GTGGT GAGGC CCTGGG CAGgttggta t
caaggtt acaaga caggt ttaag gagac caata gaaact gggca tgtgg a
gacagag aagact cttgg gtttc tgata ggcac tgactc tctct gccta t
tggtcta ttttcc caccc ttag GCTGCTG GTGGTC TACCC TTGGA CCCAG
AGGTTCT TTGAGT CCTTT GGGGA TCTGT CCACT CCTGAT GCTGT TATGG G
CAACCCT AAGGTG AAGGC TCATG GCAAG AAAGT GCTCGG TGCCT TTAGT G
ATGGCCT GGCTCA CCTGG ACAAC CTCAA GGGCA CCTTTG CCACA CTGAG T
GAGCTGC ACTGTG ACAAG CTGCA CGTGG ATCCT GAGAAC TTCAG Ggtgag
tctatgg gacgct tgatg ttttc tttcc ccttc ttttct atggt taagt t
catgtca taggaa gggga taagt aacag ggtac agttta gaatg ggaaa c
agacgaa tgattg catca gtgtg gaagt ctcag gatcgt tttag tttct t
ttatttg ctgttc ataac aattg ttttc ttttg tttaat tcttg ctttc t
ttttttt tcttct ccgca atttt tacta ttata cttaat gcctt aacat t
gtgtata acaaaa ggaaa tatct ctgag ataca ttaagt aactt aaaaa a
aaacttt acacag tctgc ctagt acatt actat ttggaa tatat gtgtg c
ttatttg catatt cataa tctcc ctact ttatt ttcttt tattt ttaat t
gatacat aatcat tatac atatt tatgg gttaa agtgta atgtt ttaat a
tgtgtac acatat tgacc aaatc agggt aattt tgcatt tgtaa tttta a
aaaatgc tttctt ctttt aatat acttt tttgt ttatct tattt ctaat a
ctttccc taatct ctttc tttca gggca ataat gataca atgta tcatg c
ctctttg caccat tctaa agaat aacag tgata atttct gggtt aaggc a
atagcaa tatctc tgcat ataaa tattt ctgca tataaa ttgta actga t
gtaagag gtttca tattg ctaat agcag ctaca atccag ctacc attct g
cttttat tttatg gttgg gataa ggctg gatta ttctga gtcca agcta g
gcccttt tgctaa tcatg ttcat acctc ttatc ttcctc ccaca gCTCCT
GGGCAAC GTGCTG GTCTG TGTGC TGGCC CATCA CTTTGG CAAAG AATTC A
CCCCACC AGTGCA GGCTG CCTAT CAGAA AGTGG TGGCTG GTGTG GCTAA T
GCCCTGG CCCACA AGTAT CACTAAGCTCGCT TTCTTG CTGTC CAATT TCT
ATTAAAG GTTCCT TTGTT CCCTA AGTCC AACTA CTAAAC TGGGG GATAT T
ATGAAGG GCCTTG AGCAT CTGGA TTCTG CCTAA TAAAAA ACATT TATTT T
CATTGCA atgatgt atttaa attat ttctg aatat tttac taaa......
What is the structure of a gene?
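The exon and intron lengths listed for the β-globin gene give the size of the transcript before and after splicing. The sketch below is illustrative only: it assumes the exon lengths shown include the 5' and 3' UTRs, and it ignores the cap and poly A tail, which are added during processing.

```python
# (segment name, length in bp) in 5' to 3' order, as listed on the slide
SEGMENTS = [
    ("exon 1", 142),
    ("intron 1", 130),
    ("exon 2", 223),
    ("intron 2", 850),
    ("exon 3", 262),
]

# The primary transcript contains every segment; splicing removes the introns.
primary_transcript_bp = sum(length for _, length in SEGMENTS)
mature_mrna_bp = sum(length for name, length in SEGMENTS if name.startswith("exon"))
intron_fraction = 1 - mature_mrna_bp / primary_transcript_bp

print(primary_transcript_bp, mature_mrna_bp, round(intron_fraction, 2))
```

On these figures, well over half of the primary transcript is intron sequence that never reaches the mature mRNA.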
Features of a mature mRNA [messenger RNA = protein coding RNA]
5’ 7mG cap | 5’ UTR | AUG … PROTEIN CODING SEQUENCE … STOP | 3’ UTR | AAAAAAAAAAAAAAAAAAA (poly A tail) 3’
ALMOST ALL PROTEIN CODING mRNAs HAVE A POLY A TAIL AT THE 3’ END OF THE TRANSCRIPT (REPLICATION-DEPENDENT HISTONE mRNAs ARE THE MAIN EXCEPTION)
NOTE: 3’ UTR REGIONS ARE NOT UNDER AS MUCH SELECTIVE PRESSURE AS PROTEIN CODING SEQUENCE
Gel electrophoresis of total RNA from a tissue
RNA is attracted to the +ve electrode and separated by size; small RNAs move quickly
[Figure: gel lane running from -ve (LARGE) to +ve (SMALL), showing the 28S rRNA and 18S rRNA bands over an mRNA smear]
mRNA is the background ‘smear’, i.e. thousands of mRNA molecules of different sizes [= different size proteins]
Microarrays
What are microarrays?
Tools to measure the expression levels of thousands of genes simultaneously
Definition:
A gene is said to be expressed if its mRNA or protein is present in the tissue under study
How do microarrays work?
They rely on the biological principle of complementary hybridisation
AATTATAGCGGAGCGAACGAAG
TTAATATCGCCTCGCTTGCTTC
If we know the mRNA sequence we can build a probe for it with the
complementary DNA sequence
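Designing a probe from a known target sequence is a base-by-base complement. The following minimal sketch reproduces the pair of sequences shown above; the function names are illustrative assumptions.

```python
# Map each base to its Watson-Crick partner (U is included so that an
# RNA target can be given directly; it pairs with A).
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def complement(seq: str) -> str:
    """Base-by-base complement, written in the same left-to-right order."""
    return seq.upper().translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complementary strand written 5' to 3', as probe sequences are usually listed."""
    return complement(seq)[::-1]


target = "AATTATAGCGGAGCGAACGAAG"
print(complement(target))          # the lower strand as drawn on the slide
print(reverse_complement(target))  # the same strand read 5' to 3'
```

Because the two strands are antiparallel, the slide's lower sequence reads 3' to 5' when the upper one reads 5' to 3'; `reverse_complement` gives the same probe in the conventional orientation.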
How do microarrays work?
A small probe can detect the presence of its complementary mRNA in a
large population of mRNAs
• the sequence of entire genomes gives us access to thousands [potentially all]
mRNA sequences
• probes can be synthesised for all of the known and predicted mRNAs
• probes are typically 25-70 nucleotides long [25-70mers]
• probes are immobilised on a solid surface [e.g. silicon]
• thousands of probes produced on a silicon base = GeneChip
• if the mRNA from a cell is labelled with a fluorescent dye we can detect which mRNAs become bound to their complementary probes
• spots of fluorescence on a GeneChip therefore = the genes that are expressed
Microarray platforms
• spotted cDNA array on nylon membranes
    • commercially produced
    • radioactive labelling, single channel
• multiple synthesised short oligonucleotides [25mers] on silicon
    • commercially produced: Affymetrix
    • single channel fluorescent label
    • between 11 and 20 probes per target gene
    • up to 40,000 gene targets/single chip
• spotted cDNA or long oligos [70mers] on glass slide
    • home grown or commercial
    • two channel: simultaneous co-hybridisation of two samples
    • two colour fluorescence
The structure of Affymetrix microarrays
• Recap: Affymetrix GeneChips [microarrays] contain short oligonucleotide probes synthesised directly onto a silicon target using the same technology that Intel uses to make CPUs
• this allows for a more densely-packed chip but means the
oligos must be shorter
• shorter oligonucleotides may cross-hybridise to the wrong
sequence
• to compensate each given gene [e.g. FGF8] is represented
by 11 different probes scattered across the chip
• the expression of a gene is derived from the overall average
of the reading from the 11 different probes
GeneChip® probe arrays are manufactured through a unique and robust process, a combination of photolithography and combinatorial chemistry
Tens of thousands of probes are printed onto a silicon base
Actual size
Data from an experiment showing the expression of thousands of genes on a single GeneChip® probe array.
Close up of a GeneChip image
The actual probes have been combined with positive controls that bind
to probes that spell out the chip type and form a border of expression
around the chip
Close up of a GeneChip feature
• 1 feature [18 µm square] = the region where probes of a single type are printed
• Each feature contains 100,000s of identical probes
Feature of a gene
that is expressed
Feature of a gene
that is not expressed
The level of fluorescence of a feature is the sum of the hybridisation across the entire region
Affymetrix GeneChips compensate for cross hybridisation by having a
Perfect Match and MisMatch Probe feature adjacent to each other
Perfect Match (PM): AATTATAGCGGAGCGAACGAAG
Mis Match (MM):     AATTATAGCGGGGCGAACGAAG
Each individual gene is represented by 11 different probes on an Affymetrix microarray
Probe [25mer] set for 1 gene: probe pairs 1 to 11, each with a PM and an MM feature
Signal = Σ(PM – MM, outliers excluded) / n, where n is the number of probe pairs used
i.e. Signal is a function of the efficacy of PM/MM oligonucleotide performance, NOT the absolute expression of the gene
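The probe-set signal described above (average PM minus MM difference, with outlying probe pairs left out) can be sketched as follows. The outlier rule used here, dropping pairs more than two standard deviations from the mean difference, is an assumption chosen for illustration and is not Affymetrix's actual algorithm; the intensity values are made up.

```python
from statistics import mean, stdev


def probe_set_signal(pm: list[float], mm: list[float], z_cut: float = 2.0) -> float:
    """Average PM - MM difference for one gene, ignoring outlying probe pairs.

    z_cut is a hypothetical cut-off: pairs whose difference lies more than
    z_cut standard deviations from the mean difference are excluded.
    """
    if len(pm) != len(mm):
        raise ValueError("each PM probe needs a matching MM probe")
    diffs = [p - m for p, m in zip(pm, mm)]
    centre, spread = mean(diffs), stdev(diffs)
    kept = [d for d in diffs if spread == 0 or abs(d - centre) <= z_cut * spread]
    return mean(kept)


# Made-up intensities for an 11-pair probe set; the last pair is an outlier.
pm_values = [200] * 10 + [1100]
mm_values = [100] * 11
print(probe_set_signal(pm_values, mm_values))
```

Averaging over several probe pairs, rather than trusting a single 25mer, is what makes the reading tolerant of one probe that cross-hybridises or performs badly.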
The mRNA extracted from a target tissue must be labelled so that it
can fluoresce when bound to its complementary probe sequence
1. Extract total RNA from cells or tissue
2. Convert the mRNA only to first strand cDNA
3. This uses an oligo dT primer, which binds to the poly A tail of mRNA
4. Second strand cDNA is made from the first strand
5. The process is done to all the mRNAs from the cell simultaneously
6. Using the double stranded cDNA as a template, new complementary RNAs [cRNAs] are made
7. The new cRNAs contain a biotin label
8. The biotin label can easily be detected using streptavidin and antibodies
9. Biotin-labelled cRNAs [representing the mRNAs] bind to their complementary probes immobilised on the GeneChip
Synthesis of biotin-labelled cRNA from rhombomere extracts
Workflow: Total RNA → Biotin-labelled cRNA → Fragmented cRNA → Hybridise to arrays
Spiked controls: bacterial mRNAs, bacterial cRNAs
[Figure: gel images with lanes E, M and samples 1 to 5 for SET1 to SET4]
To minimise the effect of noise the probe sets for a
single gene are randomised across the GeneChip
• The intensity and location of the probe features are read by a scanner
• The readings are assimilated into expression values for every single gene
Hybridise to microarray → Wash off non-specific → Scan for expressed genes → Data analysis → Confirm differentially expressed genes
Any labelled mRNA that is not tightly bound to a probe must be washed off
Tissue A Tissue B
Extract RNA and make labelled probes Extract RNA and make labelled probes
Hybridise to microarray Hybridise to microarray
Genes specific to A and B fluoresce at different positions
Compare the patterns of hybridisation
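Comparing the two hybridisation patterns amounts to asking which genes are detected in one tissue, the other, or both. The sketch below is a simplified illustration: the detection threshold, the expression values and the gene names other than FGF8 are hypothetical.

```python
def compare_expression(
    tissue_a: dict[str, float],
    tissue_b: dict[str, float],
    threshold: float,
) -> dict[str, list[str]]:
    """Classify genes by whether their signal reaches `threshold` in each tissue."""
    expressed_a = {gene for gene, signal in tissue_a.items() if signal >= threshold}
    expressed_b = {gene for gene, signal in tissue_b.items() if signal >= threshold}
    return {
        "A only": sorted(expressed_a - expressed_b),
        "B only": sorted(expressed_b - expressed_a),
        "both": sorted(expressed_a & expressed_b),
    }


# Made-up relative expression values for illustration only.
tissue_a = {"FGF8": 950.0, "geneX": 40.0, "geneY": 600.0}
tissue_b = {"FGF8": 30.0, "geneX": 720.0, "geneY": 580.0}
print(compare_expression(tissue_a, tissue_b, threshold=100.0))
```

A fixed threshold is the crudest possible call; in practice the comparison is made on replicated measurements so that real differences can be separated from noise.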
The output of the scanner is the gene name and a relative expression value
[Figure: table of expression values, rows 1 to 39000. Here: 6 different tissues under study]
Microarray data [i.e. the list of genes predicted to be expressed in a tissue] must be confirmed by another technique
• NORTHERN BLOT
• QUANTITATIVE RT-PCR
• IN SITU HYBRIDISATION
• RNase PROTECTION ASSAY
GeneChips [microarrays] exist for a wide range of species
Yeast
Drosophila
C. elegans
Bacteria
Complex statistics are required to analyse and compare a multi-GeneChip study
Real differences in gene expression between 2 cells or tissues must be distinguished from natural or technical variation
To achieve this, each cell or tissue in a particular condition must be represented more than once, i.e. BIOLOGICAL REPLICATES
Preferably each condition in a MINIMUM OF TRIPLICATE
For example: knockout mouse versus wild type
KO mouse: use 3 individuals and put on 3 different chips
+/+ mouse: use 3 individuals and put on 3 different chips
Compare the data between the 2 data sets
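For a single gene, comparing triplicate KO chips against triplicate wild-type chips could look like the sketch below. It uses Welch's t statistic as one possible choice, not necessarily the method used in the course, and the log2 expression values are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def log2_fold_change(condition: list[float], control: list[float]) -> float:
    """Difference of mean log2 expression values (condition minus control)."""
    return mean(condition) - mean(control)


def welch_t(x: list[float], y: list[float]) -> float:
    """Welch's t statistic: difference in means scaled by replicate variability."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


# Hypothetical log2 expression of one gene on 3 KO chips and 3 +/+ chips.
ko = [5.1, 4.9, 5.0]
wt = [7.0, 7.2, 6.8]

print(round(log2_fold_change(ko, wt), 2), round(welch_t(ko, wt), 2))
```

With only one chip per condition there would be no variance estimate at all, which is why replicates are required. When thousands of genes are tested at once, some correction for multiple testing is also usually needed.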