Genomics: Genome Sequencing
Dott.ssa Inga Prokopenko
The Human Genome Project
Before HGP
• 1975 – method for DNA sequencing introduced
Frederick Sanger
http://www.scq.ubc.ca/wp-content/uploads/2006/08/sequencing2.gif
The Human Genome Project
• 1996 – large-scale human genome sequencing attempt
• 1998 – “Celera Genomics”
o new approach
o public project competitor
o human genome in 3 years for $300 million
o led by C. Venter
http://kidblog.files.wordpress.com/2007/06/craig-venter-scientist-and-businessman.jpg
Genome Sequencing Project
…GTGACGTCGTCGTCG…
Sequencing: determining the order of nucleotides in a DNA sequence
DNA
Engineering Society meets BCCB
The Human Genome Project
HGP
• 2003 – Completion of HGP
Session 04 ~ 18/11/07
http://www.sanger.ac.uk/Info/Press/gfx/030414_hgp_300.jpg
Complete Genome Sequencing
1. Copy the DNA sequence,
2. randomly cut it into fragments up to 600 bp,
3. insert them into cloning vectors (BACs),
4. sequence the fragments, and
5. re-assemble the fragments.
• Genome assembly is like a jigsaw puzzle
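The re-assembly step can be illustrated with a toy greedy overlap merger. This is only a sketch under simplifying assumptions (error-free reads, a single strand, no repeats); the function names and example reads are hypothetical, and it is not the algorithm used by any real assembler.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0  # no overlap of at least min_len bases


def drop_contained(reads):
    """Remove duplicates and reads that lie entirely inside another read."""
    unique = list(dict.fromkeys(reads))
    return [r for r in unique if not any(r != o and r in o for o in unique)]


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = drop_contained(reads)
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # nothing left to merge: the puzzle stays in several pieces
        merged = contigs[i] + contigs[j][k:]
        rest = [c for idx, c in enumerate(contigs) if idx not in (i, j)]
        contigs = drop_contained(rest + [merged])
    return contigs


reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]  # made-up fragments
print(greedy_assemble(reads))  # ['ATGGCGTACCTGA']
```

Real genomes contain repeats, so the largest overlap is not always the correct one; that is where the jigsaw analogy becomes hard.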
The Human Genome
No Bioinformatics
No Human Genome
James Kent
• developed GigAssembler to assemble the public human genome fragments
• awarded the 2003 Benjamin Franklin Award
Gene Myers
• developed whole genome shotgun sequencing
• awarded the 2004 Max-Planck Prize
Sequencing Methods
DNA Sequencing Methods
• Maxam-Gilbert method
• Sanger method
• Whole genome sequencing strategies
o dye-terminator sequencing
o automated high-throughput analyzer
o shotgun sequencing and chromosome walking
• New DNA sequencing methods
Maxam-Gilbert Method
http://homepages.strath.ac.uk/~dfs99109/BB211/MGSeq.html
• Technically complex
• Requires radioactive labeling
• Extensive use of hazardous chemicals
• Difficult to scale up: it does not use primed DNA synthesis, so it is limited to sequences adjacent to restriction sites
Sanger Method
http://www.scq.ubc.ca/wp-content/uploads/2006/08/sequencing2.gif
• Uses DNA polymerase to synthesize a new strand of DNA
• Requires a DNA primer
• Amplification of ssDNA in the presence of labeled ddNTPs
• Separation according to fragment size by gel electrophoresis
Sanger Method
• Amplicon is a target sequence
• Primer extension is done by DNA polymerase
• Repetition doubles the amount of amplicon at each step
• ddNTPs are incorporated at random => no further extension occurs on that strand
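A small simulation can show why random termination is enough to read a sequence. It is a sketch under assumptions not stated on the slides: the template is written 3'→5' so that the new strand reads 5'→3', every position has the same termination probability `p_terminate`, and all names are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sanger_fragments(template, n_molecules=2000, p_terminate=0.1, seed=0):
    """Simulate chain termination on many copies of one template.

    Returns {fragment_length: labeled terminal base}. Molecules that run
    off the end of the template without a ddNTP are not labeled.
    """
    rng = random.Random(seed)
    observed = {}
    for _ in range(n_molecules):
        for length, base in enumerate(template, start=1):
            if rng.random() < p_terminate:
                # A ddNTP was incorporated here: extension stops for good.
                observed[length] = COMPLEMENT[base]
                break
    return observed


def read_sequence(observed):
    """Sort fragments by size (as electrophoresis does) and read the labels."""
    return "".join(observed[length] for length in sorted(observed))


fragments = sanger_fragments("TACGGATC")  # made-up 8-base template
print(read_sequence(fragments))
```

With enough molecules, every fragment length is represented, so the labels read in size order spell out the strand complementary to the template.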
Sanger Method
Example of early sequencing of a bacteriophage
From F. Sanger, 1977
Sanger Method Extension
• In 1986 L. Hood and L. Smith described a method using base-specific fluorescent sequence tags (fluorescent dyes)
• Automation of DNA sequencing
• Applied Biosystems added a capillary process for fragment separation (replacing flat sheet gels)
• 1998 – ~1 Mb could be sequenced in 1 day
Dye-terminator Sequencing
1. DNA Preparation
2. Sequencing Reaction
3. Termination
4. Capillary Electrophoresis
5. Computer Analysis
DNA Preparation
Break open cells to access DNA
Purify DNA from cell debris
http://www.wiley.com/college/pratt/0471393878/student/animations/dna_sequencing/index.html
Sequencing Reaction
Strand Separation
Primer Annealing
Elongation
Termination
primer
DNA Polymerase
http://www.wiley.com/college/pratt/0471393878/student/animations/dna_sequencing/index.html
Standard Nucleotides
ddNTP incorporation leads to chain growth termination
Dye-labeled dideoxynucleotides
Termination
http://www.wiley.com/college/pratt/0471393878/student/animations/dna_sequencing/index.html
Laser Photo cell
Capillary Tube
Capillary Electrophoresis
http://www.wiley.com/college/pratt/0471393878/student/animations/dna_sequencing/index.html
Computer Analysis
http://www.wiley.com/college/pratt/0471393878/student/animations/dna_sequencing/index.html
Screening for the BRCA1 185delAG mutation for breast cancer susceptibility
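The computer analysis step can be sketched as picking the strongest dye signal at each peak. This is a toy illustration, not the algorithm of any real base caller: the trace format, the function name and the 0.5 ambiguity threshold are assumptions made for the example.

```python
def call_bases(trace, ambiguity_ratio=0.5):
    """Call one base per peak from four-channel fluorescence intensities.

    `trace` is a list of dicts mapping each base to its intensity at a peak.
    A peak is reported as 'N' when the second-strongest channel reaches
    `ambiguity_ratio` of the strongest one (for example, two overlapping
    signals in a heterozygous sample).
    """
    calls = []
    for peak in trace:
        ranked = sorted(peak, key=peak.get, reverse=True)
        top, second = ranked[0], ranked[1]
        if peak[second] >= ambiguity_ratio * peak[top]:
            calls.append("N")
        else:
            calls.append(top)
    return "".join(calls)


trace = [  # made-up intensities
    {"A": 90, "C": 5, "G": 3, "T": 2},
    {"A": 4, "C": 80, "G": 6, "T": 1},
    {"A": 45, "C": 2, "G": 50, "T": 3},  # two comparable signals
]
print(call_bases(trace))  # ACN
```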
Mass spectrometry DNA sequencing
• High throughput
• More data per sample => mixtures of samples can be used
• High specificity and sensitivity
• In the figure: DNA strands of a heterozygote
Applications of mass spectrometry to DNA sequencing:
• Measurement of allele frequencies in a population, detection of alleles in individuals by identification of SNPs (sample pooling, to ~3% accuracy)
• Characterisation of individual genotypes (diagnostics and pharmacogenomics)
• Measurement of individual haplotypes (on a single DNA molecule, not on two chromosomes)
• Non-invasive prenatal diagnostics on the small amount of fetal DNA that leaks into maternal blood (even against a background of 95-99% maternal DNA, the SRY gene can be detected, showing that the fetus is male)
Whole Genome Sequencing Strategies
• Top-down approach
• Shotgun sequencing
• Chromosome walking
Top-down approach
• DNA library generation
• Gene mapping by genetic markers
• Sequencing reaction
• Electrophoresis
• Analysis
http://www.scq.ubc.ca/?p=392
Shotgun Sequencing
• The "whole-genome shotgun" method involves breaking the genome up into small pieces, sequencing the pieces, and reassembling the pieces into the full genome sequence.
• Generation of a DNA library
• Multiple sequencing events
• Sequence alignment
o jigsaw puzzle analogy
o uncertainty!
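One way to put a number on the uncertainty is read coverage. The sketch below uses a Poisson approximation in the style of Lander and Waterman, which is an assumption brought in for illustration and does not appear on the slides; it also assumes reads land uniformly at random, and the function names are hypothetical.

```python
import math


def coverage(n_reads, read_len, genome_len):
    """Average number of reads covering each base (often written c)."""
    return n_reads * read_len / genome_len


def expected_uncovered_fraction(c):
    """Poisson estimate of the fraction of bases covered by no read."""
    return math.exp(-c)


c = coverage(n_reads=1000, read_len=500, genome_len=100_000)
print(c)  # 5.0
print(expected_uncovered_fraction(c))  # roughly 0.0067, i.e. about 0.7% of bases
```

Even at five-fold coverage a small part of the sequence is expected to be missed, which is one reason the same region is sequenced many times.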
Shotgun Sequencing
http://www.scq.ubc.ca/?p=392
Chromosome Walking
Fig. 8-24, Lodish et al. (4th edition)
New DNA Sequencing Methods
• Pyrosequencing
• Massively parallel sequencing
• In vitro clonal amplification
• Sequencing by hybridization
• Sequencing by ligation
• Many more…
Pyrosequencing
http://student.ccbcmd.edu/courses/bio141/lecguide/unit6/metabolism/energy/images/atp.gif
http://www.pyrosequencing.com/DynPage.aspx?id=7454
the movie ...
New DNA Sequencing Methods
• Why do we need them?
o To lower the cost of sequencing
o To decrease the time needed for sequencing
o Parallelization of sequencing
o To increase efficiency and accuracy
Next-generation DNA sequencing methods
Solexa Gene Analyzer
DNA sample processing
Data generation
Data analysis
Flow cell imaging by microscopy
Data processing: Solexa screening
Solexa: Sequencing by synthesis (SBS)
Colony of ~1000 single-stranded DNA fragments
Experimental design
Data analysis
The genome jigsaw puzzle
• 1.5 Gbp of sequence read with 36 bp reads means a jigsaw puzzle with 42 million pieces… some of them will not be placed correctly
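The 42 million figure follows from dividing the amount of sequence by the read length. A minimal check, assuming each base is read exactly once with no overlap between reads (the function name is hypothetical):

```python
def pieces_in_puzzle(sequence_bp, read_bp):
    """Number of non-overlapping reads needed to cover the sequence once."""
    return -(-sequence_bp // read_bp)  # ceiling division on integers


print(pieces_in_puzzle(1_500_000_000, 36))  # 41666667, i.e. about 42 million
```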
DNA sequencing and resequencing
• The Genome Analyzer quickly and economically produces large amounts of high-quality data
• Possible uses:
– identification and confirmation of SNPs
– identification of chromosomal rearrangements, including copy number variations (CNVs)
– mapping of breakpoints
– identification of rare polymorphisms
Advanced features of the Genome Analyzer
• High accuracy
• High efficiency: gigabases of data per run from only 100 ng of DNA, at the lowest price per sequenced base
• Simple processing: one operator processes 1 run in 4 hours
Sequence Statistics
How big is it?
• The human genome contains 3164.7 million chemical nucleotide bases (A, C, T, and G).
• The average gene consists of 3000 bases.
• The largest known human gene codes for dystrophin.
• The total number of genes is estimated at 30,000.
• Functions are unknown for over 50% of discovered genes.
How big is it?
• An analogy to the human genome is that of a book that is:
• Over one billion words long!
How big is it?
• Bound into 5000 volumes of 300 pages each
How big is it really?
• This sequence fits into a cell nucleus the size of a pinpoint
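The book analogy can be checked against the genome size quoted earlier. The per-page figure below is derived arithmetic, not a number from the slides, and it assumes one letter per base.

```python
GENOME_BASES = 3_164_700_000          # 3164.7 million bases, as quoted above
VOLUMES, PAGES_PER_VOLUME = 5000, 300

total_pages = VOLUMES * PAGES_PER_VOLUME
bases_per_page = GENOME_BASES / total_pages

print(total_pages)     # 1500000
print(bases_per_page)  # 2109.8
```

About 2100 letters per page is a plausible density for printed text, so the two figures are consistent with each other.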
What do you do with this information?
• Find which regions of the genome are actually used by the body.
• These are the so-called “coding regions”
• There are also “exons” and “introns”
• And repeats
How much do we use?
• Less than 2% of the genome codes for proteins.
• Repeated sequences that do not code for proteins ("junk DNA")
o 50% of the human genome.
• Repetitive sequences
o no direct functions,
o chromosome structure and dynamics.
What is next?
http://www.youtube.com/watch?v=XuUpnAz5y1g
http://www.youtube.com/watch?v=gkQJ26DAxfs
Ethical
http://www.youtube.com/watch?v=QorIzoDgIPY
http://www.youtube.com/watch?v=PXeCDnfh0GA
What can the HGP be useful for?
• epigenetics
• genetic regulation
• investigating genetic diseases
• personalized medicine
• genetic engineering
• genomes as personal ID -> forensics
• genomics, proteomics, interactomics, other-omics ...
Other genomes – comparative genomics
• Many more genomes were sequenced:
o Human
o Mouse (M. musculus)
o Chimp (P. troglodytes)
o human pathogens
o many more ...
             Completed   Ongoing
Prokaryotic  554         1380
Eukaryotic   76          878
Archaea      49          57
Epigenetics: Chromatin structure and modifications
• The structure of chromatin (closed: heterochromatin, or open: euchromatin) can influence the expression of genes
• The histone modification code
• Regulation of gene expression
http://www.youtube.com/watch?v=lUESmHDrN40
Epigenetics: Histone code
• Activating the chromatin or switching it off by a combination of histone tail modifications
Epigenetics: DNA methylation
• CpG methylation of human DNA
o 5mC as an additional letter of the genetic code
o control of gene expression and chromatin structure
o part of the epigenetic regulation
5-methylcytosine
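Candidate methylation sites can be located by scanning a sequence for CG dinucleotides. The sketch below also computes an observed/expected CpG ratio, a commonly used heuristic for CpG-rich regions that is not part of the slides; the function names and the example sequence are hypothetical.

```python
def cpg_positions(seq):
    """0-based positions of the C in every CG dinucleotide of `seq`."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0  # no CpG is possible without both C and G
    return len(cpg_positions(seq)) * len(seq) / (n_c * n_g)


print(cpg_positions("ACGTCGCG"))  # [1, 4, 6]
```

Only the sequence context is found this way; whether a given cytosine is actually methylated has to be measured experimentally.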
OMIM database
• Online Mendelian Inheritance in Man
o Catalogues human genetic diseases
• A step towards personalized medicine
• So far there are ~18250 known genetic disorders
DNA forensics
• DNA can be used as a personal ID
o Used in forensics
o Paternity test
o Detection of potential biohazards
o Identification of crime victims
Besides genomics – other “omics”
• There are other interesting projects similar to HGP
o Other “omics”
– proteomics
– interactomics
– transcriptomics
– more ...
http://omics.org/
Genetic engineering
• One day we will be able to get rid of genetic diseases simply by replacing the bad part of the DNA with a good one
o Project in UK
o Ethical issues
o Tissue engineering
Costs and Ethics
http://campus.queens.edu/faculty/jannr/Genetics/images/dnatech/machines4sequencing.jpg
• HGP – $2.7 billion
o US – DOE and NIH
o UK, Canada, Japan, Sweden, Germany and others
• Sequencing – $500 million and 5 years
• Nowadays – $1 million and 100 days
• ~3% of the budget for ethical and social issues
How much?
Ethical issues
• Amount of money for the project
• Discrimination
o Insurance companies
o Employers
• Newborn genetic screening
• Genetic engineering (playing God)
Huge price
• Why HGP?
• Why waste money on non-coding DNA?
• Big science vs. small science
Playing God
• Genetic engineering
• Therapeutic vs. enhancement
• Non-heritable vs. heritable
Databases
Databases store biological data in various forms
➨ Sequences
o DNA, RNA (nucleic acids)
o Proteins
➨ Structural data
o X-ray crystallography data
➨ Expression data
o Transcription of all genes
➨ Interaction data
o Protein-protein interactions
o Receptor binding data
o Substrate binding data
➨ Metabolic pathways data
➨ Literature database: PubMed
The Explosion of Data
With annotations, this adds up to about 300 terabytes
Disease Fighting
Escherichia coli is a bacterium that lives in the human intestines
Genome of K12
Genome of O157:H7
Strain K12 not pathogenic
Strain O157:H7 pathogenic
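Comparing the two genomes comes down to finding sequence present in one strain and absent from the other. A toy way to sketch this is with sets of k-mers (substrings of length k). The strings below are made up and are not E. coli sequence; the function names are hypothetical, and a real comparison would use whole-genome alignment and take both strands into account.

```python
def kmers(seq, k):
    """Set of all substrings of length k in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def strain_specific_kmers(genome_a, genome_b, k=4):
    """k-mers found in genome_a but not in genome_b."""
    return kmers(genome_a, k) - kmers(genome_b, k)


pathogenic = "ATGCGTAC"     # made-up stand-in for one strain
harmless = "ATGCGGAC"       # made-up stand-in for the other
print(sorted(strain_specific_kmers(pathogenic, harmless)))
# ['CGTA', 'GCGT', 'GTAC']
```

Regions unique to the pathogenic strain are natural candidates when looking for the genes responsible for disease.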
Drug Discovery
Protein-protein docking
Protein-ligand docking
Given two biological molecules, determine whether they interact,
i.e., do the molecules fit together in any energetically favorable way?