FBW22-09-2015
Wim Van Criekinge
Lab for Bioinformatics and computational genomics
10 “genome hackers”, mostly engineers (statistics)
42 scientists, technicians, geneticists, clinicians
>100 people: hardware/software engineers, mathematicians, molecular biologists
What is Bioinformatics ?
• Application of information technology to the storage, management and analysis of biological information (facilitated by the use of computers)
– Sequence analysis?
– Molecular modeling (HTX)?
– Phylogeny/evolution?
– Ecology and population studies?
– Medical informatics?
– Image analysis?
– Statistics? AI?
– Power engineering or low-current electronics?
• Medicine (Pharma)
– Genome analysis allows the targeting of genetic diseases
– The effect of a disease or of a therapeutic on RNA and protein levels can be elucidated
– Knowledge of protein structure facilitates drug design
– Understanding of genomic variation allows the tailoring of medical treatment to the individual’s genetic make-up
• The same techniques can be applied to crop (Agro) and livestock improvement (Animal Health)
Promises of genomics and bioinformatics
Bioinformatics: What’s in a name ?
• Early 1990s
• “Bio-informatics”:
– convergence of the explosive growth in biotechnology, paralleled by the explosive growth in information technology
• Not new: people have used “computers” in biology for more than 30 years
• In silico biology, database biology, ...
[Figure: computing power and GenBank (log scale) versus time (years)]
Happy Birthday …
PCR + dye termination
Suddenly, a flash of insight caused him to pull the car off the road and stop. He awakened his friend dozing in the passenger seat and excitedly explained to her that he had hit upon a solution - not to his original problem, but to one of even greater significance. Kary Mullis had just conceived of a simple method for producing virtually unlimited copies of a specific DNA sequence in a test tube - the polymerase chain reaction (PCR)
Bioinformatics, a scientific discipline …
[Diagram, built up over three slides; labels in the final version:]
Math, Algorithm Development
Informatics, Interface Design
AI, Image Analysis, structure prediction (HTX)
Theoretical Biology
Sequence Analysis
Computational Biology
(Molecular) Biology
Expert Annotation
Computer Science, NP, Datamining
Bioinformatics: Discovery Informatics – Computational Genomics
Aim of the course
• More than an introduction to ...: the course is meant to give an understanding of what lies behind the various techniques.
• Beyond the use of recipes, which can be found in parts of the syllabus, an understanding of
– how databases work
– and the underlying algorithms
• makes it possible to apply changing interfaces to new problems.
Lesson contents: Bioinformatics
Exam
• Theory
– Four insight questions about the course (including !!)
• Practical (“open-book”)
– About four exercises that usually involve writing a program
• Marks are split 50/50
Course
• Syllabus: 25 euro
– Syllabus
• V|Podcasts
• Weblems
– Screencasts
biobix, wvcrieki
biobix.be, bioinformatics.be
• Timeline: Margaret Dayhoff …
Nature: the human genome
Setting the stage …
Genome Size
DOGS: Database Of Genome Sizes
E. coli = 4.2 x 10^6
Yeast = 18 x 10^6
Arabidopsis = 80 x 10^6
C. elegans = 100 x 10^6
Drosophila = 180 x 10^6
Human/Rat/Mouse = 3000 x 10^6
Lily = 300 000 x 10^6
With ... : 99.9%
To primates: 99%
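To put the genome sizes listed above in perspective, the arithmetic can be sketched as follows. Two things here are assumptions for illustration and not statements from the slides: the encoding of 2 bits per base, and the reading of the 99.9% figure as the identity between two human genomes (the slide elides what is being compared).

```python
# Genome sizes in base pairs, as listed on the slide.
GENOME_SIZES_BP = {
    "E. coli": 4.2e6,
    "Yeast": 18e6,
    "Arabidopsis": 80e6,
    "C. elegans": 100e6,
    "Drosophila": 180e6,
    "Human/Rat/Mouse": 3000e6,
    "Lily": 300_000e6,
}


def size_in_megabytes(n_bp, bits_per_base=2):
    """Storage needed if each base (A, C, G, T) takes bits_per_base bits (assumption)."""
    return n_bp * bits_per_base / 8 / 1e6


def differing_positions(n_bp, identity):
    """Number of positions expected to differ at a given fractional identity."""
    return n_bp * (1 - identity)


for name, n_bp in GENOME_SIZES_BP.items():
    print(f"{name:16s} {n_bp:>16,.0f} bp  ~{size_in_megabytes(n_bp):>10,.1f} MB")

# Assumed interpretation: two human genomes that are 99.9% identical.
print(f"{differing_positions(3000e6, 0.999):,.0f} differing positions")
```

Under these assumptions a human-sized genome takes about 750 MB, and two genomes that are 99.9% identical differ at roughly 3 million positions.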
Biological Research
Adapted from John McPherson, OICR
And this is just the beginning ….
Next Generation Sequencing is here
Basics of the “old” technology
• Clone the DNA.
• Generate a ladder of labeled (colored) molecules that are different by 1 nucleotide.
• Separate mixture on some matrix.
• Detect fluorochrome by laser.
• Interpret peaks as string of DNA.
• Strings are 500 to 1,000 letters long
• 1 machine generates 57,000 nucleotides/run
• Assemble all strings into a genome.
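The last step above, assembling overlapping strings into a longer sequence, can be illustrated with a toy greedy merge. This is a textbook simplification chosen for illustration, not the method used in the lecture or by production assemblers; the function names, the `min_len` threshold and the example reads are hypothetical, and sequencing errors, repeats and reverse strands are ignored.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (>= min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest suffix/prefix overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left; remaining contigs stay separate
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


# Three made-up reads taken from the made-up sequence ATGGCGTACGTTAGC.
contigs = greedy_assemble(["ATGGCGTAC", "CGTACGTTA", "CGTTAGC"])
print(contigs)
```

Reads without any sufficiently long overlap are returned as separate contigs rather than forced together.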
Basics of the “new” technology
• Get DNA.
• Attach it to something.
• Extend and amplify signal with some color scheme.
• Detect fluorochrome by microscopy.
• Interpret series of spots as short strings of DNA.
• Strings are 30-300 letters long
• Multiple images are interpreted as 0.4 to 1.2 GB/run (1,200,000,000 letters/day).
• Map or align strings to one or many genomes.
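The final step above, mapping short strings to a genome, can be sketched with a simple k-mer index and exact matching. This is a minimal illustration under stated assumptions: the function names, the seed length `k` and the example sequences are hypothetical, reads are assumed to be at least `k` letters long, and mismatches, indels and the reverse strand are not handled, unlike in real read mappers.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k):
    """Return all 0-based positions where the read matches the reference exactly."""
    hits = []
    # Use the first k letters of the read as a seed, then verify the full read.
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


reference = "ACGTACGTTAGCCGATAGCTA"  # made-up reference sequence
k = 4
index = build_index(reference, k)
for read in ["TAGCCGAT", "ACGT", "GGGGGG"]:
    print(read, map_read(read, reference, index, k))
```

The index is built once and reused for every read, which is what makes mapping millions of short reads against one reference feasible in principle.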
Next Generation Technologies
• 454
– Emulsion PCR
– Polymerase
– Natural nucleotides
• 20-100 Mb for 5-15k
– 1% error rate
– Homopolymers
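The slide lists homopolymers next to the 454 error rate; runs of identical bases are where this chemistry has most difficulty calling the exact length. A minimal sketch for locating such runs in a read is given below. The function name, the `min_len` threshold and the example read are hypothetical.

```python
from itertools import groupby


def homopolymer_runs(read, min_len=3):
    """Return (start, base, length) for every run of identical bases of at least min_len."""
    runs = []
    pos = 0
    for base, group in groupby(read):
        length = len(list(group))
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


print(homopolymer_runs("ACGTTTTTACCCG"))
```

Positions reported this way could be used to flag the parts of a read where length calls deserve less trust.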
One additional insight ...
Read Length is Not As Important For Resequencing
[Figure: % of paired k-mers with uniquely assignable location (y-axis, 0% to 100%) versus length of k-mer reads in bp (x-axis, 8 to 20), for E. coli and human]
Jay Shendure
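The idea behind the figure, that longer k-mers are more often unique within a genome, can be explored with a simplified sketch. It differs from the figure in ways that should be kept in mind: it uses single rather than paired k-mers, and a random toy sequence rather than the E. coli or human genome, so its numbers are not comparable to the plotted curves. All names and parameters are hypothetical.

```python
import random
from collections import Counter


def unique_kmer_fraction(genome, k):
    """Fraction of k-mer start positions whose k-mer occurs exactly once in the genome."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] == 1) / len(kmers)


random.seed(0)  # fixed seed so the sketch is reproducible
toy_genome = "".join(random.choice("ACGT") for _ in range(20_000))

for k in range(4, 17, 2):
    print(k, f"{unique_kmer_fraction(toy_genome, k):.1%}")
```

Real genomes contain repeats, so uniqueness rises more slowly with k than in a random sequence, and more so for large genomes.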
Two Short Read Technologies
• Illumina GA
• ABI SOLID
Technology Overview: Solexa/Illumina Sequencing
ABI Solid
Dressman 2003
Paired End Reads are Important!
Repetitive DNA
Unique DNA
Single read maps to multiple positions
Paired read maps uniquely
Read 1 Read 2
Known Distance
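The point of the slide, that a read in repetitive DNA maps to several positions while the pair with its known distance maps uniquely, can be sketched as follows. This is a simplified illustration: both reads are assumed to lie on the same strand, the distance is measured between read start positions, matching is exact, and all names and sequences are made up.

```python
def find_all(reference, read):
    """All 0-based start positions where read occurs exactly in reference."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


def map_pair(reference, read1, read2, distance, tolerance=0):
    """Return (pos1, pos2) placements where read2 starts about `distance` bases after read1."""
    placements = []
    for p1 in find_all(reference, read1):
        for p2 in find_all(reference, read2):
            if abs((p2 - p1) - distance) <= tolerance:
                placements.append((p1, p2))
    return placements


# Made-up reference: the repeat GATTACA occurs twice, CCGGTT occurs once.
reference = "GATTACA" + "TTTT" + "GATTACA" + "AAGG" + "CCGGTT"

print(find_all(reference, "GATTACA"))                        # read 1 alone is ambiguous
print(map_pair(reference, "GATTACA", "CCGGTT", distance=11))  # the pair is not
```

Only the copy of the repeat that sits at the expected distance from the unique read survives, which is how the pair resolves the ambiguity.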
Next next generation sequencing
Third generation sequencing
Now sequencing
Complete genomics
Pacific Biosciences: A Third Generation Sequencing Technology
Eid et al 2008
Nanopore Sequencing
The genome fits as an e-mail attachment
[Figure: sequencing approaches by target size in bp, on a log axis from 1 to 10^9 (full genome): whole-genome sequencing, enrichment sequencing (exome), PCR enrichment, targeted panels. Label: GENETIC]
Instrument and Assay providers
CLIA Lab service providers
NXT GNT DXS
• GNT
– Dedicated Team & Network
– Operational: Location
– Professionalized
• DXS
– Content engine
– Product 1 established
– Pipeline for n+1
• NXT
– Workflow management
– Bioinformatics
– Epigenetics
NCBI (educational resources)
Weblems
• What ?
– Web-based problems (about the current lesson and/or in preparation for the next lesson)
• When ?
– At the end of each lesson
• How ?
– Solutions online via screencasts
– Practical session
– Preparation for the practical exam ...
Not all problems necessarily require program code ...
Weblems
W1.1: To which phyla do the following species belong (a) starfish (b) ginkgo tree (c) scorpion
W1.2: What are the common names for the following species (a) Orycteropus afer (b) Beta vulgaris (c) Macrocystis pyrifera
W1.3: What species has the smallest known genome ? And is genome size related to number of genes ?
W1.4: What are the 5 latest genomes published ? How complete is “coverage” ?
W1.5: For approximately 10% of Europeans, the painkiller codeine is ineffective because the patients lack the enzyme that converts codeine into the active molecule, morphine. What is the most common mutation that causes this condition ?