New issues in storage and analysis
Christophe Roos - MediCel ltd - christophe.roos@medicel.fi
Annotating genomes with functional information: automatic but without errors?
High throughput data acquisition
Spring 2002 - Christophe Roos - 6/6 Functional genomics
Genome annotation
• Annotation is the sum of all non-sequence information that can be connected to a sequence
[Figure: the kinds of information that can be connected to a gene. Labels: Sequence, Genome location, Sequence homologs in other genomes, Phylogenetic inference, Expression info, Raw images, Numerical values, Cluster genes, Functional chemistry, Cofactors and metabolites, Metabolic profiles, Metabolic map locator, Connectors to other maps, Structure, Structure annotation, SS assignments, Electron density, Raw data, Experimental data]
Genome annotation
• Primary sources of information about what genes do are laboratory experiments. It may take several experiments for one data point.
• All that data should ideally be associated, i.e. hyperlinked among DBs.
– Magpie is an environment for genome annotation
• Compare genomes to learn how their structure affects function
– Bacteria have modules of genes functioning together, organised in ‘operons’
– Higher organisms need to pack the DNA to fit it into the nucleus. Activating a gene means unpacking, which is not efficient if it is done for each gene separately
Functional genomics
• High throughput technologies give us long lists of the parts of systems (chromosomes, genomes, cells, etc). We can now analyse how they work together to produce the complexity of the organisms.
• The function of the genome is
– Metabolism: metabolic pathways convert chemical energy derived from food into useful work in the cell.
– Regulation: regulatory pathways are biochemical mechanisms that control what genomic DNA does. They switch genes on and off in a controlled way.
– Signalling: signalling pathways control the movement of information (chemicals) from one component to another on many levels
– Construction
• Functional genomics tries to map these pathways
Analysing the activity of the genome
• Genomics: look at transcriptional activity of genes
– Transcription: When a gene is transcriptionally active, messenger RNA (mRNA) is synthesised. The amount of mRNA from each active gene varies over time.
– Turnover: Different mRNA species have different half-lives.
– Translation: The production of an mRNA does not imply that the corresponding protein is translated. Transcripts can also be produced for storage and later use.
– Technically feasible: it is possible to isolate all mRNAs from cells and to quantify them within certain limits.
• Proteomics: look at proteins instead of transcripts
– Limited: at present, acceptable efficiency comes at the expense of insufficient quality
– Closer to ’reality’ since the proteins are the players
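The turnover point can be made concrete with a small sketch. It assumes simple first-order decay with no new synthesis, which is a simplification introduced here, and the two half-lives are hypothetical values chosen only for illustration.

```python
def remaining_fraction(elapsed_min, half_life_min):
    """Fraction of an mRNA pool left after elapsed_min minutes.

    Assumes first-order decay and no new transcription (a simplification).
    """
    return 0.5 ** (elapsed_min / half_life_min)


# Hypothetical half-lives in minutes, for illustration only.
half_lives = {"short_lived_mRNA": 5.0, "long_lived_mRNA": 40.0}

for name, t_half in half_lives.items():
    print(name, remaining_fraction(20.0, t_half))
```

Two transcripts measured at the same time point can therefore differ strongly in amount even if they were transcribed at the same rate.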
EST: Expressed sequence tags
• ESTs are partial sequences of cDNA clones. cDNA clones are DNA synthesised in vitro using mRNA as template.
– Why? cDNA is more stable than mRNA
– How? cDNA can be made ‘en masse’ starting from total cellular mRNA isolates. cDNA libraries are specific for tissue, developmental time, stimulation etc.
– Therefore, looking at cDNA is looking at mRNA is looking at active genes.
– To look at cDNA means sequencing (part of) it.
• Clones are picked at random (10’000-200’000)
• Sequenced from one or both ends once (no proofreading)
• Sequences entered into EST sequence databases
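Because clones are picked at random, the number of ESTs that match a gene can be read as a rough estimate of how abundant its transcript was in the library. The sketch below assumes that each EST has already been assigned to a gene; the gene names are hypothetical.

```python
from collections import Counter


def est_abundance(est_gene_hits):
    """Relative abundance per gene from a list of per-EST gene assignments."""
    counts = Counter(est_gene_hits)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


# Hypothetical assignments: one entry per sequenced EST.
hits = ["geneA", "geneA", "geneB", "geneC"]
print(est_abundance(hits))
```

With single-pass sequencing and only a few ESTs per gene, such fractions are noisy and should be read as rough indications.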
EST: Expressed sequence tags
• Construct a clone by inserting a piece of DNA into a ’vector’.
• The vector and its insert behave as an independent unit (’plasmid’) in the bacterial host. The vector carries some additional genes to allow for selection (only those bacteria with the vector will survive on antibiotics)
• Amplify and sequence
• Iterate (in parallel)
DNA hybridisation
• DNA is a double-helix and can be separated by denaturing treatment into two strands. Each strand becomes ’sticky’ and attempts to renature with homologous single-strand sequences to form hybrids.
• Single-strand DNA from all known genes of a given species can be attached to a matrix, then probed with labelled cDNA molecules from a given sample. Only complementary probes will hybridise and can be detected if they have been previously labelled (radioactivity, fluorescent stain, ...)
• The technique can be multiplexed:
– High density arrays carrying sticky probes from a full genome
– Parallel hybridisation with cDNA from various sources
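The pairing rule behind hybridisation can be sketched in a few lines. This is a toy model that only accepts perfect complementarity; real hybridisation tolerates mismatches and depends on temperature and salt conditions. The function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the strand that would pair with seq, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def can_hybridise(spotted, labelled):
    """True if the labelled strand contains a stretch that pairs perfectly
    with the spotted single strand (toy model, no mismatches allowed)."""
    return reverse_complement(spotted) in labelled.upper()


print(reverse_complement("ATGC"))
print(can_hybridise("ATGC", "TTGCATTT"))
```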
The process of using microarrays
Building the chip:
– Massive PCR
– PCR purification and preparation
– Preparing slides
– Printing
Preparing RNA:
– Cell culture and harvest
– RNA isolation
– cDNA production
Hybridising the chip:
– Post processing
– Array hybridization
– Probe labeling
– Data analysis
The output: the image raw data
[Figure: the array is scanned with two lasers (laser 1, laser 2) and the emission is recorded; the two images are then overlaid and normalised for analysis]
cDNA is prepared from two samples (in this example) and labelled, each sample with a distinct color. The array is then hybridised with the double probe and the signal is recorded as images.
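Once the two channels have been reduced to one intensity per spot, a common way to compare them is the log ratio. The sketch below assumes positive, background-corrected intensities and uses median-centring as one simple normalisation; neither choice is prescribed by the slides, and the numbers are hypothetical.

```python
import math
from statistics import median


def log_ratios(channel1, channel2):
    """log2(channel1 / channel2) per spot; intensities must be positive."""
    return [math.log2(a / b) for a, b in zip(channel1, channel2)]


def median_centre(values):
    """Shift values so that their median is zero (one simple normalisation)."""
    m = median(values)
    return [v - m for v in values]


# Hypothetical spot intensities for the two labelled samples.
sample1 = [200.0, 100.0, 50.0]
sample2 = [100.0, 100.0, 100.0]
print(median_centre(log_ratios(sample1, sample2)))
```

Median-centring rests on the assumption that most genes do not change between the two samples.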
Problems in image analysis
• Noise
• Spot detection and intensity
• Alignment of the images if they are overlaid
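A minimal sketch of spot intensity estimation, assuming the image has already been cut into one small pixel window per spot. A fixed threshold separates spot from background here, which is a deliberately naive choice; the pixel values are hypothetical.

```python
from statistics import mean, median


def spot_signal(pixels, threshold):
    """Mean foreground intensity minus median local background.

    pixels is a 2D list of intensities for one spot window; values at or
    above threshold count as spot, the rest as background.
    """
    flat = [value for row in pixels for value in row]
    foreground = [v for v in flat if v >= threshold]
    background = [v for v in flat if v < threshold]
    if not foreground:
        return 0.0  # no spot detected in this window
    local_background = median(background) if background else 0.0
    return mean(foreground) - local_background


window = [
    [10, 10, 10],
    [10, 100, 10],
    [10, 10, 10],
]
print(spot_signal(window, threshold=50))
```

Noise shows up directly in such an estimate: a single bright dust pixel above the threshold would be counted as spot signal.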
A set of experiments on yeast...
• Each row represents one gene
• Each column represents one experiment
– The columns have been organised into related sets of experiments (ALPH, ELU,...)
• The colors indicate gene activity (from high to absent)
Clustering the resulting data
• Looking at 10’000 genes is not easy
• Group genes into clusters of genes that behave the same way over a set of several experiments
– Hierarchical clustering
– K-means clustering
– Self-organising maps (SOM)
– Etc.
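As an illustration of one of the methods listed, here is a minimal k-means sketch in plain Python. Each gene is a row of expression values, one per experiment. The initialisation (the first k profiles) and the fixed number of iterations are simplifications chosen here, and the profiles are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(profiles):
    """Column-wise mean of a non-empty list of profiles."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def k_means(profiles, k, iterations=20):
    """Return a cluster index (0..k-1) for every profile."""
    centroids = [list(p) for p in profiles[:k]]  # naive initialisation
    assignment = []
    for _ in range(iterations):
        # Assign each gene to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(profiles, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return assignment


# Hypothetical profiles: two rising genes, two falling genes.
genes = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [3.0, 2.0, 1.0], [2.9, 2.1, 1.0]]
print(k_means(genes, k=2))
```

The result depends on the starting centroids, so in practice the procedure is usually repeated from several starting points.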
The overall process with microarrays
• Microarray data has to be used in a larger frame of experimentation
Making a model of the data
[Figure: levels at which the data can be modelled]
Sequence, Structure, Function
Interaction, Network, Function
Genome, Transcriptome, Proteome
1. Elements
2. Binary relations
3. Networks
Labels: Pathway, Assembly, Neighbour, Cluster, Hierarchical tree, Genome
Comparing networks
• Gain new biological information by comparison of networks
• What is the metric?
• How is it done? Is it simply a problem of graph isomorphism?
Pathway vs. Pathway
Pathway vs. Genome
Genome vs. Genome
Cluster vs. Pathway
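To make the graph isomorphism question concrete, the sketch below tests two small undirected graphs by trying every node mapping. It is a brute-force illustration that is only usable for a handful of nodes, and the example graphs are hypothetical.

```python
from itertools import permutations


def are_isomorphic(nodes1, edges1, nodes2, edges2):
    """Brute-force isomorphism test for small undirected graphs."""
    e1 = {frozenset(edge) for edge in edges1}
    e2 = {frozenset(edge) for edge in edges2}
    if len(nodes1) != len(nodes2) or len(e1) != len(e2):
        return False
    for perm in permutations(nodes2):
        mapping = dict(zip(nodes1, perm))
        # Relabel every edge of graph 1 and compare with graph 2.
        mapped = {frozenset(mapping[n] for n in edge) for edge in e1}
        if mapped == e2:
            return True
    return False


# Two three-node paths with different node labels.
print(are_isomorphic(
    ["A", "B", "C"], [("A", "B"), ("B", "C")],
    ["a", "b", "c"], [("a", "c"), ("c", "b")],
))
```

The number of mappings grows factorially with the number of nodes, and biological networks rarely match exactly, which is one reason to look for heuristic comparisons instead.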
Biological graph comparison
• Search heuristically for clusters of correspondence
[Figure: Graph 1 (nodes A to K) and Graph 2 (nodes a to k) are linked by a list of correspondences (A - a, B - b, C - c, D - d, ...). A clustering algorithm is applied to the correspondences.]
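One simple reading of "clusters of correspondence" is sketched below: two correspondences are linked when their nodes are neighbours in both graphs, and a cluster is a connected group of linked correspondences. This is an assumption made for illustration, not necessarily the algorithm behind the figure; the graphs are hypothetical.

```python
def correspondence_clusters(edges1, edges2, correspondences):
    """Group correspondences (n1, n2) whose nodes are adjacent in both graphs."""
    adjacent1 = {frozenset(edge) for edge in edges1}
    adjacent2 = {frozenset(edge) for edge in edges2}
    pairs = list(correspondences)

    # Link two correspondences if both graphs have the matching edge.
    links = {pair: set() for pair in pairs}
    for i, (u1, u2) in enumerate(pairs):
        for v1, v2 in pairs[i + 1:]:
            if frozenset((u1, v1)) in adjacent1 and frozenset((u2, v2)) in adjacent2:
                links[(u1, u2)].add((v1, v2))
                links[(v1, v2)].add((u1, u2))

    # Connected components with more than one member are the clusters.
    clusters, seen = [], set()
    for start in pairs:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            pair = stack.pop()
            if pair in component:
                continue
            component.add(pair)
            stack.extend(links[pair] - component)
        seen |= component
        if len(component) > 1:
            clusters.append(sorted(component))
    return clusters


graph1 = [("A", "B"), ("B", "C"), ("D", "E")]
graph2 = [("a", "b"), ("b", "c"), ("d", "x")]
matches = [("A", "a"), ("B", "b"), ("C", "c"), ("D", "d"), ("E", "e")]
print(correspondence_clusters(graph1, graph2, matches))
```

A more tolerant variant would also link correspondences whose nodes are a few steps apart, so that small gaps do not split a cluster.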
Example: genomic, metabolic, structural
Genome-pathway comparison reveals the correlation between (a) the physical coupling of genes in the genome (operon structure) and (b) the functional coupling of gene products in the pathway.
E. coli genome
hisL hisG hisD hisC hisB hisH hisA hisF hisI
yefM yzzB
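The idea can be sketched as a search for runs of neighbouring genes that all belong to the same pathway. The gene names come from the slide, but the position of yefM and yzzB on either side of the his genes and the set of genes counted as pathway members are assumptions made for this illustration.

```python
def coupled_runs(genome_order, pathway_genes, min_length=2):
    """Runs of consecutive genes in genome order that are all pathway members."""
    runs, current = [], []
    for gene in genome_order:
        if gene in pathway_genes:
            current.append(gene)
        else:
            if len(current) >= min_length:
                runs.append(current)
            current = []
    if len(current) >= min_length:
        runs.append(current)
    return runs


# Assumed layout: flanking genes placed on either side for illustration.
genome_order = ["yefM", "hisL", "hisG", "hisD", "hisC", "hisB",
                "hisH", "hisA", "hisF", "hisI", "yzzB"]
# Assumed membership: genes treated here as histidine pathway enzymes.
pathway_genes = {"hisG", "hisD", "hisC", "hisB", "hisH", "hisA", "hisF", "hisI"}

print(coupled_runs(genome_order, pathway_genes))
```

A long run of this kind is the signature of an operon: genes that are physically adjacent and whose products act in the same pathway.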
Example: genomic, metabolic, structural
[Figure: metabolic pathway map "HISTIDINE METABOLISM", with links to the pentose phosphate cycle and purine metabolism. Enzymes are labelled by EC number (e.g. 2.4.2.17, 3.6.1.31, 3.5.4.19, 5.3.1.16, 3.1.3.15, 1.1.1.23). Compounds shown include PRPP, phosphoribosyl-ATP, phosphoribosyl-AMP, phosphoribosyl-formimino-AICAR-P, phosphoribulosyl-formimino-AICAR-P, AICAR, imidazole-glycerol-3P, imidazole-acetol P, L-histidinol-P, L-histidinal, L-histidine, histamine, carnosine, anserine and 1-methyl-L-histidine.]
Example: genomic, metabolic, structural
[Figure: metabolic pathway map "……..NE, TYROSINE AND TRYPTOPHAN BIOSYNTHESIS", with links to tyrosine metabolism, tryptophan metabolism, alkaloid biosynthesis I, folate biosynthesis and ubiquinone biosynthesis. Enzymes are labelled by EC number (e.g. 2.7.1.71, 4.2.1.10, 5.4.99.5, 4.2.1.51, 4.2.1.20). Compounds shown include 3-dehydroquinate, 3-dehydroshikimate, shikimate, chorismate, prephenate, pretyrosine, phenylpyruvate, 4-hydroxyphenylpyruvate, anthranilate, indole, L-tryptophan, tyrosine and phenylalanine.]
SCOP hierarchical tree
1. All alpha
2. All beta
3. Alpha and beta (a/b)
   3.1 beta/alpha (TIM)-barrel
   3.2 Cellulases
   . . .
   3.74 Thiolase
   3.75 Cytidine deaminase
4. Alpha and beta (a+b)
5. Multi-domain (alpha and beta)
6. Membrane and cell surface pro
7. Small proteins
8. Peptides
9. Designed proteins
10. Non-protein
More challenges?
The list of genes that are activated, inactivated or unaffected when two samples are compared becomes more informative if the genes can be placed on maps from which functions can be deduced.
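A minimal sketch of such a mapping, assuming a lookup table from gene to pathway is available. The gene names, pathway names and the table itself are hypothetical.

```python
from collections import defaultdict


def genes_per_pathway(changed_genes, gene_to_pathways):
    """Group a list of changed genes by the pathways they are annotated to."""
    hits = defaultdict(list)
    for gene in changed_genes:
        for pathway in gene_to_pathways.get(gene, []):
            hits[pathway].append(gene)
    return dict(hits)


# Hypothetical annotation table and list of activated genes.
annotation = {
    "geneA": ["pathway 1"],
    "geneB": ["pathway 1", "pathway 2"],
    "geneC": ["pathway 3"],
}
activated = ["geneA", "geneB", "geneX"]
print(genes_per_pathway(activated, annotation))
```

Genes without any annotation, such as geneX here, drop out of the result, which shows how much this step depends on the quality of the annotation.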