Bioinformatics
Dr. Víctor Treviño, [email protected], A7-421
Functional Genomics I - Microarrays
FUNCTIONAL GENOMICS TECHNOLOGIES
Transcriptomics Proteomics Metabolomics Genomics
SNP (Single Nucleotide Polymorphisms) CNV (Copy Number Variation, CGH)
Epigenomics
MICROARRAYS
Technology that provides measurements of thousands of molecules in the same experiment, at reasonable price and precision.
Generally the size of a typical microscope slide (75 x 25 mm (3" x 1") and about 1.0 mm thick).
Biological Question
Experimental Design
Microarray Experiment
Pre-processing
Differential Expression Clustering Prediction
Biology: Verification and Interpretation
…
Image Analysis
Background
Normalization
Summarization
Transformation
GENE EXPRESSION
Molecular Cell Biology [Lodish,Berk,Matsudaira,Kayser,Kreiger,Scott,Zipursky,Danell] (5th Ed)
Gene Expression
MEASURING GENE EXPRESSION
[Figure: PCR gel for mRNA of Gene X in the RWPE-1, DU-145 and PC-3 cell lines (- and + lanes), with a 100 bp ladder (100 bp and 200 bp bands marked). Image source: http://www.bio168.com/mag/1B8B368B092A/20-3.jpg]
[Figure: QPCR curves for 10^7, 10^6, 10^5, 10^4, 10^3, 10^2 and 10 copies]
PCR
QPCR
MICROARRAY - HYBRIDISATION
Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003
http://www.well.ox.ac.uk/genomics/facilitites/Microarray/Welcome.shtml
DNA MICROARRAY TECHNOLOGY
www.niaid.nih.gov/dir/services/rtb/microarray/overview.asp
http://metherall.genetics.utah.edu/Protocols/Microarray-Spotting.html
http://www.lbl.gov/Science-Articles/Archive/cardiac-hyper-genes.html
http://www.nrc-cnrc.gc.ca/multimedia/picture/life/nrc-bri_micro-array_e.html
http://learn.genetics.utah.edu/units/biotech/microarray/genechip.jpg
Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003
MICROARRAYS – PROBE PRODUCTION
MICROARRAY TECHNOLOGIES
Images: Affymetrix (1 dye), two-dyes
MICROARRAY QUALITY
Affymetrix Spotted Arrays Inkjet arrays
Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003
mRNA Extraction (and amplification)
Labelling
Hybridization
Scanning
Statistical Analysis
Image Analysis & Data Processing
PROCESS
Healthy/Control: REFERENCE; Disease/Treatment: TEST
Gene: A 1-1, B 1-0, C 3-3, D 0-3
Gene: E 3-0, F 0-1, G 1-1, H 2-0
Gene: I 2-2, J 0-0, K 3-0, L 2-1
Gene D 0.001
Gene E 0.005
Gene K 0.001
TWO-DYES
mRNA/cDNA
Labeled mRNA
Digital Image
Microarray
Data
Selected Genes
PRODUCTTEST
Gene: A 1, B 1, C 1, D 0
Gene: E 4, F 1, G 1, H 2
Gene: I 2, J 0, K 5, L 2
Sample
Gene D 0.001
Gene E 0.005
Gene K 0.001
Gene J 0.003
ONE-DYE
MICROARRAY – LASER AND THE SCANNED IMAGE
Dr. Hugo Barrera, Microarrays Course EMBO-INER 2005, Mexico City Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003
5 µm Laser, 10 µm Laser
Pre-processing
Image Analysis
Background
Normalization
Summarization
Transformation
Microarray - Pre-Processing Purpose
Input: Scanned Image File
Output: Data File (unique "global relative" measure of expression for every gene with minimal experimental error)
MICROARRAY IMAGE ANALYSIS: TECHNOLOGIES
DNA Probes: Oligos (~20-40 nt); Target (cDNA, PCR products, etc.)
Copies per gene: Usually 1; Usually 3
Organization: Sectors (print-tip); n x m probesets
Probeset: n probesets (~100) x m probesets (~100)
Sectors: x sectors (~3) by y sectors (~3), each of i x j spots (18x20)
Controls: Empty spots, landing lights; perfect match probes (pm), mismatch probes (mm)
MICROARRAY - IMAGE ANALYSIS: TECHNOLOGIES
10,000 genes * 2 dyes * 3 copies/gene * ~40 pixels/gene = 2,400,000 values
only 10,000 values
10,000 genes * 20 oligos * 2 (pm, mm) * ~36 pixels/gene = 14,400,000 values
only 10,000 values
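The two raw-data counts above are plain products of the per-gene factors. A minimal sketch that reproduces the arithmetic (the helper name `raw_value_count` is made up for illustration):

```python
def raw_value_count(genes, *factors):
    """Multiply the number of genes by each per-gene factor."""
    total = genes
    for factor in factors:
        total *= factor
    return total

# Two-dye arrays: 2 dyes, 3 copies/gene, ~40 pixels/gene
two_dye = raw_value_count(10_000, 2, 3, 40)
# Affymetrix: 20 oligos, 2 probe types (pm, mm), ~36 pixels/gene
affymetrix = raw_value_count(10_000, 20, 2, 36)

print(f"{two_dye:,} values and {affymetrix:,} values")
```

Pre-processing is what reduces these millions of pixel values to one value per gene.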
RAW DATA
Image Analysis, Pre-processing
IMAGE ANALYSIS
Addressing: Estimate location of spot centers.
Segmentation: Classify pixels as foreground or background.
Extraction: For each spot on the array and each dye
• foreground intensities
• background intensities
• quality measures.
Addressing Done by GeneChip Affymetrix software
Addressing (by grid, GenePix)
Segmentation
Circular feature Irregular feature shape
Finally compute Average
Background Reduction
Extraction:
Determining Background
2-Color
Results (GenePix): .gpr file, "results" for one array
10,000 genes, ~30,000 values
(.gal files: 1 file for a "list" of array)
Affymetrix
Results: .cel file, "results" for one array (raw, no background reduced)
10,000 genes, ~400,000 values
Image Analysis
IMAGE ANALYSIS
Segmentation (Spot detection)
Background Estimation
Value = Spot Intensity – Spot Background
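A minimal sketch of the "Value = Spot Intensity – Spot Background" step for one circular feature. The choices here are assumptions for illustration, not what GenePix or the Affymetrix software actually do: foreground is the mean of pixels inside a circle, background is the median of pixels in a surrounding ring, and `spot_value` is a made-up name.

```python
from statistics import mean, median

def spot_value(image, cx, cy, radius, ring=2):
    """Foreground mean inside a circle minus the median of a surrounding ring."""
    foreground, background = [], []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            if d2 <= radius ** 2:
                foreground.append(pixel)      # pixel belongs to the spot
            elif d2 <= (radius + ring) ** 2:
                background.append(pixel)      # pixel belongs to the local ring
    return mean(foreground) - median(background)

# Toy 7x7 image: background level 10, one small bright spot of level 110
image = [[10] * 7 for _ in range(7)]
for x, y in [(3, 3), (2, 3), (4, 3), (3, 2), (3, 4)]:
    image[y][x] = 110

value = spot_value(image, cx=3, cy=3, radius=1)
print(value)
```

Irregular feature shapes would need a real segmentation step in place of the fixed circle.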
[Table: background-corrected values per gene (Gene 1, Gene 2, Gene 3, ..., Gene k, ..., Gene N), one column per sample; values can be negative, e.g. -7]
TRANSFORMATION – TWO DYES
[Table: background-corrected values per gene (Gene 1 ... Gene N), one column per channel]
[Figure: scatter plots of R (Sample 1) versus G (Sample 1), on the raw scale and after Log2 transformation]
TRANSFORMATION – TWO DYES
[Table: background-corrected values per gene (Gene 1 ... Gene N), one column per channel]
(log2 scale) R, G: 1 value?
A = log2(R*G)/2 = (log2(G) + log2(R)) / 2
M = log2(R/G) = log2(R) - log2(G)
[Figure: scatter plot of R (Sample 1) versus G (Sample 1) next to the MA-Plot; x axis A = (log2(G)+log2(R)) / 2, range about 8 to 16; y axis M = log2(R)-log2(G), range about -4 to 1]
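The M and A values of the MA-Plot follow directly from the two channel intensities. A minimal sketch (the name `ma_transform` and the handling of non-positive intensities are assumptions for illustration):

```python
import math

def ma_transform(r, g):
    """Return (M, A) for one spot, or None when a log2 cannot be taken."""
    if r <= 0 or g <= 0:
        # Background subtraction can give values <= 0, which have no log2
        return None
    m = math.log2(r) - math.log2(g)        # M = log2(R/G)
    a = (math.log2(r) + math.log2(g)) / 2  # A = log2(R*G)/2
    return m, a

m, a = ma_transform(r=400.0, g=100.0)
print(m, a)
undefined = ma_transform(r=-7.0, g=98.0)
print(undefined)
```

M is the single per-gene value (the log ratio) and A is the average log intensity against which it is plotted.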
"With-in"(2 color technologies)
Normalization – 2 dyes
(assumption: Majority No change)
[Figure: MA-plots Before and After "With-in" normalization (2 color technologies)]
Normalization – 2 dyes, "With-in" Spatial (2 color technologies)
[Figure panels: Before Normalization; After loess Global Normalization; After loess by Sector (print-tip) Normalization]
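Under the "majority no change" assumption, most M values should be centred on 0. The slides use loess; the sketch below swaps in plain median centering, globally or per sector (print-tip), as a much simpler stand-in that ignores the intensity dependence loess corrects. The function name and data are hypothetical.

```python
from collections import defaultdict
from statistics import median

def median_center(m_values, sectors=None):
    """Subtract the median M, computed globally or per sector (print-tip)."""
    if sectors is None:
        sectors = [0] * len(m_values)  # one global group
    groups = defaultdict(list)
    for m, s in zip(m_values, sectors):
        groups[s].append(m)
    centers = {s: median(values) for s, values in groups.items()}
    return [m - centers[s] for m, s in zip(m_values, sectors)]

m_values = [0.5, 0.7, 0.6, 1.5, 1.4, 1.6]
sectors = [0, 0, 0, 1, 1, 1]
global_norm = median_center(m_values)
sector_norm = median_center(m_values, sectors)
print(global_norm)
print(sector_norm)
```

With the per-sector version, the offset between the two print-tips disappears; the global version leaves it in place.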
TRANSFORMATION – ONE DYE
[Table: values per gene (Gene 1 ... Gene N) for Sample 1]
Log2
[Figure: density of log2 intensities for one array, density(x = log2(t[, 15] + 200), adjust = 0.475), N = 3840, Bandwidth = 0.1051]
[Figure: density versus log intensity for all arrays, Before normalization and After normalization]
Between-slides
Normalization – 1 or 2 dyes
• quantile
• MAD (median absolute deviation)
• scale
• qspline
• invariantset
• loess
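Of the between-slides methods listed, quantile normalization is the easiest to sketch: every array is forced to share the same distribution, taken as the mean of the sorted values across arrays. This is a minimal illustration with hypothetical data; ties are broken by position, which real implementations handle more carefully.

```python
def quantile_normalize(arrays):
    """arrays: list of equal-length lists, one per array. Returns normalized copies."""
    n = len(arrays[0])
    sorted_arrays = [sorted(values) for values in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    result = []
    for values in arrays:
        order = sorted(range(n), key=lambda i: values[i])
        normalized = [0.0] * n
        for rank, i in enumerate(order):
            normalized[i] = reference[rank]  # gene keeps its rank, gets the reference value
        result.append(normalized)
    return result

normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
print(normalized)
```

After normalization, all arrays contain exactly the same set of values, only assigned to different genes.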
Summarization = "Average" (Intensities)
Summarization – Affymetrix
Oligonucleotide dependent technologies
Usual Methods:
• tukey-biweight
• av-diff
• median-polish
PM, MM
The "summarization" equivalent in two-dye technologies is the average of gene replicates within the slide.
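As an illustration of one of the usual methods, here is a minimal sketch of median polish applied to one probeset, given as a probes x arrays matrix of log2 intensities. It returns one summarized value per array (overall level plus array effect). The input matrix is hypothetical and the fixed iteration count is an assumption.

```python
from statistics import median

def median_polish(matrix, n_iter=10):
    """Tukey median polish; returns overall + column effect for each array."""
    rows, cols = len(matrix), len(matrix[0])
    resid = [row[:] for row in matrix]
    overall = 0.0
    row_eff = [0.0] * rows   # probe effects
    col_eff = [0.0] * cols   # array effects
    for _ in range(n_iter):
        for i in range(rows):                      # sweep out row medians
            d = median(resid[i])
            row_eff[i] += d
            resid[i] = [v - d for v in resid[i]]
        d = median(col_eff)
        overall += d
        col_eff = [c - d for c in col_eff]
        for j in range(cols):                      # sweep out column medians
            d = median(resid[i][j] for i in range(rows))
            col_eff[j] += d
            for i in range(rows):
                resid[i][j] -= d
        d = median(row_eff)
        overall += d
        row_eff = [r - d for r in row_eff]
    return [overall + c for c in col_eff]

# 3 probes x 2 arrays: array levels 8 and 10, probe effects 0, +1, -1
summary = median_polish([[8.0, 10.0], [9.0, 11.0], [7.0, 9.0]])
print(summary)
```

Because medians are used throughout, a single outlying probe has little influence on the summarized value.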
MICROARRAYS – FILTERING / TREATING UNDEFINED VALUES
• Some spots may be defective in the printing process
• Some spots could not be detected
• Some spots may be damaged during the assay
• Artefacts may be present (bubbles, etc.)

• Use replicated spots as averages
• Remove unrecoverable genes
• Remove problematic spots in all arrays
• Infer values using computational methods (warning)
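The first two treatments above can be sketched as follows, assuming undefined spots are stored as `None` (a representation chosen here for illustration; the gene names and values are hypothetical).

```python
def average_replicates(spots):
    """spots: dict gene -> list of replicate values, None for undefined spots."""
    summary = {}
    for gene, values in spots.items():
        valid = [v for v in values if v is not None]
        # Average the usable replicates; None marks an unrecoverable gene
        summary[gene] = sum(valid) / len(valid) if valid else None
    return summary

spots = {
    "Gene 1": [10.0, None, 12.0],   # one defective replicate
    "Gene 2": [None, None, None],   # no usable replicate
}
averaged = average_replicates(spots)
recovered = {gene: v for gene, v in averaged.items() if v is not None}
print(averaged)
print(recovered)
```

Inferring values computationally is left out on purpose, in line with the warning on the slide.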
MICROARRAY – DATA FILTERING
• More than 10,000 genes
• Too much data increases computation time and analysis complexity
Remove:
• Genes that do not change significantly
• Undefined genes
• Low expression
Keeping:
• Large signal to noise ratio
• Large statistical significance
• Large variability
• Large expression
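A minimal sketch of filtering on two of the criteria above, expression level and variability. The thresholds, gene names and values are hypothetical, and the data are assumed to be on the log2 scale.

```python
from statistics import mean, stdev

def filter_genes(data, min_mean, min_sd):
    """Keep genes whose mean expression and standard deviation reach the thresholds."""
    return {
        gene: values
        for gene, values in data.items()
        if mean(values) >= min_mean and stdev(values) >= min_sd
    }

data = {
    "Gene A": [8.0, 8.0, 8.0, 8.0],     # expressed but does not change
    "Gene B": [5.0, 6.0, 5.0, 6.0],     # low expression
    "Gene C": [8.0, 12.0, 9.0, 13.0],   # expressed and variable
}
kept = filter_genes(data, min_mean=7.0, min_sd=1.0)
print(sorted(kept))
```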
MICROARRAY PRE-PROCESSING SUMMARY
Steps: Image Analysis, Background Subtraction, Normalization, Summarization, Transformation, Filtering, Data Processing
[Figure: pre-processing summary in panels a) to d). Labels: Microarray, Image Scanning, Spot Detection, Intensity Value, Background Detection & Subtraction (Affymetrix, Two-dyes); b) Image Analysis and Background Subtraction; Transformation; Normalization (Between, Within); MA-plot with A=log2(R*G)/2 and M=log2(R/G)]
MICROARRAY REPOSITORIES
MICROARRAY APPLICATIONS
Microarray Technology Through Applications, F. Falciani, Taylor & Francis 2007
MICROARRAY DATA MATRIX
[Figure: data matrix of genes (Gene 1, Gene 2, Gene 3, ..., Gene N) by samples, in two groups. Class A Samples: Normal Tissue, Cancer A, Untreated, Reference, … Class B Samples: Tumour Tissue, Cancer B, Treated, Strains, …]
MICROARRAYS – WHAT CAN BE DONE WITH DATA?
• Differential Expression
• Unsupervised Classification
• Biomarker detection
• Identifying genes related to survival times
• Regression Analysis
• Gene Copy Number and Comparative Genomic Hybridization
• Epigenetics and Methylation
• Genetic Polymorphisms and SNPs
• Chromatin Immuno-Precipitation On-Chip
• Pathogen Detection
• …
DIFFERENTIAL EXPRESSION
[Figure: Gene Selection for Differential Expression. Panels: Positive and Negative, each comparing Samples A with Samples B; y axis: Expression Level; panel annotations as extracted: µ=d, µ=d]
[Figure: data matrix of genes (Gene 1 ... Gene N) by samples. Class A Samples: Normal Tissue, Cancer A, Untreated, Reference, … Class B Samples: Tumour Tissue, Cancer B, Treated, Strains, …]
p-value, FDR, q-Value
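With thousands of genes tested at once, raw p-values are usually corrected to control the False Discovery Rate. A minimal sketch of the Benjamini-Hochberg adjustment, one common FDR procedure (the slides do not say which one is used; the p-values are hypothetical):

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

p_values = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(p_values)
print(adjusted)
```

Genes whose adjusted value falls below the chosen FDR level (for example 0.05) would be reported as differentially expressed.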
Biomarker Discovery
[Figure: Gene Selection for Biomarker Detection. Panels: Positive and Negative, each comparing Samples Class A with Samples Class B; y axis: Expression Level; panel annotations as extracted: µ=d, µ=d]
[Figure: data matrix of genes (Gene 1 ... Gene N) by samples. Class A Samples: Normal Tissue, Cancer A, Untreated, Reference, … Class B Samples: Tumour Tissue, Cancer B, Treated, Strains, …]
UNSUPERVISED CLASSIFICATION
[Figure: schematic with Samples (A C G B H E D I K M L) grouped by Unsupervised Sample Classification and rows grouped as Co-Expressed Genes]
[Figure: heat maps (panels a and b, clusters numbered 1 to 9) of arrays HJ2.b, HJ0, He0, He2.b, Hh6.tw, Hh4.b, Hh2.b, Hh4.tw, Hh2.tw, Hh0 and Hh6.b against a list of genes (IL-8 ... BNIP3); colour scale: Low to High Expression]
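Heat maps like these are typically ordered by hierarchical clustering. A minimal sketch of agglomerative clustering of samples, assuming a correlation-based distance and average linkage (both are choices made here for illustration; the slides do not state which were used, and the profiles are hypothetical):

```python
from statistics import mean

def correlation_distance(x, y):
    """1 - Pearson correlation; undefined for constant profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / (sxx * syy) ** 0.5

def cluster_samples(profiles, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        return mean(correlation_distance(profiles[a], profiles[b])
                    for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        pairs = [(linkage(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters

profiles = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.5],
    "C": [4.0, 3.0, 2.0, 1.0],
    "D": [8.0, 6.0, 4.5, 2.0],
}
clusters = cluster_samples(profiles, n_clusters=2)
print(clusters)
```

The same function groups co-expressed genes if gene profiles are passed in place of sample profiles.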
SURVIVAL TIMES
Genes Associated to Survival Times and Risk
[Figure: Gene Selection, Positive and Negative examples. Each panel is a Kaplan-Meier Plot with Time on the x axis and an axis labelled Hazard running from 0.0 to 1.0; + marks appear along the curves]
[Figure: data matrix of genes (Gene 1 ... Gene N) by samples. Class A Samples: Normal Tissue, Cancer A, Untreated, Reference, … Class B Samples: Tumour Tissue, Cancer B, Treated, Strains, …]
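A Kaplan-Meier curve is built from follow-up times and an indicator of whether the event was observed or the sample was censored. A minimal sketch of the estimator of the survival function (the data are hypothetical):

```python
def kaplan_meier(times, events):
    """times: follow-up times; events: 1 = event observed, 0 = censored.
    Returns [(time, survival probability)] at each event time."""
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        if deaths:
            survival *= 1 - deaths / at_risk  # product-limit step
            curve.append((t, survival))
    return curve

curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
print(curve)
```

To relate a gene to survival, samples can be split into groups by expression level and one curve estimated per group; comparing the curves needs a separate test that is not shown here.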
REGRESSION
Regression: Gene Association to outcome
[Figure: Gene Selection. Positive: Slope ≠ 0; Negative: Slope = 0. x axis: Gene Expression; y axis: Dependent Variable]
[Figure: data matrix of genes (Gene 1 ... Gene N) by samples. Class A Samples: Normal Tissue, Cancer A, Untreated, Reference, … Class B Samples: Tumour Tissue, Cancer B, Treated, Strains, …]
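The slope in the figure is the least-squares slope of the dependent variable on the expression of one gene. A minimal sketch with hypothetical data:

```python
from statistics import mean

def slope(expression, outcome):
    """Least-squares slope of outcome regressed on gene expression."""
    mx, my = mean(expression), mean(outcome)
    sxy = sum((x - mx) * (y - my) for x, y in zip(expression, outcome))
    sxx = sum((x - mx) ** 2 for x in expression)
    return sxy / sxx

expression = [1.0, 2.0, 3.0, 4.0]
associated = slope(expression, outcome=[2.0, 4.0, 6.0, 8.0])
unrelated = slope(expression, outcome=[5.0, 5.0, 5.0, 5.0])
print(associated, unrelated)
```

In real data the slope is never exactly 0, so selecting genes requires a significance test on the slope, which is not shown here.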
CPG METHYLATION
[Figure: CpG methylation profiling for the Unmethylated Fraction and the Hypermethylated Fraction, each comparing Sample and Control. Labels: Cleavage with TasI / Csp6I; Adaptor Ligation; Cleavage with methylation-sensitive restriction enzyme; CpG specific cleavage with McrBC; Adaptor-specific amplification; Cy5 (red) and Cy3 (green) labelling; Microarray. Methylated sites are marked M]
[Figure: SNP genotyping on microarrays for SNP1, SNP2, SNP3 (genotypes AA, CG, CC, …), with stages Labelling, Hybridisation and Detection, in panels a, b and c. Labels: Products of 1nt primer extension (in solution), Capture; Transcribed RNA + reverse transcriptase, Extension, ddNTPs (one labelled); Labelled ddNTPs, PCR products + DNA polymerase, Extension (1nt)]
Chromatin Immuno-Precipitation (ChIP-on-Chip)
[Figure: ChIP-on-Chip steps. Fusion of Tag sequence into TF gene; Incubation DNA-Tagged TF; Precipitation of Antibody-TF-DNA complex; Labelling of precipitated DNA; Microarray Hybridisation. Labels: Transcription Factor, Tag, Antibody against tag peptide]
PATHOGEN/PARASITES DETECTION
[Figure: Spotted probes (1) ACGGCTAGTCACAAC..., (2) GCTAGTCACAACCCA..., (3) GCTAGTCCGGCACAG...; a Sample is Hybridized to spots (1) (2) (3)]
EXAMPLE 1: DIFFERENTIAL EXPRESSION
(Dr. Hugo Barrera)
Placenta 1 and Placenta 2 versus a Reference Pool (Green/Red labelling, plus controls)
• mRNA Extraction
• Labelling
• Microarray Hybridization (by duplicates)
• Scanning & Data Processing: Image Analysis, Within Normalization (per array), Between Normalization (all arrays)
• Detection of Differentially Expressed Genes: t-test H0: µ = 0; p-values correction: False Discovery Rate
• Validation and Analysis: Comparison With Known Tissue Specific Genes
[Figure: panels a, b, c, d for Placenta/Reference and Control/Control arrays, labelled 51 52 56 54]
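The test "H0: µ = 0" is a one-sample t-test on the log ratios of one gene across arrays. A minimal sketch of the t statistic only, with hypothetical log2 ratios; turning it into a p-value needs the t distribution with n - 1 degrees of freedom, which is not shown.

```python
from statistics import mean, stdev

def one_sample_t(log_ratios, mu0=0.0):
    """t statistic for H0: mean log ratio equals mu0."""
    n = len(log_ratios)
    standard_error = stdev(log_ratios) / n ** 0.5
    return (mean(log_ratios) - mu0) / standard_error

t_stat = one_sample_t([1.0, 2.0, 3.0])
print(t_stat)
```

The resulting p-values, one per gene, would then go through the False Discovery Rate correction.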
[Figure: (a) Microarray Experiment: Ratio (log2), colour scale from 10 to -6, for the arrays Placenta 1 Replicate 1, Placenta 1 Replicate 2, Placenta 2 Replicate 1 and Placenta 2 Replicate 2. (b) T1dbase: T1 score, colour scale from 1 to 0, for the tissues Placenta, Lung, Thalamus, Amygdala, Spinal Cord, Testis, Kidney, Liver, Pituitary, Thyroid, Cerebellum, Hypothalamus, Caudate Nucleus, Exocrine Pancreas, Lymph Node, Frontal Cortex, Stomach, Breast, Bone Marrow, Pancreatic Islets, Uterus, Ovary, Skin, Heart, Skeletal Muscle, Prostate, Thymus, Salivary Gland, Trachea]
OTHER MICROARRAYS
Microarray Technology Through Applications, F. Falciani, Taylor & Francis 2007
ANTIBODIES MICROARRAYS
Microarray Technology Through Applications, F. Falciani, Taylor & Francis 2007
PROTEIN MICROARRAYS
Microarray Technology Through Applications, F. Falciani, Taylor & Francis 2007
CARBOHYDRATE MICROARRAY
SMALL-MOLECULE MICROARRAYS