Reading and Pre-Processing Microarrays. Bioinformatics, Dr. Víctor Treviño. Data processing of Placental Microarrays, Dr. Hugo A. Barrera Saldaña. Paper in Mol. Med. 2007. Search PubMed for Trevino V. Exercise. Example 1: Differential Expression. Reference Pool.
BIOINFORMATICS
DR. VÍCTOR TREVIÑO
Reading and Pre-Processing Microarrays
EXERCISE
Data processing of Placental Microarrays
Dr. Hugo A. Barrera Saldaña
Paper in Mol. Med. 2007.
Search PubMed for Trevino V
EXAMPLE 1: DIFFERENTIAL EXPRESSION
Samples: Placenta 1, Placenta 2, Reference Pool
1. mRNA extraction
2. Labelling (green / red)
3. Microarray hybridization (by duplicates)
4. Scanning & data processing
   • Image analysis
   • Within normalization (per array)
   • Between normalization (all arrays)
5. Detection of differentially expressed genes
   • t-test, H0: µ = 0
   • p-value correction: False Discovery Rate
6. Validation and analysis
   • Comparison with known tissue-specific genes
(controls)
(Dr. Hugo Barrera)
[Figure: panels a, b, c, d. Placenta/Reference and Control/Control arrays; slides 51, 52, 56, 54.]
[Figure: (a) Microarray Experiment, Ratio (log2), scale marks 10 and -6, Placenta. Arrays: Placenta 1 Replicate 1, Placenta 1 Replicate 2, Placenta 2 Replicate 1, Placenta 2 Replicate 2. (b) T1dbase, T1 score, scale marks 1 and 0. Tissues: Lung, Thalamus, Amygdala, Spinal Cord, Testis, Kidney, Liver, Pituitary, Thyroid, Cerebellum, Hypothalamus, Caudate Nucleus, Exocrine Pancreas, Lymph Node, Frontal Cortex, Stomach, Breast, Bone Marrow, Pancreatic Islets, Uterus, Ovary, Skin, Heart, Skeletal Muscle, Prostate, Thymus, Salivary Gland, Trachea.]
Data downloaded from URL: http://chipskipper.embl.de/iner-embo-course/index.htm
1. 2 dyes, 2 slides per assay (each slide contains different probes; the same sample is used in both slides; oligo or cDNA arrays?). 48 grids, 24x24 spots.
2. .grd files contain the "initial grid" specification for the slides.
3. .adf files contain the "annotations" of the genes.
4. Files: 51, 52, 53, 54, 55, 56. 5xa is slide 1 and 5xb is slide 2 of each assay.
5. Some assays use the same RNA sample (technical replicates). See the table in the next slide.
6. One dye is placental RNA and the other is a reference pool of RNA from different organs.
GOALS:
7. Detect differentially expressed genes.
8. Focus on placenta-specific genes (growth hormone family?).
Contact:
Dr. Hugo A. Barrera Saldaña
(81) 83294050 ext. 2871, 2872, 2587
(81) 81238249 (private), 0448110778789 (mobile)
Secretario de Investigacion, Regulacion y [email protected]
SLIDES' SCANNINGS
GROUP  SLIDE   CY3 (GREEN)  CY5 (RED)  COMMENTS
1a     52 A V  Sample       Control
1b     52 B V  Sample       Control
2a     51 A V  Sample       Control    RIGHT TOP GROUP
2b     51 B V  Sample       Control    RIGHT BOTTOM GROUP
3a     56 A V  Control      Sample
3b     56 B V  Control      Sample
4a     A 54 V  Control      Sample
4b     B 54 V  Control      Sample
5a     A 55 V  Control      Control    LEFT TOP GROUP
5b     B 55 V  Control      Control    LEFT BOTTOM GROUP
6a     A 53 V  Control      Control
6b     B 53 V  Control      Control

Pending Questions:
1) Slides from groups 1 and 2 should be 52 and 51; which is which?
2) Are the slides from groups 5 and 6 Control vs Control?
   1) In that case we have only 2 independent samples.
3) Group 5 should be slide 55, A and B, shouldn't it?
IMAGE ANALYSIS
• Download and use SpotFinder from TM4 Suite: http://www.tm4.org
• Download images (51.zip or 55.zip from http://bioinformatica.mty.itesm.mx/?q=node/68)
• Read BOTH images together using SpotFinder
   • Mark file 1 as "Cy3" = Green
   • Mark file 2 as "Cy5" = Red
• Create Grid
   • Metarows = 12, Metacolumns = 4
   • Rows = 24, Columns = 24
   • Pixels = 450 (of the 24 x 24 spots)
   • Spacing = 18 (between metacolumns and metarows)
• Adjust each of the 24 Grids to correct positions
   • Right mouse button in a grid
   • Right mouse button in a blank section to move all grids
• Save the grid
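The grid settings above fix how many blocks and spots SpotFinder will address. A minimal sketch of that arithmetic, assuming the dialog values are exactly as listed (12 metarows, 4 metacolumns, 24 x 24 spots per block); the function and constant names are illustrative and are not part of SpotFinder:

```python
from itertools import product

# Grid parameters copied from the slide ("Create Grid" settings).
METAROWS, METACOLUMNS = 12, 4   # blocks (sub-grids) on the slide
ROWS, COLUMNS = 24, 24          # spots inside each block


def spot_addresses():
    """Yield (metarow, metacolumn, row, column) for every spot, 1-based."""
    for address in product(range(1, METAROWS + 1),
                           range(1, METACOLUMNS + 1),
                           range(1, ROWS + 1),
                           range(1, COLUMNS + 1)):
        yield address


n_blocks = METAROWS * METACOLUMNS      # number of sub-grids to adjust
n_spots = n_blocks * ROWS * COLUMNS    # rows expected in the exported data
print(n_blocks, n_spots)
```

The 12 x 4 layout gives 48 blocks, which agrees with the "48 grids, 24x24 spots" noted in the data description.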
IMAGE ANALYSIS
• Use Gridding and Processing
   • Adjust (save the grid first; on Mac, Adjust doesn't work well)
   • Process
• Copy images
   • 1 from the grid adjust
   • 1 from the RI plot
   • 1 from the data (figure)
   • 2 from the QC view (A and B)
   • What do they represent?
• Export to .mev file
• Open the .mev file in Excel
   • Remove comment lines
   • Compute signal:
      Signal A = Cy3 Green = MNA - MedBkgA = mean of spot A - median of background A
      Signal B = Cy5 Red = MNB - MedBkgB = mean of spot B - median of background B
   • Plot Signal A vs Signal B
   • Copy the image into a Word file
• DO NOT SAVE THE MODIFIED .MEV FILE
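The same signal computation can be done outside Excel. A minimal sketch, assuming the exported .mev file is tab-delimited text whose comment lines start with "#" and whose header row uses the column names UID, MNA, MNB, MedBkgA and MedBkgB; the function name and the demo rows are made up for illustration:

```python
import csv
import io


def read_mev_signals(handle):
    """Return [(uid, signal_a, signal_b), ...] from an open .mev text file.

    Assumes tab-delimited text, '#' comment lines, and a header row with
    the columns UID, MNA, MNB, MedBkgA and MedBkgB.
    """
    # Skip blank lines and comment lines before handing rows to csv.
    data_lines = (line for line in handle
                  if line.strip() and not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    signals = []
    for row in reader:
        signal_a = float(row["MNA"]) - float(row["MedBkgA"])  # Cy3, green
        signal_b = float(row["MNB"]) - float(row["MedBkgB"])  # Cy5, red
        signals.append((row["UID"], signal_a, signal_b))
    return signals


# Tiny made-up example so the sketch runs without a real file.
demo = io.StringIO(
    "# version: demo\n"
    "UID\tMNA\tMNB\tMedBkgA\tMedBkgB\n"
    "1\t1200\t900\t200\t150\n"
    "2\t450\t800\t200\t150\n"
)
demo_signals = read_mev_signals(demo)
print(demo_signals)
```

Because the file is only read, the original .mev export stays unmodified for upload.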
RESULTS
• Upload the .mev file to Google Groups, identifying the slide name and team
• Next week, we will process all your uploaded data
COLUMNS WITHIN .MEV FILE
UID, IA, IB, R, C,
MR (print-tip normalization), MC (print-tip normalization),
SR, SC, FlagA, FlagB, SA, SF, QC, QCA, QCB,
BkgA, BkgB, SDA, SDB, SDBkgA, SDBkgB, MedA, MedB,
MNA (signal Ch. A = Cy3 [Green]), MNB (signal Ch. B = Cy5 [Red]),
MedBkgA (background Ch. A), MedBkgB (background Ch. B),
X, Y, PValueA, PValueB
COLUMNS WITHIN GENEPIX .GPR FILE
Block (print-tip normalization), Column, Row, Name, ID, X, Y, Dia.,
F635 Median, F635 Mean, F635 SD, B635 Median, B635 Mean, B635 SD, % > B635 + 1 SD, % > B635 + 2 SD, F635 % Sat.,
F532 Median, F532 Mean, F532 SD, B532 Median, B532 Mean, B532 SD, % > B532 + 1 SD, % > B532 + 2 SD, F532 % Sat.,
Ratio of Medians, Ratio of Means, Median of Ratios, Mean of Ratios, Ratios SD, Rgn Ratio, Rgn R²,
F Pixels, B Pixels, Sum of Medians, Sum of Means, Log Ratio, Flags, Normalize,
F1 Median - B1, F2 Median - B2,
F1 Mean - B1 (signal - background), F2 Mean - B2 (signal - background),
SNR 1, F1 Total Intensity, Index, "User Defined"
http://www.moleculardevices.com/pages/software/gn_genepix_file_formats.html#gpr
MIDAS TM4
http://www.tm4.org/midas.html
• Project, New
• Read Data, Single Data File
   • Specify your .mev file
• Oper, Normalization, LOWESS
• Write Output
• No virtual
• Execution
• Reports, PDF
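To see what the LOWESS step does conceptually: a smooth trend of the log ratio M against the mean log intensity A is fitted and subtracted, so that intensity-dependent dye bias is removed. The sketch below is not MIDAS's implementation; it is a plain, non-robust local linear fit with tricube weights, and the function name, the `frac` default and the demo values are assumptions for illustration:

```python
def lowess_fit(x, y, frac=0.5):
    """Tricube-weighted local linear fit of y on x, evaluated at each x.

    A single, non-robust LOWESS pass (no re-weighting iterations).
    """
    n = len(x)
    k = min(n, max(2, int(round(frac * n))))   # points per local window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12              # window half-width
        w = [max(0.0, 1 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my)
                  for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Made-up A (mean log intensity) and M (log ratio) with a linear dye bias.
a_values = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
m_values = [0.5 + 0.1 * a for a in a_values]

trend = lowess_fit(a_values, m_values)
m_normalized = [m - t for m, t in zip(m_values, trend)]
print([round(v, 6) for v in m_normalized])
```

In the demo the bias is exactly linear, so the fitted trend follows it and the normalized log ratios come out at essentially zero. Real data would need a larger window and usually the robust iterations that full LOWESS implementations add.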
MIDAS - PROBLEMS
• Only ~9,000 data points generated for 54a
• Output is different: SpotFinder + MIDAS vs. ChipSkipper + R (Bioconductor)
This problem exemplifies that the right software with the right parameters is needed for each experiment (ChipSkipper was designed by the microarray slide provider).
[Figure: one plot per file, in order: 51a.txt, 51b.txt, 56a.txt, 56b.txt, 52a.txt (annotation: "Same Sample?? Same Image?? Same Scan??"), 52b.txt, 55A.txt (controls), 55B.txt (controls), 53A.txt (controls), 53B.txt (controls), 54a.txt, 54b.txt]
SUMMARY
• 2 independent samples
   • 51a+52a, 54a+56a
   • 51b, 54b+56b (52b has problems)
• It seems that no bias is present per subgrid (not shown)
• Raw values will be used (not normalised)
• g51a is a bit different from g52a
• g52a seems to be more "noisy"
• 54a and 56a look more correlated in both g and r
(This was computed by normalizing each channel independently)
Averages = [Log(Cy3) + Log(Cy5)] / 2
M (ratios) = Log("Cy5" / "Cy3") = Log(Sample/Reference)
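The two formulas above translate directly into code. A minimal sketch, assuming base-2 logarithms (the slide writes only "Log", but the ratio figure is labelled log2) and assuming that spots whose background-subtracted signal is not positive are skipped; the function name and input values are illustrative:

```python
import math


def ma_values(cy3_signal, cy5_signal):
    """Return (A, M) in log2 units, or None if a signal is not positive."""
    if cy3_signal <= 0 or cy5_signal <= 0:
        return None  # log is undefined; such spots must be handled separately
    a = (math.log2(cy3_signal) + math.log2(cy5_signal)) / 2  # average
    m = math.log2(cy5_signal / cy3_signal)                   # ratio
    return a, m


# Made-up signals: Cy5 is four times Cy3, so M is 2 on the log2 scale.
a, m = ma_values(250.0, 1000.0)
print(round(a, 3), m)
```

Which dye carries the sample differs between slide groups in the scanning table, so the sign of M has to be interpreted per slide.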
GENES SELECTED
SLIDES A:
(t-test vs mean = 0)
FDR <= 10%
fold >= 2
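The selection rule can be sketched as follows. Assumptions: the FDR correction is the Benjamini-Hochberg procedure, "fold >= 2" means an absolute mean log2 ratio of at least 1 in either direction, and the p-values come from a t distribution with n - 1 degrees of freedom computed elsewhere (the standard library has no t distribution). All names and demo numbers are illustrative:

```python
import math
import statistics


def one_sample_t(values):
    """t statistic for H0: mean = 0 (degrees of freedom = len(values) - 1)."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def select_genes(mean_log2_ratio, pvalues, fdr=0.10, fold=2.0):
    """Indices of genes with q <= fdr and |mean log2 ratio| >= log2(fold)."""
    qvalues = benjamini_hochberg(pvalues)
    cutoff = math.log2(fold)
    return [i for i, (ratio, q) in enumerate(zip(mean_log2_ratio, qvalues))
            if q <= fdr and abs(ratio) >= cutoff]


# Made-up per-gene mean log2 ratios and t-test p-values.
demo_means = [2.5, 0.4, -1.3, 3.0]
demo_pvalues = [0.001, 0.04, 0.03, 0.5]
selected = select_genes(demo_means, demo_pvalues)
print(selected)
```

In the demo, the second gene passes the FDR cut but not the fold cut, and the fourth gene has a large fold change but fails the FDR cut, so only the first and third genes are kept.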
NEXT SESSION
Mon 21, 6-9 pm
Thu 24