The microarray data analysis
Ana Deckmann, Carla Judice, Jorge Lepikson, Jorge Mondego, Leandra Scarpari, Marcelo Falsarella Carazzolle, Michelle Servais, Tais Herig


Summary
- Statistics background
- Introduction to microarray
- Pre-processing microarray data
- Statistics analysis
- Applications on the LGE
- Gene Chip

Statistics background
Error model:
- measurement = truth + error
- error = bias + variance
- Normalization
- Experimental replicates (technical and biological) and statistics
- Bias describes a systematic tendency of the measurement. Ex.: the dyes Cy3 and Cy5 do not have the same efficiency.
- Variance is often normally distributed. Ex.: instrumentation imperfection and biological variation.

Statistics background
- Mean and standard deviation
[Figure: Gaussian function; label: mean(x)]

- Mean vs median:
Assume data with one outlier: x = (8, 85, 7, 9, 5, 4, 13, 6, 8)
The mean of all x's, i.e. (x_1 + x_2 + ... + x_K)/K, is affected by the outlier: mean(x) = 16.1 (7.5 without the outlier)
The median of all x's, i.e. the middle value of the ordered (x_1, x_2, ..., x_K), is not (if < 50% of the values are outliers):
x ordered = (4, 5, 6, 7, 8, 8, 9, 13, 85)
median(x) = 8.0
Use the median instead of the mean if you expect artifacts. (If there are many measurements and the errors are symmetrically distributed, the median gives the same result as the mean without outliers.)

- Quantiles
A quantile is defined by the fraction (or percent) of points below the given value. That is, the 0.3 (or 30%) quantile is the point at which 30% of the data fall below and 70% fall above that value.
Q_p, p = 30%
x = (0, 10, 40, 25, 15, 50, 70, 60)
x ordered = (0, 10, 15, 25, 40, 50, 60, 70)
Quantile(x; 30%) = (0, 10, 15)
1st quartile = 10
3rd quartile = 60
Median = (25 + 40)/2 = 32.5

Introduction to microarray
- Three different microarray technologies:
- Spotted cDNA microarrays (500 to 2500 bp)
- Spotted oligonucleotide microarrays (30 to 70 bp)
- Affymetrix chips (25 bp)
- Can be used for:
- Differential gene expression studies, gene co-regulation studies, gene function identification studies.
- Time-course studies, dose-response studies, clinical diagnosis.

Two color architecture

Codelink architecture (one color)
- Probes: 30-mers, 90% at 550 bases downstream (3' end)
- Targets: 10 µg biotinylated cRNA

Scanning
[Figure: excitation (red laser, green laser), emission, overlay images. Annotations: higher frequency, more energy; lower frequency, less energy.]

[Figure panels A to H and a to k. Source: Scarpari, Leandra, 2006, doctoral thesis.]

Ludwig flags: (0) Int
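
The mean-versus-median comparison from the statistics background can be checked numerically. The following is a minimal sketch using only Python's standard `statistics` module; the variable names are illustrative and not part of the slides. It reuses the two data sets given in the slides (the one with the outlier 85, and the eight values from the quantile example).

```python
from statistics import mean, median

# Data from the mean-vs-median slide: nine values, one outlier (85).
x = [8, 85, 7, 9, 5, 4, 13, 6, 8]

# The mean is pulled upward by the single outlier.
mean_all = mean(x)

# Mean of the same data with the outlier removed.
mean_clean = mean([v for v in x if v != 85])

# The median is the middle value of the ordered data and ignores the outlier.
median_x = median(x)

# Data from the quantile slide: with an even number of points, the median
# is the average of the two middle ordered values, here (25 + 40) / 2.
y = [0, 10, 40, 25, 15, 50, 70, 60]
median_y = median(y)

print(f"mean with outlier:    {mean_all:.1f}")
print(f"mean without outlier: {mean_clean:.1f}")
print(f"median of x:          {median_x}")
print(f"median of y:          {median_y}")
```

The median of x stays close to the mean of the outlier-free data, which is the reason the slides recommend the median when artifacts are expected.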