Epidemiology modeling (Microarray, NGS & qRT-PCR)
Theme: Transcriptional Program in the Response of Human Fibroblasts to Serum.
Etienne Z. Gnimpieba
BRIN WS 2013
Mount Marty College – June 24th 2013
Etienne.gnimpieba@usd.edu
Data manipulation Gene expression data analysis
OMIC World
[Figure labels: DNA, Transcription, mRNA, Translation, E, Degradation, Gene Repression, Catalyse, S, P]
Genomics
Functional Genomics
Transcriptomics
Proteomics
Metabolomics
GENOMICS
Genomics is the subdiscipline of genetics devoted to the mapping, sequencing, and functional analysis of genomes. Genomics can be said to have appeared in the 1980s and took off in the 1990s with the initiation of genome projects for several biological species.
The most important tools here are microarrays and bioinformatics.
DNA microarrays allow rapid measurement and visualization of differential gene expression at the whole-genome scale. Although implementing the technique is quite complicated, its principle is simple. The major steps involved in this process are described here.
Process
[Figure: microarray data analysis workflow]
Biological question (differentially expressed genes, sample class prediction, etc.)
Experimental design
Microarray experiment
Image analysis
Normalization
Estimation, Testing, Clustering, Discrimination
Biological verification and interpretation
Microarray Production Process
High density filters (macroarrays)
Size: 12 cm x 8 cm
Detail:
• 2400 clones per membrane
• radioactive labelling
• 1 experimental condition per membrane
Glass slides (microarrays)
Size: 5.4 cm x 0.9 cm
Detail:
• 10000 clones per slide
• fluorescent labelling
• 2 experimental conditions per slide
Oligonucleotide chips
Size: 1.28 cm x 1.28 cm
Detail:
• 300000 oligonucleotides per slide
• fluorescent labelling
• 1 experimental condition per slide
• Frouin, V. & Gidrol, X. (2005)
• CBB group (Berlin)
• Transcriptome ENS (France)
[Figure: microarray process steps: Target Preparation, Hybridization, Slide Scanning, Expression Profile Clustering]
• Image analysis (GenePix)
• Normalization (R)
• Pre-treatment
• Differential expression
• Clustering
• Data mining
• Annotation
Excel Used in Genomics
• How to select columns
• How to use functions
• How to anchor a cell value in a function
• How to copy the function result and not the function itself
• How to sort data by columns
• How to search and replace
Plan
Excel Used in Genomics: Pre-Treatment
1. Open the file containing the experiment series (your expression matrix) in Excel, using the tab character as the column separator.
2. For one column (corresponding to one DNA microarray experiment), calculate the mean value using the Excel AVERAGE function. Verify that the value obtained is equal to zero.
3. If it is not, subtract the column mean from each log2(Ratio) value of that experiment. Be careful with missing values (empty cells): replace the empty contents with the string NULL or NA so that Excel does not introduce a zero value into the calculation for that cell. A missing value is different from a true zero!
4. Once this is done, verify that the final mean value is equal to zero, to rule out errors in the Excel handling. Be careful with the decimal separator used by your Excel version (dot or comma)!
Centering and Scaling Data
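The centering steps above can also be sketched outside Excel. The following Python example is a minimal illustration, not the workshop's own procedure: the function names, the tiny tab-separated matrix, and the choice to represent missing cells as None are all assumptions made for the example.

```python
def parse_value(cell):
    """Convert a cell to float; empty, NA, or NULL cells become None (missing)."""
    text = cell.strip()
    if text == "" or text.upper() in ("NA", "NULL"):
        return None
    return float(text)


def column_mean(matrix, j):
    """Mean of column j over non-missing values only (None if the column is empty)."""
    values = [row[j] for row in matrix if row[j] is not None]
    return sum(values) / len(values) if values else None


def center_columns(matrix):
    """Subtract each column's mean from its values; missing cells stay missing."""
    centered = [list(row) for row in matrix]
    for j in range(len(matrix[0])):
        mean = column_mean(matrix, j)
        if mean is None:
            continue
        for row in centered:
            if row[j] is not None:
                row[j] -= mean
    return centered


# Hypothetical tab-separated expression matrix: one gene per row,
# one microarray experiment (log2 ratios) per column.
raw = "gene\texp1\texp2\ng1\t1.0\t2.0\ng2\tNA\t4.0\ng3\t3.0\t0.0"
lines = raw.split("\n")
matrix = [[parse_value(c) for c in line.split("\t")[1:]] for line in lines[1:]]
centered = center_columns(matrix)

# Step 4: each column mean should now be zero (up to rounding).
for j in range(len(centered[0])):
    print(j, column_mean(centered, j))
```

Keeping missing cells as None rather than 0 mirrors the warning in step 3: a missing value never enters the mean.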
Excel Used in Genomics: Differential Expression Analysis (1)
Significance Analysis of Microarrays (SAM): SAM is an Excel macro freely available to academics on the web. Because it runs inside an Excel spreadsheet, it is easy for most microarray users to adopt. Using SAM requires several modifications to your data file:
• The ratio or intensity values in the Excel sheet must use points, not commas, as the decimal separator.
• The header line depends on the type of analysis you want to perform (refer to the SAM manual for more information), so duplicate your header if you do not want to lose the experiment information.
• Two annotation columns are available. SAM always references its calculations to the line number in the source sheet.
SAM (Significance Analysis of Microarrays) is an Excel macro that searches for differentially expressed genes using a bootstrapping method. Website: http://www-stat.stanford.edu/~tibs/SAM/
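The decimal-separator requirement can be checked before the data reach the spreadsheet. This is a small Python sketch under one assumption: the values contain no thousands separators, so every comma is a decimal comma. The helper name to_float and the sample row are hypothetical.

```python
def to_float(cell):
    """Parse a number written with a decimal comma or a decimal point.

    Assumes there are no thousands separators in the cell.
    """
    return float(cell.strip().replace(",", "."))


# Hypothetical row of log2 ratios exported with mixed separators.
row = ["1,25", "-0,5", "2.0"]
values = [to_float(cell) for cell in row]
print(values)
```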
Excel Used in Genomics: Differential Expression Analysis (2)
When the SAM macro is launched from the toolbar (“SAM”), a settings window appears. For further information on the available options, refer to the SAM manual. The first important thing to do is to indicate whether the source data have been log2-transformed. Then, because the data bootstrapping relies on a random number generator, you need to initialize the generator by creating a seed, which you can regenerate several times.
Once all the chosen iterations have run, SAM displays a plot in which each gene is positioned by its score in the real distribution against its score in the random distributions. The differentially expressed genes are the ones that move away from the 45° line.
First, display the delta table. For each delta value, this table gives the number of putative differentially expressed genes (the significant genes) and the number of false positive genes, estimated using the False Discovery Rate (FDR). The user fixes the delta value according to the number of false positive or significant genes they want to obtain.
To choose the delta value, return to the SAM plot sheet and display the “SAM plot controller” by clicking on the SAM macro button.
The SAM Plot Controller window lets you set the delta value you want with “Manually Enter Delta”. If you then click the “List Significant Genes” button, SAM displays the list of differentially expressed genes in the “SAM output” sheet according to the delta value you chose.
This sheet summarizes the selected parameters and gives you the list of induced and repressed genes.
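The role of the random seed, the observed-versus-random plot, and the delta threshold can be illustrated with a simplified Python sketch. This is not the SAM macro: it is a reduced two-class version that resamples by permuting class labels (the approach of the published SAM method, which the slides call bootstrapping), uses a fixed fudge constant s0 instead of estimating it from the data, and counts false positives with a plain average. The function names, the parameter defaults, and the toy matrix are assumptions.

```python
import random
import statistics


def score(values, labels, s0=0.1):
    """SAM-like relative difference: mean difference over (standard error + s0)."""
    a = [v for v, g in zip(values, labels) if g == 0]
    b = [v for v, g in zip(values, labels) if g == 1]
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    ss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    pooled_var = ss / (len(a) + len(b) - 2)
    se = (pooled_var * (1 / len(a) + 1 / len(b))) ** 0.5
    return (mean_b - mean_a) / (se + s0)


def sam_sketch(matrix, labels, delta, n_perm=200, seed=1, s0=0.1):
    """Return (induced gene indices, repressed gene indices, estimated false positives)."""
    rng = random.Random(seed)  # the seed makes the random resampling reproducible
    observed = [score(row, labels, s0) for row in matrix]
    order = sorted(range(len(matrix)), key=lambda i: observed[i])

    # Scores under random label assignments, sorted within each permutation.
    permuted = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        permuted.append(sorted(score(row, shuffled, s0) for row in matrix))

    # Expected score at each rank: the reference for the 45-degree line.
    expected = [statistics.mean([p[k] for p in permuted]) for k in range(len(matrix))]

    induced = [i for k, i in enumerate(order) if observed[i] - expected[k] > delta]
    repressed = [i for k, i in enumerate(order) if observed[i] - expected[k] < -delta]

    # Genes called by chance: permuted scores beyond the observed cutoffs.
    cut_up = min((observed[i] for i in induced), default=float("inf"))
    cut_low = max((observed[i] for i in repressed), default=float("-inf"))
    false_pos = statistics.mean(
        [sum(1 for d in p if d >= cut_up or d <= cut_low) for p in permuted]
    )
    return induced, repressed, false_pos


# Toy data: 4 genes x 8 samples; the first 4 samples are class 0, the last 4 class 1.
matrix = [
    [0.0, 0.1, -0.1, 0.0, 3.0, 3.1, 2.9, 3.0],  # clearly higher in class 1
    [0.02, -0.03, 0.01, 0.0, -0.01, 0.03, 0.0, -0.02],
    [0.05, 0.0, -0.05, 0.01, 0.02, -0.04, 0.03, 0.0],
    [-0.02, 0.04, 0.0, -0.01, 0.01, 0.0, -0.03, 0.02],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
induced, repressed, false_pos = sam_sketch(matrix, labels, delta=5.0)
print(induced, repressed, false_pos)
```

Raising delta shrinks both the gene list and the estimated number of false positives, which is the trade-off the delta table exposes.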
GEPAS: Gene Expression Pattern Analysis Suite
• Verify that the data file named FibroGEPAS.txt is available in your folder.
• Open the dataset to review its description.
• Open the GEPAS portal at http://www.transcriptome.ens.fr/gepas/index.html
• Click on “Tools”:
Preprocessing
- Preprocess DNA array data files: log-transformation, replicate handling, missing value imputation, filtering and normalization
- Filtering
Viewing
Clustering
Differential expression
Classification
Data mining
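To make the preprocessing operations concrete, here is a minimal Python sketch of log-transformation, replicate handling, missing value imputation, and filtering. It does not reproduce GEPAS: averaging replicates, imputing with the gene's own mean, and filtering on a standard deviation threshold are simple stand-ins chosen for illustration, and the gene names, values, and min_sd default are hypothetical.

```python
import math
import statistics


def log2_transform(row):
    """log2 of each positive ratio; missing or non-positive values become None."""
    return [math.log2(v) if v is not None and v > 0 else None for v in row]


def merge_replicates(rows):
    """Average replicate rows of one gene column by column, ignoring missing values."""
    merged = []
    for column in zip(*rows):
        present = [v for v in column if v is not None]
        merged.append(statistics.mean(present) if present else None)
    return merged


def impute_row_mean(row):
    """Replace missing values with the mean of the gene's observed values."""
    present = [v for v in row if v is not None]
    fill = statistics.mean(present)
    return [fill if v is None else v for v in row]


def preprocess(rows_by_gene, min_sd=0.5):
    """Log-transform, merge replicates, impute, then drop flat profiles."""
    result = {}
    for gene, rows in rows_by_gene.items():
        merged = merge_replicates([log2_transform(row) for row in rows])
        if all(v is None for v in merged):
            continue  # nothing measured for this gene
        filled = impute_row_mean(merged)
        if statistics.pstdev(filled) >= min_sd:
            result[gene] = filled
    return result


# Hypothetical expression ratios: geneA has two replicate rows, geneB is flat.
rows_by_gene = {
    "geneA": [[1.0, 2.0, 4.0], [1.0, None, 4.0]],
    "geneB": [[1.0, 1.0, None]],
}
result = preprocess(rows_by_gene)
print(result)
```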
Microarray Dataset: Mining and Gene Profile Analysis using online Tools
Kruer Lab
Plan
• Gene Expression Measurement
• Microarray Process
• Gene Expression Data Stores
• Data Mining / Querying
• Data Analysis
• Example: ATP13A2 Profile in Stress Conditions
Gene Expression Measurement
Gene expression technologies
Microarray process
Gene expression data stores
Data mining / querying (problem, query, extraction, load, store, pretreat)
Data analysis (question-answer, descriptive, predictive, modeling)
Example: ATP13A2 profile in stress conditions
Higher-plex techniques: SAGE, DNA microarray, tiling array, RNA-Seq, NGS
Low-to-mid-plex techniques: reporter gene, Northern blot, Western blot, fluorescent in situ hybridization, reverse transcription PCR
Database | Microarray Experiment Sets | Sample Profiles | Date Reported
ArrayExpress at EBI | 24,838 | 708,914 | October 28, 2011
ArrayTrack™ | 1,622 | 50,953 | February 11, 2012
caArray at NCI | 41 | 1,741 | November 15, 2006
Gene Expression Omnibus - NCBI | 25,859 | 641,770 | October 28, 2011
Genevestigator database | 2,500 | 65,000 | January 2012
MUSC database | ~45 | 555 | April 1, 2007
Stanford Microarray database | 82,542 | Not reported | October 23, 2011
UNC Microarray database | ~31 | 2,093 | April 1, 2007
UNC modENCODE Microarray database | ~6 | 180 | July 17, 2009
UPenn RAD database | ~100 | ~2,500 | September 1, 2007
UPSC-BASE | ~100 | Not reported | November 15, 2007
SAGE, GEO, GUDMAP (421), MGI, BioGPS
Data Mining / Querying
• Problem specification
• Query
• Extraction
• Storage
• Load
• Pretreat / prepare for analysis
Data Analysis
• Question-Answer
– Experimental condition profile: group comparison
– Annotation profile: biological systems involved
– Clustering profile: co-regulation
– Time course profile: time variation
– …
• Descriptive
– Boxplot (SD, mean, median)
– Scatter plot
• Predictive / inference (clustering)
• Modeling (machine learning, simulation)
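The descriptive statistics behind a boxplot can be computed directly. The Python sketch below is an illustration only: the function name boxplot_summary and the sample values are hypothetical, and the quartiles use the "inclusive" method of the standard library, which is one of several conventions.

```python
import statistics


def boxplot_summary(values):
    """Five-number summary plus mean and standard deviation for one sample."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


# Hypothetical expression values for one array.
summary = boxplot_summary([1.0, 2.0, 3.0, 4.0, 5.0])
print(summary)
```

Computing this summary for every array and comparing the results side by side is a quick way to spot a sample whose distribution differs from the others.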
• 3 Questions
– What is the right dataset (experimental condition)?
– Is the dataset ready for analysis (quality)?
– What is the expression profile for a given gene?
– Significant differential expression in group comparison
• Tools
– ArrayExpress (EBI)
– Boxplot
– GEO2R (LIMMA, profile graph)
Data Analysis
Boxplot
Example: ATP13A2 Profile in Stress Conditions
• Specification: ATP13A2 profile in stress conditions
• Data querying:
– GEO
– ArrayExpress
– Gene Atlas
• Data analysis:
– Online: GEO2R, Genospace, …
– Desktop: R, ArrayTrack, …
Resolution Process
Context
Specification & Aims
Lab #2
Preprocessing Viewing Clustering Differential expression Classification Data mining
Statement of problem / Case study: The temporal program of gene expression during a model physiological response of human cells, the response of fibroblasts to serum, was explored with a complementary DNA microarray representing about 8600 different human genes. Genes could be clustered into groups on the basis of their temporal patterns of expression in this program. Many features of the transcriptional program appeared to be related to the physiology of wound repair, suggesting that fibroblasts play a larger and richer role in this complex multicellular response than had previously been appreciated.
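Grouping genes by their temporal patterns of expression can be sketched in a few lines of Python. This is not necessarily the clustering method used in the study: it is a simple greedy grouping on Pearson correlation, and the gene names, time profiles, and 0.9 threshold are hypothetical values chosen for illustration.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def group_by_pattern(profiles, threshold=0.9):
    """Put each gene in the first cluster whose first member it correlates with."""
    clusters = []
    for gene, profile in profiles.items():
        for cluster in clusters:
            if pearson(profiles[cluster[0]], profile) >= threshold:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])  # no similar cluster found: start a new one
    return clusters


# Hypothetical log2 ratios at five time points after serum addition.
profiles = {
    "early_up_1": [0.0, 2.0, 1.0, 0.5, 0.0],
    "early_up_2": [0.0, 4.0, 2.0, 1.0, 0.0],
    "late_up_1": [0.0, 0.0, 0.5, 2.0, 3.0],
    "late_up_2": [0.1, 0.0, 1.0, 4.0, 6.2],
}
clusters = group_by_pattern(profiles)
print(clusters)
```

Correlation compares the shape of a profile rather than its amplitude, so the two early-response genes fall together even though one changes twice as much as the other.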
Gene Expression Data Analysis
Vishwanath R. Iyer, Science, 1999
Conclusion: ?
Aim: The purpose of this lab is to introduce the gene expression data analysis process. We simulated its application to “Transcriptional Program in the Response of Human Fibroblasts to Serum”. Now we can understand how a researcher identifies a significantly expressed gene from a microarray dataset.
T1. Gene expression overview
T1.1. Review of the place of genomics in the OMIC world
T1.2. Microarray data techniques and process
T1.3. Data analysis cycle and tools
T2. Excel used in Genomics
Objective: use basic Excel functionalities to solve some gene expression data analysis needs
T2.1. Column manipulation, function use, anchoring, copying function results, sorting data, search and replace
T2.2. Experiment comparison: data pre-treatment
T2.3. Differentially expressed genes from replicate experiments (SAM)
T3. GEPAS: Gene Expression Pattern Analysis Suite
Objective: use the GEPAS suite to apply the whole microarray data analysis process to the fibroblast data.
http://www.transcriptome.ens.fr/gepas/index.html
Acquired skills
- Gene expression data overview
- Excel used for genomics
- Microarray data analysis using GEPAS