Session ii g1 overview genomics and gene expression mmc-good


Microarray Dataset: quick mining and gene profile analysis using online tools

Dr. Etienne Z. GNIMPIEBA

Sioux Falls, March 2013

Etienne.gnimpieba@usd.edu

Plan
• Gene expression measurement
• Microarray process
• Gene expression data stores
• Data mining / querying
• Data analysis
• Example: ATP13A2 profile in stress conditions

Gene expression measurement

Higher-plex techniques: SAGE, DNA microarray, tiling array, RNA-Seq, NGS

Low-to-mid-plex techniques: reporter gene, Northern blot, Western blot, fluorescent in situ hybridization, reverse transcription PCR

What is a Microarray?

“A DNA microarray is a multiplex technology consisting of thousands of oligonucleotide spots, each containing picomoles of a specific DNA sequence.”

Used to quantitate mRNA or DNA
Many applications:
◦ mRNA or DNA levels
◦ SNP identification
◦ ChIP-on-chip

Hypotheses

Microarrays are usually hypothesis-generating:
◦ They highlight specific genes or features that are particularly interesting for follow-up experiments
◦ There are many interesting exceptions
   • Biomarkers
   • Pathway analyses

This does not reduce the importance of experimental design
◦ The low statistical power of array studies makes good design even more important and very challenging

Microarray process (1/3)
• Image analysis (GenePix)
• Normalization (R)
• Pre-treatment
• Differential expression
• Clustering
• Data mining
• Annotation
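The normalization step can be illustrated with a small sketch. The slide does not say which method is used, so quantile normalization is assumed here as one common choice. The function name and the intensity values are hypothetical, and ties are not averaged.

```python
from statistics import mean


def quantile_normalize(columns):
    """Quantile-normalize samples so they share one intensity distribution.

    columns: list of samples; each sample lists intensities for the same
    genes in the same order. Ties are NOT averaged (simplifying assumption).
    """
    n_genes = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean across samples at each rank.
    reference = [mean(col[i] for col in sorted_cols) for i in range(n_genes)]

    normalized = []
    for col in columns:
        # Gene indices ordered from lowest to highest intensity.
        order = sorted(range(n_genes), key=lambda i: col[i])
        out = [0.0] * n_genes
        for rank, gene_idx in enumerate(order):
            out[gene_idx] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities: 3 samples x 4 genes.
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(samples)
for col in normalized:
    print(col)
```

After normalization every sample contains the same set of values, and only the assignment of values to genes differs between samples.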

Microarray process (2/3)

Microarray process (3/3)

High density filters (macroarrays)
• Size: 12 cm x 8 cm
• 2,400 clones per membrane
• Radioactive labelling
• 1 experimental condition per membrane

Glass slides (microarrays)
• Size: 5.4 cm x 0.9 cm
• 10,000 clones per slide
• Fluorescent labelling
• 2 experimental conditions per slide

Oligonucleotide chips
• Size: 1.28 cm x 1.28 cm
• 300,000 oligonucleotides per slide
• Fluorescent labelling
• 1 experimental condition per slide

Gene expression data management

Database                             Microarray experiment sets   Sample profiles   Date reported
ArrayExpress at EBI                  24,838                       708,914           October 28, 2011
ArrayTrack™                          1,622                        50,953            February 11, 2012
caArray at NCI                       41                           1,741             November 15, 2006
Gene Expression Omnibus - NCBI       25,859                       641,770           October 28, 2011
Genevestigator database              2,500                        65,000            January 2012
MUSC database                        ~45                          555               April 1, 2007
Stanford Microarray database         82,542                       Not reported      October 23, 2011
UNC Microarray database              ~31                          2,093             April 1, 2007
UNC modENCODE Microarray database    ~6                           180               July 17, 2009
UPenn RAD database                   ~100                         ~2,500            September 1, 2007
UPSC-BASE                            ~100                         Not reported      November 15, 2007

SAGE, GEO, GUDMAP (421), MGI, BioGPS

Data mining / querying

• Problem specification
• Query
• Extraction
• Storage Load
• Pretreat / prepare for analysis

Data analysis (1/3)
Question-Answer
◦ Experimental condition profile: group comparison
◦ Annotation profile: biological systems involved
◦ Clustering profile: co-regulation
◦ Time course profile: time variation
◦ …
Descriptive
◦ Boxplot (SD, mean, median, …)
◦ Scatter plot
Predictive / inference (clustering)
Modeling (machine learning, simulation)
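The descriptive statistics behind a boxplot can be computed directly. The sketch below is illustrative only: the function name, the group labels, and the log2 expression values are hypothetical and are not taken from any dataset in these slides.

```python
from statistics import mean, median, quantiles, stdev


def boxplot_summary(values):
    """Return the numbers a boxplot displays, plus mean and SD."""
    # method="inclusive" treats the data as the whole population of interest.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(values),
        "mean": mean(values),
        "sd": stdev(values),
    }


# Hypothetical log2 expression values for one gene in two groups.
groups = {
    "control": [7.1, 7.3, 6.9, 7.0, 7.2],
    "stress": [8.4, 8.9, 8.1, 8.6, 8.5],
}
summaries = {name: boxplot_summary(vals) for name, vals in groups.items()}
for name, summary in summaries.items():
    print(name, summary)
```

Comparing the two summaries side by side gives the same information as reading two adjacent boxes on a plot: where each group is centred and how much the groups overlap.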

Data analysis (2/3)

3 Questions
◦ What is the right dataset (experimental condition)?
◦ Is the dataset ready for analysis (quality)?
◦ What is the expression profile for a given gene?
◦ Significant differential expression in group comparisons
Tools
◦ ArrayExpress (EBI)
◦ Boxplot
◦ GEO2R (limma, profile graph, …)
◦ …
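To make the group-comparison question concrete, here is a minimal sketch for a single gene. It uses a plain Welch t-statistic, not the moderated t-statistic that limma (and therefore GEO2R) computes, and it stops short of a p-value. The function name and the log2 expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic and Welch-Satterthwaite degrees of freedom for a vs b."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical log2 expression values for one gene.
control = [7.1, 7.3, 6.9, 7.0, 7.2]
stress = [8.4, 8.9, 8.1, 8.6, 8.5]

# On log2 data the difference of group means is the log2 fold change.
log2_fc = mean(stress) - mean(control)
t_stat, df = welch_t(stress, control)
print(f"log2 fold change = {log2_fc:.2f}, t = {t_stat:.2f}, df = {df:.1f}")
```

With thousands of genes tested at once, per-gene statistics like this one need multiple-testing correction before any gene is called significant.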

Data analysis (3/3)

Boxplot

Example: ATP13A2 profile in stress conditions

Specification: ATP13A2 profile in stress conditions

Data querying:
◦ GEO
◦ ArrayExpress
◦ Gene Atlas

Data analysis:
◦ Online: GEO2R, Genospace, …
◦ Desktop: R, ArrayTrack, …
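The GEO query step can also be scripted. The sketch below only builds a search URL for the NCBI E-utilities ESearch endpoint against the GEO DataSets database (db=gds) and does not send any request. The helper name, the search term, and the retmax value are assumptions chosen for illustration, not the query used in the slides.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(gene, keyword, retmax=20):
    """Build an ESearch URL for GEO DataSets (hypothetical helper)."""
    params = {
        "db": "gds",                      # GEO DataSets database
        "term": f"{gene} AND {keyword}",  # assumed free-text query
        "retmax": retmax,                 # maximum number of IDs to return
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


url = build_geo_search_url("ATP13A2", "stress")
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the matching GEO record IDs, which can then be inspected in GEO or passed to GEO2R.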

Significant differential expression !!!

Kerry Bemis slides
