Microarrays IST 444

Microarrays IST 444. Microarrays What if no test tubes were needed to conduct an experiment? Hundreds, thousands or even millions of individual experiments



Microarrays

IST 444


Microarrays

• What if no test tubes were needed to conduct an experiment?

• Hundreds, thousands or even millions of individual experiments are conducted in parallel, with very few reagents

• Microarrays combine genomics (study of all the genes in the genome) with experiments and eventually, diagnostics


Microarrays: Universal Biochemistry Platforms

• Peptides
• Proteins
• Carbohydrates
• Lipids
• Small molecules
• DNA


Types of microarrays include:

• DNA microarrays, such as cDNA microarrays, oligonucleotide microarrays, SNP arrays and ChIP-on-chip arrays

• MMChips, for surveillance of microRNA populations

• Protein microarrays (protein-protein interactions)

• Tissue microarrays

• Cellular microarrays (also called transfection microarrays)

• Chemical compound microarrays

• Antibody microarrays (proteomics)

• Carbohydrate arrays (glycoarrays)


Microarrays Require Bioinformatics

• Microarrays combine genomics, silicon chip manufacturing, DNA and Protein chemistry, signal and image processing, statistics, software skills and miniaturized versions of traditional molecular biology experiments.

• Develop new software to analyze the results of the many possible experiments.


Biological Samples in 2D Arrays on Membranes or Glass Slides


DNA Microarray Technologies

• Gene expression profiling
  - Monitoring expression levels for thousands of genes simultaneously.

• Other common applications:

• Array CGH (comparative genomic hybridization)
  - Assessing genome content in different cells or closely related organisms.

• SNP array (single nucleotide polymorphism)
  - Identifying single nucleotide polymorphisms among alleles within or between populations.

• ChIP-on-chip (chromatin immunoprecipitation)
  - Determining protein binding site occupancy throughout the genome.

• Methylation arrays (immunoprecipitation of methylated DNA)
  - Determining which regions of DNA are methylated, to study epigenetic state.


History of DNA Microarrays

• Microarrays descend from Southern and Northern blotting, in which unknown DNA (or RNA) is transferred to a membrane and then probed with a known, labeled DNA sequence.

• In microarrays, the known DNA sequence (the probe) is fixed on the membrane or slide, while the unknown labeled DNA (the target) is hybridized to it. Unbound target is then washed off so that only specific hybrids remain.

• Dot Blots of different genes in an array were used to assay gene expression as early as 1987.

• Complete genome of all Saccharomyces cerevisiae ORFs on a microarray published in 1997 by Lashkari et al. http://www.pnas.org/content/94/24/13057.

• www.bio.davidson.edu/Courses/genomics/chip/chip.html


RNA Transcription as a Measure of Gene Expression

www.csu.edu.au/faculty/health/biomed/subjects/molbol/basic.htm

Transcription factors bind to the promoter and bind RNA polymerase

DNA strands separate and transcription is initiated

RNA polymerase reads the template strand in the 3'-5' direction, synthesizing RNA 5'-3', until a termination signal is reached (stop codons act later, during translation)

The completed RNA strand is released for post-processing


What Can be Measured using Microarrays?

1. Amount of mRNA expressed by a gene: gene expression array, exon array, tiling array

2. Amount of mRNA expressed by an exon: exon array, tiling array

3. Amount of RNA expressed by a region of DNA: tiling array

4. Which strand of DNA is expressed: exon array, tiling array

5. Which of several similar DNA sequences is present in the genome: SNP array

6. How many copies of a gene are present in the genome: gene expression array, exon array, tiling array

7. Where a known protein has bound to the DNA (ChIP-on-chip): promoter array, tiling array


Types of Microarrays

[Figure: a gene with Exon 1, Exon 2, Exon 3 and flanking UTRs, drawn against the chromosome sequence and the cDNA sequence, showing the region covered by each probe type: cDNA, oligo, exon, promoter, tiling (tiles spanning the whole sequence) and SNP (CCGTTCACA vs. CCGTGCACA; AAGGCCGTT vs. AAGGACGTT).]

A cDNA microarray can be made from an unsequenced cDNA library. All the other types require that the sequence be available.


Spotted vs. in situ synthesized arrays

• The DNAs can be chemically synthesized or made by PCR and then mechanically spotted on the array.
  – The amount spotted can vary.
  – Method is more flexible and less expensive.

• The DNA can be chemically synthesized directly on the array (Affymetrix).
  – This can be more consistent.
  – Shorter pieces are used.


Format of an Affymetrix Array

http://cnx.rice.edu/content/m12388/latest/figE.JPG


Print Technology

1. The cDNA or oligo can be printed on the slide using an "arraying robot", which deposits a drop of liquid containing the material at each spot (gene expression only). 40,000+ spots

2. Oligos (all the same length) can be synthesized on the slide using:
   i) inkjet technology
   ii) photolithography
   1,000,000+ spots

3. There are other technologies that give similar types of results (e.g. "beads").


Spotted 2-Channel Array

http://www.anst.uu.se/frgra677/bilder/micro_method_large.jpg

Spotted arrays are printed on coated microscope slides.

2 RNA samples are converted to cDNA. Each is labelled with a different dye.


SNP Analysis Using the Illumina, Inc. GoldenGate™ Assay

• Allele Specific Extension and Ligation

• PCR Amplification

• Hybridization to the Universal Sentrix® Array Matrix


Allele Specific Extension and Ligation

[Figure: allele-specific extension and ligation on genomic DNA. Labels: alleles A/G; SNPs [T/C] and [T/A]; polymerase; ligase; Universal PCR Sequence 1; Universal PCR Sequence 2; Universal PCR Sequence 3'; illumiCode' address.]

Custom Oligo Pool All (OPA)

96-1,536 SNPs multiplexed

Total oligos in reaction: 288-4,608


PCR Amplification

[Figure: PCR with common primers on the amplification template carrying illumiCode #561. Labels: Cy3 Universal Primer 1, Cy5 Universal Primer 2, Universal Primer P3.]


Hybridization to Sentrix® Array Matrix

[Figure: labeled products hybridize to their illumiCode addresses on the array. illumiCode #561, #217 and #1024 report genotypes A/A, G/G and C/T for SNP #561, SNP #217 and SNP #1024, respectively.]


Sentrix® Array Matrix

[Figure: the array matrix at three magnifications; scale labels 10 µm, 1.5 mm, 400 µm]


The Illumina BeadStation 500G permits high throughput analysis of thousands of SNP DNA markers in hundreds of genotypes in less than one week.


Gene Expression Microarrays


Two-color fluorescent scan of a yeast microarray containing 2,479 elements (ORFs). Red and green probes interact with a single target. Yellow probes interact with both targets, and empty probes with neither target.

Lashkari D A et al. PNAS 1997;94:13057-13062. ©1997 by The National Academy of Sciences of the USA


Tiling Array

• Genome array consisting of overlapping probes

• Finer resolution

• Better at finding RNA in the cell
  – mRNA
    • Alternative splicing
    • Not polyadenylated
  – microRNA
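
The idea of overlapping probes can be sketched in a few lines. This is only an illustration: the function name `tile_probes`, the probe length, the step size and the example sequence are all assumptions made here, not values from any particular array design.

```python
def tile_probes(sequence, probe_length, step):
    """Return (start, probe) pairs that tile `sequence` with overlapping probes.

    Probes overlap whenever `step` is smaller than `probe_length`.
    """
    if probe_length <= 0 or step <= 0:
        raise ValueError("probe_length and step must be positive")
    last_start = len(sequence) - probe_length
    return [(start, sequence[start:start + probe_length])
            for start in range(0, last_start + 1, step)]


# Arbitrary illustrative region (hypothetical, not a real design)
region = "CCGTTCACATTAGGATACCAGTTCAAGG"
for start, probe in tile_probes(region, probe_length=10, step=5):
    print(start, probe)
```

A smaller step gives more overlap and finer resolution, at the cost of more probes per region.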


Tiling Arrays

http://en.wikipedia.org/


Tiling Array

http://en.wikipedia.org/


© Affymetrix Inc.

SNP Array


Single Nucleotide Polymorphisms (SNPs)

* Most common genetic variation in the human genome. They occur about once every thousand base pairs.

* Genome-wide SNP maps are now available (millions in databases)


Affymetrix Standard Tiling

C-T-C-C-A-A-A-A-A-A-A-T-T-T-C-A-T-T-C-T

C-T-C-C-A-A-A-A-A-A-C-T-T-T-C-A-T-T-C-T

C-T-C-C-A-A-A-A-A-A-G-T-T-T-C-A-T-T-C-T

C-T-C-C-A-A-A-A-A-A-T-T-T-T-C-A-T-T-C-T

Substitution position
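
The four probes above differ only at the substitution position. A minimal sketch of how such a probe set could be generated is given below; the function name `substitution_probes` is made up for this example, and the reference probe is the first sequence on the slide.

```python
BASES = "ACGT"


def substitution_probes(reference, position):
    """Return four probes equal to `reference` except at 0-based `position`,
    where the base is set to A, C, G and T in turn."""
    if not 0 <= position < len(reference):
        raise ValueError("position outside the probe")
    return [reference[:position] + base + reference[position + 1:]
            for base in BASES]


# First probe from the slide, written in three pieces to make the
# substitution position (the 11th base, index 10) easy to see.
reference = "CTCCAAAAAA" + "A" + "TTTCATTCT"
for probe in substitution_probes(reference, 10):
    print("-".join(probe))
```

The probe that hybridizes most strongly indicates which base is present at that position in the sample.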


ChIP-on-chip array (Chromatin ImmunoPrecipitation)


Antibody

ChIP-on-chip array


ChIP-on-chip array


3. ChIP-on-chip array


Bioinformatic approaches for analysis

• Measuring tens of thousands of data points simultaneously

• High dimensional data
  – 10 experiments x 50K = 500K data points

• How to find real differences over the noise

• Statistical approaches

[Figure: expression data for Tumor and Normal samples]


Bioinformatic approaches for analysis

• Class comparison
  – Which genes are up or down in tumor v normal, untreated v treated?

• Class discovery
  – Within the tumor samples, are there subgroups that have a specific expression profile?

• Class prediction, pathway analysis, etc.

[Figure: expression data for Tumor and Normal samples]


Challenges in microarray analysis

• Different platforms
  – Illumina, Affymetrix, Agilent, ...

• Many file types, many data formats

• Need to learn platform-dependent methods and the software they require


Public databases

• Many sources for public data: labs, consortia, government

• Publications require that data files, including raw files, be made public

• GEO - http://www.ncbi.nlm.nih.gov/geo/

• ArrayExpress - http://www.ebi.ac.uk/arrayexpress/#ae-main[0]


Streamlined Analysis

Raw data -> Normalize (RMA) -> Filter -> Significance / Classification / Clustering -> Gene lists -> Function (Gene Ontology)

• Filter: present/absent, minimum value, fold change
• Significance: t-test, SAM, Rank Product
• Classification: PAM, machine learning

Raw data (each sample column gives VALUE and ABS_CALL; P = present, M = marginal, A = absent):

ID_REF                        normal      tumor       tumor       normal      normal      tumor
AFFX-BioB-5_at                210.6 P     234.6 P     362.5 P     389 P       305.6 P     330.5 P
AFFX-BioB-M_at                393 P       327.8 P     501.4 P     816.5 P     542 P       440.8 P
AFFX-BioB-3_at                264.9 P     164.6 P     244.7 P     379.7 P     261.3 P     303.7 P
AFFX-BioC-5_at                738.6 P     676.1 P     737.6 P     1191.2 P    917 P       767.9 P
AFFX-BioC-3_at                356.3 P     365.9 P     423.4 P     711.6 P     560.3 P     484.9 P
AFFX-BioDn-5_at               566.3 P     442.2 P     649.7 P     834.3 P     599.1 P     606.9 P
AFFX-BioDn-3_at               3911.8 P    3703.7 P    4680.9 P    6037.7 P    4653.7 P    4232 P
AFFX-CreX-5_at                6433.3 P    5980 P      7734.7 P    10591 P     8162.1 P    8428 P
AFFX-CreX-3_at                11917.8 P   9376.7 P    11509.3 P   16814.4 P   13861.8 P   13653.4 P
AFFX-DapX-5_at                12.2 A      44.3 M      31.2 A      37.7 P      33.3 A      12.8 A
AFFX-DapX-M_at                57.8 M      42.5 A      79 M        48.8 P      39.5 A      39.2 A
AFFX-DapX-3_at                29.8 A      6.2 A       23.4 A      28.4 A      3.2 A       7.6 A
AFFX-LysX-5_at                15.3 A      16.2 A      15.6 A      16.7 A      3.1 A       3.9 A
AFFX-LysX-M_at                33.2 A      12 A        17.7 A      37.3 A      49.2 A      9.1 A
AFFX-LysX-3_at                40.7 M      10.7 A      36.2 A      22.1 A      22.8 A      28.2 A
AFFX-PheX-5_at                7.8 A       3 A         7.6 A       5.6 A       5 A         6.4 A
AFFX-PheX-M_at                4.2 A       4.8 A       6.8 A       6.1 A       3.7 A       5.5 A
AFFX-PheX-3_at                54.2 A      39.6 A      19.4 A      16.1 A      44.7 A      31.2 A
AFFX-ThrX-5_at                8.2 A       11.2 A      13.2 A      9.5 A       8.5 A       7.5 A
AFFX-ThrX-M_at                38.1 A      30.6 A      37.6 A      7.2 A       26.9 A      36.3 A
AFFX-ThrX-3_at                15.2 A      5 A         15 A        8.3 A       36.8 A      11.5 A
AFFX-TrpnX-5_at               11.2 A      11.8 A      22.2 A      22.1 A      8.9 A       35.6 A
AFFX-TrpnX-M_at               9 A         8.1 A       9.1 A       8.7 A       8.1 A       12 A
AFFX-TrpnX-3_at               19.8 A      12.8 A      11.8 A      43.2 M      17.4 A      10 A
AFFX-HUMISGF3A/M97935_5_at    82.7 P      120.7 P     92.7 P      46.4 P      55.9 P      46.5 P
AFFX-HUMISGF3A/M97935_MA_at   397.6 P     416.7 P     244.8 A     181.4 A     197.5 A     192.3 A
AFFX-HUMISGF3A/M97935_MB_at   206.2 P     303 P       300.8 P     253.5 P     195.3 P     216 P
AFFX-HUMISGF3A/M97935_3_at    663.8 P     723.9 P     812.1 P     666.1 P     629.4 P     754.1 P
AFFX-HUMRGE/M10098_5_at       547.6 P     405.9 P     6894.7 P    3496.1 P    1958.5 P    5799.4 P
AFFX-HUMRGE/M10098_M_at       239.1 P     175.8 P     3675 P      1348.6 P    695.9 P     2428.2 P
AFFX-HUMRGE/M10098_3_at       1236.4 P    721.4 P     9076.1 P    7795.9 P    4237.1 P    7890 P
AFFX-HUMGAPDH/M33197_5_at     19508 P     19267.1 P   22892 P     26584 P     29666.6 P   25038.1 P
AFFX-HUMGAPDH/M33197_M_at     18996.6 P   20610.4 P   21573.7 P   29936 P     30106.6 P   22380.2 P
AFFX-HUMGAPDH/M33197_3_at     18016.4 P   17463.8 P   20921.3 P   26908.3 P   28382.2 P   21885 P
AFFX-HSAC07/X00351_5_at       23294.6 P   21783.7 P   18423.3 P   21858.9 P   23517.1 P   19450.3 P
AFFX-HSAC07/X00351_M_at       25373.1 P   24922.8 P   22384.2 P   25760.2 P   27718.5 P   21401.6 P
AFFX-HSAC07/X00351_3_at       20032.8 P   20251.1 P   20961.7 P   23494.6 P   23381.2 P   21173.3 P
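
The filter step (present/absent, minimum value, fold change) can be sketched as follows. The three rows are copied from the raw-data table on this slide; the thresholds (`min_value=100`, `min_fold=2`) and all function names are assumptions chosen for illustration, not settings recommended by the lecture.

```python
from statistics import fmean

LABELS = ["normal", "tumor", "tumor", "normal", "normal", "tumor"]

# probe set -> (signal values, detection calls), copied from the slide's table
ROWS = {
    "AFFX-BioB-5_at": ([210.6, 234.6, 362.5, 389.0, 305.6, 330.5], "PPPPPP"),
    "AFFX-DapX-5_at": ([12.2, 44.3, 31.2, 37.7, 33.3, 12.8], "AMAPAA"),
    "AFFX-HUMRGE/M10098_5_at": ([547.6, 405.9, 6894.7, 3496.1, 1958.5, 5799.4], "PPPPPP"),
}


def all_present(calls):
    """True if every sample has a Present ("P") detection call."""
    return all(call == "P" for call in calls)


def fold_change(values, labels):
    """Ratio of mean tumor to mean normal signal, reported as a value >= 1."""
    tumor = fmean([v for v, lab in zip(values, labels) if lab == "tumor"])
    normal = fmean([v for v, lab in zip(values, labels) if lab == "normal"])
    ratio = tumor / normal
    return max(ratio, 1 / ratio)


def filter_probes(rows, labels, min_value=100.0, min_fold=2.0):
    """Keep probe sets that pass all three filters (hypothetical thresholds)."""
    return [probe for probe, (values, calls) in rows.items()
            if all_present(calls)
            and min(values) >= min_value
            and fold_change(values, labels) >= min_fold]


print(filter_probes(ROWS, LABELS))
```

Note that the AFFX rows are control probe sets, so this only demonstrates the mechanics of filtering, not a biological result.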


Microarray experiments

Obtain sequence info

select oligos

Print microarray

Print or buy the microarray


Microarray experiments

Obtain sequence info

select oligos

Print microarray

Print or buy the microarray

sequencing error, assembly error, contamination

unique, similar hybridization rates


Microarray experiments

Obtain sequence info

select oligos

Print microarray

obtain tissue sample

extract RNA

extract mRNA

label

normalize mRNA

Print or buy the microarray Create the labeled samples


Microarray experiments

Obtain sequence info

select oligos

Print microarray

obtain tissue sample

extract RNA

extract mRNA

label

normalize mRNA

Print or buy the microarray Create the labeled samples

experimental design: number of biological replicates, technical replicates, blocks, sample pooling


Microarray experiments

Obtain sequence info

select oligos

Print microarray

obtain tissue sample

extract RNA

extract mRNA

label

normalize mRNA

Print or buy the microarray Create the labeled samples

hybridize


Microarray experiments

Obtain sequence info

select oligos

Print microarray

obtain tissue sample

extract RNA

extract mRNA

label

normalize mRNA

Print or buy the microarray Create the labeled samples

hybridize

hybridization design (multichannel)


Microarray experiments

hybridize scan

detect spots

compute spot summary

detect background

detect bad spots

process image

remove array specific noise


Microarray experiments

hybridize scan

detect spots

compute spot summary

detect background

detect bad spots

spot detection software

pixel mean, median, ...; background correction

detection limit, background > foreground, badly printed spots, flaws

process image using multiple scans

remove array specific noise: normalization


Raw data are not mRNA concentrations

• tissue contamination
• RNA degradation
• amplification efficiency
• reverse transcription efficiency
• hybridization efficiency and specificity
• clone identification and mapping
• PCR yield, contamination
• spotting efficiency
• DNA support binding
• other array manufacturing related issues
• image segmentation
• signal quantification
• “background” correction


Quality control: Noise and reliable signal

[Figure: data from arrays 1 ... n, annotated at array level, gene level and probe level]

Probe level: quality of the expression measurement of one spot on one particular array

Array level: quality of the expression measurement on one particular glass slide

Gene level: quality of the expression measurement of one probe across all arrays


Probe-level quality control

• Individual spots printed on the slide

• Sources:
  – faulty printing, uneven distribution, contamination with debris, magnitude of signal relative to noise, poorly measured spots

• Visual inspection:
  – hairs, dust, scratches, air bubbles, dark regions, regions with haze

• Spot quality:
  – Brightness: foreground/background ratio
  – Uniformity: variation in pixel intensities and ratios of intensities within a spot
  – Morphology: area, perimeter, circularity
  – Spot size: number of foreground pixels

• Action:
  – set measurements to NA (missing values)
  – local normalization procedures which account for regional idiosyncrasies
  – use weights for measurements to indicate reliability in later analysis


Spot Identification

Individual spots are recognized; size and shape might be adjusted per spot (automatically, with fine adjustments by hand).

Additional manual flagging of bad (X) or non-present (NA) spots

[Figure: examples of poor spot quality and good spot quality, with flagged spots marked X and NA]

Different spot identification methods: fixed circles, circles with variable size, arbitrary spot shape (morphological opening)


Spot Identification

• The signal of the spots is quantified.

[Figure: histogram of pixel intensities of a single spot; "donut"-shaped spots are highlighted]

Candidate summary statistics: Mean / Median / Mode / 75% quantile
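
The summary statistics named above can be computed with Python's standard `statistics` module. This is a minimal sketch; the pixel values are invented, and the function name `summarize_spot` is an assumption of this example.

```python
import statistics


def summarize_spot(pixels):
    """Return several candidate summaries of one spot's pixel intensities."""
    # quantiles(n=4) returns the three quartile cut points; the last is the 75% quantile
    _, _, q75 = statistics.quantiles(pixels, n=4)
    return {
        "mean": statistics.fmean(pixels),
        "median": statistics.median(pixels),
        "mode": statistics.mode(pixels),
        "q75": q75,
    }


# Hypothetical spot with one saturated pixel (e.g. a dust speck)
spot = [10, 12, 12, 13, 15, 200]
print(summarize_spot(spot))
```

The single bright pixel pulls the mean far above the median, which is one reason robust summaries are often preferred for spots with artifacts.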


Spot Detection

Adaptive segmentation vs. fixed circle segmentation

[Figure legend: GenePix, QuantArray, ScanAnalyze]

Spot uses morphological opening

Source: Rafael A. Irizarry, Department of Biostatistics, http://www.biostat.jhsph.edu/~ririzarr, http://www.bioconductor.org (NCI 2002)


Microarray Analysis – Data Preprocessing

• Objective
  – Convert an image of thousands of signals to a signal value for each gene or probe set

• Multiple steps
  – Image analysis
  – Background and noise subtraction
  – Normalization
  – Expression value for a gene or probe set

• Image analysis and background noise subtraction are usually done by proprietary software

Gene 1       100
Gene 2       150
Gene 3       75
...
Gene 10000   500


Array Level Quality Control

• Problems:
  – array fabrication defect
  – problem with RNA extraction
  – failed labeling reaction
  – poor hybridization conditions
  – faulty scanner

• Quality measures:
  – Percentage of spots with no signal (~30% excluded spots)
  – Range of intensities
  – (Av. Foreground)/(Av. Background) > 3 in both channels
  – Distribution of spot signal area
  – Amount of adjustment needed: signals have to be substantially changed to make slides comparable.
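
The foreground/background rule above is easy to check per array. The sketch below assumes each channel is given as a pair of lists (foreground and background intensities); the function names and the example numbers are hypothetical.

```python
from statistics import fmean


def foreground_background_ratio(foreground, background):
    """(average foreground) / (average background) for one channel."""
    return fmean(foreground) / fmean(background)


def array_passes(channels, min_ratio=3.0):
    """True only if every channel exceeds the ratio threshold.

    `channels` maps a channel name to (foreground values, background values).
    """
    return all(foreground_background_ratio(fg, bg) > min_ratio
               for fg, bg in channels.values())


# Hypothetical two-channel array
array = {
    "red": ([900, 1100], [200, 200]),    # ratio 5
    "green": ([500, 700], [250, 150]),   # ratio 3, not strictly greater than 3
}
print(array_passes(array))
```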


Gene-level Quality Control

Gene g

• Poor hybridization in the reference channel may introduce bias on the fold-change

• Some probes will not hybridize well to the target RNA

• Printing problems: all spots from a given inventory well have poor quality

• A well may be of bad quality (contamination)

• Genes with a consistently low signal in the reference channel are suspicious


Normalization

• Corrects for variation in hybridization, etc.

• Assumes no global change in gene expression

• Without normalization
  – The intensity value for a gene will be lower on Chip B
  – Many genes will appear to be downregulated when in reality they are not

Gene         Chip A   Chip B
Gene 1       100      50
Gene 2       150      75
Gene 3       75       32
Gene 10000   500      250

(The two chips are labeled Treated and Control on the slide.)
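
One simple way to act on the "no global change" assumption is to rescale each chip so that all chips share the same median intensity. The sketch below shows that idea only; it is not RMA or any vendor's method, and the function name `median_scale` and the intensities are invented for illustration.

```python
from statistics import fmean, median


def median_scale(arrays):
    """Rescale each array so that all arrays share a common median.

    `arrays` maps an array name to a list of intensities (same gene order).
    Assumes most genes do not change between arrays.
    """
    medians = {name: median(values) for name, values in arrays.items()}
    target = fmean(list(medians.values()))
    return {name: [x * target / medians[name] for x in values]
            for name, values in arrays.items()}


# Hypothetical chips: chip_b is uniformly half as bright as chip_a
chips = {
    "chip_a": [100.0, 150.0, 75.0, 500.0],
    "chip_b": [50.0, 75.0, 37.5, 250.0],
}
for name, values in median_scale(chips).items():
    print(name, values)
```

After scaling, the two hypothetical chips agree gene by gene, so the apparent two-fold "downregulation" on the dimmer chip disappears.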


Gene expression data

Rows are genes, columns are mRNA samples. Each entry is the gene-expression level or ratio for gene i in mRNA sample j.

Gene   sample1   sample2   sample3   sample4   sample5   …
1       0.46      0.30      0.80      1.51      0.90     ...
2      -0.10      0.49      0.24      0.06      0.46     ...
3       0.15      0.74      0.04      0.10      0.20     ...
4      -0.45     -1.03     -0.79     -0.56     -0.32     ...
5      -0.06      1.06      1.35      1.09     -1.09     ...

Two-color arrays:
  M = log2(red intensity / green intensity)
  A = (log2(red intensity) + log2(green intensity)) / 2

Affymetrix arrays: a function of (PM, MM), the perfect-match and mismatch probe intensities, as computed by MAS, dChip or RMA
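
For a two-color spot, M and A follow directly from the red and green intensities. A minimal sketch, with an invented function name and invented intensities:

```python
import math


def ma_values(red, green):
    """Return (M, A) for one spot of a two-color array.

    M is the log2 ratio of red to green; A is the average log2 intensity.
    Both intensities must be positive.
    """
    m = math.log2(red / green)
    a = 0.5 * (math.log2(red) + math.log2(green))
    return m, a


# Hypothetical spot: red channel four times brighter than green
print(ma_values(8.0, 2.0))
```

M = 0 means equal signal in both channels, M = 1 a two-fold excess of red, and M = -1 a two-fold excess of green.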


Scatterplot of Data

[Figure: two scatterplots of the same data, on the raw scale and on the log scale]

Message: look at your data on log-scale!


Use a log transformation of the ratio data:

Scatter plot of all genes in a simple comparison of two control (A) and two treatments (B: high vs. low glucose) showing changes in expression greater than 2.2 and 3 fold (lines).

X Axis is Average spot intensity on a log scale and Y Axis is specific spot intensity.


Statistical power

• t test
  – Tests the null hypothesis that the two group means are equal
  – Adds “confidence” to the fold change value

• Inputs for each gene: mean, standard deviation, sample size
• Calculates a test statistic and a p value
• You choose the cutoff or threshold
  – Give me the gene list at a cutoff of p < 0.05
    » If the control and treated means for a gene were truly equal, a difference this large would be seen less than 5% of the time
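
The test statistic is built from exactly the three ingredients listed: mean, standard deviation and sample size. The sketch below computes Welch's version of the t statistic (which does not assume equal variances) and its approximate degrees of freedom; the choice of Welch's test, the function name and the data are assumptions of this example. Turning the statistic into a p value requires the t distribution, which is not in the Python standard library, so that step is left out.

```python
import math
from statistics import fmean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a), variance(group_b)   # sample variances
    se2 = va / na + vb / nb                         # squared standard error
    t = (fmean(group_a) - fmean(group_b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical log2 expression values for one gene
control = [1.0, 2.0, 3.0]
treated = [2.0, 3.0, 4.0]
print(welch_t(control, treated))
```

With only three samples per group, a one-unit difference in means gives a small t statistic, which illustrates why replicate number matters for power.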


Experimental Design – Very important!!!

• Sample size
  – How many samples in test and control?
    • Will depend on many factors, such as whether the material is tissue culture or tissue sample
    • Power analysis

• Replicates
  – Technical v biological
    • Biological replicates are more important for more heterogeneous samples
    • Need replicates for statistical analysis

• To pool or not to pool
  – Depends on objective

• Sample acquisition or extraction
  – Laser captured or gross dissected

• All experimental steps from sample acquisition to hybridization
  – Microarray experiments are very expensive, so plan experiments carefully


t tests

• Results might look like
  – At p < 0.05, there are 300 genes up-regulated and 200 genes down-regulated
    • For each of these genes, a difference this large would arise by chance less than 5% of the time if the two group means were equal
  – At p < 0.05, x genes up and y genes down with a fold change of at least 3.0


Multiple Comparisons

• In a microarray experiment, each gene (each probe or probe set) is really a separate experiment

• Yet if you treat each gene as an independent comparison, you will always find some with significant differences– (the tails of a normal distribution)


Multiple Comparisons

• Microarrays have a multiple comparison problem

• p <= 0.05 means that a gene with no real difference still has a 5% chance of being called different

• 5% of 10000 is 500
  – Up to 500 genes are picked up by chance
  – Suppose t tests select 1000 genes at a p of 0.05
  – 500/1000: approximately 50% of the selected genes may be false positives
  – Very high false discovery rate; need more confidence
  – How to correct?
  – Correction for multiple comparisons
  – p value and a corrected p value
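
The arithmetic on this slide can be written out as a small calculation. It is a worst-case sketch that assumes no gene is truly different, so every test is a chance to produce a false positive; the function names are made up for this example.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of genes passing the cutoff by chance alone,
    assuming no gene is truly different (worst case)."""
    return n_tests * alpha


def worst_case_false_fraction(n_tests, alpha, n_selected):
    """Upper estimate of the fraction of selected genes that are false."""
    return min(1.0, expected_false_positives(n_tests, alpha) / n_selected)


# Numbers from the slide: 10000 genes, cutoff 0.05, 1000 genes selected
print(expected_false_positives(10000, 0.05))
print(worst_case_false_fraction(10000, 0.05, 1000))
```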


Corrections for multiple comparisons

• Involve corrections to the p value so that the adjusted p value is higher than the raw p value

• Bonferroni http://en.wikipedia.org/wiki/Bonferroni_correction

• Benjamini-Hochberg

• Significance Analysis of Microarrays (SAM)
  – Tusher et al. at Stanford
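
The first two corrections are short enough to sketch directly. Both functions below take raw p values and return adjusted p values in the same order; the function names and the example p values are invented, and SAM is not shown because it relies on permutations rather than a closed-form adjustment.

```python
def bonferroni(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values (controls the false discovery rate)."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values for four genes
raw = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

Bonferroni is the stricter of the two: it limits the chance of even one false positive, whereas Benjamini-Hochberg limits the expected fraction of false positives among the genes selected.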


Gene Expression Microarray experiments

obtain numerical summary for each gene or exon on each array

sample classification

clustering genes and samples

differential expression analysis


Gene Expression Microarray experiments

obtain numerical summary for each gene or exon on each array

sample classification
  – discriminant analysis
  – support vector machines
  – supervised learning

clustering genes and samples
  – unsupervised learning
  – hierarchical clustering
  – k-means clustering
  – heatmaps

differential expression analysis
  – t-tests, ANOVA
  – Bayesian versions of the above
  – Fourier analysis of time series
  – false discovery and nondiscovery rates

supporting methods
  – robust methods to down-weight outliers
  – data imputation (filling in missing data)


From Data to Knowledge

Rows: genes. Columns: mRNA samples.

Gene   sample1  sample2  sample3  sample4  sample5  …
1         0.46     0.30     0.80     1.51     0.90  ...
2        -0.10     0.49     0.24     0.06     0.46  ...
3         0.15     0.74     0.04     0.10     0.20  ...
4        -0.45    -1.03    -0.79    -0.56    -0.32  ...
5        -0.06     1.06     1.35     1.09    -1.09  ...

Once data is of high quality and systematic, non-biological effects are removed, the result is a gene expression matrix

This is still just data, not knowledge. Use this data to answer a scientific question.


Supervised Analysis

Learning from examples, classification

– We have already seen groups of healthy and sick people. Now let’s diagnose the next person walking into the hospital.

– We know that these genes have function X (and these others don’t). Let’s find more genes with function X.

– We know many gene-pairs that are functionally related (and many more that are not). Let’s extend the number of known related gene pairs.

Known structure in the data needs to be generalized to new data.


Unsupervised analysis

= clustering

– Are there groups of genes that behave similarly in all conditions?

– Disease X is very heterogeneous. Can we identify more specific sub-classes for more targeted treatment?

No structure is known. We first need to find it. Exploratory analysis.


Supervised analysis

Calvin, I still don’t know the difference between cats and dogs …

Don’t worry! I’ll show you once more:

Class 1: cats
Class 2: dogs

Oh, now I get it!!


Unsupervised analysis

Calvin, I still don’t know the difference between cats and dogs …

I don’t know it either.

Let’s try to figure it out together …


Supervised analysis: setup

• Training set
– Data: microarrays
– Labels: for each one we know if it falls into our class of interest or not (binary classification)

• New data (test data)
– Data for which we don’t have labels.
– These are genes without known function

• Goal: Generalization ability
– Build a classifier from the training data that is good at predicting the right class for the new data.
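The setup above can be sketched with a deliberately simple classifier. A nearest-centroid rule is used here only as a stand-in (it is not a method named on the slides), and the training profiles and labels are invented for illustration.

```python
import math


def centroid(rows):
    """Column-wise mean of a list of expression profiles."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def train(data, labels):
    """Build the classifier: one mean profile (centroid) per class label."""
    groups = {}
    for row, label in zip(data, labels):
        groups.setdefault(label, []).append(row)
    return {label: centroid(rows) for label, rows in groups.items()}


def predict(model, row):
    """Assign a new profile to the class with the nearest centroid."""
    return min(model, key=lambda label: math.dist(model[label], row))


# Hypothetical training set: two measurements per profile, binary labels
# (True = falls into the class of interest).
train_data = [[2.1, 0.3], [1.9, 0.1], [0.2, 1.8], [0.1, 2.2]]
train_labels = [True, True, False, False]

model = train(train_data, train_labels)
prediction = predict(model, [1.7, 0.4])  # new, unlabeled profile
print(prediction)
```

Generalization ability would normally be estimated on held-out labeled data that was not used for training.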


One microarray, one dot

[Scatter plot: x-axis, expression of gene 1; y-axis, expression of gene 2]

Think of a space with #genes dimensions (yes, it’s hard for more than 3).

Each microarray corresponds to a point in this space.

If gene expression is similar under some conditions, the points will be close to each other.

If gene expression overall is very different, the points will be far away.
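As a sketch of this idea, each array can be stored as a vector with one entry per gene and compared by distance. The values are the first three sample columns of the expression matrix on the "From Data to Knowledge" slide; using Euclidean distance is an assumption, since the slides do not name a distance measure.

```python
import math
from itertools import combinations

# One vector per microarray, one entry per gene (genes 1 to 5).
arrays = {
    "sample1": [0.46, -0.10, 0.15, -0.45, -0.06],
    "sample2": [0.30, 0.49, 0.74, -1.03, 1.06],
    "sample3": [0.80, 0.24, 0.04, -0.79, 1.35],
}


def distance(a, b):
    """Euclidean distance between two arrays viewed as points in gene space."""
    return math.dist(a, b)


# Distance for every pair of arrays.
distances = {
    (s, t): distance(arrays[s], arrays[t]) for s, t in combinations(arrays, 2)
}
closest = min(distances, key=distances.get)
print(closest, round(distances[closest], 3))
```

The pair with the smallest distance is the pair of arrays whose overall expression is most similar.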


A heatmap

samples of different regions of the brain in humans and chimpanzees

sample clusters show that different regions of the brain cluster more closely than different species

gene clusters show that some genes differentiate among brain regions while others differentiate the 2 species.


Heat Map: 2-D Cluster Analysis


Genome Wide Association Studies

• GWAS involves rapidly scanning markers across the genome (≈0.5M or 1M) of many people (≈2K) to find genetic variations associated with a particular disease.

• A large number of subjects are needed because
– (1) associations between SNPs and causal variants are expected to show low odds ratios, typically below 1.5
– (2) in order to obtain a reliable signal, given the very large number of tests that are required, associations must show a high level of significance to survive the multiple testing correction

• Such studies are particularly useful in finding genetic variations that contribute to common, complex diseases like autism.
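Both reasons can be made concrete with a small sketch. It assumes a Bonferroni correction over 1M markers, and the case/control counts are invented to produce an odds ratio of 1.5; neither number comes from a real study.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level needed to keep the overall level at alpha."""
    return alpha / n_tests


def odds_ratio(case_with, case_without, control_with, control_without):
    """Odds of carrying the variant among cases divided by the odds among controls."""
    return (case_with * control_without) / (case_without * control_with)


threshold = bonferroni_threshold(0.05, 1_000_000)  # 1M markers tested
example_or = odds_ratio(600, 400, 500, 500)        # hypothetical carrier counts
print(f"per-SNP threshold: {threshold:.1e}, odds ratio: {example_or:.2f}")
```

A weak effect has to clear a per-SNP threshold on the order of 5e-8, which is why thousands of subjects are needed.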


Look for Association with Diseases and SNPs

Many issues with the data, including population diversity and the need for t-test corrections and log transformations because there are many variables and fewer samples (Bonferroni)


A recurrent mutation in the BMP type I receptor ACVR1 causes inherited and sporadic fibrodysplasia ossificans progressiva
Eileen M. Shore, Meiqi Xu, George J. Feldman, David A. Fenstermacher, The FOP International Research Consortium, Matthew A. Brown, and Frederick S. Kaplan
Nature Genetics 2006

Collect 13 individuals from five families with FOP ectopic bone formation

Genome-wide linkage analysis with 400 microsatellite markers

Higher resolution linkage analysis with Affymetrix 10K SNP mapping chip (in Facility)

Candidate gene sequencing identifies a new SNP in BMP receptor


Predictive Value of Gene Expression

• Lymphoma dataset
– This dataset is the gene expression in the three most prevalent adult lymphoid malignancies: B-CLL, FL and DLBCL.
– This study produced gene expression data for p=4,682 genes in n=81 mRNA samples.

29 × B-CLL
9 × FL
43 × DLBCL

http://genome-www.stanford.edu/lymphoma


Correlation Matrix


Personal Genomics

• Many companies are marketing SNP chips as useful, promising future information as it becomes available.
– deCODEme.com
– Navigenics
– 23andMe
– Knome
– http://thepersonalgenome.com/

• Who will tell these people what the data means?

• http://www.nytimes.com/2009/01/11/magazine/11Genome-t.html?pagewanted=all


IN-DELS and CNVs

• Insertions, Deletions and Copy Number Variation

• Clone-based comparative genomic hybridization (Array CGH)
– Test and reference DNA are differentially fluorescently labeled and hybridized to the array.
– Cons: low resolution (cannot find small CNV regions)

• SNP genotyping array
– Pros: higher resolution
– Cons: poor signal-to-noise ratio of hybridization


Hidden Markov Model designed for high resolution CNV detection in whole genome SNP genotyping data


CNV Analysis of Array Data

• Log R ratio (LRR): total fluorescent intensity signal from both sets of probes/alleles at each SNP

• B Allele Frequency (BAF): relative ratio of the intensity signals between the two probes/alleles at each SNP

• Accurate model for log R ratio and B Allele Frequency

• + Population allele frequency + distance between adjacent SNPs + family information
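A simplified sketch of how the two measures relate to the raw probe intensities. It assumes the common convention that LRR is the log2 ratio of observed to expected total intensity, and it computes only the raw allelic angle (theta); a real BAF additionally calibrates theta against the genotype clusters seen in a reference population. The intensities and function names are hypothetical.

```python
import math


def log_r_ratio(x: float, y: float, expected_r: float) -> float:
    """log2 of observed total intensity (x + y) over the expected total intensity."""
    return math.log2((x + y) / expected_r)


def theta(x: float, y: float) -> float:
    """Relative B-allele signal: 0 = only A probe signal, 1 = only B probe signal."""
    return (2 / math.pi) * math.atan2(y, x)


# Hypothetical intensities for the A probe (x) and B probe (y) at one SNP.
lrr_normal = log_r_ratio(1.0, 1.0, expected_r=2.0)  # two copies, as expected
lrr_loss = log_r_ratio(0.5, 0.5, expected_r=2.0)    # half the expected signal
theta_het = theta(1.0, 1.0)                         # equal signal from both alleles
print(lrr_normal, lrr_loss, round(theta_het, 3))
```

A run of adjacent SNPs with negative LRR and no intermediate B-allele values is the kind of pattern a CNV caller would flag as a deletion.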


CNV Data Analysis


Genome Wide Association Studies

• GWAS
– http://grants.nih.gov/grants/gwas/
– http://www.nature.com/scitable/topicpage/Genetic-Variation-and-Disease-GWAS-682

• Personal Genomes
– http://www.nytimes.com/2009/01/11/magazine/11Genome-t.html?pagewanted=all


Functional Genomics

• Take a list of "interesting" genes and find their biological relationships

– Gene lists may come from significance/classification analysis of microarrays, proteomics, or other high-throughput methods

• Requires a reference set of "biological knowledge"


Gene Ontology

• 3 hierarchical sets of terminology
– Biological Process
– Cellular Component (location within cell)
– Molecular Function

• about 1000 categories of functions


Biological Pathways


Microarray Databases

• Large experiments may have hundreds of individual array hybridizations

• Core lab at an institution or multiple investigators using one machine: archive data and validate across experiments

• Data-mining - look for similar patterns of gene expression across different experiments

• Microarray experiments are complex, and a database makes it possible to share the data


Using Public Databases

• Gene Expression data is an essential aspect of annotating the genome

• Publication and data exchange for microarray experiments

• Data mining/Meta-studies

• Common data format - XML

• MIAME (Minimal Information About a Microarray Experiment)


Transcriptome: Gene Expression Technologies

• cDNA (EST) libraries

• SAGE

• Microarray

• rt-PCR

• RNA-seq


The Cancer Genome Anatomy Project

• CGAP has collected a large amount of cDNA and related data online

• http://cgap.nci.nih.gov/

• cDNA libraries from various tissues
– search for genes
– compare expression levels


SAGE

• Serial Analysis of Gene Expression is a technology that sequences very short fragments of mRNA (10 or 17 bp) that have been randomly ligated together

• The short ‘tags’ are assigned to genes and then relative counts for each gene are computed for cDNA libraries from various tissues
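The tag-counting step can be sketched as follows. The tag sequences, gene names, and the tags-per-million scaling are assumptions made for illustration; they are not taken from SAGE Genie or any real tag-to-gene mapping.

```python
from collections import Counter


def relative_counts(tags, tag_to_gene, scale=1_000_000):
    """Count tags per gene, scaled to tags per `scale` sequenced tags."""
    counts = Counter(tag_to_gene[t] for t in tags if t in tag_to_gene)
    total = len(tags)  # unassigned tags still count toward library size
    return {gene: n * scale / total for gene, n in counts.items()}


# Hypothetical 10 bp tags and their gene assignments.
tag_to_gene = {"AAACCTGGTA": "geneA", "TTGGCCAATG": "geneB"}

# A tiny made-up library; the last tag matches no known gene.
library = ["AAACCTGGTA", "AAACCTGGTA", "TTGGCCAATG", "GGGGGGGGGG"]

rel = relative_counts(library, tag_to_gene)
print(rel)
```

Scaling by library size is what lets counts from libraries of different depth, for example from different tissues, be compared.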


SAGE Genie

• SAGE Anatomic Viewer

• SAGE Digital Gene Expression Displayer

• Digital Northern

• SAGE Experiment Viewer


GEO Microarray database at NCBI

• Microarray experiments
– Defined arrays
– Published results
– Also lots of inconclusive experiments
– Tools to search for specific genes
– Unreliable to search for tissue or disease in experiment description text


ArrayExpress at EMBL


Antibodies on Arrays: Miniature Western Blots