Analysis of High-throughput Gene Expression Profiling

Why Measure Gene Expression

1. Reveals which genes are induced or repressed in response to a developmental phase or to an environmental change.
2. Sets of genes whose expression rises and falls under the same condition are likely to have a related function.
3. Features such as a common regulatory motif can be detected within co-expressed genes.
4. A pattern of gene expression may be used as an indicator of abnormal cellular regulation.

• A useful tool for cancer diagnosis

Why Measure Gene Expression on a Large Scale?

Traditional vs. High-throughput Approaches

Techniques Used to Detect Gene Expression Level

• Microarray (single or dual channel): high-throughput
• SAGE: high-throughput
• EST/cDNA library: high-throughput
• Northern blots
• Subtractive hybridisation
• Differential hybridisation
• Representational difference analysis (RDA)
• DNA/RNA fingerprinting (RAP-PCR)
• Differential display (DD-PCR)
• aCGH: array CGH (DNA level)

Basic Information of Microarray, SAGE and cDNA Library

(DNA) Microarray
1. Developed around 1987.
2. Employs methods previously exploited in the immunoassay context: specific binding and marking techniques.
3. Two types of probes:

Format I: probe cDNA (500-5,000 bases long) is immobilized on a solid surface such as glass. Widely considered to have been developed at Stanford University; traditionally called DNA microarrays.
Format II: an array of oligonucleotide probes (20-80-mer oligos) is synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization. Developed at Affymetrix, Inc.; many companies manufacture oligonucleotide-based chips using alternative in-situ synthesis or deposition technologies. Historically called DNA chips.

Microarray

• Single channel: subtype classification
• Dual channel: differential expression gene screening

• Tissue microarray
• Protein microarray
• …

Array CGH

• Detects DNA copy-number variation via a microarray approach

• A hot topic in recent research, especially cancer research

Microarray Analysis

• Gene discovery
• Pattern discovery
• Inferences about biological processes
• Classification of biological processes

Which genes are up-regulated, down-regulated, co-regulated, not-regulated?
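As a minimal sketch of the up/down/not-regulated question for a dual-channel array, the Python below labels genes by their log2(Cy5/Cy3) ratio and scores co-regulation between two expression profiles with a Pearson correlation. Treating Cy5 as the test channel, the two-fold cut-off (threshold=1.0) and the function names are illustrative assumptions, not part of the slides.

```python
import math


def classify_by_log_ratio(cy5, cy3, threshold=1.0):
    """Label each gene 'up', 'down' or 'not' from its log2(Cy5/Cy3) ratio.

    Assumes Cy5 is the test channel and Cy3 the reference; threshold is in
    log2 units, so 1.0 corresponds to an (illustrative) two-fold change.
    """
    labels = []
    for test, ref in zip(cy5, cy3):
        log_ratio = math.log2(test / ref)
        if log_ratio >= threshold:
            labels.append("up")
        elif log_ratio <= -threshold:
            labels.append("down")
        else:
            labels.append("not")
    return labels


def pearson(profile_a, profile_b):
    """Pearson correlation of two expression profiles; values near 1 suggest co-regulation."""
    n = len(profile_a)
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    var_a = sum((a - mean_a) ** 2 for a in profile_a)
    var_b = sum((b - mean_b) ** 2 for b in profile_b)
    return cov / math.sqrt(var_a * var_b)


print(classify_by_log_ratio([400, 100, 120], [100, 400, 100]))  # ['up', 'down', 'not']
```

A fixed fold-change cut-off ignores replicate variability, which is why the statistical screening methods listed later in the deck are usually preferred.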

SAGE

• An experimental technique designed to obtain a quantitative measure of gene expression.

• Short (~10-20 base) “tags” are produced from the sequence immediately adjacent to the 3’ end of the 3’-most NlaIII restriction site.

• SAGE does not measure the expression level of a gene directly; it quantifies a "tag" that represents the transcription product of the gene.
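The tag definition can be illustrated with a short sketch. It assumes a sense-strand cDNA written 5'→3' in upper case and a 10-base tag (the slide gives ~10-20 bases); whether the CATG site itself is reported as part of the tag varies between protocols, and here it is excluded. The function name sage_tag is hypothetical.

```python
NLAIII_SITE = "CATG"  # recognition sequence of NlaIII, the SAGE anchoring enzyme


def sage_tag(cdna, tag_len=10):
    """Return the tag immediately 3' of the 3'-most CATG site, or None.

    cdna: sense-strand sequence, 5'->3', upper case (assumed input format).
    tag_len: tag length in bases; 10 is an assumed default.
    """
    site = cdna.rfind(NLAIII_SITE)  # rfind gives the 3'-most occurrence
    if site == -1:
        return None  # no NlaIII site, so this transcript yields no tag
    start = site + len(NLAIII_SITE)
    tag = cdna[start:start + tag_len]
    return tag if len(tag) == tag_len else None  # site too close to the 3' end


print(sage_tag("GGCATGAAAAACCCCCTTCATGTTTTTGGGGGAAAA"))  # TTTTTGGGGG
```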

SAGE

Tags are isolated and concatemerized (joined end to end into long concatemers that are then sequenced).

Relative expression levels can be compared between cells in different states.
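To show how sequenced concatemers become tag counts that can be compared between libraries, here is a simplified parser. It assumes each concatemer read consists of ditags separated by the CATG anchoring site, treats pieces of 20-26 bases as ditags, takes the first 10 bases as one tag and the reverse complement of the last 10 as its partner, and skips steps used in practice such as removing duplicate ditags and checking sequence quality. All names are illustrative.

```python
from collections import Counter

ANCHOR = "CATG"  # NlaIII site that punctuates ditags in a concatemer
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def tags_from_concatemer(concatemer, tag_len=10, min_ditag=20, max_ditag=26):
    """Split one concatemer read into tags (assumed ditag length 20-26 bases)."""
    tags = []
    for ditag in concatemer.split(ANCHOR):
        if not min_ditag <= len(ditag) <= max_ditag:
            continue  # read ends or fragments that are not full ditags
        tags.append(ditag[:tag_len])                       # left tag, read as-is
        tags.append(reverse_complement(ditag[-tag_len:]))  # right tag, from the opposite strand
    return tags


def count_library(concatemers, tag_len=10):
    """Tally tag occurrences over all concatemer reads of one library."""
    counts = Counter()
    for read in concatemers:
        counts.update(tags_from_concatemer(read, tag_len))
    return counts
```

Relative expression of a tag is then its count divided by the library's total tag count, which is what makes two libraries comparable.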

SAGEmap (http://cgap.nci.nih.gov)

SAGE: comparing two relational libraries

EST library (UniGene)

Gene Expression Information from the UniGene Library

An Example of In-house EST Library Analysis

The Algorithms and Challenges of High-throughput Gene Expression Analysis

Seeing is believing?

No: errors need to be corrected.

SAGE:

• A typical experiment, in which a normal and a diseased cell are compared, requires ~30,000 gene expression comparisons.

• The results depend on the size and reliability of the SAGE libraries.

• Statistical measures are used to filter candidate genes and reduce the dimensionality of the data, but it is tedious and time-consuming to experiment with these measures until a good set is found.
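One common statistical measure for a single tag is a two-proportion z-test on its counts in the two libraries. The sketch below is that generic test, not the specific method of the original work; it relies on a normal approximation that is unreliable for very small counts, and the name compare_tag is hypothetical.

```python
import math


def compare_tag(count_a, total_a, count_b, total_b):
    """Two-proportion z-test for one tag between SAGE libraries A and B.

    Returns (z, two_sided_p) using the pooled proportion under the null
    hypothesis that the tag is equally frequent in both libraries.
    """
    p_a = count_a / total_a
    p_b = count_b / total_b
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 0.0, 1.0  # tag absent (or ubiquitous) in both libraries
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


z, p = compare_tag(50, 50_000, 10, 50_000)
```

Applied to every tag in a ~30,000-comparison experiment, these p-values still need a multiple-testing correction before candidate genes are selected.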

SAGE

• TPM (tags per million): a simple normalization method, TPM = Count × 1,000,000 / TotalCount

• Bayesian approach http://cancerres.aacrjournals.org/cgi/content/full/59/21/5403
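The TPM formula translates directly into code. This sketch assumes one library's raw tag counts are held in a dict mapping tag to count; the function name is illustrative.

```python
def tags_per_million(counts):
    """TPM = Count * 1,000,000 / TotalCount for every tag in one library."""
    total = sum(counts.values())
    return {tag: count * 1_000_000 / total for tag, count in counts.items()}


small_library = tags_per_million({"X": 10, "Y": 90})
large_library = tags_per_million({"X": 40, "Y": 960})
```

Two libraries of different depth become comparable after this step: 10 counts out of 100 and 40 out of 1,000 give 100,000 and 40,000 TPM respectively.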

Microarray: Sources of errors

• Systematic
• Random

[Figure: log signal intensity plotted against log RNA abundance]

Sources of Errors (Cont.)

• Printing and/or tip problems
• Labeling and dye effects (differing amounts of RNA labeled between the two channels)
• Differences in the power of the two lasers (or other scanner problems)
• Differences in DNA concentration on arrays (plate effects)
• Spatial biases in ratios across the surface of the microarray due to uneven hybridization
• cDNA arrays cannot distinguish alternatively spliced forms

Errors that cannot be corrected by statistics

• Competitive hybridization of different targets on the chip

• Failure to distinguish different splicing forms

• Misinterpretation of time course data when there are not sufficient points

• Misinterpretation of relative intensity

Does a clustered time course really mean co-expression?

Picture taken from http://genomics.stanford.edu/yeast/additional_figures_link.html

Yes, you can study a known system (such as the cell cycle) this way, but what about unknown systems?

Normalization by iterative linear regression

1. Fit a line (y = mx + b) to the data set.
2. Set aside outliers (residuals > 2 × s.e.).
3. Repeat until r² changes by < 0.001.
4. Then apply the slope and intercept to the original data set.

D Finkelstein et al. http://www.camda.duke.edu/CAMDA00/abstracts.asp
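The procedure above can be sketched in pure Python as follows. Which channel is treated as x and which as y, the 50-iteration cap, and the final mapping y' = (y - b) / m are assumptions; the slide only says that the slope and intercept are applied to the original dataset. At least three points are required.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = m*x + b; returns (m, b, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    r_squared = sxy * sxy / (sxx * syy) if syy > 0 else 1.0
    return m, b, r_squared


def iterative_regression_normalize(xs, ys, tol=1e-3, max_iter=50):
    """Refit after setting aside outliers (|residual| > 2 x s.e.) until r^2 changes by < tol."""
    keep = list(range(len(xs)))
    m, b, r2 = fit_line(xs, ys)
    for _ in range(max_iter):
        residuals = [ys[i] - (m * xs[i] + b) for i in keep]
        se = math.sqrt(sum(r * r for r in residuals) / (len(keep) - 2))
        inliers = [i for i, r in zip(keep, residuals) if abs(r) <= 2 * se]
        if len(inliers) == len(keep) or len(inliers) < 3:
            break  # nothing left to set aside, or too few points to refit
        keep = inliers
        m, b, new_r2 = fit_line([xs[i] for i in keep], [ys[i] for i in keep])
        converged = abs(new_r2 - r2) < tol
        r2 = new_r2
        if converged:
            break
    # Assumed convention: map every original y value onto the x scale.
    normalized = [(y - b) / m for y in ys]
    return m, b, normalized
```

Outliers are only excluded from the fit; the final slope and intercept are applied to all original points, including the ones that were set aside.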

Normalization (Curvilinear)

[Figure: ratio, log2(Cy5/Cy3), plotted against average signal, log2 (Cy3 + Cy5)/2, with a loess function fit line]

G Tseng et al., NAR 2001
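A loess correction of this kind can be sketched as: compute M = log2(Cy5/Cy3) and an average log intensity A for each spot, fit a smooth curve of M against A, and subtract it. The smoother below is a simplified local linear fit with tricube weights and no robustness iterations, so it only approximates full loess; A is taken as the mean of the two log2 intensities (the usual MA-plot convention), and frac=0.3 is an arbitrary span. This is not Tseng et al.'s exact procedure.

```python
import math


def lowess_fit(xs, ys, frac=0.3):
    """Simplified lowess: tricube-weighted local linear fit at every x, no robustness steps."""
    n = len(xs)
    k = max(3, math.ceil(frac * n))  # neighbours that define each local window
    fitted = []
    for x0 in xs:
        # Window half-width: distance to the k-th nearest point (guarded against zero).
        h = sorted(abs(x - x0) for x in xs)[min(k, n) - 1] or 1e-12
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]  # tricube weights
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def loess_normalize(cy3, cy5, frac=0.3):
    """Return loess-corrected log2(Cy5/Cy3) ratios: M minus the fitted M-versus-A curve."""
    a = [(math.log2(g) + math.log2(r)) / 2 for g, r in zip(cy3, cy5)]
    m = [math.log2(r / g) for g, r in zip(cy3, cy5)]
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]
```

Unlike the single straight line of the iterative regression approach, the fitted curve can remove dye bias that changes with intensity.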

After Normalization …

• Differentially expressed (DE) gene screening
  – t-test
  – t-statistics
  – SVM

• Clustering
  – Hierarchical
  – SOM
  – K-means

• Network (pathway) analysis
  – BioCarta, KEGG, GO databases
  – Bayesian network learning
  – Topology
  – …
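For the t-test screening listed above, a per-gene Welch t-statistic is a minimal starting point. The sketch assumes normalized expression values grouped as dicts of gene to list of samples for two conditions, with at least two samples and non-zero variance per group; turning t into a p-value would additionally need the t distribution (for example from SciPy), which is left out to keep the example dependency-free. Function names are illustrative.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t-statistic for one gene (unequal variances, >= 2 samples per group)."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / se


def rank_genes(expr_a, expr_b):
    """Rank genes by |t|; expr_a and expr_b map gene -> list of normalized values."""
    scores = {gene: welch_t(expr_a[gene], expr_b[gene]) for gene in expr_a}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```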

Bioinformatics challenges

1. Data management
2. Utilizing data from multiple experiments
3. Utilizing data from multiple groups
   * with different technologies
   * with only processed data available

Bioinformatics for Integrated Analysis of Gene Expression Profiling

Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression

Daniel R. Rhodes et al., PNAS 2004, 101: 9309-9314.

T-test Q values (estimated false discovery rates) were calculated as

Q = P × n / i

where P is the P value, n is the total number of genes, and i is the sorted rank of the P value.
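The Q-value formula can be computed as below. The optional monotone step (a running minimum from the largest rank down, as in the standard Benjamini-Hochberg adjustment) and the cap at 1 are additions not stated on the slide; the function name is illustrative.

```python
def q_values(p_values, monotone=False):
    """Q = P * n / i, where i is the rank of P among the n sorted p-values.

    Values are capped at 1 (an assumption). monotone=True also enforces that
    Q never decreases with rank, as in the usual Benjamini-Hochberg adjustment.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda j: p_values[j])  # indices by ascending P
    q = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest P down to the smallest
        j = order[rank - 1]
        value = min(p_values[j] * n / rank, 1.0)
        if monotone:
            running_min = min(running_min, value)
            value = running_min
        q[j] = value  # results are returned in the original gene order
    return q
```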

Meta-Profiling (cont.)

The purpose of meta-profiling is to test the hypothesis that a selected set of differential expression signatures shares a significant intersection of genes (a meta-signature), thereby suggesting a biological relatedness.
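A meta-signature and the significance of a pairwise intersection can be sketched with plain sets. The hypergeometric upper-tail test used here is a common choice for overlap significance and is an assumption, not necessarily the statistic used in the cited study; the gene-universe size n_genes must be supplied by the caller, and the function names are illustrative.

```python
from collections import Counter
from math import comb


def meta_signature(signatures, min_count):
    """Genes that appear in at least min_count of the given signatures (sets of gene IDs)."""
    counts = Counter(gene for sig in signatures for gene in set(sig))
    return {gene for gene, c in counts.items() if c >= min_count}


def overlap_p_value(sig_a, sig_b, n_genes):
    """P(overlap >= observed) for two signatures drawn from n_genes genes (hypergeometric tail)."""
    set_a, set_b = set(sig_a), set(sig_b)
    observed = len(set_a & set_b)
    size_a, size_b = len(set_a), len(set_b)
    upper = sum(comb(size_a, x) * comb(n_genes - size_a, size_b - x)
                for x in range(observed, min(size_a, size_b) + 1))
    return upper / comb(n_genes, size_b)
```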

67 genes were identified by meta-analysis

Integrated Cancer Gene Expression Map

7 genes were discovered by the system

THANX!!