Statistical Analysis of cDNA Microarray Data: Challenges and Solutions Toni Reverter CSIRO – Livestock Industries AAHL Seminar - 12 Dec. 2002

Page 1: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Toni Reverter

CSIRO – Livestock Industries

AAHL Seminar - 12 Dec. 2002

Page 2: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Challenges

Time Dependent Data Dependent Human Dependent

Chronology Paradigm Skill Integration

Distribution

Source Size

Logical

1800s – DATA

30-60s – METHODS

50-70s – SOFTWARE

1980s – COMPUTER

cDNA

Quantitative: Computer Sci., Statisticians, Mathematicians, …

Non-Q: Biochemists, Physiologists, Pathologists, …

Historical Excitement Balance Interdisciplinary


EGG BANANA

“banana omelette”

Page 3: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Human Dependent

Challenges

Historical

•Traditionally: Statistics grew alongside Agriculture

“Introduction to Statistical Analysis”

•Nowadays: Statistics alongside (Bio)Technology

•Law of Large Numbers
•Central Limit Theorem
•Pythagoras Theorem

SST = SSM + SSE

[Figure labels: d, a, b]
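The link between Pythagoras and SST = SSM + SSE can be checked numerically. The sketch below assumes a simple linear regression fitted by least squares on made-up data (the values of `x` and `y` are purely illustrative and do not come from the seminar). The model and error vectors are orthogonal, so their squared lengths add up to the total, just like the two shorter sides of a right triangle.

```python
# Simple linear regression by least squares, standard library only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up illustrative data

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
ssm = sum((fi - y_bar) ** 2 for fi in fitted)            # model sum of squares
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # error sum of squares

# The cross term vanishes because the model and error vectors are orthogonal
cross = sum((fi - y_bar) * (yi - fi) for yi, fi in zip(y, fitted))

print(f"SST = {sst:.6f}, SSM + SSE = {ssm + sse:.6f}, cross term = {cross:.2e}")
```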

Hysterical


Page 4: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Human Dependent

Challenges

Excitement (source of)

Eg. Always log spot intensities and ratios

T. Speed, “Hints and Prejudices”
•Biochemist: My software does it, therefore it’s great!
•Statistician: Well, I need further evidence to be convinced

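$$
x^{(\lambda)} =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\ln(x), & \lambda = 0
\end{cases}
$$

$$
l(\lambda) = -\frac{n}{2} \ln\left[ \frac{1}{n} \sum_{j=1}^{n} \left( x_j^{(\lambda)} - \bar{x}^{(\lambda)} \right)^2 \right] + (\lambda - 1) \sum_{j=1}^{n} \ln(x_j)
$$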
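Assuming the formulas on this slide are the Box-Cox power transformation and its profile log-likelihood, the "further evidence" a statistician might ask for before always taking logs is the value of λ that the data themselves prefer. The following is a minimal sketch of that check. The intensities are invented for illustration, and the function names `boxcox` and `loglik` are hypothetical.

```python
import math

def boxcox(x, lam):
    """Box-Cox power transform of a positive value x."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def loglik(data, lam):
    """Profile log-likelihood l(lambda) for the Box-Cox family."""
    n = len(data)
    t = [boxcox(x, lam) for x in data]
    t_bar = sum(t) / n
    var = sum((ti - t_bar) ** 2 for ti in t) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(x) for x in data)

# Hypothetical right-skewed spot intensities (made up for this sketch)
intensities = [120, 150, 200, 260, 400, 650, 1100, 2000, 4200, 9000]

grid = [i / 10 for i in range(-20, 21)]   # lambda from -2.0 to 2.0
best = max(grid, key=lambda lam: loglik(intensities, lam))
print("lambda maximising l(lambda):", best)
```

A maximising λ close to 0 would support the log transform; a value close to 1 would suggest leaving the data untransformed.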

Eg. Keren Byrne’s Data


Page 5: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Human Dependent

Challenges

Balance

•Too many Statisticians:

Evidence: It takes 1 ship, 10 days to cross the ocean
Question: How many days does it take for 10 ships to cross the ocean?

Evidence: It takes 1 builder, 10 days to build a wall
Question: How many days does it take for 10 builders to build a wall?


Page 6: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Human Dependent

Challenges

Balance

•Too many Statisticians:

PHD SCHOLARSHIP
Statistical Science Program

MATHEMATICAL SCIENCES INSTITUTE
THE AUSTRALIAN NATIONAL UNIVERSITY

Stipend $22,771 (2002 rate, indexed annually, tax free)

A PhD Scholarship (APAI) is being offered by the Mathematical Sciences Institute at The ANU. An ARC Linkage Grant held by Professors Peter Hall (ANU) and Don Poskitt (Monash University), in conjunction with BAE Systems, Melbourne, will fund the scholarship.

The research problem is in the area of stochastic control applied to ship motion, and involves the development and implementation of both parametric and nonparametric methods. The successful applicant will have a strong interest in statistical methodology, computational techniques, theoretical analysis, and the development of statistical research problems.


Page 7: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Human Dependent

Challenges

Balance

•Too many Biochemists:

All patients:

              Treated? No   Treated? Yes
Died? No          100           150
Died? Yes         120           120

Survival Rates:
Treated = 150/270 = 55.56%
Non-Tr = 100/220 = 45.45%
22% Increase!

Women:

              Treated? No   Treated? Yes
Died? No           60            30
Died? Yes         100            60

Survival Rates:
Treated = 30/90 = 33.33%
Non-Tr = 60/160 = 37.50%
12.5% Decrease!

Men:

              Treated? No   Treated? Yes
Died? No           40           120
Died? Yes          20            60

Survival Rates:
Treated = 120/180 = 66.67%
Non-Tr = 40/60 = 66.67%
No Difference!
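The reversal on this slide is an instance of Simpson's paradox. It can be reproduced from the counts alone, as in the minimal sketch below (the dictionary layout and function names are illustrative choices). The pooled comparison is driven by imbalance: 180 of the 270 treated patients are men, who survive more often regardless of treatment, whereas 160 of the 220 untreated patients are women. On the percentages themselves: with the untreated rate as the base, the women's relative drop is about 11%; the slide's 12.5% corresponds to using the treated rate as the base.

```python
# Counts from the slide: (survived, died) per group and treatment arm
counts = {
    "Women": {"Treated": (30, 60), "Untreated": (60, 100)},
    "Men": {"Treated": (120, 60), "Untreated": (40, 20)},
}

def survival_rate(survived, died):
    return survived / (survived + died)

def pooled(arm):
    """Add up survivors and deaths for one arm across all groups."""
    survived = sum(group[arm][0] for group in counts.values())
    died = sum(group[arm][1] for group in counts.values())
    return survived, died

# Within-group comparison: treatment never looks better
for name, group in counts.items():
    t = survival_rate(*group["Treated"])
    u = survival_rate(*group["Untreated"])
    print(f"{name:6s} treated {t:.2%}  untreated {u:.2%}")

# Pooled comparison: treatment looks better
t_all = survival_rate(*pooled("Treated"))
u_all = survival_rate(*pooled("Untreated"))
print(f"Pooled treated {t_all:.2%}  untreated {u_all:.2%}")
```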

Page 8: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions
Page 9: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Human Dependent

Challenges

Balance

•Too many Biochemists:

[Figure: scatter plot of y against x showing two clusters of points; annotated correlations: r = 0.87, r = 0.00, r = 0.00]
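The three correlations on this slide are consistent with a pooled correlation that is high even though neither group shows any association on its own. That reading is an assumption, since the plot itself did not survive. The sketch below builds two made-up groups that behave this way; the point coordinates are invented and are not meant to reproduce r = 0.87.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_of(points):
    xs, ys = zip(*points)
    return pearson(xs, ys)

# Each group is a square of points, so x and y are uncorrelated within it
group1 = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
group2 = [(x + 10, y + 10) for x, y in group1]   # same shape, shifted

r1, r2, r_all = r_of(group1), r_of(group2), r_of(group1 + group2)
print(f"group 1: r = {r1:.2f}, group 2: r = {r2:.2f}, pooled: r = {r_all:.2f}")
```

The pooled correlation here comes entirely from the distance between the group centres, which is why ignoring group structure can be misleading.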

Page 10: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Human Dependent

Challenges

Interdisciplinary Skills

Minimal knowledge of the application discipline is needed

…failing that, the Statisticians will win… but with the wrong weapons.

1. Amount of Expression = Amount of Response
2. Same cut-off point to judge all genes
3. Over-emphasis in normalization (Thus, reject “Boutique Arrays”)
4. Over-emphasis in variance stabilization

Page 11: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Human Dependent

Challenges

Interdisciplinary Skills

Minimal knowledge of the application discipline is needed: “Animal Breeding & Genetics”

Ex.1: What’s a Steer?

Ex.2: Ralf Moser’s Data

[Figure: scatter plot; axes: % Lung Disease and Wt Gain, Kg]

Options:
1. % Gain vs. % Disease
2. Medians instead of Means
3. Regression coefficients*

Page 12: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Solutions

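[Figure: design diagram with nodes O, A, B and AB; axis labels: Disease; Wt Gain, Kg]

O: Control (Untreated)
A: Treatment A
B: Treatment B
AB: Both Treatments

Model:

$$
O = \mu, \qquad A = \mu + \alpha, \qquad B = \mu + \beta, \qquad AB = \mu + \alpha + \beta + \alpha\beta
$$

The ratio:

$$
M = \log\left(\frac{R}{G}\right) = \log(R_A) - \log(G_{AB})
$$

estimates

$$
A - AB = -(\beta + \alpha\beta)
$$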

Page 13: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Solutions

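$$
M = X\beta + E, \qquad \hat{\beta} = (X^T X)^{-1} X^T M
$$

M holds one log ratio per slide, E is the error vector, and the columns of X correspond to α, β and αβ:

Slide (R, G)     α    β   αβ
A, O             1    0    0
O, A            -1    0    0
B, O             0    1    0
O, B             0   -1    0
AB, O            1    1    1
O, AB           -1   -1   -1
B, A            -1    1    0
A, B             1   -1    0
AB, A            0    1    1
A, AB            0   -1   -1
AB, B            1    0    1
B, AB           -1    0   -1

[Figure: design diagram with nodes O, A, B and AB]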
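The least-squares step can be sketched in a few lines. The example below assumes the factorial parameterisation O = μ, A = μ + α, B = μ + β, AB = μ + α + β + αβ, and that each slide's log ratio estimates (red sample) minus (green sample). The effect values and the helper names (`design_row`, `solve`, `least_squares`) are hypothetical. The log ratios are generated without noise, so the estimates should return the effects exactly.

```python
from fractions import Fraction

# Expected log-expression of each sample in terms of (alpha, beta, alpha*beta);
# mu cancels in every log ratio, so O acts as the baseline.
SAMPLE = {"O": (0, 0, 0), "A": (1, 0, 0), "B": (0, 1, 0), "AB": (1, 1, 1)}

def design_row(red, green):
    """Row of X for a slide whose log ratio estimates red - green."""
    return [r - g for r, g in zip(SAMPLE[red], SAMPLE[green])]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def least_squares(slides, m):
    """Solve the normal equations (X'X) b = X'M for the three effects."""
    x = [design_row(red, green) for red, green in slides]
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xtm = [sum(row[i] * mi for row, mi in zip(x, m)) for i in range(k)]
    return solve(xtx, xtm)

# All six pairs, each hybridised in both dye orientations: 12 slides
pairs = [("A", "O"), ("B", "O"), ("AB", "O"), ("A", "B"), ("AB", "A"), ("AB", "B")]
slides = pairs + [(g, r) for r, g in pairs]

# Noise-free log ratios from made-up effects (alpha, beta, alpha*beta)
true_effects = (2, -1, 3)
m = [sum(c * e for c, e in zip(design_row(r, g), true_effects)) for r, g in slides]

print(least_squares(slides, m))
```

Exact fractions are used here only to keep the check free of rounding; with real data a numerical linear algebra routine would be the usual choice.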

Page 14: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Solutions

[Figure: three design diagrams on nodes O, A, B and AB: Reference, Loop and All-Pairs]

Variance of Estimated Effects (Relative to the All-Pairs)

                     Reference   Loop   All-Pairs
Main effect of A         1        4/3       1
Main effect of B         1         1        1
Interaction AB           3        8/3       2
Contrast A-B             2         1        1
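The variances in this comparison follow from the diagonal of (X'X)^-1 for each design. The sketch below rests on two assumptions that the slide does not state: that the loop runs O, B, A, AB and back to O, and that each variance is multiplied by (number of slides)/3 so that designs of different size are comparable. Under those assumptions, and with unit error variance, the computation gives 1, 1, 3, 2 for Reference, 4/3, 1, 8/3, 1 for Loop and 1, 1, 2, 1 for All-Pairs. The helper names are hypothetical.

```python
from fractions import Fraction

SAMPLE = {"O": (0, 0, 0), "A": (1, 0, 0), "B": (0, 1, 0), "AB": (1, 1, 1)}

def design_row(red, green):
    """Row of X for a slide whose log ratio estimates red - green."""
    return [r - g for r, g in zip(SAMPLE[red], SAMPLE[green])]

def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination (exact)."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def scaled_variances(slides):
    x = [design_row(red, green) for red, green in slides]
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(3)] for i in range(3)]
    v = inverse(xtx)
    scale = Fraction(len(slides), 3)   # assumed per-slide scaling
    return {
        "Main effect of A": v[0][0] * scale,
        "Main effect of B": v[1][1] * scale,
        "Interaction AB": v[2][2] * scale,
        "Contrast A-B": (v[0][0] + v[1][1] - 2 * v[0][1]) * scale,
    }

designs = {
    "Reference": [("A", "O"), ("B", "O"), ("AB", "O")],
    "Loop": [("B", "O"), ("A", "B"), ("AB", "A"), ("O", "AB")],   # assumed order
    "All-Pairs": [("A", "O"), ("B", "O"), ("AB", "O"),
                  ("A", "B"), ("AB", "A"), ("AB", "B")],
}

for name, slides in designs.items():
    print(name, {k: str(val) for k, val in scaled_variances(slides).items()})
```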

Page 15: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Solutions

Probability of both Female?

Case 1. No Information …………………………1/4

Case 2. The one on the left is female …………1/2

Case 3. One of them is female ………….………1/3
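The three answers can be checked by enumerating outcomes. This sketch assumes two animals whose sexes are independent and equally likely; the function name is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

# All equally likely (left, right) sex combinations for two animals
outcomes = list(product("FM", repeat=2))

def prob_both_female(condition):
    """P(both female | condition), by counting equally likely outcomes."""
    kept = [o for o in outcomes if condition(o)]
    hits = [o for o in kept if o == ("F", "F")]
    return Fraction(len(hits), len(kept))

case1 = prob_both_female(lambda o: True)           # no information
case2 = prob_both_female(lambda o: o[0] == "F")    # the one on the left is female
case3 = prob_both_female(lambda o: "F" in o)       # at least one is female

print(case1, case2, case3)   # prints: 1/4 1/2 1/3
```

Each piece of information shrinks the set of possible outcomes in a different way, which is why prior knowledge changes the answer.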


Page 16: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Solutions

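$$
M = X\beta + E \quad\Longrightarrow\quad (X^T X)\hat{\beta} = X^T M
$$

3 Equations

$$
M = X\beta + Ss + Gg + Ww + E
$$

$$
\begin{bmatrix}
X'X & X'S & X'G & X'W \\
S'X & S'S & S'G & S'W \\
G'X & G'S & G'G & G'W \\
W'X & W'S & W'G & W'W
\end{bmatrix}
\begin{bmatrix}
\hat{\beta} \\ \hat{s} \\ \hat{g} \\ \hat{w}
\end{bmatrix}
=
\begin{bmatrix}
X'M \\ S'M \\ G'M \\ W'M
\end{bmatrix}
$$

> 35,000 Equations!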

Page 17: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Solutions

Clever Programming Tailored to your needs

N=1

for filename in R16T0S1.gpr R16T0S2.gpr R16T24S1.gpr R16T24S2.gpr \
                S32T0S1.gpr S32T0S2.gpr S32T24S1.gpr S32T24S2.gpr
do

  # Get valid readings, compute log2 intensities for both channels
  awk 'NR>30 && $NF>=0 && $4!="no_spot" &&
       substr($4,1,5)!="score" && substr($4,1,5)!="custo" &&
       substr($4,1,6)!="spotre" && $9>$12 && $18>$21 {
         print $4, $9-$12, $18-$21,
               log($9-$12)/log(2.0), log($18-$21)/log(2.0)
       }' "$filename" | sort > junk1

  # Append the log ratio (column 6) and the mean log intensity (column 7)
  awk '$2!=$3 {print $0, $4-$5, 0.5*($4+$5)}' junk1 > junk2

  # Get the median of log ratios (sort on column 6, take the middle record)
  REC=`wc -l junk2 | awk '{print int($1/2)}'`
  MED=`sort -n -k 6 junk2 | awk -v rec=$REC 'NR==rec {print $6}'`
  echo "Median of file" $filename " = " $MED

  # Global normalization: subtract the median from each log ratio
  awk -v median=$MED -v slide=$N \
      '{print "Slide_"slide, int(slide/2+.5), $1, $6-median}' junk2 |
      sort -k 3 > dat.$N

  N=`expr $N + 1`

done

cat dat.1 dat.2 dat.3 dat.4 dat.5 dat.6 dat.7 dat.8 > total.dat


Page 18: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Solutions

Clever Programming Tailored to your needs

[Figure: scatter plot of T24 - T0 differential expression; axes: Resistant and Disease, each from -4 to 4]

Interaction Solutions

Your Needs: “Important values are…”
1. Away from (0,0)
2. In quadrants 1 and 4.

Generate a new variable:

+1.0*[(R24-R0)+(S0-S24)] if R0<R24 & S0>S24

+0.5*[(R24-R0)+(S24-S0)] if R0<R24 & S0<S24

-0.5*[(R0-R24)+(S0-S24)] if R0>R24 & S0>S24

-1.0*[(R0-R24)+(S24-S0)] if R0>R24 & S0<S24

…then apply model-based clustering.
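The four rules above share one structure: the magnitude is always the sum of the absolute changes in the two lines, and only the weight (±1.0 or ±0.5) depends on the directions of change. A minimal sketch of that reading follows. The function name and argument names are hypothetical, and returning 0 when either line does not change is an assumption, because the slide does not cover ties.

```python
def interaction_score(r0, r24, s0, s24):
    """Weighted size of the T24 - T0 change in the R and S lines."""
    d_r = r24 - r0   # change in the R line between T0 and T24
    d_s = s24 - s0   # change in the S line between T0 and T24
    size = abs(d_r) + abs(d_s)
    if d_r > 0 and d_s < 0:    # R0 < R24 and S0 > S24
        return 1.0 * size
    if d_r > 0 and d_s > 0:    # R0 < R24 and S0 < S24
        return 0.5 * size
    if d_r < 0 and d_s < 0:    # R0 > R24 and S0 > S24
        return -0.5 * size
    if d_r < 0 and d_s > 0:    # R0 > R24 and S0 < S24
        return -1.0 * size
    return 0.0                 # ties are not defined on the slide (assumption)

# Example: R goes up by 2 while S goes down by 1
print(interaction_score(0.0, 2.0, 1.0, 0.0))   # prints: 3.0
```

Genes whose two lines move in opposite directions get the full weight, so they end up furthest from zero in the new variable that is passed on to clustering.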


Page 19: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Solutions

Clever Programming Tailored to your needs

[Figure: Differential Expression T24-T0; axes: Resistant and Susceptible, each from -4 to 6]

Page 20: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Solutions

Clever Programming Tailored to your needs

Get to know/use all the available options

1. t-Statistics: Standard; Penalised

2. Clustering: Location-Based (k-Means, …); Model-Based (Mixtures of Distributions)

3. ANOVA (Linear Models)

[Figure labels: High, Medium, Low; Keren’s, Ralf’s]

Page 21: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions

Conclusions

Statistical Analysis of cDNA Microarray Data:

GENERAL:
1. Still in its infancy (…possibly even embryonic stage)
2. Many decisions have a heuristic rather than a theoretical foundation
3. No hope for a “One size fits all” software
4. Safer to aim towards “Tailor to one’s needs”
5. Integration of interdisciplinary skills is a must

LIVESTOCK SPECIES:
1. Tailing humans (…at the moment)
2. Strong background knowledge of genetics accumulated
3. Journals will soon be inundated
4. CLI has the opportunity to participate

Page 22: Statistical Analysis of cDNA Microarray Data: Challenges and Solutions