Batch effect correction: How do we compare against ComBat? - Yalchin Oytam

Batch effect correction: How do we compare against ComBat?

Yalchin Oytam* & Fariborz Sobhanmanesh

Synopsis

Batch Effects:
•Uncorrected (or under-corrected): detrimental reduction in power of test; distortion to multiplicity correction
•Over-corrected: false positives; distortion to multiplicity correction

Novel method, which:
•Quantifies the probability of under/over correction
•Enables the experimenter to choose confidence/risk (p-value) as a constraint for batch removal

AIM: Benchmark the novel method against ComBat

Summary:
•Discuss batch effects
•Introduce performance criteria
•Compare the two methods

Batch Effects?

•Definition

•Structured technical noise / distortion common to all replicates in a processing batch.

•They vary markedly from batch to batch.

•Pervasive and persistent, even under best practice.

•Not remediable by normalisation techniques.

•Typically account for 20-45% of the power in the measurement data!
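To make the "20-45%" figure concrete, one way to estimate how much of a dataset is attributable to batch is the between-batch sum of squares divided by the total sum of squares. The sketch below reads "power" as variance, which is an assumption, and the function name `batch_variance_fraction` and the samples-by-genes layout are illustrative choices, not the authors' procedure.

```python
import numpy as np

def batch_variance_fraction(data, batch):
    """Share of the total sum of squares explained by per-batch means.

    data  : (n_samples, n_genes) array of measurements (assumed layout)
    batch : (n_samples,) array of batch labels
    """
    data = np.asarray(data, dtype=float)
    batch = np.asarray(batch).ravel()
    centred = data - data.mean(axis=0)        # remove each gene's grand mean
    total_ss = np.sum(centred ** 2)
    between_ss = 0.0
    for b in np.unique(batch):
        rows = centred[batch == b]
        # a batch of n_b samples contributes n_b * (batch mean)^2, summed over genes
        between_ss += rows.shape[0] * np.sum(rows.mean(axis=0) ** 2)
    return between_ss / total_ss
```

In a balanced design (every treatment present in every batch) this fraction is not confounded with treatment differences; in an unbalanced design it would also absorb some biological signal.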

Impact of batch effects

           Rep1        Rep2        Rep3        Rep4
Treat1     t11 + B1    t12 + B2    t13 + B3    t14 + B4
Treat2     t21 + B1    t22 + B2    t23 + B3    t24 + B4
Treat3     t31 + B1    t32 + B2    t33 + B3    t34 + B4
Treat4     t41 + B1    t42 + B2    t43 + B3    t44 + B4
Treat5     t51 + B1    t52 + B2    t53 + B3    t54 + B4
Treat6     t61 + B1    t62 + B2    t63 + B3    t64 + B4
Control    c1 + B1     c2 + B2     c3 + B3     c4 + B4

•Differences between B1, B2, B3, and B4 inflate within-treatment variances, diminishing the power of any between-treatment comparison test.

•Different genes are affected differently, distorting the ranking of p-values, and hence distorting multiplicity correction (FDR).
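A small simulation of the layout above, for a single gene, illustrates the first point. All numbers here (effect sizes, noise level, the fixed batch offsets) are hypothetical, and the per-batch mean centring at the end is a naive stand-in used only for illustration; it is neither ComBat nor the novel method.

```python
import numpy as np

rng = np.random.default_rng(0)
n_treat, n_batch = 7, 4                       # 6 treatments + control, 4 replicates

treat_effect = rng.normal(0.0, 1.0, size=(n_treat, 1))     # t_i (hypothetical)
batch_effect = np.array([[-3.0, -1.0, 1.0, 3.0]])          # B_1..B_4 (hypothetical)
noise = rng.normal(0.0, 0.3, size=(n_treat, n_batch))      # replicate noise

clean = treat_effect + noise                  # what we would measure without batches
observed = clean + batch_effect               # each replicate column shares one B_j

# Average within-treatment variance across the 7 rows
within_var_clean = clean.var(axis=1, ddof=1).mean()
within_var_observed = observed.var(axis=1, ddof=1).mean()

# Naive correction: subtract each batch (column) mean. This is only valid
# because every treatment appears exactly once in every batch.
centred = observed - observed.mean(axis=0, keepdims=True)
within_var_centred = centred.var(axis=1, ddof=1).mean()

print(within_var_clean, within_var_observed, within_var_centred)
```

The batch offsets dominate the within-treatment variance of `observed`, while differences between treatment means are unchanged by the centring. The same centring would remove real treatment signal if treatments were not spread across batches.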

“What if treatments are not distributed across batches?”

Method: Principal Component Analysis

CSIRO Overcoming the challenges of multiplicity and batch effects

A snapshot of batch correction software

Benchmarking – ComBat vs Our Method

• Two dimensions: Noise Rejection and Signal Preservation

•Noise Rejection: guided PCA (gPCA), a third-party quantification of batch noise in data (Reese et al. 2013).

•Signal Preservation: ratio of data variance after batch correction to raw data variance.

•Ideal: Reject all batch noise, without removing any biological variance.
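The signal-preservation criterion can be sketched as a single ratio. The slide does not say how variance is aggregated across genes; the version below assumes a samples-by-genes matrix and sums per-gene sample variances, and the function name `signal_preservation` is illustrative.

```python
import numpy as np

def signal_preservation(raw, corrected):
    """Total variance after batch correction divided by total raw variance.

    Both inputs are (n_samples, n_genes) arrays (assumed layout).
    Per-gene sample variances are summed before taking the ratio.
    """
    raw = np.asarray(raw, dtype=float)
    corrected = np.asarray(corrected, dtype=float)
    raw_var = raw.var(axis=0, ddof=1).sum()
    corrected_var = corrected.var(axis=0, ddof=1).sum()
    return corrected_var / raw_var
```

A value near 1 means little variance was removed. Read on its own the ratio cannot distinguish removed noise from removed biology, which is why it is paired with the noise-rejection measure.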

Benchmarking – Cell Data

gPCA p-value for batch effect presence in raw data = 0.008
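A gPCA-style p-value like the one quoted above comes from a permutation test on batch labels. The sketch below is a simplified reading of the statistic in Reese et al. 2013 (variance along the first batch-guided component divided by variance along the first unguided component); it is not the gPCA R package, and the function names and defaults are assumptions.

```python
import numpy as np

def gpca_delta(data, batch):
    """Ratio of variance on the first guided PC to variance on the first unguided PC."""
    x = np.asarray(data, dtype=float)
    x = x - x.mean(axis=0)                            # centre each feature
    labels, idx = np.unique(np.asarray(batch).ravel(), return_inverse=True)
    y = np.zeros((x.shape[0], labels.size))
    y[np.arange(x.shape[0]), idx] = 1.0               # batch indicator matrix
    # Unguided PCA: right singular vectors of X
    _, _, vt_unguided = np.linalg.svd(x, full_matrices=False)
    # Guided PCA: right singular vectors of Y^T X (per-batch sums)
    _, _, vt_guided = np.linalg.svd(y.T @ x, full_matrices=False)
    var_guided = np.var(x @ vt_guided[0], ddof=1)
    var_unguided = np.var(x @ vt_unguided[0], ddof=1)
    return var_guided / var_unguided

def gpca_pvalue(data, batch, n_perm=1000, seed=0):
    """Permutation p-value: share of shuffled-label deltas >= the observed delta."""
    rng = np.random.default_rng(seed)
    batch = np.asarray(batch).ravel()
    observed = gpca_delta(data, batch)
    permuted = np.array(
        [gpca_delta(data, rng.permutation(batch)) for _ in range(n_perm)]
    )
    return observed, float(np.mean(permuted >= observed))
```

A delta close to 1 together with a small p-value suggests that batch is a dominant direction of variation in the data.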

Benchmarking – Animal Data


Benchmarking – ComBat’s “Native” Dataset





Thank you

CAFHS/Genomics
Yalchin Oytam
Research Scientist
Phone: +61 2 9490 5077
Email: [email protected]

Contact Us
Phone: 1300 363 400 or +61 3 9545 2176
Email: [email protected]
Web: www.csiro.au

Acknowledgements
Konsta Duesing, Mike Buckley, Bill Wilson, Maxine McCall

