School of SOCIAL AND COMMUNITY MEDICINE University of BRISTOL ARIES Methylation Pre-processing and...


School of Social and Community Medicine
University of Bristol

ARIES Methylation Pre-processing and Clean up

Geoff Woodward

Overview

• Initial QC
• Normalisation
• Batch Correction
• Data
• MWAS (Methylome-Wide Association Study)
• Results

Initial QC

Probe p-value: confidence in detection
• background
• -ve controls

Overall QC indicator
• High background
• Low signal
• Poor stringency

Initial QC: Control Probes

Mixture of sample-dependent and sample-independent controls

Sample independent
• Staining (Biotin/DNP)
• Hybridisation (synthetic target)
• Extension (hairpin)

Sample dependent
• Bisulfite conversion (HindIII site)
• G/T mismatch (non-specific)
• Specificity & Non-polymorphic
• Negative

Initial QC: LIMS

LIMS Control Dashboard

• Real time (Jscript/JSON)
• Zoom & scroll
• All Illumina control probes, +ve & -ve
• Area: Max / Median / Min

Initial QC: MDS

Start pre-processing
What’s affecting the data?
• Failures
• Controls

Initial QC: MDS

• Remove Controls/Failures
• Remove Sex Chromosomes

Sample Confirmation

Genotyping
• 65 SNP probes
• K-means clustering
• Call genotype
• Cross-reference with SNP data
• Calculate % match

• Fully automated in pipeline
• Stored in LIMS
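The genotype check can be illustrated with a minimal Python sketch (the pipeline itself is not shown in the slides, so everything here is an assumption): β values for one SNP probe are clustered into three groups with a one-dimensional k-means, and the calls are compared with reference genotypes. The function names, the starting centres and the 0/1/2 coding are all hypothetical.

```python
def kmeans_1d(values, centers=(0.1, 0.5, 0.9), iterations=20):
    """Cluster the beta values of one SNP probe into three groups (1-D k-means).

    The starting centres are an assumption: homozygous clusters near 0 and 1,
    heterozygous cluster near 0.5.
    """
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each sample to its nearest centre.
        labels = [min(range(len(centers)), key=lambda k: abs(v - centers[k]))
                  for v in values]
        # Move each centre to the mean of its members (unchanged if empty).
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels


def percent_match(called, reference):
    """Percentage of samples whose called genotype equals the reference call.

    Samples with no reference genotype (None) are skipped.
    """
    pairs = [(c, r) for c, r in zip(called, reference) if r is not None]
    if not pairs:
        return float("nan")
    return 100.0 * sum(c == r for c, r in pairs) / len(pairs)


betas = [0.05, 0.08, 0.48, 0.52, 0.91, 0.95]   # made-up values for one probe
calls = kmeans_1d(betas)                        # hypothetical coding: 0, 1, 2
match = percent_match(calls, [0, 0, 1, 1, 2, 1])
```

A low percentage for one sample across all SNP probes would suggest a sample mix-up; the threshold for flagging is not given in the slides.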

Normalisation: Why?

Cancer vs. Control: not required
More sensitive differences...

Quantile? Rank & scale according to a reference distribution (average)

Not appropriate: Type I & II assays differ
• Medians at opposite ends of the β scale
• SD (across replicates) smaller in Type I probes
• Interrogate different subsets of the genome
– Type II: greater proportion in open-sea
– Type I: greater proportion in gene promoters
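For reference, plain quantile normalisation (rank each array, then replace every value with the average value at that rank) can be sketched in Python as follows. This is an illustrative sketch only, not the pipeline's code; the function name is hypothetical and ties are not given any special treatment.

```python
def quantile_normalise(samples):
    """Quantile-normalise equal-length lists of values (one list per array).

    The reference distribution is the mean of the k-th smallest value
    across all arrays; each value is replaced by the reference value at
    its own rank within its array.
    """
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    reference = [sum(s[k] for s in sorted_samples) / len(samples)
                 for k in range(n)]
    normalised = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # indices, smallest first
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalised.append(out)
    return normalised


result = quantile_normalise([[5, 2, 3], [4, 1, 6]])
```

After this step every array has exactly the same distribution, which is why pooling Type I and Type II probes in a single pass forces their different distributions together.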

Normalisation: Method 1

Subset-quantile Within Array Normalisation (SWAN; minfi)
To address differences in distribution:
• No. of CpGs in probe body indicates density/location
• Distributions are more similar within these groups

Approach
• Reference quantiles:
– N random Type I & II probes selected for each group
– Split meth/unmeth channels
• Linear interpolation fits probes to the reference

Doesn’t treat Type I & II separately
• BUT does decrease the difference

Normalisation: Method 2

Touleimat & Tost
To address differences:
• CpG region
– Shore / Shelf / Island / Open-sea
• Treat Type I & II separately

Approach:
• Reference quantiles
– Type I used as “anchors” for each region
– More reliable / lower SD
• Estimate target quantiles
• Fit Type II to target

Normalisation: Method 3

Dasen (wateRmelon), under review
Separate quantile normalisation (QN) of
• methylated Type I
• unmethylated Type I
• methylated Type II
• unmethylated Type II
intensities.

Both directions

Normalisation: Comparison
wateRmelon metrics:

Imprinted DMRs (iDMRs)
• 237 probes within iDMRs
• iDMRs expected to be ~50% methylated
• SE = SD / √N
– SD of all 237 probes
– N = number of samples

Method   iDMR SE
Raw      0.00431
Dasen    0.00241
Tost     0.00214
Swan     0.00428
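The SE formula above can be written out as a short Python sketch. It assumes the SD is pooled over every β value of the iDMR probes (all probes, all samples) and that N is the number of samples; the function name and the toy data are hypothetical.

```python
import math
import statistics


def idmr_standard_error(betas_by_probe):
    """SE-like metric: SD of all iDMR beta values divided by sqrt(N samples).

    betas_by_probe holds one list per probe, each with one beta per sample.
    Pooling all values into a single SD is an assumption of this sketch.
    """
    n_samples = len(betas_by_probe[0])
    pooled = [b for probe in betas_by_probe for b in probe]
    return statistics.stdev(pooled) / math.sqrt(n_samples)


# Two probes measured in two samples (made-up numbers).
se = idmr_standard_error([[0.4, 0.6], [0.5, 0.5]])
```

A smaller value means the iDMR probes sit more tightly around a common level, which is the sense in which a lower figure in the table indicates better normalisation.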

Normalisation: Comparison

SNP probes
• 63 highly polymorphic SNP probes
• K-means clustering into 3 genotypes
• SE-like measure for each group

Method   AA          AB          BB
Raw      9.025e-05   1.910e-04   5.145e-05
Dasen    1.669e-04   2.047e-04   2.321e-05
Tost     8.253e-05   5.242e-04   1.541e-04
Swan     NA          NA          NA

Normalisation: Comparison
wateRmelon metrics:

X-Chromosome Inactivation
• 11,232 probes
• T-test all probes for sex differences
• ROC analysis
– using p-value for sex difference
• 1 - AUC
– 0 is the perfect predictor and the best sex separation

Method   X-Inact. (1 - AUC)
Raw      0.0947
Dasen    0.0889
Tost     0.0892
Swan     0.4952
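A minimal Python sketch of the 1 - AUC calculation, under the assumption that the ROC analysis asks how well the sex-difference p-value predicts whether a probe lies on the X chromosome (smaller p-value counted as "more likely X"). The AUC is computed through its pairwise (Mann-Whitney) definition; the function name and example values are hypothetical.

```python
def one_minus_auc(p_values, on_x):
    """1 - AUC for predicting X-chromosome membership from p-values.

    AUC is the probability that a randomly chosen X probe has a smaller
    p-value than a randomly chosen non-X probe (ties count one half).
    The double loop is quadratic, which is fine for a sketch only.
    """
    x_probes = [p for p, flag in zip(p_values, on_x) if flag]
    others = [p for p, flag in zip(p_values, on_x) if not flag]
    wins = 0.0
    for px in x_probes:
        for po in others:
            if px < po:
                wins += 1.0
            elif px == po:
                wins += 0.5
    return 1.0 - wins / (len(x_probes) * len(others))


perfect = one_minus_auc([0.001, 0.01, 0.5, 0.8], [True, True, False, False])
partial = one_minus_auc([0.001, 0.6, 0.5, 0.8], [True, True, False, False])
```

A value of 0.5 corresponds to a predictor no better than chance, which gives a sense of scale for the figures in the table.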

Comparison: Density Plots

Metrics are great, but how do they really affect the data?

[Density plots; panels: All, Type I, Type II]

Comparison: Density Plots

Normalised distributions

[Density plots; panels: All, Type I, Type II]

Comparison: Scatter Plot

Pepsi Plot: you’ll see why!
Raw (x) vs. Normalised (y)
• Type I / Type II

[Scatter plots; panels: SWAN, Tost, dasen]

Comparison: Scatter Plot

Batch Correction: Exp. Design

Bisulphite Conversion
• Excess of samples > 48
• Redundant controls
• QC and PCR

MSA4 Plate
• Well dictates chip position (Robot)
• Randomised
– Min. 4 of each time point
– Max. 1 control
– Mix of gender

Infinium 450k Chips
• 12 arrays per chip
• Throughput doubled

Batch Correction: Metadata

LIMS tracking
• Every process
• All consumables
– ~20
– Formamide to hyb. buffers
– > 1000 used so far!
• All equipment
– Fridge / centrifuge / PCR block

Batch Correction

What are we seeing?
• Bisulphite batch

Correction
• Many algorithms available
– SVD/SVA/DWD
• Gene expression

ComBat
Chen C, Grennan K, Badner J, Zhang D, Gershon E, et al. (2011) Removing Batch Effects in Analysis of Expression Microarray Data: An Evaluation of Six Batch Adjustment Methods. PLoS ONE 6(2): e17238. doi:10.1371/journal.pone.0017238

Empirical Bayesian framework
• Create a model matrix
• Supply batch variable
• Standardise gene-wise
– Least squares approach
• Fit location/scale (L/S) model to find priors
• Adjust to empirical parametric priors
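To give a feel for the location/scale idea, here is a deliberately simplified Python sketch for a single probe. It is not ComBat: there is no empirical Bayes shrinkage, no covariate model matrix and no borrowing of information across probes. It only re-centres and re-scales each batch to the overall mean and the pooled within-batch SD. The function name is hypothetical and each batch needs at least two samples.

```python
import math
import statistics


def adjust_probe(values, batches):
    """Per-batch location and scale adjustment for one probe (not ComBat).

    Each batch is shifted to the overall mean and scaled to the pooled
    within-batch SD. Every batch must contain at least two values.
    """
    grand_mean = statistics.mean(values)
    groups = {}
    for v, b in zip(values, batches):
        groups.setdefault(b, []).append(v)
    # Pooled within-batch variance, weighted by degrees of freedom.
    dof = sum(len(g) - 1 for g in groups.values())
    pooled_var = sum(statistics.variance(g) * (len(g) - 1)
                     for g in groups.values()) / dof
    pooled_sd = math.sqrt(pooled_var)
    batch_stats = {b: (statistics.mean(g), statistics.stdev(g))
                   for b, g in groups.items()}
    adjusted = []
    for v, b in zip(values, batches):
        mean_b, sd_b = batch_stats[b]
        scale = pooled_sd / sd_b if sd_b > 0 else 1.0
        adjusted.append(grand_mean + (v - mean_b) * scale)
    return adjusted


# Two batches with a large offset between them (made-up M values).
corrected = adjust_probe([1.0, 3.0, 11.0, 13.0], ["a", "a", "b", "b"])
```

In ComBat proper, the per-batch mean and variance estimates are shrunk towards priors estimated across all probes, which stabilises the adjustment when batches are small.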

Batch Correction

Example data
• Batch correct Tost-normalised data
• Use M values
• Convert back to β
• Values can escape the 0-1 limit
– Scale
– 0.02% of probes
– Distribution unaffected
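The conversion between β and M values can be sketched in Python, assuming the usual logit definition M = log2(β / (1 - β)); the slides do not state the formula used in the pipeline. The clamping constant `eps` is a hypothetical guard against taking the logarithm of zero, not something from the source.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a beta value to an M value, M = log2(beta / (1 - beta)).

    beta is clamped to [eps, 1 - eps] first so that exact 0 or 1 does not
    produce an infinite M value (eps is an illustrative choice).
    """
    beta = min(max(beta, eps), 1.0 - eps)
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m):
    """Inverse conversion, beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


m_half = beta_to_m(0.5)              # 0.0
m_high = beta_to_m(0.8)              # log2(4) = 2.0
round_trip = m_to_beta(beta_to_m(0.2))
```

M values are unbounded and closer to homoscedastic than β values, which is the usual reason for running batch correction on the M scale.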

Batch Correction: BEFORE

Batch Correction: AFTER

Datasets

ARIES pre-release:
• Filtered probes
• SNP probes

Age group    n
Cord         584
F7           598
TF3 (15)     64
F17          280
Antenatal    394
FOM          329

MWAS

Choice of servers:
• Epi-garrod
• BlueCrystal

Epi-garrod

Request an account via IT-services for: epi-garrod.bris.ac.uk
• Relatively quiet server in the dept.
• No queuing system
• Check htop before running jobs
• Cord data requires ~15% RAM

Epi-garrod

Data: SAN
• Accessible from multiple servers
• /mnt/sscm3/ARIES_DATA/…

Permissions for this folder
• You must be a member of the aries group

BlueCrystal

Request an account via: https://www.acrc.bris.ac.uk/login-area/apply.cgi
• Queuing handled

Data:
• /gpfs/cluster/smed/alspac-shared/aries/…

Again, permissions required:
• Member of the aries group

Files

• ALN_dasen_<<time_code>>_betas.Rdata
• ALN_tost_<<time_code>>_betas.Rdata
• <<time_code>>_manifest.Rdata
• fdata.Rdata
• MWAS.r

ALN_dasen_<<time_code>>_betas.Rdata

<<time_code>>_manifest.Rdata

Fdata_new.RData

CpGassoc

CRAN: http://cran.r-project.org/web/packages/CpGassoc/index.html
• Tests for association between an independent variable and methylation
• Option to include additional covariates
• Assesses significance with:
– Holm (step-down Bonferroni)
– FDR methods
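The two kinds of multiple-testing correction named above can be sketched in Python. This is not CpGassoc's code: the function names are hypothetical, and Benjamini-Hochberg is used as a representative FDR method because the slides do not say which variant is meant.

```python
def holm(p_values):
    """Holm (step-down Bonferroni) adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1).
        value = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, value)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg FDR adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank when sorted ascending
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03]
holm_adj = holm(p)
bh_adj = benjamini_hochberg(p)
```

Holm controls the family-wise error rate and is therefore the stricter of the two; with hundreds of thousands of probes an FDR threshold usually retains more hits.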

MWAS.r

MWAS.r continued...

MWAS.r continued...

Manhattan / QQ

Replicated the results of the following study:
450K Epigenome-Wide Scan Identifies Differential DNA Methylation in Newborns Related to Maternal Smoking during Pregnancy. Bonnie R. Joubert, et al.

Gene hits: GFI1, AHRR, MYO1G, CYP1A1

"CYP1A1 plays a key role in the aryl hydrocarbon receptor signaling pathway, which mediates the detoxification of the components of tobacco smoke." (Joubert, et al.)

Results file

BlueCrystal .bashrc

Any Questions?