Postmortem Human Brain RNA-seqAnalyses with Public and

Preview:

Citation preview

BAPTA

ABSTRACT

EXPRESSION PROFILES IN GTEx DATA

RNA EXPRESSED REGION VARIABILITY

SUMMARY

ACKNOWLEDGEMENTS

RNA DISJOINT EXON VARIABILITY

Postmortem Human Brain RNA-seq Analyses with Public and Private Data Peterson, A1,2*, Collado-Torres L2, Jaffe AE2,3,4

1JHSPH, 2Lieber Institute for Brain Development, 3Biostatistics and 4Mental Health JHSPH,*apeter65@jhu.edu

RNA-sequencing (RNA-seq) is a high-throughput method forquantifying gene expression levels that is dependent on high-quality RNA. We used RNA-seq data to explore expression profilesin various brain tissues and to address the effect of confoundingcaused by RNA quality differences.First, we assessed RNA expression variability across brain regionsthrough analysis of Genotype-Tissue Expression (GTEx) projectdata. We computed the mean base-pair level coverage for all brainsamples in GTEx and for each of the brain sub-tissues in thedataset using data from the recount2 project (Collado-Torres et al.2017c, Ellis et al. 2017). We noted differences in the meanexpressed region widths in the overall brain compared to sub-tissues at smaller cutoff values. We then compared the widthdistribution of known exons between the overall brain and thevarious sub-tissues.

WewouldliketoacknowledgethemembersofAndrewJaffe(LieberInstituteforBrainDevelopment,JohnsHopkinsMedicalCampus)andMinaRyten (UCLNeuro)labsforfeedbackontheexplanatoryfigures.

ToexamineRNAexpressionvariabilityacrossbrainregions,wefirstlookedatexpressedregionsdefinedbyaglobalcutoff.Wecomputedmeanbase-pairlevelcoverageforallbrainsamplesandforeachofthe13brainsub-tissuesinGTEx usingdatafromrecount2.Readswerescaledby40millionreadsof100base-pairs.

REFERENCES

Recount2containsgene,exon,exon-exonjunction,andexpressedregiondata.ThedataisavailableasRangedSummarizedExperiment objectsthatcanbeaccessedbydownloadingandloadingtheminanRsession.

Wecomparedthewidthdistributionofannotatedexonsinrecount2usingGTEx datasamplesfortheoverallbrainand13subtissues.

RNA QUALITY CONFOUNDINGUsingGTEx data,weidentifiedmeanexpressedregionwidthinoverallbrainsamplesandsamplesfrom11sub-tissues.

Weexaminedtheeffectofoutliersontheobserveddifferencesinmeanexpressedregionwidthinsub-tissuescomparedtooverallbrainsamples.

Weassessedifthedifferencesinmeanexpressedregionspersistedwhencomparingthepercentofthegenomeexpressedinoverallbrainandsub-tissues.

Weexaminedtheeffectofoutliersonannotatedexonsinoverallbrainandsub-tissuesamples.

Weassessedthepercentofthegenomethatwereannotatedexonsinoverallbrainsamplesandtheindividualsub-tissues.

Toassesstheeffectsofconfounding,wewillextendthequalitysurrogatevariableanalysis(qSVA)algorithm(Jaffeetal.2017)andperformacross-regionanalysisofRNA-qualitytissueinacase-controlstudy,comparingdegradationoftissueinpatientswithschizophreniatocontrolsusingBrainSeq consortiumdata.Applyingthismethod,wewillidentifyandmeasuretranscriptfeaturesthataresensitivetotissuedegradationinadifferentialexpressiondegradationdataset,createfactorstocontrolforRNAqualityconfoundinginanindependentprivatedataset,BrainSeq,andassesstheperformanceofthemodifiedqSVAalgorithm.

WeexploredRNAexpressionvariabilityacrossbrainregionsthroughtheanalysisofGenotype-TissueExpression(GTEx)projectdata.Uponnoticingpatterndifferencescomparingoverallbrainsamplestosub-tissues,weusedannotatedexonexpressiondatainrecount2toexplorethesedifferencesfurther.Takentogether,theseresultssuggesttheneedtodetermineanoptimalcutoffthatisspecifictoeachtissue,toensureminimalinclusionofnoise,particularlyatlowercutoffs.WearenowlookingtoassesstheeffectofRNAqualityconfoundinginprivatedata,byperformingcross-regionbrainanalysesofRNA-qualitytissue,comparingschizophrenicbrainstocontrolbrainsinBrainSeq consortiumdata.Ourinitialanalysiswillincludetwobrainregionsthatarewell-characterizedasalteredinschizophrenicpatients,thedorsolateralprefrontalcortex(DLPFC)andthehippocampus.

*Collado-Torres,L.,Nellore,A.,Frazee,A.C.,Wilks,C.,Love,M.I.,Langmead,B.,Irizarry,R.A.,Leek,J.T.,Jaffe,A.E.2017a.FlexibleexpressedregionanalysisforRNA-seq withderfinder.NucleicAcidsResearch,45(2),e9.doi:10.1093/nar/gkw852*Collado-torres L,NelloreA,andJaffe,AE.2017b.recountworkflow:Accessingover70,000humanrNA-seq sampleswithBioconductor.F1000Research,6:1558.*Collado-Torres,L.,Nellore,A.,Kammers,K.,Ellis,S.E.,Taub,M.A.,Hansen,K.D.,Jaffe,A.E.,Langmead,B.andLeek,J.T.2017c.ReproducibleRNA-seqanalysisusingrecount2.NatureBiotechnology 35(4),pp.319–321.*Ellis,S.E.,ColladoTorres,L.andLeek,J.2017.ImprovingthevalueofpublicRNA-seqexpressiondatabyphenotypeprediction.BioRxiv.*Jaffe,A.E.,Tao,R.,Norris,A.L.,Kealhofer,M.,Nellore,A.,Shin,J.H.,Kim,D.,Jia,Y.,Hyde,T.M.,Kleinman,J.E.,Straub,R.E.,Leek,J.T.andWeinberger,D.R.2017.qSVAframeworkforRNAqualitycorrectionindifferentialexpressionanalysis.ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica 114(27),pp.7130–7135.

(Collado-Torresetal.,2017a)

𝑌 = 𝛼 + 𝛽𝐷𝑥 + 𝛾𝑟𝑒𝑔𝑖𝑜𝑛 + 𝛿𝐷𝑥 ∗ 𝑟𝑒𝑔𝑖𝑜𝑛 + 𝜀𝑞𝑆𝑉𝐴

(Collado-Torresetal.,2017b)

(Jaffeetal.,2017)