BAPTA
ABSTRACT
EXPRESSION PROFILES IN GTEx DATA
RNA EXPRESSED REGION VARIABILITY
SUMMARY
ACKNOWLEDGEMENTS
RNA DISJOINT EXON VARIABILITY
Postmortem Human Brain RNA-seq Analyses with Public and Private Data Peterson, A1,2*, Collado-Torres L2, Jaffe AE2,3,4
1JHSPH, 2Lieber Institute for Brain Development, 3Biostatistics and 4Mental Health JHSPH,*[email protected]
RNA-sequencing (RNA-seq) is a high-throughput method forquantifying gene expression levels that is dependent on high-quality RNA. We used RNA-seq data to explore expression profilesin various brain tissues and to address the effect of confoundingcaused by RNA quality differences.First, we assessed RNA expression variability across brain regionsthrough analysis of Genotype-Tissue Expression (GTEx) projectdata. We computed the mean base-pair level coverage for all brainsamples in GTEx and for each of the brain sub-tissues in thedataset using data from the recount2 project (Collado-Torres et al.2017c, Ellis et al. 2017). We noted differences in the meanexpressed region widths in the overall brain compared to sub-tissues at smaller cutoff values. We then compared the widthdistribution of known exons between the overall brain and thevarious sub-tissues.
WewouldliketoacknowledgethemembersofAndrewJaffe(LieberInstituteforBrainDevelopment,JohnsHopkinsMedicalCampus)andMinaRyten (UCLNeuro)labsforfeedbackontheexplanatoryfigures.
ToexamineRNAexpressionvariabilityacrossbrainregions,wefirstlookedatexpressedregionsdefinedbyaglobalcutoff.Wecomputedmeanbase-pairlevelcoverageforallbrainsamplesandforeachofthe13brainsub-tissuesinGTEx usingdatafromrecount2.Readswerescaledby40millionreadsof100base-pairs.
REFERENCES
Recount2containsgene,exon,exon-exonjunction,andexpressedregiondata.ThedataisavailableasRangedSummarizedExperiment objectsthatcanbeaccessedbydownloadingandloadingtheminanRsession.
Wecomparedthewidthdistributionofannotatedexonsinrecount2usingGTEx datasamplesfortheoverallbrainand13subtissues.
RNA QUALITY CONFOUNDINGUsingGTEx data,weidentifiedmeanexpressedregionwidthinoverallbrainsamplesandsamplesfrom11sub-tissues.
Weexaminedtheeffectofoutliersontheobserveddifferencesinmeanexpressedregionwidthinsub-tissuescomparedtooverallbrainsamples.
Weassessedifthedifferencesinmeanexpressedregionspersistedwhencomparingthepercentofthegenomeexpressedinoverallbrainandsub-tissues.
Weexaminedtheeffectofoutliersonannotatedexonsinoverallbrainandsub-tissuesamples.
Weassessedthepercentofthegenomethatwereannotatedexonsinoverallbrainsamplesandtheindividualsub-tissues.
Toassesstheeffectsofconfounding,wewillextendthequalitysurrogatevariableanalysis(qSVA)algorithm(Jaffeetal.2017)andperformacross-regionanalysisofRNA-qualitytissueinacase-controlstudy,comparingdegradationoftissueinpatientswithschizophreniatocontrolsusingBrainSeq consortiumdata.Applyingthismethod,wewillidentifyandmeasuretranscriptfeaturesthataresensitivetotissuedegradationinadifferentialexpressiondegradationdataset,createfactorstocontrolforRNAqualityconfoundinginanindependentprivatedataset,BrainSeq,andassesstheperformanceofthemodifiedqSVAalgorithm.
WeexploredRNAexpressionvariabilityacrossbrainregionsthroughtheanalysisofGenotype-TissueExpression(GTEx)projectdata.Uponnoticingpatterndifferencescomparingoverallbrainsamplestosub-tissues,weusedannotatedexonexpressiondatainrecount2toexplorethesedifferencesfurther.Takentogether,theseresultssuggesttheneedtodetermineanoptimalcutoffthatisspecifictoeachtissue,toensureminimalinclusionofnoise,particularlyatlowercutoffs.WearenowlookingtoassesstheeffectofRNAqualityconfoundinginprivatedata,byperformingcross-regionbrainanalysesofRNA-qualitytissue,comparingschizophrenicbrainstocontrolbrainsinBrainSeq consortiumdata.Ourinitialanalysiswillincludetwobrainregionsthatarewell-characterizedasalteredinschizophrenicpatients,thedorsolateralprefrontalcortex(DLPFC)andthehippocampus.
*Collado-Torres,L.,Nellore,A.,Frazee,A.C.,Wilks,C.,Love,M.I.,Langmead,B.,Irizarry,R.A.,Leek,J.T.,Jaffe,A.E.2017a.FlexibleexpressedregionanalysisforRNA-seq withderfinder.NucleicAcidsResearch,45(2),e9.doi:10.1093/nar/gkw852*Collado-torres L,NelloreA,andJaffe,AE.2017b.recountworkflow:Accessingover70,000humanrNA-seq sampleswithBioconductor.F1000Research,6:1558.*Collado-Torres,L.,Nellore,A.,Kammers,K.,Ellis,S.E.,Taub,M.A.,Hansen,K.D.,Jaffe,A.E.,Langmead,B.andLeek,J.T.2017c.ReproducibleRNA-seqanalysisusingrecount2.NatureBiotechnology 35(4),pp.319โ321.*Ellis,S.E.,ColladoTorres,L.andLeek,J.2017.ImprovingthevalueofpublicRNA-seqexpressiondatabyphenotypeprediction.BioRxiv.*Jaffe,A.E.,Tao,R.,Norris,A.L.,Kealhofer,M.,Nellore,A.,Shin,J.H.,Kim,D.,Jia,Y.,Hyde,T.M.,Kleinman,J.E.,Straub,R.E.,Leek,J.T.andWeinberger,D.R.2017.qSVAframeworkforRNAqualitycorrectionindifferentialexpressionanalysis.ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica 114(27),pp.7130โ7135.
(Collado-Torresetal.,2017a)
๐ = ๐ผ + ๐ฝ๐ท๐ฅ + ๐พ๐๐๐๐๐๐ + ๐ฟ๐ท๐ฅ โ ๐๐๐๐๐๐ + ๐๐๐๐๐ด
(Collado-Torresetal.,2017b)
(Jaffeetal.,2017)