Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
BAPTA
ABSTRACT
EXPRESSION PROFILES IN GTEx DATA
RNA EXPRESSED REGION VARIABILITY
SUMMARY
ACKNOWLEDGEMENTS
RNA DISJOINT EXON VARIABILITY
Postmortem Human Brain RNA-seq Analyses with Public and Private Data Peterson, A1,2*, Collado-Torres L2, Jaffe AE2,3,4
1JHSPH, 2Lieber Institute for Brain Development, 3Biostatistics and 4Mental Health JHSPH,*[email protected]
RNA-sequencing (RNA-seq) is a high-throughput method forquantifying gene expression levels that is dependent on high-quality RNA. We used RNA-seq data to explore expression profilesin various brain tissues and to address the effect of confoundingcaused by RNA quality differences.First, we assessed RNA expression variability across brain regionsthrough analysis of Genotype-Tissue Expression (GTEx) projectdata. We computed the mean base-pair level coverage for all brainsamples in GTEx and for each of the brain sub-tissues in thedataset using data from the recount2 project (Collado-Torres et al.2017c, Ellis et al. 2017). We noted differences in the meanexpressed region widths in the overall brain compared to sub-tissues at smaller cutoff values. We then compared the widthdistribution of known exons between the overall brain and thevarious sub-tissues.
WewouldliketoacknowledgethemembersofAndrewJaffe(LieberInstituteforBrainDevelopment,JohnsHopkinsMedicalCampus)andMinaRyten (UCLNeuro)labsforfeedbackontheexplanatoryfigures.
ToexamineRNAexpressionvariabilityacrossbrainregions,wefirstlookedatexpressedregionsdefinedbyaglobalcutoff.Wecomputedmeanbase-pairlevelcoverageforallbrainsamplesandforeachofthe13brainsub-tissuesinGTEx usingdatafromrecount2.Readswerescaledby40millionreadsof100base-pairs.
REFERENCES
Recount2containsgene,exon,exon-exonjunction,andexpressedregiondata.ThedataisavailableasRangedSummarizedExperiment objectsthatcanbeaccessedbydownloadingandloadingtheminanRsession.
Wecomparedthewidthdistributionofannotatedexonsinrecount2usingGTEx datasamplesfortheoverallbrainand13subtissues.
RNA QUALITY CONFOUNDINGUsingGTEx data,weidentifiedmeanexpressedregionwidthinoverallbrainsamplesandsamplesfrom11sub-tissues.
Weexaminedtheeffectofoutliersontheobserveddifferencesinmeanexpressedregionwidthinsub-tissuescomparedtooverallbrainsamples.
Weassessedifthedifferencesinmeanexpressedregionspersistedwhencomparingthepercentofthegenomeexpressedinoverallbrainandsub-tissues.
Weexaminedtheeffectofoutliersonannotatedexonsinoverallbrainandsub-tissuesamples.
Weassessedthepercentofthegenomethatwereannotatedexonsinoverallbrainsamplesandtheindividualsub-tissues.
Toassesstheeffectsofconfounding,wewillextendthequalitysurrogatevariableanalysis(qSVA)algorithm(Jaffeetal.2017)andperformacross-regionanalysisofRNA-qualitytissueinacase-controlstudy,comparingdegradationoftissueinpatientswithschizophreniatocontrolsusingBrainSeq consortiumdata.Applyingthismethod,wewillidentifyandmeasuretranscriptfeaturesthataresensitivetotissuedegradationinadifferentialexpressiondegradationdataset,createfactorstocontrolforRNAqualityconfoundinginanindependentprivatedataset,BrainSeq,andassesstheperformanceofthemodifiedqSVAalgorithm.
WeexploredRNAexpressionvariabilityacrossbrainregionsthroughtheanalysisofGenotype-TissueExpression(GTEx)projectdata.Uponnoticingpatterndifferencescomparingoverallbrainsamplestosub-tissues,weusedannotatedexonexpressiondatainrecount2toexplorethesedifferencesfurther.Takentogether,theseresultssuggesttheneedtodetermineanoptimalcutoffthatisspecifictoeachtissue,toensureminimalinclusionofnoise,particularlyatlowercutoffs.WearenowlookingtoassesstheeffectofRNAqualityconfoundinginprivatedata,byperformingcross-regionbrainanalysesofRNA-qualitytissue,comparingschizophrenicbrainstocontrolbrainsinBrainSeq consortiumdata.Ourinitialanalysiswillincludetwobrainregionsthatarewell-characterizedasalteredinschizophrenicpatients,thedorsolateralprefrontalcortex(DLPFC)andthehippocampus.
*Collado-Torres,L.,Nellore,A.,Frazee,A.C.,Wilks,C.,Love,M.I.,Langmead,B.,Irizarry,R.A.,Leek,J.T.,Jaffe,A.E.2017a.FlexibleexpressedregionanalysisforRNA-seq withderfinder.NucleicAcidsResearch,45(2),e9.doi:10.1093/nar/gkw852*Collado-torres L,NelloreA,andJaffe,AE.2017b.recountworkflow:Accessingover70,000humanrNA-seq sampleswithBioconductor.F1000Research,6:1558.*Collado-Torres,L.,Nellore,A.,Kammers,K.,Ellis,S.E.,Taub,M.A.,Hansen,K.D.,Jaffe,A.E.,Langmead,B.andLeek,J.T.2017c.ReproducibleRNA-seqanalysisusingrecount2.NatureBiotechnology 35(4),pp.319–321.*Ellis,S.E.,ColladoTorres,L.andLeek,J.2017.ImprovingthevalueofpublicRNA-seqexpressiondatabyphenotypeprediction.BioRxiv.*Jaffe,A.E.,Tao,R.,Norris,A.L.,Kealhofer,M.,Nellore,A.,Shin,J.H.,Kim,D.,Jia,Y.,Hyde,T.M.,Kleinman,J.E.,Straub,R.E.,Leek,J.T.andWeinberger,D.R.2017.qSVAframeworkforRNAqualitycorrectionindifferentialexpressionanalysis.ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica 114(27),pp.7130–7135.
(Collado-Torresetal.,2017a)
𝑌 = 𝛼 + 𝛽𝐷𝑥 + 𝛾𝑟𝑒𝑔𝑖𝑜𝑛 + 𝛿𝐷𝑥 ∗ 𝑟𝑒𝑔𝑖𝑜𝑛 + 𝜀𝑞𝑆𝑉𝐴
(Collado-Torresetal.,2017b)
(Jaffeetal.,2017)