7
David S. DeLuca, 1 Ayellet V. Segrè, 1 Timothy J. Sullivan, 1 Taylor R. Young, 1 Ellen T. Gelfand, 1 Casandra A. Trowbridge, 1 Julian B. Maller, 1,2 Taru Tukiainen, 1,2 Monkol Lek, 1,2 Lucas D. Ward, 1,3 Pouya Kheradpour, 1,3 Benjamin Iriarte, 3 Yan Meng, 1 Cameron D. Palmer, 1,4 Tõnu Esko, 1,4,5 Wendy Winckler, 1 Joel N. Hirschhorn, 1,4 Manolis Kellis, 1,3 Daniel G. MacArthur, 1,2 Gad Getz 1,6 ; UNC/NCSU: Andrey A. Shabalin, 7 Gen Li, 8 Yi-Hui Zhou, 9 Andrew B. Nobel, 8 Ivan Rusyn, 10,11 Fred A. Wright 9 ; Univ. of Geneva: Tuuli Lappalainen, 12,13,14,15,16,17 Pedro G. Ferreira, 12,13,14 Halit Ongen, 12,13,14 Manuel A. Rivas, 18 Alexis Battle, 19,20 Sara Mostafavi, 19 Jean Monlong, 21,22,23 Michael Sammeth, 21,22,24 Marta Melé, 21,22,25 Ferran Reverter, 21,26 Jakob M. Goldmann, 21,27 Daphne Koller, 19 Roderic Guigó, 21,22,28 Mark I. McCarthy, 18,29,30 Emmanouil T. Dermitzakis 12,13,14 ; Univ. of Chicago: Eric R. Gamazon, 31,32 Hae Kyung Im, 31 Anuar Konkashbaev, 31,32 Dan L. Nicolae, 31 Nancy J. Cox, 31,32 Timothée Flutre, 33,34 Xiaoquan Wen, 35 Matthew Stephens, 33,36 Jonathan K. Pritchard 33,37,38 ; Harvard: Zhidong Tu, 39,40 Bin Zhang, 39,40 Tao Huang, 39,40 Quan Long, 39,40 Luan Lin, 39,40 Jialiang Yang, 39,40 Jun Zhu, 39,40 Jun Liu 41 Biospecimen and data collection, processing, quality control, storage, and pathological review caHUB Biospecimen Source Sites, National Disease Research Interchange (NDRI): Amanda Brown, 42 Bernadette Mestichelli, 42 Denee Tidwell, 42 Edmund Lo, 42 Michael Salvatore, 42 Saboor Shad, 42 Jeffrey A. Thomas, 42 John T. Lonsdale 42 ; Roswell Park: Michael T. Moser, 43 Bryan M. Gillard, 43 Ellen Karasik, 43 Kimberly Ramsey, 43 Christopher Choi, 43 Barbara A. Foster 43 ; Science Care Inc.: John Syron, 44 Johnell Fleming, 44 Harold Magazine, 44 ; Gift of Life Donor Program: Rick Hasz 45 ; LifeNet Health: Gary D. Walters 46 ; UNYTS: Jason P. Bridge, 47 Mark Miklos, 47 Susan Sullivan 47 caHUB ELSI study VCU: Laura K. Barker, 48 Heather M. Traino, 48,49 Maghboeba Mosavel, 48 Laura A. Siminoff 48,49 caHUB comprehensive biospecimen resource Van Andel Research Institute: Dana R. Valley, 50 Daniel C. Rohrer, 50 Scott D. Jewell 50 caHUB pathology resource center NCI: Philip A. Branton 51 ; Leidos Biomedical Research Inc.: Leslie H. Sobin, 52 Mary Barcus 52 caHUB comprehensive data resource Leidos Biomedical Research Inc.: Liqun Qi, 52 Jeffrey McLean, 52 Pushpa Hariharan, 52 Ki Sung Um, 52 Shenpei Wu, 52 David Tabor, 52 Charles Shive 52 caHUB operations management Leidos Biomedical Research Inc.: Anna M. Smith, 52 Stephen A. Buia, 52 Anita H. Undale, 52 Karna L. Robinson, 52 Nancy Roche, 52 Kimberly M. Valentino, 52 Angela Britton, 52 Robin Burges, 52 Debra Bradbury, 52 Kenneth W. Hambright, 52 John Seleski, 53 Greg E. Korzeniewski 52 ; Sapient Government Services: Kenyon Erickson 54 Brain bank operations University of Miami: Yvonne Marcus, 55 Jorge Tejada, 55 Mehran Taherian, 55 Chunrong Lu, 55 Margaret Basile, 55 Deborah C. Mash 55 Program management NHGRI: Simona Volpi, 56 Jeffery P. Struewing, 56 Gary F. Temple, 56 Joy Boyer, 57 Deborah Colantuoni 56 ; NIMH: Roger Little, 58 Susan Koester 59 ; NCI: Latarsha J. Carithers, 51 Helen M. Moore, 51 Ping Guan, 51 Carolyn Compton, 51 Sherilyn J. Sawyer, 51 Joanne P. Demchok, 60 Jimmie B. Vaught, 51 Chana A. Rabiner, 51 Nicole C. Lockhart 51,57 Writing committee Kristin G. Ardlie, 1 Gad Getz, 1,6 Fred A. Wright, 9 Manolis Kellis, 1,3 Simona Volpi, 56 Emmanouil T. Dermitzakis 12,13,14 1 Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA. 2 Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA. 3 MIT Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. 4 Center for Basic and Translational Obesity Research and Division of Endocrinology, Boston Childrens Hospital, Boston, MA 02115, USA. 5 Estonian Genome Center, University of Tartu, Tartu, Estonia. 6 Cancer Center and Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA. 7 Center for Biomarker Research and Personalized Medicine, Virginia Commonwealth University, Richmond, VA 23298, USA. 8 Department of Statistics and Operations Research and Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA. 9 Bioinformatics Research Center and Departments of Statistics and Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA. 10 Department of Environmental Sciences and Engineering, University of North Carolina, Chapel Hill, NC 27599. 11 Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA. 12 Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland. 13 Institute for Genetics and Genomics in Geneva (iG3), University of Geneva, 1211 Geneva, Switzerland. 14 Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland. 15 Department of Genetics, Stanford University, Stanford, CA 94305, USA. 16 New York Genome Center, New York, NY 10011, USA. 17 Department of Systems Biology, Columbia University Medical Center, New York, NY 10032, USA. 18 Wellcome Trust Centre for Human Genetics Research, Nuffield Department of Clinical Medicine, University of Oxford, Oxford OX3 7BN, UK. 19 Department of Computer Science, Stanford University, Stanford, CA 94305, USA. 20 Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA. 21 Centre for Genomic Regulation (CRG), 08003 Barcelona, Catalonia, Spain. 22 Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain. 23 Human Genetics Department, McGill University, Montréal, Quebec H3A 0G1, Canada. 24 National Institute for Scientific Computing, Petropolis 25651-075, Rio de Janeiro, Brazil. 25 Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138, USA. 26 Universitat de Barcelona, 08028 Barcelona, Spain. 27 Radboud University Nijmegen, Netherlands. 28 Institut Hospital del Mar dInvestigacions Mèdiques (IMIM), 08003 Barcelona, Spain. 29 Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Churchill Hospital, Oxford OX3 7LJ, UK. 30 Oxford NIHR Biomedical Research Centre, Churchill Hospital, Oxford OX3 7LJ, UK. 31 Section of Genetic Medicine, Department of Medicine and Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA. 32 Division of Genetic Medicine, Department of Medicine, Vanderbilt University, Nashville, TN 37232, USA. 33 Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA. 34 INRA, Department of Plant Biology and Breeding, UMR 1334, AGAP, Montpellier, 34060, France. 35 Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA. 36 Department of Statistics, University of Chicago, Chicago, IL 60637, USA. 37 Department of Genetics and Biology, Stanford University, Stanford, CA 94305, USA. 38 Howard Hughes Medical Institute, Chicago, IL, USA. 39 Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. 40 Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. 41 Department of Statistics, Harvard University, Cambridge, MA 02138. 42 National Disease Research Interchange, Philadelphia, PA 19103, USA. 43 Roswell Park Cancer Institute, Buffalo, NY 14263, USA. 44 Science Care Inc., Phoenix, AZ, USA. 45 Gift of Life Donor Program, Philadelphia, PA 19103, USA. 46 LifeNet Health, Virginia Beach, VA 23453, USA. 47 UNYTS, Buffalo, NY 14203, USA. 48 Virginia Commonwealth University, Richmond, VA 23298, USA. 49 Department of Public Health, Temple University, Philadelphia, PA 19122, USA. 50 Van Andel Research Institute, Grand Rapids, MI 49503, USA. 51 Biorepositories and Biospecimen Research Branch, National Cancer Institute, Bethesda, MD 20892, USA. 52 Biospecimen Research Group, Clinical Research Directorate, Leidos Biomedical Research, Inc., Rockville, MD 20852, USA. 53 iDoxSolutions Inc., Bethesda, MD 20814, USA. 54 Sapient Government Services, Arlington, VA 22201, USA. 55 Brain Endowment Bank, Department of Neurology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA. 56 Division of Genomic Medicine, National Human Genome Research Institute, Bethesda, MD 20892, USA. 57 Division of Genomics and Society, National Human Genome Research Institute, Bethesda, MD 20892, USA. 58 Office of Science Policy, Planning, and Communications, National Institute of Mental Health, Bethesda, MD 20892, USA. 59 Division of Neuroscience and Basic Behavioral Science, National Institute of Mental Health, Bethesda, MD 20892, USA. 60 Cancer Diagnosis Program, National Cancer Institute, Bethesda, MD 20892, USA. SUPPLEMENTARY MATERIALS www.sciencemag.org/content/348/6235/648/suppl/DC1 Materials and Methods Box S1 Figs. S1 to S34 Tables S1 to S15 References (5086) 3 October 2014; accepted 3 April 2015 10.1126/science.1262110 REPORTS HUMAN GENOMICS The human transcriptome across tissues and individuals Marta Melé, 1,2 * Pedro G. Ferreira, 1,3,4,5 * Ferran Reverter, 1,6,7 * David S. DeLuca, 8 Jean Monlong, 1,7,9 Michael Sammeth, 1,7,10 Taylor R. Young, 8 Jakob M Goldmann, 1,7,11 Dmitri D. Pervouchine, 1,7,12 Timothy J. Sullivan, 8 Rory Johnson, 1,7 Ayellet V. Segrè, 8 Sarah Djebali, 1,7 Anastasia Niarchou, 3,4,5 The GTEx Consortium, Fred A. Wright, 13 Tuuli Lappalainen, 3,4,5,14,15 Miquel Calvo, 6 Gad Getz, 8,16 Emmanouil T. Dermitzakis, 3,4,5 Kristin G. Ardlie, 8 Roderic Guigó 1,7,17,18 Transcriptional regulation and posttranscriptional processing underlie many cellular and organismal phenotypes.We used RNA sequence data generated by Genotype-Tissue Expression (GTEx) project to investigate the patterns of transcriptome variation across individuals and tissues. Tissues exhibit characteristic transcriptional signatures that show stability in postmortem samples. These signatures are dominated by a relatively small number of geneswhich is most clearly seen in bloodthough few are exclusive to a particular tissue and vary more across tissues than individuals. Genes exhibiting high interindividual expression variation include disease candidates associated with sex, ethnicity, and age. Primary transcription is the major driver of cellular specificity, with splicing playing mostly a complementary role; except for the brain, which exhibits a more divergent splicing program. Variation in splicing, despite its stochasticity, may play in contrast a comparatively greater role in defining individual phenotypes. G ene expression is the key determinant of cellular phenotype, and genome-wide ex- pression analysis has been a mainstay of genomics and biomedical research, pro- viding insights into the molecular events underlying human biology and disease. Whereas expression data sets from tissues/primary cells (1, 2) and individuals (3) have accumulated over recent years, only limited expression data sets have allowed analysis across tissues and individuals 660 8 MAY 2015 VOL 348 ISSUE 6235 sciencemag.org SCIENCE RESEARCH on June 24, 2020 http://science.sciencemag.org/ Downloaded from

The human transcriptome across tissues and individuals · 1Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA. 2 Analytic and Translational Genetics Unit, Massachusetts

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The human transcriptome across tissues and individuals · 1Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA. 2 Analytic and Translational Genetics Unit, Massachusetts

David S. DeLuca,1 Ayellet V. Segrè,1 Timothy J. Sullivan,1

Taylor R. Young,1 Ellen T. Gelfand,1 Casandra A. Trowbridge,1

Julian B. Maller,1,2 Taru Tukiainen,1,2 Monkol Lek,1,2 Lucas D. Ward,1,3

Pouya Kheradpour,1,3 Benjamin Iriarte,3 Yan Meng,1

Cameron D. Palmer,1,4 Tõnu Esko,1,4,5 Wendy Winckler,1

Joel N. Hirschhorn,1,4 Manolis Kellis,1,3 Daniel G. MacArthur,1,2

Gad Getz1,6; UNC/NCSU: Andrey A. Shabalin,7 Gen Li,8

Yi-Hui Zhou,9 Andrew B. Nobel,8 Ivan Rusyn,10,11 Fred A. Wright9;Univ. of Geneva: Tuuli Lappalainen,12,13,14,15,16,17 Pedro G. Ferreira,12,13,14

Halit Ongen,12,13,14 Manuel A. Rivas,18 Alexis Battle,19,20

Sara Mostafavi,19 Jean Monlong,21,22,23 Michael Sammeth,21,22,24

Marta Melé,21,22,25 Ferran Reverter,21,26 Jakob M. Goldmann,21,27

Daphne Koller,19 Roderic Guigó,21,22,28 Mark I. McCarthy,18,29,30

Emmanouil T. Dermitzakis12,13,14; Univ. of Chicago: Eric R. Gamazon,31,32

Hae Kyung Im,31 Anuar Konkashbaev,31,32 Dan L. Nicolae,31

Nancy J. Cox,31,32 Timothée Flutre,33,34 Xiaoquan Wen,35

Matthew Stephens,33,36 Jonathan K. Pritchard33,37,38;Harvard: Zhidong Tu,39,40 Bin Zhang,39,40 Tao Huang,39,40 QuanLong,39,40 Luan Lin,39,40 Jialiang Yang,39,40 Jun Zhu,39,40 Jun Liu41

Biospecimen and data collection, processing, quality control,storage, and pathological review caHUB Biospecimen Source Sites,National Disease Research Interchange (NDRI): Amanda Brown,42

Bernadette Mestichelli,42 Denee Tidwell,42 Edmund Lo,42

Michael Salvatore,42 Saboor Shad,42 Jeffrey A. Thomas,42

John T. Lonsdale42; Roswell Park: Michael T. Moser,43

Bryan M. Gillard,43 Ellen Karasik,43 Kimberly Ramsey,43

Christopher Choi,43 Barbara A. Foster43; Science Care Inc.:John Syron,44 Johnell Fleming,44 Harold Magazine,44; Gift of LifeDonor Program: Rick Hasz45; LifeNet Health: Gary D. Walters46;UNYTS: Jason P. Bridge,47 Mark Miklos,47 Susan Sullivan47

caHUB ELSI study VCU: Laura K. Barker,48 Heather M. Traino,48,49

Maghboeba Mosavel,48 Laura A. Siminoff48,49 caHUBcomprehensive biospecimen resource Van Andel ResearchInstitute: Dana R. Valley,50 Daniel C. Rohrer,50 Scott D. Jewell50

caHUB pathology resource center NCI: Philip A. Branton51;Leidos Biomedical Research Inc.: Leslie H. Sobin,52 Mary Barcus52

caHUB comprehensive data resource Leidos Biomedical ResearchInc.: Liqun Qi,52 Jeffrey McLean,52 Pushpa Hariharan,52

Ki Sung Um,52 Shenpei Wu,52 David Tabor,52 Charles Shive52

caHUB operations management Leidos Biomedical Research Inc.:Anna M. Smith,52 Stephen A. Buia,52 Anita H. Undale,52

Karna L. Robinson,52 Nancy Roche,52 Kimberly M. Valentino,52

Angela Britton,52 Robin Burges,52 Debra Bradbury,52

Kenneth W. Hambright,52 John Seleski,53 Greg E. Korzeniewski52;Sapient Government Services: Kenyon Erickson54 Brain bankoperations University of Miami: Yvonne Marcus,55 Jorge Tejada,55

Mehran Taherian,55 Chunrong Lu,55 Margaret Basile,55

Deborah C. Mash55 Program management NHGRI: Simona Volpi,56

Jeffery P. Struewing,56 Gary F. Temple,56 Joy Boyer,57

Deborah Colantuoni56; NIMH: Roger Little,58 Susan Koester59;NCI: Latarsha J. Carithers,51 Helen M. Moore,51 Ping Guan,51

Carolyn Compton,51 Sherilyn J. Sawyer,51 Joanne P. Demchok,60

Jimmie B. Vaught,51 Chana A. Rabiner,51 Nicole C. Lockhart51,57

Writing committee Kristin G. Ardlie,1 Gad Getz,1,6 Fred A. Wright,9

Manolis Kellis,1,3 Simona Volpi,56 Emmanouil T. Dermitzakis12,13,14

1Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.2Analytic and Translational Genetics Unit, Massachusetts GeneralHospital, Boston, MA 02114, USA. 3MIT Computer Science andArtificial Intelligence Laboratory, Massachusetts Institute ofTechnology, Cambridge, MA 02139, USA. 4Center for Basic andTranslational Obesity Research and Division of Endocrinology,Boston Children’s Hospital, Boston, MA 02115, USA. 5EstonianGenome Center, University of Tartu, Tartu, Estonia. 6Cancer Centerand Department of Pathology, Massachusetts General Hospital,Boston, MA 02114, USA. 7Center for Biomarker Research andPersonalized Medicine, Virginia Commonwealth University,Richmond, VA 23298, USA. 8Department of Statistics andOperations Research and Department of Biostatistics, University ofNorth Carolina, Chapel Hill, NC 27599, USA. 9BioinformaticsResearch Center and Departments of Statistics and BiologicalSciences, North Carolina State University, Raleigh, NC 27695, USA.10Department of Environmental Sciences and Engineering,University of North Carolina, Chapel Hill, NC 27599. 11Departmentof Veterinary Integrative Biosciences, Texas A&M University,College Station, TX 77843, USA. 12Department of Genetic Medicineand Development, University of Geneva Medical School, 1211Geneva, Switzerland. 13Institute for Genetics and Genomics inGeneva (iG3), University of Geneva, 1211 Geneva, Switzerland.14Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland.15Department of Genetics, Stanford University, Stanford, CA94305, USA. 16New York Genome Center, New York, NY 10011,USA. 17Department of Systems Biology, Columbia University

Medical Center, New York, NY 10032, USA. 18Wellcome TrustCentre for Human Genetics Research, Nuffield Department ofClinical Medicine, University of Oxford, Oxford OX3 7BN, UK.19Department of Computer Science, Stanford University, Stanford,CA 94305, USA. 20Department of Computer Science, JohnsHopkins University, Baltimore, MD 21218, USA. 21Centre forGenomic Regulation (CRG), 08003 Barcelona, Catalonia, Spain.22Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain.23Human Genetics Department, McGill University, Montréal,Quebec H3A 0G1, Canada. 24National Institute for ScientificComputing, Petropolis 25651-075, Rio de Janeiro, Brazil.25Department of Stem Cell and Regenerative Biology, HarvardUniversity, Cambridge, MA 02138, USA. 26Universitat de Barcelona,08028 Barcelona, Spain. 27Radboud University Nijmegen,Netherlands. 28Institut Hospital del Mar d’Investigacions Mèdiques(IMIM), 08003 Barcelona, Spain. 29Oxford Centre for Diabetes,Endocrinology and Metabolism, University of Oxford, ChurchillHospital, Oxford OX3 7LJ, UK. 30Oxford NIHR Biomedical ResearchCentre, Churchill Hospital, Oxford OX3 7LJ, UK. 31Section ofGenetic Medicine, Department of Medicine and Department ofHuman Genetics, University of Chicago, Chicago, IL 60637, USA.32Division of Genetic Medicine, Department of Medicine, VanderbiltUniversity, Nashville, TN 37232, USA. 33Department of HumanGenetics, University of Chicago, Chicago, IL 60637, USA. 34INRA,Department of Plant Biology and Breeding, UMR 1334, AGAP,Montpellier, 34060, France. 35Department of Biostatistics,University of Michigan, Ann Arbor, MI 48109, USA. 36Department ofStatistics, University of Chicago, Chicago, IL 60637, USA.37Department of Genetics and Biology, Stanford University,Stanford, CA 94305, USA. 38Howard Hughes Medical Institute,Chicago, IL, USA. 39Department of Genetics and GenomicSciences, Icahn School of Medicine at Mount Sinai, New York, NY10029, USA. 40Icahn Institute of Genomics and Multiscale Biology,Icahn School of Medicine at Mount Sinai, New York, NY 10029,USA. 41Department of Statistics, Harvard University, Cambridge,

MA 02138. 42National Disease Research Interchange, Philadelphia,PA 19103, USA. 43Roswell Park Cancer Institute, Buffalo, NY 14263,USA. 44Science Care Inc., Phoenix, AZ, USA. 45Gift of Life DonorProgram, Philadelphia, PA 19103, USA. 46LifeNet Health, VirginiaBeach, VA 23453, USA. 47UNYTS, Buffalo, NY 14203, USA.48Virginia Commonwealth University, Richmond, VA 23298, USA.49Department of Public Health, Temple University, Philadelphia, PA19122, USA. 50Van Andel Research Institute, Grand Rapids, MI49503, USA. 51Biorepositories and Biospecimen Research Branch,National Cancer Institute, Bethesda, MD 20892, USA.52Biospecimen Research Group, Clinical Research Directorate,Leidos Biomedical Research, Inc., Rockville, MD 20852, USA.53iDoxSolutions Inc., Bethesda, MD 20814, USA. 54SapientGovernment Services, Arlington, VA 22201, USA. 55BrainEndowment Bank, Department of Neurology, Miller School ofMedicine, University of Miami, Miami, FL 33136, USA. 56Division ofGenomic Medicine, National Human Genome Research Institute,Bethesda, MD 20892, USA. 57Division of Genomics and Society, NationalHuman Genome Research Institute, Bethesda, MD 20892, USA. 58Officeof Science Policy, Planning, and Communications, National Instituteof Mental Health, Bethesda, MD 20892, USA. 59Division ofNeuroscience and Basic Behavioral Science, National Institute ofMental Health, Bethesda, MD 20892, USA. 60Cancer DiagnosisProgram, National Cancer Institute, Bethesda, MD 20892, USA.

SUPPLEMENTARY MATERIALS

www.sciencemag.org/content/348/6235/648/suppl/DC1Materials and MethodsBox S1Figs. S1 to S34Tables S1 to S15References (50–86)

3 October 2014; accepted 3 April 201510.1126/science.1262110

REPORTS◥

HUMAN GENOMICS

The human transcriptome acrosstissues and individualsMarta Melé,1,2* Pedro G. Ferreira,1,3,4,5* Ferran Reverter,1,6,7* David S. DeLuca,8

Jean Monlong,1,7,9 Michael Sammeth,1,7,10 Taylor R. Young,8 Jakob M Goldmann,1,7,11

Dmitri D. Pervouchine,1,7,12 Timothy J. Sullivan,8 Rory Johnson,1,7 Ayellet V. Segrè,8

Sarah Djebali,1,7 Anastasia Niarchou,3,4,5 The GTEx Consortium, Fred A. Wright,13

Tuuli Lappalainen,3,4,5,14,15 Miquel Calvo,6 Gad Getz,8,16 Emmanouil T. Dermitzakis,3,4,5

Kristin G. Ardlie,8† Roderic Guigó1,7,17,18†

Transcriptional regulation and posttranscriptional processing underlie many cellular andorganismal phenotypes. We used RNA sequence data generated by Genotype-TissueExpression (GTEx) project to investigate the patterns of transcriptome variation acrossindividuals and tissues. Tissues exhibit characteristic transcriptional signatures that showstability in postmortem samples. These signatures are dominated by a relatively smallnumber of genes—which is most clearly seen in blood—though few are exclusive toa particular tissue and vary more across tissues than individuals. Genes exhibiting highinterindividual expression variation include disease candidates associated with sex,ethnicity, and age. Primary transcription is the major driver of cellular specificity, withsplicing playing mostly a complementary role; except for the brain, which exhibits a moredivergent splicing program. Variation in splicing, despite its stochasticity, may play incontrast a comparatively greater role in defining individual phenotypes.

Gene expression is the key determinant ofcellular phenotype, and genome-wide ex-pression analysis has been a mainstay ofgenomics and biomedical research, pro-viding insights into the molecular events

underlying human biology and disease. Whereasexpression data sets from tissues/primary cells(1, 2) and individuals (3) have accumulated overrecent years, only limited expression data sets haveallowed analysis across tissues and individuals

660 8 MAY 2015 • VOL 348 ISSUE 6235 sciencemag.org SCIENCE

RESEARCHon June 24, 2020

http://science.sciencemag.org/

Dow

nloaded from

Page 2: The human transcriptome across tissues and individuals · 1Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA. 2 Analytic and Translational Genetics Unit, Massachusetts

simultaneously (4). The Genotype-Tissue Ex-pression Project (GTEx) is developing such aresource (5, 6), collecting multiple “nondiseased”tissues sampled from recently deceased humandonors. We analyzed the GTEx pilot data freeze(6), which comprised RNA sequencing (RNA-seq) from 1641 samples from 175 individualsrepresenting 43 sites: 29 solid organ tissues,11 brain subregions, whole blood, and two celllines: Epstein-Barr virus–transformed lympho-cytes (LCL) and cultured fibroblasts from skin[table S1 and (7)].The identification and characterization of

genetic variants that are associated with geneexpression are extensively discussed in (6). Herewe use the GTEx data to investigate the pat-terns of transcriptome variation across indi-viduals and tissues and how these patternsassociate with human phenotypes. RNA-seq per-formed on the GTEx pilot samples producedan average of 80 million paired-end mappedreads per sample (fig. S1) (7, 8). We used themapped reads to quantify gene expression usingGencode V12 annotation (9), which includes20,110 protein-coding genes (PCGs) and 11,790long noncoding RNAs (lncRNAs). Comparisonwith microarray-based quantification for a sub-set of 736 samples showed concordance be-tween the two technologies (average correlationcoefficient = 0.83, fig. S2). At the threshold de-fined for expression quantitative trait loci (eQTL)analysis [reads per kilobase per million mappedreads (RPKM) > 0.1, see (7)], at which 88% ofPCGs and 71% of lncRNAs are detected in at leastone sample, the distribution of gene expressionacross tissues is U-shaped and complementarybetween PCGs (generally ubiquitously expressed)and lncRNAs (typically tissue-specific or not ex-pressed, Fig. 1A).Tissues show a characteristic transcription-

al signature, as revealed by multidimensionalscaling, of both PCG and lncRNA expression(figs. 1B, S3, and S4), with individual pheno-

types contributing little (fig. S5). The primaryseparation, as observed in prior studies (10),is between nonsolid (blood) and solid tissuesand, within solid tissues, brain is the most dis-tinct. Brain subregions are not well differen-tiated, with the exception of cerebellum (fig. S6).Postmortem ischemia appears to have little im-pact on the characteristic tissue transcriptionalsignatures, as previously noted (11). In a com-parison of 798 GTEx samples with 609 “nondis-eased” samples obtained from living (surgical)donors (table S2), we found that GTEx samplesclustered with surgical samples of the same tis-sue type (Fig. 1C and table S3) (12).Tissue transcription is generally dominated

by the expression of a relatively small numberof genes. Indeed, we found that for most tis-sues, about 50% of the transcription is accountedfor by a few hundred genes (13). In many tissues,the bulk of transcription is of mitochondrialorigin (Fig. 1D and table S4) (14). In kidney,for instance, a highly aerobic tissue with manymitochondria, a median of 51% (>65% in somesamples) of the transcriptional output is fromthe mitochondria (fig. S7). Other tissues shownuclear-dominated expression; in blood, for ex-ample, three hemoglobin genes contribute morethan 60% to total transcription. Genes relatedto lipid metabolism in pancreas, actin in muscle,and thyroglobulin in thyroid are other exam-ples of nuclear genes contributing dispropor-tionally to tissue-specific transcription. BecauseRNA samples are generally sequenced to the samedepth, in tissues where a few genes dominateexpression, fewer RNA-seq reads are compar-atively available to estimate the expression ofthe remaining genes, decreasing the power toestimate expression variation. These tissues—i.e., blood, muscle, and heart (Fig. 1E)—are, con-sequently, those with less power to detect eQTLs(6). Because most eQTL analyses are performedon easily accessible samples, such as blood, thishighlights the relevance of the GTEx multitissueapproach.Although thousands of genes are differentially

expressed between tissues (fig. S8) or show tissue-preferential expression (fig. S9 and table S5),fewer than 200 genes are expressed exclusivelyin a given tissue (figs. S10 and S11 and tables S6and S7, A to E). The vast majority (~ 95%) areexclusive to testis and many are lncRNAs. Thismay reflect low-level basal transcription com-mon to all cell types or result from general tissueheterogeneity, with few primary cell types beingspecific to a given tissue.Expression of repetitive elements also reca-

pitulates tissue type (table S8 and fig. S12A).We identified 3046 PCGs whose expression, inat least one tissue, was correlated with the ex-pression of the closest repeat element (on aver-age 2827 base pairs away, fig. S12B). In abouthalf of these cases, the repeat was also signif-icantly coexpressed with other repeats of itssame family (table S8 and fig. S13). LncRNAexpression can be regulated by specific repeatfamilies (15), and we found evidence that testis-specific expression could be regulated by endog-

enous retrovirus L repeats (ERVL and ERVL-MaLR) (fig. S12C).Using linear mixed models, we found that

variation in gene expression is far greater amongtissues (47% of total variance in gene expression)than among individuals (4% of total variance,Fig. 2A and table S9), and very similar for PCGsand lncRNAs when controlling for gene expres-sion (Fig. 2A). Genes that show high expressionvariance across individuals and low varianceacross tissues include genes on the sex chromo-somes, as well as autosomal genes, such as theRHD gene that determines Rh blood group.We identified 92 PCGs and 43 lncRNAs with

global sex-biased expression [false discovery rate(FDR) < 0.05, Fig. 2B and table S10]. Genes over-expressed in males are predominantly locatedon the Y chromosome. Conversely, many geneson the X chromosome are overexpressed in fe-males, suggesting that more genes might es-cape X inactivation than previously described(16). Among these, we found XIST and JPX,known to participate in X inactivation, as wellas the lncRNAs RP11-309M23.1 and RP13-216E22.4,the expression of which shows enrichment inthe nucleus in female cell lines from ENCODE(17) and hence could be candidates to alsoparticipate in X inactivation (fig. S14) (16).Among autosomal PCGs, MMP3, linked to sus-ceptibility to coronary heart disease [OnlineMendelian Inheritance in Man (OMIM) no.614466] and more prevalent in males, showsthe strongest expression bias (Fig. 2B).We detected 221 PCGs and 153 lncRNAs glob-

ally differentially expressed between individu-als of European and African-American ancestry(FDR < 0.05, Fig. 2C and table S11). There is aslight enrichment of lncRNAs (P < 1 × 10−6),among which we identified the RP11-302J23.1gene, highly expressed in cardiac tissue in Af-rican Americans only, and located in a regionthat harbors weak associations to heart disease(18). Additionally, some genes showing differ-ential expression by ethnicity lie in genomic re-gions under positive selection in European orsub-Saharan African populations (Fig. 2C andfig. S15).Finally, we detected 1993 genes that globally

change expression with age (FDR < 0.05, Fig. 2Dand table S12). Genes that decrease expressionare enriched in functions and pathways relatedto neurodegenerative diseases such as Parkinson’sand Alzheimer’s diseases, amongwhich eight har-bor single-nucleotide polymorphisms (SNPs)for these diseases identified from genome-wideassociation studies (P < 0.05). Among the genesthat increase expressionwith age isEDA2R, whoseligand, EDA, has been associated with age-relatedphenotypes (19).We also identified 753 genes with tissue spe-

cific sex-biased expression (FDR < 0.05, table S13)predominantly in breast tissue (92%), and 31genes with tissue-specific ethnicity-biased expres-sion, many in the skin (FDR < 0.05, Table 1 andtable S14). Among the sex-differentially expressedgenes, five show biased expression specificallyin heart and are of interest given the differing

SCIENCE sciencemag.org 8 MAY 2015 • VOL 348 ISSUE 6235 661

RESEARCH | REPORTS

1Center for Genomic Regulation (CRG), Barcelona, Catalonia,Spain. 2Harvard Department of stem cell and regenerativebiology, Harvard University, Cambridge, MA, USA.3Department of Genetic Medicine and Development,University of Geneva, Geneva, Switzerland. 4Institute forGenetics and Genomics in Geneva (iGE3), University ofGeneva, Geneva, Switzerland. 5Swiss Institute ofBioinformatics, Geneva, Switzerland. 6Facultat de Biologia,Universitat de Barcelona (UB), Barcelona, Catalonia, Spain.7Universitat Pompeu Fabra (UPF), Barcelona, Catalonia,Spain. 8Broad Institute of MIT and Harvard, Cambridge, MA,USA. 9McGill University, Montreal, Canada. 10NationalInstitute for Scientific Computing (LNCC), Petropolis, Rio deJaneiro, Brazil. 11Radboud University, Nijmegen, Netherlands.12Faculty of Bioengineering and Bioinformatics, MoscowState University, Leninskie Gory 1-73, 119992 Moscow,Russia. 13North Carolina State University, Raleigh, NC, USA.14New York Genome Center, New York, NY, USA.15Department of Systems Biology, Columbia University, NewYork, NY, USA. 16Cancer Center and Department of Pathology,Massachusetts General Hospital, Boston, MA 02114, USA.17Institut Hospital del Mar d’Investigacions Mèdiques (IMIM),Barcelona, Catalonia, Spain. 18Joint CRG-Barcelona SuperComputing Center (BSC)–Institut de Recerca Biomedica (IRB)Program in Computational Biology, Barcelona, Catalonia, Spain.*These authors contributed equally to this work. †Correspondingauthor. E-mail: [email protected] (K.G.A.); [email protected] (R.G.)

on June 24, 2020

http://science.sciencemag.org/

Dow

nloaded from

Page 3: The human transcriptome across tissues and individuals · 1Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA. 2 Analytic and Translational Genetics Unit, Massachusetts

prevalence of cardiovascular disease betweenmales and females. One of these genes, PLEKHA7(fig. S15C), contains SNPs associated with riskfor cardiovascular disease.Overall, tissue specificity is likely to be driven

by the concerted expression of multiple genes.Thus, we performed sex-based differential anal-

ysis of coexpression networks. We identified42 coexpression modules in males and 46 infemales (fig. S16). Among male-specific mod-ules, we found one related to spermatid differ-entiation and development (FDR = 9.0 × 10−4,fig. S16B), and among female-specific modules,we found one related to epidermis and ecto-

derm development (FDR = 4.6 × 10−14, fig.S16C). Differential network expression, there-fore, distinguishes differences between maleand females not well captured by analysis ofindividual genes.Split-mapped RNA-seq reads predict about

87,000 novel junctions with very strong support

662 8 MAY 2015 • VOL 348 ISSUE 6235 sciencemag.org SCIENCE

RESEARCH | REPORTS

0 1 2 3 4 5 6 7 8 9

lncRNAs

Protein coding

0

2

4

6

8

10

12 0

2

4

6

8

10

12

Num

ber

of g

enes

(th

ousa

nds)

Number of tissues−0.1 0.0 0.1 0.2 0.3 0.4

−0.10

−0.05

0.00

0.05

0.10

0.15

Coordinate 1 (34%)

Coo

rdin

ate

2 (1

6%)

LCL

Blood

Brain

Testis

Heart

Muscle

SkinArtery

NerveLung

Thyroid

Adipose tissue

Liver

Breast

Pituitary

0.00

0.25

0.50

0.75

1.00

1 100 10,000

Number of genes

0.25

0.50

0.75

0.00

Protein coding geneMitochondrion gene

PseudogenelncRNA

GTEx samplesAdipose, subcutaneousBlood, ante-mortemBlood, post-mortemMuscle, skeletalSkin, sun exposedBreastLungThyroidHeart, left ventricle

Living samplesAdipose, subcutaneousMuscle, skeletalSkin, sun exposedBlood, ante-mortemBreastLungThyroidHeart, left ventricle

F1

F2

F3

F4

F5

F6

F7

Heart

Blood

Ante-mortem

Post-mortem

Lung

Thyroid

Muscle AdiposeBreast

Skin

−2 –1 0 1 2−2

−1

0

1

2

S1

S2

Fra

ctio

n o

f to

tal t

ransc

riptio

nal o

utp

ut

Mu

scle

Hea

rt

Live

r

Blo

od

Pan

crea

s

Kid

ney

Bra

inS

tom

ach

Col

onA

dren

al g

land

Pitu

itary

Ova

ryV

agin

a

Eso

phag

usS

kin

Pro

stat

eLC

L

Fib

robl

asts

Art

ery

Bre

ast

Ute

rus

Tes

tis

Ner

ve

Ad

ipo

se

Lung

Th

yro

id

MuscleHeartBlood

BrainSkin

Artery

TestisNerve

Adipose

Thyroid

0.05 0.250.15Distance between clusters

BloodLCLTestis

LiverMuscle

HeartBrain

PituitaryPancreas

KidneyFibroblasts

SkinAdrenal glandStomach

ColonOvary

VaginaEsophagus

ProstateArtery

UterusFallopian tube

NerveAdipose

BreastLung

Thyroid

0.1 <> 0.1 and < 1> 1 and < 10> 10 and < 100> 100 and < 1000> 1000 and < 10000

RPKM

Fig. 1. The GTEx multitissue transcriptome. (A) Gene expression levelsand number of tissues in which genes are expressed (>0.1 RPKM in atleast 80% of the samples). RPKMs are averaged over all genes expressedin a given number of tissues. (B) Sample and tissue similarity on thebasis of gene expression profiles. Left: Multidimensional scaling Right:Tissue hierarchical clustering. (C) Expression values from eight GTExtissues (colored circles) plotted radially along seven metagenes ex-tracted from expression data. Antemortem samples curated from theGene Expression Omnibus (GEO) cluster strongly with GTEx tissues. (D)

Transcriptome complexity. Bottom: Cumulative distribution of the av-erage fraction of total transcription contributed by genes when sorted frommost-to-least expressed in each tissue (x axis). Lines represent mean valuesacross samples of the same tissue, and lighter-color surfaces around themean represent dispersion calculated as the standard deviation divided bythe cumulative sum of all means.Top: Biological type and relative contributionto total transcription of the hundred most expressed genes. Height of thebars is proportional to the fraction that these genes contribute to totaltranscription.

on June 24, 2020

http://science.sciencemag.org/

Dow

nloaded from

Page 4: The human transcriptome across tissues and individuals · 1Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA. 2 Analytic and Translational Genetics Unit, Massachusetts

(fig. S17). These tend to be more tissue spe-cific, detected in fewer samples, and less con-served than previously annotated junctions(only 2.6% of novel junctions can be detectedas orthologous in mouse, compared to 65% forannotated junctions). Multidimensional scal-ing based on exon inclusion levels again large-ly recapitulates tissue type (Fig. 3A). However,samples from brain cluster as the primary out-group, supporting the existence of a distinct

splicing program in the brain (20). Furthermore,preferential gene expression of RNA-binding pro-teins and both differential and preferentialexon inclusion are enriched in the brain (figs.S18 and S19 and table S15). We found very fewexons exclusively included or excluded in a giventissue (fig. S20 and table S16), 40% of whichshow exclusive inclusion in the brain. We alsofound that micro-exons (<15 bp) are overwhelm-ingly used in the brain compared to other tis-

sues (Wilcoxon test, P < 1 × 10−7, Fig. 3B). Thispattern is not obvious in short exons longer than15 bp (P = 0.3, fig. S21). This observed brain-specific splicing pattern may result from differen-tial splicing in the cerebellum, because expressionclustering of the brain regions reveals a generalup-regulation of RNA-binding proteins specifi-cally in the cerebellum (Fig. 3C). This is also thebrain region exhibiting the largest proportion ofnovel splicing events (fig. S22).

SCIENCE sciencemag.org 8 MAY 2015 • VOL 348 ISSUE 6235 663

RESEARCH | REPORTS

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

Tissue variance

Invi

diua

l var

ianc

e

0.00 0.05 0.10 0.15 0.20 0.25 0.30

0.6

0.7

0.8

0.9

1.0

Tissue variance

GSTM1

FAM21B

RPL9

GSTT1

RHD

NPIPL2

HLA−DRB5

AC104135.3

HLA−DQB1−AS1

RP11−365P13.3

XIST

RPS28

AC104135.2

RP11−195E2.4

RP11−403B2.6

AC000032.2

0

2

4

6

8Sp Rho = 0.577 Sp Rho = 0.413

20 30 40 50 60 70Age (yr)

Artery

0

2

4

6

20 30 40 50 60 70Age (yr)

Exp

ress

ion

(RP

KM

)

Nerve

Exp

ress

ion

diffe

renc

e(lo

g [R

PK

M])

0.1

0.2

0.5

1.0

COL28A1

RP11−345M22.2RP11−302J23.1

Reg

ress

ion

coef

ficie

nt

0.001

0.002

0.005

0.010

0.020

Exp

ress

ion

diffe

renc

e(lo

g [R

PK

M])

Exp

ress

ion

(RP

KM

)

0.05

0.10

0.20

0.50

1.00

2.00

5.00

10.00Overexpressed

in malesOverexpressedin females

Overexpressed in AA Overexpressed in EA Decrease with age Increase with age

*

JPX

XIST

MMP3

MMP3

0 0

123456

African American

Exp

ress

ion

(RP

KM

)

123456

European American

0.3 0.4 0.5 0.6 0.7 0.80.00

0.05

0.10

0.15

0.68

0.650.49

0.70

050

100150200250300

Males Females

Protein codinglncRNAs

Chr YAutosomes

Pseudoautosomal regionChr X lincRNAs

Chromosome Y

**Protein coding

lncRNAs

Positive selection in European (CEU)Positive selection in Yaruba (YRI) or Bantu *

*Protein codinglncRNAs

PD or AD metabolic pathwayPD or AD GWAS

Heart

AdiposeBreastKidneySkinMuscleNerve

RP11-302J23.1 EDA2R

EDA2R

Chr X

Fig. 2. Gene expression across tissues and individuals. (A) Left: Con-tribution of tissue and individual to gene expression variation of PCGsand lncRNAs. Bottom right: Mean T SD over all genes (filled circles) andover genes with similar expression levels in PCGs and lncRNAs (unfilledcircles). Circle size is proportional to the sum of tissue and individualvariation, and segment length corresponds to 0.5 SD. Top right: geneswith high individual variation and low tissue variation. (B) Sex differen-tially expressed genes. Top: differentially expressed genes (FDR < 0.05)sorted according to expression differences between males and females.Genes in the Y chromosome are sorted according to the expression in

males. Bottom: MMP3 gene expression in males and females. (C) Genesdifferentially expressed with ethnicity. Top: differentially expressed genes(FDR < 0.05) between African Americans (AA) and European Americans(EA) sorted according to expression differences. A few of these genes liein regions reported to be under positive selection in similar populations.Bottom: expression of RP11-302J23.1. (D) Genes differentially expressedwith age. Top: Genes sorted according to the regression coefficient.Bottom: expression of EDAR2 gene in nerve and artery as a function ofage. Shaded area around the regression line represents 95% confidenceinterval.

on June 24, 2020

http://science.sciencemag.org/

Dow

nloaded from

Page 5: The human transcriptome across tissues and individuals · 1Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA. 2 Analytic and Translational Genetics Unit, Massachusetts

664 8 MAY 2015 • VOL 348 ISSUE 6235 sciencemag.org SCIENCE

−4 −2 0 2 4z score

BrainAdipose

PituitaryArtery

SkinBlood

HeartNerveLung

BreastPancreas

ThyroidEsophagus

Testis

FibroblastsStomach

Colon

Muscle

LCL

−1.0 −0.5 0.0 0.5 1.0

Phi

Gene expression contributionto isoform abundance

0

2

4

6

Den

sity

ESRP2ESRP1RBM4

KHDRBS2HNRNPH2

ELAVL4ELAVL2NOVA2

KHDRBS3NOVA1

HNRPLLSRSF10

PCBP1HNRNPA2B1

YBX1FUS

SRSF5HNRPDLHNRNPLHNRNPK

CELF2RBFOX1

HNRNPF

RBMXDAZAP1

TRA2BFMR1

PTBP2

MBNL1ELAVL1

SYNCRIPPTBP1CELF1TIAL1

SRRM1TIA1QKI

HNRNPCELAVL3

SRSF9SRSF3

HNRNPU

HNRNPMHNRNPA1

PCBP2

SF3B1HNRNPA3

TARDBP

RBM5SF1

SRSF2

SRSF6RBFOX2

TRA2AHNRNPH3

SFPQSRSF1

HNRNPH1RBM25

SRSF11ZRANB2

SRSF7KHSRP

KHDRBS1

SRSF4HNRNPA0

HNRNPD

Cerebellar hemisphereCerebellumCortexFrontal cortexHippocampusSubstantia nigraAnterior cingulate corteAmygdalaCaudateNucleus accumbensPutamenHypothalamusSpinal cord

Brain subregions

0.00 0.25 0.50 0.75 1.00

Between individual

Between tissue

−0.08 −0.06 −0.04 −0.02 0.00 0.02

0.04

Coordinate 1 (52%)

Coo

rdin

ate

2 (1

1.5%

)

Brain

Blood

LCL

Pituitary

Testis

Muscle

Heart Artery

Thyroid

Nerve

Skin

Adipose tissue

Breast

Lung

0.02

0.00

- 0.02

0.10 0.20 0.30 0.40 0.50 0.60 0.70

0.04

0.08

0.12

0.2 0.4 0.6 0.8

0.05

0.10

0.15

0.20

0.25

0.30

0.35

Indi

vidu

al v

aria

nce

Tissue variance

0.00 0.05 0.10 0.15 0.20

0.15

0.20

0.25

0.30

0.35

SNAP29TRMT1ID3

FAM127A

RNF144B

MIEN1

NDUFS5

RAD23A

HLA−C

TMSB4X

NAMPTL

RPL3RPS16

RPS12

EIF1B

FARSB

RPL23

RPL36

RPLP1RPS8

UBC

RPL29RPL9

RPL4

EIF3F

RPL12

RPL23A

RPL10ARPL39

RPSAP58

RPL17

RPS28

Tissue variance

Translation related

Isoform relative abundance (0.23)

Gene expression (0.65)

0.0

Fig. 3. Splicingacross tissue and individuals. (A) Multidimensional scalingof all samples on the basis of exon inclusion levels (Percent spliced in, PSI).(B) Microexon inclusion across tissues.Values of tissue exon inclusion closeto 1 (–1) indicate that the microexon is included (excluded), in nearly allsamples from the tissue, and excluded (included) in nearly all samples fromthe rest of the tissues.Tissues are sorted according to tissue exon inclusion(phi) median value. (C) Clustering of brain samples on the basis of thenormalized expression levels of 67 RNA binding proteins involved in splicing.

The order of samples and genes is obtained by biclustering the expressionmatrix. (D) Left: Contribution of tissue and individual to splicing variation inPCGs. Bottom right: Mean T SD of individual and tissue contributions tosplicing and to gene expression variation. Circle size is proportional to thesum of tissue and individual variation and segment length corresponds to0.5 SD. Top right: Genes with high splicing variation across individuals. (E)Contribution of gene expression to the between-individual and between-tissue variation in isoform abundance

RESEARCH | REPORTSon June 24, 2020

http://science.sciencemag.org/

Dow

nloaded from

Page 6: The human transcriptome across tissues and individuals · 1Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA. 2 Analytic and Translational Genetics Unit, Massachusetts

In contrast to gene expression, variation ofsplicing, measured either from relative iso-form abundance or exon inclusion, is similaracross tissues and across individuals, but ex-hibits a much larger proportion of residual un-explained variation (Fig. 3D, fig. S23, and tableS17). This could arise from nonadditive inter-actions between individuals and tissues, butmight also reflect stochastic, nonfunctional fluc-tuations that are more common in splicing thanin expression (21). Among the genes that showhigh interindividual splicing variability, we foundan enrichment of ribosomal proteins and genesrelated to translation and protein biosynthe-sis (Fig. 3D and table S18). Higher variabilitybetween individuals may also partially reflectan effect of ischemic time on splicing, whichwe observed when clustering samples by exoninclusion within each tissue (fig. S24).The abundance of splicing isoforms reflects

the actions of both primary transcription andposttranscriptional processing—mostly alter-native splicing. To determine the relative con-tribution of each process, we estimated theproportion of variance in isoform abundancethat can be simply explained by variance ingene expression. We found that gene expres-sion explains only 45% of the variance betweenindividuals, but 84% of the variance betweentissues (Fig. 3E and fig. S25). This stronglysuggests that primary transcription is the maindriver of cellular specificity, with splicing play-ing a complementary role. Although this maybe unexpected, given the magnitude of the ef-fect, it is consistent with recent findings of lowproteomic support for alternatively spliced iso-forms (22) and few shifts in major protein iso-forms across cell types (table S19) (23).

Overall, our results underscore the valueof monitoring the transcriptome of multipletissues and individuals in order to understandtissue-specific transcriptional regulation andto uncover the transcriptional determinantsof human phenotypic variation and diseasesusceptibility.

REFERENCES AND NOTES

1. FANTOM Consortium and the RIKEN PMI and CLST(DGT) et al., Nature 507, 462–470 (2014).

2. ENCODE Project Consortium, Nature 489, 57–74 (2012).3. T. Lappalainen et al., Nature 501, 506–511 (2013).4. E. Grundberg et al., Nat. Genet. 44, 1084–1089

(2012).5. T. J. Lonsdale et al., Nat. Genet. 45, 580–585 (2013).6. The GTEx Consortium, Science 348, 648–660 (2015).7. Materials and methods are available in the supplementary

materials on Science Online.8. D. S. DeLuca et al., Bioinformatics 28, 1530–1532

(2012).9. J. Harrow et al., Genome Res. 22, 1760–1774 (2012).10. M. Lukk et al., Nat. Biotechnol. 28, 322–324 (2010).11. A. C. Birdsill, D. G. Walker, L. Lue, L. I. Sue, T. G. Beach,

Cell Tissue Bank. 12, 311–318 (2011).12. J. P. Brunet, P. Tamayo, T. R. Golub, J. P. Mesirov, Proc. Natl.

Acad. Sci. U.S.A. 101, 4164–4169 (2004).13. P. Carninci et al., Genome Res. 10, 1617–1630 (2000).14. R. D. Kelly, A. Mahmud, M. McKenzie, I. A. Trounce,

J. C. St John, Nucleic Acids Res. 40, 10124–10138(2012).

15. D. Kelley, J. Rinn, Genome Biol. 13, R107 (2012).16. L. Carrel, H. F. Willard, Nature 434, 400–404 (2005).17. S. Djebali et al., Nature 489, 101–108 (2012).18. V. Regitz-Zagrosek, U. Seeland, Wien. Med. Wochenschr. 161,

109 (2011).19. M. Yan et al., Science 290, 523–527 (2000).20. G. Yeo, D. Holste, G. Kreiman, C. B. Burge, Genome Biol. 5, R74

(2004).21. J. K. Pickrell, A. A. Pai, Y. Gilad, J. K. Pritchard, PLOS Genet. 6,

e1001236 (2010).22. I. Ezkurdia et al., Mol. Biol. Evol. 29, 2265–2283 (2012).23. M. Gonzàlez-Porta, A. Frankish, J. Rung, J. Harrow, A. Brazma,

Genome Biol. 14, R70 (2013).

ACKNOWLEDGMENTS

We acknowledge and thank the donors and their families for theirgenerous gifts of organ donation for transplantation and tissuedonations for the GTEx research study. We thank the GenomicsPlatform at the Broad Institute for data generation; L. Gaffney forhelp with figures; E. Gelfand and C. Trowbridge for project supportand members of the Analysis Working Group for feedback; andD. MacArthur, J. Maller, and B. Neale for critical reading of themanuscript. The primary and processed data used to generatethe analyses presented here are available in the following locations:All primary sequence files are deposited in and available fromdbGaP (phs000424.v3.p1); gene and transcript quantificationsare available on the GTEx Portal (www.gtexportal.org). TheGenotype-Tissue Expression (GTEx) Project was supported bythe Common Fund of the Office of the Director of the NationalInstitutes of Health (http://commonfund.nih.gov/GTEx). Additionalfunds were provided by the National Cancer Institute (NCI);National Human Genome Research Institute; National Heart, Lung,and Blood Institute; National Institute on Drug Abuse; NationalInstitute of Mental Health; and National Institute of NeurologicalDisorders and Stroke. This work was supported by the followinggrants and contracts from the United States National Institutes ofHealth: contract HHSN261200800001E (Leidos Prime contractwith NCI); contracts 10XS170 [National Disease ResearchInterchange (NDRI)], 10XS171 (Roswell Park Cancer Institute),10X172 (Science Care Inc.), and 12ST1039 (IDOX); contract10ST1035 (Van Andel Institute); contract HHSN268201000029C(Broad Institute); R01 DA006227-17 (University of Miami BrainBank); R01 MH090941 (University of Geneva), EuropeanResearch Council, Swiss National Science Foundation, andLouis-Jeantet Foundation to E.T.D.; R01 MH090936 (Universityof North Carolina–Chapel Hill); and grants BIO2011-26205from the Spanish Ministerio de Ciencia e Innovación (MICINN),2014 SGR 464 and 2014 SGR 1319 from the Generalitat deCatalunya, and 294653 from the European ResearchCouncil–European Commission.

SUPPLEMENTARY MATERIALS

www.sciencemag.org/content/348/6235/660/suppl/DC1Materials and methodsFigs. S1 to S25Tables S1 to S20References (24–72)Data tables S4 to S7 and S9 to S18

20 October 2014; accepted 2 April 201510.1126/science.aaa0355

SCIENCE sciencemag.org 8 MAY 2015 • VOL 348 ISSUE 6235 665

Table 1. Genes with sex-biased and ethnicity-biased expression in GTEx tissues. Differentially expressed genes between males (M) and females (F) andbetween African American (AA) and European American (EA) in those tissues with at least 10 samples per group. Median fold change (on autosomal genes) wascalculated for tissues with more than two significant genes.

Sex Ethnicity

No. ofsamples

No. ofgenes Med.

foldchange

No. ofsamples

No. ofgenes Med.

foldchangeM F

Allgenes

PCGs lncRNAY

chrom.X

chrom.

Pseudo-autosomalregion

Autosomal AA EAAll

genesPCGs lncRNA

Adipose 76 37 45 29 16 28 4 1 12 2.8 17 95 0 0 0 –

Artery 88 57 40 21 19 29 2 0 9 2.3 21 121 3 0 0 2.9Blood 99 57 20 12 8 18 1 0 1 – 24 129 0 0 0 –

Breast 14 13 762 567 195 26 21 0 715 4.2 5 22 – – – –

Esophagus 27 11 23 14 9 22 1 0 0 – 5 33 – – – –

Heart 74 34 27 15 12 23 1 0 3 3 11 95 2 1 1 –

LCL 26 13 23 11 12 21 1 1 0 – 9 30 – – – –

Lung 76 43 34 17 17 31 1 0 2 – 14 104 4 3 1 3Muscle 87 51 42 27 15 24 6 0 12 2.7 18 117 2 2 0 –

Nerve 54 34 38 24 14 25 5 1 7 2.3 13 73 2 1 1 –

Skin 76 43 47 32 15 31 5 0 11 2.3 14 103 14 13 1 2.9Thyroid 65 40 42 25 17 27 5 1 9 1.9 13 90 13 7 6 3.5No. of samples 942 566 172 1307 35 27 8No. of individuals 111 64 847 628 219 41 34 1 771 24 148

RESEARCH | REPORTSon June 24, 2020

http://science.sciencemag.org/

Dow

nloaded from

Page 7: The human transcriptome across tissues and individuals · 1Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA. 2 Analytic and Translational Genetics Unit, Massachusetts

The human transcriptome across tissues and individuals

Ardlie and Roderic GuigóThe GTEx Consortium, Fred A. Wright, Tuuli Lappalainen, Miquel Calvo, Gad Getz, Emmanouil T. Dermitzakis, Kristin G.M Goldmann, Dmitri D. Pervouchine, Timothy J. Sullivan, Rory Johnson, Ayellet V. Segrè, Sarah Djebali, Anastasia Niarchou, Marta Melé, Pedro G. Ferreira, Ferran Reverter, David S. DeLuca, Jean Monlong, Michael Sammeth, Taylor R. Young, Jakob

DOI: 10.1126/science.aaa0355 (6235), 660-665.348Science 

, this issue p. 648, p. 660, p. 666; see also p. 640Scienceat how truncated protein variants affect expression across tissues.

lookedet al. measured how transcription varies among tissues, and Rivas et al.tissue-specific expression patterns. Melé individuals. They identified quantitative genetic traits that affect gene expression and determined which of these exhibitGenotype-Tissue Expression (GTEx) Consortium collected 1641 postmortem samples covering 54 body sites from 175 per genetic variant. In order to examine how genetic expression varies among tissues within individuals, thethe effects of this variation on the regulation of gene expression. Furthermore, only a few tissues have been examined

Human genomes show extensive genetic variation across individuals, but we have only just started documentingExpression, genetic variation, and tissues

ARTICLE TOOLS http://science.sciencemag.org/content/348/6235/660

MATERIALSSUPPLEMENTARY http://science.sciencemag.org/content/suppl/2015/05/06/348.6235.660.DC1

CONTENTRELATED

http://science.sciencemag.org/content/sci/348/6235/666.fullhttp://science.sciencemag.org/content/sci/348/6235/648.fullhttp://science.sciencemag.org/content/sci/348/6235/640.fullhttp://science.sciencemag.org/content/sci/348/6235/618.full

REFERENCES

http://science.sciencemag.org/content/348/6235/660#BIBLThis article cites 64 articles, 16 of which you can access for free

PERMISSIONS http://www.sciencemag.org/help/reprints-and-permissions

Terms of ServiceUse of this article is subject to the

is a registered trademark of AAAS.ScienceScience, 1200 New York Avenue NW, Washington, DC 20005. The title (print ISSN 0036-8075; online ISSN 1095-9203) is published by the American Association for the Advancement ofScience

Copyright © 2015, American Association for the Advancement of Science

on June 24, 2020

http://science.sciencemag.org/

Dow

nloaded from