Dissecting Neural Differentiation Regulatory Networks Through Epigenetic Footprinting

  • Upload
    nando93

  • View
    219

  • Download
    0

Embed Size (px)

Citation preview

  • 8/10/2019 Dissecting Neural Differentiation Regulatory Networks Through Epigenetic Footprinting

    1/16

    LETTER doi:10.1038/nature13990

    Dissecting neural differentiation regulatorynetworks through epigenetic footprintingMichael J. Ziller1,2,3*, Reuven Edri4*, Yakey Yaffe4, Julie Donaghey1,2,3, Ramona Pop1,2,3, William Mallard1,3, Robbyn Issner1,Casey A. Gifford1,2,3, Alon Goren1,5,6, Jeffrey Xing1, Hongcang Gu1, Davide Cacchiarelli1, Alexander M. Tsankov1,2,3,Charles Epstein1, John L. Rinn1,2,3, Tarjei S. Mikkelsen1, Oliver Kohlbacher7, Andreas Gnirke1, Bradley E. Bernstein1,5,6,Yechiel Elkabetz41 & Alexander Meissner1,2,31

    Models derived from human pluripotent stem cells that accuratelyrecapitulate neural development in vitroand allow for the genera-tion of specific neuronal subtypes are of major interest to the stemcell andbiomedicalcommunity.Notchsignalling, particularlythroughthe Notch effector HES5, is a major pathway critical for the onsetand maintenance of neural progenitor cells in the embryonic andadult nervous system13. Here we report the transcriptional and

    epigenomic analysis of six consecutive neural progenitor cell stagesderivedfrom aHES5::eGFPreporterhumanembryonic stem cellline4.Using this system, we aimed to model cell-fate decisions includingspecification, expansion and patterning during the ontogenyof cor-tical neural stem and progenitor cells.In order to dissect regulatorymechanisms that orchestrate the stage-specific differentiation pro-cess, we developeda computationalframeworkto infer keyregulatorsof each cell-state transition based on the progressive remodelling ofthe epigenetic landscape and then validated these through a pooledshort hairpin RNA screen. We were also able to refine our previousobservations on epigenetic priming at transcription factor bindingsites and suggest here that they are mediated by combinations ofcore and stage-specific factors. Taken together, we demonstrate theutility of our system and outline a general framework, not limited

    to the context of the neural lineage, to dissect regulatory circuits ofdifferentiation.

    Weusedthe human embryonicstem(ES)celllineWA9 (alsoknownas H9) expressing GFP under the HES5promoter4 to isolate definedneural progenitorpopulationsof neuroepithelial(NE), early radial glial(ERG),mid radialglial (MRG) andlateradial glial (LRG) cellsbased ontheir cell morphology and Notch activation state5, as well as long-termneuralprogenitors (LNP)based on theirepidermalgrowthfactor recep-tor (EGFR) expression5,6 (Fig. 1a and Extended Data Fig. 1a). We tookthese definedstagesto createstrand-specificRNA sequencing (RNA-seq)data,chromatinimmunoprecipitationfollowedby sequencing(ChIP-seq)maps forhistoneH3 lysine 4 monomethylation (H3K4me1), trimethy-lation (H3K4me3), lysine 27 acetylation (H3K27ac) and H3K27me3 aswell as DNA methylation (DNAme) data by whole-genome bisulphite

    sequencing (WGBS) for the first four stages, and reduced representa-tionbisulphite sequencing (RRBS)for thelasttwo (LRG andLNP)stages(Fig. 1a and Supplementary Table 1).

    Global transcriptional analysis of the undifferentiated ES cells andthefirst four neural progenitorcell (NPC) stages identified 3,396 differ-entially expressed genes (Extended Data Fig. 1b, c and SupplementaryTable 2). Pluripotency-associated genes such asOCT4(also known asPOU5F1) andNANOGare, as expected, rapidly downregulated, andpan-neural genes are induced early and maintained throughout theremainder of the differentiation time course (Extended Data Fig. 1c).

    Using data from themouse Allen BrainAtlas as anin vivo reference forgenes expressed in different brain compartments and developmentalstages, we observed a consecutive shift of expression signatures alongthe NPC differentiation trajectory (Fig. 1b). NE through LRG tran-scripts suggest anterior neural fates, while the MRG and LRG stagesshow in addition some posterior identities (Fig. 1b, left). Accordingly,differentiated progeny derived from these populations express deep

    cortical layer neuronal markers (NEdN and ERGdN) such as FEZF2and BCL11B and superficial layer neuronal markers (MRGdN) such asPOU3F2/POU3F3 and MEF2C (Extended Data Fig. 1d). Progressionfrom early (NE) to late (LRG) stages was also accompanied by a trans-ition from predominantly neurogenic to mainly gliogenic potential,although LRGcells stillgenerate neurons (ExtendedData Fig.1d). Thisprogressive change in NPC identity aligns well with thein vivo order ofdevelopmental events7.

    In line with these observations, our WGBS data show changes inDNAme that can be separated into two overall patterns. The first ischaracterized by widespread loss of methylation and retention of theresulting hypomethylated state throughout subsequent differentiationstages (Fig. 1c, top right). This pattern coincides with major cell-fatedecisions such as commitment from ES cells to the neural fate and the

    transition from ERG to MRG, thelatter demarcating both peak of neu-rogenesis and onset of gliogenic potential (Fig. 1c, right middle). Thesecond pattern is defined by a stage-specific loss with subsequent gainat thenextstage,as observedduring the transitionfromNE to ERG andalso from MRG to LRG (Fig. 1c, right). Conversely, regions gainingDNAme during transition from one stage to another frequently residein a hypomethylated state in allpreceding stages,indicatingthe possiblesilencing of stem cell or pan-neural gene regulatory elements (Fig. 1c,left). At the histone modification level we also observed the most wide-spread changes during the initial neural induction (Fig. 1d); although itis worthnoting that thebiggest gain of therepressivemarkH3K27me3occurs at the MRG stage.

    These coordinated epigenetic changes are probably the result of dif-ferential transcriptionfactor activity811. Wetherefore developeda com-

    putational method to attribute the genome-wide changes in histonemodifications and DNAme at regions termed footprints to particulartranscription factors and quantified this remodelling potential (TERA,transcription factor epigenetic remodelling activity; Fig. 2a, ExtendedData Fig. 2a and Methods). Notably, theH3K27ac peak set in our NPCmodel was significantly enriched for single nucleotide polymorphismspreviously reported to be implicated in Alzheimers disease (P#0.01)and bipolar disorders (P#0.01) by genome-wide association studies,suggesting the possibility to utilize this differentiation system as a basisto study the genetic component of complex diseasesin vitro12,13. Next,

    *These authors contributed equally to this work.

    1These authors jointly supervised this work.

    1Broad Institute of MIT and Harvard, Cambridge, Massachusetts02142, USA. 2Harvard Stem Cell Institute, Cambridge, Massachusetts02138, USA.3Departmentof StemCell and RegenerativeBiology,

    HarvardUniversity, Cambridge, Massachusetts 02138, USA. 4Department of Celland DevelopmentalBiology, Sackler School of Medicine, TelAviv University,RamatAviv 6997801, Israel. 5Departmentof

    Pathology,MassachusettsGeneral Hospitaland HarvardMedical School,Boston, Massachusetts02114, USA. 6Centerfor SystemsBiology andCenterforCancerResearch,MassachusettsGeneralHospital,

    Boston, Massachusetts 02114, USA. 7Applied Bioinformatics, Center for Bioinformatics and Quantitative Biology Center, University of Tubingen, Tubingen 72076, Germany.

    0 0 M O N T H 2 0 1 4 | V O L 0 0 0 | N A T U R E | 1

    Macmillan Publishers Limited. All rights reserved2014

    http://www.nature.com/doifinder/10.1038/nature13990http://www.nature.com/doifinder/10.1038/nature13990
  • 8/10/2019 Dissecting Neural Differentiation Regulatory Networks Through Epigenetic Footprinting

    2/16

    to identify potential key regulators of onset, maintenance and trans-ition through distinct NPC populations, we ranked all motifs and theirassociated transcription factors based on their TERA scores betweenconsecutive time points (Supplementary Table 3). We then retrievedthe transcription factors associated with highest scoring 40 motifs foreach cell-state transition (Fig. 2b). This analysis confirmed many well-known key regulators ofin vivo neural development and forebrain

    specification that are induced at theNE stage such asPAX6, OTX2 andFOXG1 (refs 1416)as well as various SOX proteins17. Notably, we alsofoundpredicted differential activity of distinct downstream componentsof signalling pathways such as a decrease ofSMAD4activity at the NEstage, consistent with inhibition of TGF-b signalling that promotesneural induction18. Another example that is predicted to be relevantbut not limited to the MRG stage isPOU3F2, known to be involved in

    ES cell

    NE

    ERG

    MRG

    H3K27ac

    ELP4 PAX6

    Footprints

    Genes

    a

    b

    MZF1

    GLIS3

    SF1

    SREBF1

    REST

    SMAD4

    SREBF2

    NR1H2

    ZNF263

    ZIC3

    YY1

    TCF12

    P

    OU2F2

    NFKB1

    MAF

    MAFF

    ZBTB33

    ATOH1

    NFYC

    MECP2

    STAT3

    NR2C2

    ETS1

    ZBTB7A

    E2F1

    ELF1

    ZFP42

    NR2F6

    CTF1

    NR4A1

    NFYA

    NFIB

    SMAD3

    STAT1

    NE

    UROD1

    TFAP4

    NFIX

    FOXO1

    DMRT3

    MYBL2

    FOXG1

    RBPJ

    SATB1

    RFX1

    ARID5B

    STAT5B

    SOX15

    TGIF1

    NR4A2

    MSX1

    ZBTB16

    TEF

    OTX2

    MEIS3

    MEIS1

    SOX9

    SOX5

    NFATC1

    TEAD3

    SOX2

    TCF4

    P

    OU3F2

    LEF1

    RFX4

    PAX6

    RFX5

    62 kb

    ES cell

    NE

    ERG

    MRG

    0 1010

    TERA score

    Figure 2| Distinct transcription factors are

    associated with stage-specific epigenetictransitions. a, Illustration of epigenomicfootprinting across thePAX6locus(chromosome 11: 31,780,01431,842,503) for dipsin H3K27ac regions (right). Black boxes highlightfootprints determined for H3K27ac peaks thatharbour various putative transcription factorbinding sites based on motif matching. b, The 40top ranked transcription factors predicted to beactivated during the cell-state transition areindicated on the bottom. Colour-coding representsnormalized transcription factor epigeneticremodelling scores, averaging overall TERAs basedon H3K4me3, H3K4me1, H3K27ac and DNAme.In addition, predictions were filtered for factorsexpressed at the stage of predicted induction.

    a

    b

    0.0

    1.5

    c

    DNAmeH3K27me3H3K4me1

    H3K27acH3K4me3

    Num

    bero

    freg

    ions

    104

    Gain Gain GainLoss Loss Loss

    0.0

    1.00.0

    1.00.0

    1.00.0

    1.0

    ES cell

    NE

    ERG

    MRG

    LRG

    LNP

    0.0

    1.0n=4,429

    n=2

    21

    n=1,211

    n=1,083

    n=293

    n=7,391

    n=1

    36

    n=4,953

    n=2,819

    n=866

    DNAme gain DNAme loss

    ESce

    llNE

    ERG

    MRG

    LRG

    LNP

    ESce

    llNE

    ERG

    MRG

    LRG

    LNP

    1.4 Mb

    SOX2 FLJ46066SOX2-OS

    DNAme

    RNA-seq

    Genes

    SOX2SOX2-OS

    100 kb

    ES cellNE

    ERGMRGLRG

    NEdNERGdNMRGdN

    LRGdA

    RSP

    Te

    l

    PHy

    p3

    p2

    p1M

    PPH

    PH

    PMH

    MH

    11

    .5

    13

    .5

    15

    .5

    18

    .5 4 14

    28

    Brain structures Developmental time (days)

    ES cells

    Neuroepithelial

    Early radial glial

    Mid radial glial

    d

    NPCs

    Differen

    t.

    0

    1

    2

    3

    H3K27ac

    OTX2

    Allen Brain Atlas for mouse

    H3K4me3

    H3K4me1

    H3K27me3

    H3K27ac

    ES cell NE NE ERG ERG MRG

    M

    eanexpress

    ion

    1.5

    Figure 1| Consecutive stages of ES-cell-derived neural progenitors arecharacterized by distinct epigenetic states. a, Left, schematic of the cellsystem. Middle, normalizedread-count levelfor H3K27ac over a 1.4-megabase(Mb) region around the SOX2locus (chromosome3: 180,854,252182,259,543) where SOX2-OSrefers to the SOX2overlapping transcript. ChIP-seq read counts were normalized to 1 million reads and scaled to the samelevel (1.5) for all tracks shown. Right, additional tracks for H3K4me3,H3K4me1 and H3K27me3 as well as DNAme (scale 0100%), OTX2bindingand expression covering a 100 kilobase (kb) sub-region (chromosome3:181,389,523181,490,148) of this locus. Histone and RNA-seq data werenormalized to 1 million reads and are shown on distinct scales. b, Maximumgene set activity levels shown aszscores for genes expressed in defined brain

    structures (left) and developmental time points (right) based on the mouseAllen Brain Atlas.Geneset activity wasdefinedas average expression level of allmember genes followed byzscore computation across all nine cell types.

    Different.,differentiated; LRGdA, LRG-derived astrocyte-likecells; RSP,rostralsecondary prosencephalone; Tel, telencephalon; PHy, peduncular (caudal)hypothalamus; p3, hypothalamus; p2, pre-thalamus; p1, pre-tectum; M,midbrain; PPH, prepontine hindbrain; PH, pontine hindbrain; PMH,pontomedullary hindbrain; MH, medullary hindbrain. Developmental timesare embryonic days 11.5, 13.5, 15.5 and 18.5 and postnatal days 4, 14 and 28.c, Distribution of DNAme levels for differentially methylated regions (changein methylation$0.2,P# 0.01) across state transitions; for instance,distributions for regions gaining methylation in the transition from ES cell toNE (top left) at all stages of differentiation. Distinct methylation level traceplots are shown for regions gaining methylation (left) during the specifictransitions(indicated on the side)and lossof methylation(right). Black labelled

    samples are based on WGBS data and grey colour samples (LRG and LNP)were profiled by RRBS.d, Bar plot showing the number of regions that gain orlose selected modifications across the first four cell-state transitions.

    RESEARCH LETTER

    2 | N A T U R E | V O L 0 0 0 | 0 0 M O N T H 2 0 1 4

    Macmillan Publishers Limited. All rights reserved2014

  • 8/10/2019 Dissecting Neural Differentiation Regulatory Networks Through Epigenetic Footprinting

    3/16

    sub-ventricular zone expansion and superficial layer neuronal specifi-cation, and TCF12, whichis highlyexpressed in germinal zones duringbrain development19 (Fig. 2b and Supplementary Table 3).

    To obtain a higher-level overview of the processes and roles assoc-iated withthe distinctputativeregulators, we decomposed theH3K27acdata into seven distinct modules, each corresponding to a unique epi-genetic dynamic, genomic regionand upstream regulatorset (ExtendedDataFig. 2b,top). Gene set enrichment analysis20onthegenomicregions

    associated with each of the distinct modules revealed that the moduleactivatedupon neuralinductionand sustainedthroughout theMRG stageis strongly associated with stem cell maintenance and differentiation-related processes as well as Notch signalling (Extended Data Fig. 2b;module 2). Further analysis of upstream regulators of this module re-

    vealed a strong association withPAX6andFOXG1, suggesting a rolefor these factors in the general establishment and maintenance of thetelencephalic cortical identity of theNPC states (Extended Data Fig. 2c).

    To explore the relevance of predicted factors for each cellular state,we carried out a pooled short hairpin RNA (shRNA) screen against244 transcription factors and epigenetic modifiers selected based on ourRNA-seqdata (Fig. 3a,ExtendedDataFig. 3aandSupplementaryTable4).In total, we recovered 110 factors whose knockdown had a significant(Fig. 3b,qvalue# 0.05, mean empirical false discovery rate (FDR)5

    0.045, see Methods) negative impact on the number of HES5

    1

    cells inat least one differentiation stage (Supplementary Table 4), with highoverlapbetween thedistinct stages (Fig. 3c and ExtendedDataFig. 3b).Despite the expected high false-negative rate21 our screen consistently

    validated more than 50% of the predicted transcription factors with aknown motif for the top 20 motifs found at each stage (Fig. 3d andExtended Data Fig. 3c, d), while an expression-based identificationyielded only,30% recovery (Extended Data Fig. 3c). Among the topfactors recovered from the predictions at the early stage (NE and ERG)are the RFX proteins including RFX4, which has been implicated incortical and brain development22,23,FOXG1, as well asNR2F2, whoseparalogue NR2F1 has beenshown toserve asan intrinsic factor for earlyregionalization of the neocortex24,25. Gene set enrichment analysis ofputative genomic targets of NR2F2 (see Methods) in the NE cells fur-ther expands this role, suggesting involvement in telencephalon, dien-cephalon and posteriorhindbrain development (Supplementary Table5).At the MRGstage, we recovergenes involved in extensive neurogenesisandin commencing early gliogenesis such as NFIA and NFIB, which areinvolvedin bothrepressing the neuronalprogenitor statethroughNotchsignalling concomitantly with activating glial fates26, aswell as RESTa major pleotropic epigenetic regulator of neural cell-fate decisions27.

    Next, weselecteda setof 22 core factors withevidence to befunctionalat all stages as assessed by RNA-seq and the shRNA screening results(Extended Data Fig. 4a and Methods). In order to determine whetherthesubsetof corefactors with a DNA binding motifavailable(10 of22)exertsthe same function at each stage,we performed a co-binding analy-sis based on the predicted binding sites of 523 transcription factors indynamically regulated distal H3K27ac footprints. This analysis uncov-ered highly stage-specific relationships that were alsosupported by theobserved knockdown effect at each stage (Fig. 4a and Extended DataFig. 4b). Notably, most of the identified co-binding partners are eitherexpressed in a more stage-specific fashionor are onlyactivated in moremature neuronalor glial cell types (Fig. 4b). To furthervalidate some ofthese findings,we focused onOTX2 due to its high expression in allNPCpopulations (Fig. 4b) and performed ChIP-seq at the NE and MRGstages. OTX2 wasenriched at more targets in NE cells, of which around35% overlapped with MRG-bound sites (Fig. 4c and Extended DataFig. 4c). The shared target set is highly enriched for genes involved instem cell maintenance and differentiation as well as various pro-neuralgene sets knownto act duringadvanced stagesof forebrainand midbrainprogenitor cell maturation (Fig. 4d and Extended Data Fig. 4d). Thisbinding pattern combined with the observation that theOTX2targetgene set reaches peak transcriptional activity in the NEdN and ERGdN

    populations implies a role forOTX2 in the preparation of pro-neural

    genes expressed at later stages (Fig. 4d, e). These findings further suggesta model where a core set of transcription factors helps sustain NPCidentity throughout the differentiation time courseand at the same timeparticipates in the progression and modulation of NPC differentiationpotential through cooperation with stage-specific regulators.

    To gain a better understanding of how factors that are active at dis-tinct NPC stages contribute to their corresponding neuronal and glialcell propensities, we took advantage of the fact that many transcrip-tion factor binding sitesexhibita gain ofH3K4me1 andloss ofDNAmeat the early NPC stages before increased expression of their associatedgenes in more differentiated cell types (hereafter referred to as epige-netic priming) (Fig. 5a and Extended Data Fig. 5ac). For instance, weidentified three pro-neural factors that show evidence of priming, areinduced only at a later stage, and possess transcription factor bindingsites that are also significantly (P# 0.05 permutation test) associatedwith genes differentially expressed at a later stage (Fig. 5a, bold genes).Because these pro-neural genesare notexpressed at theearly NPCstagesbut in more mature cell types derived upon mitogen withdrawal, theidentification of such primingeventshighlightsthat theepigeneticstate

    is useful forpredicting regulatorsrelevant at later stages ofdifferentiation.

    HES5+

    depleted

    NE ERG

    20

    12

    9

    23

    4

    22

    20

    MRG GBX2

    ZIC1

    CUX2GFPCHD7

    ZBTB16DMRTA1

    SOX1

    ARNT2

    LIN28B

    TBR1

    RFX4

    NR2F2

    ATF5

    POU3F2

    ATF3

    YY1MEIS3

    FOXG1REST

    ZBED1

    ZEB1

    NFIB

    TCF4

    EOMES

    EMX2

    LHX5

    ZEB2

    a

    NE

    ERG

    MRG

    Viralinfection

    24 h 5 days

    RFX5

    PBX1

    PBX2

    NE

    ERG

    MRG

    gDNA

    Ctrl/HES5

    depleted

    Seq:

    HES5+

    maintenance/

    progression

    HES5/Ctrl

    HES5+proliferation

    differentiation

    c d

    b

    TERANE

    TERAERG

    TERAMRG

    NEERGMRG

    NEERGMRG

    NEERG

    MRG

    0 20 40 60

    Percentage recovered

    Ctrl

    +

    +

    +

    KD

    0 11

    Depletion score

    Figure 3| A pooled shRNA screen recovers predicted regulators ofin vitroNPC differentiation. a, Simplified schematic of thepooled shRNA screen(seeExtended Data Fig. 3 for more details). Ctrl, control; gDNA, genomic DNA;KD, knockdown; Seq., sequencing.b, Depletion scores for all genes that aresignificantlyreduced(qvalue# 0.05 forat least twodifferentshRNAs pergene)in at least one stage for fluorescence-activated cell sorting (FACS)-purifiedHES51 cells 6 days after knockdown compared to FACS sorted HES52

    obtained from the same infection or compared to cells collected 24h afterinfection (see Extended Data Fig. 3a). Depletion score indicates the extent towhich shRNAs targeting a particular gene were lost during the knockdownperiod relative to thecontrol,indicating potentialrelevance of a particular genefor HES51 maintenance, NPC state progression and proliferation or cell

    survival. Higher depletion scores (red) indicate stronger reduction in shRNApresence; scores were capped at 1 and computed based on at least threetechnical replicates per condition.c, Overlap of genes detected to besignificantly depleted in the HES51 population relative to at least one of thecontrol conditions. d, Performanceof combined regulatorpredictions basedonTERA ranking averaged over H3K4me3, H3K4me1, H3K27ac and DNAme.Performance is measured as percentage of the top 20 predicted activating orrepressing motifs for each stage mapping to transcription factors included inthe shRNA library.

    LETTER RESEARCH

    0 0 M O N T H 2 0 1 4 | V O L 0 0 0 | N A T U R E | 3

    Macmillan Publishers Limited. All rights reserved2014

  • 8/10/2019 Dissecting Neural Differentiation Regulatory Networks Through Epigenetic Footprinting

    4/16

    NE

    a

    c d

    b

    Stem cell differentiation

    Stem cell development

    Stem cell maintenance

    Somatic stem cell maintenance

    Neural tube patterningDorsal/ventral axis specication

    Cell proliferation in forebrain

    Spinal cord neuron diff.

    Corpus callosum

    Forebrain of neurons

    Midbrain development

    Midbrainhindbrain boundary

    Negative regulation of neuron diff.

    Notch signalling pathway

    Negative regulation of FGF

    Non-canonical Wnt

    Wnt signalling

    Regulation of timing of cell diff.

    Spinal cord

    Cerebellum primordium

    Future midbrain

    Diencephalon

    Future prosencephalon

    Rhombomere

    ES cell

    NE

    ERG

    MRG

    LRG

    NEdN

    ERGdN

    MRGdN

    LRGdA

    NE

    MRG

    e

    Corenetwork

    ESce

    ll

    NE

    ERG

    MRG

    LRG

    NEdN

    ERGdN

    MRGdN

    LRGdN

    MRG

    NE

    ERG

    MRG

    E2F

    4

    PAX6

    RFX5

    CREB3

    OTX2

    NR2F2

    THRA

    PBX2

    ATF5

    CREM

    TEAD

    3

    ZBTB

    7B

    TGIF1

    YY1

    TWIST1

    JDP2

    ATF3

    NR4A1NR6A1ATF4SOX21EGR2MYBL2

    JUNET

    V4

    TFAP2B

    NFIL3

    ZBTB16

    GLI3

    NEUROD4

    NE

    UR

    OG

    2

    TBR1

    OTX1

    LHX

    2ZIC3

    OLIG2

    SREBF1

    PBX1

    NFKB

    1ZN

    F423NR

    3C1N

    FIANF

    IBFOX

    O3KLF13

    CUX2

    MAX

    MAF

    CTNNB1

    TCF4

    ZEB1

    ZNF148

    RFX4

    POU3F2

    MEIS

    2

    NR2F1

    CENPB

    MEIS3

    MYBL1

    GBX2

    SOX8 E2F4

    PAX6RFX5CREB3OTX2NR2F2

    THRAPBX2ATF5CREMTEAD3ZBTB7BTGIF1

    YY1TWIST1JDP2ATF3NR4A1NR6A1ATF4SOX21

    EGR2MYBL2JUNETV4TFAP2BNFIL3ZBTB16

    GLI3NEUROD4NEUROG2TBR1OTX1LHX2ZIC3

    OLIG2SREBF1PBX1NFKB1ZNF423NR3C1NFIANFIB

    FOXO3KLF13CUX2MAXMAFCTNNB1TCF4

    ZEB1ZNF148RFX4POU3F2MEIS2NR2F1CENPB

    MEIS3MYBL1GBX2SOX8

    NE MRG

    0 22

    Expression

    0 1.51.5

    Expression

    6 140

    log10

    qvalue

    7,556 1,873 3,479

    Figure 4| A set of core transcription factorsdynamically associates with stage-specific factorsto modulate NPC identity and differentiationpotential. a, Predicted significant (P# 0.01,enrichment$ 1.5) co-binding relationships indynamically regulatedH3K27ac footprints for a setof 10 transcription factors (bold, core network)required by HES51 cells in at least two stages.Stage-specific predicted co-binding relationships

    are indicated in blue (NE), red (ERG) and grey(MRG). All predicted relations were filtered forsupport by a knockdown effect of each gene at therelevant stage.b, Gene expression patterns shownaszscores for the core network transcriptionfactors as well as all predicted co-binding partnersacross ES cells, all NPCs and more mature cellularstates.c, Venn diagram showing the overlap ofOTX2binding sites determined by ChIP-seq inearly NE and MRG cells.d, Gene set enrichmentanalysis results forOTX2binding sites in early NEand MRG cells.e, Median expression patterns forES cells, allNPCsand more maturecellpopulationsshown aszscores for putative downstream targetgenes ofOTX2binding sites.

    NEERGMRGLR

    GNE

    dN

    ERGdN

    MRG

    dN

    LRGdA

    EScell

    E

    ScellvsNE

    N

    EvsERG

    E

    RG

    vsMRG

    TERA

    a b

    NE

    ERG

    MRG

    1 2 3 4 5

    H3K27acH3K4me1

    Expression

    NEdN

    ERGdN

    MRGdN

    LRGdA

    NE

    ERG

    MRG

    ES cell

    LRG

    NEUROD4expression

    zscore1 0 1

    2 0 2

    zscore

    1 2 3 4 5

    ES cell

    H3K27ac

    H3K4me1

    H3K4me3

    DNAme1010

    TERA/expression

    BARHL1

    TCF4

    FOXG

    1

    SOX2

    NR2F6

    TCF7L1SOX21PA

    X6

    IRF3

    OTX

    2RFX5

    CTNN

    B1EMX2

    TEAD3

    E2F1

    PEBP1

    NFIX

    ASCL2

    NEUROD4

    Figure 5| Binding of core and stage-specific NPC transcription factors isassociated with epigenetic priming of pro-neural genes. a, Characterizationof transcription factors associated with motifs gaining H3K4me1 or losingDNAme at theNE stage beforetheirexpression at a later or more differentiatedcell state as determined by high TERA scores (bold), termed priming. Inaddition, significant (P# 0.01, enrichment$ 1.5) co-binding relationshipswith factors expressed at the NE stage are indicated by coloured lines. For eachtranscription factor (from outer to inner circles, see example to the right forNEUROD4) heat maps indicating the relative expression level as azscore in all

    cell types as well as normalized TERA scores for H3K27ac, H3K4me3,H3K4me1 and DNAme.b, Top, heat maps depicting the H3K4me1 (left) andH3K27ac (right) enrichment levelfor predicted NEUROD binding sitesat eachNPC stage for five distinct dynamic patterns. At the NE and ERG stages,none of the NEUROD family of proteins is expressed at high levels(,3.5 fragments per kilobase of transcript per million mapped reads). Bottom,heat map showing the zscores of the median gene expression levels forpredicted NEUROD downstream target genes for each of the five dynamicpatterns in the more mature neuron- and astrocyte-like populations.

    RESEARCH LETTER

    4 | N A T U R E | V O L 0 0 0 | 0 0 M O N T H 2 0 1 4

    Macmillan Publishers Limited. All rights reserved2014

  • 8/10/2019 Dissecting Neural Differentiation Regulatory Networks Through Epigenetic Footprinting

    5/16

    In order to pinpoint transcription factors potentially involved in facil-itating these priming events at therespective NPC stages, we determinedsignificant predicted co-binding relationships between the subset ofpro-neuraltranscription factorsand factorsthat in contrast are expressedat the stage of priming (Fig. 5a).

    To specifically investigatethe hypothesisthat a partof thepro-neuralbinding site landscape is epigenetically primed at the NPC stages, wefocused on predicted NEUROD protein family binding sites within

    H3K27acfootprintsand definedfive patterns of H3K27acandH3K4me1enrichments acrossthese sites (Fig. 5b). We found thatgenes associatedwith predicted NEUROD binding sites in regions gaining H3K27ac orH3K4me1 enrichment at distinct stages of NPC progression are upre-gulated in more mature populations derived from the respective NPCstage (Fig. 5b and Extended Data Fig. 5d). Consistent with theidea of acomprehensive preparation of the epigenetic landscape during lineagespecification,NEUROD bindingsites thatretain high levelsof H3K27acand H3K4me1 throughout the entire differentiation time course areassociated withvarious anterior and posterior cortical structures as wellas early and late developmental time points (Extended Data Fig. 5e).

    These results support a model where selected transcription factorsatthe NPC stage remodel the binding site repertoire for pro-neural fac-tors by preparing the epigenetic landscape at their respective targets.First the general lineage landscape is established upon commitment tothe neural fate, followed by the stage-specific modulation of primedpro-neural binding sites. This in turn might serve as a mechanism torestrict their binding space in order to ensure proper neuronaland glialdifferentiationcapacity. In additionto these insightsinto the epigeneticdynamicsduring differentiation, we provide a generalanalysis strategyto interpret differences in epigenetic landscapesbased on cell-fate regu-latory transcription factors. Thisstrategy canbe readilyapplied to otherdata sets including the extensive collection of the NIH RoadmapEpigenomics Project (Supplementary Table 3).

    Online ContentMethods, along with any additional Extended Data display itemsandSource Data, areavailablein theonline versionof thepaper; references uniqueto these sections appear only in the online paper.

    Received 19 November 2013; accepted 21 October 2014.

    Published online 24 December 2014.

    1. Imayoshi, I., Sakamoto, M.,Yamaguchi,M., Mori, K. & Kageyama, R. Essentialrolesof Notchsignaling in maintenance of neural stem cells in developing and adultbrains.J. Neurosci. 30, 34893498 (2010).

    2. Shimojo, H., Ohtsuka, T. & Kageyama, R. Dynamicexpression of notch signalinggenes in neural stem/progenitor cells.Front. Neurosci. 5, 78 (2011).

    3. Carlen,M. et al.Forebrain ependymal cells are Notch-dependent and generateneuroblasts and astrocytes after stroke.Nature Neurosci. 12,259267 (2009).

    4. Placantonakis, D. G. et al.BAC transgenesis in human embryonic stem cells as anovel tool to define the human neural lineage. Stem Cells27,521532 (2009).

    5. Elkabetz, Y.et al. Human ES cell-derived neural rosettes reveal a functionallydistinct early neural stem cell stage. Genes Dev.22,152165 (2008).

    6. Lafaille, F. G. et al. Impaired intrinsic immunity to HSV-1 in human iPSC-derivedTLR3-deficient CNS cells.Nature 491, 769773 (2012).

    7. Lui, J .H. etal. Developmentand evolution of thehuman neocortex. Cell 46, 1836(2011).

    8. Voss,T. C. & Hager, G. L. Dynamicregulationof transcriptionalstatesby chromatinand transcription factors.Nature Rev. Genet.15, 6981 (2014).

    9. Ziller, M. J. et al. Charting a dynamic DNA methylation landscape of the humangenome. Nature 500,477481 (2013).

    10. Gifford, C.A. etal. Transcriptionaland epigenetic dynamicsduring specification ofhuman embryonic stem cells.Cell153,11491163 (2013).

    11. Arnold, P. et al. Modeling of epigenome dynamics identifies transcription factorsthat mediate Polycomb targeting. Genome Res. 23, 6073 (2012).

    12. Maurano, M. T. et al.Systematic localization of common disease-associatedvariation in regulatory DNA.Science337, 11901195 (2012).

    13. Claussnitzer, M.et al.Leveraging cross-species transcription factor binding sitepatterns: from diabetes risk loci to disease mechanisms.Cell 156, 343358(2014).

    14. Gotz, M., Stoykova, A. & Gruss, P. Pax6controls radial glia differentiation in thecerebral cortex. Neuron21,10311044 (1998).

    15. Hanashima,C., Li,S. C., Shen, L.,Lai,E. & Fishell,G. Foxg1 suppresses early corticalcell fate.Science303, 5659 (2004).

    16. Martinez-Barbera, J. P.et al.Regionalisation of anterior neuroectoderm and itscompetence in responding to forebrain and midbrain inducing activities dependon mutual antagonism between OTX2 and GBX2.Development128, 47894800(2001).

    17. Pevny, L. H., Sockanathan, S., Placzek, M. & Lovell-Badge, R. A role for SOX1 inneural determination. Development125, 19671978 (1998).

    18. Chambers, S. M. et al.Highly efficient neural conversion of human ES and iPScells by dual inhibition of SMAD signaling.Nature Biotechnol. 27,275280(2009).

    19. Uittenbogaard, M. & Chiaramello, A. Expression of the bHLH transcription factorTcf12(ME1) gene is linked to the expansion of precursor cell populations duringneurogenesis.Gene Expr. Patterns 1, 115121 (2002).

    20. McLean, C. Y.et al. GREAT improves functional interpretation of cis-regulatoryregions.Nature Biotechnol.28,495501 (2010).

    21. Sims, D. et al.High-throughput RNA interference screening using pooled shRNAlibraries and next generation sequencing. Genome Biol.12,R104 (2011).

    22. Blackshear, P. J. et al.Graded phenotypic response to partial and completedeficiency of a brain-specific transcript variant of the winged helix transcriptionfactor RFX4. Development130,45394552 (2003).

    23. Zarbalis, K. et al. A focused and efficient genetic screening strategy in the mouse:identification of mutations that disrupt cortical development. PLoS Biol. 2, e219(2004).

    24. Zhou, C., Tsai, S. Y. & Tsai, M. J. COUP-TFI: an intrinsic factor for earlyregionalization of the neocortex.Genes Dev. 15,20542059 (2001).

    25. Faedo, A. et al.COUP-TFI coordinates cortical patterning, neurogenesis, andlaminar fate and modulates MAPK/ERK, AKT, andb-catenin signaling. Cereb.Cortex18, 21172131 (2008).

    26. Piper, M.et al.NFIA controls telencephalic progenitor cell differentiation throughrepression of the Notch effectorHes1.J. Neurosci. 30,91279139 (2010).

    27. Qureshi, I. A.,Gokhan, S. & Mehler,M. F. REST andCoRESTare transcriptionalandepigenetic regulators of seminal neural fate decisions.Cell Cycle9,44774486(2010).

    Supplementary Informationis available in theonline version of the paper.

    AcknowledgementsWe wouldlike tothank allmembers ofthe Meissnerand Elkabetzlaboratories;we thank L. Studer(Sloan-KetteringInstitute) forthe HES5::eGFPreporterline; we also thank F. Kelley and other members of the Broad Sequencing Platform,J. Doench and members of the Genome Perturbation Platform at the Broad Institute,D.-A. Landau forcritical readingof themanuscript, as wellas to I. Shur andO. Sagi-Assifat Tel Aviv University for their extensive FACS operation. We also thank L. Gaffney for

    graphical support. This workwas funded by the NIH Common Fund (U01ES017155),NHGRI (HG006911), NIGMS (P01GM099117), the New York Stem Cell Foundation,the Israel Science Foundation (ISF) (1126/10, 1710/10) and a Marie CurieInternational Reintegration Grant (IRG277151). A.Go. is supported by the CharlesH. Hood Foundation and A.M. is a New York Stem Cell Foundation RobertsonInvestigator.

    Author ContributionsThe study was designed by M.J.Z., Y.E. and A.M. R.E. and Y.E.developed the NPC system, R.E. and Y.Y. performed consecutive cell isolation,propagation and differentiation and conducted the shRNA screen with Y.E.. M.J.Z.performed the analysis and designed the shRNA screen. W.M. and J.L.R. helped withRNA-seq data processing and analysis. J.D. performed transcription factor ChIP-seqexperiments. R.P. and C.A.G. performed RNA-seq library construction. R.P. and D.C.performed shRNA library construction. T.S.M. provided experimental advice. R.I., J.X.and A.Go. conducted histone ChIP-seq experiments. H.G.performed WGBSand RRBSlibraryconstruction, A.Gn. and A.M.supervised theDNA methylation profiling, C.E. andB.E.B. provided experimental input and advice for the ChIP-seq experiments. A.M.T.provided the transcription factor ChIP-seq protocol. O.K. assisted in the designof

    analytical methods.M.J.Z,Y.E. andA.M. interpretedthe dataand wrote themanuscript.Author Information All data are available from the GEO database under accessionnumber GSE62193, the NIH Roadmap (http://www.roadmapepigenomics.org/data )and NCBI Epigenomics portal (http://www.ncbi.nlm.nih.gov/epigenomics ). Reprintsand permissions information is available at www.nature.com/reprints. The authorsdeclare no competing financial interests. Readers are welcome to comment on theonline version of the paper. Correspondence and requests for materials should beaddressedto Y.E.([email protected]) orA.M. ([email protected]).

    LETTER RESEARCH

    0 0 M O N T H 2 0 1 4 | V O L 0 0 0 | N A T U R E | 5

    Macmillan Publishers Limited. All rights reserved2014

    http://www.nature.com/doifinder/10.1038/nature13990http://www.nature.com/doifinder/10.1038/nature13990http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62193http://www.roadmapepigenomics.org/datahttp://www.ncbi.nlm.nih.gov/epigenomicshttp://www.nature.com/reprintshttp://www.nature.com/doifinder/10.1038/nature13990mailto:[email protected]:[email protected]:[email protected]:[email protected]://www.nature.com/doifinder/10.1038/nature13990http://www.nature.com/reprintshttp://www.ncbi.nlm.nih.gov/epigenomicshttp://www.roadmapepigenomics.org/datahttp://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62193http://www.nature.com/doifinder/10.1038/nature13990http://www.nature.com/doifinder/10.1038/nature13990
  • 8/10/2019 Dissecting Neural Differentiation Regulatory Networks Through Epigenetic Footprinting

    6/16

    METHODSCulturing undifferentiatedhuman ES cells.HES5::eGFPbacterialartificialchro-mosome transgenic human ES cells (H9; WA9; Wicell) expressing GFP under theHES5promoter were cultured on mitotically inactivated mouse embryonic fibro-blasts(MEFs) (Globalstem). UndifferentiatedES cells were maintained as describedpreviously5 in medium containing DMEM/F12, 20% KSR, 1 mM glutamine, 1%penicillin/streptomycin,non-essentialamino acidsandb-mercaptoethanol.Undif-ferentiated ES cells were purified with pluripotency markers Alexa 647-conjugatedTra-1-60 and phycoerythrin-conjugated SSEA-3 (BD Pharmingen).

    Neural induction and long-term propagation of NPCs. Neural differentiationof ES cells was performed as described in refs 5,18. In brief, neuroepithelial cellsweregenerated either by monolayerinductionwithdissociatedES cellsplatedonMatrigel (BD biosciences)or by co-culture on MS5 stromal cells. In both casesneural fate was directed by dualSMAD inhibition protocol18. Neural rosettes gen-erated fromboth inductionmethodswere harvestedmechanically during all stagesof differentiation and replated on culture dishes pre-coated with 15mg ml21 poly-ornithine (Sigma), 1mg ml21 laminin (BD Biosciences) and 1mg ml21 fibronectin(BD Biosciences) (Po/Lam/FN) in N2 medium composed of DMEM/F12 and N2supplement (Invitrogen).N2 supplement contained insulin, apo-transferin,sodiumselenite, putrecine and progesterone. This medium was supplemented with sonichedgehog(30 ngml21),fibroblast growthfactor 8 (FGF8;100ng ml21) andbrain-derived neurotrophicfactor (BDNF)(20 ngml21) (all from R&DSystems)to induceand maintain early anterior regionalization of NE cells. These factors were gradu-ally replaced by FGF2 (20ng ml21)andEGF(20ngml21) inthe following2 weeksof differentiation in order to maintain a proliferative (FGF and EGF responsive)NPCstate.NPCs fromall stages werecollectedat indicated daysand FACS purifiedfor HES5::eGFP(NEto LRG) orEGFRfor LNPs to purifyforthe highest NPCstatefor each stage. NE cells were collected at day 12 of differentiation, ERG cells werecollected at day 14, mid-neurogenesis radial glial (MRG) cells were collected atday 35, late-gliogenic radial glial (LRG) cells were collected at day80, and long-term NPCs (LNP) were collectedat day220. At each stage cells were eithersplit forthe next passage or subjected to FACS purification for HES5::eGFPas described.All replating was performed on Po/Lam/FN-coated dishes. For generating maturedifferentiated populations, HES51 sorted NPCs were seeded at high density andsubjected to mitogen withdrawaldifferentiation medium for 17 dayswhich includedN2 supplemented withascorbic acid/BDNF(neuronal; NEdN, ERGdN, MRGdN)or 5% fetal bovine serum (FBS) (Invitrogen) (glial; LRGdA). Additional experi-mental details and in-depth characterization of these cell types are provided inElkabetz and colleagues (manuscript in preparation).Chromatin immunoprecipitation followed by sequencing (ChIP-seq).For the

    histone ChIP experiments, we used similar approaches to ref. 28. Specifically,around 160,000 cells were crosslinked in 1% formaldehyde for 10 min at 37 uC,followed by quenching with 125 mM glycine for 5min at 37 uC, washed with PBScontaining protease inhibitor (Roche, 04693159001) and flash-frozen in liquidnitrogen. To lyse the cells, we used 1% SDS, 10 mM EDTA and 50 mM Tris-HCl,pH 8.1 complementedwith a proteaseinhibitor. The chromatin was thenfragmen-tedwith a BransonSonifier (model S-450D)at 4 uC,and calibratedto a size rangeof200 and 800basepairs (bp). Chromatin was mixedwith antibodyand incubated at4 uC overnight. Protein A and Protein G Dynabeads were added to chromatin/antibody mix (Invitrogen, 100-02D and100-07D, respectively) and incubated for12h at 4 uC. Samples were washed six times with RIPA buffer (10mM Tris-HCl,pH8.0, 1 mMEDTA,pH 8.0, 14mM NaCl, 1%Triton X-100,0.1%SDS, 0.1% DOC),twice with RIPA buffer containing 500 mM NaCl, twice with LiCl buffer (10mMTE, 250 mM LiCl, 0.5% NP-40, 0.5% DOC), twice with TE (10 mM Tris-HCl,pH 8.0, 1 mM EDTA), and then eluted in elution buffer (10mM Tris-Cl, pH8.0,5 mM EDTA, 300 mM NaCl, 0.1% SDS, pH8.0) at 65 uC. Eluate was treated with

    RNaseA(Roche, 11119915001)and ProteinaseK (NEB, P8102S)overnight at 65uC.Forthe OTX2 ChIP cells were collectedandcrosslinkedin 1%formaldehydefor

    15 min on ice, quenched with 125mM glycine for 5 min at room temperatureandpelleted. Nuclei werethenisolated andchromatinwas digested at 37uCwithMNaseenzyme until themajority of theDNA wasbetween 50 and800 bp.Specifically,25 Uand 35U ofMNase enzyme were usedto digest NEcellsandRNS/RGcells, respect-ively. Thechromatinwas thenincubatedwiththe antibodies over night at4 uCandco-immunoprecipitation of antibodyprotein complexes was performed withProtein A or G beads for 12 h at 4 uC.

    All antibody catalogue and lot numbers are listed next to the data set for whichthey were used in Supplementary Table 1.ChIP-seq library preparation and sequencing. To extract DNA and create theIllumina libraries we used solid-phase reversible immobilization (SPRI) beads.The SPRI beads were added to the samples, mixed 15 times, and incubated for2 min at room temperature. Supernatant was extracted from the beads on a mag-net (4 min). 70% ethanol was used to wash the beads and then dried for another

    4 min. Forty microlitres of EB buffer (10 mM Tris-HCl, pH 8.0) was used to elute

    the DNA.The next steps of Illumina library construction include end repair, addi-tion of A-base, ligation of barcoded adaptors and PCR enrichment. To minimizetheloss of ChIPmaterialthroughout thisprocedure, weused a generalSPRI cleanupprocedure after each reaction step reusing the same beads. PEG buffer (20% PEGand 2.5 M NaCl) was used to re-bind ChIP material to SPRI following each reac-tion, andwashing andextractionoccurred asstated above. Theenzymatic reactionswerecarried as follows:(1) DNAend-repair: EpicentreEnd-IT Repairkit incubatedat room temperature for45 min; (2)A-base addition: Klenow (39R59 exonuclease;New England Biolabs) incubated at 37 uC for 30 min; (3) adaptor ligation: DNA

    ligase (New England Biolabs) and indexed oligo adaptors and incubated at 25uC

    for15 min, followed by 0.73SPRI/reaction to removenon-ligated adaptors;(4) PCRenrichment: PCR mastermix (primer set,dNTP mix,Pfu Ultra Buffer(Agilent), PfuUltra-II Fusion (Agilent), water), for 20 cycles. The PCR amplified libraries wecleaned up using 0.73 SPRI/reaction (size selection mode) to remove excessiveprimers. Roughly 5 picomolesof DNA library was then applied to each lane of theflow cell and sequencedon Illumina HiSeq2000 sequencers accordingto standardIllumina protocols.

    For the OTX2 ChIP, DNA libraries were constructed using standard Illuminaprotocols for blunt-ending, poly(A) extension, and ligation. MyOne Silane beads(Life Technologies 37002D) were used to purify DNA fragments following eachstepof the library preparation.Adaptorligation wasperformed overnightat 16 uC.Ligated DNA wasthen PCR amplified and gel size selectedfor fragments between150 and 700 bp. Samples were sequenced using Illumina HiSeq at a target sequen-cing depth of 20 million uniquely aligned reads.

    Strand-specific RNA-sequencing library construction. RNA was extracted

    using the miRNeasy kit (Qiagen, 217004). Poly(A) RNA was isolated using Oligod (T25) beads (NEB, E7490L).The poly(A) fraction wasthen fragmented (Invitrogen,AM8740).Fragments smaller than200 bp were eliminated (Zymo, R1016) andtheremaining fractionwas treatedwith FastAPThermosensitiveAlkaline Phosphatase(Thermo Scientific, EF0652) and T4 Polynucleotide Kinase (NEB, M0201L). RNAwas then ligated to a RNA adaptor as reporter previously29 using T4 RNA Ligase 1(NEB, M0204L), which was then used to facilitate complementary DNA synthesisusingAffinityScript MultipleTemperatureReverse Transcriptase (Agilent, 600105).More specifically, we used the following adaptors reported in ref. 29: RNA sequen-cing, RiL-1939RNA adaptor: prArGrArUrCrGrGrArArGrArGrCrGrUrCrGrUrG/ddC; RNA sequencing, AR17 reverse transcription primer: ACACGACGCTCTTCCGA; RNAsequencing,3Tr3 59DNA adaptor: pAGATCGGAAGAGCACACGTCTG/ddC;RNA sequencing,PCR enrichment:AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCAAGCAGAAGACGGCATACGAGATNNNNNNNNGTGACTGGAGTTCAGACGTGTGCTC

    TTCCGATCT.RNA was then degraded and the cDNA was ligated to a DNA adaptor using T4RNA Ligase 1 as describedpreviously29. Final library amplification was completedusing NEBNext High Fidelity 2X PCTMaster Mix(M054L). To clean up thefinalPCR and removed adaptor dimers, two subsequent 13 and 0.83 SPRI reactionswere completed to prepare the final library for sequencing.

    PooledshRNA screen. We selected244 transcriptionfactors andepigeneticmodi-fiers that were differentially or continuously highly expressed during our in vitrodifferentiationtime course in an otherwise unbiasedfashion (Supplementary Table4). In addition, we includedGFP, RFP,LacZ andluciferaseas internalcontrols. Wethen obtained a sub-poolof thehuman 45K shRNApool30 distributedby the BroadInstitute Genomic Perturbations Platform and the RNAi Consortium (TRC)against these genes. For each gene, five distinct shRNAs were included as well asfive scrambled and three empty control vectors, amounting to a total of 1,2301 8shRNAs. The plasmid for shRNA expression under the control of the constitutiveU6 shRNA promoter was the lentiviral vector pLKO.1. shRNA pool production

    and infection conditions were performed as previously described30

    . Subsequently,we performed calibration experiments to determine to optimal combination ofmultiplicity of infection (MOI) and puromyocin concentration to ensure efficientselection. We identified MOI 0.4 and 1 mg ml21 of puromycin as optimal para-metersfor allstages.We then infected26 millioncellsat each stage ofNE, ERGandMRG to ensure sufficient shRNA integration events to recover the complexity ofthe shRNA library.Twenty-four hours postinfection andbefore fullexpressionbutafter integration of the lentivirus into the genome we collected 3 million cells todetermine our baseline shRNA library representation. Subsequently, we subjectedthe cellsto 5 days of puromycin selectionand thenFACS sorted the resulting popu-lationsintoHES51andHES52 compartments.Next, we assessed the representationof theshRNAlibraryin each of the9 populations byretrieving allshRNAintegrationevents from genomic DNA isolatedfrom each sample using PCR followed by next-generation sequencing as previously described31. More specifically, we performedtwo rounds of PCR using the following primers for the primary PCR: primaryreverse: CTTTAGTTTGTATGTCTGTTGCTATTAT; primary forward:AATGG

    ACTATCATATGCTTACCGTAAC. For the second, nested PCR we used: nested

    RESEARCH LETTER

    Macmillan Publishers Limited. All rights reserved2014

  • 8/10/2019 Dissecting Neural Differentiation Regulatory Networks Through Epigenetic Footprinting

    7/16

    forward: GGCTTTATATATCTTGTGGAAAGGA; nested reverse: GGATGAATACTGCCATTTGTCTC.

    Next, we performed standard Illumina sequencing library construction as out-lined above for four technical replicates for NE and MRG and three technicalreplicates for ERG, eachcomprising HES51, HES52 and 24-h control, amountingto a total of 33 libraries.We thensequencedtheseamplicon libraries on a HiSeq2500with a PhiX spike-in of 25%.

    Individual shRNA validation for OTX2 and PAX6.RNA was extracted usingmiRNeasy kit (Qiagen) followed by Maxima reverse transcription reaction kit

    (Fermentas). One nanogram of cDNA was subjected to quantitative PCR (qPCR)usingour custom-designedprimersandthe ABsoluteQPCRSYBR Green ROXMix(ABgene)on a ViiA-7cycler (ABI).Thresholdcycle valuesweredetermined intripli-cates and presented as average compared to HPRT. Fold changes were calculatedusing the 2{DDCT method.

    WGBS and RRBS library production. WGBS libraries were generated as prev-iously described in ref. 10. RRBS was carried out using the multiplexed, gel-freeprotocol described in ref. 32.

    Data processing.For RNA-seq data processing, reads were trimmed to 80, 60 or30 bp depending on theirper-base quality distributionto achieve maximum align-ment rates. Reads were mapped to the human genome (hg19) using TopHat v2.0(ref. 33) (http://tophat.cbcb.umd.edu) employing the unfiltered gencode.v19.annotation.gtf annotation as the transcriptome reference. TopHat was run withdefault parameters except for the coverage search being turned off. Transcriptexpression was estimated with Cuffdiff 2 (ref. 34). The workflow used to analysethe data are described in detail in ref. 35 (alternate protocol B).

    WGBS libraries were aligned using BSMap 2.7 (ref. 36) to the hg19/GRCh37referenceassembly. Subsequently,CpG methylation calls weremade using customsoftware as previously described9, excludingduplicate, low-quality reads as well asreads with more than 10% mismatches. Only CpGs with more than 53 coveragewere considered for further analysis.

    ChIP-seqdatawere aligned to thehg19/GRCh37 reference genomeusingMAQ37

    version 0.7.1 with default parameter settings or Bowtie 2 version 2.05 (ref. 38).Reads were filtered for duplicates and extended by 200 bp at the end of the read.Visualizationof read count data wasperformed by converting raw BAMfiles to .tdffiles using IGVtools39 andnormalizingto 1 million reads. Fragment-length-extended,duplicate and quality-filtered reads were used for subsequent analysis.

    shRNA screen data analysis. For the screen data analysis, we followed the pro-tocol outlined in ref. 40 employing the R package limma41. First, we extracted andcounted the number of times each shRNA was observed in each library using theshRNA sequence as barcode and the R function processHairpinReads(). Next, we

    normalized the shRNA counts to the total number of reads observed containing ashRNA to counts per million (cpm) and retained only those shRNAs with morethan 0.5cpm in more than 2 samples. After further quality control showing excel-lent reproducibility (Extended Data Fig. 3f), we performed differential shRNAcount analysis between the HES51 and 24-h control and the HES51 and HES52

    populations for each stage. To that end we first estimated the dispersion for eachcondition and then fitted a negative binomial generalized linear model using the RpackageedgeR.We thenconducteda likelihood ratio testfor eachcontrast andonlyretain those shRNAs as differentiallyenriched at a FDR# 0.05. To determine geneswithsignificant positiveor negativeimpacton HES51maintenance or cellsurvival,we determined all genes that were targeted by at least two independent shRNAswhich showed a significant effect (FDR# 0.05) in the same direction. We thencomputed a mean effect score in order to rank genes by computing the weightedmean of the log fold change between the two conditions weighted by the log cpmacross all significant shRNAs and targeting a particular gene with an effect in thesamedirection. If anequalnumberof shRNAsshoweda significanteffectin positive

    or negative direction, we classified the gene as not significantly affected. Otherwisewe chose the effect direction based on the majority of the shRNAs. We then com-binedthe results from theHES51 to 24-h control and HES52 comparison intooneby taking the maximum mean effect score observed in either comparison. Theresulting mean effect scores are then used for visualization and analysis purposesin maintext andfigures andare reportedin SupplementaryTable 3. In addition, wealso calculated an empirical FDR by determining the fraction of shRNAs with astatistically significant effect based on the generalized linear model but were notexpressed based on the RNA-seqdata for the condition where the significant effectwas observed.

    Forthe TERA validation analysis,we rankedall motifsaccording to their TERAscores at each stage. Next, we filtered out motifs that were not associated with atleast one transcription factor that was covered in our screen design. We thendetermined the fraction of top 20 motifs (by absolute TERA values) that werelinked to transcriptionfactors which showed a significanteffect in the correspond-ingstage-specificshRNA screen. We report thisnumberas the percentage of motifs

    recovered. Only motif-knockdownresults that havea straightforward interpretation

    were considered as hits. These include: (1) positive TERA score and positivedepletion score (gene is involved HES51 maintenance, progression or cell sur-

    vival); (2) negative TERA score and negative depletion score (impedes HES51

    maintenance, progression or apoptosis); (3) negative TERA score and positivedepletionscore (geneis involvedHES51maintenance, progression or cell survivalbut most likely acts as a repressor by causing H3K27ac or H3K4me3/1 loss). Forthe comparison withthe expression-based analysis, we ranked all significantlydif-ferentially expressed genes by their absolute fold change and determined the frac-tion of top 20 transcription factors observed among the differentially enriched

    shRNAs in the screen.Differential expression analysis. Differential expression analysis was carried outusing Cuffidff 2 (ref. 34) and genesdifferentially expressedat a FDR#0.1for eachcomparison and a minimal expression level of 1 FPKM in at least one of the con-ditions were considered. Clustering analysis was performed using the csCluster()function in the cummeRbund42 package version 2.6.1 (http://compbio.mit.edu/cummeRbund/) with the JensenShannon distance as metric. The number ofclusters for the NPC set (ESC, NE, ERG, MRG, LRG) and the differentiatedpopulations (NEdN, ERGdN, MRGdN, LRGdA) was determined as the numberof clusters between 10 and 20 with the minimum average silhouette width acrossall clusters. Subsequently, a pseudocount of 1 was added to all FPKM countsfollowed by a log2transformation. The resulting values were used for all furtherexpression analysis.

    ChIP-seq data analysis and normalization. For H3K27ac and H3K4me3 histonemarks, the irreproducible discovery rate (IDR) framework43 with a cutoff of 0.1 incombination with the MACS2 (ref. 44) peak caller version 2.1 was usedto identifypeaks taking advantage of both replicates for each condition. For MACS2 peakcalling, we used an initialPvalue cutoff of 0.01 and the corresponding whole-cellextract (WCE) control library as background. All IDR peak sets can be obtainedfrom GEO underGSE62193.

    Forthe broad histone marks H3K27me3 and H3K4me1, we first determined all1-kilobase (kb) tiles of the human genome (hg19) that were significantly enrichedover background in at least one of the replicates. To that end we used a Poissonmodel45 with theWCE as background to model thefragment count distribution ineach genomic To that end we defined a nominal Pvalue for enrichment within agiven regioniin samplekharbouringrikChIP fragments compared to the WCEcontrol samplelwithrilChIP fragments asP(C$ rik)where

    45:

    C*Poisson max 1,eil lk

    and eil5 ril/ll, lk5 (region size)3 (total number of ChIP fragments in samplek)/(corrected genome size), ll5 (region size)3 (total number of ChIP fragments

    in sample l)/(corrected genome size). In order to account for regions with no orminimal WCE read counts due to sampling, we chose e il5max(eil,1). ResultingPvalues were adjusted for multiple testing using the BenjaminiHochberg46 cor-rection andthe qvalue R package47. Only regions significant ata qvalue#0.05andwith an enrichment level over background$1.5 were considered to be enriched.

    For differential enrichment analysis of histone marks between consecutiveconditions, we used the R package diffBind48. To normalize read counts, we usedtheeffective library size, counting only reads in peak regions (either theIDR peaksfor H3K27ac, H3K4me3 or the enriched 1-kb tiles for H3K27me3 or H3K4me1).The differential analysis was then conducted using the DBA_DESEQ2 method,taking fulladvantage of both replicates per conditionwith the bTagwiseparameterset to true. Only regions that were differentially enriched between consecutiveconditions at aPvalue of 0.05 were reported.

    In addition, we created a union peak set for each mark separately by joiningoverlapping peaks/enriched regions in preparation for the TERA analysis. For

    H3K4me1, we computed the enrichment over the union of all H3K27ac regionssince we wanted to focuson muchmore sharply definedputative enhancerregionsforthismark. ForH3K27ac, wefocusedon distalregionsonly($1 kbfrom nearestTSS) since we were specifically interested in putative enhancer regions for thismark. ForH3K4me3,we used the union of allH3K4me3 IDR based peaks regard-less of distance, accounting for most promoters and CpG islands. We then deter-mined the enrichment level for all regions in the union set in each replicate acrossall marks separately. Region enrichment was computed as follows: first, the num-ber of tag counts in each region was determined and normalized to reads perkilobase per million reads (RPKM) sequenced using the full library size of non-duplicatereads. Next, RPKM read counts were divided by themeanRPKM countsacross all WCE libraries. Subsequently, the resulting enrichment levels were log2transformed. Finally, the resulting enrichment values were quantile normalizedacross the entire data set for each mark separately. The resulting values were thenaverage across replicates to obtain a region3 condition normalized enrichmentmatrix. The resulting matrix was used as input for the TERA analysis. We tested

    several ChIP normalization strategies by assessing between-replicate correlation

    LETTER RESEARCH

    Macmillan Publishers Limited. All rights reserved2014

    http://tophat.cbcb.umd.edu/http://compbio.mit.edu/cummeRbundhttp://compbio.mit.edu/cummeRbundhttp://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62193http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62193http://compbio.mit.edu/cummeRbundhttp://compbio.mit.edu/cummeRbundhttp://tophat.cbcb.umd.edu/
  • 8/10/2019 Dissecting Neural Differentiation Regulatory Networks Through Epigenetic Footprinting

    8/16

    and between-condition discriminative power on a large data set of 70 REMCH3K27ac samples and identified this strategy as the best performing one.

    Footprinting detection.To determine small regions depleted of histone modifi-cations but surrounded by regions of muchgreater enrichment, termed footprints,we extended an approach used for the analysis of DNase I hypersensitivity (HS)data49. Our footprints identification algorithm consisted of three main phases. Inthe first phase, we identify peaks using the IDR framework (see previous section)for H3K27ac and H3K4me3 and use these as baseline regions in which footprintscould be detected. In the second phase, we identified footprints located within/

    aroundpeakregions in the following manner. (1)For each peak, extendby 400bpfrom apex in either direction. (2) Split entire resulting region into bins of size20 bp. (3) Compute number of RPKM counts for a central sliding window acrossthe entire region (shifting by increments of one bin) for different window sizesranging from twobinsto tenbinsin increments of one. (4)For each positionof thecentral window and for each window size, compute the following three quantities:Cij2RPKM count for central window at current position i and window size j,Rij2RPKM count for a 200-bp stretch directly to the right of the central windowand Lij2RPKM count for a 200-bp stretch directly to the left of the centralwindow. (5) For each resultingposition i and window sizej compute the depletionscore:

    eij~f Cijz1

    2Lijz

    f Cijz1

    2Rij

    With the footprint size normalization factor f5 s/b, withsthe size of the centralwindow andb the size of the border regions. (6) Identify non-overlapping, non-

    adjacent footprint candidates starting from small to larger central window sizesand recording footprint candidate if eij. 0 and eij, 1 andLij.CijandRij.Cij,followed by removing all other potential footprints (central window1 borders) oflarger size overlapping the current candidate. (7) Finally, all resulting candidatefootprints with a footprinting score eij# 0.9 were reported.

    The latter procedure wascarriedout forH3K27acand H3K4me3 independentlyfor each sample. Subsequently, we merged all footprints from individual samplesinto consensus footprints set for each epigenetic mark separately, collapsing over-lapping footprints by taking the union of all regions with non-zero overlap.

    Differentially methylated region detection. Differentially methylated region(DMR) detection was carried out as previously described with slight modifica-tions10. Pairwise comparisons of consecutive samples (hESC, NE, ERG, MRG,LRG, LNP) were carried out on a single CpG level using a b-binomial modeland the b difference distribution requiring a maximum qvalue below 0.05 andan absolute methylation difference greater than0.1. qvalues were computed based

    on b-binomial model Pvalues using the BenjaminiHochberg

    46

    method. OnlyCpGs covered by at least 5 reads in either sample were considered. Subsequently,differentially methylated CpGs within 500bp were merged into discrete regions.Differential CpGs without neighbours were embedded into a 100-bp region sur-rounding each CpG. Next, differential methylation analysis was repeated on theregion level using a random effects model. Only regions significant at a Pvaluebelow 0.01, an absolute methylation difference above 0.2 and containing at least 2differentially methylated CpGs were considered differentially methylated. Theseregions were defined as DMRs and used for subsequent analysis. For the DNAmethylation analysis in the context of the TERA framework, we restricted ouranalysis to DMRs consistently covered across all conditions, including those onlyassessed by RRBS. This left us with 7,929 regions.Association of genomic regions with genes. We used the R packageChIPpeakAnno50 to associate eachregion withits nearest ENSEMBL transcriptionstart site and used this mapping for all downstream analysis.Gene set enrichment analysis. Gene set enrichment analysisfor genomic regions

    wascarried outusingthe GREAT toolbox20

    andonly categories withqvalues#0.05for both the hypergeometric and the binomial test as well as a minimal regionenrichment level greater than 2 were considered, following the GREAT recom-mendations. Due to thelarge number of enrichedgene sets, a selected subset of theresults is shownin thedifferentfigures. In addition,we used theAllen BrainAtlas51

    to determine enrichment for distinct brain structures and developmental timepoints. To that end we derived gene sets from the brain atlas data in the followingfashion.

    We obtainedin situhybridization counts for the developing mouse brain at 7distinct fetal time points and 11 different brain substructures through directcorrespondence withhttp://www.alleninstitute.org. Specifically, we investigatedthe following structures: rostral secondary prosencephalone (RSP), telencephalon(Tel), peduncular (caudal) hypothalamus(PHy),hypothalamus(p3),pre-thalamus(p2), pre-tectum (p1), midbrain (M), prepontine hindbrain (PPH), pontine hind-brain (PH), pontomedullary hindbrain (PMH), medullary hindbrain (MH); andtime points: embryonic (E) day 11.5, E13.5, E15.5 and E18.5 as well postnatal (P)

    P4, P14 and P28. In total, we had 14,585 measurements for 2,105 different genes

    across these different regions and time points. In order to define sets of genescharacteristic for each combination of time point and structure, we computedthezscores as well as the maximum observed variation for each gene across theentire matrix of structure anddevelopmental timepoint combinations. Onlygenesthat exhibited a maximum observed variation (maximum activity2minimumactivity)$1 were considered for gene set definition. Next, we mapped all mousegenes to their human orthologues using the biomaRt database. Finally, we definedgene sets for each regiontime-point combination using genes that exhibited azscore$2 in that particular combination. Since the Allen Brain Atlas gene sets

    are defined for each developmental time point and regional identity, we nextsimplified the visualization by focusing either exclusively on structures or devel-opmental time points. Therefore, we determined the gene set with the maximumgene set activity at each differentiation stage across all gene sets associated withdistinct developmental time points for each structure separately. Similarly, wedetermined the geneset withmaximumactivity for eachdevelopmental timepointnow taking the maximum across all structures at each stage. The gene set activitywas determined as the mean log2-transformed expression level of all gene setmembers in for each condition.

    Motif library construction and mapping to transcription factors. We com-bined the position weight matrices (PWM) from Transfac professional database52

    (2011) with the PWM collection reported in ref. 53, only retaining motifs anno-tated forHomo sapiensor mouse. To eliminate redundant motifs, we determinedpairwise motif similarities for all resulting 1,886 PWMs using the TOMTOM54

    program which is part of the MEME55 suite with default parameters. Next, wecompiled a pseudo-distance matrix based on the resulting pairwise motif similar-

    ities. As a proxy for motif similarity, we used the log10-transformed TOMTOMqvalue which was capped at ten. To convert the resulting motif similarities into adistancematrix,we invertedthe scale by subtractingthe transformed qvaluesfromten. We then used the resulting matrix to perform hierarchical clustering withEuclidean distance and Wards method. Finally, we employed the cutree() func-tionwith a thresholdof seven to partitionthe resultingclusteringdendrogram intodiscrete clusters of motifs. For each cluster, we then determined the motif withthe highest complexity based on the relative entropy compared to a genomebackground model with the following base frequencies: A5 0.2725, C5 0.189,G5 0.189andT50.2728.Only motifswith a relativeentropy greaterthan or equalto eight were retained for subsequent analysis. After identification of the candidatewiththe highestcomplexity foreach motif cluster,we assigned allgenes mapping toanymotifin eachcorresponding clusterto thecluster representativemotif.Thisleadto a final motif list of 557 motifs. To obtain a more quantitative association of eachmotif with its linked genes, we computedthe epigenetic transcription factor activity(ETFA)scores across70 REMCH3K27acor H3K4me3cell types andcorrelatedtheresultswith RNA-seqexpression data across 40cell types. Thisanalysisgaverise to acorrelationmatrixcontainingthe Pearson correlationcoefficient of eachmotif withitslinked genes.This matrix was used in combinationwiththe plain gene mappingreported in primary motif sources. For Fig. 2b, we uniquely map each motif to acorresponding linked gene by computing an associationscore as the product of theabsolutePearson correlationcoefficient andthe averagegene expression level of thecorresponding gene. We thenchose the gene with the highest associationscore. Formotifs without an entry in the H3K27ac correlation matrix (due to the inability todetermine suitable GEVparameterson the REMC dataset), we chose the genewiththe highest gene expression level. In Fig. 2b, only genes expressed with at least 10FKPM in the respective condition are considered. We then report the genes map-pingto the40 motifs foreachcondition, whereTERAscores of motifs mapping thesame gene were averaged.

    In Figs 4 and 5, we incorporated the results of the shRNA screen to uniquelymap motifs applying the aforementioned mapping strategy only on the genes

    identified as hits. If it did not map to any gene hit by the screen, we used thestandard assignment strategy outlined above.

    Identification of putative transcription factor binding sites. To determineputative binding sites in a given genomic region, we used a biophysical modelof transcription factor affinities to DNA56,57 to determine putative binding to ourfootprintsets. This biophysical model requires the trainingof generalized extreme

    value (GEV) distributions of binding affinities based on a PWM matrix for eachtranscription factor and each setof genomic regions in order to generate a suitablebackground model. In order to takethe distinctpropertiesof footprints determinedfrom differentepigenetic marks into account, we determined the GEV parametersfor footprints arisingfrom H3K27ac,H3K4me3 andDNAme using the frameworkoutlined in refs 56, 57. The resulting three binding matrices were then filtered forminimal significant binding affinity atPvalues below 0.05. All other entries withhigher Pvalues were set to one. Next, we took the negativelog10of the entire matrixas a quantitative measure of binding affinity in subsequent analysis.

    Inference of transcription factor activities based on epigenetic data. To infer

    transcription factor epigenetic remodelling activities (TERA), we first computed

    RESEARCH LETTER

    Macmillan Publishers Limited. All rights reserved2014

    http://www.alleninstitute.org/http://www.alleninstitute.org/
  • 8/10/2019 Dissecting Neural Differentiation Regulatory Networks Through Epigenetic Footprinting

    9/16

    ETFA from our epigenetic data. To that end, we first focused on motif activityanalysis and associated each motif in a second step with its corresponding tran-scription factor. For each epigenetic mark, we used the normalized epigeneticenrichment scores as well as DMRs with a minimal DNA methylation differenceof at least 0.2 and covered consistently in all data sets. For the DNA methylationdata, we invertedthe scaleto obtain demethylationscores (15 fully demethylated,05 fully methylated) since usually the demethylated states coincides with generegulatory element activity. To determine the unobserved activity of a transcrip-tion factor binding motif,we took advantage of recentdevelopments in themicro-

    array field58,59

    and adapted this approach to epigenetic data. To that end wemodelledthe enrichment levelyitof a particular epigenetic markat genomic regioni andtime point tas a linear functionof the unknowntranscription factor activities.Consideringppredictor variables (epigenetic motif/transcription factor activities)andk time points we describe the unknown transcription factor activities Xas a

    p3 k matrix. Incorporating all regions n meeting the above listed criteria, weemploy the linear model Y~AzBXzEwith the observed matrix of epigeneticenrichment scoresY(n3 k), a constant offset matrixA(n3 k), the connectivitymatrix B (n3p), describing the filtered binding affinities for all transcriptionfactor motifs to all regions and an error term matrixE. Subsequently, we followedthe approach outlined in ref. 58 and applied partial least square (PLS) regressionand specifically the SIMPLs algorithm60 to determine the unknown transcriptionfactor motif activities. The idea in PLS is to employ a linear dimensionality reduc-tion T~BR, where thep predictors inXare mapped onto c# rank(X)#min(p,n)latent componentsT(n3 cmatrix), and to compute the weight matrixRnot onlybasedon thedatamatrixB but explicitlytakinginto account the responsematrixY.

    The latter strategy maximizes predictive power even for a small number of latentcomponents.

    In order to determine the number of latent components for each epigeneticmarkand genomic context, we performedcross validation by randomly partition-ing the data set 20 times into two-thirds training and one-third test sets. We thenchose thenumber of components such that it minimizedthe prediction error.Thecorresponding analysis methodology was implemented in the statistical program-ming language R adapting the implementation provided in ref. 58. To assess thesignificance of the resulting ETFA scores, we performed a permutation test byrandomly permuting the epigenetic enrichment scores for each gene regulatoryelement andrecomputed theETFA valueson thepermutedvalues.This process isrepeated100 times. Positive ETFAscores are considered to be insignificant and setto 0 ifa greater ETFA score isobservedmorethanonceon therandomly permutedset and vice versa for negative ETFA scores.

    Finally, we determined the TERA scores by computing the differential ETFA

    scores between consecutive conditions. These scores were determined by sub-tracting ETFA scores of consecutive time points from each other. Subsequently,we assessed the significance of this difference using a permutation test by ran-domly permuting the epigenetic enrichment scoresacrossallregions,re-computingthe ETFA scores for each conditions and assessing the TERA score between con-secutive conditions for each motif. Positive TERA scores are considered to beinsignificant and set to 0 if a greater TERA score is observed more than once onthe randomly permuted set and vice versa for negative TERA scores.

    Co-binding analysis. Co-binding relationships wereevaluated using an empiricalapproach with the entire set of footprints for each epigenetic markas background.For a given factori, we determined the footprints set Firelevant for the currentcomparison (for example, changing their epigenetic state in particular cell statetransition) that were predicted to contain a transcription factor binding site basedon the binding model outlined above. Next, we computed the frequency of motifco-occurrenceSFij across Fi for all other motifs j in our database. To generate aproper null distribution, we randomly sampled K5 100 standardized footprintsets Gk each of sizejFijfrom the entire footprintcollection for the epigenetic markunder study and computed the same test statistic SGkij on these sets. Finally, wedetermined an empiricalPvalue and enrichment over the control based on thesequantities by counting the number of instances for whichSGkij S

    Fij:

    Pij~

    Pks

    Gkij s

    Fij

    K

    Only co-binding relationships significant atPvalues#0.01, a median enrichmentover the control$1.5 and an expression level$2 FPKM in at least one conditionwere retained. For the core factor co-binding analysis, the predicted co-bindingrelationships were additionally filtered for support by the knockdown data at thestage of predicted co-binding

    Validation analysis on ENCODE data.To validate the outlined strategyin silicowe took advantage of publically available transcription factor ChIP-seq data infour cell lines from the ENCODE61 project as well as H3K27ac and RNA-seq data

    for70 cell types from theREMC project. We downloaded H3K27ac data as well as

    processed transcriptionfactor binding data from the ENCODE project for the celllineK562 sinceabundant transcriptionfactor bindingdata basedon ChIP-seq wasavailable. In addition, this data set has been successfully used in several studies tobenchmark transcription factor binding predictions62,63. We then applied ourTERA pipeline to the H3K27ac data sets and computed the transcription factorbinding affinities for a set of 557 distinct motifs. With these data sets at hand, wecomputed the true-positive rate (TPR), the false-positive rate (FPR) and the pos-itive predictive values (PPV) for all transcription factors that could be matched toat least one motifwith availablebinding affinities (46out of 117). In the eventthat

    onefactor matchedmultiplemotifs,we chose themotif with thehighestareaunderthe curve.

    GWAS analysis.The GWAS analysis was conducted using 11,027 GWAS SNPsfrom the GWAS catalogue (August 2013). We sought to determine whether theH3K27ac-positive regions identified in the NPC populations were enriched forany GWAS SNP class with respect to a H3K27ac peak compendium across manydifferent tissues. To determine a proper background distribution we randomlysampledK5 1000 equally sized peak sets from H3K27ac-based footprints iden-tified across 70 epigenome roadmapdata sets.Prior to further analysis, we normal-ized the size of each peak all sets by extending it by 250 bp in each direction fromthe center coordinate. Next, we determined the overlap with GWAS SNPs forcontrol and neural H3K27ac footprintsets. Subsequently, we computed an empir-icalPvalue for each trait/diseaseiin the catalogue by determining the number oftrait associated SNPs SCij overlapping with each control region set Cj and thenumber overlapping with the corresponding footprint setsiaccording to

    Pi~

    PjsisCij

    K

    Determination of core network. The core network was defined as those tran-scription factors that were differentially expressed during neural induction fromES cell to NE and not differentially expressed between consecutive stages of NE,ERGandMRG.We didnotconsider theLRG stage.Furthermore, werequiredthateach factor was expressed at least 10 FPKM or more in NE, ERG and MRG andthatits meannormalized,maximumdifference in expressionlevels between anyofthe stages did not exceed one standard deviation computed across the entire datasetof 9 cell types.In addition,we also considered genes that were notdifferentiallyexpressed between any consecutive stages including the ESC stage but fulfilled allother criteria. This identification procedure gave rise to the candidate list of corefactors. We then intersected this list with the results of our shRNA screen andretained only those factors that were significantly depleted in the HES51 popu-lation relative to the respective HES52 or control population in at least twostages.Since the literature supported a role for PAX6andOTX2for which our shRNAsshowed no effect due to the pooled setup or absent knockdown (Fig. 3f and Ex-tended Data Fig. 3g), we included these genes as well. Finally, we merged this listwill all transcription factors that were depleted in our shRNA screen at all threestagesin the HES51 population relative to thecontrolsand were expressedat leastat10 FPKM or more inNE, ERGandMRG.Thisalgorithmyieldeda list of22 tran-scription factors or epigenetic modifiers (Fig. 4a). We then carried out co-bindinganalysis in H3K27ac footprintsdynamically regulated at eachstage in orderto obtainputative stage-specificco-bindingrelationships.To determinesignificantco-bindingevents, we used the permutation procedure outlined above and retained all co-bindingpartnerswith an enrichment overthe control$1.5 that weresignificantatP# 0.01 that were also identified as a significant hit in the shRNA screen at theparticular stage under investigation.

    Transcription factor binding site priming analysis. To determine transcriptionfactors associated with transcription factor binding site priming before factor

    activation, we determined all transcription factors at each stage that were signifi-cantly upregulated at the consecutive NPC time point or induced in the corres-ponding more differentiated cell type (qvalue# 0.1) and showed an increase inH3K4me1- or DNAme-derived TERA activity at the current stage under invest-igation. In addition, we required that the corresponding motif did not map to anytranscription factor that was expressed more than 3.5 FPKM at the current stageunder investigation. From this list, we picked the pro-neural genes NEUROD4,

    ASCL2andNFIXfor further investigation due to their literature support for theirpro-neural functions. Finally, we required that the potential downstream targetgeneswere significantly enriched for differentially regulatedgenes at the nextNPCstage or in the corresponding more differentiated cell types. To that end, wedetermined all putative transcription factor binding sites for a particular factorin dynamically regulated H3K27acor H3K4me1footprints at the stageof potentialpriming. We then associated each of these putative binding sites with the nearestTSS and determined the number of differentially expressed genes for each factor.To assess significance, we randomly drew 100 sets of equally sized H3K27ac

    footprints with no motif of the factor under investigation and determined the

    LETTER RESEARCH

    Macmillan Publishers Limited. All rights reserved2014

  • 8/10/2019 Dissecting Neural Differentiation Regulatory Networks Through Epigenetic Footprinting

    10/16

    number of differentially expressed genes for the subsequent stages. Only factorsthat exhibited more differentially expressed genes compared to the control sets inmore than 99% of the cases were retained.

    Next, we performed co-binding analysis in H3K27ac peaks differentially regu-latedbetween theES cell andNE stage as outlinedaboveand display thetop 10 co-binding relationships per factor with an odds-ratio$1.5 that were significant at apermutation-test-basedP#0.01 in Fig. 5a.

    28. Garber, M. et al.A high-throughput chromatin immunoprecipitation approachreveals principles of dynamicgene regulation in mammals.Mol. Cell47, 810822

    (2012).29. Engreitz, J. M.et al.The Xist lncRNA exploits three-dimensional genome

    architecture to spread across the X chromosome. Science 341, 1237973 (2013).30. Luo, B. et al.Highly parallel identification of essential genes in cancer cells.Proc.

    Natl Acad. Sci. USA105, 2038020385 (2008).31. Strezoska, Z.et al. Optimized PCR conditions and increased shRNA fold

    representation improve reproducibility of pooled shRNA screens.PLoS ONE7,e42341 (2012).

    32. Boyle, P.et al. Gel-free multiplexed reduced representation bisulfite sequencingfor large-scale DNA methylation profiling. Genome Biol.13, R92 (2012).

    33. Trapnell, C., Pachter,L. & Salzberg, S. L. TopHat: discoveringsplice junctions withRNA-seq. Bioinformatics 25, 11051111 (2009).

    34. Trapnell, C. et al.Differential analysis of gene regulation at transcript resolutionwith RNA-seq. Nature Biotechnol.31,4653 (2013).

    35. Trapnell, C. et al.Differential gene and transcript expression analysis of RNA-seqexperiments with TopHat and Cufflinks. Nature Protocols7, 562578 (2012).

    36. Xi, Y. & Li, W. BSMAP: whole genome bisulfite sequence MAPping program.BMCBioinformatics 10, 232 (2009).

    37. Li, H., Ruan, J. & Durbin, R. Mapping short DNA sequencing reads and callingvariants using mapping quality scores. Genome Res. 18,18511858 (2008).

    38. Langmead,B. & Salzberg, S. L. Fast gapped-readalignment with Bowtie2. NatureMethods9, 357359 (2012).

    39. Thorvaldsdottir, H., Robinson, J. T. & Mesirov, J. P. Integrative Genomics Viewer(IGV): high-performance genomics data visualization and exploration. Brief.Bioinform.14,178192 (2013).

    40. Dai, Z. et al.shRNA-seq data analysis with edgeR.F1000Res.3,95 (2014).41. Smyth, G. K. inBioinformatics and Computational Biology Solutions Using R and

    Bioconductor Statistics for Biologyand Health (ed.Gentleman,R.) (Springer, 2005).42. Goff, L., Trapnell, C. & Kelley, D. cummeRbund: Analysis, Exploration,

    Manipulation, and Visualization of Cufflinks High-Throughput Sequencing Data.http://compbio.mit.edu/cummeRbund/ (2012).

    43. Li,Q. H.,Brown,J. B.,Huang,H. Y. & Bickel, P.J. Measuring reproducibility of high-throughput experiments.Ann. App. Stat. 5, 17521779 (2011).

    44. Zhang, Y.et al. Model-based analysis of ChIP-seq (MACS).Genome Biol.9, R137(2008).

    45. Mikkelsen, T. S. et al.Comparative epigenomic analysis of murine and humanadipogenesis. Cell 143, 156169 (2010).

    46. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical andpowerful approach to multiple testing.J. R. Stat. Soc. B 57,289300 (1995).

    47. Dabney, A. & Storey, J. D. qvalue: Q-value estimation for false discovery rate control.R package version 1.40.0 (2013).

    48. Ross-Innes, C. S.et al.Differential oestrogen receptor binding is associated withclinical outcome in breast cancer.Nature 481,389393 (2012).

    49. Neph, S.et al. An expansive human regulatory lexicon encoded in transcription

    factor footprints.Nature 489, 8390 (2012).50. Zhu,L. J. et al. ChIPpeakAnno: a Bioconductorpackage to annotate ChIP-seq and

    ChIP-chip data.BMC Bioinformatics 11, 237 (2010).51. Thompson, C. L. etal. A high-resolutionspatiotemporal atlas of geneexpressionof

    the developing mouse brain.Neuron 83,309323 (2014).52. Fogel, G. B.et al.A statistical analysis of the TRANSFAC database.Biosystems81,

    137154 (2005).53. Jolma,A. et al. DNA-binding specificities of human transcription factors. Cell 152,

    327339 (2013).54. Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L. & Noble, W. S. Quantifying

    similarity between motifs.Genome Biol.8, R24 (2007).55. Bailey, T. L., Williams,N., Misleh, C. & Li, W. W. MEME: discovering and analyzing

    DNA and protein sequence motifs.Nucleic Acids Res. 34,W369W373 (2006).56. Manke, T., Roider, H. G. & Vingron, M. Statistical modeling of transcription factor

    binding affinities predicts regulatory interactions.PLOS Comput. Biol.4,e1000039 (2008).

    57. Manke, T.,Heinig,M. & Vingron,M. Quantifying theeffectof sequencevariation onregulatory interactions.Hum. Mutat. 31,477483 (2010).

    58. Boulesteix, A. L. & Strimmer, K. Predicting transcription factor activities fromcombined analysis of microarray andChIP data: a partialleast squaresapproach.Theor. Biol. Med. Model. 2, 23 (2005).

    59. Boulesteix,A. L. & Strimmer,K. Partialleast squares: a versatile toolfor theanalysisof high-dimensional genomic data.Brief. Bioinform. 8, 3244 (2007).

    60. de Jong, S. Simpls: an alternative approachto partial least-squares regression.Chemometr. Intell. Lab. 18,251263 (1993).

    61. TheENCODEProject Consortium.An integrated encyclopedia of DNAelements inthe human genome.Nature 489, 5774 (2012).

    62. Sherwood, R. I. et al. Discoveryof directional and nondirectional pioneertranscription factors by modeling DNase profile magnitude and shape.NatureBiotechnol. 32, 171178 (2014).

    63. Thurman, R. E.et al.The accessible chromatin landscape of the human genome.Nature 489, 7582 (2012).

    RESEARCH LETTER

    Macmillan Publishers Limited. All rights reserved2014

    http://compbio.mit.edu/cummeRbundhttp://compbio.mit.edu/cummeRbund
  • 8/10/2019 Dissecting Neural Differentiation Regulatory Networks Through Epigenetic Footprinting

    11/16

    Extended Data Figure 1| Isolation and characterization of ES-cell-derivedneural progenitor cells. This figure relates to Fig. 1 in the main text.a, Schematic of our differentiation model i