

www.sciencemag.org/cgi/content/full/science.aar6731/DC1

Supplementary Material for

Quantifying the contribution of recessive coding variation to developmental disorders

Hilary C. Martin*, Wendy D. Jones, Rebecca McIntyre, Gabriela Sanchez-Andrade,

Mark Sanderson, James D. Stephenson, Carla P. Jones, Juliet Handsaker, Giuseppe Gallone, Michaela Bruntraeger, Jeremy F. McRae, Elena Prigmore,

Patrick Short, Mari Niemi, Joanna Kaplanis, Elizabeth J. Radford, Nadia Akawi, Meena Balasubramanian, John Dean, Rachel Horton, Alice Hulbert, Diana S. Johnson, Katie Johnson, Dhavendra Kumar, Sally Ann Lynch, Sarju G. Mehta, Jenny Morton,

Michael J. Parker, Miranda Splitt, Peter D Turnpenny, Pradeep C. Vasudevan, Michael Wright, Andrew Bassett, Sebastian S. Gerety, Caroline F. Wright,

David R. FitzPatrick, Helen V. Firth, Matthew E. Hurles, Jeffrey C. Barrett*, on behalf of the DDD Study

*Corresponding author. Email: [email protected] (H.C.M.); [email protected] (J.C.B.)

Published 8 November 2018 as Science First Release

DOI: 10.1126/science.aar6731

This PDF file includes:

Materials and Methods
Figs. S1 to S20
References

Other Supplementary Material for this manuscript includes the following:
(available at www.sciencemag.org/content/science.aar6731/DC1)

Tables S1 to S7 as separate Excel files


Funding statement 

The DDD study presents independent research commissioned by the Health Innovation                     

Challenge Fund (grant HICF-1009-003), a parallel funding partnership between the Wellcome                     

Trust and the UK Department of Health, and the Wellcome Trust Sanger Institute (grant                           

WT098051). The views expressed in this publication are those of the author(s) and not                           

necessarily those of the Wellcome Trust or the UK Department of Health. The study has UK                               

Research Ethics Committee approval (10/H0305/83, granted by the Cambridge South Research                     

Ethics Committee and GEN/284/12, granted by the Republic of Ireland Research Ethics                       

Committee). The research team acknowledges the support of the National Institutes for Health                         

Research, through the Comprehensive Clinical Research Network. This study makes use of                       

DECIPHER (http://decipher.sanger.ac.uk), which is funded by the Wellcome Trust. 

 

Materials and Methods 

Family recruitment  

The DDD project recruited individuals with severe, undiagnosed developmental disorders from                     

24 clinical genetics centres within the United Kingdom National Health Service and the                         

Republic of Ireland as described previously (11), using specific criteria

(https://decipher.sanger.ac.uk/files/ddd/documents/policy/ddd_project_referral_guide_clinical_genetics.pdf). Families gave informed consent to participate, and the study was approved by the

UK Research Ethics Committee (10/H0305/83, granted by the Cambridge South Research                     

Ethics Committee and GEN/284/12, granted by the Republic of Ireland Research Ethics                       

Committee). DNA was collected from saliva samples obtained from the probands and their                         

parents, and from blood obtained from the probands, then samples were processed as previously                           

described (1). The analyses in this paper are based on a data freeze that includes 7,831 trios from 7,446 families and 1,791 patients without parental samples (Fig. S2). These individuals include those analysed in previous publications (4, 9).


Clinical features 

The patients were systematically phenotyped: detailed developmental phenotypes were recorded                   

using Human Phenotype Ontology (HPO) terms (27), and growth measurements, family history,                       

developmental milestones etc. were collected using a standard restricted-term questionnaire                   

within DECIPHER (28). Some HPO terms fall under multiple organ systems (e.g. microcephaly falls under "nervous system", "head or neck" and "skeletal system"), and for Fig. 1 and Table S1 we wanted to avoid counting a single HPO term under more than one organ system. Thus, we first ranked organ systems according to the raw number of

individuals with at least one term under that system in the full DDD cohort. We then took                                 

individuals with at least one HPO under the organ system most commonly affected, and                           

assigned these individuals a count of one organ system. We then removed these HPOs from the                               

patients’ lists, and identified individuals with at least one HPO in the organ system ranked next                               

most commonly affected. We continued to count organ systems and remove HPOs for 19                           

non-overlapping systems. The organ systems in Fig. 1A and Table S1 were determined using                           

this procedure (e.g. a patient with microcephaly is only included under “nervous system”, not                           

under “head or neck” or “skeletal system”), and Fig. 1B shows the distribution of the                             

non-redundant counts. 
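
As an illustration only, the ranking-and-removal procedure described above could be sketched in Python as follows. This is not the authors' code: the function name, the input dictionaries and the alphabetical tie-breaking rule are assumptions made for the example.

```python
from collections import Counter

def count_nonredundant_systems(patient_terms, term_systems):
    """Greedy, rank-based assignment of HPO terms to organ systems.

    patient_terms: dict patient_id -> set of HPO terms (hypothetical input)
    term_systems:  dict HPO term -> set of organ systems it falls under
    Returns dict patient_id -> set of organ systems counted for that patient.
    """
    # Step 1: rank systems by the raw number of patients with >= 1 term in them.
    raw = Counter()
    for terms in patient_terms.values():
        systems = set()
        for t in terms:
            systems |= term_systems.get(t, set())
        raw.update(systems)
    # Assumption: ties are broken alphabetically so the result is deterministic.
    ranked = sorted(raw, key=lambda s: (-raw[s], s))

    # Step 2: walk down the ranking, count each system once per patient and
    # remove the terms that have just been counted.
    remaining = {p: set(ts) for p, ts in patient_terms.items()}
    counted = {p: set() for p in patient_terms}
    for system in ranked:
        for p, terms in remaining.items():
            hits = {t for t in terms if system in term_systems.get(t, set())}
            if hits:
                counted[p].add(system)
                terms -= hits  # in-place removal from this patient's list
    return counted
```

With this sketch, a patient whose only term is microcephaly is counted under whichever of its organ systems ranks highest in the cohort, and under no other.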

 

We note that certain DDD clinicians tend to be more diligent about reporting multiple HPO                             

terms, and some tend to report certain organ systems more thoroughly. Thus, the small                           

differences in clinical features between EABI and PABI (Fig. 1; Fig. S3; Table S1) could be                               

partly because clinicians who work in areas with a high South Asian population have different                             

reporting styles from others. The slightly greater severity of the PABI group might also partly                             

be because, in communities enriched for cousin marriages, there is a higher number of children                             

with genetic disorders (29), and clinical geneticists are only able to see the more severe cases.

 

Exome sequencing and variant quality control 


Exome sequencing, alignment and calling of single-nucleotide variants and small insertions and deletions were carried out as previously described (4), as was the filtering of de novo mutations.

For the analysis of biallelic genotypes, we chose thresholds for genotype and site filters to                             

balance sensitivity (number of retained variants) and specificity (as assessed by Mendelian error                         

rate and transition/transversion ratio). We removed sites with a strand bias test p-value < 0.001.                             

We then set individual genotypes to missing if they had genotype quality < 20, depth < 7 or, for                                     

heterozygous calls, a p-value from a binomial test for allele balance < 0.001. Since the samples                               

had undergone DNA capture with either the Agilent SureSelect Human All Exon V3 or V5 kit,                               

we subsequently only retained sites that passed a missingness cutoff in both the V3 and the V5                                 

samples. We found that, after setting a depth filter, the proportion of missing genotypes allowed                             

had a more substantial effect on the number of Mendelian errors than genotype quality and                             

allele balance cutoffs (Fig. S20). Thus, we ran the biallelic burden analysis on two different                             

callsets, using a 10% (strict) or a 50% (lenient) missingness filter, and found that the results                               

were very similar. We report results from the more lenient filter in this paper, since it allowed                                 

us to include more variants. Genotypes were set to missing for a trio if there was a Mendelian                                   

error, and variants were removed if more than one trio had a Mendelian error and if the ratio of                                     

trios with Mendelian errors to trios carrying the variant without a Mendelian error was greater                             

than 0.1. If any of the individuals in a trio had a missing genotype at a variant, all three                                     

individuals were set to missing for that variant. 
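
The genotype-level filters above (genotype quality, depth and allele balance) can be sketched as below. This is a minimal illustration, not the pipeline used in the study. The text does not say which form of binomial test was used, so the sketch assumes an exact two-sided test against an expected alternate-allele fraction of 0.5, and it assumes that depth equals the total number of reads at the site. Function names are hypothetical.

```python
from math import comb

def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k (assumed test form)."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))

def keep_genotype(gq, depth, is_het, alt_reads=None):
    """Return False if the genotype would be set to missing."""
    if gq < 20 or depth < 7:
        return False
    # Allele-balance filter applies to heterozygous calls only.
    if is_het and binom_two_sided_p(alt_reads, depth) < 0.001:
        return False
    return True
```

For example, a heterozygous call with 2 alternate reads out of 30 fails the allele-balance filter under these assumptions, whereas 15 out of 30 passes.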

 

Variants were annotated with Ensembl Variant Effect Predictor (30) based on Ensembl gene                         

build 83, using the LOFTEE plugin. The transcript with the most severe consequence was                           

selected. We analyzed three categories of variant based on the predicted consequence: (1)                         

synonymous variants; (2) loss-of-function variants (LoFs) classed as “high confidence” by                     

LOFTEE (including the annotations splice donor, splice acceptor, stop gained, frameshift,                     

initiator codon and conserved exon terminus variant); (3) damaging missense variants (i.e. those                         

not classed as “benign” by PolyPhen or SIFT, with CADD>25). Variants were also annotated                           

with MAF data from four different populations of the 1000 Genomes Project (31) (American,


Asian, African and European), two populations from the NHLBI GO Exome Sequencing                       

Project (European Americans and African Americans) and six populations from the Exome                       

Aggregation Consortium (ExAC) (African, East Asian, non-Finnish European, Finnish, South                   

Asian, Latino), and an internal allele frequency generated using unaffected parents from the                         

DDD. 

 

Ancestry inference  

We ran a principal components analysis in EIGENSOFT (32) on 5,853 common exonic SNPs                           

defined by the ExAC project. We set genotypes with GL<20 to missing and excluded SNPs                             

with >2% missingness. We calculated principal components in the 1000 Genomes Phase III                         

samples and then projected the DDD samples onto them. We grouped samples into three broad                             

ancestry groups (European, South Asian, and Other) as shown in Fig. S2 (right hand plots). By                               

drawing ellipses around the densest clusters of DDD samples, we defined two narrower groups:                           

European Ancestry from the British Isles (EABI) and Pakistani Ancestry from the British Isles                           

(PABI).  

 

For the burden and gene-based analysis, we primarily focused on these narrowly-defined EABI                         

and PABI groups because it is difficult to accurately estimate population allele frequencies in                           

more broadly defined groups. For example, in 4,942 European-ancestry probands, the number                       

of observed biallelic synonymous variants was slightly higher than the number expected (ratio =                           

1.06; p = 2.7×10⁻⁴).

Calling autozygous regions 

To call autozygous regions, we ran bcftools/roh (33) (bcftools version 1.5-4-gb0d640e)

separately on the different broad ancestry groups. We LD pruned our data to avoid overcalling                             

small runs of homozygosity as autozygous regions. Because rates of consanguinity differ                       

dramatically between EABI and PABI, we chose r2 cutoffs for each that brought the ratio of                               

observed to expected biallelic synonymous variants with MAF<0.01 closest to 1 (see below for                           


calculation of the number expected): PLINK options --indep-pairwise 50 5 0.4 for EABI and                           

--indep-pairwise 50 5 0.8 for PABI. 

 

Sample quality control and subsetting 

Our strategy for filtering probands and defining different subsets is shown in Fig. S1. For our                               

main analyses we excluded individuals without parental data (though we did incorporate them                         

into some of the follow-up on EIF3F and KDM5B). Next, we removed 1,403 probands whose                             

ancestry was not in the EABI or PABI groups described above. Finally we removed 388                             

probands that failed quality control filters. Specifically: 47 probands with >5% missingness at                         

the SNPs used for PCA (since their ancestry assignment was likely uncertain), 7 probands with                             

uniparental disomy, and one individual from every pair of probands who were related (kinship >                             

0.044, estimated by PCRelate (34), equivalent to third-degree relatives). 

 

For estimating cumulative frequencies of rare variants, we removed 924 parents reported to be                           

affected, since one might expect these to be enriched for damaging variants compared to the                             

general population, and 9 European parents with an abnormally high number of rare (MAF<1%)                           

synonymous genotypes (>834, compared to the 99.9th percentile of 223). However, we retained                         

their offspring in the burden and gene-based analyses. 

 

We stratified probands by high autozygosity (>2% of the genome classed as autozygous),                         

whether or not they had an affected sibling, and whether or not they already had a likely                                 

diagnostic dominant or X-linked exonic mutation (a likely damaging de novo mutation or                         

inherited damaging variant in a known monoallelic DDG2P gene                 

(http://www.ebi.ac.uk/gene2phenotype/) [if the parent was affected] or a damaging X-linked                   

variant in a known X-linked DDG2P gene). The 4,458 patients who had no such diagnostic                             

variants were included in the “undiagnosed” set, along with 193 patients who had diagnostic                           

variants in recessive DDG2P genes or potentially diagnostic variants in monoallelic or X-linked                         

DDG2P genes but had high autozygosity or affected siblings. There were 1,366 EABI and 23                             


PABI probands in the diagnosed set, and 4,318 EABI and 333 PABI probands in the                             

undiagnosed set. For the set of probands with affected siblings shown in Figures 1 and 2, we                                 

restricted to families from which more than one independent (i.e. non-MZ twin) child was                           

included in DDD and in which the siblings’ phenotypes were more similar than expected by                             

chance given the distribution of HPO terms in the full cohort (HPO similarity p-value < 0.05 (9)).

Burden analyses and gene-based tests 

Variants were filtered on class (LoF, damaging missense or synonymous) and by different MAF                           

cutoffs. Variants failing the MAF cutoff in any of the publicly available control populations, the                             

full set of unaffected DDD parents, or the unaffected DDD parents in that population subset                             

(PABI or EABI) were removed.  

 

Following the approach we used previously (9), we calculated $E(B_{g,c})$, the expected number of rare biallelic genotypes of class c (LoF, damaging missense or synonymous) in each gene g, as follows:

$$E(B_{g,c}) = N_{\mathrm{prob}}\,\lambda_{g,c}$$

where $N_{\mathrm{prob}}$ is the number of probands and $\lambda_{g,c}$ is the expected frequency of biallelic genotypes of class c in gene g, calculated as follows:

$$\lambda_{g,c} = (1 - a_g)\,f_{c,g}^{2} + a_g\,f_{c,g}$$

where $f_{c,g}$ is the cumulative frequency of variants of class c in gene g with MAF less than the cutoff, and $a_g$ is the fraction of individuals autozygous at gene g. An individual was defined as being autozygous if he/she had a region of homozygosity with any overlap of gene g; in practice, autozygous regions almost always overlapped genes completely rather than partially.

 

The rate of LoF/damaging missense compound heterozygous genotypes is:

$$\lambda_{g,\mathrm{LoF/miss}} = (1 - a_g)\left[2\,f_{\mathrm{LoF},g}\,f_{\mathrm{miss},g}\,(1 - f_{\mathrm{LoF},g})\right]$$
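
The rate formulas above translate directly into code. The following is a minimal sketch with hypothetical function names; the input values in the usage note are illustrative and are not taken from the study.

```python
def biallelic_rate(f, a):
    """Expected frequency of biallelic genotypes in one gene.

    f: cumulative frequency of variants of the class in the gene
    a: fraction of individuals autozygous at the gene
    Non-autozygous individuals need two independent hits (f**2);
    autozygous individuals need only one (f).
    """
    return (1 - a) * f**2 + a * f

def compound_het_rate(f_lof, f_miss, a):
    """Expected frequency of LoF/damaging-missense compound heterozygotes."""
    return (1 - a) * (2 * f_lof * f_miss * (1 - f_lof))

def expected_biallelic(n_prob, rate):
    """Expected number of biallelic genotypes among n_prob probands."""
    return n_prob * rate
```

For instance, with an illustrative cumulative frequency of 0.01 and no autozygosity, `biallelic_rate(0.01, 0.0)` is 0.0001, so roughly one such genotype would be expected by chance per 10,000 probands.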


To calculate the cumulative frequency of variants of class c in gene g, $f_{c,g}$, we first phased the variants in the DDD parents based on the inheritance information. The cumulative frequency is then given by:

$$f_{c,g} = \frac{h_{c,g}}{N_{\mathrm{haps}}}$$

where $h_{c,g}$ is the number of parental haplotypes with at least one variant of class c in gene g, and $N_{\mathrm{haps}}$ is the total number of parental haplotypes.

  

 

For each gene, we calculated the binomial probability (given $N_{\mathrm{prob}}$ probands and rate $\lambda_{g,c}$) of the observed number of biallelic genotypes of class c. We did this for four consequence classes

(biallelic LoF, biallelic LoF+LoF/damaging missense, biallelic damaging missense, and biallelic                   

LoF+LoF/damaging missense+biallelic damaging missense). Our reasoning was that, in some                   

genes, biallelic LoFs might be embryonic lethal and LoF/damaging missense compound                     

heterozygotes might cause DD, but in other genes, including rare damaging missense variants in                           

the analysis might drown out signal from truly pathogenic LoFs. We conducted the test on two                               

sets of probands (EABI only, and EABI+PABI). We did not analyze PABI separately due to                             

low power. 

 

For the set of EABI only, we conducted a simple binomial test. For the combined EABI+PABI                               

test, we took into account the different ways in which n or more probands with the relevant                                 

genotype could be distributed between the two groups and the probability of observing each                           

combination using population-specific rates (e.g. two observed biallelic genotypes could be                     

both seen in EABI, both in PABI, or one in each). We then summed these probabilities across                                 

all possible combinations to obtain an aggregate probability for sampling n or more probands by                             

chance, as described in (9).
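
A sketch of the combined two-population calculation is given below, under the assumption that counts in the two groups are independent binomials with their own population-specific rates. Rather than enumerating every combination with n or more probands, the sketch sums the finitely many combinations with fewer than n and takes the complement, which gives the same probability. Function names and inputs are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_n_or_more(n_obs, n_a, rate_a, n_b, rate_b):
    """P(count_a + count_b >= n_obs) for two independent binomial counts.

    n_a, n_b: number of probands in each group
    rate_a, rate_b: population-specific expected genotype frequencies
    """
    if n_obs <= 0:
        return 1.0
    below = 0.0
    for i in range(n_obs):          # genotypes observed in group A
        for j in range(n_obs - i):  # genotypes in group B, with i + j < n_obs
            below += binom_pmf(i, n_a, rate_a) * binom_pmf(j, n_b, rate_b)
    # The complement has an absolute precision of about 1e-16; much smaller
    # p-values would need the upper tail to be summed directly in log space.
    return 1.0 - below
```

Setting the size of one group to zero reduces this to the simple binomial tail used for a single population.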

For some genes, $\lambda_{g,c}$ was estimated to be 0 in one or both populations because there were no variants in the parents that passed filtering. The vast majority of these also had $O(B_{g,c}) = 0$. We

dropped these genes from the tests, but still included them in our Bonferroni correction. We also                               


excluded 715 genes either because they were in the HLA region or because they were classed as                                 

having suspiciously many or suspiciously few synonymous or synonymous+missense variants                   

in ExAC, leaving 18,630 genes. We thus set a significance threshold of p < 0.05/(8 tests × 18,630 genes) = 3.4×10⁻⁷. For Fig. S8, we ordered the genes by their lowest p-value, randomized the

order of genes with the same p-value, then tested for a difference in the distribution of ranks                                 

between recessive DDG2P genes and all other genes using a Kolmogorov-Smirnov test. 

 

For the burden analysis, we summed up the observed and expected number of biallelic genotypes across all genes to give $O(B_c)$ and $E(B_c)$, then calculated their difference $O(B_c) - E(B_c)$ (the excess) and their ratio $O(B_c)/E(B_c)$. Under the null hypothesis, we expect $O(B_c)$ to follow a Poisson distribution with rate $E(B_c)$.
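
The burden statistics above can be sketched as follows. The text specifies the Poisson null but not which tail was tested, so the one-sided upper-tail p-value used here is an assumption, as are the function names.

```python
from math import exp, lgamma, log

def poisson_upper_tail(k, rate):
    """P(X >= k) for X ~ Poisson(rate), with rate > 0.

    Terms are evaluated in log space to avoid overflow for large counts,
    and the upper tail is summed until further terms are negligible.
    """
    if k <= 0:
        return 1.0
    total, i = 0.0, k
    while True:
        term = exp(i * log(rate) - rate - lgamma(i + 1))
        total += term
        # Past the mode the terms decrease, so stop once they stop mattering.
        if i > rate and term <= 1e-17 * total:
            break
        i += 1
    return min(total, 1.0)

def burden(observed, expected):
    """Excess, ratio and (assumed one-sided) Poisson p-value."""
    return {
        "excess": observed - expected,
        "ratio": observed / expected,
        "p": poisson_upper_tail(observed, expected),
    }
```

Here `observed` is an integer count summed across genes and `expected` is the corresponding sum of expected values.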

 

Alternative methods for estimating the expected number of biallelic genotypes

For Fig. S4, we used two alternative methods to the one described above for estimating the                               

expected number of biallelic genotypes. Firstly, we calculated the cumulative frequency based                       

on ExAC by summing the frequencies of individual variants of class c in gene g, using non-Finnish Europeans (NFE) for EABI and South Asians (SAS) for PABI:

$$f_{c,g,\mathrm{ExAC}} = \sum_{v=1}^{V} q_{c,g,v}$$

 

For this, we restricted to ExAC variants within the intersection of the V3 and V5 Agilent probes                                 

used in DDD (including 100bp flanks on each side), and removed variants if they had >50%                               

missingness in the relevant ExAC population. 

 

We also used a modified version of the method described by Jin et al. (3) for approximating the expected number of biallelic genotypes by fitting a polynomial regression on the mutability:

$$O(B_g) = \beta_0 + \beta_1 m_g + \beta_2 m_g^{2} + \varepsilon$$

where $O(B_g)$ is the observed number of biallelic genotypes for gene g, $m_g$ is the mutability for gene g calculated using the method of Samocha et al. (35), $\beta_0$, $\beta_1$ and $\beta_2$ are coefficients, and $\varepsilon$ is a random noise term. We fitted this model to the observed number of biallelic synonymous genotypes (MAF<0.01) for each gene to estimate $\beta_0$, $\beta_1$ and $\beta_2$, then calculated the expected number of biallelic genotypes for each gene using the fitted value:

$$E(B_g) = \beta_0 + \beta_1 m_g + \beta_2 m_g^{2}$$

Note that we did not do the additional step described in Jin et al. in which they calculate the expected number of biallelic genotypes for gene g as

$$\sum_{\mathrm{genes}} O(B_g) \times \frac{\beta_0 + \beta_1 m_g + \beta_2 m_g^{2}}{\sum_{\mathrm{genes}} \left(\beta_0 + \beta_1 m_g + \beta_2 m_g^{2}\right)}$$

since this constrains the total expected number of biallelic genotypes to be equal to the number observed, making exome-wide burden analysis impossible.
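
For illustration, the quadratic regression on mutability described above can be fitted by ordinary least squares. The sketch below solves the normal equations in pure Python; the function names are hypothetical and the fitting procedure is an assumption, since the text does not state how the model was fitted. Real per-gene mutabilities are very small numbers, so in practice the predictor should be rescaled, or a numerically more robust routine used, before fitting.

```python
def fit_quadratic(m, y):
    """Least-squares fit of y = b0 + b1*m + b2*m**2; returns (b0, b1, b2)."""
    # Normal equations A b = v, with A[r][c] = sum(m**(r+c)), v[r] = sum(y*m**r).
    A = [[sum(x ** (r + c) for x in m) for c in range(3)] for r in range(3)]
    v = [sum(yi * x ** r for x, yi in zip(m, y)) for r in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, 3):
            factor = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= factor * A[col][c]
            v[r] -= factor * v[col]
    # Back substitution.
    b = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        b[r] = (v[r] - sum(A[r][c] * b[c] for c in range(r + 1, 3))) / A[r][r]
    return tuple(b)

def expected_from_mutability(m_g, coeffs):
    """Fitted expected number of biallelic genotypes for one gene."""
    b0, b1, b2 = coeffs
    return b0 + b1 * m_g + b2 * m_g ** 2
```

Because the fitted values are not renormalised to the observed total, the sum of expected counts is free to differ from the sum of observed counts, which is what allows an exome-wide burden comparison.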

 

Estimating the proportion of cases with diagnostic biallelic coding variants or de novo mutations

We are interested in estimating $\pi_c$, the proportion of probands with diagnostic variants of consequence class c. Under the null hypothesis in which none of the genotypes of class c are pathogenic, the number of such genotypes we expect to see in $N_{pr}$ probands is:

$$E(b_{\mathrm{probands},c})_{\mathrm{null}} = \lambda_{pr,c}\,N_{pr}$$

where $\lambda_{pr,c} = \sum_{g=1}^{G} \lambda_{g,c}$ is the total expected frequency of genotypes of class c across all genes.

  

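However, under the alternative hypothesis, suppose that some fraction $\varphi_{\mathrm{causal},c}$ of genotypes in class c cause DD, and some fraction $\varphi_{\mathrm{lethal},c}$ are lethal. Assuming complete penetrance, we can thus split $E(b_{\mathrm{probands},c})$ into genotypes that are due to chance and those that are diagnostic:

$$E(b_{\mathrm{probands},c})_{\mathrm{alt}} = (1 - \varphi_{\mathrm{causal},c} - \varphi_{\mathrm{lethal},c})\,\lambda_{pr,c}\,N_{pr} + \pi_c\,N_{pr}\,\frac{\varphi_{\mathrm{causal},c}\,\lambda_{pr,c}}{1 - e^{-\varphi_{\mathrm{causal},c}\,\lambda_{pr,c}}}$$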

 


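The component due to chance is $(1 - \varphi_{\mathrm{causal},c} - \varphi_{\mathrm{lethal},c})\,\lambda_{pr,c}\,N_{pr}$, and $\dfrac{\varphi_{\mathrm{causal},c}\,\lambda_{pr,c}}{1 - e^{-\varphi_{\mathrm{causal},c}\,\lambda_{pr,c}}}$ is the average number of pathogenic genotypes per individual, given that the individual has at least one such genotype.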

 

In $N_{pa}$ healthy parents, biallelic genotypes of class c are all due to chance from the portion of c that is not pathogenic:

$$E(b_{\mathrm{parents},c})_{\mathrm{alt}} = (1 - \varphi_{\mathrm{causal},c} - \varphi_{\mathrm{lethal},c})\,\lambda_{pa,c}\,N_{pa}$$

where $\lambda_{pa,c}$ is the expected rate of biallelic genotypes of class c in the parents, given the cumulative frequencies estimated in the same set of people, and the autozygosity rates. We can thus obtain a maximum likelihood estimate for $\varphi_c = \varphi_{\mathrm{causal},c} + \varphi_{\mathrm{lethal},c}$ using $O_{pa,c}$, the observed number of biallelic genotypes of class c in the parents:

$$\varphi_c = 1 - \frac{O_{pa,c}}{\lambda_{pa,c}\,N_{pa}}$$

 

The 95% confidence interval for $\varphi_c$ is

$$\left(1 - \frac{O_{pa,c} + 1.96\sqrt{O_{pa,c}}}{\lambda_{pa,c}\,N_{pa}},\; 1 - \frac{O_{pa,c} - 1.96\sqrt{O_{pa,c}}}{\lambda_{pa,c}\,N_{pa}}\right).$$

We show the estimates of $\varphi_c$ from the EABI and PABI parents in Fig. S7. To estimate $\pi_c$, we combined data from both populations for MAF<0.01 variants to estimate $\varphi_c$, and obtained the following maximum likelihood estimates and 95% confidence intervals: $\varphi_{\mathrm{LoF/LoF}} = 0.141$ (0.046, 0.238), $\varphi_{\mathrm{LoF/miss}} = 0.083$ (-0.009, 0.175), and $\varphi_{\mathrm{miss/miss}} = 0.007$ (-0.028, 0.042).
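
The point estimate and confidence interval for the depleted fraction, as described above, can be sketched in a few lines. The function name is hypothetical and the numbers in the usage note are illustrative, not values from the study.

```python
from math import sqrt

def phi_estimate(o_pa, lam_pa, n_pa):
    """Estimate of the depleted fraction and its 95% interval.

    o_pa:   observed number of biallelic genotypes in the parents
    lam_pa: expected rate of such genotypes per parent
    n_pa:   number of parents
    Returns (estimate, lower, upper).
    """
    expected = lam_pa * n_pa
    half_width = 1.96 * sqrt(o_pa)  # normal approximation to the Poisson count
    estimate = 1 - o_pa / expected
    lower = 1 - (o_pa + half_width) / expected
    upper = 1 - (o_pa - half_width) / expected
    return estimate, lower, upper
```

For example, observing 100 genotypes where 120 were expected gives an estimate of about 0.167; the lower bound can fall below zero when the depletion is small relative to sampling noise, as in two of the intervals reported above.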

 

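For biallelic genotypes, we can substitute $\varphi_c$ into the expression above, substitute $O_{pr,c}$ for $E(b_{\mathrm{probands},c})$ and rearrange to obtain a maximum likelihood estimate of $\pi_c$:

$$\pi_c = \left(\frac{O_{pr,c}}{\lambda_{pr,c}\,N_{pr}} - (1 - \varphi_c)\right)\frac{1 - e^{-\varphi_{\mathrm{causal},c}\,\lambda_{pr,c}}}{\varphi_{\mathrm{causal},c}} \approx \left(\frac{O_{pr,c}}{\lambda_{pr,c}\,N_{pr}} - 1 + \varphi_c\right)\frac{1 - e^{-\varphi_c\,\lambda_{pr,c}}}{\varphi_c}$$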

  

 

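We cannot disentangle $\varphi_{\mathrm{causal},c}$ and $\varphi_{\mathrm{lethal},c}$ with the available data, but we find that the ratio $\varphi_{\mathrm{lethal},c}/\varphi_{\mathrm{causal},c}$ makes very little difference to the estimate of $\pi_c$, so we make the assumption that $\varphi_{\mathrm{causal},c} = \varphi_c$. The 95% confidence interval is then

$$\left[\left(\frac{O_{pr,c} - 1.96\sqrt{O_{pr,c}}}{\lambda_{pr,c}\,N_{pr}} - 1 + \varphi_c\right)\frac{1 - e^{-\varphi_c\,\lambda_{pr,c}}}{\varphi_c},\; \left(\frac{O_{pr,c} + 1.96\sqrt{O_{pr,c}}}{\lambda_{pr,c}\,N_{pr}} - 1 + \varphi_c\right)\frac{1 - e^{-\varphi_c\,\lambda_{pr,c}}}{\varphi_c}\right].$$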
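
As a sketch, the estimate of the diagnostic proportion and its interval can be computed as below, under the same simplifying assumption made in the text that the causal fraction equals the total depleted fraction. The function name is hypothetical, the inputs in the usage note are illustrative, and the depleted fraction must be greater than zero for the expression to be defined.

```python
from math import exp, sqrt

def pi_estimate(o_pr, lam_pr, n_pr, phi):
    """Estimated proportion of probands with a diagnostic genotype.

    o_pr:   observed number of biallelic genotypes in probands
    lam_pr: expected rate of such genotypes per proband
    n_pr:   number of probands
    phi:    depleted fraction (assumed equal to the causal fraction, > 0)
    Returns (estimate, lower, upper).
    """
    # Inverse of the mean number of pathogenic genotypes per affected
    # individual, divided by lam_pr (zero-truncated Poisson correction).
    scale = (1 - exp(-phi * lam_pr)) / phi

    def point(o):
        return (o / (lam_pr * n_pr) - 1 + phi) * scale

    half_width = 1.96 * sqrt(o_pr)
    return point(o_pr), point(o_pr - half_width), point(o_pr + half_width)
```

With illustrative inputs of 60 observed genotypes, a per-proband rate of 0.01, 5,000 probands and a depleted fraction of 0.1, the estimate is about 0.003.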

  


 

De novo mutations were called as previously described (4), selecting the threshold on ppDNM (posterior probability of a de novo mutation) such that the observed number of synonymous de novos matched the number expected. Using Sanger validation data from an earlier dataset (1), we adjusted the observed number of mutations to account for specificity and sensitivity as follows:

$$O_{\mathrm{de\ novo,\ adjusted}} = \frac{O_{\mathrm{de\ novo,\ raw}}\;\alpha}{0.95\,\beta}$$

where $\alpha$ is the positive predictive value (the proportion of candidate mutations that are true positives at the chosen threshold) and $\beta$ is the sensitivity to true positives at the same threshold. The adjustment by 0.95 is due to exome sequencing being only about 95% sensitive. The overall de novo mutation rate $\lambda_{\mathrm{de\ novo},c,pr}$ was calculated in different sets of probands using the model from (35), adjusting for sex, as described previously (4).

Since we cannot estimate $\varphi_c$ for de novo mutations using the parents, as we did for recessive variants, we instead set $\varphi_{\mathrm{DN\ LoF}}$ to 0.099, the fraction of genes with pLI>0.99. This estimate is more speculative than the directly observed depletion of biallelic genotypes above, but we note that the estimate of $\pi_{\mathrm{DN\ LoF}}$ for the full set of 7,832 trios only increases from ~0.129 to ~0.154 if we increase $\varphi_{\mathrm{DN\ LoF}}$ from 0.01 to 0.3. To estimate $\varphi_{\mathrm{DN\ missense}}$, we make use of this relationship:

≈λ φ N Pr(DDD|c)πc c c BI  where is the population of the British Isles, and is the probability that an  N

BI                  r(DDD|c)P            

individual is recruited to the DDD given he/she has a pathogenic mutation of class c. If we                                 

assume that this recruitment probability is the same for de novo missense mutations as for de                               

novo LoFs, we can write: 

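$$\frac{\pi_{\text{DN LoF}}}{\pi_{\text{DN missense}}} = \frac{\lambda_{\text{DN LoF}}\,\varphi_{\text{DN LoF}}}{\lambda_{\text{DN missense}}\,\varphi_{\text{DN missense}}}$$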

 

We know $\lambda_{\text{DN missense}}$ and $\lambda_{\text{DN LoF}}$, and will assume $\varphi_{\text{DN LoF}} = 0.099$ so that we can estimate $\pi_{\text{DN LoF}}$, and can thus write the number of de novo missense mutations we expect to see as:

$$E(m_{pr,\text{DN missense}}) = (1 - \varphi_{\text{DN missense}})\,\lambda_{pr,\text{DN missense}}\,N_{pr} + N_{pr}\,\frac{(\varphi_{\text{DN missense}}\,\lambda_{pr,\text{DN missense}})^2\,\pi_{\text{DN LoF}}}{(\lambda_{\text{DN LoF}}\,\varphi_{\text{DN LoF}})\,(1 - e^{-\varphi_{\text{DN missense}}\,\lambda_{pr,\text{DN missense}}})}$$

We calculated $E(m_{pr,\text{DN missense}})$ for a range of values of $\varphi_{\text{DN missense}}$ and found that $\varphi_{\text{DN missense}} = 0.036$ best matched the observed data, so we used this value for estimating $\pi_{\text{DN missense}}$.
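The search over candidate values of the missense fraction can be sketched as a simple grid search. This is an illustrative sketch under stated assumptions: the function names, the grid spacing, and the example rates and LoF proportion are hypothetical, and the "observed" count is synthetic (generated from the formula itself) so that the example is self-checking. Only the value 0.099 and the 7,832 trios come from the text.

```python
import math


def expected_dn_missense(phi_mis, lam_pr_mis, n_pr, pi_lof, lam_lof, phi_lof):
    """Expected de novo missense count: non-causal background plus causal term."""
    background = (1.0 - phi_mis) * lam_pr_mis * n_pr
    x = phi_mis * lam_pr_mis
    causal = n_pr * x ** 2 * pi_lof / (lam_lof * phi_lof * (1.0 - math.exp(-x)))
    return background + causal


def best_phi(observed, grid, **params):
    """Return the grid value whose expected count is closest to the observed one."""
    return min(grid, key=lambda phi: abs(expected_dn_missense(phi, **params) - observed))


# Hypothetical parameter values, for illustration only.
params = dict(lam_pr_mis=0.7, n_pr=7832, pi_lof=0.13, lam_lof=0.09, phi_lof=0.099)
grid = [i / 1000 for i in range(1, 301)]

# Synthetic "observed" count generated at phi = 0.036, so the search should recover it.
observed = expected_dn_missense(0.036, **params)
phi_hat = best_phi(observed, grid, **params)
print(phi_hat)
```

With real data, the observed count would be the adjusted number of de novo missense mutations rather than a value simulated from the model.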

 

Structural analysis of EIF3F 

This is shown in Fig. S11. Human EIF3:f (pdb 3j8c:f) was submitted to the Protein structure                               

comparison service PDBeFold at the European Bioinformatics Institute (36, 37). Of the close

structural matches returned, the X-ray yeast structure pdb entry 4OCN was chosen to display                           

the human variant position, as the structural resolution (2.25Å) was better than the human                           

EIF3:f pdb 3j8c:f structure (11.6Å) and it was the most complete structure among the yeast                             

models. In order to map the Phe232 variant onto the equivalent position on the yeast structure,                               

the structural alignment from PDBeFold was used. Solvent accessibility was calculated using                       

the Naccess software (38) using the standard parameters of a 1.4Å probe radius. Amino acid                             

sequence conservation was calculated using the Scorecons server (39) and displayed using                       

sequence logos (40).

Introducing the EIF3F Phe232Val variant with CRISPR/Cas9 

The Phe232Val mutation in the human EIF3F gene was generated by a single base substitution                             

(T>G) using CRISPR/Cas9-induced homology directed repair in the kolf2_C1 human iPSC line,                       

a clonal derivative of the kolf2 line generated by the HipSci consortium (41). This was achieved

by nucleofection (Lonza, P3 buffer, program CA137) of 10^6 cells with Cas9-crRNA-tracrRNA

ribonucleoprotein (RNP) complexes. Synthetic RNA oligonucleotides (target site:               

5’-AGGCGTGAACATCACTCCCA-3’, 225 pmol crRNA/tracrRNA, IDT) were annealed by               

heating to 95°C for 2 min in duplex buffer (IDT) and cooling slowly, followed by addition of                                 

122 pmol recombinant eSpCas9_1.1 protein (in 10 mM Tris-HCl, pH 7.4, 300 mM NaCl, 0.1                             

mM EDTA, 1 mM DTT, PMID: 26628643), and 500 pmol of a 100 nt ssDNA oligonucleotide                               

(5’-CTCCGATGCGTTCAGTGTCGTAGTACGCGTATTTCACTGTCAGAGGCGT 

GACCATCACTCCCATGGTCCTCCCAGGGACTCCCATTAAAGTGCTGCGGAAGG-3’, 

IDT Ultramer) as a homology-directed repair template to introduce the desired base change.                         

After recovery, plating at single cell density and colony picking into 96 well plates, 384 clones                               

were screened for heterozygous and homozygous mutations by high throughput sequencing of                       

amplicons spanning the target site using an Illumina MiSeq instrument.  
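As a quick consistency check on the reagents described above, the guide target site can be compared against the repair template to locate the intended base change. The sketch below uses the two sequences as they appear in this text. The helper name is hypothetical, and reading the result on the complementary strand is an assumption made here to relate it to the stated T>G substitution.

```python
GUIDE = "AGGCGTGAACATCACTCCCA"
DONOR = (
    "CTCCGATGCGTTCAGTGTCGTAGTACGCGTATTTCACTGTCAGAGGCGT"
    "GACCATCACTCCCATGGTCCTCCCAGGGACTCCCATTAAAGTGCTGCGGAAGG"
)
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def best_alignment(guide, donor):
    """Return (offset, mismatches) for the ungapped donor window closest to the guide.

    mismatches is a list of (position_in_guide, guide_base, donor_base).
    """
    best = None
    for offset in range(len(donor) - len(guide) + 1):
        window = donor[offset:offset + len(guide)]
        mismatches = [(i, g, d) for i, (g, d) in enumerate(zip(guide, window)) if g != d]
        if best is None or len(mismatches) < len(best[1]):
            best = (offset, mismatches)
    return best


offset, mismatches = best_alignment(GUIDE, DONOR)
pam = DONOR[offset + len(GUIDE):offset + len(GUIDE) + 3]
for pos, guide_base, donor_base in mismatches:
    print(f"guide position {pos}: {guide_base}>{donor_base} "
          f"(complementary strand: {COMPLEMENT[guide_base]}>{COMPLEMENT[donor_base]})")
print("bases following the target site in the donor:", pam)
```

The template differs from the target site at a single position (A>C on the strand as written, T>G on the complementary strand), and the three bases that follow the target site have the NGG form expected of an SpCas9 PAM.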

 

Crude DNA lysates were prepared by incubation of cells in 100 μl of yolk sac lysis buffer (10

mM Tris-HCl pH 8.3, 50 mM KCl, 2 mM MgCl2, 0.45% IGEPAL CA-630, 0.45% Tween 20,

400 μg/ml proteinase K) at 60°C for 1 h, then 95°C for 10 min, followed by a 10x dilution in water. The region

surrounding the mutation was amplified by nested PCR using 1 μl diluted lysate and KAPA HiFi

hotstart polymerase (Kapa Biosystems) by 35 cycles of {98°C 20 s, 65°C 15 s, 72°C 30 s},                                 

using primers eIF3F_500_F and eIF3F_500_R (see below). Subsequently, reactions were                   

diluted 10x and re-amplified with primers eIF3F_200_F and eIF3F_200_R (see below) (Tm =                         

60°C) to ensure specificity for the EIF3F gene. Indexed Illumina sequencing adaptors were                         

added by a third round of PCR to identify the location of positive clones. Final cell lines were                                   

further validated by Sanger sequencing. Since this region is highly similar to several other                           

regions in the genome, we analysed the final clones for off-target mutagenesis at the                           

homologous sites, and found no evidence of this at the two closest off-target sites on                             

chr2:58252105-58252196 and chr12:5032866-5032941 in any of the lines obtained.  

 

Primer sequences: (5’-3’) 

eIF3F_500_F: tctggggttcatttggtccc 

eIF3F_500_R: ctgctcagtcacacttcctg 

eIF3F_200_F: ACACTCTTTCCCTACACGACGCTCTTCCGATCTagacatggaaaccctctccc 

eIF3F_200_R: TCGGCATTCCTGCTGAACCGCTCTTCCGATCTagtcccttttcaaaccaccc 

 

Investigating EIF3F protein levels in iPSC lines containing the EIF3F variant  

iPS cells were cultured on Vitronectin XF (07180, Stemcell Technologies)-coated plates in                       

TeSR-E8 medium (05990, Stemcell Technologies) and incubated at 37°C in 5% CO2:95% air.

At 75% confluence, iPS cells were incubated in Gentle Cell Dissociation Reagent (07174,                         

Stemcell Technologies) for 5 min, collected into a tube, centrifuged at 300 g for 3 min and                                 

resuspended in ice-cold 20 mM Tris-HCl pH 7.4 containing protease inhibitor cocktail

(1861281, Thermo Scientific). Cells were agitated at 4°C for 30 min before passing through a 26

gauge needle several times and centrifuging at 13000 g for 15 min at 4°C. Protein in the

supernatant was quantified using the Qubit fluorometer (Invitrogen). We repeated this for two                         

independent cultures of iPSC lines. Samples were reduced (NP0009, Invitrogen) and mixed                       

with sample buffer (NP0007, Invitrogen) before an equal amount of total protein from each cell                             

line (6 µg for culture 1, 4 µg for culture 2) was electrophoresed through a 4-12% Bis-Tris gel                                   

(NP0322, Invitrogen) and transferred onto a membrane (IPVH00010, Immobilon P) using the                       

X-Cell SureLock system, according to the manufacturer’s protocol (E19051, Invitrogen). Blots                     

were blocked in 5% non-fat milk (Marvel) in tris buffered saline containing 0.1% Tween 20                             

(TBST) for 1 h at room temperature. Primary antibodies were as follows: 1:500 anti-rabbit                           

EIF3F (638202, Biolegend) or 1:1000 beta III tubulin (EP1569Y; AB52623, Abcam), both                       

diluted in 5% non-fat milk in TBST and incubated overnight at 4°C. Rabbit-HRP-linked

secondary antibody (7074, Cell Signalling Technologies) was diluted 1:2000 in 5% non-fat milk                         

in TBST and incubated at room temperature for 2 h. Blots were incubated for 2 min with                                 

Western Bright ECL Spray (Advansta) followed by detection with the ImageQuant LAS 4000                         

(GE Healthcare). Band intensities were measured using ImageJ 1.48v and EIF3F expression                       

values were normalised to beta-tubulin. 

 

We jointly analysed the data from the two independent replicates, having normalised the                         

relative expression values by dividing by the mean of the wild-type lines for each replicate. We                               

tested for lower relative EIF3F expression in the homozygous cells compared to heterozygous                         

and wild-type cells combined using a one-sided t-test (Fig. S10). 
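The normalisation and comparison described above can be sketched as follows. This is an illustration under stated assumptions: the intensity values are invented, the function names are hypothetical, and Welch's unequal-variance form of the t statistic is used here because the text does not state which form of the t-test was applied.

```python
import math
from statistics import mean, variance

# Hypothetical tubulin-normalised EIF3F band intensities: (replicate, genotype, value).
# These numbers are invented for illustration and are not data from the study.
measurements = [
    (1, "wt", 1.0), (1, "wt", 1.2), (1, "het", 1.1), (1, "het", 0.9),
    (1, "hom", 0.5), (1, "hom", 0.6),
    (2, "wt", 2.0), (2, "wt", 2.2), (2, "het", 2.1), (2, "het", 1.9),
    (2, "hom", 1.0), (2, "hom", 1.3),
]


def normalise_by_wild_type(rows):
    """Divide each value by the mean of the wild-type lines in the same replicate."""
    wt_mean = {}
    for rep in {r for r, _, _ in rows}:
        wt_mean[rep] = mean(v for r, g, v in rows if r == rep and g == "wt")
    return [(r, g, v / wt_mean[r]) for r, g, v in rows]


def welch_t(a, b):
    """Welch t statistic and degrees of freedom for mean(a) - mean(b)."""
    se_a, se_b = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1))
    return t, df


normalised = normalise_by_wild_type(measurements)
hom = [v for _, g, v in normalised if g == "hom"]
other = [v for _, g, v in normalised if g != "hom"]
t_stat, df = welch_t(hom, other)
# The one-sided p-value for "homozygous lower than the rest" is the lower tail
# of the t distribution with df degrees of freedom, evaluated at t_stat.
print(t_stat, df)
```

Dividing by the per-replicate wild-type mean puts both replicates on a common scale before they are pooled, which is what allows the joint analysis.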

 

Investigating protein synthesis in iPSC lines containing the EIF3F variant 

Protein synthesis was assayed using the Click-iT L-homopropargylglycine (HPG) Alexa Fluor                     

488 Protein Synthesis Assay Kit (C10429, Molecular Probes). Briefly, in this assay, we took                           

iPSCs with or without the EIF3F variant and depleted them of methionine to stop translation,                             

then added a methionine analogue that contains an alkyne moiety which is incorporated into                           

nascent proteins. We then added Alexa Fluor 488 azide, which leads to a chemoselective "click"                             

reaction between the fluorescent azide and the alkyne, allowing the modified proteins to be                           

detected by image-based analysis as a proxy for the amount of protein synthesis.  

 

We optimised the assay for use with iPSCs, using the translational elongation inhibitor                         

cycloheximide as a control to confirm the sensitivity and reproducibility of the assay (Fig. S12).                             

iPS cells were cultured on Vitronectin XF (07180, Stemcell Technologies)-coated plates in                       

TeSR-E8 medium (05990, Stemcell Technologies) and incubated at 37°C in 5% CO2:95% air.

Cells were washed in PBS (D8537, Sigma), dissociated with Accutase (A11105-01, Gibco),                       

then 30 000 cells were seeded per well of a 96 well plate, in triplicate in TeSR-E8 containing 10

μM ROCK inhibitor (Y-27632 dihydrochloride; Ab120129, Abcam). The following day, cells

were depleted of methionine for 1 h by incubation in methionine-free DMEM (21013-024,                         

Gibco), which was supplemented with insulin, transferrin and selenium (41400045, Gibco), 20                       

ng/ml human FGF-basic protein (100-18C, PeproTech), sodium pyruvate (11360-039, Gibco),                   

L-glutamine (25030-81, Gibco) and N-acetyl cysteine (A7250, Sigma), at 37℃ in 5%CO2:95%                       

air. The medium was then changed to methionine-free DMEM plus supplements and 100 μM

Click-iT HPG (L-homopropargylglycine; C10202, Molecular Probes), a methionine-analogue,               

or vehicle (DMSO; D2650, Sigma), and cells were incubated for 2 h at 37℃ in 5%CO2:95%                               

air. As a control, cycloheximide was added to control wells at the same time, at different                               

concentrations (Fig. S12B).  

 

At the end of the incubation, cells were washed briefly with PBS, trypsinised for 5 min                               

(25300-054, Gibco) to dissociate single cells, diluted in DMEM containing 20% serum to                         

quench the trypsin, transferred to a V-bottom 96-well plate and triplicate reactions were pooled.                           

Cells were then processed using standard immunocytochemical methods through a series of                       

incubations, centrifugations (all performed at 300 g for 5 min) and washes in PBS, all at room                                 

temperature, in the following order: incubation in 3.7% formaldehyde in PBS (28906, Thermo                         

Scientific) for 15 min to fix the cells; incubation in 0.5% Triton X-100 (X100, Sigma) for 20                                 

min to permeabilise the cells; incubation in 3% bovine serum albumin (A9543, Sigma) in PBS                             

for 10 min to block non-specific binding sites; incubation in 100 μl Click-iT reaction cocktail,

which was prepared according to the manufacturer’s recommendation, for 1 h, followed by a                           

wash with Click-iT reaction rinse buffer, before resuspending in PBS and analysing                       

fluorescence at 488 nm on a BD LSRFortessa (BD Biosciences).  

 

This experiment was repeated three times for each of the two independent cell lines, for each of                                 

the three genotypes (wild-type, heterozygous, homozygous). Data were analysed using FlowJo                     

(v10.4.2). After gating on single cells, we obtained a bimodal distribution of Alexa Fluor                           

488-positive cells (Fig. S12A), due to the different rates of protein synthesis during the cell                             

cycle (G1/G2 have high rates of protein synthesis and S/M phase have lower rates). We gated                               

on the Alexa Fluor 488-positive population (fluorescence > 10^4), which comprised ~80% of the

cells in the absence of cycloheximide (Fig. S12A). This proportion did not significantly depend                           

on genotype, replicate or cell line (ANOVA p>0.05). After gating, there were an average of 139                               

cells per replicate (standard deviation = 76), on which we determined the median fluorescence                           

intensity. We then tested for the effect of genotype on median fluorescence intensity using                           

linear regression, controlling for replicate. As a summary, we averaged the median fluorescence                         

for the two cell lines of the same genotype for each replicate, calculated the ratio of homozygote                                 

to wild-type, and then averaged this across replicates. 
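The summary statistic described above (average the two cell lines per genotype within each replicate, take the homozygote to wild-type ratio, then average across replicates) can be sketched as below. The data layout, function name, and fluorescence values are assumptions for illustration only.

```python
from statistics import mean

# Hypothetical median fluorescence intensities, invented for illustration:
# replicate -> genotype -> medians for the two cell lines of that genotype.
median_fluorescence = {
    1: {"wt": [100.0, 100.0], "hom": [80.0, 80.0]},
    2: {"wt": [200.0, 200.0], "hom": [150.0, 170.0]},
    3: {"wt": [50.0, 50.0], "hom": [40.0, 50.0]},
}


def hom_to_wt_ratio(data):
    """Average across replicates of mean(hom lines) / mean(wt lines)."""
    per_replicate = [
        mean(by_genotype["hom"]) / mean(by_genotype["wt"])
        for by_genotype in data.values()
    ]
    return mean(per_replicate)


ratio = hom_to_wt_ratio(median_fluorescence)
print(round(ratio, 3))
```

Taking the ratio within each replicate before averaging keeps between-replicate differences in overall signal from dominating the summary.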

 

Investigating proliferation and apoptosis in iPSC lines containing the EIF3F variant

We investigated proliferation of iPS cell lines using two methods. In the first assay, 50,000                             

single cells were plated and four days later total cells were counted (Fig. S13A). In the second

assay, the membrane of single cells was stained with the fluorescent dye CellTrace Violet™, the

fluorescence of which reduces with each cell division and was assessed using flow cytometry

(Fig. 3C; Fig. S13B).

 

For these assays, iPS cells were cultured on Vitronectin XF (07180, Stemcell                       

Technologies)-coated plates in TeSR-E8 medium (05990, Stemcell Technologies) and                 

incubated at 37°C in 5% CO2:95% air. At 65% confluence, iPS cells were washed with PBS

(D8537, Sigma), dissociated with Accutase (A11105-01, Gibco), resuspended in 10X volume of                       

TeSR-E8 medium (05990, Stemcell Technologies) containing 10 μM ROCK inhibitor (Y-27632                     

dihydrochloride; Ab120129, Abcam), passed through a 10 μm cell strainer (pluriStrainer,                     

pluriSelect; 43-10010-40) and centrifuged at 300 g for 3 min.  

 

In the first assay, cells were resuspended in TeSR-E8 medium and 50,000 live cells (assessed                             

using Trypan blue exclusion; Invitrogen, 15250061) were plated per well of a 6-well plate                           

containing TeSR-E8 medium and 10 μM ROCK inhibitor. Cells were cultured for the next four                             

days in TeSR-E8 medium, before trypsinising for 5 min (25300-054, Gibco), and counting the                           

total number of live cells (Figure S13A).  

 

In the second assay, cells were resuspended in PBS containing 4 μM CellTrace Violet(TM)                           

(Molecular Probes, C34557) and incubated at room temperature, in the dark for 20 min. Cells                             

were diluted in TeSR-E8 medium containing 10 μM ROCK inhibitor and centrifuged at 300 g                             

for 3 min, and then 40,000 cells were plated per well of a 48-well plate. Cells were cultured for                                     

the next two days in TeSR-E8 medium, washed in PBS, dissociated to single cells with                             

Accutase, transferred to a V-bottom plate and centrifuged at 400 g for 4 min before being fixed                                 

in 100 μl fixation buffer (eBioscience™ IC Fixation Buffer, 00-8222-49) and resuspended with                         

staining buffer (eBioscience™ Flow Cytometry Staining Buffer, 00-4222-26). Cells stained and                     

fixed at day zero were used as a ‘no division’ control. Note that Fig. 3B shows the distributions                                   

from one representative replicate per genotype. The data were consistent across cell lines and                           

replicates (Fig. S13B) and clearly showed that the homozygous lines had a higher fraction of                             

cells that had only undergone one division than both the wild-type and heterozygous cells. To                             

test this formally, we fitted a linear regression of the number of cells in division 1 against the                                   

genotype (0=wild-type or heterozygous; 1=homozygous) and the total number of cells. 
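The regression described above, with the number of cells in division 1 as the response and genotype and total cell number as predictors, can be sketched with ordinary least squares. The solver below is a small pure-Python implementation of the normal equations, and the counts are synthetic values constructed from known coefficients so that the fit can be checked. Neither the data nor the helper name comes from the study.

```python
def ols(X, y):
    """Least-squares coefficients by solving the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic samples: (genotype, total cells), genotype 1 = homozygous, 0 = otherwise.
samples = [(0, 100), (0, 200), (1, 150), (1, 300), (0, 250), (1, 120)]
# Response built as 2 + 5 * genotype + 0.1 * total, purely to make the fit checkable.
division1 = [2 + 5 * g + 0.1 * total for g, total in samples]

design = [[1.0, float(g), float(total)] for g, total in samples]
intercept, genotype_effect, total_effect = ols(design, division1)
print(intercept, genotype_effect, total_effect)
```

Including the total cell number as a covariate means the genotype coefficient reflects the excess of division-1 cells at a given sample size, rather than differences in how many cells were acquired.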

 

We assessed apoptosis by staining with annexin V conjugated to a fluorescent dye and detected                             

dead cells with propidium iodide (PI). Briefly, phosphatidylserine is externalised in apoptotic                       

cells and is bound by recombinant annexin V-FITC (green fluorescence), while PI is excluded                           

by the membrane of live cells and only stains necrotic cells or those in late apoptosis with red                                   

fluorescence. After incubation with both probes, early apoptotic cells show green fluorescence,                       

and late apoptotic cells show both red and green fluorescence. Necrotic cells show red only and                               

live cells show little or no fluorescence. We used a topoisomerase inhibitor (topotecan) as a                             

positive control in this assay. Cells were harvested at 65% confluence (including the positive                           

control pre-incubated for 2 h in 10 μM topotecan) by dissociation with Accutase, and processed                             

according to the manufacturer’s protocol (Dead Cell Apoptosis Kit; Invitrogen, V13242).  

 

Flow cytometry data were acquired on an LSRFortessa (Becton Dickinson) instrument.                     

Doublet cells were excluded, and CTV, annexin V or PI fluorescence was analyzed using the FlowJo

10.4.2 software package (FlowJo, LLC). Fig. 3B and Fig. S13B summarise the results from the CTV

proliferation assay, and Fig. S14 shows the results from the apoptosis assay. 

 

Validation of KDM5B variants by targeted re-sequencing 

We re-sequenced all KDM5B de novo mutations and inherited LoF variants, with the exception                           

of two large deletions. PCR primers were designed using Primer3 to amplify the site of interest,                               

generating approximately a 230 bp product centred on the site. PCR amplification of the                           

targeted regions was carried out using JumpStart™ AccuTaq™ LA DNA Polymerase

(Sigma-Aldrich), using 40 ng of input DNA from the proband and their parents. Unique                           

identifying tag sequences were introduced into the PCR amplicons in a second round of PCR                             

using the KAPA HiFi HotStart ReadyMix PCR Kit (Kapa Biosystems). PCR amplicons were pooled

and 96 products were sequenced in one MiSeq lane using 250 bp paired-end reads. Reference

and alternate read counts were extracted from the resulting BAM files and used to determine the

presence of the variant in question. In addition, read data were visualised using IGV. 

 

Transmission-disequilibrium test on KDM5B LoFs 

We observed 15 trios in which one parent transmitted a LoF to the child, 5 trios in which one                                     

parent had a LoF that was not transmitted, 2 quartets in which one parent had a LoF that was                                     

transmitted to one out of two affected children, and 4 trios in which both parents transmitted a                                 

LoF to the child. We tested for significant over-transmission using the                     

transmission-disequilibrium test as described by Knapp (42). There were 7 LoFs (including one

large deletion) observed in probands whose parents were not originally sequenced, which we                         

excluded from the TDT. Of the six for which we attempted validation and segregation analysis,                             

one was found to be de novo and five inherited.  
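To illustrate the logic of a transmission-disequilibrium test on counts like those above, the sketch below computes the classical McNemar-style TDT statistic. This is a simplified stand-in, not Knapp's procedure, which may treat families with several affected children differently. The tally of transmissions is also an assumption made here: each quartet is counted as one transmission and one non-transmission, and each trio in which both parents transmitted is counted as two transmissions.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """Classical TDT: (b - c)^2 / (b + c), with a chi-square (1 df) upper-tail p-value."""
    stat = (transmitted - untransmitted) ** 2 / (transmitted + untransmitted)
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Assumed tally from the family counts given in the text.
transmitted = 15 + 2 + 2 * 4   # trios, quartets (one child each), both-parent trios
untransmitted = 5 + 2          # non-transmitting trios, quartets (other child)

stat, p_value = tdt_statistic(transmitted, untransmitted)
print(stat, p_value)
```

Under the null hypothesis a heterozygous parent transmits the LoF allele half of the time, so an excess of transmissions over non-transmissions indicates over-transmission to affected children.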

 

Searching for coding and regulatory modifiers of KDM5B 

We defined a set of genes that might modify KDM5B function as: interactors of KDM5B                             

obtained from the STRING database of protein-protein interactions (43) (HIST2H3A, MYC,

TFAP2C, CDKN1A, TFAP2A, SETD1A, SETD1B, KDM1A, KDM2B, PAX9) plus those

mentioned by Klein et al. (44) (RBBP4, HDAC1, HDAC4, MTA2, CHD4, FOXG1, FOXC9), as

well as all lysine demethylases, lysine methyltransferases, histone deacetylases, and SET                     

domain-containing genes from http://www.genecards.org/. The final list contained 95 genes. We                     

looked for LoF or rare missense variants in these genes in the monoallelic KDM5B LoF carriers                               

that might have a modifying effect, but found none that were shared by more than two of the de                                     

novo carriers. 

 

We also looked for indirect evidence of a regulatory “second hit” near KDM5B by examining                             

the haplotypes of common SNPs in the region (Fig. S15). DDD probands and a subset of their                                 

parents were genotyped on either the Illumina OmniExpress chip or the Illumina CoreExome                         

chip. We performed variant and sample quality control for each dataset separately. Briefly, we                           

removed variants and samples with high data missingness (≥0.03), samples with high or low

heterozygosity, sample duplicates, individuals of African and East Asian ancestry, and SNPs

with MAF<0.005. We then ran SHAPEIT2 (45) to phase the SNPs within 2 Mb either side of

KDM5B. To make Fig. S15, we used the heatmap() function in R to cluster the phased                               

haplotypes using the default hierarchical clustering method (based on Euclidean distance).   
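The per-SNP part of the quality control described above (missingness and minor allele frequency thresholds) can be sketched as a simple filter. The function name and genotype encoding are assumptions for illustration; the thresholds are those quoted in the text. The sample-level filters (heterozygosity, duplicates, ancestry) are not covered here.

```python
def snp_passes_qc(genotypes, max_missing=0.03, min_maf=0.005):
    """Return True if a SNP passes the missingness and MAF filters.

    genotypes: per-sample alternate-allele counts (0, 1 or 2), or None if missing.
    A SNP is removed if its missingness is >= max_missing or its MAF is < min_maf.
    """
    n = len(genotypes)
    called = [g for g in genotypes if g is not None]
    if n == 0 or not called:
        return False
    missingness = (n - len(called)) / n
    if missingness >= max_missing:
        return False
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1.0 - alt_freq)
    return maf >= min_maf


# Hypothetical SNPs across 100 samples, for illustration only.
common_snp = [1] * 10 + [0] * 90              # MAF = 0.05, no missing data
missing_snp = [None] * 3 + [1] * 10 + [0] * 87  # missingness = 0.03
monomorphic_snp = [0] * 100                   # MAF = 0

print(snp_passes_qc(common_snp), snp_passes_qc(missing_snp), snp_passes_qc(monomorphic_snp))
```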

 

Analysis of DNA methylation in KDM5B probands 

DNA from 64 DDD whole blood samples comprising 41 probands with a KDM5B variant and                             

23 negative controls was run on an Illumina EPIC 850K methylation array. Negative controls                           

were selected from DDD probands with de novo mutations in genes not expressed in whole                             

blood (SCN2A, KCNQ2, SLC6A1, and FOXG1), since we would not expect these to

significantly impact the methylation phenotype in that tissue. Samples were randomised on the                         

array to reduce batch effects, and were QCed using a combination of data from control probes                               

and numbers of CpGs that failed to meet the standard detection p-value of 0.05. Based on these                                 

criteria, two samples failed and were excluded from further analysis (one of the negative                           

controls and one of the inherited KDM5B LoF carriers).  

 

We looked at methylation levels in the KDM5B LoF carriers to search for an “epimutation”                             

(hypermethylation on or around the promoter) that might be acting as a second hit. We analyzed a

subset of CpGs in and around the KDM5B promoter region: the CpG island in the KDM5B

promoter itself, and a CpG island in the promoter of KDM5B-AS1, a lncRNA not specifically

associated with KDM5B, but also highly expressed in the testis. We also extended analysis 5kb                             

on either side of the start and stop sites of the KDM5B promoter. We examined the distribution                                 

of the beta values (the ratio of methylated to unmethylated alleles) at each of the CpGs in the                                   

10kb region (Fig. S16). 

 

A final post-QC set of 754,273 methylation probes and samples were analysed for                         

whole-genome methylation changes using principal component analysis (Fig. S17). One of the                       

samples with a de novo mutation was removed from the PCA as it was a significant outlier. PCA

was run on the post-QC set of beta values using the standard prcomp() function in R (version

3.4.1). 

 

Generation of the KDM5B knockout mice  

A mouse Kdm5b loss-of-function allele (MGI:6153378) was generated by                 

CRISPR/CAS9-mediated deletion of coding exon 7 (ENSMUSE00001331577), leading to a                   

premature translational termination due to a downstream frameshift. CAS9 mRNA (46) and two                         

pairs of guide RNAs flanking exon 7 (targeting CCTAGTAACACTAGGTGTTAATA,                 

GTGTTTGGTTGTCAGTTAGAGGG, CCTCTCGTACATACATCCTAGGC,   

CCTAGGCTCGAACTTCACCATGT) were injected into 1-cell stage C57BL/6NJ zygotes,               

which were then implanted into host mothers. Mice born from these transfers were tested for                             

germ line transmission, and identified founders were bred on a C57BL/6NJ background to                         

establish colonies for further testing. 

 

Breeding and housing of mice and all experimental procedures were assessed by the Animal                           

Welfare and Ethical Review Body of the Wellcome Sanger Institute and conducted under the                           

regulation of UK Home Office license (P6320B89B), and in accordance with institutional                       

guidelines. From in-crosses of heterozygous Kdm5b mutant mice, we recovered 350 pups at                         

weaning, of which 39 were homozygous (11.1% versus 25% expected). Kdm5b-/- pups were

raised and weaned together with Kdm5b+/+ pups. Mice were housed in mixed genotype cages

(2-5 mice) with food and water ad libitum, under controlled temperature and humidity and a

12h light cycle (light on at 7am) at the Research Support Facility of the Wellcome Sanger                               

Institute.  
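The deficit of homozygous pups reported above (39 of 350, against the 25% expected from heterozygous in-crosses) can be checked with a chi-square goodness-of-fit calculation. The text reports only the percentages; the test below is an illustrative addition, and the function name is hypothetical.

```python
import math


def mendelian_chi_square(n_hom, n_total, expected_fraction=0.25):
    """Chi-square goodness of fit (1 df) for homozygotes vs all other genotypes."""
    expected_hom = n_total * expected_fraction
    expected_other = n_total - expected_hom
    observed_other = n_total - n_hom
    stat = ((n_hom - expected_hom) ** 2 / expected_hom
            + (observed_other - expected_other) ** 2 / expected_other)
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


percent_hom = 100 * 39 / 350
stat, p_value = mendelian_chi_square(n_hom=39, n_total=350)
print(round(percent_hom, 1), round(stat, 2), p_value)
```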

  

Phenotyping of wild-type and KDM5B knockout mice 

At 10 weeks of age, a cohort of 16 wild-type and 16 homozygous knockout male mice                               

underwent a series of behavioral tests. These were carried out between 9am and 5pm, after 1h of

habituation to the testing room. Experimenters were blind to genotype; mouse movements were

recorded with an overhead infrared video camera and tracked by automated video tracking                         

(Ethovision XT 11.5, Noldus Information Technology).   

 

Light-dark box (Fig. 4C): This test was adapted from Gapp et al. (47). Mice were housed in

pairs for 20 minutes before introducing them individually into the light-dark box, a plastic box                             

(40 × 42 × 26 cm) divided in two compartments. One is smaller, closed and dark (1/3 of the                                   

total surface area) and is connected through a door (5 cm) to a larger, brightly lit (370 lux with                                     

an overhead lamp) compartment (2/3 of the box). Each mouse was placed in the dark                             

compartment, the door was then opened and the animal was left to explore for 10 min. The time                                   

spent in the light compartment was tracked and the difference between genotypes was estimated                           

using a t-test. 

 

Three-Chamber Sociability Assay (Fig. 4D): The protocol was adapted from Dias et al. (48).

In brief, the test arena was divided into 3 equally sized chambers (25 x 37.5 cm) connected by                                   

small doors (5 x 5 cm). Mice were first individually habituated to the arena for 5 minutes, where                                   

the central chamber was empty and each of the side chambers had an empty cylindrical corral (8                                 

cm diameter, 15 cm tall, 1 cm gap between bars). The mouse was then returned to the central                                   

chamber and doors were closed, while a stimulus mouse of a different strain (129Sv) was                             

introduced into one of the corrals. Doors were opened and the test mouse was allowed to                               

explore all three chambers: one side chamber with an empty corral and the other with a corral                                 

containing a freely moving novel mouse, for 5 minutes. The time spent investigating both                           

corrals was recorded and measured as total investigation. The time spent investigating the                         

mouse was assessed as a percentage of the total investigation time, i.e. [time investigating corral                             

with mouse]/[time investigating corral with mouse + time investigating empty corral]. A t-test                         

was used to evaluate the difference between genotypes in the time spent investigating the

mouse.  
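
The sociability measure defined above, [time investigating corral with mouse]/[total investigation time], can be written as a short function. This is a sketch; the function and argument names are ours, and the example times are hypothetical.

```python
def sociability_percent(time_mouse_corral, time_empty_corral):
    """Percentage of total investigation time spent on the corral holding the mouse."""
    total = time_mouse_corral + time_empty_corral
    if total <= 0:
        raise ValueError("no investigation time recorded")
    return 100.0 * time_mouse_corral / total


# Hypothetical example: 60 s at the occupied corral, 40 s at the empty one.
example = sociability_percent(60.0, 40.0)
```

A value of 50% corresponds to no preference between the two corrals.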

 

Social Recognition Assay (Fig. 4E): This paradigm assesses memory and relies on the natural                           

preference of mice to investigate a novel mouse over a familiar one. The assay consists of two                                 

days, as reported by Dias et al. (48). On both days, mice were habituated for 10 minutes to the

test box, an empty, clean cage. Briefly, on the first day, mice were tested on a habituation                                 

paradigm to assess their olfaction and social investigation. An anesthetized stimulus mouse was                         

placed in the center of the arena for 1 minute, repeated 4 times at 10 minute intervals. On the                                     

5th trial, a novel mouse was presented. Both Kdm5b wild-type and homozygous knockout mice                           

showed a similar decreasing investigation of the repeatedly-presented stimulus mouse, and an                       

increased investigation on trial 5 of the novel mouse. On day 2, after 24 hours, the                               

discrimination test was performed. Mice were simultaneously presented with the familiar mouse                       

from Day 1 and a completely novel mouse. The amount of time the test animal spent                               

investigating each stimulus mouse by close-proximity (sniffing, oronasal contact, or                   

approaching within 1–2 cm) was recorded. Using predefined exclusion criteria, one wild-type                       

mouse was excluded on day 1, and a second wild-type was excluded on day 2, both for                                 

insufficient overall investigation. A two-way ANOVA was used to assess the discrimination                       

difference between genotypes, followed by Bonferroni's multiple comparisons test (two                   

comparisons, familiar versus unfamiliar mouse for wild type and mutant).  

 

Barnes Maze probe trial (Fig. S19A, B): This assay is a test of visuo-spatial learning and

memory on a circular maze (120 cm diameter table) with 20 holes around the perimeter. One of

the holes leads to a small dark box (Target) where the mouse can escape from the brightly lit maze.

Mice were trained for three days, 10 trials (4 min maximum each), to find the target location.                                 

On the probe trial, 72h after the last training day, the escape box was removed. Each mouse was                                   

given 4 minutes to explore the maze. The mouse's movements were tracked, and the amount of                               

time spent around each of the holes was analyzed. Data are expressed as the percentage of time

spent dwelling around each hole, relative to the total amount of time spent investigating all                             

holes. With 20 holes to investigate, the amount of time spent by chance around each hole would                                 

be 5%. The difference between genotypes in remembering the target location was evaluated                         

using a two-way ANOVA with Bonferroni's multiple comparisons test.   
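
The per-hole percentages and the 5% chance level described above can be sketched as follows. The dwell times are hypothetical, and the names `dwell_percentages`, `N_HOLES` and `CHANCE_PERCENT` are ours.

```python
N_HOLES = 20
CHANCE_PERCENT = 100.0 / N_HOLES  # 5% per hole if time were spread evenly


def dwell_percentages(dwell_seconds):
    """Express time around each hole as a percentage of time around all holes."""
    total = sum(dwell_seconds)
    if total <= 0:
        raise ValueError("no dwell time recorded")
    return [100.0 * s / total for s in dwell_seconds]


# Hypothetical probe trial: 10 s at the target hole (index 0), 2 s at each other hole.
dwell = [10.0] + [2.0] * (N_HOLES - 1)
percent = dwell_percentages(dwell)
target_above_chance = percent[0] > CHANCE_PERCENT
```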

 

Statistical analysis for all mouse behavior experiments was performed with GraphPad Prism 7                         

(GraphPad Software).  

 

X-rays (Fig. S19C): Seven wild-type and seven homozygous Kdm5b knockout mice were                       

anaesthetised with ketamine/xylazine (100 mg/10 mg per kg of body weight) and then placed in

an MX-20 X-ray machine (Faxitron X-Ray LLC). Whole body radiographs were taken in                         

dorso-ventral and lateral positions. Images were then analysed and morphological abnormalities                     

assessed using Sante DICOM Viewer v7.2.1 (Santesoft LTD). 

 

 

 

 

 

 

 

 

Supplementary Figures 

 

Fig. S1: Flow diagram indicating the final number of samples used in the various analyses, and                               

why samples were removed. Note that we use “diagnosed” and “undiagnosed” as shorthand in                           

the manuscript, but these terms are not fully accurate descriptors of these groups (see Methods                             

section on “Sample quality control and subsetting”). The “diagnosed” probands include only                       

those with diagnostic dominant or X-linked exonic variants. The “undiagnosed” set includes                       

193 patients who had diagnostic variants in recessive DDG2P genes or potentially diagnostic                         

variants in monoallelic or X-linked DDG2P genes but had high autozygosity or affected siblings.

 

 

Fig. S2: Principal components analysis of the 1000 Genomes Phase 3 samples (left) with DDD                             

samples projected on top of them (right). The ellipses used to define the EABI and PABI                               

populations in DDD are shown on the PC2 versus PC3 plot.  

 

 

 

 

 

 

 

Fig. S3: Distribution of the number of organ systems affected per proband (10). P-values are

from Wilcoxon rank-sum tests.  

 

 

 

 

 

Fig. S4: Number of observed and expected biallelic synonymous genotypes per individual,                       

estimated across 5684 EABI and 356 PABI probands. The grey lines indicate the observed                           

number and the points with small black lines represent the expected number with a 95%                             

confidence interval. Two-sided p-values are indicated. The expected numbers were calculated in                       

different ways as described in Methods. Note that the estimate based on the DDD parental                             

haplotypes (black points) best matches the observed data, and that use of the ExAC frequencies                             

or the polynomial model of Jin et al. (3) gives estimates of the expected number that are                                 

significantly different from the observed number. 

 

 

 

 

 

 

 

 

 

 

 

 

Fig. S5: Histograms of levels of autozygosity across EABI and PABI probands. 

 

 

 

 

 

 

 

 

 

 

Fig. S6: Burden of biallelic damaging genotypes in different gene sets, for undiagnosed EABI

and PABI probands combined (N=4,651). This includes biallelic LoFs, biallelic damaging                     

missense and compound heterozygous LoF/damaging missense. The lines show 95%                   

confidence intervals and the p-values are from a one-sided Poisson test. Note the strong                           

enrichment in known recessive DD genes and genes predicted to be intolerant of biallelic LoFs                             

based on the pRec score (22), and lack of enrichment in known dominant DD genes and genes                                 

predicted to be intolerant of monoallelic LoFs (pLI>0.9). 
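
As a rough illustration of the one-sided Poisson test mentioned in this caption, the upper-tail probability of seeing at least the observed count, given an expected count, can be computed as below. This is a sketch under our own assumptions: the function name is ours, and how the expected count is derived is described in Methods and not reproduced here.

```python
from math import exp, lgamma, log


def poisson_upper_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected), with expected > 0."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    if observed <= 0:
        return 1.0
    # Sum P(X = k) for k = 0 .. observed - 1, then take the complement.
    lower = sum(
        exp(-expected + k * log(expected) - lgamma(k + 1)) for k in range(observed)
    )
    return max(0.0, 1.0 - lower)


# Hypothetical counts: 2 observed genotypes where 1.0 is expected.
p_example = poisson_upper_tail(2, 1.0)
```

Taking the complement loses precision for very small p-values, so a statistical library routine would be preferable in practice.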

 

 

Fig. S7: Estimates of φ, the proportion of biallelic genotypes that are causal for DD or lethal.

These were estimated from the parental data. See Methods for details. The points show                           

maximum likelihood estimates and the lines show 95% confidence intervals. Points at the same                           

MAF cutoff have been slightly scattered along the x-axis for ease of visualisation.  

 

 

 

 

 

Fig. S8: Distribution of the ranks of minimum p-values per gene for known recessive genes                             

versus all other genes. The order of genes with the same minimum p-value was randomised. A                               

Kolmogorov-Smirnov (KS) test indicated that these distributions were significantly different.                   

This, combined with the fact that fourteen of the sixteen genes with p<10⁻⁴ are known recessive

DD genes, suggests our approach for identifying these genes is valid and Bonferroni correction                           

is conservative. 

 

 

 

Fig. S9: Anterior-posterior facial photographs of selected individuals with the homozygous                     

Phe232Val variant in EIF3F. DECIPHER IDs are shown in the top right corner. Affected                           

individuals did not have a distinctive facial appearance. Individual 265452 (leftmost) had                       

muscle atrophy, as demonstrated in photographs of the anterior surface of the hands which show                             

wasting of the thenar and hypothenar eminences. This is notable because it is only reported in                               

one other proband in the DDD study and, in mice, Eif3f has been shown to play a role in                                     

regulating skeletal muscle size via interaction with the mTOR pathway (49). None of the other

individuals were assessed as having, or previously recorded as having, muscle atrophy.

 

 

 

 

 

 

 

 

 

Fig. S10: Results from a Western blot showing lower levels of the EIF3F protein in iPSCs

homozygous for the Phe232Val variant. WT: wild-type, HET and HOM: heterozygous and                       

homozygous for Phe232Val. Note that two independent cell lines were used for each genotype,                           

and that the two cultures represent independent replicates. Beta-tubulin was used as a

normalising control. Band intensities were quantified using ImageJ. We jointly analysed the                       

data from both independent cultures, having normalised the relative expression values by                       

dividing by the mean of the wild-type lines for each replicate. There was significantly lower                             

relative EIF3F expression in the homozygous cells compared to heterozygous and wild-type                       

cells combined (one-sided t-test; p=0.003), with a mean reduction of 26.6%. 
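
The normalisation described in this caption, dividing each relative expression value by the mean of the wild-type lines within the same replicate, can be sketched as follows. The band-intensity ratios are hypothetical, and the function name and dictionary layout are our own choices rather than the authors' analysis code.

```python
from statistics import mean


def normalise_by_wild_type(replicate):
    """Divide every value in one replicate by that replicate's wild-type mean.

    `replicate` maps a genotype label to a list of relative expression values,
    one per independent cell line.
    """
    wt_mean = mean(replicate["WT"])
    return {genotype: [v / wt_mean for v in values]
            for genotype, values in replicate.items()}


# Hypothetical EIF3F / beta-tubulin ratios for one culture (two lines per genotype).
culture_1 = {"WT": [1.0, 1.2], "HET": [1.1, 1.0], "HOM": [0.8, 0.85]}
normalised = normalise_by_wild_type(culture_1)
```

After this step the wild-type lines in each replicate average 1, so values from separate cultures can be analysed jointly.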

 

 

 

 

 

 

Fig. S11: Predicted structural effects of the pathogenic Phe232Val variant in EIF3F. The                         

secondary structure, domain architecture and 3D fold of EIF3F are conserved between species

but sequence similarity is low (29% between yeast and humans). A) Section of the amino acid                               

sequence logo for EIF3F where the strength of conservation across species is indicated by the                             

size of the letters. The sequence below represents the human EIF3F. Boxed characters are the                             

aromatic residues conserved between humans and yeast and proximal in space to Phe232. B)                           

Structure of the section of EIF3F containing the Phe232Val variant, highlighted in green.                         

Amino acids conserved between yeast and human sequences as highlighted in panel A are                           

shown in grey. The conserved Phe232 side chain is buried (solvent accessibility 0.7%) and                           

likely plays a stabilizing role, so the Phe232Val variant may disrupt protein stability. See (10)

for details of structure prediction.  

 

 

 

 

 

 

Fig. S12: Optimising the sensitivity of the Alexa Fluor 488 Protein Synthesis Assay Kit for                             

assessing protein synthesis in iPSCs with or without the EIF3F Phe232Val variant. A)                         

Representative histogram of fluorescence intensities (Alexa Fluor 488), a proxy for the rate of

nascent protein synthesis, across multiple wild-type iPSCs measured by flow cytometry. This                       

plot shows a single replicate. We gated the major population (blue rectangle; i.e. fluorescence >                             

10⁴), and used its median (red dotted line) in downstream analysis. B) Dose-response curve for

iPSCs treated with cycloheximide. Data are represented as mean and standard deviation of

median fluorescence intensity of the major Alexa Fluor 488-positive population across four

independent replicates, three of which were run in parallel with experiments presented in Figure                           

3C. Note that, for the leftmost point, the error bars are smaller than the dot and thus not visible. 
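
The gating step in panel A, keeping events with fluorescence above 10⁴ and summarising them by their median, can be sketched as below. The event intensities are hypothetical and the function name is ours; the actual gating was done on flow cytometry data.

```python
from statistics import median


def gated_median(intensities, threshold=1e4):
    """Median fluorescence of events above the gate threshold."""
    gated = [x for x in intensities if x > threshold]
    if not gated:
        raise ValueError("no events above the gate threshold")
    return median(gated)


# Hypothetical per-cell fluorescence intensities for one replicate.
events = [5e2, 2e4, 3e4, 4e4]
median_fluorescence = gated_median(events)
```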

 

 

 

 

Fig. S13: The EIF3F Phe232Val variant reduces proliferation of iPSCs in the homozygous but

not heterozygous state. (A) Mean cell counts four days after plating 50,000 cells per line, with                               

95% confidence intervals, from three independent replicates. (B) Proportions of cells that have                         

undergone zero, one or multiple divisions in a cell trace violet (CTV) proliferation assay. There                             

are four replicates for each of the two independent cell lines for each genotype. The                             

homozygous line had significantly more cells in division 1 than wild-type and heterozygous                         

lines (p=2×10⁻⁵; linear regression including total number of cells as covariate). See Fig. 3C for

representative distributions of CTV intensity, one for each genotype.  

 

 

 

 

Fig. S14: No difference in cell apoptosis of iPSCs containing the EIF3F variant versus                           

wild-type cells. Graph shows mean and standard deviation of the percentage of live, apoptotic                           

or necrotic cells, assayed using annexin V or propidium iodide staining (see (10)). Data derived                             

from three replicates. 

 

Fig. S15: Plot showing haplotypes of common SNPs around KDM5B in individuals with de                           

novo missense or LoF mutations or with monoallelic or biallelic LoFs. There is no evidence for

a local haplotype shared by multiple probands with monoallelic LoFs that was not also present                             

in an unaffected parent with a monoallelic LoF. The region shown lies between two                           

recombination hotspots. The rows represent phased haplotypes, with orange and green                     

rectangles corresponding to the different alleles at the SNPs at the positions indicated along the                             

bottom. Hierarchical clustering has been applied to the haplotypes, as indicated by the                         

dendrogram on the left, and the labels on the right indicate which individual carries the                             

haplotype, and whether the individual was a proband carrying a de novo (purple), a biallelic                             

LoF (dark green), or an inherited heterozygous LoF (yellow), or a parent carrying a                           

heterozygous LoF (pink).  

 

 

 

Fig. S16: Violin plots of the beta values (the proportion of alleles that are methylated) at each

of the CpGs in the 10kb region around the KDM5B promoter. The CpGs within the KDM5B and                                 

KDM5B-AS1 promoters are annotated below the plot, with coordinates relative to hg19. The                         

bottom panel shows the negative controls (probands with likely causal de novo mutations in                           

known DD genes not expressed in blood), and the other panels show probands with variants in                               

KDM5B that are either biallelic (top panel), de novo (second panel) or monoallelic and inherited                             

(third panel). 

 

  

 

 

 

 

 

Fig. S17: Results from a principal components analysis of genome-wide DNA methylation in                         

probands with de novo mutations or biallelic or inherited monoallelic LoFs in KDM5B. The lack                             

of differences between groups suggests that LoFs in KDM5B do not manifest in DNA                           

methylation changes, at least not in whole blood of children in this age range (age 18 months to                                   

16 years). 

 

 

 

 

 

 

 

 

Fig. S18: Anterior-posterior facial photographs of one of the individuals with biallelic KDM5B                         

variants. Note the narrow palpebral fissures, arched or thick eyebrows, dark eyelashes, low                         

hanging columella, smooth philtrum and thin upper vermilion border.

 

 

 

Fig. S19: Cognitive and skeletal defects of homozygous Kdm5b knockout mice. A) Impaired                         

long-term spatial memory in the probe trial of the Barnes Maze assay. 72 hours after learning the

location of the target hole, wild-type mice spent significantly more time around the target hole                             

compared to knockout mice. Two-way ANOVA, p=0.026 for hole × genotype interaction; with

Bonferroni’s multiple comparison test, p<0.001 for Target hole. B) Representative heatmap of                       

wild-type mouse activity around the Barnes Maze during the 72 hour probe trial. C) Skeletal                             

defects identified from X-ray analysis. All homozygous Kdm5b knockout mice analysed (n=7)                       

had transitional vertebrae (one-tailed Fisher's exact test p=0.03) compared to none of the                         

wild-types (n=7). This is an X-ray image of the dorsal view of wild-type and homozygous                             

Kdm5b knockout mice. This case shows a transitional Lumbar (L1) to Sacral (S1) vertebra,                           

resulting in an extra Sacral vertebra (arrow), and the absence of one Thoracic vertebra (T13,                             

arrowhead). Green dots indicate Thoracic vertebrae, blue are Lumbar, red are Sacral and brown

are Caudal vertebrae.

 

Fig. S20: Plots showing effect of variant filtering strategies on number of variants, Mendelian                           

errors and Ti/Tv. We first set genotypes to missing based on genotype quality (GQ), depth (AD)                               

and the p-value from a test of allele balance (pAB), and then removed sites according to the                                 

proportion of missing genotypes. 
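
The two-stage filter described in this caption (mask individual genotypes first, then drop sites with too many missing calls) can be sketched as below. All thresholds, field names and function names here are hypothetical placeholders chosen for illustration; the values actually used are given in Methods, and applying the allele-balance filter to every genotype is a simplifying assumption.

```python
def mask_genotypes(genotypes, min_gq, min_depth, min_pab):
    """Set a call to None if it fails any per-genotype threshold.

    Each genotype is a dict with keys "call", "gq", "depth" and "pab"
    (p-value from the allele-balance test); these keys are illustrative.
    """
    return [
        g["call"]
        if g["gq"] >= min_gq and g["depth"] >= min_depth and g["pab"] >= min_pab
        else None
        for g in genotypes
    ]


def site_passes(masked_calls, max_missing):
    """Keep a site only if its fraction of missing genotypes is small enough."""
    missing = sum(call is None for call in masked_calls) / len(masked_calls)
    return missing <= max_missing


# Hypothetical site with four samples and placeholder thresholds.
site = [
    {"call": "0/1", "gq": 40, "depth": 20, "pab": 0.5},
    {"call": "0/0", "gq": 10, "depth": 20, "pab": 0.9},     # fails GQ
    {"call": "1/1", "gq": 50, "depth": 3, "pab": 0.8},      # fails depth
    {"call": "0/1", "gq": 45, "depth": 25, "pab": 0.0001},  # fails allele balance
]
masked = mask_genotypes(site, min_gq=20, min_depth=7, min_pab=0.001)
keep_site = site_passes(masked, max_missing=0.5)
```

Masking before the site-level step means that low-quality calls count towards missingness, so sites dominated by unreliable genotypes are removed.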

 

Supplementary Tables 

 

Table S1: Summary of the phenotypes in the study, for different subsets of probands. We show                               

the proportion of patients with each higher-level phenotype (based on HPO terms), and results                           

from Fisher’s exact tests for a difference in these proportions between different subsets of                           

probands. We also show summary statistics for the number of HPO terms, various                         

developmental milestones, and results from Mann-Whitney U tests comparing these quantitative                     

metrics between different subsets of probands. We also summarise the number of probands who                           

achieved (or who had not yet achieved) the milestone at an age ≤2SD, 2SD-4SD, 4SD-8SD and                               

>8SD from the mean for normal individuals. For these, the distributions for normal children                           

were extracted from (50) for age at first words and (51) for age at walking. % Within the ID/DD

category, the classes are mutually exclusive of one another and based on HPO terms that fall                               

under HP:0001249 (Intellectual Disability) or HP:0001263 (Global Developmental Delay). For                   

creating this combined category, we took the most severe or (if unclear) more specific term (e.g.                               

mild ID and severe/profound DD → severe/profound ID/DD; mild DD and DD of unknown                           

severity → mild ID/DD). & Several cognition-related terms do not come under HP:0001249                         

(Intellectual Disability) or HP:0001263 (Global Developmental Delay) (i.e. HP:0100543                 

(Cognitive impairment), HP:0001328 (Specific learning disability) and HP:0000750 (Delayed                 

speech and language development)), so we included these here. * For the mean, median,                           

standard deviation and Mann-Whitney U tests of the developmental milestones, if probands had                         

not yet achieved the milestone, we set them to missing if they were under age 18 months and to                                     

their age at assessment if over 18 months. + The organ system counts were carried out as                                 

described in the Methods, in order to avoid double-counting the same HPO term that occurred                             

under multiple organ systems. 
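
The handling of unachieved milestones in the footnote marked * can be sketched as a small function. Names are ours, and treating a proband aged exactly 18 months like one over 18 months is our assumption, since the caption does not cover that case.

```python
def milestone_age_for_summary(achieved_age_months, age_at_assessment_months):
    """Age (in months) to use in milestone summary statistics, or None if missing."""
    if achieved_age_months is not None:
        return achieved_age_months       # milestone achieved: use the recorded age
    if age_at_assessment_months < 18:
        return None                      # not yet achieved, under 18 months: missing
    return age_at_assessment_months      # not yet achieved, older: age at assessment


# Hypothetical probands: (age milestone achieved, age at assessment), in months.
examples = [(14, 30), (None, 12), (None, 40)]
summary_ages = [milestone_age_for_summary(a, b) for a, b in examples]
```

Substituting the age at assessment gives a lower bound on the true milestone age for probands who had not yet achieved it.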

 

Table S2: Estimates of π, the proportion of probands explained by diagnostic biallelic

coding genotypes or de novo coding mutations, for different sample sets. Shown are the                           

maximum likelihood estimates for π, and a 95% confidence interval. See Methods for how

these were calculated.  

 

Table S3: Results from tests of an excess of damaging biallelic genotypes for all genes. The                               

lowest p-value out of the eight tests conducted for each gene is shown. We give results for the                                   

stringent ancestry filter (4318 EABI probands and 333 PABI probands), as shown in Table 1, as                               

well as the lenient ancestry filter (4942 European ancestry probands and 498 South Asian                           

ancestry probands). 

 

Table S4: Genes enriched for damaging biallelic coding genotypes with p<1×10⁻⁴ (for eight

tests; see (10)). Shown are the number of observed biallelic genotypes of different consequence

classes, the lowest p-value out of the eight tests (achieved using EABI alone for all genes except                                 

for VPS13B), details of the corresponding test, and the p-value for phenotypic similarity for the                             

relevant probands (9). Known recessive DD genes from the DDG2P list are indicated                         

(http://www.ebi.ac.uk/gene2phenotype/). LZTR1 has been implicated as a dominant cause of                   

Noonan syndrome (52) and there is some prior evidence supporting LINS as a recessive ID gene

(53, 54).  

Table S5: Phenotypes of the nine patients homozygous for the EIF3F Phe232Val variant. Only                           

five were used in the initial gene discovery analysis, which excluded one sibling from each of                               

families 2 and 3, 260950 (who had a potentially diagnostic inherited X-linked variant in                           

HUWE1, subsequently deemed to be benign since it did not segregate with disease in his

family), and 265452 (who had no parental genetic data available). 

 

Table S6: De novo mutations in KDM5B from this and previous studies, and inherited LoFs in                               

KDM5B from this study. Note that, of the 26 probands with inherited LoFs, five of them were                                 

reported to have a parent who had at least one clinical phenotype shared with the child, but for                                   

only two families was this the parent who carried the LoF.  

 

Table S7: Phenotypes of the probands with biallelic or de novo KDM5B variants. 

References and Notes

1. The Deciphering Developmental Disorders Study, Large-scale discovery of novel genetic causes of developmental disorders. Nature 519, 223–228 (2015). doi:10.1038/nature14135 Medline

2. R. K. C. Yuen, D. Merico, M. Bookman, J. L. Howe, B. Thiruvahindrapuram, R. V. Patel, J. Whitney, N. Deflaux, J. Bingham, Z. Wang, G. Pellecchia, J. A. Buchanan, S. Walker, C. R. Marshall, M. Uddin, M. Zarrei, E. Deneault, L. D’Abate, A. J. S. Chan, S. Koyanagi, T. Paton, S. L. Pereira, N. Hoang, W. Engchuan, E. J. Higginbotham, K. Ho, S. Lamoureux, W. Li, J. R. MacDonald, T. Nalpathamkalam, W. W. L. Sung, F. J. Tsoi, J. Wei, L. Xu, A.-M. Tasse, E. Kirby, W. Van Etten, S. Twigger, W. Roberts, I. Drmic, S. Jilderda, B. M. K. Modi, B. Kellam, M. Szego, C. Cytrynbaum, R. Weksberg, L. Zwaigenbaum, M. Woodbury-Smith, J. Brian, L. Senman, A. Iaboni, K. Doyle-Thomas, A. Thompson, C. Chrysler, J. Leef, T. Savion-Lemieux, I. M. Smith, X. Liu, R. Nicolson, V. Seifer, A. Fedele, E. H. Cook, S. Dager, A. Estes, L. Gallagher, B. A. Malow, J. R. Parr, S. J. Spence, J. Vorstman, B. J. Frey, J. T. Robinson, L. J. Strug, B. A. Fernandez, M. Elsabbagh, M. T. Carter, J. Hallmayer, B. M. Knoppers, E. Anagnostou, P. Szatmari, R. H. Ring, D. Glazer, M. T. Pletcher, S. W. Scherer, Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder. Nat. Neurosci. 20, 602–611 (2017). doi:10.1038/nn.4524 Medline

3. S. C. Jin, J. Homsy, S. Zaidi, Q. Lu, S. Morton, S. R. DePalma, X. Zeng, H. Qi, W. Chang, M. C. Sierant, W.-C. Hung, S. Haider, J. Zhang, J. Knight, R. D. Bjornson, C. Castaldi, I. R. Tikhonoa, K. Bilguvar, S. M. Mane, S. J. Sanders, S. Mital, M. W. Russell, J. W. Gaynor, J. Deanfield, A. Giardini, G. A. Porter Jr., D. Srivastava, C. W. Lo, Y. Shen, W. S. Watkins, M. Yandell, H. J. Yost, M. Tristani-Firouzi, J. W. Newburger, A. E. Roberts, R. Kim, H. Zhao, J. R. Kaltman, E. Goldmuntz, W. K. Chung, J. G. Seidman, B. D. Gelb, C. E. Seidman, R. P. Lifton, M. Brueckner, Contribution of rare inherited and de novo variants in 2,871 congenital heart disease probands. Nat. Genet. 49, 1593–1601 (2017). doi:10.1038/ng.3970 Medline

4. Deciphering Developmental Disorders Study, Prevalence and architecture of de novo mutations in developmental disorders. Nature 542, 433–438 (2017). doi:10.1038/nature21062 Medline

5. H. H. Ropers, Genetics of early onset cognitive impairment. Annu. Rev. Genomics Hum. Genet. 11, 161–187 (2010). doi:10.1146/annurev-genom-082509-141640 Medline

6. L. E. L. M. Vissers, C. Gilissen, J. A. Veltman, Genetic studies in intellectual disability and related disorders. Nat. Rev. Genet. 17, 9–18 (2016). doi:10.1038/nrg3999 Medline

7. P. A. Baird, T. W. Anderson, H. B. Newcombe, R. B. Lowry, Genetic disorders in children and young adults: A population study. Am. J. Hum. Genet. 42, 677–693 (1988). Medline

8. S. J. Schrodi, A. DeBarber, M. He, Z. Ye, P. Peissig, J. J. Van Wormer, R. Haws, M. H. Brilliant, R. D. Steiner, Prevalence estimation for monogenic autosomal recessive diseases using population-based genetic data. Hum. Genet. 134, 659–669 (2015). doi:10.1007/s00439-015-1551-8 Medline

9. N. Akawi, J. McRae, M. Ansari, M. Balasubramanian, M. Blyth, A. F. Brady, S. Clayton, T. Cole, C. Deshpande, T. W. Fitzgerald, N. Foulds, R. Francis, G. Gabriel, S. S. Gerety, J. Goodship, E. Hobson, W. D. Jones, S. Joss, D. King, N. Klena, A. Kumar, M. Lees, C. Lelliott, J. Lord, D. McMullan, M. O’Regan, D. Osio, V. Piombo, E. Prigmore, D. Rajan, E. Rosser, A. Sifrim, A. Smith, G. J. Swaminathan, P. Turnpenny, J. Whitworth, C. F. Wright, H. V. Firth, J. C. Barrett, C. W. Lo, D. R. FitzPatrick, M. E. Hurles; DDD study, Discovery of four recessive developmental disorders using probabilistic genotype and phenotype matching among 4,125 families. Nat. Genet. 47, 1363–1369 (2015). doi:10.1038/ng.3410 Medline

10. Materials and methods are available as supplementary materials.

11. C. F. Wright, T. W. Fitzgerald, W. D. Jones, S. Clayton, J. F. McRae, M. van Kogelenberg, D. A. King, K. Ambridge, D. M. Barrett, T. Bayzetinova, A. P. Bevan, E. Bragin, E. A. Chatzimichali, S. Gribble, P. Jones, N. Krishnappa, L. E. Mason, R. Miller, K. I. Morley, V. Parthiban, E. Prigmore, D. Rajan, A. Sifrim, G. J. Swaminathan, A. R. Tivey, A. Middleton, M. Parker, N. P. Carter, J. C. Barrett, M. E. Hurles, D. R. FitzPatrick, H. V. Firth; DDD study, Genetic diagnosis of developmental disorders in the DDD study: A scalable analysis of genome-wide research data. Lancet 385, 1305–1314 (2015). doi:10.1016/S0140-6736(14)61705-0 Medline

12. A. S. Teebi, Autosomal recessive disorders among Arabs: An overview from Kuwait. J. Med. Genet. 31, 224–233 (1994). doi:10.1136/jmg.31.3.224 Medline

13. G. O. Tadmouri, P. Nair, T. Obeid, M. T. Al Ali, N. Al Khaja, H. A. Hamamy, Consanguinity and reproductive health among Arabs. Reprod. Health 6, 17 (2009). doi:10.1186/1742-4755-6-17 Medline

14. C. Gilissen, J. Y. Hehir-Kwa, D. T. Thung, M. van de Vorst, B. W. M. van Bon, M. H. Willemsen, M. Kwint, I. M. Janssen, A. Hoischen, A. Schenck, R. Leach, R. Klein, R. Tearle, T. Bo, R. Pfundt, H. G. Yntema, B. B. A. de Vries, T. Kleefstra, H. G. Brunner, L. E. L. M. Vissers, J. A. Veltman, Genome sequencing identifies major causes of severe intellectual disability. Nature 511, 344–347 (2014). doi:10.1038/nature13394 Medline

15. C. L. Beaulieu, L. Huang, A. M. Innes, M.-A. Akimenko, E. G. Puffenberger, C. Schwartz, P. Jerry, C. Ober, R. A. Hegele, D. R. McLeod, J. Schwartzentruber, J. Majewski, D. E. Bulman, J. S. Parboosingh, K. M. Boycott; FORGE Canada Consortium, Intellectual disability associated with a homozygous missense mutation in THOC6. Orphanet J. Rare Dis. 8, 62 (2013). doi:10.1186/1750-1172-8-62 Medline

16. A. Fogli, O. Boespflug-Tanguy, The large spectrum of eIF2B-related diseases. Biochem. Soc. Trans. 34, 22–29 (2006). doi:10.1042/BST0340022 Medline

17. V. Faundes, W. G. Newman, L. Bernardini, N. Canham, J. Clayton-Smith, B. Dallapiccola, S. J. Davies, M. K. Demos, A. Goldman, H. Gill, R. Horton, B. Kerr, D. Kumar, A. Lehman, S. McKee, J. Morton, M. J. Parker, J. Rankin, L. Robertson, I. K. Temple, S. Banka; Clinical Assessment of the Utility of Sequencing and Evaluation as a Service (CAUSES) Study; Deciphering Developmental Disorders (DDD) Study, Histone Lysine Methylases and Demethylases in the Landscape of Human Developmental Disorders. Am. J. Hum. Genet. 102, 175–187 (2018). doi:10.1016/j.ajhg.2017.11.013 Medline

18. I. Iossifov, B. J. O’Roak, S. J. Sanders, M. Ronemus, N. Krumm, D. Levy, H. A. Stessman, K. T. Witherspoon, L. Vives, K. E. Patterson, J. D. Smith, B. Paeper, D. A. Nickerson, J. Dea, S. Dong, L. E. Gonzalez, J. D. Mandell, S. M. Mane, M. T. Murtha, C. A. Sullivan, M. F. Walker, Z. Waqar, L. Wei, A. J. Willsey, B. Yamrom, Y. H. Lee, E. Grabowska, E. Dalkic, Z. Wang, S. Marks, P. Andrews, A. Leotta, J. Kendall, I. Hakker, J. Rosenbaum, B. Ma, L. Rodgers, J. Troge, G. Narzisi, S. Yoon, M. C. Schatz, K. Ye, W. R. McCombie, J. Shendure, E. E. Eichler, M. W. State, M. Wigler, The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216–221 (2014). doi:10.1038/nature13908 Medline

19. G. L. Carvill, H. C. Mefford, Microdeletion syndromes. Curr. Opin. Genet. Dev. 23, 232–239 (2013). doi:10.1016/j.gde.2013.03.004 Medline

20. H. H. Ropers, T. Wienker, Penetrance of pathogenic mutations in haploinsufficient genes for intellectual disability and related disorders. Eur. J. Med. Genet. 58, 715–718 (2015). doi:10.1016/j.ejmg.2015.10.007 Medline

21. C. N. Vallianatos, S. Iwase, Disrupted intricacy of histone H3K4 methylation in neurodevelopmental disorders. Epigenomics 7, 503–519 (2015). doi:10.2217/epi.15.1 Medline

22. M. Lek, K. J. Karczewski, E. V. Minikel, K. E. Samocha, E. Banks, T. Fennell, A. H. O’Donnell-Luria, J. S. Ware, A. J. Hill, B. B. Cummings, T. Tukiainen, D. P. Birnbaum, J. A. Kosmicki, L. E. Duncan, K. Estrada, F. Zhao, J. Zou, E. Pierce-Hoffman, J. Berghout, D. N. Cooper, N. Deflaux, M. DePristo, R. Do, J. Flannick, M. Fromer, L. Gauthier, J. Goldstein, N. Gupta, D. Howrigan, A. Kiezun, M. I. Kurki, A. L. Moonshine, P. Natarajan, L. Orozco, G. M. Peloso, R. Poplin, M. A. Rivas, V. Ruano-Rubio, S. A. Rose, D. M. Ruderfer, K. Shakir, P. D. Stenson, C. Stevens, B. P. Thomas, G. Tiao, M. T. Tusie-Luna, B. Weisburd, H.-H. Won, D. Yu, D. M. Altshuler, D. Ardissino, M. Boehnke, J. Danesh, S. Donnelly, R. Elosua, J. C. Florez, S. B. Gabriel, G. Getz, S. J. Glatt, C. M. Hultman, S. Kathiresan, M. Laakso, S. McCarroll, M. I. McCarthy, D. McGovern, R. McPherson, B. M. Neale, A. Palotie, S. M. Purcell, D. Saleheen, J. M. Scharf, P. Sklar, P. F. Sullivan, J. Tuomilehto, M. T. Tsuang, H. C. Watkins, J. G. Wilson, M. J. Daly, D. G. MacArthur; Exome Aggregation Consortium, Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016). doi:10.1038/nature19057 Medline

23. J. X. Chong, M. J. McMillin, K. M. Shively, A. E. Beck, C. T. Marvin, J. R. Armenteros, K. J. Buckingham, N. T. Nkinsi, E. A. Boyle, M. N. Berry, M. Bocian, N. Foulds, M. L. G. Uzielli, C. Haldeman-Englert, R. C. M. Hennekam, P. Kaplan, A. D. Kline, C. L. Mercer, M. J. M. Nowaczyk, J. S. Klein Wassink-Ruiter, E. W. McPherson, R. A. Moreno, A. E. Scheuerle, V. Shashi, C. A. Stevens, J. C. Carey, A. Monteil, P. Lory, H. K. Tabor, J. D. Smith, J. Shendure, D. A. Nickerson, M. J. Bamshad, M. J. Bamshad, J. Shendure, D. A. Nickerson, G. R. Abecasis, P. Anderson, E. M. Blue, M. Annable, B. L. Browning, K. J. Buckingham, C. Chen, J. Chin, J. X. Chong, G. M. Cooper, C. P. Davis, C. Frazar, T. M. Harrell, Z. He, P. Jain, G. P. Jarvik, G. Jimenez, E. Johanson, G. Jun, M. Kircher, T. Kolar, S. A. Krauter, N. Krumm, S. M. Leal, D. Luksic, C. T. Marvin, M. J. McMillin, S. McGee, P. O’Reilly, B. Paeper, K. Patterson, M. Perez, S. W. Phillips, J. Pijoan, C. Poel, F. Reinier, P. D. Robertson, R. Santos-Cortez, T. Shaffer, C. Shephard, K. M. Shively, D. L. Siegel, J. D. Smith, J. C. Staples, H. K. Tabor, M. Tackett, J. G. Underwood, M. Wegener, G. Wang, M. M. Wheeler, Q. Yi; University of Washington Center for Mendelian Genomics, De novo mutations in NALCN cause a syndrome characterized by congenital contractures of the limbs and face, hypotonia, and developmental delay. Am. J. Hum. Genet. 96, 462–473 (2015). doi:10.1016/j.ajhg.2015.01.003 Medline

24. M. D. Al-Sayed, H. Al-Zaidan, A. Albakheet, H. Hakami, R. Kenana, Y. Al-Yafee, M. Al-Dosary, A. Qari, T. Al-Sheddi, M. Al-Muheiza, W. Al-Qubbaj, Y. Lakmache, H. Al-Hindi, M. Ghaziuddin, D. Colak, N. Kaya, Mutations in NALCN cause an autosomal-recessive syndrome with severe hypotonia, speech impairment, and cognitive delay. Am. J. Hum. Genet. 93, 721–726 (2013). doi:10.1016/j.ajhg.2013.08.001 Medline

25. J. Rainger, D. Pehlivan, S. Johansson, H. Bengani, L. Sanchez-Pulido, K. A. Williamson, M. Ture, H. Barker, K. Rosendahl, J. Spranger, D. Horn, A. Meynert, J. A. B. Floyd, T. Prescott, C. A. Anderson, J. K. Rainger, E. Karaca, C. Gonzaga-Jauregui, S. Jhangiani, D. M. Muzny, A. Seawright, D. C. Soares, M. Kharbanda, V. Murday, A. Finch, R. A. Gibbs, V. van Heyningen, M. S. Taylor, T. Yakut, P. M. Knappskog, M. E. Hurles, C. P. Ponting, J. R. Lupski, G. Houge, D. R. FitzPatrick, M. Hurles, D. R. FitzPatrick, S. Al-Turki, C. Anderson, I. Barroso, P. Beales, J. Bentham, S. Bhattacharya, K. Carss, K. Chatterjee, S. Cirak, C. Cosgrove, A. Daly, J. Floyd, C. Franklin, M. Futema, S. Humphries, S. McCarthy, H. Mitchison, F. Muntoni, A. Onoufriadis, V. Parker, F. Payne, V. Plagnol, L. Raymond, D. Savage, P. Scambler, M. Schmidts, R. Semple, E. Serra, J. Stalker, M. van Kogelenberg, P. Vijayarangakannan, K. Walter, G. Wood; UK10K; Baylor-Hopkins Center for Mendelian Genomics, Monoallelic and biallelic mutations in MAB21L2 cause a spectrum of major eye malformations. Am. J. Hum. Genet. 94, 915–923 (2014). doi:10.1016/j.ajhg.2014.05.005 Medline

26. M. Albert, S. U. Schmitz, S. M. Kooistra, M. Malatesta, C. Morales Torres, J. C. Rekling, J. V. Johansen, I. Abarrategui, K. Helin, The histone demethylase Jarid1b ensures faithful mouse development by protecting developmental genes from aberrant H3K4me3. PLOS Genet. 9, e1003461 (2013). doi:10.1371/journal.pgen.1003461 Medline

27. S. Köhler, M. H. Schulz, P. Krawitz, S. Bauer, S. Dölken, C. E. Ott, C. Mundlos, D. Horn, S. Mundlos, P. N. Robinson, Clinical diagnostics in human genetics with semantic similarity searches in ontologies. Am. J. Hum. Genet. 85, 457–464 (2009). doi:10.1016/j.ajhg.2009.09.003 Medline

28. E. Bragin, E. A. Chatzimichali, C. F. Wright, M. E. Hurles, H. V. Firth, A. P. Bevan, G. J. Swaminathan, DECIPHER: Database for the interpretation of phenotype-linked plausibly pathogenic sequence and copy-number variation. Nucleic Acids Res. 42 (D1), D993–D1000 (2014). doi:10.1093/nar/gkt937 Medline

29. E. Sheridan, J. Wright, N. Small, P. C. Corry, S. Oddie, C. Whibley, E. S. Petherick, T. Malik, N. Pawson, P. A. McKinney, R. C. Parslow, Risk factors for congenital anomaly in a multiethnic birth cohort: An analysis of the Born in Bradford study. Lancet 382, 1350–1359 (2013). doi:10.1016/S0140-6736(13)61132-0 Medline

30. W. McLaren, L. Gil, S. E. Hunt, H. S. Riat, G. R. S. Ritchie, A. Thormann, P. Flicek, F. Cunningham, The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016). doi:10.1186/s13059-016-0974-4 Medline

31. 1000 Genomes Project Consortium, A global reference for human genetic variation. Nature 526, 68–74 (2015). doi:10.1038/nature15393 Medline

32. A. L. Price, N. J. Patterson, R. M. Plenge, M. E. Weinblatt, N. A. Shadick, D. Reich, Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006). doi:10.1038/ng1847 Medline

33. V. Narasimhan, P. Danecek, A. Scally, Y. Xue, C. Tyler-Smith, R. Durbin, BCFtools/RoH: A hidden Markov model approach for detecting autozygosity from next-generation sequencing data. Bioinformatics 32, 1749–1751 (2016). doi:10.1093/bioinformatics/btw044 Medline

34. M. P. Conomos, A. P. Reiner, B. S. Weir, T. A. Thornton, Model-free Estimation of Recent Genetic Relatedness. Am. J. Hum. Genet. 98, 127–148 (2016). doi:10.1016/j.ajhg.2015.11.022 Medline

35. K. E. Samocha, E. B. Robinson, S. J. Sanders, C. Stevens, A. Sabo, L. M. McGrath, J. A. Kosmicki, K. Rehnström, S. Mallick, A. Kirby, D. P. Wall, D. G. MacArthur, S. B. Gabriel, M. DePristo, S. M. Purcell, A. Palotie, E. Boerwinkle, J. D. Buxbaum, E. H. Cook Jr., R. A. Gibbs, G. D. Schellenberg, J. S. Sutcliffe, B. Devlin, K. Roeder, B. M. Neale, M. J. Daly, A framework for the interpretation of de novo mutation in human disease. Nat. Genet. 46, 944–950 (2014). doi:10.1038/ng.3050 Medline

36. E. Krissinel, K. Henrick, Multiple alignment of protein structures in three dimensions. CompLife 3695, 67–78 (2005).

37. E. Krissinel, K. Henrick, Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr. D Biol. Crystallogr. 60, 2256–2268 (2004). doi:10.1107/S0907444904026460 Medline

38. S. J. Hubbard, J. M. Thornton, NACCESS-Computer Program. 1993. Department of Biochemistry and Molecular Biology, University College London.

39. W. S. J. Valdar, Scoring residue conservation. Proteins 48, 227–241 (2002). doi:10.1002/prot.10146 Medline

40. G. E. Crooks, G. Hon, J.-M. Chandonia, S. E. Brenner, WebLogo: A sequence logo generator. Genome Res. 14, 1188–1190 (2004). doi:10.1101/gr.849004 Medline

41. H. Kilpinen, A. Goncalves, A. Leha, V. Afzal, K. Alasoo, S. Ashford, S. Bala, D. Bensaddek, F. P. Casale, O. J. Culley, P. Danecek, A. Faulconbridge, P. W. Harrison, A. Kathuria, D. McCarthy, S. A. McCarthy, R. Meleckyte, Y. Memari, N. Moens, F. Soares, A. Mann, I. Streeter, C. A. Agu, A. Alderton, R. Nelson, S. Harper, M. Patel, A. White, S. R. Patel, L. Clarke, R. Halai, C. M. Kirton, A. Kolb-Kokocinski, P. Beales, E. Birney, D. Danovi, A. I. Lamond, W. H. Ouwehand, L. Vallier, F. M. Watt, R. Durbin, O. Stegle, D. J. Gaffney, Common genetic variation drives molecular heterogeneity in human iPSCs. Nature 546, 370–375 (2017). doi:10.1038/nature22403 Medline

42. M. Knapp, The transmission/disequilibrium test and parental-genotype reconstruction: The reconstruction-combined transmission/disequilibrium test. Am. J. Hum. Genet. 64, 861–870 (1999). doi:10.1086/302285 Medline

43. D. Szklarczyk, J. H. Morris, H. Cook, M. Kuhn, S. Wyder, M. Simonovic, A. Santos, N. T. Doncheva, A. Roth, P. Bork, L. J. Jensen, C. von Mering, The STRING database in 2017: Quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res. 45 (D1), D362–D368 (2017). doi:10.1093/nar/gkw937 Medline

44. B. J. Klein, L. Piao, Y. Xi, H. Rincon-Arano, S. B. Rothbart, D. Peng, H. Wen, C. Larson, X. Zhang, X. Zheng, M. A. Cortazar, P. V. Peña, A. Mangan, D. L. Bentley, B. D. Strahl, M. Groudine, W. Li, X. Shi, T. G. Kutateladze, The histone-H3K4-specific demethylase KDM5B binds to its substrate and product through distinct PHD fingers. Cell Reports 6, 325–335 (2014). doi:10.1016/j.celrep.2013.12.021 Medline

45. O. Delaneau, J.-F. Zagury, J. Marchini, Improved whole-chromosome phasing for disease and population genetic studies. Nat. Methods 10, 5–6 (2013). doi:10.1038/nmeth.2307 Medline

46. P. Mali, L. Yang, K. M. Esvelt, J. Aach, M. Guell, J. E. DiCarlo, J. E. Norville, G. M. Church, RNA-guided human genome engineering via Cas9. Science 339, 823–826 (2013). doi:10.1126/science.1232033 Medline

47. K. Gapp, J. Bohacek, J. Grossmann, A. M. Brunner, F. Manuella, P. Nanni, I. M. Mansuy, Potential of Environmental Enrichment to Prevent Transgenerational Effects of Paternal Trauma. Neuropsychopharmacology 41, 2749–2758 (2016). doi:10.1038/npp.2016.87 Medline

48. C. Dias, S. B. Estruch, S. A. Graham, J. McRae, S. J. Sawiak, J. A. Hurst, S. K. Joss, S. E. Holder, J. E. V. Morton, C. Turner, J. Thevenon, K. Mellul, G. Sánchez-Andrade, X. Ibarra-Soria, P. Deriziotis, R. F. Santos, S.-C. Lee, L. Faivre, T. Kleefstra, P. Liu, M. E. Hurles, S. E. Fisher, D. W. Logan; DDD Study, BCL11A Haploinsufficiency Causes an Intellectual Disability Syndrome and Dysregulates Transcription. Am. J. Hum. Genet. 99, 253–274 (2016). doi:10.1016/j.ajhg.2016.05.030 Medline

49. A. Csibi, K. Cornille, M.-P. Leibovitch, A. Poupon, L. A. Tintignac, A. M. J. Sanchez, S. A. Leibovitch, The translation regulatory subunit eIF3f controls the kinase-dependent mTOR signaling required for muscle differentiation and hypertrophy in mouse. PLOS ONE 5, e8994 (2010). doi:10.1371/journal.pone.0008994 Medline

50. M. C. Frank, M. Braginsky, D. Yurovsky, V. A. Marchman, Wordbank: An open repository for developmental vocabulary data. J. Child Lang. 44, 677–694 (2017). doi:10.1017/S0305000916000209 Medline

51. M. E. Smith, G. Lecker, J. W. Dunlap, E. E. Cureton, The Effects of Race, Sex, and Environment on the Age at Which Children Walk. Pedagog. Semin. J. Genet. Psychol. 38, 489–498 (1930). doi:10.1080/08856559.1930.10532284

52. G. L. Yamamoto, M. Aguena, M. Gos, C. Hung, J. Pilch, S. Fahiminiya, A. Abramowicz, I. Cristian, M. Buscarilli, M. S. Naslavsky, A. C. Malaquias, M. Zatz, O. Bodamer, J. Majewski, A. A. L. Jorge, A. C. Pereira, C. A. Kim, M. R. Passos-Bueno, D. R. Bertola, Rare variants in SOS2 and LZTR1 are associated with Noonan syndrome. J. Med. Genet. 52, 413–421 (2015). doi:10.1136/jmedgenet-2015-103018 Medline

53. N. A. Akawi, F. Al-Jasmi, A. M. Al-Shamsi, B. R. Ali, L. Al-Gazali, LINS, a modulator of the WNT signaling pathway, is involved in human cognition. Orphanet J. Rare Dis. 8, 87 (2013). doi:10.1186/1750-1172-8-87 Medline

54. H. Najmabadi, H. Hu, M. Garshasbi, T. Zemojtel, S. S. Abedini, W. Chen, M. Hosseini, F. Behjati, S. Haas, P. Jamali, A. Zecha, M. Mohseni, L. Püttmann, L. N. Vahid, C. Jensen, L. A. Moheb, M. Bienek, F. Larti, I. Mueller, R. Weissmann, H. Darvish, K. Wrogemann, V. Hadavi, B. Lipkowitz, S. Esmaeeli-Nieh, D. Wieczorek, R. Kariminejad, S. G. Firouzabadi, M. Cohen, Z. Fattahi, I. Rost, F. Mojahedi, C. Hertzberg, A. Dehghan, A. Rajab, M. J. S. Banavandi, J. Hoffer, M. Falah, L. Musante, V. Kalscheuer, R. Ullmann, A. W. Kuss, A. Tzschach, K. Kahrizi, H. H. Ropers, Deep sequencing reveals 50 novel genes for recessive cognitive disorders. Nature 478, 57–63 (2011). doi:10.1038/nature10423 Medline