Supporting Appendix for: MicroRNA-15a and 16-1 Act Via MYB to Elevate Fetal Hemoglobin Expression in Human Trisomy 13 Vijay G. Sankaran, Tobias F. Menne, Danilo Šćepanović, Jo-Anne Vergilio, Peng Ji, Jinkuk Kim, Prathapan Thiru, Stuart H. Orkin, Eric S. Lander, and Harvey F. Lodish* *To whom correspondence should be addressed. E-mail: [email protected]

SI Materials and Methods

Cell Culture

293T cells were maintained in DMEM with 10% FCS and 2% penicillin/streptomycin. These cells

were transfected with the FuGENE 6 (Roche) reagent according to the manufacturer’s protocol.

K562 erythroleukemia cells were cultured in RPMI-1640 medium with 10% FCS, 2%

penicillin/streptomycin, and 1% L-glutamine. Cells were maintained at a density of 0.1-1 × 10⁶

cells/ml.

Culture and differentiation of primary human CD34+ cells were performed similarly to what has

been previously described (1). Briefly, the CD34+ cells were obtained from magnetically-sorted

mononuclear samples of G-CSF mobilized peripheral blood from donors and were frozen after

isolation. Cells were obtained from the Yale Center of Excellence in Molecular Hematology

(YCEMH). Cells were thawed and washed into RPMI-1640 with 10% FCS, and then seeded in

StemSpan SFEM Medium (StemCell Technologies Inc.) with 1X CC100 cytokine mix (StemCell

Technologies Inc.) and 2% penicillin/streptomycin. Cells were maintained in this expansion

medium at a density of 0.1-1 × 10⁶ cells/ml with media changes every other or every third day

as necessary. Cells were kept in expansion medium for a total of 6 days. On day 6, cells were

reseeded into StemSpan SFEM Medium with 2% P/S, 20 ng/ml SCF, 1 U/ml Epo, 5 ng/ml IL-3,

2 micromolar dexamethasone, and 1 micromolar β-estradiol. Cells were maintained in

differentiation medium, with media changes every other or every third day as needed. Cells

were maintained at a density of 0.1-1 × 10⁶ cells/ml. By day 3 of differentiation, homogeneous

larger blasts were present in the culture. By day 5, the majority of cells had proerythroblast

morphology and on day 7 the majority of the cells had basophilic erythroblast morphology. By

day 12 of differentiation, the majority of cells demonstrated orthochromatophilic and

polychromatophilic erythroblast morphology. This morphological classification has been

confirmed using phenotypic markers of erythropoiesis, including CD235, CD71, CD45, and

CD36 expression.

Constructs

A 628 bp genomic DNA fragment from human chromosome 13 containing the hairpin region of

miR-15a and 16-1 and 200 bp of flanking sequence on each side was isolated by PCR from human

genomic DNA and was cloned into the XhoI/XbaI and FseI/PacI sites of the pLVX-puro

(Clontech) and pSMPUW-puro (Cell BioLabs, Inc.) lentiviral expression vectors, respectively

(core primer sequences: forward 5’-GGGCACAGAATGGACTTCAG-3’; reverse 5’-GATGGCATTCAATACAATTATTA-3’).

The 1.21 kb 3’-UTR of MYB was cloned into the XhoI and NotI sites of the psiCHECK-2 vector

(Promega) after PCR amplification from human genomic DNA.

shRNA lentiviral constructs targeting human MYB were obtained from the Sigma-Aldrich

Mission shRNA collection and the clones used in this study were TRCN0000009853 and

TRCN0000040058. The sequences of the shRNAs encoded by these clones are (respectively):

CCGGAACAGAATGGAACAGATGACCTCGAGGTCATCTGTTCCATTCTGTTCTTTTTG

CCGGCCAGATTGTAAATGCTCATTTCTCGAGAAATGAGCATTTACAATCTGGTTTTTG

Lentiviral Preparation & Infection

Lentiviruses were prepared in a manner similar to what has been described previously (1).

Briefly, 293T cells were transfected with the lentiviral constructs described above along with

VSV-G and p∆8.9 vectors that contain the other necessary viral components. Media was

changed 24 hours after transfection with the desired media of the target cell of interest. At 48

hours, this media was collected and filtered at 0.45 µm. CD34+ (generally transduced for these

experiments at day 3 of expansion) and K562 cells were transduced with lentivirus using a spin-

infection method. The cells were mixed with filtered viral supernatant (with cytokines added as

appropriate for the cell line) and polybrene (filtered 4 mg/ml stock) was added to the mixture at

2 µl/ml of media. The cells were spun at room temperature for 90 minutes at 2500 rpm. Cells

were then kept at 37 °C overnight and the following day, the cells were washed twice with PBS,

and seeded into fresh medium. Selection with puromycin was started at 48 hours after viral

transduction. Appropriate selection concentrations were determined by exposing uninfected

cells to a range of puromycin concentrations, typically between 1 and 2 µg/ml as a final

concentration of puromycin.

RNA extraction, qRT-PCR, and Microarray Analysis

Isolation of RNA was performed using the miRNeasy Mini Kit (Qiagen). An on-column DNase

(Qiagen) digestion was performed according to manufacturer’s instructions with the RNeasy

Mini Kit. RNA was quantified by a NanoDrop spectrophotometer (Thermo Scientific). cDNA

was synthesized with the iScript cDNA synthesis Kit (Bio-Rad). Real time PCR was performed

using the ABI 7900 Machine with ABI SYBR Green PCR Mastermix (Applied Biosystems, Inc.),

similar to what has been previously described (1, 2). For quantification of microRNA levels, the

appropriate TaqMan set of primers and the TaqMan PCR Mastermix (Applied Biosystems, Inc.)

were used following manufacturer’s instructions on the ABI 7900 instrument. RNU19 was used

as a control for these experiments. Relative expression was quantitated using the ∆∆Ct method

as described previously (1, 2). Sequences of primers used for RT-PCR have been published

previously (3). Preparation of samples for expression microarray analysis was done as

previously described (1) and microarrays were processed by the Genome Technology Core at

the Whitehead Institute. Expression array data were normalized by Robust Multichip Averaging

(RMA) using the affy package from Bioconductor. A custom probeset definition was used for

processing the arrays such that there was one probe set per Entrez Gene ID. Differentially

expressed genes were identified using a moderated t test in the limma package in Bioconductor

(4), corrected for the false discovery rate. This analytical approach was also utilized with gene

set GSE13110 from MYB siRNA knockdown in human cord blood CD34+ cells (5). The

expression of all genes in this dataset was normalized and the expression of the hemoglobin

genes was subsequently ascertained. Gene set enrichment analysis (GSEA) was performed as

previously described using processed microarray data (2, 6, 7). To produce gene sets from the

lineage-specific data (8) or from the shMYB data, genes with log2 expression changes of >2

(with p values < 0.01) or >1 (with p values < 0.05), respectively, were used to create the various

gene sets. These were then applied using the GSEA algorithm to assess enrichment of these

gene sets in the processed datasets described in the text. This was done for both the

processed shMYB data, as well as for an erythroid differentiation dataset (GSE628) that has

been described previously (9). The gene sets that were used in this analysis are available on

request. The shMYB microarray data has been deposited in the Gene Expression Omnibus

(http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE25678.
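
The ∆∆Ct quantification mentioned in this section can be illustrated with a minimal Python sketch. It assumes roughly 100% amplification efficiency (product doubles each cycle); the function name `relative_expression` and all Ct values are hypothetical and are not taken from the study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target transcript by the delta-delta-Ct method.

    Assumes the amount of product doubles with every PCR cycle.
    """
    # Normalize the target to the reference transcript within each condition.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized sample to the normalized control.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in the sample than in the control, with an unchanged reference.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
print(fold_change)  # 4.0
```

A lower Ct means more starting template, which is why the exponent carries a negative sign.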

Flow Cytometry

For cell cycle analysis by propidium iodide (PI) staining, cells were washed in PBS, diluted into

300 µl of PBS, and fixed with 3 ml of 70-80% ethanol (EtOH) added slowly in a dropwise

fashion to the cells. The cells were allowed to fix at 4 °C for at least 15 minutes and were often

stored in the EtOH solution for several days. Cells were washed with PBS with 1% serum and

then incubated in 250 µl of PBS/1% serum containing 50 µg/ml PI and 0.1 mg/ml RNase A.

The cells were incubated at 37 °C for 30 minutes (in the dark) and then were kept on ice prior to

FACS analysis.

Staining of surface markers was performed as has been previously described (2). Cells were

washed with PBS with 1% serum and stained with the appropriate surface antibodies (labeled

with various fluorophores). After the antibodies were washed off, cells were stained with either PI or

7-AAD as a marker of viability.

All data analysis was performed using the FlowJo 7.5.5 suite of flow cytometry analysis tools.

Cell cycle analysis was performed using the cell cycle modeling feature in FlowJo using the

Watson model that estimates the percentage of cells in G1, S, and G2/M phases using best

fitting approaches of the flow cytometry data from propidium iodide staining.

Luciferase Reporter Assays

293T cells were seeded at 5000 cells per well in 96 well plates on the day prior to transfection.

Transfection was carried out in at least 3 replicates using FuGENE 6 (Roche), with 5 ng of the

pLVX-miR-15a-16-1 or pLVX expression vectors and 10 ng of the MYB 3’-UTR psiCHECK-2

construct. After 2 days, detection of both firefly and renilla luciferase was performed using the

Dual-Glo Luciferase assay kit (Promega). Luminescence was quantitated and renilla luciferase

readings were normalized against the firefly luciferase activity to determine the relative

luciferase activity.
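
The normalization described above can be sketched in Python as follows. This is only an illustration: the readings are invented, the helper names are hypothetical, and expressing the result relative to the empty-vector wells is an assumption about how "relative luciferase activity" was reported.

```python
from statistics import mean


def normalized_ratios(renilla, firefly):
    """Per-well Renilla signal divided by the firefly signal from the same well."""
    return [r / f for r, f in zip(renilla, firefly)]


def relative_activity(test_renilla, test_firefly, ctrl_renilla, ctrl_firefly):
    """Mean normalized ratio of the test wells relative to the control wells."""
    test = mean(normalized_ratios(test_renilla, test_firefly))
    ctrl = mean(normalized_ratios(ctrl_renilla, ctrl_firefly))
    return test / ctrl


# Hypothetical luminescence readings from three replicate wells per condition.
mir_renilla, mir_firefly = [500.0, 600.0, 400.0], [1000.0, 1000.0, 1000.0]
vec_renilla, vec_firefly = [1000.0, 1100.0, 900.0], [1000.0, 1000.0, 1000.0]
print(relative_activity(mir_renilla, mir_firefly, vec_renilla, vec_firefly))
```

Dividing by the firefly signal from the same well corrects for well-to-well differences in transfection efficiency and cell number before conditions are compared.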

Western Blotting

Western blotting was performed as described previously (1, 3). K562 cells were selected in

puromycin for at least 1 week and harvested before western blotting. The anti-MYB antibody

(clone 1-1) was purchased from Millipore.

Pathological Analysis

Pathological specimens from autopsies from over four decades of the archives of the

Department of Pathology at the Children’s Hospital of Boston were examined using a query for

trisomy 13. In several cases, improperly annotated cases were found and were not used in

further analysis. In other cases, limited hematopoietic tissues (bone marrow, spleen, and liver)

were available for assessment and so these cases were not used in further analysis. After

filtering the cases obtained, 17 cases were used in the final analysis and hematopoiesis was

assessed by a hematopathologist in all cases, using standard criteria for lineage ratios,

cellularity, and numbers of progenitors (10, 11). Summaries of this analysis are shown in Table

S1. Similar samples at the Children’s Hospital of Boston, from the same time period, but from

patients with other diagnoses were used to control for the abnormal findings observed in this

analysis.

Integrative Genomic Analysis

The boundaries of the chromosomal 13q14 region were defined using tools available through

the UCSC genome browser (http://genome.ucsc.edu/) using Human (Homo sapiens) genome

draft version GRCh37/hg19 (February 2009). The region was defined from coordinates

chr13:40,100,001-55,300,000. All genes in this region were identified using RefSeq, GenBank,

and UniProt identifiers and redundancy among the genes was collapsed to define a unique

group of genes for analysis. To assess the relative expression of these genes in human

erythroid (CD71+) progenitors, data from the GNF expression atlas was used for these genes

compared with a panel of 78 other cell types from this dataset (12). Using such an analysis, all

known regulators of globin gene regulation and erythropoiesis (13) have expression levels > 1

and therefore this was used as a cutoff in assessing potential candidate genes (Figure S2).

Among the genes found in this region, 18% of the genes have such an expression ratio and

were selected for further analysis and examination as candidates for the elevated fetal hemoglobin

level trait. These data are shown in Table S1 and Figure S2.

The analysis of histone 3 lysine 4 trimethylation using ENCODE data from K562 erythroid cell

lines in the 13q14 region was performed in the following manner. Promoters of genes in the

region of interest were downloaded using UCSC Genes definitions, 1500 bases upstream of the

transcription start site using the UCSC Genome Browser’s Table Browser feature. ENCODE

ChIP data was downloaded in WIG format and converted to BED format. Overlapping regions

were determined using Galaxy Tools (http://main.g2.bx.psu.edu/); the “Intersect” tool was used

with the files containing the promoters and ChIP data. ChIP data that had a height of at least 50

pixels was used as a threshold for a positive peak. The resulting promoters in chromosomal

band 13q14 that had significant H3K4me3 peaks are shown in Table S2. The intersection of

this data with the expression data in Table S1 (for genes with relative expression > 1) is shown

in Table S3.
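
The promoter and peak intersection described above was run with Galaxy; the same logic can be sketched in plain Python. The coordinates and gene names below are hypothetical, intervals are treated as half-open (BED style), and a numeric signal value stands in for the 50-pixel peak height threshold, which is an assumption.

```python
def promoters_with_peaks(promoters, peaks, min_height=50):
    """Return the genes whose promoter interval overlaps a sufficiently tall peak.

    promoters: list of (gene, start, end); peaks: list of (start, end, height).
    Intervals are half-open, [start, end), as in BED files.
    """
    strong_peaks = [(s, e) for s, e, height in peaks if height >= min_height]
    hits = []
    for gene, p_start, p_end in promoters:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(p_start < e and s < p_end for s, e in strong_peaks):
            hits.append(gene)
    return hits


# Hypothetical 1500-base promoter windows and ChIP peaks on one chromosome.
promoters = [("GENE_A", 1000, 2500), ("GENE_B", 5000, 6500), ("GENE_C", 9000, 10500)]
peaks = [(2400, 2600, 80), (5100, 5200, 30), (12000, 12100, 90)]
print(promoters_with_peaks(promoters, peaks))  # ['GENE_A']
```

GENE_B overlaps a peak but is excluded because that peak falls below the threshold.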

To assess potential candidate microRNA targets, the aggregate $P_{CT}$ (14) for all predicted targets

of miR-15/16 were compared with the relative expression levels in erythroid and hematopoietic

tissues from the GNF expression atlas (12), similar to what is described above. Candidate

targets of interest were chosen based upon high aggregate $P_{CT}$ scores, along with relatively

elevated expression within the tissues of interest (Figure 3A and S6). The aggregate $P_{CT}$ scores

for all miR-15/16 seed targets can be obtained from the TargetScan website

(http://www.targetscan.org/). The data from release 5.1 (April 2009) was used in this analysis.

Among this group of candidates, MYB was particularly notable and therefore was used in further

analyses.

SI Text

Probabilistic Models of Chromosomal Region Association

We developed a set of models to explain the correlation between trisomy of particular bands of

chromosome 13 and elevation/persistence of fetal hemoglobin (HbF). The aim is to compute the

posterior probability of model parameters given the data under a particular model M :

\[
P(\text{parameters} \mid \text{data}, M)
\]

The above can be calculated using the standard Bayesian formulation:

\[
P(\text{parameters} \mid \text{data}, M) = \frac{P(\text{data} \mid \text{parameters}, M)\, P(\text{parameters} \mid M)}{P(\text{data} \mid M)} \tag{1}
\]

The denominator in Equation (1) represents the probability of observing the data under a

specific model, and is commonly referred to as the Bayesian Score (BS). An appropriate model

produces a high BS, whereas inappropriate models (ones with improper assumptions or

parameters) produce a low BS. It is important to note that the BS is not a function of any

particular parameter values, but rather it considers all possible parameter values in proportion to

their ability to explain the data. We use the BS to select the most appropriate model.

Data:

We analyze a data set of 57 subjects with partial trisomy of chromosome 13 who have been

classified as affected (elevated HbF) or unaffected (15, 16). Each subject’s partial trisomy is

represented as a vector of 0s and 1s, where 1 signifies trisomy. The data set is a matrix where

each row is a particular subject. There are $N$ rows (number of subjects) and $B$ columns (number of bands of chromosome 13).

\[
\mathbf{S} =
\begin{bmatrix} \text{Subject } 1 \\ \text{Subject } 2 \\ \vdots \\ \text{Subject } N \end{bmatrix}
=
\underbrace{\begin{bmatrix}
1 & 1 & \cdots & 0 \\
1 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 1 & \cdots & 1
\end{bmatrix}}_{B}
\]

Each subject is either affected or unaffected with elevated HbF. This information is stored in a binary vector $\vec{A} = \underbrace{[\,1 \;\; 0 \;\; \cdots \;\; 1\,]^T}_{N}$, also of length $N$, where 1 signifies an affected subject.

To summarize the data, we can construct probability distributions of trisomy of various bands in

the affected and unaffected populations (shown in Figures 1B and S1 in the paper):

\[
\begin{aligned}
P(\text{Trisomy} \mid \text{Affected}) &= \vec{A}^T \mathbf{S} \,/\, \left(\vec{A}^T \vec{1}\right) \\
P(\text{Trisomy} \mid \text{Unaffected}) &= (\vec{1} - \vec{A})^T \mathbf{S} \,/\, \left((\vec{1} - \vec{A})^T \vec{1}\right)
\end{aligned}
\]

Here, $\vec{1}$ is a column vector of ones of length $N$ and simply serves to either sum the elements of a vector, or to reverse the bits in $\vec{A}$. Thus, $\vec{A}^T \vec{1}$ is simply the total number of affected subjects in the data set, $\vec{1} - \vec{A}$ is a vector analogous to $\vec{A}$ but with 1s for unaffected subjects, and $(\vec{1} - \vec{A})^T \vec{1}$ is the total number of unaffected subjects. For a full list of symbols used in this document, refer to the Mathematical Glossary.
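
As a concrete illustration of these two summaries, the following Python sketch computes the per-band trisomy frequencies for a toy data set. The 3-subject, 3-band matrix and the function name `band_frequencies` are hypothetical; the real data set has 57 subjects.

```python
def band_frequencies(S, A):
    """Per-band trisomy frequency among affected and among unaffected subjects.

    S: list of rows (one per subject) of 0/1 trisomy indicators per band.
    A: list of 0/1 flags, 1 for an affected subject.
    """
    n_bands = len(S[0])
    n_affected = sum(A)                # plays the role of A^T 1
    n_unaffected = len(A) - n_affected  # plays the role of (1 - A)^T 1
    affected = [sum(a * row[b] for a, row in zip(A, S)) / n_affected
                for b in range(n_bands)]
    unaffected = [sum((1 - a) * row[b] for a, row in zip(A, S)) / n_unaffected
                  for b in range(n_bands)]
    return affected, unaffected


# Toy data: subjects 1 and 3 are affected, subject 2 is not.
S = [[1, 1, 0],
     [1, 0, 0],
     [0, 1, 1]]
A = [1, 0, 1]
affected, unaffected = band_frequencies(S, A)
print(affected)    # [0.5, 1.0, 0.5]
print(unaffected)  # [1.0, 0.0, 0.0]
```

In this toy example the middle band is trisomic in every affected subject and in no unaffected subject, which is the kind of contrast the models below quantify.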

Model Notation and Assumptions

We test a number of models designed to compute the probability that a particular band X of

chromosome 13 contains the gene responsible for elevation of HbF. This is similar to the model

used by Korbel et al. in their study of segmental trisomy 21 (17). However, we also model gene

silencing by supposing that there exists a regulatory region, which is necessary for the genes in

band X to be expressed or, alternatively, to ensure that the genes in band X are not silenced.

We assume that if the gene responsible for elevation of HbF is expressed in trisomy, then that

subject has a probability $P^*$ of being affected (or $1 - P^*$ of being unaffected). On the other hand,

if the responsible gene is not expressed in trisomy (either because band X is not present in

trisomy, or if it is present in trisomy but the regulatory region is not), then the subject has a

probability $P_0$ of being affected ($1 - P_0$ of being unaffected), with $P_0 < P^*$.

In all models, we have a single parameter $X$, and we wish to compute $P(X = x \mid \text{data}, M)$ as in Equation (1). There are two other potential parameters, $P^*$ and $P_0$, but their values can be ascertained from full trisomy data. The vast majority of individuals with full trisomy 13 have elevated HbF, and since full trisomy implies that the gene of interest is certainly expressed in trisomy, we set $P^* = 0.8$. On the other hand, HbF elevation is very rare in individuals without trisomy 13, so we set $P_0 = 0.05$. We also test these assumptions, and show that as long as $P_0 < P^*$, on average we obtain the same answer as if we assume the aforementioned explicit values for $P^*$ and $P_0$.

Finally, we consider the prior probability of $X$: $P(X = x)$ (how strongly we believe that the gene of interest resides on a particular band $x$, before we see any data). There are many ways to derive a valid prior; for example, we could say that the probability of a gene being on a particular band is proportional to the number of genes contained within that band. For simplicity, we use a uniform prior, where we consider each band equally likely to contain the gene of interest. In the remainder of the document, all the equations contain the general term $P(X = x)$, but in the numerical analysis we use $P(X = x) = 1/B$ for all $x$.

General Probability Structure

For each model, we compute the probability $P(\text{data} \mid X = x, M)$. Due to the definitions of $P^*$ and $P_0$, the conceptual equation of this probability is:

\[
P(\text{data} \mid X = x, M) = (P^*)^{N_{ATxE}}\,(P_0)^{N_{ADx} + N_{ATxS}}\,(1 - P^*)^{N_{UTxE}}\,(1 - P_0)^{N_{UDx} + N_{UTxS}} \tag{2}
\]

where $N_{ATxE}$ is the number of subjects who are affected ($A$) with elevated HbF and in whom band $x$ is expressed ($E$) in trisomy ($T$), $N_{ADx}$ is the number of subjects who are affected and in whom band $x$ is present in disomy ($D$), $N_{ATxS}$ is the number of subjects who are affected and in whom band $x$ is present in trisomy but is silenced ($S$); $N_{UTxE}$, $N_{UDx}$, and $N_{UTxS}$ are the analogues for the unaffected ($U$) population. In the equation for each model, we show a matrix-vector notation for how each of these counts is calculated from the data ($\mathbf{S}$ and $\vec{A}$).

The denominator of Equation (1) is simply Equation (2) multiplied by the prior $P(X = x)$, and summed over all possible values of $x$ (by the law of total probability):

\[
P(\text{data} \mid M) = \sum_{k=1}^{B} P(\text{data} \mid X = k, M)\, P(X = k)
\]

Model 1

In Model 1, we assume that no other regulatory region is necessary for expression of the genes

on band X . This is equivalent to assuming that a regulatory region does exist, but that it also

resides on band X . For this model, the posterior probability distribution for X is:

\[
P(X = x \mid \text{data}, M1) =
\frac{(P^*)^{\vec{A}^T \vec{S}_x}\,(P_0)^{\vec{A}^T(\vec{1} - \vec{S}_x)}\,(1 - P^*)^{(\vec{1} - \vec{A})^T \vec{S}_x}\,(1 - P_0)^{(\vec{1} - \vec{A})^T(\vec{1} - \vec{S}_x)}\,P(X = x)}
{\sum_{k=1}^{B} (P^*)^{\vec{A}^T \vec{S}_k}\,(P_0)^{\vec{A}^T(\vec{1} - \vec{S}_k)}\,(1 - P^*)^{(\vec{1} - \vec{A})^T \vec{S}_k}\,(1 - P_0)^{(\vec{1} - \vec{A})^T(\vec{1} - \vec{S}_k)}\,P(X = k)}
\]

Where $\vec{S}_x = \mathbf{S}\,\vec{\delta}_x$ is the $x$th column of $\mathbf{S}$.
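
The Model 1 posterior and its Bayesian Score can be sketched numerically as follows. This is a minimal illustration in plain Python on a hypothetical 3-subject, 3-band data set, using the uniform prior and the values $P^* = 0.8$ and $P_0 = 0.05$ stated earlier; the function name `model1_posterior` is an assumption and this is not the authors' code.

```python
import math


def model1_posterior(S, A, p_star=0.8, p0=0.05):
    """Posterior over bands and Bayesian Score under Model 1 (uniform prior).

    S: rows of 0/1 trisomy indicators per band; A: 0/1 affected flags.
    """
    n_bands = len(S[0])
    prior = 1.0 / n_bands
    joint = []
    for x in range(n_bands):
        log_like = 0.0
        for row, affected in zip(S, A):
            # Band x in trisomy means the subject is affected with probability P*.
            p_affected = p_star if row[x] == 1 else p0
            log_like += math.log(p_affected if affected else 1.0 - p_affected)
        joint.append(math.exp(log_like) * prior)
    bayesian_score = sum(joint)  # P(data | M), the denominator of Equation (1)
    posterior = [j / bayesian_score for j in joint]
    return posterior, bayesian_score


# Toy data: subjects 1 and 3 are affected, subject 2 is not.
S = [[1, 1, 0],
     [1, 0, 0],
     [0, 1, 1]]
A = [1, 0, 1]
posterior, score = model1_posterior(S, A)
print(posterior, score)
```

Working in log space inside the loop is a design choice that keeps the product of many small probabilities numerically stable when the number of subjects is large.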

Model 2

In Model 2, we assume that a regulatory region exists somewhere between the p terminus and

X , but not within X . We also assume that the regulatory region is equally likely to exist in any

of the bands between the p terminus and $X$ (probability of $1/(x - 1)$ for each band). The

posterior probability distribution for X is similar to that of Model 1, but with a different set of

exponents:

\[
P(X = x \mid \text{data}, M2) =
\frac{\left\{
\begin{aligned}
&(P^*)^{\vec{A}^T \mathbf{D}_x \vec{L}_x}\,(P_0)^{\vec{A}^T\left(\vec{1} - \vec{S}_x + \mathbf{D}_x(\vec{1} - \vec{L}_x)\right)} \times \\
&(1 - P^*)^{(\vec{1} - \vec{A})^T \mathbf{D}_x \vec{L}_x}\,(1 - P_0)^{(\vec{1} - \vec{A})^T\left(\vec{1} - \vec{S}_x + \mathbf{D}_x(\vec{1} - \vec{L}_x)\right)} \times \\
&P(X = x)
\end{aligned}
\right\}}
{\sum_{k=1}^{B} \left\{
\begin{aligned}
&(P^*)^{\vec{A}^T \mathbf{D}_k \vec{L}_k}\,(P_0)^{\vec{A}^T\left(\vec{1} - \vec{S}_k + \mathbf{D}_k(\vec{1} - \vec{L}_k)\right)} \times \\
&(1 - P^*)^{(\vec{1} - \vec{A})^T \mathbf{D}_k \vec{L}_k}\,(1 - P_0)^{(\vec{1} - \vec{A})^T\left(\vec{1} - \vec{S}_k + \mathbf{D}_k(\vec{1} - \vec{L}_k)\right)} \times \\
&P(X = k)
\end{aligned}
\right\}}
\tag{3}
\]

For clarity, we use the vector $\vec{S}_x$ as in Model 1, and define a diagonal matrix containing the vector $\vec{S}_x$ on the main diagonal, $\mathbf{D}_x = \operatorname{diag}(\vec{S}_x)$, and a vector containing the fraction of bands between the p terminus and band $x$ (excluding band $x$) that are trisomied, $\vec{L}_x = \mathbf{S}\left(\frac{\vec{\delta}_{<x}}{x - 1}\right)$.

To understand the meaning of $\vec{L}_x$, consider the two subjects in Figure 1, and assume they are both affected. Subject 1 has trisomy of bands 1-6 and subject 2 has trisomy of bands 4-10. If we are looking at band $x = 5$, and we assume that the regulatory region is between the p terminus (left side) and 5, then subject S1 contains the regulatory region within the trisomied segment with probability 1 and therefore S1 contributes a 1 to the $N_{ATxE}$ count. Subject 2, however, contains trisomy of band 5, but only contains trisomy of one of the four bands that could contain the regulatory region. Since each of the four bands (1-4) has equal probability of containing the regulatory region, S2 has a 0.25 chance of also having trisomy of the regulatory region and being subject to $P^*$, and a 0.75 chance of not containing the regulatory region within its trisomy region, in which case he/she is subject to $P_0$. Therefore, S2 would contribute 0.25 to the $N_{ATxE}$

count and 0.75 to the $N_{ATxS}$ count. $\vec{L}_x$ contains the fraction of bands that potentially contain the regulatory region and are trisomied.

Figure 1. Partial trisomy of two hypothetical affected subjects (S1 and S2), shown on a chromosome with bands 1-10 running from the p terminus to the q terminus.

Model 3

Model 3 is similar to Model 2, except that we assume that the regulatory region exists somewhere between $X$ and the q terminus. The other assumptions of Model 2 remain: the regulatory region is not within $X$, and it is equally likely to be in any of the bands between $X$ and the q terminus. The equation for Model 3 is:

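\[
P(X = x \mid \text{data}, M3) =
\frac{\left\{
\begin{aligned}
&(P^*)^{\vec{A}^T \mathbf{D}_x \vec{G}_x}\,(P_0)^{\vec{A}^T\left(\vec{1} - \vec{S}_x + \mathbf{D}_x(\vec{1} - \vec{G}_x)\right)} \times \\
&(1 - P^*)^{(\vec{1} - \vec{A})^T \mathbf{D}_x \vec{G}_x}\,(1 - P_0)^{(\vec{1} - \vec{A})^T\left(\vec{1} - \vec{S}_x + \mathbf{D}_x(\vec{1} - \vec{G}_x)\right)} \times \\
&P(X = x)
\end{aligned}
\right\}}
{\sum_{k=1}^{B} \left\{
\begin{aligned}
&(P^*)^{\vec{A}^T \mathbf{D}_k \vec{G}_k}\,(P_0)^{\vec{A}^T\left(\vec{1} - \vec{S}_k + \mathbf{D}_k(\vec{1} - \vec{G}_k)\right)} \times \\
&(1 - P^*)^{(\vec{1} - \vec{A})^T \mathbf{D}_k \vec{G}_k}\,(1 - P_0)^{(\vec{1} - \vec{A})^T\left(\vec{1} - \vec{S}_k + \mathbf{D}_k(\vec{1} - \vec{G}_k)\right)} \times \\
&P(X = k)
\end{aligned}
\right\}}
\tag{4}
\]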

Where we replace $\vec{L}_x$ with $\vec{G}_x = \mathbf{S}\left(\frac{\vec{\delta}_{>x}}{B - x}\right)$, which contains the fraction of trisomied bands between band $x$ and the q terminus (besides the different denominators, note that in $\vec{L}_x$ we only look at bands less than $x$, but in $\vec{G}_x$ we only look at bands greater than $x$).
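
The two fraction vectors can be checked numerically against the hypothetical subjects of Figure 1 (S1 trisomic for bands 1-6, S2 for bands 4-10, with $B = 10$). The sketch below is plain Python; the function names `fraction_left` and `fraction_right` are assumptions introduced for illustration.

```python
def fraction_left(S, x):
    """L_x: per-subject fraction of bands 1..x-1 that are trisomied.

    x is a 1-based band number; undefined for x = 1 (no bands to the left).
    """
    return [sum(row[:x - 1]) / (x - 1) for row in S]


def fraction_right(S, x):
    """G_x: per-subject fraction of bands x+1..B that are trisomied.

    x is a 1-based band number; undefined for x = B (no bands to the right).
    """
    n_bands = len(S[0])
    return [sum(row[x:]) / (n_bands - x) for row in S]


# The two hypothetical subjects of Figure 1 on a 10-band chromosome.
s1 = [1] * 6 + [0] * 4   # trisomy of bands 1-6
s2 = [0] * 3 + [1] * 7   # trisomy of bands 4-10
S = [s1, s2]

print(fraction_left(S, 5))   # [1.0, 0.25]
print(fraction_right(S, 5))  # [0.2, 1.0]
```

The left-hand values reproduce the worked example in the text: S1 contributes 1 and S2 contributes 0.25 to the $N_{ATxE}$ count under Model 2 for band 5.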

Model 4:

In Model 4, we assume that there is a regulatory region, but we make no restriction on where it can be. We still assume that it is equally likely to be in any of the bands, except band $X$. The equation for this probability distribution is similar to Equations (3) and (4) except that we define a vector which contains the fraction of all bands outside of band $X$ that are trisomied: $\vec{O}_x = \mathbf{S}\left(\frac{\vec{1}_B - \vec{\delta}_x}{B - 1}\right)$. Using this definition and the same definitions of $\vec{S}_x$ and $\mathbf{D}_x$ from above, we can write the distribution for this model as:

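\[
P(X = x \mid \text{data}, M4) =
\frac{\left\{
\begin{aligned}
&(P^*)^{\vec{A}^T \mathbf{D}_x \vec{O}_x}\,(P_0)^{\vec{A}^T\left(\vec{1} - \vec{S}_x + \mathbf{D}_x(\vec{1} - \vec{O}_x)\right)} \times \\
&(1 - P^*)^{(\vec{1} - \vec{A})^T \mathbf{D}_x \vec{O}_x}\,(1 - P_0)^{(\vec{1} - \vec{A})^T\left(\vec{1} - \vec{S}_x + \mathbf{D}_x(\vec{1} - \vec{O}_x)\right)} \times \\
&P(X = x)
\end{aligned}
\right\}}
{\sum_{k=1}^{B} \left\{
\begin{aligned}
&(P^*)^{\vec{A}^T \mathbf{D}_k \vec{O}_k}\,(P_0)^{\vec{A}^T\left(\vec{1} - \vec{S}_k + \mathbf{D}_k(\vec{1} - \vec{O}_k)\right)} \times \\
&(1 - P^*)^{(\vec{1} - \vec{A})^T \mathbf{D}_k \vec{O}_k}\,(1 - P_0)^{(\vec{1} - \vec{A})^T\left(\vec{1} - \vec{S}_k + \mathbf{D}_k(\vec{1} - \vec{O}_k)\right)} \times \\
&P(X = k)
\end{aligned}
\right\}}
\tag{5}
\]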
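
Equations (3) to (5) share one structure and differ only in the fraction vector that weights each subject. A minimal Python sketch of that shared structure, applied to Model 4, is given here. It assumes a uniform prior, toy data, and hypothetical helper names (`log_likelihood`, `outside_fraction`, `model4_posterior`); it is an illustration of the equations, not the authors' implementation.

```python
import math


def log_likelihood(S, A, x, frac, p_star=0.8, p0=0.05):
    """Log of P(data | X = x, M) for Models 2-4 (x is a 0-based band index).

    frac[i] plays the role of the L_x, G_x, or O_x entry for subject i.
    """
    total = 0.0
    for row, affected, w in zip(S, A, frac):
        e = row[x] * w  # entry of D_x * frac: weight of "expressed in trisomy"
        if affected:
            total += e * math.log(p_star) + (1.0 - e) * math.log(p0)
        else:
            total += e * math.log(1.0 - p_star) + (1.0 - e) * math.log(1.0 - p0)
    return total


def outside_fraction(S, x):
    """O_x: per-subject fraction of the B - 1 bands other than x that are trisomied."""
    n_bands = len(S[0])
    return [(sum(row) - row[x]) / (n_bands - 1) for row in S]


def model4_posterior(S, A, p_star=0.8, p0=0.05):
    """Posterior over bands and Bayesian Score under Model 4 with a uniform prior."""
    n_bands = len(S[0])
    joint = [
        math.exp(log_likelihood(S, A, x, outside_fraction(S, x), p_star, p0)) / n_bands
        for x in range(n_bands)
    ]
    score = sum(joint)
    return [j / score for j in joint], score


# Toy data (hypothetical): 3 subjects, 3 bands.
S = [[1, 1, 0], [1, 0, 0], [0, 1, 1]]
A = [1, 0, 1]
posterior, score = model4_posterior(S, A)
print(posterior, score)
```

Passing a left-hand or right-hand fraction vector to `log_likelihood` instead of `outside_fraction` would give the Model 2 or Model 3 likelihood, and a vector of ones recovers Model 1.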

In Models 2-4, we assumed that the regulatory region does not exist within the same band as

the gene of interest. For completeness, we also consider models where we include band X as

a possible location of the regulatory region. We name these models M2I-M4I, since they are

inclusive of band X . The equations are:

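Model 2I

We define $\vec{L}_{xI}$ to also include band $x$: $\vec{L}_{xI} = \mathbf{S}\left(\frac{\vec{\delta}_{\leq x}}{x}\right)$:

\[
P(X = x \mid \text{data}, M2I) =
\frac{\left\{
\begin{aligned}
&(P^*)^{\vec{A}^T \mathbf{D}_x \vec{L}_{xI}}\,(P_0)^{\vec{A}^T\left(\vec{1} - \vec{S}_x + \mathbf{D}_x(\vec{1} - \vec{L}_{xI})\right)} \times \\
&(1 - P^*)^{(\vec{1} - \vec{A})^T \mathbf{D}_x \vec{L}_{xI}}\,(1 - P_0)^{(\vec{1} - \vec{A})^T\left(\vec{1} - \vec{S}_x + \mathbf{D}_x(\vec{1} - \vec{L}_{xI})\right)} \times \\
&P(X = x)
\end{aligned}
\right\}}
{\sum_{k=1}^{B} \left\{
\begin{aligned}
&(P^*)^{\vec{A}^T \mathbf{D}_k \vec{L}_{kI}}\,(P_0)^{\vec{A}^T\left(\vec{1} - \vec{S}_k + \mathbf{D}_k(\vec{1} - \vec{L}_{kI})\right)} \times \\
&(1 - P^*)^{(\vec{1} - \vec{A})^T \mathbf{D}_k \vec{L}_{kI}}\,(1 - P_0)^{(\vec{1} - \vec{A})^T\left(\vec{1} - \vec{S}_k + \mathbf{D}_k(\vec{1} - \vec{L}_{kI})\right)} \times \\
&P(X = k)
\end{aligned}
\right\}}
\tag{6}
\]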

Model 3I

We define $\vec{G}_{xI}$ to also include band $x$: $\vec{G}_{xI} = \mathbf{S}\left(\frac{\vec{\delta}_{\geq x}}{B - x + 1}\right)$:

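\[
P(X = x \mid \text{data}, M3I) =
\frac{\left\{
\begin{aligned}
&(P^*)^{\vec{A}^T \mathbf{D}_x \vec{G}_{xI}}\,(P_0)^{\vec{A}^T\left(\vec{1} - \vec{S}_x + \mathbf{D}_x(\vec{1} - \vec{G}_{xI})\right)} \times \\
&(1 - P^*)^{(\vec{1} - \vec{A})^T \mathbf{D}_x \vec{G}_{xI}}\,(1 - P_0)^{(\vec{1} - \vec{A})^T\left(\vec{1} - \vec{S}_x + \mathbf{D}_x(\vec{1} - \vec{G}_{xI})\right)} \times \\
&P(X = x)
\end{aligned}
\right\}}
{\sum_{k=1}^{B} \left\{
\begin{aligned}
&(P^*)^{\vec{A}^T \mathbf{D}_k \vec{G}_{kI}}\,(P_0)^{\vec{A}^T\left(\vec{1} - \vec{S}_k + \mathbf{D}_k(\vec{1} - \vec{G}_{kI})\right)} \times \\
&(1 - P^*)^{(\vec{1} - \vec{A})^T \mathbf{D}_k \vec{G}_{kI}}\,(1 - P_0)^{(\vec{1} - \vec{A})^T\left(\vec{1} - \vec{S}_k + \mathbf{D}_k(\vec{1} - \vec{G}_{kI})\right)} \times \\
&P(X = k)
\end{aligned}
\right\}}
\tag{7}
\]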

Model 4I

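We define $\vec{O}_{xI}$ to also include band $x$: $\vec{O}_{xI} = \mathbf{S}\left(\frac{\vec{1}_B}{B}\right)$:

\[
P(X = x \mid \text{data}, M4I) =
\frac{\left\{
\begin{aligned}
&(P^*)^{\vec{A}^T \mathbf{D}_x \vec{O}_{xI}}\,(P_0)^{\vec{A}^T\left(\vec{1} - \vec{S}_x + \mathbf{D}_x(\vec{1} - \vec{O}_{xI})\right)} \times \\
&(1 - P^*)^{(\vec{1} - \vec{A})^T \mathbf{D}_x \vec{O}_{xI}}\,(1 - P_0)^{(\vec{1} - \vec{A})^T\left(\vec{1} - \vec{S}_x + \mathbf{D}_x(\vec{1} - \vec{O}_{xI})\right)} \times \\
&P(X = x)
\end{aligned}
\right\}}
{\sum_{k=1}^{B} \left\{
\begin{aligned}
&(P^*)^{\vec{A}^T \mathbf{D}_k \vec{O}_{kI}}\,(P_0)^{\vec{A}^T\left(\vec{1} - \vec{S}_k + \mathbf{D}_k(\vec{1} - \vec{O}_{kI})\right)} \times \\
&(1 - P^*)^{(\vec{1} - \vec{A})^T \mathbf{D}_k \vec{O}_{kI}}\,(1 - P_0)^{(\vec{1} - \vec{A})^T\left(\vec{1} - \vec{S}_k + \mathbf{D}_k(\vec{1} - \vec{O}_{kI})\right)} \times \\
&P(X = k)
\end{aligned}
\right\}}
\tag{8}
\]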

Results

Figure 2 contains the BS for all 7 models. Judging from the values, Model 2 is the most

appropriate.

Figure 2. Model Bayesian Scores (x-axis: Model, in the order M1, M2, M3, M4, M2I, M3I, M4I; y-axis: Bayesian Score, × 10⁻¹²). Model M2 has the highest BS, followed closely by models M1 and M2I. Although it cannot be distinguished from the figure, the order of BS for the remaining models is M4I, M4, M3I, M3 (from highest to lowest).

We show the probability distributions according to all 7 models in Figure 3. It is interesting to note that Models 1, 4, and 4I all suggest 13q12 as the band most likely to contain the gene related to elevation of HbF. However, the model that best explains the data (Model 2) peaks at 13q14. Since Model 2 is the best model according to Figure 2, we conclude that 13q14 contains the gene causing elevation of HbF.

The next logical question concerns our choice of $P_*$ and $P_0$: how sensitive are the BS and the most likely band to the choice of these values? To answer this question, we computed the probability distributions and BSs for each model for a set of $P_*$ and $P_0$ values, maintaining only the condition that $P_* > P_0$. Figure 4A shows that the actual values of $P_*$ and $P_0$ affect which is the most likely model. There are four regions: Model 1 is most appropriate for low values of $P_*$ and relatively low values of $P_0$, Model 3 is most appropriate for high values of $P_*$ and high values of $P_0$, while Model 2 is most appropriate for the majority of ($P_*$, $P_0$) values, with a subsection containing smaller values of $P_*$ owned by Model 2I. Figure 4B shows that the BSs corresponding to the most likely models from panel A are actually fairly well contained in the ($P_*$, $P_0$) space, being highest around $P_0 = 0.12$ and $P_* = 0.75$. Figure 4C shows that if we choose the most appropriate model for each ($P_*$, $P_0$) combination, and then choose the band with the highest posterior probability, band 13q14 dominates the majority of the parameter space. Smaller sections of the parameter space favor bands 13q11 - 13q13 and the terminal bands pter and qter. Figure 4D shows the actual posterior probabilities of the corresponding bands in panel C.


[Figure 3 plot: "Probability distributions of band responsible for persistent fetal hemoglobin"; four panels showing P(X=x|data,M1), P(X=x|data,M2 or M2I), P(X=x|data,M3 or M3I), and P(X=x|data,M4 or M4I); x-axis: Band (pter, p12, p11.2, p11.1, q11, q12, q13, q14, q21, q22, q31, q32, q33, qter); y-axis: 0 to 1.]

Figure 3 Posterior probability distributions according to all 7 models. The majority of models favor band 13q12; however, Model 2, which is the most likely according to the BS criterion, points to band 13q14.


[Figure 4 plot: four panels, each drawn over the P0 and P* axes. A: "Model with the highest BS" (legend: M1, M2, M3, M4, M2I, M3I, M4I). B: "BS of Most Appropriate Model" (scale 0 to 3.5 ×10^-11). C: "Most Likely Band from Most Appropriate Model" (legend: pter, p12, p11.2, p11.1, q11, q12, q13, q14, q21, q22, q31, q32, q33, qter). D: "Probability of Most Likely Band from Most Appropriate Model" (scale 0 to 1).]

Figure 4 The most likely model for a particular $P_*$, $P_0$ is shown in panel A. Panel B shows the actual BS of the corresponding models in A. Panel C shows the most likely band from the distributions corresponding to the most likely models in A, and panel D shows the actual probability of the corresponding band in panel C.

To summarize the $P_*$, $P_0$ perturbation study detailed in Figure 4, we compute the average over all values of $P_*$ and $P_0$ of the probability that each band is the highest-probability band according to the most likely model. In other words: for each value of $P_*$ and $P_0$, compute the BS of each model, select the model with the highest BS ($M^*$), use $P(X = x \mid \text{data}, M^*)$ to select the band with the highest probability ($x^*$), and record the value $P(X = x^*, \text{data} \mid M^*)$, which captures both the likelihood of the particular band containing the gene of interest and the appropriateness of the model in explaining all the data. Then, for each band $b$, compute the average of $P(X = x^*, \text{data} \mid M^*)$ over all the $P_*$ and $P_0$ for which $x^* = b$. This result is shown in Figure 5, where we clearly see that regardless of the choice of $P_*$ and $P_0$, if we pick the most appropriate model for the situation, band 13q14 most likely contains the gene responsible for elevation of HbF in trisomy 13.
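The averaging procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each model is represented by a hypothetical callable returning a pair (BS, posterior over bands), and the recorded joint value is taken to be the selected posterior probability multiplied by the BS, which assumes the BS plays the role of $P(\text{data} \mid M)$. The function name `summarize_perturbation` and the toy models are invented for the example.

```python
from collections import defaultdict

def summarize_perturbation(grid, models, bands):
    """Average the recorded joint value per winning band over a (P*, P0) grid.

    grid:   iterable of (p_star, p0) pairs.
    models: dict mapping a model name to a function (p_star, p0) -> (bs, posterior),
            where posterior is a list aligned with `bands` (hypothetical interface).
    bands:  list of band labels.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for p_star, p0 in grid:
        if not p_star > p0:
            continue  # keep only combinations satisfying P* > P0
        # Select the model with the highest BS at this grid point (M*).
        _, (bs, post) = max(
            ((name, fn(p_star, p0)) for name, fn in models.items()),
            key=lambda item: item[1][0],
        )
        # Select the band with the highest posterior probability under M* (x*).
        k = max(range(len(bands)), key=lambda i: post[i])
        # Assumption: joint value = posterior probability of x* times the BS of M*.
        totals[bands[k]] += post[k] * bs
        counts[bands[k]] += 1
    # Average only over the grid points for which x* equals each band.
    return {band: totals[band] / counts[band] for band in totals}

# Hypothetical toy models with constant outputs, for illustration only.
toy_models = {
    "M1": lambda p_star, p0: (1.0, [0.2, 0.8]),
    "M2": lambda p_star, p0: (2.0, [0.9, 0.1]),
}
summary = summarize_perturbation([(0.5, 0.1), (0.1, 0.5)], toy_models, ["q13", "q14"])
```

Bands that never win at any grid point are simply absent from the returned dictionary; a plotting step would treat them as zero.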

[Figure 5 plot: "Average Probability of each band over all P* and P0"; x-axis: Band (pter, p12, p11.2, p11.1, q11, q12, q13, q14, q21, q22, q31, q32, q33, qter); y-axis: E_{P*,P0}(P(X=x*,data|M*) I_{x*=b}), 0 to 1.]

Figure 5 Probability that a band contains the gene responsible for elevated HbF according to the most appropriate model (M*), averaged over all values of $P_*$ and $P_0$.

Conclusion

We constructed seven models to calculate the probability that a particular band of chromosome 13 houses the gene responsible for elevation of HbF when the gene is expressed in trisomy. The likelihood of each model was evaluated using a data set containing 57 partial trisomy cases with documented levels of HbF. The most likely model assumes that a single band contains the gene of interest, and that a regulatory region for that gene exists between the p terminus and the gene itself. According to this model, the gene implicated in elevation of HbF is located on band 13q14. This conclusion is not sensitive to the particular values of $P_*$ (probability of elevated HbF if the responsible gene is expressed in trisomy) and $P_0$ (probability of elevated HbF if the responsible gene is not expressed in trisomy) assumed in the model.


Mathematical Glossary

Here we present a quick reference for all the symbols used in the text above. All vectors are column vectors; their transposes (superscript $T$) are shown in the definitions below.

$\mathbf{S}$: Subject trisomy matrix, one subject on each row, $N$ subjects, $B$ chromosome bands, 1 means trisomy of a particular band:

\[
\mathbf{S} =
\begin{array}{l|cccc}
\text{Subject 1} & 1 & 1 & \cdots & 0 \\
\text{Subject 2} & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\text{Subject } N & 0 & 1 & \cdots & 1
\end{array}
\qquad (N \text{ rows}, \; B \text{ columns})
\]

$\vec{A}$: Affected vector, each subject is either affected (1) or unaffected (0):

\[
\vec{A}^T = [\, 1 \;\; 0 \;\; \cdots \;\; 1 \,] \qquad (\text{length } N)
\]

$\vec{1}$: Vector of ones, same size as $\vec{A}$:

\[
\vec{1}^T = [\, 1 \;\; 1 \;\; \cdots \;\; 1 \,] \qquad (\text{length } N)
\]

$\vec{1}_B$: Vector of ones, length $B$:

\[
\vec{1}_B^T = [\, 1 \;\; 1 \;\; \cdots \;\; 1 \,] \qquad (\text{length } B)
\]

$\vec{S}_x$: The $x$th column of $\mathbf{S}$:

\[
\vec{S}_x = \mathbf{S} \vec{\delta}_x
\]

$\mathbf{D}_x$: Diagonal matrix with the vector $\vec{S}_x$ on the main diagonal:

\[
\mathbf{D}_x = \mathrm{diag}(\vec{S}_x)
\]

$\vec{L}_x$, $\vec{L}_{xI}$: Vector containing the fraction of bands between the p terminus and band $x$ that are trisomied:

\[
\text{excluding } x: \;\; \vec{L}_x = \frac{\mathbf{S} \vec{\delta}_{<x}}{x - 1}
\qquad
\text{including } x: \;\; \vec{L}_{xI} = \frac{\mathbf{S} \vec{\delta}_{\le x}}{x}
\]

$\vec{G}_x$, $\vec{G}_{xI}$: Vector containing the fraction of bands between the band $x$ and the q terminus that are trisomied:

\[
\text{excluding } x: \;\; \vec{G}_x = \frac{\mathbf{S} \vec{\delta}_{>x}}{B - x}
\qquad
\text{including } x: \;\; \vec{G}_{xI} = \frac{\mathbf{S} \vec{\delta}_{\ge x}}{B - x + 1}
\]

$\vec{\delta}_x$: Vector of zeros with a 1 in the position corresponding to $x$; length $B$:

\[
\vec{\delta}_x^T = [\, 0 \;\; \cdots \;\; 0 \;\; 1 \;\; 0 \;\; \cdots \;\; 0 \,] \qquad (\text{length } B, \; 1 \text{ in position } x)
\]

$\vec{\delta}_{<x}$, $\vec{\delta}_{>x}$, etc. are vectors like $\vec{\delta}_x$ with ones where the subscript is true; for example, $\vec{\delta}_{<x}$ has ones for all bands less than $x$.

$\vec{O}_x$, $\vec{O}_{xI}$: Vector containing the fraction of bands outside band $x$ that are trisomied:

\[
\text{excluding } x: \;\; \vec{O}_x = \frac{\mathbf{S} (\vec{1}_B - \vec{\delta}_x)}{B - 1}
\qquad
\text{including } x: \;\; \vec{O}_{xI} = \frac{\mathbf{S} \vec{1}_B}{B}
\]
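To make the fraction vectors concrete, here is a small Python sketch that computes them for one band from a trisomy matrix given as nested lists. The function name `band_fractions`, the 1-based band index, and the choice to return `None` where a denominator would be zero (for example the excluding-x p-side fraction at the first band) are assumptions made for this illustration; the glossary does not say how those edge cases are handled. The sketch also assumes at least two bands.

```python
def band_fractions(S, x):
    """Per-subject fractions of trisomic bands around band x (1-based index).

    S: list of N rows, each a list of B 0/1 flags (1 = band is trisomic).
    Returns a dict keyed by the glossary names; each value has one entry per subject.
    """
    n_bands = len(S[0])

    def frac(row, lo, hi):
        # Fraction of trisomic bands among bands lo..hi (1-based, inclusive).
        width = hi - lo + 1
        if width <= 0:
            return None  # assumption: undefined when the range is empty
        return sum(row[lo - 1:hi]) / width

    return {
        "L_x": [frac(row, 1, x - 1) for row in S],        # p terminus to x, excluding x
        "L_xI": [frac(row, 1, x) for row in S],           # p terminus to x, including x
        "G_x": [frac(row, x + 1, n_bands) for row in S],  # x to q terminus, excluding x
        "G_xI": [frac(row, x, n_bands) for row in S],     # x to q terminus, including x
        "O_x": [(sum(row) - row[x - 1]) / (n_bands - 1) for row in S],  # all bands except x
        "O_xI": [sum(row) / n_bands for row in S],        # all bands
    }

# Three subjects and three bands, using the rows shown in the matrix illustration.
S_example = [[1, 1, 0], [1, 0, 0], [0, 1, 1]]
fractions = band_fractions(S_example, x=2)
first_band = band_fractions(S_example, x=1)
```

For the middle band of this example, the first subject has its only p-side band trisomic (fraction 1) and its only q-side band disomic (fraction 0).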


Table S1. Chromosomal Band 13q14 Gene Information and Relative Erythroid Expression.

Columns (first line of each entry): Name, RefSeq, EntrezGene, CD71+ Erythroid, Known Position. The second line of each entry gives the Description.

EBPL NM_032565 84650 -3.013 chr13:49132863-49163612

emopamil binding related protein, delta8-delta7

DGKH NM_178009 160851 -2.69 chr13:41520889-41701888

diacylglycerol kinase, eta isoform 2

FNDC3A NM_001079673 22862 -2.319 chr13:48448049-48681916

fibronectin type III domain containing 3A

NUDT15 NM_018283 55270 -2.29 chr13:47509704-47519283

nudix-type motif 15

C13orf1 NM_020456 57213 -2.274 chr13:49384843-49408626

chromosome 13 open reading frame 1 isoform 1

SUGT1L1 NR_003365 283507 -2.256 chr13:40384027-40393886

Homo sapiens full length insert cDNA clone YY75B04.

LRCH1 NM_015116 23143 -2.08 chr13:46025304-46215731

leucine-rich repeats and calponin homology (CH)

THSD1 NM_018676 55901 -1.868 chr13:51849304-51878321

thrombospondin type I domain-containing 1

HTR2A NM_000621 3356 -1.778 chr13:46305514-46368370

5-hydroxytryptamine (serotonin) receptor 2A

CYSLTR2 NM_020377 57105 -1.75 chr13:48178954-48181499

cysteinyl leukotriene receptor 2

KCNRG NM_199464 283518 -1.673 chr13:49487391-49493059

potassium channel regulator isoform 1

MGC5590 n/a n/a -1.656 chr13:43615287-43633393

Homo sapiens hypothetical protein MGC5590, mRNA (cDNA clone MGC:5590 IMAGE:3460233), complete cds.

AKAP11 NM_016248 11215 -1.539 chr13:41744289-41795402

A-kinase anchor protein 11

BC025370 n/a n/a -1.536 chr13:43872388-43878021

Homo sapiens hypothetical gene supported by BC025370, mRNA (cDNA clone IMAGE:3945331).

ENOX1 NM_001127615 55068 -1.477 chr13:42685666-43101613

ecto-NOX disulfide-thiol exchanger 1

TSC22D1 NM_183422 8848 -1.437 chr13:43905655-44048701

TSC22 domain family, member 1 isoform 1

SUCLA2 NM_003850 8803 -1.427 chr13:47414792-47473463

succinate-CoA ligase, ADP-forming, beta subunit

SERPINE3 NM_001101320 647174 -1.302 chr13:50813169-50834240

nexin-related serine protease inhibitor

TNFSF11 NM_003701 8600 -1.134 chr13:42046298-42080148

tumor necrosis factor ligand superfamily, member

KIAA0564 NM_015058 23078 -1.067 chr13:41038961-41433221

hypothetical protein LOC23078 isoform a

C13orf18 NM_025113 80183 -1.013 chr13:45814138-45859636

hypothetical protein LOC80183

P2RY5 NM_005767 10161 -1.003 chr13:47883183-47887947

G-protein coupled purinergic receptor P2Y5

KCTD4 NM_198404 386618 -0.948 chr13:44664988-44673175

potassium channel tetramerisation domain

CDADC1 NM_030911 81602 -0.907 chr13:48720104-48765619

cytidine and dCMP deaminase domain containing 1

PHF11 NM_001040443 51131 -0.875 chr13:48967802-49001118

PHD finger protein 11 isoform a

TRIM13 NM_001007278 10206 -0.834 chr13:49469144-49490604

ret finger protein 2 isoform 2

SERP2 NM_001010897 387923 -0.743 chr13:43845978-43869850

stress-associated endoplasmic reticulum protein

FOXO1 NM_002015 2308 -0.701 chr13:40027801-40138734

forkhead box O1

CAB39L NM_030925 81617 -0.556 chr13:48780787-48873736

calcium binding protein 39-like

CCDC70 NM_031290 83446 -0.528 chr13:51334118-51338373

coiled-coil domain containing 70 precursor

C13orf31 NM_001128303 144811 -0.5 chr13:43351420-43366068

hypothetical protein LOC144811

SLC25A30 NM_001010875 253512 -0.464 chr13:44868044-44890515

solute carrier family 25, member 30

WDFY2 NM_052950 115825 -0.392 chr13:51056485-51234172

WD repeat and FYVE domain containing 2

ITM2B NM_021999 9445 -0.344 chr13:47705309-47734228

integral membrane protein 2B

LECT1 NM_007015 11061 -0.217 chr13:52175400-52211948

leukocyte cell derived chemotaxin 1 isoform 1

LCP1 NM_002298 3936 -0.188 chr13:45598059-45654395

L-plastin

GTF2F2 NM_004128 2963 -0.181 chr13:44592672-44756237

general transcription factor IIF, polypeptide 2,

FAM124A NM_145019 220108 -0.109 chr13:50694508-50753617

hypothetical protein LOC220108

GUCY1B2 NR_003923 2974 -0.07 chr13:50487553-50538143

Guanylate cyclase 1, soluble, beta 2.

MLNR NM_001507 2862 -0.042 chr13:48692475-48694514

motilin receptor

NEK3 NM_152720 4752 -0.035 chr13:51604780-51631511

NIMA-related kinase 3

RCBTB2 NM_001268 1102 -0.017 chr13:47961100-48005317

regulator of chromosome condensation and BTB

TPT1 NM_003295 7178 0.043 chr13:44809304-44813297

tumor protein, translationally-controlled 1

DKFZp434F1622 NR_002816 374500 0.096 chr13:51639848-51763681

Homo sapiens cDNA FLJ14630 fis, clone NT2RP2000459.

COG3 NM_031431 83548 0.247 chr13:44937072-45008761

component of golgi transport complex 3

DHRS12 NM_001031719 79758 0.26 chr13:51242921-51276294

dehydrogenase/reductase (SDR family) member 12

ATP7B NM_000053 540 0.263 chr13:51404806-51483631

ATPase, Cu++ transporting, beta polypeptide

KIAA1704 NM_018559 55425 0.283 chr13:44461687-44500404

hypothetical protein LOC55425

RCBTB1 NM_018191 55213 0.451 chr13:49004083-49057720

regulator of chromosome condensation (RCC1) and

VPS36 NM_016075 51028 0.46 chr13:51884738-51922764

vacuolar protein sorting 36

EPSTI1 NM_001002264 94240 0.477 chr13:42360122-42464377

epithelial stromal interaction 1 isoform 1

ALG11 NM_021645 9724 0.478 chr13:51484551-51504074

asparagine-linked glycosylation 11,

UTP14C NM_021645 9724 0.478 chr13:51500942-51503242

UTP14, U3 small nucleolar ribonucleoprotein,

CPB2 NM_001872 1361 0.492 chr13:45525323-45577212

plasma carboxypeptidase B2 isoform a

ESD NM_001984 2098 0.553 chr13:46243392-46269368

esterase D/formylglutathione hydrolase

RNASEH2B NM_024570 79621 0.656 chr13:50381893-50428902

ribonuclease H2, subunit B isoform 1

C13orf30 NM_182508 144809 0.659 chr13:42253751-42263683

hypothetical protein LOC144809

ELF1 NM_172373 1997 0.714 chr13:40404164-40454418

E74-like factor 1 (ets domain transcription

INTS6 NM_012141 26512 0.748 chr13:50833702-50925276

integrator complex subunit 6 isoform a

MED4 NM_014166 29079 0.895 chr13:47548093-47567241

mediator complex subunit 4

NARG1L NM_024561 79612 0.94 chr13:40783341-40849166

NMDA receptor regulated 1-like protein isoform

NUFIP1 NM_012345 26747 0.943 chr13:44411384-44461613

nuclear fragile X mental retardation protein


KPNA3 NM_002267 3839 1.003 chr13:49171462-49265058

karyopherin alpha 3

ZC3H13 NM_015070 23091 1.015 chr13:45434315-45524895

zinc finger CCCH-type containing 13

DNAJC15 NM_013238 29103 1.057 chr13:42495362-42581306

DNAJ domain-containing

RB1 NM_000321 5925 1.231 chr13:47775884-47954027

retinoblastoma 1

WBP4 NM_007187 11193 1.762 chr13:40533697-40556139

WW domain-containing binding protein 4

MTRF1 NM_004294 9617 1.768 chr13:40688516-40735713

mitochondrial translational release factor 1

SIAH3 NM_198849 283514 1.848 chr13:45252417-45323847

seven in absentia homolog 3

SLC25A15 NM_014252 10166 1.89 chr13:40261597-40282246

solute carrier family 25 (mitochondrial carrier;

MRPS31 NM_005830 10240 1.917 chr13:40201432-40243347

mitochondrial ribosomal protein S31

DLEU2 NR_002612 8847 2.142 chr13:49454689-49597678

Homo sapiens BCMS-upstream neighbor (BCMSUN) mRNA, partial sequence.

ARL11 NM_138450 115761 2.207 chr13:49100625-49105732

ADP-ribosylation factor-like 11

SUGT1 NM_006704 10910 2.809 chr13:52125002-52160434

suppressor of G2 allele of SKP1 isoform b

CKAP2 NM_001098525 26586 3.003 chr13:51927496-51948764

cytoskeleton associated protein 2 isoform 2

C13orf15 NM_014059 28984 3.753 chr13:40929542-40943013

response gene to complement 32


Table S2. Chromosomal Band 13q14 Promoters with Significant H3K4me3 Peaks in K562 Cells.

Gene Name

AK056182
AK095119
AK124928
ALG11
ATP7B
BC039553
C13orf1
CAB39L
CDADC1
CKAP2
CLLD8
CR625002
DHRS12
DKFZp434B105
DKFZp434F1622
DKFZp434H1720
DLEU2
EBPL
EBRP
GTF2F2
HNRNPA1L2
INTS6
ITM2B
KBTBD6
KIAA0564
KIAA1704
KPNA3
MRPS31
MTRF1
NARG1L
NUDT15
NUFIP1
pp13759
RB1
SETDB2
SLC25A15
SUGT1
TPT1
UTP14C
WBP4
WDF2
WDFY2
XTP6
ZC3H13


Table S3. Chromosomal Band 13q14 Genes with Elevated Expression in CD71+ Erythroid Progenitors and Significant H3K4me3 Promoter Peaks in K562 Cells.

Gene Name

CKAP2

DLEU2

MRPS31

MTRF1

RB1

SLC25A15

SUGT1

WBP4

ZC3H13


Table S4. Pathological Analysis of Hematopoiesis in Trisomy 13 Cases.

Each entry lists the Identification Number; the Age, Gender, and Phenotypic Information; and the Notes from Pathological Assessment.

TR13-1: 3 month, female, Trisomy 13
‐ 90% cellular BM
‐ M:E = 1:1-2
‐ Full erythroid maturation
‐ 6 megas per 40x hpf, increased
‐ Mild EMH in spleen

TR13-2: 5 day, male, Trisomy 13
‐ >90% cellular BM
‐ M:E = 1-2:1
‐ Full erythroid maturation
‐ 2-3 megas per 40 x hpf
‐ Mild EMH in liver

TR13-3: 1 day, male, Trisomy 13
‐ 100% cellular BM
‐ M:E = 1-2:1
‐ Full erythroid maturation
‐ 1-2 megas per 40 x hpf
‐ Mild EMH in liver

TR13-4: 3 day, female, Trisomy D (unclear if karyotype done, phenotype compatible)
‐ 100% cellular BM
‐ M:E = 10:1
‐ Full erythroid maturation
‐ 2-3 megas per 40 x hpf
‐ Mild EMH in liver

TR13-5: 1 day, female, Trisomy 13
‐ 100% cellular BM
‐ M:E = 10:1
‐ Rare erythroids, but seem mature
‐ Overall 1-2 megas per 40 x hpf, focal clusters present
‐ Mild EMH in liver

TR13-6: 1 hr 18 min, male, Trisomy 13
‐ 100% cellular BM
‐ M:E = 3-5:1
‐ Full erythroid maturation
‐ 2-3 megas per 40 x hpf
‐ Marked EMH in liver

TR13-7: 5 hr, male, Trisomy 13
‐ 100% cellular BM
‐ M:E = 10:1
‐ Only rare erythroids, maturation difficult to assess
‐ 1-2 megas per 40 x hpf
‐ Moderate EMH in liver

TR13-8: 1 day, female, Trisomy 13 mosaic
‐ 100% cellular BM
‐ M:E = 1:3
‐ Full erythroid maturation
‐ 2-3 megas per 40 x hpf
‐ Prominent EMH in liver

TR13-9: 4 day old, male, Trisomy 13 [46,XY,-D,t(DqDq)+]
‐ 100% cellular BM
‐ 5:1 M:E ratio
‐ Slight left shift of myeloid elements
‐ Full erythroid maturation
‐ Abnormal megakaryocyte nuclei are noted (hyperchromatic nuclei with “Staghorn” appearance), megas with patchy distribution t/o marrow, foci of increased megakaryocytes
‐ Mild EMH in liver

TR13-10: 7 day old female, Trisomy 13 Mosaic (present in 36% of cells)
‐ 100% cellular BM
‐ 1:1 to 1:2 M:E ratio
‐ Full erythroid maturation, but with increased immature forms (SEE PHOTO)
‐ Overall 2-3 megas per hpf, patchy distribution
‐ Megakaryocytes with abnormal nuclei (Small hyperchromatic nuclei and “Staghorn” appearance, SEE PHOTO)
‐ No extramedullary hematopoiesis noted in liver
‐ Spleen with mild hematopoiesis

TR13-11: 11 day old female, Trisomy 13
‐ 100% cellular BM
‐ > 5:1 M:E ratio
‐ Some left-shift of myeloid elements
‐ Full erythroid maturation
‐ 5-10 megas per 40x hpf (increased), abnormal nuclei (small hyperchromatic – SEE PHOTO)
‐ No extramedullary hematopoiesis noted in liver
‐ Mild hematopoiesis in spleen

TR13-12: 6 hour old male, Trisomy D1/13 (no genetic studies reported, but phenotypically compatible)
‐ Scant marrow space, appears 100% cellular
‐ 1:1 M:E ratio (Note: large number of hematogones excluded in this estimate)
‐ Full erythroid maturation
‐ Scattered megas seen (cytologically unremarkable, ? appropriate in number)
‐ Marked extramedullary hematopoiesis in liver with a 1:1 M:E ratio present in the liver

TR13-13: 5 day old female, Trisomy 13, ABO incompatibility with jaundice
‐ 100% cellular BM
‐ 2:1 M:E ratio -> varies to 1:1 in certain regions
‐ Full maturation of erythroid elements
‐ Megakaryocytes decreased in number, 0-1 per 40x hpf
‐ ? Liver (not examined) -> small foci of extramedullary hematopoiesis noted on pathology report

TR13-14: 1 year old male [Trisomy 13 never confirmed – these were pre-mortem studies]
‐ 100% cellular BM (slightly hypercellular for age)
‐ M:E ratio 5-6:1
‐ Myeloid cells showing some left shift
‐ Full erythroid maturation noted
‐ On avg 3-4 megakaryocytes per 40x hpf, with small and condensed hyperchromatic nuclei (some with staghorn appearance)
‐ No extramedullary hematopoiesis was noted

TR13-15: 9 month old female, Trisomy D1 (Trisomy 13)
‐ >95% cellular
‐ M:E ratio 1-3:1
‐ Slight left shift in myeloid cells
‐ Full maturation of erythroid elements
‐ Megakaryocytes show small hyperchromatic nuclei with “Staghorn” appearance
‐ Slight increase in megakaryocyte numbers noted per high power field (~5-7 megakaryocytes seen in most high power fields examined)
‐ Liver not available for review

TR13-16: 23 day old female, Partial Trisomy D1. Mother gravida 8: one prior stillborn, four first trimester abortions, two normal kids (one male & one female). Similarities and differences with classic phenotype of trisomy 13
‐ Appropriate cellularity for age (>95% cellular)
‐ Increased hematogones
‐ M:E ratio is 5:1
‐ Slight left shift in myeloid cells
‐ Full erythroid maturation visualized
‐ Megakaryocytes show small, “Staghorn” appearing, hyperchromatic nuclei
‐ Slight increase in megakaryocyte numbers, 5-10 per high power field is noted
‐ Small minor clusters of hematopoiesis in the liver are noted (with both erythroid and myeloid elements seen)

TR13-17: 25 day old female, Trisomy D1 (Trisomy 13)
‐ Appropriate cellularity for age (>95% cellular)
‐ M:E ratio is 3-5:1 (morphology is poor)
‐ Left shift in myeloid cells is observed
‐ Full erythroid maturation is seen
‐ Megakaryocytes show small hyperchromatic “Staghorn” nuclei
‐ ~2 megas per high power field, but given limited cellularity, appears increased
‐ No observable hematopoiesis in the liver (multiple sections assessed)


Supplementary Figure Legends

Fig. S1. Schematic of partial trisomy 13 cases with elevated and normal fetal hemoglobin (HbF)

levels. The figure on the left shows the 14 cases with elevated levels of HbF, from which the

proportion of each chromosomal region involved in cases with elevated HbF is derived in Figure

1B. The figure on the right shows the 43 cases with normal HbF levels, from which the

proportion of each chromosomal region involved in cases with normal HbF is derived in Figure

1B. Each case is shown in a vertical column and chromosomal regions included in each case

are shown in red.

Fig. S2. Relative expression in CD71+ erythroid precursors is shown for genes in the

chromosomal band 13q14 region. Relative expression is shown as a log ratio compared to a

panel of 78 other tissues (12). Genes that are known to play a role in erythropoiesis and globin

gene regulation consistently show a relative expression > 1 using this approach (including

BCL11A, GATA1, KLF1, and SOX6). All of these genes and their relative expression in CD71+

erythroid progenitors are shown in Table S1.

Fig. S3. Relative miR-15a (top, blue) and miR-16 (bottom, pink) expression at day 5 of

erythroid differentiation in primary human CD34+ derived cells transduced with pLVX-puro or

pLVX-miR-15a-16-1 lentiviruses. Quantification was performed using the ∆∆Ct method using

RNU19 as a control.

Fig. S4. Relative miR-15a (top, blue) and miR-16 (bottom, pink) expression in K562 cells

transduced with pSMPUW or pSMPUW-miR-15a-16-1 lentiviruses. Quantification was

performed using the ∆∆Ct method using RNU19 as a control.


Fig. S5. CD36 and CD45 staining of pLVX control (black) and pLVX-miR-15a-16-1 (cyan)

transduced erythroid progenitors on day 5 of differentiation show similar levels of staining. The

FACS plots shown are representative of three independent samples. The extent of staining is

similar overall with no major differences for mean fluorescence intensity of either marker. CD36

is expressed at this stage of erythropoiesis and low levels of CD45 are also present. This

suggests that these cells are phenotypically similar. There is a slight tendency for the pLVX-

miR-15a-16-1 cells to be slightly bigger, giving rise to correspondingly higher mean

fluorescence intensities of these markers. The high intensity tail present in CD45 samples is

from the small amount of myeloid cells present in the culture.

Fig. S6. A comparison of aggregate PCT compared with relative expression in erythroid tissues

of interest. Graphs are shown for bone marrow, fetal liver, and K562 cells showing relative

expression (as a log2 expression ratio) compared to a panel of 78 other tissues (12). The MYB

gene is highlighted in red in all of the graphs. The x-axis plots aggregate PCT (14) on a linear

scale, while the y-axis shows relative expression in the various tissues as a log2 ratio.

Fig. S7. Relative normalized expression of γ-globin (HBG) and ε-globin (HBE) from a MYB

siRNA dataset in primary erythroid cord blood progenitors (GSE13110). The expression of γ-

globin (HBG) and ε-globin (HBE) was normalized with all genes in the dataset and plotted

relative to the expression of adult β-globin (HBB). The data from control, mock siRNA, and

MYB siRNA experiments are shown, respectively, from left to right. These data confirm the role

that MYB plays in silencing the fetal and embryonic globin genes in human erythroid

progenitors.

Fig. S8. Representative cytospins from control or shMYB 1 transduced cells from day 6 of

erythroid differentiation. The shMYB cells appear more mature with smaller cell size and more


compact nuclei. Additionally, an increased number of myeloid cells (monocytes) is found in various

cytospins, as exemplified by the image on the far right.

Fig. S9. Gene expression of known regulators of globin gene expression and switching.

Expression levels of the previously characterized regulators of globin gene expression,

BCL11A, GATA1, KLF1, ZFPM1, and SOX6 (13) are depicted after processing and normalizing

the microarray data from control cells and cells in which MYB has been knocked down with

shMYB (n = 4 per group). The data is depicted using log2 expression levels. The error bars

show the standard deviation of the mean.

Fig. S10. Gene set enrichment analysis (GSEA) demonstrates that precocious erythroid

differentiation occurs with MYB knockdown. GSEA (6, 7) was used to examine whether a gene

expression set derived from significantly upregulated genes between the MYB shRNA and

control cells (totaling 188 significantly different genes), was enriched when comparing later time

points of erythroid differentiation among G1E cells (9). For these comparisons, the 21 hour

(top) and 30 hour (bottom) time points were compared with the 0 hour time points. The G1E

expression data was derived from the Gene Expression Omnibus accession number GSE628

dataset (9).

Fig. S11. Marked upregulation of γ-globin gene expression upon knockdown of MYB in K562

cells. qRT-PCR was used to compare γ-globin gene expression in K562 cells transduced with

pSMPUW or pSMPUW-miR-15a-16-1 lentiviruses following selection. All experimental samples

are significantly different than the controls (p < 0.001, n = 3-4 per group).

Fig. S12. Alteration in cell cycle regulators on knockdown of MYB. Increase in the relative

expression of cell cycle regulatory genes from microarray data of shMYB and control samples is


shown. The decrease of certain positive cell cycle regulators and increase of certain negative

cell cycle regulators is depicted in the graphs. All data is depicted as the normalized mean ±

the standard deviation (n=4 per group).

Fig. S13. Normal megakaryocyte morphology on bone marrow histological sections. These

sections are shown at the same magnification and with similar processing and staining of

samples, as those shown in Fig. 4. Examples of two normal megakaryocytes (with normal

nuclear morphology) are highlighted in the images with cyan arrows. All images are shown at

400X magnification and slides were stained with hematoxylin and eosin.


References

1. Sankaran VG, et al. (2008) Human fetal hemoglobin expression is regulated by the developmental stage-specific repressor BCL11A. Science 322(5909):1839-1842.

2. Sankaran VG, Orkin SH, & Walkley CR (2008) Rb intrinsically promotes erythropoiesis by coupling cell cycle exit with mitochondrial biogenesis. Genes & development 22(4):463-475.

3. Sankaran VG, et al. (2009) Developmental and species-divergent globin switching are driven by BCL11A. Nature 460(7259):1093-1097.

4. Dai M, et al. (2005) Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic acids research 33(20):e175.

5. Bianchi E, et al. (2010) c-myb supports erythropoiesis through the transactivation of KLF1 and LMO2 expression. Blood 116(22):e99-e110.

6. Mootha VK, et al. (2003) PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature genetics 34(3):267-273.

7. Subramanian A, et al. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 102(43):15545-15550.

8. Watkins NA, et al. (2009) A HaemAtlas: characterizing gene expression in differentiated human blood cells. Blood 113(19):e1-9.

9. Welch JJ, et al. (2004) Global regulation of erythroid gene expression by transcription factor GATA-1. Blood 104(10):3136-3147.

10. Gairdner D, Marks J, & Roscoe JD (1952) Blood formation in infancy. Part II. Normal erythropoiesis. Archives of disease in childhood 27(133):214-221.

11. Gairdner D, Marks J, & Roscoe JD (1952) Blood formation in infancy. Part I. The normal bone marrow. Archives of disease in childhood 27(132):128-133.

12. Su AI, et al. (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proceedings of the National Academy of Sciences of the United States of America 101(16):6062-6067.

13. Sankaran VG, Xu J, & Orkin SH (2010) Advances in the understanding of haemoglobin switching. British journal of haematology 149(2):181-194.

14. Friedman RC, Farh KK, Burge CB, & Bartel DP (2009) Most mammalian mRNAs are conserved targets of microRNAs. Genome research 19(1):92-105.


15. Rogers JF (1984) Clinical delineation of proximal and distal partial 13q trisomy. Clinical genetics 25(3):221-229.

16. Tharapel SA, Lewandowski RC, Tharapel AT, & Wilroy RS, Jr. (1986) Phenotype-karyotype correlation in patients trisomic for various segments of chromosome 13. Journal of medical genetics 23(4):310-315.

17. Korbel JO, et al. (2009) The genetic architecture of Down syndrome phenotypes revealed by high-resolution analysis of human segmental trisomies. Proceedings of the National Academy of Sciences of the United States of America 106(29):12031-12036.


Figure S1. Subject ploidy by band (red = trisomy). Panels: Elevated HbF Subject Ploidy (subject number axis, ticks 2 to 12) and Normal HbF Subject Ploidy (subject number axis, ticks 10 to 40). Bands: pter-q11, q12, q13, q14, q21, q22, q31, q32-qter.


Figure S2. Panel: CD71+ Erythroid. Value axis ticks run from -4 to 5 (axis label not recoverable). Genes shown, in order: EBPL, DGKH, FNDC3A, NUDT15, C13orf1, SUGT1L1, LRCH1, THSD1, HTR2A, CYSLTR2, KCNRG, MGC5590, AKAP11, BC025370, ENOX1, TSC22D1, SUCLA2, SERPINE3, TNFSF11, KIAA0564, C13orf18, P2RY5, KCTD4, CDADC1, PHF11, TRIM13, SERP2, FOXO1, CAB39L, CCDC70, C13orf31, SLC25A30, WDFY2, ITM2B, LECT1, LCP1, GTF2F2, FAM124A, GUCY1B2, MLNR, NEK3, RCBTB2, TPT1, DKFZp434F1622, COG3, DHRS12, ATP7B, KIAA1704, RCBTB1, VPS36, EPSTI1, ALG11, UTP14C, CPB2, ESD, RNASEH2B, C13orf30, ELF1, INTS6, MED4, NARG1L, NUFIP1, KPNA3, ZC3H13, DNAJC15, RB1, WBP4, MTRF1, SIAH3, SLC25A15, MRPS31, DLEU2, DLEU2, ARL11, SUGT1, CKAP2, C13orf15.


Figure S3. Two panels comparing Control and miR-15a-16-1: Relative miR-15a Expression (axis 0.0 to 2.0) and Relative miR-16 Expression (axis 0.0 to 2.0).


Figure S4. Two panels comparing Control and miR-15a-16-1: Relative miR-15a Expression (axis 0 to 4) and Relative miR-16 Expression (axis 0.0 to 2.5).


Figure S5. Panel labels: CD36, CD45.


Figure S6. Three panels (Bone Marrow, Fetal Liver, K562 Cells), each plotting Log2 Expression Ratio against Aggregate PCT.


Figure S7. Two panels comparing Mock, Control siRNA, and MYB siRNA: HBG/HBB (axis 0 to 10) and HBE/HBB (axis 0.0 to 0.8).


Figure S8. Panel labels: Control, shMYB.


Figure S9. Normalized Expression Level (Log2; axis 0 to 15) of BCL11A, GATA1, ZFPM1, KLF1, and SOX6 in Control and shMYB samples.


Figure S10. Two panels: G1E cells, 21 hours vs. 0 hours (p = 0.004); G1E cells, 30 hours vs. 0 hours (p < 0.001).


Figure S11. Relative γ-Globin Expression (axis 0 to 15) in Control, shMYB 1, and shMYB 2 samples.


Figure S12. Relative Expression (axis 0.0 to 2.0) of CDK2, CDK4, CDK6, MYC, CDKN1B, CDKN1C, CDKN2C, and CDKN2D in Control and shMYB samples. Genes are grouped under the labels Positive Cell Cycle Regulators and Negative Cell Cycle Regulators.


Figure S13