11
LETTERS https://doi.org/10.1038/s41587-019-0041-2 1 Ludwig Institute for Cancer Research, Nuffield Department of Medicine, University of Oxford, Oxford, UK. 2 Target Discovery Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK. 3 Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing, China. 4 College of Chemistry, Nankai University, Tianjin, China. 5 Center for Mitochondrial Biology and Medicine and Center for Translational Medicine, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, China. 6 These authors contributed equally: Yibin Liu, Paulina Siejka-Zielińska *e-mail: [email protected]; [email protected] Bisulfite sequencing has been the gold standard for mapping DNA modifications including 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) for decades 14 . However, this harsh chemical treatment degrades the majority of the DNA and generates sequencing libraries with low complex- ity 2,5,6 . Here, we present a bisulfite-free and base-level-res- olution sequencing method, TET-assisted pyridine borane sequencing (TAPS), for detection of 5mC and 5hmC. TAPS combines ten-eleven translocation (TET) oxidation of 5mC and 5hmC to 5-carboxylcytosine (5caC) with pyridine borane reduction of 5caC to dihydrouracil (DHU). Subsequent PCR converts DHU to thymine, enabling a C-to-T transition of 5mC and 5hmC. TAPS detects modifications directly with high sen- sitivity and specificity, without affecting unmodified cytosines. This method is nondestructive, preserving DNA fragments over 10 kilobases long. We applied TAPS to the whole-genome mapping of 5mC and 5hmC in mouse embryonic stem cells and show that, compared with bisulfite sequencing, TAPS results in higher mapping rates, more even coverage and lower sequenc- ing costs, thus enabling higher quality, more comprehensive and cheaper methylome analyses. DNA cytosine modifications are important epigenetic mecha- nisms that play crucial roles in a broad range of biological processes from gene regulation to normal development 7 . 5mC and 5hmC are by far the two most common epigenetic marks found in the mam- malian genome. 5hmC is generated from 5mC by the ten-eleven translocation (TET) family of dioxygenases 8 . TET can further oxidize 5hmC to 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC), which exist in much lower abundances in the mammalian genome compared with 5mC and 5hmC (10-fold to 100-fold lower than 5hmC) 9 . Aberrant DNA methylation and hydroxymethylation have been associated with various diseases and are well accepted hallmarks of cancer 10,11 . Therefore, determination of the genomic distribution of 5mC and 5hmC is not only important for our under- standing of development and homeostasis, but is also invaluable for clinical applications 12,13 . The current gold standard and the only option for base-level res- olution and quantitative DNA methylation and hydroxymethylation analysis is bisulfite sequencing 1,2 and its derived methods, including TET-assisted bisulfite sequencing (TAB-Seq) 3 and oxidative bisul- fite sequencing (oxBS) 4 . All these methods employ bisulfite treat- ment to convert unmethylated cytosine to uracil while leaving 5mC and/or 5hmC intact. Since PCR amplification of the bisulfite-treated DNA reads uracil as thymine, the modification of each cytosine can be inferred at single-base resolution, where C-to-T transitions provide the locations of the unmethylated cytosines. There are, however, two main drawbacks to bisulfite sequencing. First, bisul- fite treatment is a harsh chemical reaction, which degrades most of the DNA due to depyrimidination under the required acidic and thermal conditions 5 . This severely limits its utility if sample DNA quantities are low. Second, bisulfite sequencing relies on the com- plete conversion of unmodified cytosine to thymine. Unmodified cytosine accounts for approximately 95% of the total cytosine in the human genome. Converting all these positions to thymine severely reduces sequence complexity, leading to poor sequencing quality, low mapping rates, uneven genome coverage and increased sequencing cost. Consequently, bisulfite sequencing suffers from pronounced sequencing biases due to selective and context-specific DNA degradation 6 . To solve these problems, a mild reaction that can directly detect 5mC and 5hmC at base-level resolution, with- out affecting unmodified cytosine, is desired to accurately estimate methylation levels. Recently, an elegant bisulfite-free and base-level resolution method for sequencing 5fC has been developed on the basis of the Friedländer synthesis reaction, which can induce a 5fC-to-T transi- tion 14,15 . However, this method has limited application since 5fC is a rare modification and there is no way to convert 5mC efficiently and completely to 5fC 16 . There is, however, a convenient way to convert 5mC and 5hmC to 5caC. The TET enzymes readily oxidize 5mC and 5hmC to the final oxidation product 5caC in vitro 9,17 . We envisioned that if we could induce a 5caC-to-T transition, it could be combined with TET oxidation of 5mC and 5hmC to enable direct detection of 5mC and 5hmC. Here, we present such a 5caC- to-T transition chemistry and its application for whole-genome base-level resolution detection of cytosine modifications. We started with an 11mer 5caC-containing DNA oligonucleotide as a model DNA, which we used to screen chemicals that could react with 5caC, as monitored by matrix-assisted laser desorp- tion/ionization mass spectroscopy (MALDI). We discovered that certain borane-containing compounds could efficiently react with the 5caC oligonucleotide, resulting in a molecular weight reduc- tion of 41 Da (Supplementary Fig. 1a and Fig. 1a). We chose pyri- dine borane and its derivative 2-picoline borane (pic-borane) for further study, as they are commercially available and environmen- tally benign reducing agents 18 . We repeated the reaction on 5caC single nucleoside and confirmed that these agents convert 5caC to Bisulfite-free direct detection of 5-methylcytosine and 5-hydroxymethylcytosine at base resolution Yibin Liu 1,2,6 , Paulina Siejka-Zielińska 1,2,6 , Gergana Velikova 1,2 , Ying Bi  1,2 , Fang Yuan 1,2,3 , Marketa Tomkova  1 , Chunsen Bai 1,2,4 , Lei Chen 1,2,5 , Benjamin Schuster-Böckler 1 * and Chun-Xiao Song  1,2 * NATURE BIOTECHNOLOGY | www.nature.com/naturebiotechnology

B-fr 5-ylcyt 5-ydroxymethylcyt esolutionstatic.tongtianta.site/paper_pdf/b2d79d30-b836-11e... · Paulina Siejka-Zielińska *e-mail: [email protected]; [email protected]

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: B-fr 5-ylcyt 5-ydroxymethylcyt esolutionstatic.tongtianta.site/paper_pdf/b2d79d30-b836-11e... · Paulina Siejka-Zielińska *e-mail: benjamin.schuster-boeckler@ludwig.ox.ac.uk; chunxiao.song@ludwig.ox.ac.uk

Lettershttps://doi.org/10.1038/s41587-019-0041-2

1Ludwig Institute for Cancer Research, Nuffield Department of Medicine, University of Oxford, Oxford, UK. 2Target Discovery Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK. 3Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing, China. 4College of Chemistry, Nankai University, Tianjin, China. 5Center for Mitochondrial Biology and Medicine and Center for Translational Medicine, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, China. 6These authors contributed equally: Yibin Liu, Paulina Siejka-Zielińska *e-mail: [email protected]; [email protected]

Bisulfite sequencing has been the gold standard for mapping DNA modifications including 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) for decades1–4. However, this harsh chemical treatment degrades the majority of the DNA and generates sequencing libraries with low complex-ity2,5,6. Here, we present a bisulfite-free and base-level-res-olution sequencing method, TET-assisted pyridine borane sequencing (TAPS), for detection of 5mC and 5hmC. TAPS combines ten-eleven translocation (TET) oxidation of 5mC and 5hmC to 5-carboxylcytosine (5caC) with pyridine borane reduction of 5caC to dihydrouracil (DHU). Subsequent PCR converts DHU to thymine, enabling a C-to-T transition of 5mC and 5hmC. TAPS detects modifications directly with high sen-sitivity and specificity, without affecting unmodified cytosines. This method is nondestructive, preserving DNA fragments over 10 kilobases long. We applied TAPS to the whole-genome mapping of 5mC and 5hmC in mouse embryonic stem cells and show that, compared with bisulfite sequencing, TAPS results in higher mapping rates, more even coverage and lower sequenc-ing costs, thus enabling higher quality, more comprehensive and cheaper methylome analyses.

DNA cytosine modifications are important epigenetic mecha-nisms that play crucial roles in a broad range of biological processes from gene regulation to normal development7. 5mC and 5hmC are by far the two most common epigenetic marks found in the mam-malian genome. 5hmC is generated from 5mC by the ten-eleven translocation (TET) family of dioxygenases8. TET can further oxidize 5hmC to 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC), which exist in much lower abundances in the mammalian genome compared with 5mC and 5hmC (10-fold to 100-fold lower than 5hmC)9. Aberrant DNA methylation and hydroxymethylation have been associated with various diseases and are well accepted hallmarks of cancer10,11. Therefore, determination of the genomic distribution of 5mC and 5hmC is not only important for our under-standing of development and homeostasis, but is also invaluable for clinical applications12,13.

The current gold standard and the only option for base-level res-olution and quantitative DNA methylation and hydroxymethylation analysis is bisulfite sequencing1,2 and its derived methods, including TET-assisted bisulfite sequencing (TAB-Seq)3 and oxidative bisul-fite sequencing (oxBS)4. All these methods employ bisulfite treat-ment to convert unmethylated cytosine to uracil while leaving 5mC and/or 5hmC intact. Since PCR amplification of the bisulfite-treated

DNA reads uracil as thymine, the modification of each cytosine can be inferred at single-base resolution, where C-to-T transitions provide the locations of the unmethylated cytosines. There are, however, two main drawbacks to bisulfite sequencing. First, bisul-fite treatment is a harsh chemical reaction, which degrades most of the DNA due to depyrimidination under the required acidic and thermal conditions5. This severely limits its utility if sample DNA quantities are low. Second, bisulfite sequencing relies on the com-plete conversion of unmodified cytosine to thymine. Unmodified cytosine accounts for approximately 95% of the total cytosine in the human genome. Converting all these positions to thymine severely reduces sequence complexity, leading to poor sequencing quality, low mapping rates, uneven genome coverage and increased sequencing cost. Consequently, bisulfite sequencing suffers from pronounced sequencing biases due to selective and context-specific DNA degradation6. To solve these problems, a mild reaction that can directly detect 5mC and 5hmC at base-level resolution, with-out affecting unmodified cytosine, is desired to accurately estimate methylation levels.

Recently, an elegant bisulfite-free and base-level resolution method for sequencing 5fC has been developed on the basis of the Friedländer synthesis reaction, which can induce a 5fC-to-T transi-tion14,15. However, this method has limited application since 5fC is a rare modification and there is no way to convert 5mC efficiently and completely to 5fC16. There is, however, a convenient way to convert 5mC and 5hmC to 5caC. The TET enzymes readily oxidize 5mC and 5hmC to the final oxidation product 5caC in vitro9,17. We envisioned that if we could induce a 5caC-to-T transition, it could be combined with TET oxidation of 5mC and 5hmC to enable direct detection of 5mC and 5hmC. Here, we present such a 5caC- to-T transition chemistry and its application for whole-genome base-level resolution detection of cytosine modifications.

We started with an 11mer 5caC-containing DNA oligonucleotide as a model DNA, which we used to screen chemicals that could react with 5caC, as monitored by matrix-assisted laser desorp-tion/ionization mass spectroscopy (MALDI). We discovered that certain borane-containing compounds could efficiently react with the 5caC oligonucleotide, resulting in a molecular weight reduc-tion of 41 Da (Supplementary Fig. 1a and Fig. 1a). We chose pyri-dine borane and its derivative 2-picoline borane (pic-borane) for further study, as they are commercially available and environmen-tally benign reducing agents18. We repeated the reaction on 5caC single nucleoside and confirmed that these agents convert 5caC to

Bisulfite-free direct detection of 5-methylcytosine and 5-hydroxymethylcytosine at base resolutionYibin Liu1,2,6, Paulina Siejka-Zielińska1,2,6, Gergana Velikova1,2, Ying Bi   1,2, Fang Yuan1,2,3, Marketa Tomkova   1, Chunsen Bai1,2,4, Lei Chen1,2,5, Benjamin Schuster-Böckler1* and Chun-Xiao Song   1,2*

NaTure BioTeChNoLoGY | www.nature.com/naturebiotechnology

yanyan
高亮
Page 2: B-fr 5-ylcyt 5-ydroxymethylcyt esolutionstatic.tongtianta.site/paper_pdf/b2d79d30-b836-11e... · Paulina Siejka-Zielińska *e-mail: benjamin.schuster-boeckler@ludwig.ox.ac.uk; chunxiao.song@ludwig.ox.ac.uk

Letters Nature BiotechNology

DHU (Fig. 1b and Supplementary Note 1), a previously unknown reductive decarboxylation/deamination reaction (Supplementary Fig. 1b). Interestingly, we found pyridine borane and pic-borane can also convert 5fC to DHU through an apparent reductive defor-mylation/deamination mechanism (Supplementary Figs. 2 and 3). The detailed mechanisms of both reactions remain to be defined. Quantitative analysis of the borane reaction on the DNA oligo-nucleotide by high-performance liquid chromatography–tandem mass spectrometry (HPLC–MS/MS) confirms that it converts 5caC and 5fC to DHU with around 98% efficiency and has no activity against unmethylated cytosine, 5mC or 5hmC (Fig. 1c).

As a uracil derivative, DHU can be recognized by both DNA and RNA polymerases as thymine19. Therefore, borane reduction could induce both 5caC-to-T and 5fC-to-T transitions, and can be used for base-level resolution sequencing of 5fC and 5caC, which we termed pyridine borane sequencing (Supplementary Table 1). The borane reduction of 5fC and 5caC can be blocked by hydrox-ylamine conjugation20 and 1-ethyl-3-(3-dimethylaminopropyl)car-bodiimide coupling21, respectively (Supplementary Fig. 3), which allows pyridine borane sequencing to be used to sequence 5fC or 5caC specifically (Supplementary Table 1). More importantly, we can now use TET enzymes to oxidize 5mC and 5hmC to 5caC, and then subject 5caC to borane reduction in a process we call TAPS (Fig. 1d,e). TAPS can induce a C-to-T transition of 5mC and 5hmC, and therefore can be used for base-level resolution detection of 5mC and 5hmC. Furthermore, β-glucosyltransferase can label 5hmC with glucose and thereby protect it from TET oxidation3 and borane reduction (Supplementary Fig. 4), enabling the selective sequenc-ing of only 5mC, in a process we call TAPSβ (Fig. 1d,e). 5hmC sites can then be deduced by subtraction of TAPSβ from TAPS measure-ments. Alternatively, we can use potassium perruthenate (KRuO4), a reagent previously used in oxBS4, to replace TET as a chemical

oxidant to specifically oxidize 5hmC to 5fC (Supplementary Fig. 4). This approach, which we call chemical-assisted pyridine borane sequencing (CAPS), can be used to sequence 5hmC spe-cifically (Fig. 1d,e). Therefore, TAPS and related methods can, in principle, offer a comprehensive suite to sequence all four cytosine epigenetic modifications (Fig. 1d,e and Supplementary Table 1).

We next aimed to evaluate the performance of TAPS in com-parison with bisulfite sequencing, the current standard and most widely used method for base-level resolution mapping of 5mC and 5hmC. We tested both Naegleria TET-like oxygenase (NgTET1)22 and mouse Tet1 (mTet1)3 since both can efficiently oxidize 5mC to 5caC in vitro. To confirm the 5mC-to-T transition, we applied TAPS to model DNAs containing fully methylated CpG sites and showed that it can effectively convert 5mC to T, as demonstrated by restric-tion enzyme digestion (Supplementary Fig. 5) and Sanger sequencing (Fig. 2a). We also validated TAPSβ and CAPS by Sanger sequenc-ing (Supplementary Fig. 6). DHU is close to a natural base and is compatible with various PCR polymerases and isothermal DNA or RNA polymerases (Supplementary Fig. 7) and has a similar PCR effi-ciency without bias to T and C during PCR (Supplementary Fig. 8). We next applied TAPS to genomic DNA from mouse embryonic stem cells (mESCs). HPLC–MS/MS quantification showed that, as expected, 5mC accounts for 98.5% of cytosine modifications in the mESCs genomic DNA; the remainder is composed of 5hmC (1.5%) and trace amounts of 5fC and 5caC, and no DHU (Fig. 2b). After TET oxidation, about 96% of cytosine modifications were oxidized to 5caC and 3% were oxidized to 5fC (Fig. 2b). After borane reduc-tion, over 99% of the cytosine modifications were converted into DHU (Fig. 2b). These results demonstrate that both TET oxidation and borane reduction work efficiently on genomic DNA. Both TET oxidation and borane reduction are mild reactions, with no nota-ble DNA degradation compared with bisulfite treatment (Fig. 2c).

-3ʹm/z = 3361.2

5ʹ-5ʹ- -3ʹm/z = 3320.2

Pyridine borane

Base BS TAB-Seq oxBS TAPS TAPSβ CAPS

C T T T C C C

5mC C T C T T C

5hmC C C T T C T

0

0.2

0.4

0.6

0.8

1

3,200 3,300 3,400 3,500 3,600

Rel

ativ

e in

tens

ity

Mass (m/z)

3362.7

0

0.2

0.4

0.6

0.8

1

3,200 3,300 3,400 3,500 3,600

Rel

ativ

e in

tens

ity

Mass (m/z)

3321.2

C 5mC 5hmC

TET

βGT

KRuO4TET

pic-borane

C 5caC 5caC

C 5mC 5gmC

C 5mC 5fCC 5caC 5gmC

C DHU DHU C 5mC DHUC DHU 5gmC

PCR DHU as T

C T T C C TC T C

TAPS CAPSTAPSβ

Borane reduction

PCR DHU as T

a

b c

d

e

Cytosinemodification in oligo

Conversionto DHU

C 0%5mC 0%

5hmC 0%5fC 97.6±0.1%

5caC 98.1±0.6%

NH

O

ON

O

OH

HO

N

NH2

ON

O

OH

HO

HO

O

NBH3

Pyridine borane

5caC DHU

Fig. 1 | Borane reduction on DNa oligonucleotides. a, MALDI characterization of 5caC-containing 11mer model DNA treated with pyridine borane. Experiment was performed once. Calculated mass is shown in black, and observed mass is shown in red. Conversion rate for other boranes are summarized in Supplementary Fig. 1a. b, Pyridine borane reduction of 5caC to DHU single nucleoside. c, The conversion rates of dC and various cytosine derivatives quantified by HPLC–MS/MS. Data shown as mean ± s.d. of three independent experiments (n = 3). d, Overview of the TAPS, TAPSβ and CAPS. e, Comparison of bisulfite sequencing and related methods versus TAPS and CAPS for 5mC and 5hmC sequencing.

NaTure BioTeChNoLoGY | www.nature.com/naturebiotechnology

yanyan
高亮
yanyan
高亮
yanyan
高亮
yanyan
高亮
yanyan
高亮
Page 3: B-fr 5-ylcyt 5-ydroxymethylcyt esolutionstatic.tongtianta.site/paper_pdf/b2d79d30-b836-11e... · Paulina Siejka-Zielińska *e-mail: benjamin.schuster-boeckler@ludwig.ox.ac.uk; chunxiao.song@ludwig.ox.ac.uk

LettersNature BiotechNology

Another notable advantage over bisulfite sequencing is that TAPS is nondestructive and can preserve DNA over 10 kilobases (kb) long (Fig. 2c). Moreover, DNA remains double stranded after TAPS, and the conversion is independent of DNA length (Supplementary Fig. 9).

We subsequently performed whole-genome sequencing on two samples of genomic DNA from E14 mESCs23, one converted using TAPS and the other using standard whole-genome bisulfite sequencing (WGBS) for comparison. We used 100 ng DNA for TAPS, compared with 200 ng DNA for WGBS. To assess the accu-racy of TAPS, we added three different types of spike-in controls. Lambda-DNA, where all CpGs were fully methylated, was used to estimate the false-negative rate (non-conversion rate of 5mC); a 2 kb unmodified amplicon was used to estimate the false-positive rate (conversion rate of unmodified C); and a synthetic oligonucleotide spike-in containing both a methylated and hydroxymethylated C surrounded by any other base (N5mCNN and N5hmCNN, respec-tively) was used to compare the conversion rate on 5mC and 5hmC in different sequence contexts. We found mTet1 leads to a higher conversion rate than NgTET1, while pyridine borane leads to a lower false-positive rate than pic-borane (Supplementary Fig. 10). As a result, the combination of mTet1 and pyridine borane achieved the highest 5mC conversion rate (96.5% and 97.3% in lambda and synthetic spike-ins, respectively) and the lowest conversion rate

of unmodified C (0.23%) (Fig. 3a,b and Supplementary Fig. 10). A false-negative rate between 2.7% and 3.5%, with a false-positive rate of only 0.23%, is comparable to bisulfite sequencing: a recent study showed nine commercial bisulfite kits had average false-neg-ative and false-positive rates of 1.7% and 0.6%, respectively24. The synthetic spike-ins suggest that TAPS works well on both 5mC and 5hmC, and that TAPS performs only slightly worse in non-CpG contexts. The conversion for 5hmC is 8.2% lower than 5mC, and the conversion for non-CpG contexts is 11.4% lower than for CpG contexts (Fig. 3a).

WGBS data requires special software, such as Bismark25, both for the alignment and modification-calling steps. In contrast, our processing pipeline uses a standard genomic aligner (bwa)26, followed by a custom modification-calling tool that we call ‘asTair’. When processing simulated WGBS and TAPS reads (derived from the same semi-methylated source sequence), TAPS/asTair was more than three times faster than WGBS/Bismark (Fig. 3c).

Due to the conversion of nearly all cytosine to thymine, WGBS libraries feature an extremely skewed nucleotide composition which can negatively affect Illumina sequencing. Consequently, WGBS reads showed substantially lower sequencing quality scores at cytosine/guanine base pairs compared with TAPS (Fig. 3d). To compensate for the nucleotide composition bias, at least 10–20%

a b

c

0

25

50

75

100

5mC

5hmC

5fC

5caC

DHUSpe

cies

per

cent

age

mESC_control NgTET1 pic-borane

TAPS

Ctrl TAPS BSCondition:

Genomic DNA: 0.5 – 1 kb 1 – 3 kb Unfragmented

Ctrl Ctrl TAPSTAPS BS BS

0.5 kb

1 kb

3 kb

10 kb

Chilled

0.5 kb

1 kb

3 kb

10 kb

Ctrl TAPS BS

0.5 – 1 kb 1 – 3 kb Unfragmented

Ctrl Ctrl TAPSTAPS BS BS

Genomic DNA:

Condition:

Fig. 2 | TaPS on model DNa and meSC genomic DNa. a, Sanger sequencing results for a model DNA containing fully methylated CpG sites before (top) and after (bottom) TAPS. Only 5mC is converted to T after TAPS. b, HPLC–MS/MS quantification of relative modification levels in the mESCs genomic DNA control, after NgTET1 oxidation and after pic-borane reduction. Error bars represent ±s.d. (n = 3 independent experiments). Individual data points are overlaid on the plot. c, Agarose gel images of mESC genomic DNA of various fragment lengths oxidized with mTet1CD and reduced with pyridine borane according to TAPS protocol or treated with bisulfite before (left panel) and after (right panel) cooling down on ice. DNA after TAPS remained double stranded and could be directly visualized on the gel. Bisulfite treatment caused more damage and fragmentation to the samples and DNA became single stranded and could be visualized only after being chilled on ice. TAPS conversion was complete for all genomic DNA regardless of fragment length as shown in Supplementary Fig. 9. Experiment was performed once.

NaTure BioTeChNoLoGY | www.nature.com/naturebiotechnology

yanyan
高亮
yanyan
高亮
yanyan
高亮
Page 4: B-fr 5-ylcyt 5-ydroxymethylcyt esolutionstatic.tongtianta.site/paper_pdf/b2d79d30-b836-11e... · Paulina Siejka-Zielińska *e-mail: benjamin.schuster-boeckler@ludwig.ox.ac.uk; chunxiao.song@ludwig.ox.ac.uk

Letters Nature BiotechNology

PhiX DNA (a base-balanced control library) is commonly added to WGBS libraries27. Accordingly, we supplemented the WGBS library with 15% PhiX. This, in combination with the reduced information content of bisulfite sequencing-converted reads, and DNA degra-dation as a result of bisulfite treatment, resulted in significantly lower mapping rates for WGBS compared with TAPS (Fig. 3e and Supplementary Table 2). Therefore, for the same sequencing cost (one NextSeq High Output run), the average depth of TAPS exceeded that of WGBS (21× and 13.1×, respectively; Supplementary Table 3). In addition, TAPS resulted in fewer uncovered regions, and overall showed a more even coverage distribution, even after down-sam-pling to the same sequencing depth as WGBS (interquartile range: 9 and 11, respectively; Supplementary Table 3 and Supplementary Fig. 11a). For example, CpG islands (CGIs) in particular were gen-erally better covered by TAPS, even when controlling for differences in sequencing depth between WGBS and TAPS (Fig. 4a), while both showed equivalent demethylation inside CGIs (Supplementary Fig. 12). Moreover, WGBS showed a slight bias of decreased modifi-cation levels in highly covered CpG sites (Supplementary Fig. 13a), while our results suggest that TAPS exhibits very little modification-coverage bias (Supplementary Fig. 13b). These results demonstrate that TAPS dramatically improved sequencing quality compared with WGBS, while effectively halving the sequencing cost.

The higher and more even genome coverage of TAPS resulted in a larger number of CpG sites covered by at least three reads. With TAPS, 88.3% of all 42,741,339 CpG sites in the E14 genome were

covered at this level, compared to only 77.6% with WGBS (Fig. 4b and Supplementary Fig. 11b). TAPS and WGBS resulted in highly correlated methylation measurements across chromosomal regions (Fig. 4c and Supplementary Fig. 14). On a per-nucleotide basis, 32,428,076 CpG positions were covered by at least three reads in both methods (Fig. 4b). We defined ‘modified CpGs’ as all CpG positions with a modification level of at least 10%28. Using this threshold, 97.0% of CpGs covered by three reads showed matching modification states between TAPS and WGBS. Of the CpGs that were found modified and covered by three reads in WGBS, 96.2% were recalled as modified by TAPS, indicating good agreement between WGBS and TAPS (Fig. 4d). When comparing modification levels on each CpG covered by at least three reads in both WGBS and TAPS, we observed good correlation between TAPS and WGBS (Pearson r = 0.68, P < 2 × 10−16, two-sided Pearson correlation test; Fig. 4e). Together, these results indicate that TAPS can directly replace WGBS, and, in fact, provides a more comprehensive view of the methylome than WGBS.

Finally, we tested TAPS with low-input DNA and showed that TAPS can work with as little as 1 ng of genomic DNA. TAPS also works effectively with down to 1 ng of circulating cell-free DNA. These results demonstrate the potential of TAPS for low-input DNA and clinical applications (Supplementary Fig. 15).

In summary, we have developed and demonstrated TAPS for whole-methylome sequencing. By using mild enzymatic and chemi-cal reactions to detect 5mC and 5hmC directly at base-level resolution,

TA

PS

WG

BS

0

300

600

900

Pro

cess

ing

time

(s)

Uni

que

map

ping

rat

e (%

)

0

10

20

30

40

50

60

70

80

90

100

0

0.1

0.2

0.3

35

30

25

20

15

A C

First in pair Second in pair

G T A C G T

A CFirst in pair Second in pair

G T A C G T

35

30

25

20

15Unmodified

0

25

50

75

100

a

c e

b d

97.3

89.185.9

80.50.23

Methylated Hydroxymethylated

Fra

ctio

n co

nver

ted

cyto

sine

s (%

)

TAPS

CpH

Sequencing base quality

CpG

WGBSTAPS WGBS

Fal

se p

ositi

ve r

ate

(%)

Phr

ed s

core

35

30

25

20

15

35

30

25

20

15

Phr

ed s

core

Fig. 3 | improved sequencing quality of TaPS over WGBS. a, Conversion rate in TAPS-treated synthetic spike-ins (CpN) methylated or hydroxymethylated at known positions. b, False-positive rate of TAPS from unmodified 2 kb spike-in. c, Total run time of TAPS and WGBS when processing 1 million simulated reads on one core of one Intel Xeon CPU. d, Sequencing quality scores per base for the first and second reads in all sequenced read pairs, as reported by Illumina BaseSpace; TAPS (top) and WGBS (bottom). Nucleotide is denoted by color. Boxplots visualize 10 million random sequencing reads showing medians, quantiles and non-outlier extreme values. e, Fraction of all sequenced read pairs (after trimming) that were uniquely mapped to the genome.

NaTure BioTeChNoLoGY | www.nature.com/naturebiotechnology

Page 5: B-fr 5-ylcyt 5-ydroxymethylcyt esolutionstatic.tongtianta.site/paper_pdf/b2d79d30-b836-11e... · Paulina Siejka-Zielińska *e-mail: benjamin.schuster-boeckler@ludwig.ox.ac.uk; chunxiao.song@ludwig.ox.ac.uk

LettersNature BiotechNology

with high sensitivity and specificity without affecting unmodified cytosines, TAPS outperforms bisulfite sequencing in providing a high quality and more complete methylome at half the sequencing cost. As such, TAPS could replace bisulfite sequencing as the stan-dard in DNA methylcytosine and hydroxymethylcytosine analysis. Rather than introducing a bulky modification on cytosine in the bisulfite-free 5fC sequencing method, as reported recently14,15, TAPS converts modified cytosine into DHU, a near natural base, which can be ‘read’ as T by common polymerases and is potentially compatible with PCR-free DNA sequencing. TAPS is compatible with a variety of downstream analyses, including, but not limited to, pyrosequenc-ing, methylation-sensitive PCR, restriction digestion, MALDI mass spectrometry, microarray and whole-genome sequencing. Since TAPS can preserve long DNA, it could be extremely valuable when combined with long-read sequencing technologies, such as SMRT sequencing29 and nanopore sequencing30, to investigate certain diffi-cult to map regions. It is also possible to combine pull-down methods with TAPS to further reduce the sequencing cost and add base-level resolution information to the low-resolution affinity-based maps31. We demonstrated that TAPS could directly replace WGBS in routine use while reducing cost, complexity and time required for analysis. This could potentially lead to wider adoption of epigenetic analy-ses in academic research and clinical diagnostics. Future studies will focus on demonstrating TAPSβ and CAPS to distinguish 5mC and 5hmC in genome-wide applications32, as well as optimizing TAPS for sensitive low-input samples, such as circulating cell-free DNA12 and single-cell analysis33,34.

online contentAny methods, additional references, Nature Research report-ing summaries, source data, statements of data availability, and

associated accession codes are available at https://doi.org/10.1038/s41587-019-0041-2.

Received: 21 May 2018; Accepted: 22 January 2019; Published: xx xx xxxx

references 1. Lister, R. et al. Global epigenomic reconfiguration during mammalian brain

development. Science 341, 1237905 (2013). 2. Raiber, E.-A., Hardisty, R., van Delft, P. & Balasubramanian, S. Mapping

and elucidating the function of modified bases in DNA. Nat. Rev. Chem. 1, 0069 (2017).

3. Yu, M. et al. Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome. Cell 149, 1368–1380 (2012).

4. Booth, M. J. et al. Quantitative sequencing of 5-methylcytosine and 5-hydroxymethylcytosine at single-base resolution. Science 336, 934–937 (2012).

5. Tanaka, K. & Okamoto, A. Degradation of DNA by bisulfite treatment. Bioorg. Med. Chem. Lett. 17, 1912–1915 (2007).

6. Olova, N. et al. Comparison of whole-genome bisulfite sequencing library preparation strategies identifies sources of biases affecting DNA methylation data. Genome Biol. 19, 33 (2018).

7. Li, E. & Zhang, Y. DNA methylation in mammals. Cold Spring Harb. Perspect. Biol. 6, a019133 (2014).

8. Tahiliani, M. et al. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science 324, 930–935 (2009).

9. Ito, S. et al. Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine. Science 333, 1300–1303 (2011).

10. Gal-Yam, E. N., Saito, Y., Egger, G. & Jones, P. A. Cancer epigenetics: modifications, screening, and therapy. Annu. Rev. Med. 59, 267–280 (2008).

11. Vasanthakumar, A. & Godley, L. A. 5-hydroxymethylcytosine in cancer: significance in diagnosis and therapy. Cancer Genet. 208, 167–177 (2015).

12. Chan, K. C. et al. Noninvasive detection of cancer-associated genome-wide hypomethylation and copy number aberrations by plasma DNA bisulfite sequencing. Proc. Natl Acad. Sci. USA 110, 18761–18768 (2013).

5,331,989 741,648

TA

PS

WG

BS

32,428,076

0.00

0.25

0.50

0.75

1.00

1.00

TAPS

8 × 105

6 × 105

4 × 105

2 × 105

2 × 106

×107

Mod

ifica

tion

(%)

chr4

WGBSTAPS

–4 kb +4 kbCGI7

8

9

10

11

12

13

14

15a

c

b

e

d

Ave

rage

cov

erag

e

WGBSampled)TAPS (down-sa

TA

PS

WG

BS

29,075,696

4,566,155 1,156,610

0 0.25 0.50 0.75

WG

BS

0 2 4 6 8 10 12 14

100

80

60

40

20

0

Fig. 4 | Comparison of genome-wide methylome measurements by TaPS and WGBS. a, Average sequencing coverage depth of all mouse CpG islands (CGI, binned into 20 windows) and 4 kb flanking regions (binned into 50 equally sized windows). To account for differences in sequencing depth, all mapped TAPS reads were down-sampled to match the median coverage of WGBS across the genome. b, CpG sites covered by at least three reads by TAPS alone, both TAPS and WGBS, or WGBS alone. c, Example of the chromosomal distribution of modification levels (in %) for TAPS and WGBS. Average fraction of modified CpGs per 100 kb windows along mouse chromosome 4, smoothed using a Gaussian-weighted moving average filter with window size 10. d, Number of CpG sites covered by at least three reads and modification level ≥0.1 detected by TAPS alone, TAPS and WGBS, or WGBS alone. e, Heatmap representing the number of CpG sites covered by at least three reads in both TAPS and WGBS, broken down by modification levels as measured by each method. To improve contrast, the first bin, containing CpGs unmodified in both methods, was excluded from the color scale and is denoted by a star.

NaTure BioTeChNoLoGY | www.nature.com/naturebiotechnology

Page 6: B-fr 5-ylcyt 5-ydroxymethylcyt esolutionstatic.tongtianta.site/paper_pdf/b2d79d30-b836-11e... · Paulina Siejka-Zielińska *e-mail: benjamin.schuster-boeckler@ludwig.ox.ac.uk; chunxiao.song@ludwig.ox.ac.uk

Letters Nature BiotechNology

13. Song, C. X. et al. 5-Hydroxymethylcytosine signatures in cell-free DNA provide information about tumor types and stages. Cell Res. 27, 1231–1242 (2017).

14. Xia, B. et al. Bisulfite-free, base-resolution analysis of 5-formylcytosine at the genome scale. Nat. Methods 12, 1047–1050 (2015).

15. Zhu, C. et al. Single-cell 5-formylcytosine landscapes of mammalian early embryos and ESCs at single-base resolution. Cell Stem Cell 20, 720–731 e725 (2017).

16. Lu, X., Zhao, B. S. & He, C. TET family proteins: oxidation activity, interacting molecules, and functions in diseases. Chem. Rev. 115, 2225–2239 (2015).

17. He, Y. F. et al. Tet-mediated formation of 5-carboxylcytosine and its excision by TDG in mammalian DNA. Science 333, 1303–1307 (2011).

18. Sato, S., Sakamoto, T., Miyazawa, E. & Kikugawa, Y. One-pot reductive amination of aldehydes and ketones with alpha-picoline-borane in methanol, in water, and in neat conditions. Tetrahedron 60, 7899–7906 (2004).

19. Liu, J. & Doetsch, P. W. Escherichia coli RNA and DNA polymerase bypass of dihydrouracil: mutagenic potential via transcription and replication. Nucleic Acids Res. 26, 1707–1712 (1998).

20. Song, C. X. et al. Genome-wide profiling of 5-formylcytosine reveals its roles in epigenetic priming. Cell 153, 678–691 (2013).

21. Lu, X. et al. Chemical modification-assisted bisulfite sequencing (CAB-Seq) for 5-carboxylcytosine detection in DNA. J. Am. Chem. Soc. 135, 9315–9317 (2013).

22. Pais, J. E. et al. Biochemical characterization of a Naegleria TET-like oxygenase and its application in single molecule sequencing of 5-methylcytosine. Proc. Natl Acad. Sci. USA 112, 4316–4321 (2015).

23. Incarnato, D., Krepelova, A. & Neri, F. High-throughput single nucleotide variant discovery in E14 mouse embryonic stem cells provides a new reference genome assembly. Genomics 104, 121–127 (2014).

24. Holmes, E. E. et al. Performance evaluation of kits for bisulfite-conversion of DNA from tissues, cell lines, FFPE tissues, aspirates, lavages, effusions, plasma, serum, and urine. PLoS ONE. 9, e93933 (2014).

25. Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).

26. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

27. Illumina. Illumina Whole-genome Bisulfite Sequencing on the HiSeq 3000/HiSeq 4000 Systems https://www.illumina.com/content/dam/illumina-marketing/documents/products/appnotes/hiseq3000-hiseq4000-wgbs-application-note-770-2015-052.pdf (Illumina, 2016).

28. Wen, L. et al. Whole-genome analysis of 5-hydroxymethylcytosine and 5-methylcytosine at base resolution in the human brain. Genome Biol. 15, R49 (2014).00

29. Flusberg, B. A. et al. Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat. Methods 7, 461–465 (2010).

30. Jain, M. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol. 36, 338–345 (2018).

31. Song, C. X. et al. Selective chemical labeling reveals the genome-wide distribution of 5-hydroxymethylcytosine. Nat. Biotechnol. 29, 68–72 (2011).

32. Schutsky, E. K. et al. Nondestructive, base-resolution sequencing of 5-hydroxymethylcytosine using a DNA deaminase. Nat. Biotechnol. 36, 1083–1090 (2018).

33. Smallwood, S. A. et al. Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity. Nat. Methods 11, 817–820 (2014).

34. Luo, C. et al. Single-cell methylomes identify neuronal subtypes and regulatory elements in mammalian cortex. Science 357, 600–604 (2017).

acknowledgementsWe would like to acknowledge P. Spingardi, G. Berridge and B. Kessler for helping with the HPLC–MS/MS; P. Brennan and G.F. Ruda for helping with the NMR; T. Brown and A. H. El-Sagheer for the DNA synthesis; F. Howe, S. Kriaucionis and C. Goding for critical reading of this manuscript. This work was supported by the Ludwig Institute for Cancer Research. The C.-X.S. laboratory is also supported by Cancer Research UK (grant nos. C63763/A26394 and C63763/A27122), NIHR Oxford Biomedical Research Centre and Conrad N. Hilton Foundation. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health. F.Y., L.C. and Y.B. are supported by China Scholarship Council.

author contributionsY.L. and C.-X.S. conceived the study and designed the experiments. Y.L. and P.S.-Z. performed the experiments with help from Y.B., F.Y., C.B. and L.C. G.V. and B.S.-B. developed the processing software. G.V., M.T. and B.S.-B. analyzed data. Y.L., B.S.-B. and C.-X.S. wrote the manuscript.

Competing interestsA patent application has been filed by Ludwig Institute for Cancer Research Ltd for the technology disclosed in this publication.

additional informationSupplementary information is available for this paper at https://doi.org/10.1038/s41587-019-0041-2.

Reprints and permissions information is available at www.nature.com/reprints.

Correspondence and requests for materials should be addressed to B.S. or C.-X.S.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© The Author(s), under exclusive licence to Springer Nature America, Inc. 2019

NaTure BioTeChNoLoGY | www.nature.com/naturebiotechnology

Page 7: B-fr 5-ylcyt 5-ydroxymethylcyt esolutionstatic.tongtianta.site/paper_pdf/b2d79d30-b836-11e... · Paulina Siejka-Zielińska *e-mail: benjamin.schuster-boeckler@ludwig.ox.ac.uk; chunxiao.song@ludwig.ox.ac.uk

LettersNature BiotechNology

MethodsPreparation of model DNA. Regular DNA oligonucleotides (and PCR primers) were purchased from IDT. The DNA oligonucleotide with 5caC modification was synthesized using Expedite 8900 Nucleic Acid Synthesis System (Thermo Fisher). The DNA oligonucleotide with DHU modification was synthesized using ABI model 394 DNA/RNA synthesizer (Thermo Fisher). Detailed protocols and sequences for all model DNAs are provided in Supplementary Note 2.

Preparation of spike-in DNAs used in sequencing. Fully CpG-methylated λ-DNA was prepared by methylation of unmethylated λ-DNA (Promega) with M.SssI enzyme (NEB) as described previously35. A 2 kb unmodified amplicon was PCR amplified from the pNIC28-Bsa4 plasmid (Addgene, catalog no. 26103). Synthetic oligonucleotide with N5mCNN and N5hmCNN sequences was produced by an annealing and extension method. Detailed protocols and sequences for all spike-in sequences used in sequencing can be found in Supplementary Note 3.

DNA digestion and HPLC–MS/MS analysis. DNA samples were digested with 2 U of Nuclease P1 (Sigma-Aldrich) and 10 nM deaminase inhibitor erythro-9-amino-β-hexyl-α-methyl-9H-purine-9-ethanol hydrochloride (Sigma-Aldrich). After overnight incubation at 37 °C, the samples were further treated with 6 U of alkaline phosphatase (Sigma-Aldrich) and 0.5 U of phosphodiesterase I (Sigma-Aldrich) for 3 h at 37 °C. The digested DNA solution was filtered with Amicon Ultra-0.5 ml 10K centrifugal filters (Merck Millipore) to remove the proteins and subjected to HPLC–MS/MS analysis.

The HPLC–MS/MS analysis was carried out with 1290 Infinity LC Systems (Agilent) coupled with a 6495B Triple Quadrupole Mass Spectrometer (Agilent). A ZORBAX Eclipse Plus C18 column (2.1 × 150 mm2, 1.8-μm, Agilent) was used. The column temperature was maintained at 40 °C, and the solvent system was water containing 10 mM ammonium acetate (pH 6.0, solvent A) and water–acetonitrile (60/40, v/v, solvent B) with 0.4 ml min−1 flow rate. The gradient was: 0–5 min, 0% solvent B; 5–8 min, 0–5.63% solvent B; 8–9 min, 5.63% solvent B; 9–16 min, 5.63–13.66% solvent B; 16–17 min, 13.66–100% solvent B; 17–21 min, 100% solvent B; 21–24.3 min, 100-0% solvent B; 24.3-25 min, 0% solvent B. The dynamic multiple reaction monitoring mode of the mass spectrometer was used for quantification. The source-dependent parameters were as follows: gas temperature 230 °C, gas flow 14 l min−1, nebulizer 40 p.s.i., sheath gas temperature 400 °C, sheath gas flow 11 l min−1, capillary voltage 1,500 V in the positive ion mode, nozzle voltage 0 V, high-pressure RF 110 V and low pressure RF 80 V, both in the positive ion mode. The fragmentor voltage was 380 V for all compounds, while other compound-dependent parameters are summarized in Supplementary Table 4.

Expression and purification of NgTET1. The pRSET-A plasmid encoding His-tagged NgTET1 protein (GG739552.1) was designed and purchased from Invitrogen. Protein was expressed in Escherischia coli BL21 (DE3) bacteria and purified as previously described with some modifications22. Briefly, for protein expression, bacteria from overnight small-scale culture were grown in LB medium at 37 °C and 200 r.p.m. until the absorbance A600 was between 0.7 and 0.8. Then cultures were cooled down to room temperature and target protein expression was induced with 0.2 mM isopropyl-β-d-1-thiogalactopyranoside. Cells were maintained for an additional 18 h at 18 °C and 180 r.p.m. Subsequently, cells were harvested and re-suspended in the buffer containing 20 mM HEPES (pH 7.5), 500 mM NaCl, 1 mM dithiothreitol, 20 mM imidazole, 1 µg ml−1 leupeptin, 1 µg ml−1 pepstatin A and 1 mM PMSF. Cells were broken with EmulsiFlex-C5 high-pressure homogenizer and lysate was clarified by centrifugation for 1 h at 30,000g and 4 °C. Collected supernatant was loaded on Ni-NTA resins and NgTET1 protein was eluted with buffer containing 20 mM HEPES (pH 7.5), 500 mM imidazole, 2 M NaCl, 1 mM dithiothreitol. Collected fractions were then purified on HiLoad 16/60 Sdx 75 (20 mM HEPES pH 7.5, 2 M NaCl, 1 mM dithiothreitol). Fractions containing NgTET1 were then collected, buffer exchanged to the buffer containing 20 mM HEPES (pH 7.0), 10 mM NaCl, 1 mM dithiothreitol and loaded on HiTrap HP SP column. Pure protein was eluted with the salt gradient, collected and buffer exchanged to the final buffer containing 20 mM Tris–Cl (pH 8.0), 150 mM NaCl and 1 mM dithiothreitol. Protein was then concentrated up to 130 µM, mixed with glycerol (30% v/v) and aliquots were stored at −80 °C.

Expression and purification of mTet1CD. The mTet1CD catalytic domain (NM_001253857.2, 4371-6392) with N-terminal Flag-tag was cloned into pcDNA3-Flag between the KpnI and BamH1 restriction sites. The sequence of the plasmid is provided in Supplementary Note 4. For protein expression, 1 mg plasmid was transfected into 1 l of Expi293F cell culture at density 1 × 106 cells ml−1 and cells were grown for 48 h at 37 °C, 170 r.p.m. and 5% CO2. Subsequently, cells were harvested by centrifugation, re-suspended in the lysis buffer containing 50 mM Tris–Cl pH 7.5, 500 mM NaCl, 1× cOmplete Protease Inhibitor Cocktail, 1 mM PMSF, 1% Triton X-100 and incubated on ice for 20 min. Cell lysate was then clarified by centrifugation for 30 min at 30,000g and 4 °C. Collected supernatant was purified on ANTI-FLAG M2 Affinity Gel and pure protein was eluted with buffer containing 20 mM HEPES pH 8.0, 150 mM NaCl, 0.1 mg ml−1 3× Flag peptide, 1× cOmplete Protease Inhibitor Cocktail, 1 mM PMSF. Collected fractions were concentrated and buffer exchanged to the final buffer containing 20 mM

HEPES pH 8.0, 150 mM NaCl and 1 mM dithiothreitol. Concentrated protein was mixed with glycerol (30% v/v), frozen in liquid nitrogen and aliquots were stored at −80 °C. The detailed protocol is described in Supplementary Note 5.

NgTET1 oxidation. Genomic DNA (500 ng) was incubated in 50 µl solution containing 50 mM MOPS buffer (pH 6.9), 100 µM ammonium iron(II) sulfate, 1 mM α-ketoglutarate, 2 mM ascorbic acid, 1 mM dithiothreitol, 50 mM NaCl and 5 μM NgTET1 for 1 h at 37 °C. After that, 4 U of Proteinase K (New England Biolabs) were added to the reaction mixture and incubated for 30 min at 37 °C. The product was cleaned up on 1.8× Ampure XP beads following the manufacturer’s instruction.

mTet1 oxidation. Genomic DNA (100 ng) was incubated in a 50 µl reaction containing 50 mM HEPES buffer (pH 8.0), 100 µM ammonium iron(II) sulfate, 1 mM α-ketoglutarate, 2 mM ascorbic acid, 1 mM dithiothreitol, 100 mM NaCl, 1.2 mM ATP and 4 μM mTet1CD for 80 min at 37 °C. After that, 0.8 U of Proteinase K (New England Biolabs) were added to the reaction mixture and incubated for 1 h at 50 °C. The product was cleaned up on Bio-Spin P-30 Gel Column (Bio-Rad) and 1.8× Ampure XP beads following the manufacturer’s instruction.

Pic-borane reduction. 2-Picoline-borane (100 mg) (pic-borane, Sigma-Aldrich) was dissolved in 187 µl of DMSO to give around a 3.26 M solution. For each reaction, 25 µl of pic-borane solution and 5 µl of 3 M sodium acetate solution (pH 5.2, Thermo Fisher) were added into 20 µl of DNA sample and incubated for 3 h at 70 °C and 850 r.p.m. in an Eppendorf ThermoMixer. The product was purified by Zymo-Spin column for genomic DNA or by Micro Bio-Spin 6 Columns (Bio-Rad) for DNA oligonucleotides following the manufacturer’s instructions.

Pyridine borane reduction. Oxidized DNA (50–100 ng) in 35 µl of water was reduced in a 50 µl reaction containing 600 mM sodium acetate solution (pH 4.3) and 1 M pyridine borane for 16 h at 37 °C and 850 r.p.m. in an Eppendorf ThermoMixer. The product was purified using Zymo-Spin columns.

5hmC blocking. 5hmC blocking was performed in a 20 µl solution containing 50 mM HEPES buffer (pH 8), 25 mM MgCl2, 200 µM UDP-Glc (New England Biolabs) and 10 U of β-glucosyltransferase (Thermo Fisher), and 10 μM 5hmC DNA oligonucleotide for 1 h at 37 °C. The product was purified using Micro Bio-Spin 6 Columns following the manufacturer’s instructions.

5hmC oxidation. 5hmC DNA oligonucleotide (46 μl) was denatured with 2.5 μl of 1 M NaOH for 30 min at 37 °C in a shaking incubator, then oxidized with 1.5 μl of solution containing 50 mM NaOH and 15 mM potassium perruthenate (KRuO4, Sigma-Aldrich) for 1 h on ice. The product was purified using Micro Bio-Spin 6 Columns following the manufacturer’s instructions.

5fC blocking. 5fC blocking was performed in 100 mM MES buffer (pH 5.0), 10 mM O-ethylhydroxylamine (Sigma-Aldrich), and 10 μM 5fC DNA oligonucleotide for 2 h at 37 °C. The product was purified using Micro Bio-Spin 6 Columns following the manufacturer’s instructions.

5caC blocking. 5caC blocking was performed in 75 mM MES buffer (pH 5.0), 20 mM N-hydroxysuccinimide (Sigma-Aldrich), 20 mM 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride (EDC, Fluorochem), and 10 μM 5caC DNA oligonucleotide for 0.5 h at 37 °C. The buffer was then exchanged to 100 mM sodium phosphate (pH 7.5), 150 mM NaCl using Micro Bio-Spin 6 Columns (Bio-Rad) following the manufacturer’s instructions. Ethylamine (10 mM) (Sigma-Aldrich) was added to the oligonucleotide and incubated for 1 h at 37 °C. The product was purified using Micro Bio-Spin 6 Columns following the manufacturer’s instructions.

Validation of TAPS conversion with TaqαI assay. The converted DNA sample was PCR amplified by Taq DNA Polymerase (New England Biolabs) with the corresponding primers. The PCR product was incubated with 4 units of TaqαI restriction enzyme (New England Biolabs) in 1× CutSmart buffer (New England Biolabs) for 30 min at 65 °C and visualized by 2% agarose gel electrophoresis. A detailed protocol is described in Supplementary Note 6.

Quantitative polymerase chain reaction (qPCR) assay. The required amount of DNA sample was added into 19 μl of PCR master mix containing 1 × Fast SYBRGreen Master Mix (Thermo Fisher), 200 nM of forward and reverse primers. For PCR amplification, an initial denaturation step was performed for 20 s at 95 °C, followed by 40 cycles of 3 s denaturation at 95 °C, 20 s annealing and elongation at 60 °C.

Sanger sequencing. The PCR product was purified by either Exonuclease I (New England Biolabs) and Shrimp Alkaline Phosphatase (New England Biolabs) or Zymo-Spin columns and then processed for Sanger sequencing.

DNA damage test on fragments with different length. mESC genomic DNA was spiked with 0.5% of CpG-methylated λ-DNA and left unfragmented or

NaTure BioTeChNoLoGY | www.nature.com/naturebiotechnology

Page 8: B-fr 5-ylcyt 5-ydroxymethylcyt esolutionstatic.tongtianta.site/paper_pdf/b2d79d30-b836-11e... · Paulina Siejka-Zielińska *e-mail: benjamin.schuster-boeckler@ludwig.ox.ac.uk; chunxiao.song@ludwig.ox.ac.uk

Letters Nature BiotechNology

sonicated with a Covaris M220 instrument and size-selected to 500–1 kb or 1–3 kb using Ampure XP beads. DNA (200 ng) was single-oxidized with mTet1CD and reduced with pyridine borane complex as described above or converted with EpiTect Bisulfite Kit (Qiagen) according to manufacturer’s protocol. DNA (10 ng) before and after TAPS and bisulfite conversion was run on a 1% agarose gel. To visualize bisulfite conversion, gel was cooled down for 10 min in an ice bath. 5mC conversion in TAPS samples was tested by TaqαI digestion assay as described in Supplementary Note 6.

mESCs culture and isolation of genomic DNA. E14 mESCs (gift from S. Kriaucionis) were cultured on gelatin-coated plates in DMEM (Invitrogen) supplemented with 15% FBS (Gibco), 2 mM l-glutamine (Gibco), 1% non-essential amino acids (Gibco), 1% penicillin/streptavidin (Gibco), 0.1 mM β-mercaptoethanol (Sigma), 1,000 units ml−1 leukemia inhibitory factor (Millipore), 1 µM PD0325901 (Stemgent) and 3 µM CHIR99021 (Stemgent). Cultures were maintained at 37 °C and 5% CO2 and passaged every 2 d.

For isolation of genomic DNA, cells were harvested by centrifugation for 5 min at 1,000g and room temperature. DNA was extracted with Quick-DNA Plus kit (Zymo Research) according to the manufacturer’s protocol.

Preparation of mESC genomic DNA for TAPS and WGBS. For WGBS, mESC genomic DNA was spiked with 0.5% of unmethylated λ-DNA. For whole-genome TAPS, mESC genomic DNA was spiked with 0.5% of methylated λ-DNA and 0.025% of unmodified 2 kb spike-in control. DNA samples were fragmented by Covaris M220 instrument and size-selected to 200–400 bp using Ampure XP beads. DNA for TAPS was additionally spiked with 0.25% of N5mCNN and N5hmCNN control oligonucleotides after size-selection with Ampure XP beads.

WGBS. For WGBS, 200 ng of fragmented mESC genomic DNA with 0.5% of unmethylated λ-DNA were used. End-repair and A-tailing reaction and ligation of methylated adapter (NextFlex) were prepared with KAPA HyperPlus kit (Kapa Biosystems) according to manufacturer’s protocol. Subsequently, DNA underwent bisulfite conversion with EpiTect Bisulfite Kit (Qiagen) according to Illumina’s protocol36. The final library was amplified with KAPA Hifi Uracil Plus Polymerase (Kapa Biosystems) for 6 cycles and cleaned up on 1× Ampure XP beads. The WGBS sequencing library was paired-end 80 bp sequenced on a NextSeq 500 sequencer (Illumina) using one NextSeq High Output kit with 15% PhiX control library spike-in.

Whole-genome TAPS. For whole-genome TAPS, 100 ng of fragmented mESC genomic DNA spiked with 0.5% of methylated λ-DNA and 0.025% of unmodified 2 kb DNA control were used. End-repair and A-tailing reaction and ligation of Illumina Multiplexing adapters were prepared with KAPA HyperPlus kit according to the manufacturer’s protocol. Ligated DNA was oxidized with mTet1CD twice and then reduced with pyridine borane according to the protocols described above. The final sequencing library was amplified with KAPA Hifi Uracil Plus Polymerase for 5 cycles and cleaned up on 1× Ampure XP beads. The whole-genome TAPS sequencing library was paired-end 80 bp sequenced on a NextSeq 500 sequencer (Illumina) using one NextSeq High Output kit with 1% PhiX control library spike-in. A more detailed step-by-step protocol is provided in Supplementary Note 7.

Low-input whole-genome TAPS with double-stranded DNA library preparation kits. mESC genomic DNA prepared as described above for whole-genome TAPS was used for low-input whole-genome TAPS. Briefly, samples containing 100 ng, 10 ng and 1 ng of mESC genomic DNA were used for an end-repair and A-tailing reaction and ligated to Illumina Multiplexing adapters with KAPA HyperPlus kit according to the manufacturer’s protocol. Ligated samples were then oxidized with mTet1CD once and then reduced with pyridine borane according to the protocols described above. Converted libraries were amplified with KAPA Hifi Uracil Plus Polymerase for 5 cycles (100 ng), 8 cycles (10 ng) and 13 cycles (1 ng) and cleaned up on 1× Ampure XP beads.

Cell-free DNA TAPS. Cell-free DNA TAPS samples were prepared from 10 ng and 1 ng of cell-free DNA sample as described above for whole-genome TAPS. Briefly, cell-free DNA samples were used for an end-repair and A-tailing reaction and ligated to Illumina Multiplexing adapters with KAPA HyperPlus kit according to the manufacturer’s protocol. Ligated samples were then oxidized with mTet1CD once and then reduced with pyridine borane according to the protocols described above. Converted libraries were amplified with KAPA Hifi Uracil Plus Polymerase for 7 cycles (10 ng), 13 cycles (1 ng) and cleaned up on 1× Ampure XP beads.

WGBS data processing. Paired-end reads were downloaded as FASTQ files from Illumina BaseSpace and subsequently quality trimmed with Trim Galore! v0.4.4 (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). Read pairs where at least one read was shorter than 35 bp after trimming were removed. Trimmed reads were mapped to a genome combining the mm9 version of the mouse genome, λ-phage and PhiX (sequence from Illumina iGENOMES) using Bismark v0.19 and the --no_overlap option25. The ‘three-C’ filter was used to remove reads with excessive non-conversion rates. PCR duplicates were called

using Picard v1.119 MarkDuplicates (http://broadinstitute.github.io/picard/). Regions known to be prone to mapping artifacts were downloaded (https://sites.google.com/site/anshulkundaje/projects/blacklists) and excluded from further analysis37.

TAPS data pre-processing. Paired-end reads were downloaded from Illumina BaseSpace and subsequently quality trimmed with Trim Galore! v0.4.4. Read pairs where at least one read was shorter than 35 bp after trimming were removed. Trimmed reads were mapped to a genome combining spike-in sequences, PhiX (sequence from Illumina iGENOMES) and the mm9 version of the mouse genome using BWA mem v.0.7.12 (ref. 26) with default parameters. Regions known to be prone to mapping artifacts were downloaded (https://sites.google.com/site/anshulkundaje/projects/blacklists) and excluded from further analysis37.

Detection of converted bases in TAPS. Aligned reads were split into original top and original bottom strands using a custom python3 script (MF-filter.py). PCR duplicates were then removed with Picard MarkDuplicates on the original top and original bottom strands separately. Overlapping segments in read pairs were removed using BamUtil clipOverlap (https://github.com/statgen/bamUtil) on the deduplicated, mapped original top and original bottom reads separately. Modified bases were then detected using only uniquely mapped reads (MAPQ > 0) using samtools mpileup and a custom python3 script (MF-caller_MOD.py).

Sequencing quality analysis of TAPS and WGBS. Quality score statistics per nucleotide type were extracted and visualized from original FASTQ files as downloaded from Illumina BaseSpace with a python3 script (MF-phredder.py).

Coverage analysis of TAPS and WGBS. Per-base genome coverage files were generated with Bedtools v2.25 genomecov38. To compare the relative coverage distributions between TAPS and WGBS, TAPS reads were subsampled to the corresponding coverage median in WGBS using the –s option of samtools view. In the analyses comparing coverage in WGBS and subsampled TAPS, clipOverlap was used on both TAPS and WGBS bam files.

Analysis of cytosine modifications measured by TAPS and WGBS. The fraction of modified reads per base was calculated using Bismark and MF-caller_MOD.py, respectively. Genetic variants at CpG sites specific to the E14 cell line were excluded from the analysis23. Bedtools intersect, statistical analyses and figures were generated in python3, R and Matlab. Genomic regions were visualized using IGV v2.4.639. To plot the coverage and modification levels around CGIs, all CGI coordinates for mm9 were downloaded from the UCSC genome browser, binned into 20 windows, and extended by up to 50 windows of size 80 bp on both sides (as long as they did not reach half the distance to the next CGI). Average modification levels (in CpGs) and coverage (in all bases, both strands) in each bin were computed using bedtools map. The values for each bin were again averaged and subsequently plotted in Matlab.

Data processing time simulation. Synthetic pair-end sequencing reads were simulated using ART40 on the basis of the λ-phage genome (with parameters -p -ss NS50 --errfree --minQ 15 -k 0 -nf 0 -l 75 -c 1000000 -m 240 -s 0 -ir 0 -ir2 0 -dr 0 -dr2 0 -sam -rs 10). A total of 50% of all CpG positions were subsequently marked as modified and two libraries were produced, either as TAPS (convert modified bases) or as WGBS (convert unmodified bases), using a custom python3 script. The reads were then processed following the pipeline used for each of the methods in the paper. Processing time was measured with Linux command time. All steps of the analysis were performed in single-threaded mode on one Intel Xeon CPU with 250 GB of memory.

Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Code availabilityThe software used to process TAPS data can be downloaded from https://bitbucket.org/bsblabludwig/astair.

Data availabilityAll sequencing data are available through GEO Series accession code GSE112520. All relevant additional data have been published with the manuscript, either as part of the main text or in the Supplementary Information.

references 35. Wu, H., Wu, X. & Zhang, Y. Base-resolution profiling of active DNA

demethylation using MAB-seq and caMAB-seq. Nat. Protoc. 11, 1081–1100 (2016).

36. Illumina. Whole-genome Bisulfite Sequencing for Methylation Analysis https://support.illumina.com/content/dam/illumina-support/documents/documentation/chemistry_documentation/samplepreps_legacy/WGBS_for_Methylation_Analysis_Guide_15021861_B.pdf (Illumina, 2015).

NaTure BioTeChNoLoGY | www.nature.com/naturebiotechnology

Page 9: B-fr 5-ylcyt 5-ydroxymethylcyt esolutionstatic.tongtianta.site/paper_pdf/b2d79d30-b836-11e... · Paulina Siejka-Zielińska *e-mail: benjamin.schuster-boeckler@ludwig.ox.ac.uk; chunxiao.song@ludwig.ox.ac.uk

LettersNature BiotechNology

37. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

38. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

39. Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).

40. Huang, W., Li, L., Myers, J. R. & Marth, G. T. ART: a next-generation sequencing read simulator. Bioinformatics 28, 593–594 (2012).

NaTure BioTeChNoLoGY | www.nature.com/naturebiotechnology

Page 10: B-fr 5-ylcyt 5-ydroxymethylcyt esolutionstatic.tongtianta.site/paper_pdf/b2d79d30-b836-11e... · Paulina Siejka-Zielińska *e-mail: benjamin.schuster-boeckler@ludwig.ox.ac.uk; chunxiao.song@ludwig.ox.ac.uk

1

nature research | reporting summ

aryApril 2018

Corresponding author(s): Benjamin Schuster-Boeckler, Chun-Xiao Song

Reporting SummaryNature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.

Statistical parametersWhen statistical analyses are reported, confirm that the following items are present in the relevant location (e.g. figure legend, table legend, main text, or Methods section).

n/a Confirmed

The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement

An indication of whether measurements were taken from distinct samples or whether the same sample was measured repeatedly

The statistical test(s) used AND whether they are one- or two-sided Only common tests should be described solely by name; describe more complex techniques in the Methods section.

A description of all covariates tested

A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons

A full description of the statistics including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)

For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted Give P values as exact values whenever suitable.

For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings

For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes

Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Clearly defined error bars State explicitly what error bars represent (e.g. SD, SE, CI)

Our web collection on statistics for biologists may be useful.

Software and codePolicy information about availability of computer code

Data collection Illumina BaseSpace

Data analysis Paired-end reads were download as FASTQ from Illumina BaseSpace and subsequently quality-trimmed with Trim Galore! v0.4.4 (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). Statistical analyses were performed in R and Matlab. The details are described in the Methods section. Custom software used to process TAPS data is available from https://bitbucket.org/bsblabludwig/astair

For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers upon request. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Page 11: B-fr 5-ylcyt 5-ydroxymethylcyt esolutionstatic.tongtianta.site/paper_pdf/b2d79d30-b836-11e... · Paulina Siejka-Zielińska *e-mail: benjamin.schuster-boeckler@ludwig.ox.ac.uk; chunxiao.song@ludwig.ox.ac.uk

2

nature research | reporting summ

aryApril 2018

DataPolicy information about availability of data

All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: - Accession codes, unique identifiers, or web links for publicly available datasets - A list of figures that have associated raw data - A description of any restrictions on data availability

All sequencing data are available through GEO Series accession number GSE112520 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE112520, enter token cbkhokaqhraxpqx).

Field-specific reportingPlease select the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences

For a reference copy of the document with all sections, see nature.com/authors/policies/ReportingSummary-flat.pdf

Life sciences study designAll studies must disclose on these points even when the disclosure is negative.

Sample size No sample-size calculation was performed

Data exclusions No data were excluded from the analyses

Replication Yes, as described in figure legends and Methods.

Randomization No randomization was used for samples

Blinding Investigators were not blinded to sample allocation during data collection

Reporting for specific materials, systems and methods

Materials & experimental systemsn/a Involved in the study

Unique biological materials

Antibodies

Eukaryotic cell lines

Palaeontology

Animals and other organisms

Human research participants

Methodsn/a Involved in the study

ChIP-seq

Flow cytometry

MRI-based neuroimaging

Eukaryotic cell linesPolicy information about cell lines

Cell line source(s) Mouse embryonic stem cells E14 were gifted from Professor Skirmantas Kriaucionis. Expi293F™ cells were gifted from Dr Kilian Huber.

Authentication Mouse embryonic stem cells E14 and Expi293F™ cells were not authenticated.

Mycoplasma contamination The mouse embryonic stem cells were negative for mycoplasma test. Expi293F™ cells were not tested for mycoplasma contamination.

Commonly misidentified lines(See ICLAC register)

No commonly misidentified cell lines were used.