Proceedings of the National Academy of Sciences of the United States of America | 2021

Genome-wide detection of cytosine methylation by single molecule real-time sequencing


Abstract


Significance: Single molecule real-time (SMRT) sequencing theoretically offers the opportunity to directly assess certain base modifications of native DNA molecules without any prior chemical/enzymatic conversions and PCR amplification, using kinetic signals of a DNA polymerase. However, the kinetic signal changes caused by 5mC modification are extremely subtle. Hence, the robust genome-wide measurement of 5mC modification has not been achieved. We enhanced 5mC detection using SMRT sequencing by holistically analyzing kinetic signals of a DNA polymerase and sequence context for every base within a measurement window. We employed a convolutional neural network to train a methylation classification model, leading to genome-wide 5mC detection. The sensitivity and specificity reached 90% and 94%, with a 99% correlation of overall methylation level with bisulfite sequencing.

5-Methylcytosine (5mC) is an important type of epigenetic modification. Bisulfite sequencing (BS-seq) has limitations, such as severe DNA degradation. Using single molecule real-time sequencing, we developed a methodology to directly examine 5mC. This approach holistically examined kinetic signals of a DNA polymerase (including interpulse duration and pulse width) and sequence context for every nucleotide within a measurement window, termed the holistic kinetic (HK) model. The measurement window of each analyzed double-stranded DNA molecule comprised 21 nucleotides with a cytosine in a CpG site in the center. We used amplified DNA (unmethylated) and M.SssI-treated DNA (methylated) (M.SssI being a CpG methyltransferase) to train a convolutional neural network. The area under the curve for differentiating methylation states using such samples was up to 0.97. The sensitivity and specificity for genome-wide 5mC detection at single-base resolution reached 90% and 94%, respectively. The HK model was then tested on human–mouse hybrid fragments in which each member of the hybrid had a different methylation status. The model was also tested on human genomic DNA molecules extracted from various biological samples, such as buffy coat, placental, and tumoral tissues. The overall methylation levels deduced by the HK model were well correlated with those by BS-seq (r = 0.99; P < 0.0001) and allowed the measurement of allele-specific methylation patterns in imprinted genes. Taken together, this methodology has provided a system for simultaneous genome-wide genetic and epigenetic analyses.
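The 21-nucleotide measurement window described in the abstract can be illustrated with a minimal sketch. It assumes per-base interpulse duration and pulse width values are already available as lists aligned to one strand of the read. The names `window_features`, `one_hot`, and `FLANK`, and the 21 x 6 feature layout (four sequence-context indicators plus the two kinetic values), are hypothetical choices made for illustration and are not taken from the paper. The paper analyzes double-stranded molecules, so the opposite strand would need the same treatment.

```python
from typing import List, Optional

BASES = "ACGT"
FLANK = 10  # 10 nt on each side of the central cytosine gives a 21-nt window


def one_hot(base: str) -> List[float]:
    """Encode a base as a 4-element indicator vector (all zeros for unknown bases)."""
    return [1.0 if base == b else 0.0 for b in BASES]


def window_features(
    seq: str, ipd: List[float], pw: List[float], center: int
) -> Optional[List[List[float]]]:
    """Build a 21 x 6 feature matrix centered on a CpG cytosine.

    Each row holds the one-hot sequence context of one nucleotide followed by
    its interpulse duration (IPD) and pulse width (PW). Returns None when
    `center` is not the C of a CpG site or the window runs off the read.
    Illustrative layout only; not the authors' published encoding.
    """
    if seq[center:center + 2] != "CG":
        return None
    start, end = center - FLANK, center + FLANK + 1
    if start < 0 or end > len(seq):
        return None
    return [one_hot(seq[i]) + [ipd[i], pw[i]] for i in range(start, end)]


# Toy read: 10 flanking bases, a CpG, then 9 more bases (21 nt in total).
seq = "A" * 10 + "CG" + "T" * 9
ipd = [0.5] * len(seq)  # made-up kinetic values for demonstration
pw = [0.2] * len(seq)
matrix = window_features(seq, ipd, pw, center=10)
```

A matrix of this kind, one per CpG site, is the sort of input a convolutional neural network could be trained on, with windows from amplified DNA labeled unmethylated and windows from M.SssI-treated DNA labeled methylated.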

Volume 118
DOI 10.1073/pnas.2019768118
Language English
Journal Proceedings of the National Academy of Sciences of the United States of America

Full Text