[PDF] A Stochastic Model for the Formation of Spatial Methylation Patterns

Abstract

DNA methylation is an epigenetic mechanism whose important role in development has been widely recognized. This epigenetic modification results in heritable changes in gene expression not encoded by the DNA sequence. The underlying mechanisms controlling DNA methylation are only partly understood and recently different mechanistic models of enzyme activities responsible for DNA methylation have been proposed. Here we extend existing Hidden Markov Models (HMMs) for DNA methylation by describing the occurrence of spatial methylation patterns over time and propose several models with different neighborhood dependencies. We perform numerical analysis of the HMMs applied to bisulfite sequencing measurements and accurately predict wild-type data. In addition, we find evidence that the enzymes' activities depend on the left 5' neighborhood but not on the right 3' neighborhood.

Full PDF

AA Stochastic Model for the Formation of SpatialMethylation Patterns

Alexander L¨uck , Pascal Giehr , J¨orn Walter , and Verena Wolf Department of Computer Science, Saarland University, Saarbr¨ucken, Germany Department of Biological Sciences, Saarland University, Saarbr¨ucken, Germany

Abstract.

DNA methylation is an epigenetic mechanism whose impor-tant role in development has been widely recognized. This epigeneticmodiﬁcation results in heritable changes in gene expression not encodedby the DNA sequence. The underlying mechanisms controlling DNAmethylation are only partly understood and recently diﬀerent mecha-nistic models of enzyme activities responsible for DNA methylation havebeen proposed. Here we extend existing Hidden Markov Models (HMMs)for DNA methylation by describing the occurrence of spatial methylationpatterns over time and propose several models with diﬀerent neighbor-hood dependencies. We perform numerical analysis of the HMMs appliedto bisulﬁte sequencing measurements and accurately predict wild-typedata. In addition, we ﬁnd evidence that the enzymes’ activities dependon the left 5’ neighborhood but not on the right 3’ neighborhood.

Keywords:

DNA Methylation, Hidden Markov Model, Spatial Stochas-tic Model

The DNA code of an organism determines its appearance and behavior by encod-ing protein sequences. In addition, there is a multitude of additional mechanismsto control and regulate the ways in which the DNA is packed and processed inthe cell and thus determine the fate of a cell. One of these mechanisms in cells isDNA methylation, which is an epigenetic modiﬁcation that occurs at the cyto-sine (C) bases of eukaryotic DNA. Cytosines are converted to 5-methylcytosine(5mC) by DNA methyltransferase (Dnmt) enzymes. The neighboring nucleotideof a methylated cytosine is usually guanine (G) and together with the GC-pairon the opposite strand, a common pattern is that two methylated cytosines arelocated diagonally to each other on opposing DNA strands. DNA methylationat CpG dinucleotides is known to control and mediate gene expression and istherefore essential for cell diﬀerentiation and embryonic development. In humansomatic cells, approximately 70-80% of the cytosine nucleotides in CpG dyadsare methylated on both strands and methylation near gene promoters varies con-siderably depending on the cell type. Methylation of promoters often correlateswith low or no transcription [20] and can be used as a predictor of gene expression[12]. Also signiﬁcant diﬀerences in overall and speciﬁc methylation levels exist a r X i v : . [ q - b i o . GN ] J u l etween diﬀerent tissue types and between normal cells and cancer cells fromthe same tissue. However, the exact mechanism which leads to a methylationof a speciﬁc CpG and the formation of distinct methylation patterns at certaingenomic regions is still not fully understood. Recently proposed measurementtechniques based on hairpin bisulﬁte sequencing (BS-seq) allow to determine onboth DNA strands the level of 5mC at individual CpGs dyads [15]. Based on asmall hidden Markov model, the probabilities of the diﬀerent states of a CpG canbe accurately estimated (assuming that enough samples per CpG are provided)[1,13].Mechanistic models for the activity of the diﬀerent Dnmts usually distin-guish de novo activities, i.e., adding methyl groups at cytosines independent ofthe methylation state of the opposite strand, and maintenance activities, whichrefers to the copying of methylation from an existing DNA strand to its newlysynthesized partner (containing no methylation) after replication [10,17]. Hence,maintenance methylation is responsible for re-establishment of the same DNAmethylation pattern before and after cell replication. A common hypothesis isthat the copying of DNA methylation patterns after replication is performedby Dnmt1, an enzyme that shows a preference for hemimethylated CpG sites(only one strand is methylated) as they appear after DNA replication. More-over, studies have shown that Dnmt1 is highly processive and able to methylatelong sequences of hemimethylated CpGs without dissociation from the targetDNA strand [10]. However, an exact transmission of the methylation informa-tion to the next cellular generation is not guaranteed. The enzymes Dnmt3a andDnmt3b show equal activities on hemi- and unmethylated DNA and are mainlyresponsible for de novo methylation, i.e., methylation without any speciﬁc pref-erence for the current state of the CpG (hemi- or unmethylated) [17]. However,by now evidence exists that the activity of the diﬀerent enzymes is not thatexclusive, i.e., Dnmt1 shows to a certain degree also de novo and Dnmt3a/bmaintenance methylation activity [2]. The way how methyltransferases interactwith the DNA and introduce CpG methylation was investigated in many in vitro studies. Basically, one can distinguish between two mechanisms. A distributiveone, where the enzyme periodically binds and dissociates from the DNA, leapingmore or less randomly from one CpG to another and a processive one in which theenzyme migrates along the DNA without detachment from the DNA [9,11,16], asillustrated in Fig. 1. Note that for Dnmt1, for instance, it is reasonable to assumethat it is processive in 5’ to 3’ direction since it is linked to the DNA replicationmachinery. In particular for the Dnmt3’s diﬀerent hypotheses about the pro-cessivity and neighborhood dependence exist [3,5], but the detailed mechanismsremain elusive.Several models that describe the dynamics of the formation of methylationpatterns have been proposed. In the seminal paper of Otto and Walbot, a dy-namical model was proposed that assumed independent methylation events fora single CpG. The main idea was to track the frequencies of fully, hemi- andunmethylated CpGs during several cell generations [18]. Later, reﬁned modelsallowed to distinguish between maintenance and de novo methylation on the G CG CGGC GC GC

MM M

CGGC M M processive methylation

CG CG CGGC GC GC

MM M

CGGC M distributive methylationDnmtDnmt Dnmt Fig. 1.

Dnmts can methylate DNA in a distributive manner, “jumping” randomly fromone CpG to another or in a processive way where the enzyme starts at one CpG andslides in 5’ to 3’ direction over the DNA. parent and daughter strands [7,19]. More sophisticated extensions of the origi-nal model of Otto and Walbot models have been successfully used to predict invivo data still assuming a neighbor-independent methylation process for a singleCpG site [2,8]. However, measurements indicate that methylation events at asingle CpG may depend on the methylation state of neighboring CpGs, which isnot captured by these models.Here, we follow the dynamical HMM approach proposed in [2] where knockoutdata was used to train a model that accurately predicts wild-type methylationlevels for BS-seq data of repetitive elements from mouse embryonic stem cells. Weextend this model by describing the methylation state of several CpGs insteadof a single CpG and use similar dependency parameters as introduced in [4].More speciﬁcally, we design diﬀerent models by combining the activities of thetwo types of Dnmts and test for both, maintenance and de novo methylation thehypotheses illustrated in Fig 1. The models vary according to the order in whichthe enzymes act, whether they perform methylation in a processive manner ornot, and how much their action depends on the left/right CpG neighbor. We usethe same BS-seq data as in [2], i.e. data where Dnmt1 or Dnmt3a/b was knockedout (KO) and learn the parameters of the diﬀerent models. Then, similar as in[2], we predict the behavior of the measured wild-type (WT), in which bothtypes of enzymes are active, by designing a combined model that describes theactivity of both enzymes and compare the results to the WT data.We found that all proposed models show a similar behavior in terms of pre-diction quality such that no model can be declared as the best ﬁt. However,our results indicate that Dnmt1 works independently of the methylation state ofits neighborhood, which is in accordance to the current hypothesis that Dnmt1is linked to the replication machinery and copies the methylation state on theopposite strand. On the other hand, Dnmt3a/b shows a dependency to the leftbut no dependency to the right, which supports hypotheses of processive orcooperative behavior. ig. 2.

A lattice of length L = 4 containing all possible states 0, 1, 2 and 3, formingthe pattern 0123. Consider a sequence of L neighboring CpG dyads , which is represented as alattice of length L and width two (for the two strands). Each cytosine in thelattice can either be methylated or not, leading to four possible states at eachposition l : – State 0 : Both sites are not methylated. – State 1 : The cytosine on the upper strand is methylated, the lower one not. – State 2 : The cytosine on the lower strand is methylated, the upper one not. – State 3 : Both cytosines are methylated.A sequence of four CpGs, each of which is in one of the four possible states, isshown in Fig. 2.For a system of length L there are in total 4 L possibilities to combine the statesof individual CpGs. These combinations are called patterns in the following. Apattern is denoted by a concatenation of states, e.g. 321, 0123 or 33221.In order to represent the pattern distribution as a vector it is necessary touniquely assign a reference number to each pattern. A pattern can be perceivedas a number in the tetral system, such that converting to the decimal systemleads to a unique reference number. After the conversion an additional 1 is addedin order to start the referencing at 1 instead of 0.Examples for L = 3: 000 −→ −→

28 (= 27 + 1)333 −→

64 (= 63 + 1)This reference number then corresponds to the position of the pattern in therespective distribution vector.

We describe the state of a sequence of L CpGs by a discrete-time Markov chainwith pattern distribution π ( t ), i.e., the probability of each of the 4 L patterns The exact nucleotide distance between two neighboring dyads is not considered here,but we assume that this distance is small. For the BS-seq data that we consider, theaverage distance between two CpGs is 14 bp and the maximal distance is 46 bp. fter t cell divisions. For the initial distribution π (0), we use the distributionmeasured in the wild-type when the cells are in equilibrium. Note that otherinitial conditions gave very similar results, i.e., the choice of the initial distribu-tion does not signiﬁcantly aﬀect the results. The reason is that also the KO datais measured after a relatively high number of cell divisions where the cells arealmost in equilibrium. Transitions between patterns are triggered by diﬀerentprocesses: First due to cell division the methylation on one strand is kept as itis (e.g. the upper strand), whereas the newly synthesized strand (the new lowerstrand) does not contain any methyl group. Afterwards, methylation is addeddue to diﬀerent mechanisms. On the newly synthesized strand a site can bemethylated if the cytosine at the opposite strand is already methylated ( mainte-nance ). It is widely accepted that maintenance in form of Dnmt1 is linked to thereplication machinery and thus occurs during/directly after the synthesis of thenew strand. Furthermore, CpGs on both strands can be methylated independentof the methylation state of the opposite site ( de novo ). The transition matrix P is deﬁned by composition of matrices for cell division, maintenance and de novomethylation of each site. Depending on which daughter cell is considered after cell replication, the upper( s = 1) or lower ( s = 2) strand is the parental one after cell division. Then, thenew pattern can be obtained by applying the following state replacements: s = 1 :  −→ −→ −→ −→ s = 2 :  −→ −→ −→ −→ i , applying the transforma-tion (1) to each of the L positions leads to a new pattern with reference number j (notation: i (1) (cid:32) j ). The corresponding transition matrix D s ∈ { , } L × L hasthe form D s ( i, j ) = (cid:40) , if i (1) (cid:32) j, , else. (2) For maintenance and de novo methylation, the single site transition matrices arebuilt according to the following rules:Consider at ﬁrst the (non-boundary) site l = 2 , . . . , L − l − l + 1 respectively. The remaining sites do not change and donot aﬀect the transition. The probabilities of the diﬀerent types of transitions in ? ? ?? ? ? ?? ?? ? ? ?? ? ? ??? ??? ??? ?? ? ??? ??? ??? ?? p p p p Maintenance De Novo

Fig. 3.

Possible maintenance and de novo transitions depicted for the lower strand,where ◦ denotes an unmethylated, • a methylated site and ? a site where the methy-lation state does not matter. Note that the same transitions can occur on the upperstrand. Fig. 3 have the form p =0 . · ( ψ L + ψ R ) x, (3) p =0 . · ( ψ L + ψ R ) x + 0 . · (1 − ψ L ) , (4) p =0 . · ( ψ L + ψ R ) x + 0 . · (1 − ψ R ) , (5) p =1 − . · ( ψ L + ψ R )(1 − x ) , (6)where x = µ is the maintenance probability, x = τ is the de novo probabilityand ψ L , ψ R ∈ [0 ,

1] the dependency parameters for the left and right neighbor.A dependency value of ψ i = 1 corresponds to a total independence on the neigh-bor whereas ψ i = 0 leads to a total dependence. Hence, µ and τ can be inter-preted as the probability of maintenance and de novo methylation of a singlecytosine between two cell divisions assuming independence from neighboringCpGs. Moreover, all CpGs that are part of the considered window of the DNAhave the same value for the parameters µ , τ , ψ L , and ψ R , since in earlier ex-periments only very small diﬀerences have been found between the methylationeﬃciencies of nearby CpGs [2].In order to understand the form of the transition probabilities consider atﬁrst a case with only one neighbor. The probabilities then have the form ψx ifthe neighbor is unmethylated and 1 − ψ (1 − x ) if the neighbor is methylated.Note that both forms evaluate to x for ψ = 1, meaning that a site is methylatedwith probability x , independent of its neighbor. For ψ = 0 the probabilities be-come 0 and 1, meaning that if there is no methylated neighbor the site cannotbe methylated or will be methylated for sure if there is a methylated neighborrespectively.The probabilities for two neighbors are obtained by a linear combination of theone neighbor cases, with ψ L for the left and ψ R for the right neighbor, and anadditional weight of 0 . ◦ for unmethylated, • formethylated site) the average methylation density ρ is used to compute the tran-sition probabilities at the boundaries (depicted here for de novo):? ◦ ◦ → ? • ◦ ˜ p = (1 − ρ ) · p + ρ · p , (7)? ◦ • → ? • • ˜ p = (1 − ρ ) · p + ρ · p , (8) ◦ ◦ ? → ◦ • ? ˜ p = (1 − ρ ) · p + ρ · p , (9) • ◦ ? → • • ? ˜ p = (1 − ρ ) · p + ρ · p . (10)Note that the same considerations hold for maintenance at the boundaries if theopposite site of the boundary site is already methylated.For each position l , there are four transition matrices: two for maintenance andtwo for de novo, namely one for the upper and one for the lower strand in eachprocess. In order to construct these matrices consider the three positions l − l and l + 1, where the transition happens at position l . Only the transitionsdepicted in Fig. 3 can occur. Furthermore the transitions are unique, i.e. for agiven reference number i the new reference number j is uniquely determined.For patterns not depicted in Fig. 3 no transition can occur, i.e. the referencenumber does not change.The matrix describing a maintenance event at position l and strand s has theform M ( l ) s ( i, j ) =  , if i = j and (cid:54) ∃ j (cid:48) : i (cid:32) j (cid:48) , − p, if i = j and ∃ j (cid:48) : i (cid:32) j (cid:48) ,p, if i (cid:54) = j and i (cid:32) j, , else, (11)where the probability p is given by one of the Eqs. (3)-(10) that describes the cor-responding case and x = µ . Note that M ( l ) s depends on s and l since it describesa single transition from pattern i to pattern j , which occurs on a particularstrand and at a particular location with probability p . We deﬁne matrices T ( l ) s for de novo methylation according to the same rules except that x = τ and thepossible transitions are as in Fig. 3, right.The advantage of deﬁning the matrices position- and process-wise is that dif-ferent models can be realized by changing the order of multiplication of thesematrices.It is important to note that 5mC can be further modiﬁed by oxidation to 5-hydroxymethyl- (5hmC), 5-formyl- (5fC) and 5-carboxyl cytosine(5caC) by Tetenzymes. These modiﬁcations are involved in the removal of 5mC from the DNAand can potentially interfere with methylation events. However, our data doesnot capture these modiﬁcations and therefore we are not able to consider thesemodiﬁcations in our model. For all subsequent models it is assumed that ﬁrst of all cell division happens andmaintenance methylation only occurs on the newly synthesized strand given by s ,hereas de novo methylation happens on both strands. Given the mechanismsin Fig. 1, the two diﬀerent kinds of methylation events, and the two types ofenzymes, there are several possibilities to combine the transition matrices. Weconsider the following four models, which we found most reasonable based onthe current state of research in DNA methylation:1. ﬁrst processive maintenance and then processive de novo methylation P s = L (cid:89) l =1 M ( l ) s L (cid:89) l =1 T ( l )1 L (cid:89) l =1 T ( l )2 , (12)2. ﬁrst processive maintenance and then de novo in arbitrary order P s = 1( L !) L (cid:89) l =1 M ( l ) s (cid:32) (cid:88) σ ∈ S L L (cid:89) l =1 T ( σ ( l ))1 (cid:33) (cid:32) (cid:88) σ ∈ S L L (cid:89) l =1 T ( σ ( l ))2 (cid:33) , (13)3. maintenance and de novo at one position, processive P s = L (cid:89) l =1 M ( l ) s T ( l )1 T ( l )2 , (14)4. maintenance and de novo at one position, arbitrary order P s = 1 L ! (cid:88) σ ∈ S L L (cid:89) l =1 M ( σ ( l )) s T ( σ ( l ))1 T ( σ ( l ))2 , (15)where S L is the set of all possible permutations for the numbers 1 , . . . , L .Note that the de novo events on both strands are independent, i.e. the de novoevents on the upper strand do not inﬂuence the de novo events on the lowerstrand and vice versa, such that [ T ( l )1 , T ( l (cid:48) )2 ] = 0 independent of ψ i . Obviously itis important whether maintenance or de novo happens ﬁrst, since the transitionprobabilities and the transitions themselves depend on the actual pattern. Fur-thermore in the case ψ i < M ( l ) s , M ( l (cid:48) ) s ] (cid:54) = 0 and [ T ( l ) s , T ( l (cid:48) ) s ] (cid:54) = 0for l (cid:54) = l (cid:48) . The total transition matrix is then given by a combination of the celldivision and maintenance/de novo matrices.Recall that we consider two diﬀerent types of Dnmts, i.e., Dnmt1 and Dnmt3a/b.If only one type of Dnmt is active (KO data) the matrix has the form P = 0 . · ( D · P + D · P ) (16)and if all Dnmts are active (WT data) P = 0 . · ( D · P · ˜ P + D · P · ˜ P ) , (17) [ A, B ] = AB − BA is the commutator of the matrices A and B . mT Cc − c d − d Fig. 4.

Conversions of the unobservable states u, m to observable states

T, C withrespective rates. where P s and ˜ P s have one of the forms (12)-(15). This leads to four diﬀerentmodels for one active enzyme or 16 models for all active enzymes respectively.In the second case P s represents the transitions caused by Dnmt1 and ˜ P s thetransitions caused by Dnmt3a/b. Note that if ψ L = ψ R = 1 all models are thesame within each case. The actual methylation state of a C cannot be directly observed. During BS-seq, with high probability every unmethylated C (denoted by u ) is convertedinto Thymine (T) and every 5mC (denoted by m ) into C. However, conversionerrors may occur and we deﬁne their probability as 1 − c and 1 − d , respectively,as shown by the dashed arrows in Fig. 4. It is reasonable that these conversionerrors occur independently and with approximately identical probability at eachsite and thus the error matrix for a single CpG takes the form ∆ =  c c (1 − c ) c (1 − c ) (1 − c ) c (1 − d ) cd (1 − c )(1 − d ) d (1 − c ) c (1 − d ) (1 − c )(1 − d ) cd d (1 − c )(1 − d ) d (1 − d ) d (1 − d ) d  . (18)Due to the independency of the events this matrix can easily be generalized forsystems with L > ∆ L = ∆ ⊗ ∆ L − for L ≥ . (19)Hence, ∆ L gives the probability of observing a certain sequence of C and Tnucleotides for each given unobservable methylation pattern. In order to computethe likelihood ˆ π of the observed BS-seq data, we therefore ﬁrst compute thetransient distribution π ( t ) of the underlying Markov chain at the correspondingtime instant t by solving π ( t ) = π (0) · P t (20)and then multiply the distribution of the unobservable patterns with the errormatrix. ˆ π = π ( t ) · ∆ L . (21) The number of cell divisions is estimated from the time of the measurement sincethese cells divide once every 24 hours. ote that this yields a hidden Markov model with emission probabilities ∆ L . Inthe following the values for c were chosen according to [2]. Since the value for d was not determined in [2], we measured the conversion rate d = 0 .

94 in anindependent experiment under comparable conditions (data not shown).

In order to estimate the parameters θ = ( µ, ψ L , ψ R , τ ), we employ a Maximum(Log)Likelihood Estimator (MLE)ˆ θ = arg max θ (cid:96) ( θ ) , (cid:96) ( θ ) = L (cid:88) j =1 log(ˆ π j ( θ )) · N j , (22)where ˆ π is the pattern distribution obtained from the numerical solution of (20)and (21) for a given time t and N j is the number of occurrences of pattern j in the measured data. The parameters θ = ˆ θ are chosen in such a way that (cid:96) is maximized. Visual inspection of all two dimensional cuts of the likelihoodlandscapes showed only a single local maximum.We employ the MLE twice in order to estimate the parameter vector ˆ θ forDnmt1 from the 3a/b DKO (double knockout) data and the vector ˆ θ a/b forDnmt3a/b from the Dnmt1 KO data, where transition matrix (16) is used. Thecorresponding time instants are t = 26 for the 3a/b DKO data and t = 41 forthe 1KO data.We approximate the standard deviations of the estimated parameters ˆ θ as fol-lows: Let I (ˆ θ ) = E [ −H (ˆ θ )] be the expected Fisher information, with the Hes-sian H (ˆ θ ) = ∇∇ (cid:124) (cid:96) (ˆ θ ). The inverse of the expected Fisher information is a lowerbound for the covariance matrix of the MLE such that we can use the approxi-mation σ (ˆ θ ) ≈ (cid:113) diag( −H (ˆ θ )).A prediction for the wild-type can be computed by combining the estimatedvectors such that in the model both types of enzymes are active. For this, weinsert ˆ θ in P s and ˆ θ a/b in ˜ P s in (17) to obtain the transition matrix for thewild-type. For our analysis we focused at the single copy genes Afp (5 CpGs) and Tex13 (10CpGs) as well as the repetitive elements IAP (intracisternal A particle) (6 CpGs),L1 (Long interspersed nuclear elements) (7 CpGs) and mSat (major satellite)(3 CpGs). Repetitive elements occur in multiple copies and are dispersed overthe entire genome. Therefore they allow capturing an averaged, more generalbehavior of methylation dynamics. If a locus contains more than three CpGs,the analysis is done for all sets of three adjacent sites independently, in order tokeep computation times short and memory requirements low. In the sequel, wemainly focus on the estimated dependency parameters ψ L and ψ R and on theprediction quality of the diﬀerent models.he estimates for all the available KO data and all suggested models obtainedusing the transition matrix in Eq. (16) are summarized as histograms in Fig. 5.Because of the diﬀerent possibilities to combine the four diﬀerent models inEq. (12)-(15) and because of the diﬀerent loci considered, in total there are 84estimates for each KO data set. We plot the number of occurrences N of ψ L (left) and ψ R (right) in diﬀerent ranges for both sorts of KO data (Dnmt1KOand Dnmt3a/b DKO).The estimates of ψ L spread over the whole interval [0 ,

1] while in the case of ψ R , nearly all estimates are larger than 0 .

99 and only in a few cases the depen-dency parameter is signiﬁcantly smaller than 1. Hence, in most cases the methy-lation probabilities are independent of the right neighbor for both Dnmt1KO andDnmt3a/b DKO. For ψ L the dependency parameter in the Dnmt3a/b DKO caseoccurs more often close to 1, meaning that the transitions induced by Dnmt1 havelittle to no dependency on the left neighbor. On the other hand for Dnmt1KOthe dependency parameter occurs more often at smaller values giving evidencethat there is a dependency on the left neighbor for the activity of Dnmt3a/b.Note that all models show a similar behavior in terms of the dependency param-eters for a given locus or position within a locus respectively, i.e. either ψ i ≈ ψ i < ψ R is usually close to 1 a smaller model with only three parameters θ = ( µ, ψ, τ ) can be proposed, where ψ is a dependency parameter for the leftneighbor. This model can either be obtained by ﬁxing ψ R = 1 in the originalmodel and setting ψ = ψ L or by redeﬁning the transition probabilities to ψx ifthe left neighbor is unmethylated and 1 − ψ (1 − x ) if the left neighbor is methy-lated. In that case ψ and ψ L are related via ψ = 0 . ψ L + 1). Note that bothversions yield the same results.In order to check whether there is a signiﬁcant diﬀerence in the original andthe smaller model, we performed a Likelihood-ratio test with the null hypothesisthat the smaller model is a special case of the original model. Since the originalmodel with more parameters is always as least as good as the smaller model,our goal is to check in which cases the smaller model is suﬃcient. Indeed if ψ R was estimated to be approximately 1 the Likelihood-ratio test indicates that thesmaller model is suﬃcient (p-value ≈ ψ R diﬀers signiﬁcantly from 1 the original model has to be used (p-value < . Models 1-4 . Forthe prediction, the notation ( x, y ) is used to refer to Model x for the Dnmt3a/bDKO (only Dnmt1 active) and Model y for the Dnmt1KO case (only Dnmt3a/bactive). One instance of the prediction, for which Model 1 was used for bothDnmt1KO and Dnmt3a/b DKO, i.e. (1 , L N Dnmt1KODnmt3a/b DKO ψ R N Dnmt1KODnmt3a/b DKO

Fig. 5.

Histograms for the estimated dependency parameters ψ L and ψ R for all sets ofthree adjacent CpGs in all loci and for all suggested models. repetitive element (L1) in Tab. 1. While the standard deviation of the estimatedparameters for µ is always of the order 10 − and for τ of order 10 − , it is usu-ally of order 10 − for ψ i . Depending on the model, locus and position, standarddeviations up to order 10 − may occur for the dependency parameters in a fewcases.In Fig. 6 the predictions for the pattern distribution together with the WTpattern distribution and a prediction from the neighborhood independent model( ψ L = ψ R = 1) for all loci are shown in the main plot. As an inset the distribu-tions are shown on a smaller scale to display small deviations. With the exceptionof patterns 0 and 64 (which corresponds to no methylation/full methylation ofall sites) in L1 and pattern 64 in all loci, where the diﬀerence between WT andthe numerical solution is about 10%, the diﬀerence is always small ( < KL = L (cid:88) j =1 π j (WT) log (cid:18) π j (WT) π j (pred) (cid:19) (23)that we list in Tab. 2. The diﬀerence in KL between the “best” and the “worst”case is about 0 .

01. The mean and standard deviation for KL was obtained viabootstrapping of the wild-type data (10 .

000 bootstrap samples for each model).Since no conﬁdence intervals of the parameters are included, this standard devi-ation can be regarded as a lower bound. However, even with these lower boundsthe intervals of KL overlap for all models, such that no model can be favorized. able 1. Estimated parameters for the KO data and model based on Eq. (12) for theloci Afp and L1 with sample size n .KO µ ψ L ψ R τ n LocusDnmt1 0 . ± .

062 0 . ± .

076 1 . ± .

094 0 . ± .

016 134 AfpDnmt3a/b 0 . ± .

003 0 . ± .

011 1 . ± .

006 10 − ± .

011 186 AfpDnmt1 0 . ± .

051 0 . ± .

067 1 . ± .

122 0 . ± .

004 1047 L1Dnmt3a/b 0 . ± .

037 1 . ± .

038 0 . ± .

045 10 − ± .

002 805 L1

Index of pattern Z P ( Z ) Wild-typePrediction (neighborhood dependent)Prediction (neighborhood independent) (a) Afp

Index of pattern Z P ( Z ) (b) L1 Index of pattern Z P ( Z ) (c) IAP Index of pattern Z P ( Z ) (d) Tex13 Index of pattern Z P ( Z ) (e) mSat Fig. 6.

The ﬁgures show an example for the predicted (neighborhood dependent andneighborhood independent) and the measured pattern distribution for each locus. Theinset shows a zoomed in version of the distribution. ndex of pattern Z P ( Z ) Wild-typePrediction (a) (1,1)

Index of pattern Z P ( Z ) (b) (1,2) Index of pattern Z P ( Z ) (c) (1,3) Index of pattern Z P ( Z ) (d) (1,4) Index of pattern Z P ( Z ) (e) (2,1) Index of pattern Z P ( Z ) (f) (2,2) Index of pattern Z P ( Z ) (g) (2,3) Index of pattern Z P ( Z ) (h) (2,4) Fig. 7.

The ﬁgures show the predicted and the measured pattern distribution for all16 models for mSat. The inset shows a zoomed in version of the distribution. ndex of pattern Z P ( Z ) (i) (3,1) Index of pattern Z P ( Z ) (j) (3,2) Index of pattern Z P ( Z ) (k) (3,3) Index of pattern Z P ( Z ) (l) (3,4) Index of pattern Z P ( Z ) (m) (4,1) Index of pattern Z P ( Z ) (n) (4,2) Index of pattern Z P ( Z ) (o) (4,3) Index of pattern Z P ( Z ) (p) (4,4) Fig. 7. (cont.)

The ﬁgures show the predicted and the measured pattern distributionfor all 16 models for mSat. The inset shows a zoomed in version of the distribution. able 2.

Kullback-Leibler divergence KL for the 16 models.Model (1 ,

1) (1 ,

2) (1 ,

3) (1 , KL . ± . . ± . . ± . . ± . ,

1) (2 ,

2) (2 ,

3) (2 , KL . ± . . ± . . ± . . ± . ,

1) (3 ,

2) (3 ,

3) (3 , KL . ± . . ± . . ± . . ± . ,

1) (4 ,

2) (4 ,

3) (4 , KL . ± . . ± . . ± . . ± . In [4] location- and neighbor-dependent models are proposed for single-strandedDNA methylation data in blood and tumor cells. The (de-)methylation ratesdepend on the position of the CpG relative to the 3’ or 5’ end and/or on themethylation state of the left neighbor only. The dependency is realized by theintroduction of an additional parameter. In our proposed models we use double-stranded DNA and can therefore include hemi-methylated sites and even distin-guish on which strand the site is methylated. Furthermore we allow dependencieson both neighbors by introducing two diﬀerent dependency parameters. In con-trast [6] copes with the neighborhood dependency indirectly by allowing diﬀerentparameter values for diﬀerent sites. In order to reduce the dimensionality of theparameter vector, a hierarchical model based on beta distributions is proposed.Another diﬀerence to our model is the distinction between de novo rates for par-ent and daughter strand. However, this can easily be included in future work. Adensity-dependent Markov model was proposed [14]. In this model, the proba-bilities of (de-)methylation events may depend on the methylation density in theCpG neighborhood. In addition, a neighboring sites model has been developed,in which the probabilities for a given site are directly inﬂuenced by the states ofneighboring sites to the left and right [14]. When these models were tested ondouble-stranded methylation patterns from two distinct tandem repeat regions ina collection of ovarian carcinomas, the density-dependent and neighboring sitesmodels were superior to independent models in generating statistically similarsamples. Although this model also includes the dependence on the methylationstate on the left and right neighbor for double-stranded DNA the approach isdiﬀerent. The transition probabilities of the neighbor-independent model aretransformed into a transition probability of a neighbor-dependent model by in-troducing only one additional parameter. The state of the left and right neighborare taken into account by exponentiating this parameter by some norm. In addi-tion, this approach does not allow the intuitive interpretation of the dependencyparameter.

Conclusion