[PDF] The Velocity of the Propagating Wave for Spatially Coupled Systems with Applications to LDPC Codes

Abstract

We consider the dynamics of message passing for spatially coupled codes and, in particular, the set of density evolution equations that tracks the profile of decoding errors along the spatial direction of coupling. It is known that, for suitable boundary conditions and after a transient phase, the error profile exhibits a "solitonic behavior". Namely, a uniquely-shaped wavelike solution develops, that propagates with constant velocity. Under this assumption we derive an analytical formula for the velocity in the framework of a continuum limit of the spatially coupled system. The general formalism is developed for spatially coupled low-density parity-check codes on general binary memoryless symmetric channels which form the main system of interest in this work. We apply the formula for special channels and illustrate that it matches the direct numerical evaluation of the velocity for a wide range of noise values. A possible application of the velocity formula to the evaluation of finite size scaling law parameters is also discussed. We conduct a similar analysis for general scalar systems and illustrate the findings with applications to compressive sensing and generalized low-density parity-check codes on the binary erasure or binary symmetric channels.

Full PDF

11 The Velocity of the Propagating Wavefor Spatially Coupled Systems withApplications to LDPC Codes

Rafah El-Khatib,

Student Member, IEEE and Nicolas Macris,

Member, IEEE

Abstract

We consider the dynamics of message passing for spatially coupled codes and, in particular, the set of densityevolution equations that tracks the proﬁle of decoding errors along the spatial direction of coupling. It is known that,for suitable boundary conditions and after a transient phase, the error proﬁle exhibits a “solitonic behavior”. Namely, auniquely-shaped wavelike solution develops, that propagates with constant velocity. Under this assumption we derivean analytical formula for the velocity in the framework of a continuum limit of the spatially coupled system. Thegeneral formalism is developed for spatially coupled low-density parity-check codes on general binary memorylesssymmetric channels which form the main system of interest in this work. We apply the formula for special channelsand illustrate that it matches the direct numerical evaluation of the velocity for a wide range of noise values. Apossible application of the velocity formula to the evaluation of ﬁnite size scaling law parameters is also discussed.We conduct a similar analysis for general scalar systems and illustrate the ﬁndings with applications to compressivesensing and generalized low-density parity-check codes on the binary erasure or binary symmetric channels.

Index Terms

Message passing, density evolution, potential functional, threshold saturation, soliton, wave propagation, com-pressive sensing.

I. I

NTRODUCTION

Spatial coupling was ﬁrst introduced by Felstrom and Zigangirov in the context of low-density parity-check(LDPC) codes [1]. Spatially coupled codes have been shown to be capacity-achieving on binary memorylesssymmetric (BMS) channels under belief propagation (BP) decoding. The capacity-achieving property is due tothe “threshold saturation” of the BP threshold of the coupled system towards the maximum a-posteriori (MAP)threshold of the uncoupled code ensemble [2], [3]. Spatial coupling has also been applied to several other problemsbesides channel coding [4], [5], such as lossy source compression [6], [7], compressive sensing [8], [9], [10], randomconstraint satisfaction problems [11], [12], [13], and a coupled Curie-Weiss (toy) model [14], [15].

R. El-Khatib and Nicolas Macris are with the School of Computer and Communication Sciences, ´Ecole Polytechnique F´ed´erale de Lausanne(EPFL), Lausanne, Switzerland, e-mails: { rafah.el-khatib,nicolas.macris } epﬂ.ch. August 1, 2018 DRAFT a r X i v : . [ c s . I T ] J a n Consider a single (uncoupled) LDPC code. To construct the corresponding coupled code of spatial length L c + w ,we take L c + w replicas of the single system and “couple” every w adjacent single systems by means of a uniformwindow function. At every iteration of the BP algorithm, the variable and check nodes of the coupled graph exchangemessages which are described by a set of coupled density evolution (DE) iterative equations. The solution to the DEequations, called the decoding proﬁle x , is a vector of “error distributions” (more precisely, distributions of the BPlog-likelihood estimates) along the spatial axis of coupling. More speciﬁcally, let the integer z ∈ {− w + 1 , . . . , L c } denote the position along the spatial direction of the graph construction (on which the replicas are spread). Then the z th component of x , call it x z , denotes the distribution of the BP log-likelihood estimate at the z th position. In thespecial case of the binary erasure channel (BEC) this component is reduced to the usual scalar erasure probability ≤ x z ≤ at position z along the spatial axis of coupling.Spatially coupled codes perform well, and are capacity achieving, due to the threshold saturation phenomenonthat is proved for general BMS channels in [2], [3]. More speciﬁcally, as long as the channel noise is below theMAP threshold, the decoding proﬁle of a spatially coupled code converges to the all- ∆ ∞ vector after enoughiterations of the BP algorithm, where ∆ ∞ is the Dirac mass at inﬁnite log-likelihood (i.e. perfect knowledge ofthe bits). In the special case of the BEC, the all- ∆ ∞ vector corresponds to a vector of scalar erasure probabilitiesdriven to zero by DE iterations. On the other hand, the probability distribution of the log-likelihoods of bits of thecorresponding uncoupled code only converge to ∆ ∞ when the channel noise is below the BP threshold (which islower than the MAP threshold).The threshold saturation phenomenon is made possible due to “seeding” at the boundaries of the spatially coupledcode. Seeding means that we ﬁx the bits at the boundaries so that the probability distributions of their log-likelihoodsare ∆ ∞ . This facilitates BP decoding near the boundaries, and this effect is propagated along the rest of the coupledchain. The minimum size of the seed that guarantees the propagation of the decoding effect is of the same orderas the size w of the coupling window; however, an exact determination of the minimum possible such size is anstill an interesting open question.When the channel noise is between the BP and the MAP thresholds, and the underlying uncoupled ensemblehas a unique non-trivial stable BP ﬁxed point that blocks decoding, an interesting phenomenon that has beenempirically observed is the appearance of a solitonic decoding wave after a certain number of transient iterationsof the BP algorithm. This soliton is characterized by a ﬁxed shape that seems independent of the initial conditionand has a constant traveling velocity that we henceforth call v . This phenomenology is discussed in more detailsin Section II-C. Figures 2 and 3 in Section II-C show an example of the transient phase and of the soliton in thecase of spatially coupled codes on the BEC. The main goal of this work is to derive a formula for the velocity ofthe soliton for general BMS channels .The decoding wave has recently been studied in the context of coding when transmission takes place over theBEC. In [16], it is proved that the solitonic wave solution exists and bounds on the velocity of the soliton arederived. However, the independence of the unique shape of the wave from the initial conditions remains an openquestion. In [17], more complex coupled systems are studied, where it is possible to have more than one non-trivialstable BP ﬁxed point, and there again some bounds on the velocity of the soliton are provided. The solitonic August 1, 2018 DRAFT behavior has also been studied for the coupled Curie-Weiss toy model [14] and in [15] a formula for the velocity,as well as an approximation, are derived and tested numerically.In the ﬁrst part of this work, we derive a general formula for the velocity of the wave in the asymptotic limit L c (cid:29) w (cid:29) in the context of coding when transmission takes place over general BMS channels (see Equ.(15)). This limit enables us to formulate the problem in a “continuum limit” which makes the derivations quitetractable. We show, with the use of numerical simulations, that this continuum limit yields good approximationsfor the velocity of the original discrete system. For simplicity, we limit ourselves to the case where the underlyinguncoupled LDPC code has only one non-trivial stable BP ﬁxed point .Our derivation rests on the assumption that the soliton indeed appears. More precisely, we assume that after aninitial transient phase, the decoding proﬁle develops a unique shape, independent of the initial condition, and travelswith a constant velocity v . This assumption can be strictly true only in an asymptotic limit of a very large chainlength and a large iteration number (or time). It is an interesting open problem to make this space-time asymptoticlimit precise and rigorously prove that the soliton appears and is independent of the initial condition. We conjecturethat our velocity formula is exact in such a limit.The formula for the velocity of the wave greatly simpliﬁes when we consider transmission over the BEC, becausethe decoding proﬁle reduces to a scalar vector of erasure probabilities. For transmission over general BMS channels,we also simplify the analysis by applying the Gaussian approximation [18], [19]. This consists of approximatingthe DE densities and the channel by suitable “symmetric” Gaussian densities. Since the mean m and the variance σ of these symmetric Gaussian densities are related by σ = 2 m , the analysis then reduces to that of a one-dimensional scalar system, whose technical difﬁculty is similar to that of the case of transmission over the BEC.We thus obtain a more tractable velocity formula and compare the numerical predictions of these velocity formulaswith the empirical value of the velocity for ﬁnite values of L c and w . Good agreement is found, on practically thewhole range of values within [ (cid:15) BP , (cid:15) MAP ] , even for small values of the window size w .It is of theoretical as well as practical interest to have a hold on the analytical expression of the velocity ofthe wave. The velocity is also related to other fundamental quantities that describe a coding system, such as theﬁnite-size scaling law that predicts the error probability of ﬁnite-length spatially coupled codes. In [20], the scalinglaw for a ﬁnite-length spatially coupled ( (cid:96), r, L c ) code, when transmission takes place over the BEC, is derived.Involved in this scaling law are parameters that can be estimated using the value of the velocity of the decodingwave. Using values of the velocity computed in our work, we provide reasonably good estimates of these parameters.In the second part of this work, we consider general spatially coupled scalar bipartite systems (that are notrestricted to coding) governed by a general message passing algorithm. In this setting, the system is scalar (one-dimensional) since the messages exchanged between the nodes are scalar. Due to seeding at the boundary, the“proﬁle” (we no longer call it the “decoding proﬁle”) exhibits the same phenomenology as in coding. Namely, asolitonic behavior appears after a short transient phase. We derive a formula for the velocity of the soliton for suchsystems and illustrate it on two applications: compressive sensing and generalized LDPC (GLDPC) codes.The derivations of the velocity formulas in both parts of the work use the same tools and assumptions. Wecombine the use of the “ potential functional ” introduced and used in a series of works [3], [11], [14], [21], [22], August 1, 2018 DRAFT as well as the continuum limit L c (cid:29) w (cid:29) which makes the derivations analytically tractable. The potential is a“variational formulation” of the message passing algorithm on coding systems. It is a functional whose stationarypoints are the ﬁxed points of the density evolution equations described by this algorithm, and has been used toprove threshold saturation in [3] for general BMS channels. A signiﬁcant part of the formalism in [3] is used in thepresent work. We also note that potential formulations have been used to characterize the ﬁxed point(s) of generalscalar systems at the MAP threshold using displacement convexity in [23], [24]. An extension of the ideas in theseworks could shed some light on the question of the independence of the soliton’s shape from the initial conditions.Section II introduces a few preliminary notions that we will need and reviews the phenomenology of the solitonicwave. In Section III, we formulate the continuum limit and state our main formula for the velocity of the solitonon general BMS channels; the derivation is presented in Section IV. Comparisons with numerical experimentsare presented in Section V. These concern transmission over the BEC as well as general BMS channels in theso-called Gaussian approximation. We also discuss a possible application of our formula to scaling laws for ﬁnite-size ensembles in Section VI. The case of general scalar spatially coupled systems is treated in Section VII, andillustrated for the examples of generalized LDPC codes (on the BEC or BSC channels) and compressive sensing.We present concluding remarks and propose further directions in Section VIII.A summary of this work has appeared in [25], [26].II. P RELIMINARIES

We consider (almost) the same setting as in [3] and adopt most of the notation introduced in that work. For moreinformation about the formalism in these preliminaries, one can consult [27].We denote by M ( ¯ R ) the space of probability measures x on the extended real numbers α ∈ ¯ R = R ∪ {∞} .Here α ∈ ¯ R should be interpreted as a “log-likelihood variable”. We call the measure x symmetric if (cid:82) E d x ( α ) = (cid:82) − E e − α d x ( α ) for all measurable sets E ⊂ ¯ R .We deﬁne an entropy functional H : M → R that maps a ﬁnite probability measure from M ( ¯ R ) to a real number.It is deﬁned as H ( x ) = (cid:90) d x ( α ) log (1 + e − α ) . (1)Note that this is a linear functional. Linearity is used in an important way to compute the entropy of convexcombinations of measures (which also yield probability measures). But we will also compute the “entropy” associatedto differences of measures by setting H ( x − x ) ≡ H ( x ) − H ( x ) . In other words, the entropy functional isextended in an obvious way to the space of signed measures.In the remainder of this work, we will use the Dirac masses ∆ ( α ) and ∆ ∞ ( α ) at zero and inﬁnite log-likelihood,that have entropies H (∆ ) = 1 and H (∆ ∞ ) = 0 , respectively.We will also use the standard variable-node and check-node convolution operators (cid:16) and (cid:24) for log-likelihoodratio message distributions involved in DE equations [27]. For x , x ∈ M ( ¯ R ) , the usual convolution x (cid:16) x isthe density of α = α + α , August 1, 2018 DRAFT and x (cid:24) x is the density of α = 2 tanh − (tanh α α . More formally, for any measurable set E ∈ R , the operators are deﬁned by  ( x (cid:16) x )( E ) = (cid:82) d x ( α ) x ( E − α ) , ( x (cid:24) x )( E ) = (cid:82) d x ( α ) x (cid:16) − (cid:16) tanh( E/ α/ (cid:17)(cid:17) . (2)We note that the identities of the (cid:16) and (cid:24) operators are ∆ ∞ and ∆ , and their annihilators are ∆ and ∆ ∞ . Moreexplicitly,  ∆ ∞ (cid:24) x = x , ∆ (cid:24) x = ∆ , ∆ (cid:16) x = x , ∆ ∞ (cid:16) x = ∆ ∞ . (3)Each operation, taken separately, is associative, commutative, and linear. However, when they are taken together,there is no distributive law; also, they don’t associate in the sense that x (cid:16) ( x (cid:24) x ) (cid:54) = ( x (cid:16) x ) (cid:24) x and x (cid:24) ( x (cid:16) x ) (cid:54) = ( x (cid:24) x ) (cid:16) x . We will also use the so-called duality rules  H ( x (cid:16) y ) + H ( x (cid:24) y ) = H ( x ) + H ( y ) ,H ( x (cid:16) a ) + H ( x (cid:24) a ) = H ( a ) ,H ( a (cid:16) b ) + H ( a (cid:24) b ) = 0 , (4)where x , y ∈ M ( ¯ R ) and a , b are differences of probability measures a = x − x , b = x − x , x i ∈ M ( ¯ R ) , i = 1 , , , . A. Single System

Consider an (uncoupled) LDPC( λ, ρ ) code ensemble and transmission over the BMS channel. Here, λ ( y ) = (cid:80) l λ l y l − and ρ ( y ) = (cid:80) r ρ r y r − are the usual edge-perspective variable-node and check-node degree distributions,respectively. The node-perspective degree distributions L and R are deﬁned by L (cid:48) ( y ) = L (cid:48) (1) λ ( y ) and R (cid:48) ( y ) = R (cid:48) (1) ρ ( y ) , respectively. Moreover, consider communication over a family of BMS channels whose distribution c h ( α ) in the log-likelihood domain is parametrized by the channel entropy H ( c h ) = h .Let ˜ x ( t ) denote the variable-node output distribution of the BP algorithm at iteration t ∈ N . We can track theaverage behavior of the BP decoder by means of the DE iterative equations that are written as a recursion in termsof the variable-node output distribution as follows ˜ x ( t +1) = c h (cid:16) λ (cid:16) ( ρ (cid:24) (˜ x ( t ) )) , (5)with initial condition ˜ x (0) = ∆ (equivalently, we can take the perhaps more natural initial condition ˜ x (0) = c h ).There are two thresholds of interest for us. The ﬁrst one is the algorithmic threshold; it is deﬁned for a familyof BMS channels whose channel distributions c h ( α ) : R → M ( ¯ R ) are ordered by degradation and parametrized bytheir entropy H ( c h ) = h . It is also called the BP threshold of the family and is deﬁned as h BP = { h ∈ [0 ,

1] : ˜ x = c h (cid:16) λ (cid:16) ( ρ (cid:24) (˜ x )) = ⇒ ˜ x = ∆ ∞ } . In the literature, this quantity is often denoted by c ( h ) . August 1, 2018 DRAFT

The second threshold corresponds to optimal (MAP) decoding h MAP = { h ∈ [0 ,

1] : lim inf n →∞ n E [ H ( X n | Y n ( h ))] > } , where H ( X n | Y n ( h )) is the conditional Shannon entropy of the input given by the channel observations, and E isthe expectation over the code ensemble.The potential functional W s ( x ) , x ∈ M ( ¯ R ) , of the “single” or uncoupled system is W s ( x ) = 1 R (cid:48) (1) H ( R (cid:24) ( x )) + H ( ρ (cid:24) ( x )) − H ( x (cid:24) ρ (cid:24) ( x )) − L (cid:48) (1) H ( c (cid:16) L (cid:16) ( ρ (cid:24) ( x ))) . (6)The ﬁxed point form of the DE equation (5) is obtained by setting to zero the functional derivative of W s ( x ; c ) with respect to x . In other words, x = c h (cid:16) λ (cid:16) ( ρ (cid:24) ( x )) is equivalent to lim γ → γ ( W s ( x ) + γη ) − W s ( x )) = 0 , (7)where η is a difference of two probability measures (see [3] for the proof of this statement). The BP and MAPthresholds, h BP and h MAP , respectively, can be obtained from the analysis of the stationary points of the potentialfunction. See [3], [28] for more details and a rigorous discussion of this issue.

Remark about notation.

In the remainder of this work, most of the time, we omit the subscript h from c h and theargument α from x ( α ) . This is because we will need a subscript (resp. an argument) z that represents the positionalong the chain in the discrete (resp. continuous) case. B. Spatially Coupled System

For standard LDPC codes, the BP threshold h BP is, in general, lower than the MAP threshold h MAP . The deﬁnitionsof the BP and MAP thresholds above extend to the spatially coupled setting. Spatial coupling exhibits two attractiveproperties. First, the MAP threshold is conserved under coupling in the limit L c → + ∞ and for all w . The proofof this statement is found in [28] (see also [29], [30]). Second, the BP threshold of the coupled system saturatesto its MAP threshold as proved in [2], [3]. The main consequence of threshold saturation is that one can decodeperfectly up to the h MAP .Let us now describe the density evolution and potential functional formalism for the spatially coupled codeensemble. Consider L c + w “replicas” of the single system described in Section II-A, placed on the spatial coordinates z ∈ {− w + 1 , . . . , L c } . The system at position z is coupled to w neighboring systems by means of a couplingwindow. For simplicity, we consider a uniform coupling window. We denote by ˜ x ( t ) z the variable-node outputdistribution at position z ∈ {− w + 1 , . . . , L c } on the spatial axis and at time t ∈ N . The DE equation of thecoupled system takes the form ˜ x ( t +1) z = c z (cid:16) λ (cid:16) (cid:32) w w − (cid:88) i =0 ρ (cid:24) (cid:16) w w − (cid:88) j =0 ˜ x ( t ) z + i − j (cid:17)(cid:33) . (8)In this equation, c z = c , for z ∈ { , . . . , L c } and c z = ∆ ∞ for z ∈ {− w + 1 , . . . , } . Furthermore, we ﬁx theleft boundary to x ( t ) z = ∆ ∞ for z ∈ {− w + 1 , . . . , } , for all t ∈ N . These conditions express perfect information August 1, 2018 DRAFT at the left boundary which is what enables seeding and the decoding wave propagation along the chain of coupledcodes. The initial condition (8) is x (0) z = ∆ for z ∈ { , . . . , L c + w − } .It will be convenient to work with a smoothed version of the proﬁle ˜ x ( t ) z , namely x ( t ) z = w w − (cid:80) i =0 ˜ x ( t ) z − i , which isthe check-node input distribution. Then, using this change of variables, (8) can be rewritten as x ( t +1) z = 1 w w − (cid:88) i =0 c z − i (cid:16) λ (cid:16) (cid:32) w w − (cid:88) j =0 ρ (cid:24) (cid:16) x ( t ) z − i + j (cid:17)(cid:33) . (9)Just as in the single system case, this DE equation can be expressed as the stationarity condition of a potentialfunctional (see [3]) W ( x ) = L (cid:88) z = − w +1 (cid:110) R (cid:48) (1) H ( R (cid:24) ( x z )) + H ( ρ (cid:24) ( x z )) − H ( x z (cid:24) ρ (cid:24) ( x z )) − L (cid:48) (1) H (cid:16) c z (cid:16) L (cid:16) (cid:0) w w − (cid:88) i =0 ρ (cid:24) ( x z + i ) (cid:1)(cid:17)(cid:111) . (10)where x = ( x − w +1 , . . . , x L + w − ) . The ﬁxed point form of (9) is equivalent to lim γ → γ − ( W ( x + γη ) − W ( x )) = 0 for η = ( η − w +1 , . . . , η L + w − ) where η i are differences of probability measures. C. Phenomenological observations

Our derivation is far from rigorous and is based on an assumption derived from a phenomenological pictureobserved from simulations. We summarize the main observations in this paragraph for the case of transmission overthe BEC channel. This channel also gives us the opportunity to illustrate the formalism outlined in Sections II-Aand II-B in a concrete case.The BEC has channel distribution c (cid:15) ( α ) = (cid:15) ∆ + (1 − (cid:15) )∆ ∞ , where (cid:15) is the erasure probability, and H ( c (cid:15) ) = (cid:15) (hence h = (cid:15) ). The density of the BP estimates of log-likelihood variables can be parametrized as x ( t ) ( α ) = x ( t ) ∆ ( α ) + (1 − x ( t ) )∆ ∞ ( α ) , where x ( t ) ∈ [0 , is interpreted as the erasure probability at iteration t ∈ N . TheDE equation becomes a one-dimensional iterative map x ( t +1) = (cid:15)λ (1 − ρ (1 − x ( t ) )) (11)over scalars in [0 , . These iterations are always initialized with x (0) = 1 or, equivalently, x (0) = (cid:15) . Thecorresponding ﬁxed point equation is the stationarity condition for the potential function W BEC ( x ) = 1 R (cid:48) (1) (1 − R (1 − x )) − xρ (1 − x ) − (cid:15)L (cid:48) (1) L (1 − ρ (1 − x )) . (12)Note that the potential function is deﬁned up to a constant which is set here so that W BEC (0) = 0 . Figure 1 illustratesthe potential function for a (3 , -regular Gallager ensemble, for several values of (cid:15) . For (cid:15) < . , the potentialfunction (12) is strictly increasing, and equivalently the DE iterations are driven to the unique minimum at x = 0 .At (cid:15) BP = 0 . a horizontal inﬂexion point appears and a second non-trivial local minimum x BP appears; thisminimum corresponds to the non-trivial ﬁxed point reached by DE iterations. It is known that the MAP thresholdis equal to the erasure probability where the non-trivial local minimum is at the same height as the trivial one andthat decoding becomes impossible once the non-trivial minimum becomes a global minimum. For this example,this happens when (cid:15) MAP = 0 . . Figure 1 also shows the energy gap that is deﬁned for (cid:15) BP ≤ (cid:15) ≤ (cid:15) MAP as ∆ E = W BEC ( x BP ) − W BEC (0) . At the MAP threshold, we have ∆ E = 0 . August 1, 2018 DRAFT x W s (x) Below the BP thresholdAt the BP thresholdBetween the BP and MAP thresholdsAt the MAP threshold

Fig. 1. We plot the potential function of the (3 , regular LDPC when transmission takes place over the BEC ( (cid:15) ) for several values of theerasure probability (cid:15) . Note that x = 0 is always a trivial stationary point. For (cid:15) < . (cid:15) BP the potential function is strictly increasing and x = 0 is the only minimum. At (cid:15) BP = 0 . a horizontal inﬂexion point develops and a second non-trivial minimum exists for larger (cid:15) . At (cid:15) MAP = 0 . the trivial and non-trivial minimum are at the same height and the energy gap ∆ E vanishes. z x z Fig. 2. We consider the spatially coupled LDPC( λ ( x ) , ρ ( x ) ) ensemble, with λ ( x ) = 0 . x +0 . x +0 . x and ρ ( x ) = x , with transmissionover the BEC( . ). The parameters of coupling are L c + w = 16 and w = 3 . We plot the decoding proﬁle during the ﬁrst 10 iterations, wherethe proﬁle is initialized to 0 when z ≤ and to 1 elsewhere. We observe that the segment initialized to 1 decreases quickly and converges tothe BP threshold x BP = 0 . . We observe that the transient phase is only about 5 iterations here, before the decoding proﬁle converges to aﬁxed shape. Let us now describe the phenomenology of the soliton (decoding wave) for spatially coupled codes. Our discussionis limited to the case where the underlying code ensemble has a single non-trivial DE ﬁxed point (equivalently, thepotential function has a single non-trivial local minimum). One can show that this is always the case for regularcode ensembles. For irregular degree distributions the situation may be more complicated with many non-trivialﬁxed points that appear. For transmission over the BEC, equation (9) reads x ( t +1) z = 1 w w − (cid:88) i =0 (cid:15) z − i λ (cid:16) w w − (cid:88) j =0 (cid:0) − ρ (1 − x ( t ) z − i + j ) (cid:1)(cid:17) . (13)Here, (cid:15) z = 0 for z ∈ {− w + 1 , . . . , } and (cid:15) z = (cid:15) for z ∈ { , . . . , L c } . Furthermore, we ﬁx the left boundaryto x ( t ) z = 0 for z ∈ {− w + 1 , . . . , } , for all t ∈ N . These are the “perfect seeding” conditions which enable theinitiation of decoding. The initial condition for the iterations is x (0) z = 1 (or (cid:15) ) for z ∈ { , . . . , L c + w − } .The evolution of the decoding wave can be decomposed into two phases: a transient and a stationary phase. In the August 1, 2018 DRAFT z x z Iteration 30Iteration 60Iteration 90Iteration 120Iteration 150

Fig. 3. We consider the (3 , -regular LDPC spatially coupled code with L = 50 , w = 3 on the BEC( . ). We plot the error probabilityalong the spatial dimension and observe the “decoding wave”. This “soliton” is plotted every 30 iterations until iteration 150 and is seen tomake a quick transition from zero error probability to the BP-value x BP = 0 . of the error probability. The optimal (MAP) noise thresholdis (cid:15) MAP = 0 . . transient phase, we observe a proﬁle of erasure probabilities x z changing shape. The segment initialized to x (0) z = 1 quickly drops to x z ≈ x BP where it remains stuck on the far right for large values of z . The seeding region, on theother hand, starts progressing towards the right-hand side and, after a few iterations, a ﬁxed proﬁle shape develops.This transient phase is illustrated in Figure 2 for an irregular code. Overall, it only lasts for a few iterations (ofthe order of iterations in this example). After this transient phase is over, one observes a stationary phase with a solitonic behavior , as depicted in Figure 3. The proﬁle of erasure probabilities has a stationary shape with a frontat position z front that moves, at a constant speed, towards the right. The soliton is relatively well-localized withinapproximately w positions and quickly approaches x z → for z < z front and x z → x BP for z > z front . The stationaryphase and its soliton are depicted in Figure 3 for a ﬁnite spatially coupled (3 , -regular ensemble with chain length L c = 50 and w = 3 for (cid:15) = 0 . . In this ﬁgure, we plot the decoding proﬁle every 30 iterations starting at the th iteration (the leftmost curve) and until the th iteration (the rightmost curve). The kink increases sharplyfrom x z = 0 to x z = x BP = 0 . over a width of the order of w = 6 .III. C ONTINUUM LIMIT AND MAIN RESULT

A. Continuum Limit

We now consider the coupled system in the continuum limit , in which the length of the coupling chain L c isﬁrst taken very large L c → + ∞ , and then the window size is taken very large w → + ∞ . The continuum limit hasalready been considered for the special case of the BEC in [16], [23], [24]. We slightly abuse notation by keepingthe same symbols for the proﬁle, the spatial position, and the channel distribution in the continuum limit. Thus, wedenote by x ( · , · ) the continuous proﬁle of distributions and set x ( zw , t ) ≡ x ( t ) z . We then replace zw → z so that thenew z is the continuous variable on the spatial axis, z ∈ R .In view of the discussion of the phenomenology in Section II-C, we consider the class of proﬁles satisfying the“natural boundary conditions” x ( z, t ) → ∆ ∞ when z → −∞ for all t ∈ R , x ( z, t ) → x BP when z → + ∞ for all t ∈ N , where x BP is the unique non-trivial stable ﬁxed point of DE for the single system Equ. (5). August 1, 2018 DRAFT0

The BMS channel distribution is now also continuous, and we denote by c ( z ) the channel distribution at thecontinuous spatial position z ∈ R . The DE equation (9) then takes the form x ( z, t + 1) = (cid:90) d u c ( z − u ) (cid:16) λ (cid:16) (cid:16) (cid:90) d s ρ (cid:24) (cid:0) x ( z − u + s, t ) (cid:1)(cid:17) . (14)The initial condition at t = 0 is given by a proﬁle x ( z, that interpolates between the two limiting values of theboundary condition, namely x ( z, → ∆ ∞ when z → −∞ and x ( z, → x BP when z → + ∞ . B. Statement of Main Result

We consider the channel entropy h ∈ [ h BP , h MAP ] . The phenomenology tells us that: (i) after a transient phase, theproﬁle develops a ﬁxed shape X ( · ) ; (ii) the shape is independent of the initial condition; (iii) the shape travels atconstant speed v ; (iv) the shape satisﬁes the boundary conditions X ( z ) → ∆ ∞ for z → −∞ and X ( z ) → x BP for z → + ∞ . We thus formalize these observations and make an ansatz: Ansatz.

For each h ∈ [ h BP , h MAP ] there exist a constant v ≥ and a family of probability measures X ( z ) (indexedby z ∈ R ) satisfying the boundary conditions X ( z ) → ∆ ∞ for z → −∞ and X ( z ) → x BP for z → + ∞ , such that,for t → + ∞ and | z − vt | = O (1) , the solution of DE (14) is independent of the initial condition and satisﬁes x ( z, t ) → X ( z − vt ) .Implicit in this ansatz is that we restrict ourselves here to underlying code ensembles that have only one non-trivial (stable) BP ﬁxed point. This is true for regular codes for example (but is not limited to this case). Ensembleswith many non-trivial ﬁxed points could lead to more complicated phenomenologies as emphasized in [17] andwould require to modify the ansatz. Velocity of the soliton for general BMS channels.

Under the assumption above the velocity of the soliton is givenby v = ∆ E (cid:82) R d z H (cid:16) ρ (cid:48) (cid:24) ( X ( z )) (cid:24) X (cid:48) ( z ) (cid:24) (cid:17) , (15)where ∆ E is the energy gap deﬁned as ∆ E = W s ( x BP ) − W s (∆ ∞ ) , (16)and we recall that W s is the potential (6) of the uncoupled system , x BP is the non-trivial BP ﬁxed point to whichthe uncoupled system converges, and ∆ ∞ is the trivial ﬁxed point (Dirac mass at inﬁnity).Let us make a few remarks. In this formula, the prime denotes the derivative X (cid:48) ( z ) = lim δ → δ − ( X ( z + δ ) − X ( z )) which is to be interpreted as a difference between two measures. The energy gap is only deﬁned for h BP ≤ h ≤ h MAP ,that is, when the single potential W s has a non-trivial non-negative local minimum (see e.g. Figure 1). It is exactlyequal to zero when h = h MAP , which conﬁrms the fact that the velocity of decoding is zero (no decoding occurs) inthis case. Note also that, with our normalizations, W s (∆ ∞ ) = 0 . August 1, 2018 DRAFT1

Formula (15) involves the shape X ( · ) . Using the DE equation, the ansatz x ( z, t ) → X ( z − vt ) , and the approximation x ( z, t + 1) − x ( z, t ) ≈ − v X (cid:48) ( z − vt ) , valid for small v , we ﬁnd after a change of variables that X ( z ) is the solutionof X ( z ) − v X (cid:48) ( z ) = (cid:90) d u c ( z − u ) (cid:16) λ (cid:16) (cid:16) (cid:90) d s ρ (cid:24) (cid:16) X ( z − u + s ) (cid:17)(cid:17) . (17)To obtain the shape X ( z ) and the velocity v , one must iteratively solve the closed system of equations formed by (15)and (17). Note that the assumption of small v used above is strictly valid for h close to h MAP . However, numericalsimulations conﬁrm that in practice the resulting velocity formula is precise over the whole range [ h BP , h MAP ] .IV. D ERIVATION OF VELOCITY FORMULA FOR

BMS

CHANNELS

Let us brieﬂy outline of the main steps of derivation. We ﬁrst write down a potential functional which gives,in the continuous setting, the DE ﬁxed point equation corresponding to (14). This enables us to formulate the DEiterations as a sort of gradient descent equation (Section IV-A). From there on, we use the ansatz in Section III-Bto derive the velocity formula (15).

A. Density evolution as gradient descent

We call ∆ W ( x ) the potential functional of the coupled system in the continuum limit obtained from (10). Thislimit involves an integral over the spatial direction z ∈ R and, in order to get a convergent result, we must subtract a“reference energy”. Essentially, any static reference proﬁle, here called x ( z ) , that satisﬁes the boundary conditions x ( z ) → ∆ ∞ , z → −∞ and x ( z ) → x BP , z → + ∞ , will do the job. For concreteness, one can take a Heaviside-likeproﬁle x ( z ) = ∆ ∞ , z < , x ( z ) = x BP , z ≥ . The potential functional is thus deﬁned as follows, ∆ W ( x ) = (cid:90) R d z (cid:16) P ( z, x ) − P ( z, x ) (cid:17) , (18)where P ( z, x ) is a z -dependent functional of x equal to P ( z, x ) = 1 R (cid:48) (1) H ( R (cid:24) ( x ( z, t ))) + H ( ρ (cid:24) ( x ( z, t ))) − H ( x ( z, t ) (cid:24) ρ (cid:24) ( x ( z, t ))) − L (cid:48) (1) H (cid:16) c ( z ) (cid:16) L (cid:16) (cid:0) (cid:90) d s ρ (cid:24) ( x ( z + s, t )) (cid:1)(cid:17) . (19)In Appendix A, we calculate the functional derivative of ∆ W ( x ) in a direction η ( z, t ) deﬁned as δ ∆ W δ x [ η ( z, t )] = dd γ ∆ W ( x + γη ) (cid:12)(cid:12)(cid:12) γ =0 , (20)and ﬁnd δ ∆ W δ x [ η ( z, t )] = (cid:90) R d z H (cid:32)(cid:16) (cid:90) d u c ( z − u ) (cid:16) λ (cid:16) (cid:0) (cid:90) d s ρ (cid:24) ( x ( z − u + s, t )) (cid:1) − x ( z, t ) (cid:17) (cid:24) ρ (cid:48) (cid:24) ( x ( z, t )) (cid:24) η ( z, t ) (cid:33) (21)From (14) and (21) we deduce that (cid:90) R d z H (cid:16) ( x ( z, t + 1) − x ( z, t )) (cid:24) ρ (cid:48) (cid:24) ( x ( z, t )) (cid:24) η ( z, t ) (cid:17) = δ ∆ W δ x [ η ( z, t )] (22) August 1, 2018 DRAFT2

We note that, using the duality rule (4) for a = x ( z, t + 1) − x ( z, t ) and b = ρ (cid:48) (cid:24) ( x ( z, t )) (cid:24) η ( z, t ) (recall that η must be a difference of two measures so that b also is such a difference) and the associativity of (cid:24) , this equationcan also be formulated as (cid:90) R d z H (cid:16) ( x ( z, t + 1) − x ( z, t )) (cid:16) ( ρ (cid:48) (cid:24) ( x ( z, t )) (cid:24) η ( z, t )) (cid:17) = − δ ∆ W δ x [ η ( z, t )] (23)In this form, we recognize a sort of inﬁnite-dimensional gradient descent equation in a space of measures. Thisreformulation of DE forms the basis of the derivation of the velocity formula. B. Final steps of the derivation

The potential functional can be decomposed in a “single system” part and a contribution that contains the“interaction” due to coupling. We have ∆ W ( x ) = W s ( x ) + W i ( x ) , (24)with W s ( x ) = (cid:90) R d z { P s ( z, x ) − P s ( z, x ) } , (25) W i ( x ) = (cid:90) R d z { P i ( z, x ) − P i ( z, x ) } , (26)where P s ( z, x ) = 1 R (cid:48) (1) H ( R (cid:24) ( x ( z, t ))) + H ( ρ (cid:24) ( x ( z, t ))) − H ( x ( z, t ) (cid:24) ρ (cid:24) ( x ( z, t ))) − L (cid:48) (1) H (cid:0) c ( z ) (cid:16) L (cid:16) (cid:0) ρ (cid:24) ( x ( z, t )) (cid:1)(cid:1) , (27)and P i ( z, x ) = H (cid:16) c ( z ) (cid:16) L (cid:16) (cid:0) ρ (cid:24) ( x ( z, t )) (cid:1)(cid:17) − H (cid:16) c ( z ) (cid:16) L (cid:16) (cid:0) (cid:90) d u ρ (cid:24) ( x ( z + u, t )) (cid:1)(cid:17) . (28)Note, for future use, that in fact P s ( z, x ) = W s ( x ( z, t )) is the single system potential (6) “at position z ”. Withthese deﬁnitions, the gradient descent equation (23) can be written as (cid:90) R d z H (cid:16) ( x ( z, t + 1) − x ( z, t )) (cid:24) ρ (cid:48) (cid:24) ( x ( z, t )) (cid:24) η ( z, t ) (cid:17) = δ W s δ x [ η ( z, t )] + δ W i δ x [ η ( z, t )] . (29)Now, we use the ansatz to compute the three terms in this equation in the regime t → + ∞ , z → + ∞ such that | z − vt | = O (1) . We will choose the direction η ( z, t ) = X (cid:48) ( z − vt ) .We start with the left-hand side of (29). From x ( z, t ) → X ( z − vt ) and the approximation x ( z, t + 1) − x ( z, t ) ≈− v X (cid:48) ( z − vt ) , together with the special choice of η ( z, t ) , we can rewrite the left hand side of (29) as v (cid:90) R d z H (cid:0) X (cid:48) ( z − vt ) (cid:24) ρ (cid:48) (cid:24) ( X ( z − vt )) (cid:24) X (cid:48) ( z, t ) (cid:1) . (30)Using the commutativity of the operator (cid:24) , this is equal to v (cid:90) R d z H (cid:0) ρ (cid:48) (cid:24) ( X ( z − vt )) (cid:24) X (cid:48) ( z − vt ) (cid:24) (cid:1) . (31)Note that we can shift the argument in the integrals z − vt → z , and this term becomes independent of time. August 1, 2018 DRAFT3

Now, we consider the ﬁrst functional derivative on the right hand side of (29), when η ( z, t ) = X (cid:48) ( z − vt ) . Itshould be clear that we can immediately make the change of variables in the integrals z − vt → z which simpliﬁesthe formulas. By the calculations in Appendix A, we ﬁnd δ W s δ X [ X (cid:48) ( z )] = (cid:90) R d z (cid:26) H (cid:0) X ( z ) (cid:16) [ ρ (cid:48) (cid:24) ( X ( z )) (cid:24) X (cid:48) ( z )] (cid:17) − H (cid:0) c (cid:16) λ (cid:16) ( ρ (cid:24) ( X ( z ))) (cid:16) [ ρ (cid:48) (cid:24) ( X ( z )) (cid:24) X (cid:48) ( z )] (cid:1)(cid:27) . (32)In order to simplify the above, we remark the following dd z (cid:110) R (cid:48) (1) H ( R (cid:24) ( X ( z ))) − H ( X ( z ) (cid:24) ρ (cid:24) ( X ( z ))) + H ( ρ (cid:24) ( X ( z ))) − L (cid:48) (1) H (cid:0) c ( z ) (cid:16) L (cid:16) ( ρ (cid:24) ( X ( z ))) (cid:1)(cid:111) = H ( ρ (cid:24) ( X ( z )) (cid:24) X (cid:48) ( z )) − H ( X (cid:48) ( z ) (cid:24) ρ (cid:24) ( X ( z ))) − H ( X ( z ) (cid:24) ρ (cid:48) (cid:24) ( X ( z )) (cid:24) X (cid:48) ( z )) + H ( ρ (cid:48) (cid:24) ( X ( z )) (cid:24) X (cid:48) ( z )) − H ( c ( z ) (cid:16) λ (cid:16) ( ρ (cid:24) ( X ( z ))) (cid:16) [ ρ (cid:48) (cid:24) ( X ( z )) (cid:24) X (cid:48) ( z )]) . Noticing that the ﬁrst two terms on the right-hand side cancel out, and using the duality rule (4) for the third term,we get the integrand in (32). In other words, the integrand in (32) equals dd z P s ( z, X ) = dd z W s ( z, X ( z )) and δ W s δ X [ X (cid:48) ( z )] = (cid:90) R d z dd z W s ( X ( z ))= W s ( x BP ) − W s (∆ ∞ )= ∆ E. (33)We now show that the functional derivative of the interaction part in (29) does not contribute when η ( z, t ) isreplaced by X (cid:48) ( z ) . By directly applying the deﬁnition of the functional derivative, we ﬁnd δ W i δ X [ X (cid:48) ( z )] = (cid:90) R d z (cid:110) H (cid:0) c (cid:16) (cid:2) λ (cid:16) ( ρ (cid:24) ( X ( z ))) (cid:16) ( ρ (cid:48) (cid:24) ( X ( z )) (cid:24) X (cid:48) ( z )) (cid:3)(cid:1) − H (cid:0) c (cid:16) (cid:2) λ (cid:16) ( (cid:90) d u ρ (cid:24) ( X ( z + u ))) (cid:16) ( (cid:90) d s ρ (cid:48) (cid:24) ( X ( z + s )) (cid:24) X (cid:48) ( z + s )) (cid:3)(cid:1)(cid:111) . (34)We notice that the integrand is a total derivative; namely, it is equal to L (cid:48) (1) dd z (cid:110) H (cid:0) c (cid:16) L (cid:16) ( ρ (cid:24) ( X ( z ))) (cid:1) − H (cid:0) c (cid:16) L (cid:16) ( (cid:90) d u ρ (cid:24) ( X ( z + u ))) (cid:1)(cid:111) Due to the boundary conditions, we have lim z →−∞ X ( z ) = lim z →−∞ X ( z + u ) = ∆ ∞ and lim z → + ∞ X ( z ) =lim z → + ∞ X ( z + u ) = x BP , and we can conclude that the total derivative integrates to zero, thus δ W i δ X [ X (cid:48) ( z )] = 0 . (35)Finally, replacing (31), (33), and (35) in (29) we get the simple relationship v (cid:90) R d z H (cid:0) ρ (cid:48) (cid:24) ( X ( z )) (cid:24) X (cid:48) ( z ) (cid:24) (cid:1) = ∆ E (36)which yields the velocity formula (15).V. A PPLICATIONS TO SPECIFIC CHANNELS AND COMPARISONS WITH N UMERICAL E XPERIMENTS

A. Binary Erasure Channel (BEC)

The formula for the velocity, when transmission takes place over the BEC, can be obtained by directly simplifyingthe general formula in (15). We note that, since the BEC yields a scalar system, one can also use the formula for

August 1, 2018 DRAFT4 general scalar systems in Section VII (that covers cases beyond coding theory also). We will suppose that theunderlying code LDPC ( λ, ρ ) is such that the DE equation has a single non-trivial ﬁxed point x BP (cid:54) = 0 . Furthermore,we ﬁx (cid:15) BP ≤ (cid:15) ≤ (cid:15) MAP (recall the channel entropy reduces to H ( c ) = h = (cid:15) here).The channel distribution can be written as c = (cid:15) ∆ + (1 − (cid:15) )∆ ∞ , and the proﬁle is of the form x ( z, t ) = x ( z, t )∆ + (1 − x ( z, t ))∆ ∞ where ≤ x ( z, t ) ≤ is the scalar erasure probability at position z and time t . Thistends to a ﬁxed shape X ( z ) = X ( z )∆ + (1 − X ( z ))∆ ∞ . (37)where ≤ X ( z ) ≤ satisﬁes lim z →−∞ X ( z ) = 0 , lim z → + ∞ X ( z ) = x BP . We have also X (cid:48) ( z ) = X (cid:48) ( z )∆ − X (cid:48) ( z )∆ ∞ . (38)We also note the following identities valid for scalar maps f, g : R → [0 , (such as λ , ρ , L , R and their derivatives)  f (cid:16) ( X ( z )) = f ( X ( z ))∆ + (1 − f ( X ( z )))∆ ∞ ,g (cid:24) ( X ( z )) = (1 − g (1 − X ( z ))∆ + g (1 − X ( z ))∆ ∞ . (39)Let us compute the denominator of (15). Using (3), (38), and (39) we have ρ (cid:48) (cid:24) ( X ( z )) (cid:24) X (cid:48) ( z ) (cid:24) = { (1 − ρ (cid:48) (1 − X ( z )))∆ + ρ (cid:48) (1 − X ( z ))∆ ∞ } (cid:24) { X (cid:48) ( z )∆ − X (cid:48) ( z )∆ ∞ } (cid:24) = { (1 − ρ (cid:48) (1 − X ( z )))∆ + ρ (cid:48) (1 − X ( z ))∆ ∞ } (cid:24) { X (cid:48) ( z ) ∆ ∞ − X (cid:48) ( z ) ∆ } = (1 − ρ (cid:48) (1 − X ( z ))) X (cid:48) ( z ) ∆ − (1 − ρ (cid:48) (1 − X ( z ))) X (cid:48) ( z ) ∆ + ρ (cid:48) (1 − X ( z )) X (cid:48) ( z ) ∆ ∞ − ρ (cid:48) (1 − X ( z )) X (cid:48) ( z ) ∆ } = ρ (cid:48) (1 − X ( z )) X (cid:48) ( z ) ∆ ∞ − ρ (cid:48) (1 − X ( z )) X (cid:48) ( z ) ∆ , and since H (∆ ) = 1 , H (∆ ∞ ) = 0 , and the entropy functional is linear, we obtain the denominator of (15) as H (cid:0) ρ (cid:48) (cid:24) ( X ( z )) (cid:24) X (cid:48) ( z ) (cid:24) (cid:1) = − ρ (cid:48) (1 − X ( z )) X (cid:48) ( z ) . For the numerator of (15), we have ∆ E = W BEC ( x BP ) − W BEC (0) , where the single system potential on the BEC isobtained from (6) using again (3) and (39). The exercise yields W BEC ( x ) = 1 R (cid:48) (1) (1 − R (1 − x )) − xρ (1 − x ) − (cid:15)L (cid:48) (1) L (1 − ρ (1 − x )) . Putting together these results, the velocity (15) becomes v BEC = − W BEC ( x BP ) − W BEC (0) (cid:82) R d z ρ (cid:48) (1 − X ( z )) X (cid:48) ( z ) . (40)(Note that, with our normalizations, W BEC (0) = 0 for all (cid:15) .) The erasure proﬁle X ( z ) has to be computed from theone-dimensional integral equation X ( z ) − v BEC X (cid:48) ( z ) = (cid:15) (cid:90) d u λ (cid:0) − (cid:90) d s ρ (1 − X ( z − u + s )) (cid:1) . (41) August 1, 2018 DRAFT5

Obviously, the velocity vanishes when (cid:15) → (cid:15) MAP since then W BEC ( x BP ) → W BEC (0) = 0 . An important quantity is theslope of the velocity at (cid:15)

MAP . To compute it, we remark that W BEC has an explicit dependence on (cid:15) , as well as animplicit one through x BP ( (cid:15) ) . Thus, d W BEC d (cid:15) = ∂W BEC ∂(cid:15) + ∂W BEC ∂x BP d x BP d (cid:15) = ∂W BEC ∂(cid:15) = − L (cid:48) (1) L (1 − ρ (1 − x BP )) , so that, for (cid:15) → (cid:15) MAP , the Taylor expansion to ﬁrst order yields W BEC ( x BP ) ≈ − ( (cid:15) − (cid:15) MAP ) 1 L (cid:48) (1) L (1 − ρ (1 − x MAP )) . Note that we used x BP ( (cid:15) ) → x BP ( (cid:15) MAP ) = x MAP where x MAP is deﬁned as the point x (cid:54) = 0 where the potential isstationary and vanishes. This yields the linear approximation for the velocity v l = ( (cid:15) − (cid:15) MAP ) L (cid:48) (1) L (1 − ρ (1 − x MAP )) (cid:82) R d z ρ (cid:48) (1 − X MAP ( z ))( X (cid:48) MAP ( z )) , (42)where X MAP ( · ) is the erasure probability proﬁle obtained when (cid:15) = (cid:15) MAP .It is interesting to compare (40) with the upper bound of Theorem 1 in [17] for a discrete system v B = α W BEC ( x BP ) − W BEC (0) (cid:80) z ∈ Z ρ (cid:48) (1 − x z )( x z − x z − ) , α ≤ . (43)In [17] the derivation of the bound yields α ≤ (for L c and w large enough) but it is conjectured based onnumerical simulations that α = 1 would be a tight bound. Obviously, (40) and (43) are consistent. We note forreference that another upper bound is also derived in [17], namely v B = α ( W BEC ( x BP ; (cid:15) ) − W BEC (0; (cid:15) ))2 W BP ( x u ; (cid:15) ) − W BP ( x BP ; (cid:15) ) , where and x u and x BP are respectively the non-trivial unstable and stable ﬁxed points of the potential of the uncoupledsystem W s ( · ; · ) . We do not discuss this further because in practice this turns out to be a very loose bound.We now compare the analytical velocity formula (40) with the empirical velocity (called v e below) obtained bysimulating the discrete DE equation; we show that it provides a very good approximation for the (real) value ofthe velocity even for relatively small values of w . For the simulations, we consider the spatially coupled (3 , and (4 , -regular code ensembles, as well as two irregular LDPC codes (described later). We run the simulations forseveral values of the chain length L c = 256 , and the window size w = 3 , , , . The empirical velocity is thevelocity calculated from erasure probability proﬁles of the discrete DE equation. Consider two (discrete) proﬁles x ( t ) and x ( t ) at any two iterations t and t , respectively, with t < t . After the transient phase is over, theproﬁles are identical up to translation. We call a “kink” the part of the proﬁle where there is a fast increase from to x BP in the erasure probability. The kink “position” is the coordinate such that the height is equal to x BP / , and ∆ z is the difference of two such positions (on two different proﬁles). Then, the empirical velocity v e is v e = ∆ zw ( t − t ) (44) August 1, 2018 DRAFT6

In practice, we get reliable results by taking pairs of proﬁles separated by iterations and averaging this ratioover every consecutive pair of proﬁles. Note that we normalize the velocity by w to be able to compare systemswith different window widths.In Table I, we give empirical values v e of the normalized velocities for the spatially coupled (4 , -regular codeensemble, with transmission over the BEC(0.6), when the spatial length is positions and the channel parameteris ﬁxed to (cid:15) = 0 . (between the BP and MAP thresholds), for different values of the window size w . We observe thatthe result of our formula v BEC gives a good estimate of the empirical velocity v e for all the demonstrated values ofthe window size. We also observe that the linear approximation gives a good estimate when the channel parameteris not too far from the MAP threshold (cid:15) MAP . The upper bound v B [17] gives a better estimate as the window sizegrows larger. TABLE IN

ORMALIZED VELOCITIES FOR

LDPC( x , x ) ON THE

BEC

WITH A SPATIAL LENGTH , FOR SEVERAL w SIZES , AND (cid:15) = 0 . . T HEVALUES IN THE TABLE CAN BE COMPARED TO v BEC = 0 . AND v l = 0 . . w = 3 w = 5 w = 8 w = 16 v e v B /α In Table II, we give empirical values of the normalized velocities for the spatially coupled (3 , -regular codeensemble, with transmission over the BEC( (cid:15) ), when the spatial length equals and the window size w = 8 , fordifferent values of the channel parameter (cid:15) . One can compare these values with those in [17] (up to a factor equalto w due to the normalization). The result of the formula v BEC gives the closest estimate to the empirical velocity v e for all values of (cid:15) . TABLE IIN

ORMALIZED VELOCITIES FOR

LDPC( x , x ) ON THE

BEC

FOR SPATIAL LENGTH , w = 8 , AND SEVERAL (cid:15)

VALUES . (cid:15) = 0 . (cid:15) = 0 . (cid:15) = 0 . (cid:15) = 0 . v e v BEC v l v B /α v B /α Figures 4 and 5 show the empirical velocity v e , the analytical velocity v BEC , and the upper bound v B for thespatially coupled (3 , -regular code ensemble, with spatial length and window size w = 3 . We remark thatour formula ﬁts very well, for the (3 , -regular code, with the empirical velocity for all values of the channel August 1, 2018 DRAFT7

Channel parameter epsilon

Velocity

Empirical velocityAnalytical velocityUpper bound

Fig. 4. Normalized velocities v e , v BEC , and v B /α (in the order of the legend) for the (3 , -regular ensemble with spatial length , windowsize w = 3 , and (cid:15) BP = 0 . < (cid:15) < (cid:15) MAP = 0 . . Channel parameter epsilon

Velocity

Empirical velocityAnalytical VelocityUpper bound

Fig. 5. Normalized velocities v e , v BEC , and v B /α (in the order of the legend) of the decoding proﬁle for the (4 , -regular ensemble of spatiallength , window size w = 3 , and (cid:15) BP = 0 . < (cid:15) < (cid:15) MAP = 0 . . parameter (cid:15) ∈ [ (cid:15) BP , (cid:15) MAP ] = [0 . , . . The agreement is quite good also for the (4 , -regular code and very goodfor more than half of the interval [ (cid:15) BP , (cid:15) MAP ] ≈ [0 . , . .We also illustrate the results for two irregular code ensembles in Figures 6 and 7. The ﬁrst one has node degreedistributions L ( x ) = 0 . x + 0 . x + 0 . x and R ( x ) = x , spatial length , and window size w = 4 . Theagreement between v BEC and v e is excellent for the whole range (cid:15) ∈ [ (cid:15) BP , (cid:15) MAP ] = [0 . , . . The second onehas L ( x ) = 0 . x + 0 . x + 0 . x and R ( x ) = 0 . x + 0 . x , spatial length , and window size w = 3 . Theagreement between the velocities is also very good for most of the range (cid:15) ∈ [ (cid:15) BP , (cid:15) MAP ] = [0 . , . . B. Gaussian Approximation (GA)

DE equations relate probability densities and as such we may need to track an inﬁnite set of parameters (exceptfor the BEC where the space of densities can be parametrized by a single real number). In many situations, suchas the case when we have large degrees, for example, the densities are well approximated by Gaussians, whichenables us to project the DE equations down to a low dimensional space. There are several variants of the Gaussianapproximation (see for example [31], [18], [19]), and here we use it in a form called the “reciprocal channelapproximation” proposed in [18], [19].

August 1, 2018 DRAFT8

Channel parameter epsilon

Velocity

Empirical velocityAnalytical velocityUpper bound

Fig. 6. Normalized velocities v BEC , v e , and v B /α for an ensemble with L ( x ) = 0 . x + 0 . x + 0 . x , R ( x ) = x , spatial length ,window size w = 4 , and (cid:15) BP = 0 . < (cid:15) < (cid:15) MAP = 0 . . Channel parameter epsilon

Velocity

Empirical velocityAnalytical velocityUpper bound

Fig. 7. Normalized velocities v BEC , v e , v B /α for an ensemble with L ( x ) = 0 . x + 0 . x + 0 . x , R ( x ) = 0 . x + 0 . x , spatial length , window size w = 3 , and (cid:15) BP = 0 . < (cid:15) < (cid:15) MAP = 0 . . The idea is to assume that the densities of the LLR messages appearing in the DE equations are symmetricGaussian densities. Such densities take the form d x ( α ) = d α √ πσ exp (cid:16) − ( α − m ) σ (cid:17) , (45)with the mean m and variance σ satisfying σ = 2 m . Furthermore, the channel density c is replaced by thatcorresponding to a BIAWGNC( σ n ) with the same entropy H ( c ) . Density evolution can then conveniently beexpressed in terms of the entropies p ( t ) z = H ( x ( t ) z ) . This is done as follows. Let ψ ( m ) denote the entropy of asymmetric Gaussian density of mean m given by ψ ( m ) = 1 √ πm (cid:90) R d z e − ( z − m )24 m log (1 + e − z ) . (46)Thus ψ − ( p ) denotes the mean of a symmetric Gaussian density x of entropy p = H ( x ) . Take two symmetricGaussian densities x and x of means m and m and entropies p = ψ ( m ) and p = ψ ( m ) . We have, in For indications on the numerical implementation of this function see [27], pp.194 and 237.

August 1, 2018 DRAFT9 z p ( z ) Fig. 8. The proﬁle of entropies p ( z, t ) plotted every 10 iterations starting from iteration 20. We take the (3 , -regular LDPC code with chainlength L c = 30 , window size w = 3 , and we consider the BIAWGN channel with mean ψ − ( H ( c )) = 2 . . general, H ( x (cid:16) x ) = ψ ( ψ − ( p ) + ψ − ( p )) , (47)which just expresses the fact that a usual convolution of two Gaussian densities of means m and m is a Gaussiandensity of mean m + m . On the other hand x (cid:24) x is not exactly Gaussian so there is no exact formula but theidea here is to preserve the duality rule H ( x (cid:24) x ) + H ( x ⊗ x ) = H ( x ) + H ( x ) . Writing this relation as − H ( x (cid:24) x ) = H ((∆ − x ) (cid:16) (∆ − x )) , and noting that H (∆ − x ) = 1 − p , H (∆ − x ) = 1 − p suggests the approximation H ( x (cid:24) x ) = 1 − ψ ( ψ − (1 − p ) + ψ − (1 − p )) . (48)Looking at the entropies of the DE equations (47) and (48) imply (we will limit ourselves to regular codes forsimplicity)  H ( x ( t ) ) = H ( c (cid:16) y ( t ) (cid:16) ( (cid:96) − ) ,H ( y ( t +1) ) = H ( x ( t ) (cid:24) ( r − ) . (49)and setting p ( t ) = H ( x ( t ) ) , q ( t ) = H ( y ( t ) we ﬁnd  p ( t ) = ψ (cid:0) ψ − ( H ( c )) + ( (cid:96) − ψ − ( q ( t ) ) (cid:1) ,q ( t +1) = 1 − ψ (cid:0) ( r − ψ − (1 − p ( t ) ) (cid:1) . (50)These equations can be combined into p ( t +1) = ψ (cid:16) ( (cid:96) − ψ − (cid:0) − ψ (( r − ψ − (1 − p ( t ) )) (cid:1) + ψ − ( H ( c )) (cid:17) . (51)The corresponding potential function is easily found from (6) W GA ( p ) = 1 r (cid:0) − ψ (cid:0) rψ − (1 − p ) (cid:1)(cid:1) + ψ (cid:0) rψ − (1 − p ) (cid:1) − ψ (cid:0) ( r − ψ − (1 − p ) (cid:1) − (cid:96) ψ (cid:0) ψ − ( H ( c )) + (cid:96)ψ − (cid:0) − ψ (( r − ψ − (1 − p )) (cid:1)(cid:1) . (52) August 1, 2018 DRAFT0

For the coupled system, we denote by p z the average over positions { z, . . . , z + w } of the entropy of symmetricGaussian densities emanating from the variable nodes. The coupled DE equations then take the form p ( t +1) z = 1 w w − (cid:88) i =0 ψ (cid:16) ( (cid:96) − ψ − (cid:16) w w − (cid:88) j =0 (cid:0) − ψ (( r − ψ − (1 − p ( t ) z − i + j )) (cid:1)(cid:17) + ψ − ( H ( c )) (cid:17) . (53)This coupled recursion can be solved with appropriate boundary conditions and one observes a scalar wavepropagation, as shown in Figure 8.We are now ready to discuss the application to the velocity formula. The continuum limit is obtained exactly asin Section III-A. The assumption that the density x ( z, t ) tends to a ﬁxed shape X ( z − v GA t ) after the transient phaseimplies that its entropy p ( z, t ) tends to P ( z − v GA t ) ≡ H ( X ( z − v GA t )) , where P ( z ) is a scalar function (independentof initial conditions) satisfying the integral equation P ( z ) − v GA P (cid:48) ( z ) = (cid:90) d u ψ (cid:16) ψ − ( H ( c )) + ( (cid:96) − (cid:96) − ψ − (cid:0) − (cid:90) d s ψ (cid:0) ( r − ψ − (1 − P ( z − u + s )) (cid:1)(cid:1)(cid:17) . (54)and the boundary conditions lim z →−∞ P ( z ) = 0 and lim z → + ∞ = p BP where p BP is the non-trivial ﬁxed point of(51). We will now show that the velocity formula reduces to v GA = − W GA ( p BP ) − W GA (0)( r − (cid:82) R d z P (cid:48) ( z ) ψ (cid:48)(cid:48) (( r − ψ − (1 − P ( z ))) ψ (cid:48) ( ψ − (1 − P ( z ))) . (55)To derive (55), we consider the denominator in (15) and write it as follows ( r − H (cid:0) x ( z ) (cid:24) ( r − (cid:24) x (cid:48) ( z ) (cid:24) (cid:1) = ( r −

1) lim δ → δ (cid:110) H (cid:0) x ( z ) (cid:24) ( r − (cid:24) x ( z + δ ) (cid:24) (cid:1) − H (cid:0) x ( z ) (cid:24) ( r − (cid:24) x ( z + δ ) (cid:24) x ( z ) (cid:1) + H (cid:0) x ( z ) (cid:24) ( r − (cid:24) x ( z ) (cid:24) (cid:1)(cid:111) . (56)Computing each entropy in the Gaussian approximation, we ﬁnd for the bracket on the right-hand side δ (cid:110)(cid:2) − ψ (( r − ψ − (1 − p ( z )) + 2 ψ − (1 − p ( z + δ ))) (cid:3) − (cid:2) − ψ (( r − ψ − (1 − p ( z )) + ψ − (1 − p ( z + δ ))) (cid:3) + (cid:2) − ψ ( rψ − (1 − p ( z ))) (cid:3)(cid:111) . (57)In Appendix B, we compute the limit of this term when δ → by appropriate Taylor expansions and ﬁnd − (cid:32) d P ( z )d z (cid:33) ψ (cid:48)(cid:48) (cid:0) ( r − ψ − (1 − P ( z )) (cid:1)(cid:16) ψ (cid:48) ( ψ − (1 − P ( z ))) (cid:17) . (58)This concludes the derivation of (55).Table III gives a comparison of analytical and empirical velocities, v GA and v e, GA , obtained for the (3 , and the (4 , -regular ensembles, for a spatial length of and a window size w = 3 for different values of ψ − ( H ( c )) = σ n / (twice the signal to noise ratio). We also plot both velocities for the (3 , -regular ensemble for the sameparameters in Figure 9. We conjecture that the errors incurred from these plots are due to numerical errors involvedin computing the functions ψ and its inverse. August 1, 2018 DRAFT1

Channel parameter

Velocities

Empirical velocityAnalytical velocity

Fig. 9. Normalized velocities v GA and v e for the spatially coupled (3 , -regular ensemble within the Gaussian approximation framework, fora spatial length of and w = 3 , as a function of ψ − ( H ( c )) = 2 /σ n .TABLE IIIN ORMALIZED VELOCITIES OF THE WAVES WITHIN THE G AUSSIAN APPROXIMATION ON THE (3 , AND (4 , - REGULAR CODE ENSEMBLESWITH L c = 100 , w = 3 . /σ n v GA (3 , v e (3 , v GA (4 , v e (4 , VI. A

PPLICATION TO S CALING L AWS FOR F INITE -L ENGTH C OUPLED C ODES

The authors in [20] propose a scaling law to predict the error probability of a ﬁnite-length spatially coupled ( (cid:96), r, L c ) code when transmission takes place over the BEC. The derived scaling law depends on scaling parameters ,one of which we will relate to the velocity of the decoding wave. The ( (cid:96), r, L c ) ensemble considered in [20] differsslightly from the purely random ensemble we consider in this work. However, as we will see, our formula forthe velocity yields results that are reasonably good for this application. We brieﬂy describe this ensemble and thescaling law.The ( (cid:96), r, L c ) ensemble combines the beneﬁts of purely random codes (that we consider in this work) andprotograph-based codes [32]. The randomness involved in the construction makes the ensemble relatively easy toanalyze, and the structure added to the construction due to its similarity to protograph-based codes improves theperformance of the code. The ensemble is constructed as follows: Make L c + w copies of an uncoupled code atpositions z = − w + 1 , . . . , L c . All edges are erased then reconnected such that a variable node at position z hasexactly one edge with each set of check nodes at positions z + i , where i = 0 , . . . , (cid:96) − . The check nodes arechosen such that the regularity of their degree is maintained. Therefore, every variable node has (cid:96) emanating edgesand every check node has r such edges.We consider transmission over the BEC. In this case, the BP decoder can be seen as a peeling decoder [33]. August 1, 2018 DRAFT2

Whenever a variable node is decoded, it is removed from the graph along with its edges. One way to track thispeeling process is to analyze the evolution of the degree distribution of the residual graph across iterations, whichserves as a sufﬁcient statistic. This statistic can be described by a system of differential equations, whose solutiondetermines the mean and variance of the fraction of degree-one check nodes and the variance around this mean atany time during the decoding process. We call ˆ r the mean.It has been shown in [20] that there exists a steady state phase where the mean and the variance are constant.It is exactly during this phase that one can observe the progression of the soliton. We note that here we considerone-sided termination instead of two-sided termination (as considered in [20]), so the fraction ˆ r here is equal tohalf the fraction called ˆ r ( ∗ ) in [20].Let (cid:15) ( (cid:96),r,L c ) denote the BP threshold of the ﬁnite-size ( (cid:96), r, L c ) ensemble (for large L c this is close to (cid:15) MAP dueto threshold saturation). We can write the ﬁrst-order Taylor expansion of ˆ r (cid:12)(cid:12) (cid:15) around (cid:15) < (cid:15) ( (cid:96),r,L c ) as ˆ r (cid:12)(cid:12) (cid:15) ≈ ˆ r (cid:12)(cid:12) (cid:15) ( (cid:96),r,Lc ) + γ ∆ (cid:15) + O (∆ (cid:15) ) . where ∆ (cid:15) = (cid:15) ( (cid:96),r,L c ) − (cid:15) . Thus, for a given (cid:15) < (cid:15) ( (cid:96),r,L c ) and since ˆ r (cid:12)(cid:12) (cid:15) ( (cid:96),r,Lc ) = 0 (by deﬁnition), we obtain γ ≈ ˆ r (cid:12)(cid:12) (cid:15) ∆ (cid:15) . This parameter γ enters in the scaling law and is therefore quite important. So far, it has been determined onlyexperimentally. It would clearly be desirable to have a theoretical handle on γ . It is argued in [20] that γ ≈ ¯ γ where ¯ γ = x BP /c and c is a real positive constant that behaves like ∆ (cid:15)/v BEC , i.e., ¯ γ ≈ x BP v BEC ∆ (cid:15) (59)It is expected that this formula becomes exact in an asymptotic limit where threshold saturation takes place (cid:15) ( (cid:96),r,L c ) → (cid:15) MAP . Using the linearization (42), we obtain ¯ γ → x MAP L (cid:48) (1) L (1 − ρ (1 − x MAP )) (cid:82) R d z ρ (cid:48) (1 − X MAP ( z ))( X (cid:48) MAP ( z )) . (60)The parameter ¯ γ is simply equal to the erasure probability times the slope of the velocity at (cid:15) MAP .We compare the values of γ and ¯ γ for different values of (cid:96) and r , at a channel parameter (cid:15) = (cid:15) ( (cid:96),r,L c ) − . ,in Table IV. The experimental values of γ are taken from [20] and, for those of ¯ γ , we use the analytical velocity(40). We observe that the numbers roughly agree. There are two reasons that can explain the discrepancies. Firstly,we derive the velocity for the purely random spatially coupled graph ensemble whereas the ensemble considered in[20] is more structured. Note also that as (cid:96) , r increase, the window size of the structured ensemble increases, so theﬁnite size effects at ﬁxed L c = 100 may be more marked. Secondly, expression (59) is valid when (cid:15) → (cid:15) ( (cid:96),r,L c ) ,whereas in IV ∆ (cid:15) = 0 . , which is relatively large (this choice in [20] is due to stability issues in numericalintegration techniques when (cid:15) → (cid:15) ( (cid:96),r,L c ) ). We conjecture that the second issue is the dominant reason for thedifference between the values of γ and ¯ γ and that, in fact, the velocity for the structured ensemble is not verydifferent from the one predicted by our formula (40). August 1, 2018 DRAFT3

TABLE IVV

ALUES OF γ AND ¯ γ FOR (cid:15) ( (cid:96),r,L ) − (cid:15) = 0 . , L c = 100 , AND SEVERAL VALUES OF (cid:96)

AND r l r (cid:15) MAP γ ¯ γ VII. V

ELOCITY FOR G ENERAL S CALAR C OUPLED S YSTEMS

In this section, we consider general scalar spatially coupled systems. That is, we do not restrict ourselves tocoding problems; however, we only consider systems in which the “density evolution type” analysis of messagepassing algorithms involves scalar values. Our main result is again a general formula for the velocity of the solitonin the framework of general scalar coupled systems. There are numerous systems that fall in this class - codingwith transmission on the BEC being one of the simplest - and in Sections VII-D, VII-E we will illustrate our resultswith two examples, namely generalized LDPC codes on the BEC and BSC, as well as compressive sensing.

A. General scalar systems

We adopt the framework and notation in [21]. We denote by (cid:15) ∈ [0 , (cid:15) max ] where (cid:15) max ∈ (0 , ∞ ) the interval ofvalues for the control parameter (cid:15) . Consider bounded, smooth functions that are increasing in both their arguments g : [0 , x max ( (cid:15) )] × [0 , (cid:15) max ] → [0 , y max ( (cid:15) )] and f : [0 , y max ( (cid:15) )] × [0 , (cid:15) max ] → [0 , x max ( (cid:15) )] where x max ( (cid:15) ) , y max ( (cid:15) ) ∈ (0 , ∞ ) and y max ( (cid:15) ) = g ( x max ( (cid:15) ); (cid:15) ) . The scalar recursions that interest us are x ( t +1) = f ( g ( x ( t ) ; (cid:15) ); (cid:15) ) , (61)where t ∈ N is the iteration number. The recursion is initialized with x (0) = x max . Since f ( g ([0 , x max ( (cid:15) )]))) ⊂ [0 , x max ( (cid:15) )] , the initialization of (61) implies that x (1) ≤ x (0) = x max and more generally x ( t +1) ≤ x ( t ) . Thus x ( t ) will converge to a limiting value x ( ∞ ) and this limit is a ﬁxed point since f and g are continuous. The ﬁxed pointsof the recursion (61) can be described as stationary points of a single system potential function U s deﬁned as U s ( x ) = xg ( x ; (cid:15) ) − G ( x ; (cid:15) ) − F ( g ( x ; (cid:15) ); (cid:15) ) , (62)where F ( x ; (cid:15) ) = (cid:82) xg (0; (cid:15) ) d s f ( s ; (cid:15) ) and G ( x ; (cid:15) ) = (cid:82) x d s g ( s ; (cid:15) ) . Without loss of generality, this function is normalizedso that U s ( x ) = 0 . August 1, 2018 DRAFT4

We deﬁne x good ( (cid:15) ) as the ﬁxed point of the recursion (61) that is reached with an initialization x (0) = 0 .Furthermore, the algorithmic threshold is deﬁned as (cid:15) a = sup { (cid:15) | x ( ∞ ) = x good } . (63)The monotonicity of f and g implies that, for (cid:15) < (cid:15) a , the basin of attraction of x good is the whole interval [0 , x max ( (cid:15) )] .Moreover x good is the unique stationary point of the potential function and is a minimum since it is an attractiveﬁxed point. For (cid:15) > (cid:15) a we have x ( ∞ ) (cid:54) = x good and we set x ( ∞ ) = x bad (where x bad depends on (cid:15) ). Note that this is anattractive ﬁxed point and is thus a (local) minimum of U s ( x ) . The two attractive ﬁxed points are separated by atleast one unstable ﬁxed point x unst which is a local maximum of U s ( x ) . We henceforth assume that there does notappear any other ﬁxed point besides x good , x unst , and x bad . With this assumption in mind, we deﬁne the energy gap as ∆ E = U s ( x bad ) − U s ( x good ) . (64)and the potential threshold as the unique value (cid:15) pot such that ∆ E = 0 (it can be shown that ∆ E in non-increasingin (cid:15) ).The corresponding spatially coupled recursions are obtained by placing L c + w replicas of the single system onthe spatial positions z ∈ {− w + 1 , . . . , L c } and coupling them with a uniform coupling window of size w . Thecoupled recursion takes the form x ( t +1) z = 1 w w − (cid:88) j =0 f (cid:0) w w − (cid:88) k =0 g ( x ( t ) z − j + k ; (cid:15) ); (cid:15) (cid:1) . (65)Motivated by the phenomenology observed in many examples (e.g. for the BEC or for compressve sensing), inorder to study the stationary phase where a soliton appears, we ﬁx the boundary conditions as x ( t ) z = x good , for z = {− w + 1 , . . . , − } and all t ∈ N . The initialization of the recursion is x (0) z = x bad , for z = { , . . . , L c } . Thecorresponding potential functional is given by U ( x ) = L c (cid:88) z = − w +1 (cid:0) x z g ( x z ; (cid:15) ) − G ( x z ; (cid:15) ) (cid:1) − L c (cid:88) z = − w +1 F (cid:16) w w − (cid:88) i =0 g ( x z + i ; (cid:15) ); (cid:15) (cid:17) , (66)where x = ( x − w +1 , . . . , x L c ) . The ﬁxed point equation (65) can be obtained by setting the derivative with respectto x of the potential U c ( x ) to zero.The spatially coupled recursions (65) display the threshold saturation property. Namely, for all (cid:15) < (cid:15) pot the ﬁxedpoint x ( ∞ ) z , z = − w + 1 , . . . , L , of the recursion (65) is equal to a constant proﬁle x good . In the remainder of thissection, we consider the range (cid:15) ∈ [ (cid:15) BP , (cid:15) MAP ] . It is for these values of the parameter (cid:15) that a soliton propagating atﬁnite speed is observed, after a transient phase lasting only for a few iterations. The soliton is a kink with a frontat position z front , making a quick transition of width O (2 w ) , between the two values x ( t ) z ≈ x good for z << z front and x ( t ) z ≈ x bad for z >> z front .The simplest example to keep in mind for the setting described above, as well as for the next paragraph, is thecase of LDPC ( λ, ρ ) codes with transmission over the BEC ( (cid:15) ) where f ( x ; (cid:15) ) = (cid:15)λ ( x ) and g ( x ; (cid:15) ) = ρ ( x ) and U s ( x ) is equal to (12). Here (cid:15) a = (cid:15) BP , (cid:15) pot = (cid:15) MAP , x good = 0 and x bad = x BP is the non-trivial BP ﬁxed point. Here, we mean the algorithmic threshold of the message passing algorithm underlying the recursion.

August 1, 2018 DRAFT5

B. Continuum limit and velocity formula for scalar systems

We consider the system in the limit L c (cid:29) w (cid:29) and formulate a continuum approximation. The coupledrecursion (65) becomes x ( z, t + 1) = (cid:90) d u f (cid:0) (cid:90) d s g ( x ( z − u + s, t ); (cid:15) ); (cid:15) (cid:1) . (67)We take the boundary condition x ( z, t ) → x good when z → −∞ and x ( z, t ) → x bad when z → + ∞ . This boundarycondition captures the proﬁles obtained after the transient phase has passed, and is well adapted to the study of thesoliton propagation. Velocity formula for scalar systems.

As before, we assume that there exists a constant v > such that, for t → + ∞ and | z − vt | = O (1) , the proﬁle x ( z, t ) → X ( z − vt ) , where X ( z ) is independent of the initial condition x ( z, and satisﬁes lim z →−∞ X ( z ) = x good , lim z → + ∞ X ( z ) = x bad . Under this assumption, the velocity of thesoliton is v = ∆ E (cid:82) R d z g (cid:48) ( X ( z ); (cid:15) ) X (cid:48) ( z ) , (68)where the shape X ( z ) satisﬁes X ( z ) − vX (cid:48) ( z ) = (cid:90) d u f (cid:0) (cid:90) d s g ( X ( z − u + s ); (cid:15) ); (cid:15) (cid:1) (69) C. Derivation of the velocity formula (68)The derivation of (68) follows closely that in Section IV-B, so we will be quite brief. The ﬁrst step is to introducea continuum version of U ( x ) , which we call ∆ U ( x ( · , · )) . We deﬁne x ( z ) as a static (time-independent) proﬁlethat satisﬁes the boundary conditions x ( z ) → x good when z → −∞ and x ( z ) → x bad when z → + ∞ (for example,one may take a Heaviside step function). This is a reference proﬁle in order to have well-deﬁned integrals in thefollowing expression ∆ U ( x ( · , · )) = (cid:90) R d z (cid:104) { x ( z, t ) g ( x ( z, t ); (cid:15) ) − G ( x ( z, t ); (cid:15) ) − F (cid:16) (cid:90) d u g ( x ( z − u, t ); (cid:15) ); (cid:15) (cid:1) }− { x ( z ) g ( x ( z ); (cid:15) ) − G ( x ( z ); (cid:15) ) − F (cid:0) (cid:90) d u g ( x ( z − u ); (cid:15) ); (cid:15) (cid:1) } (cid:105) . As long as x ( z, t ) and x ( z ) converge to their limiting values fast enough, the integrals over the spatial axis arewell deﬁned. Evaluating the functional derivative of ∆ U [ x ( · , · ); (cid:15) ] in an arbitrary direction η ( · , · ) , we ﬁnd that (67)is equivalent to a gradient descent equation (cid:90) R d z g (cid:48) ( x ( z, t ); (cid:15) ) (cid:0) x ( z, t + 1) − x ( z, t ) (cid:1) η ( z, t ) = − δ ∆ U δX [ η ( z, t )] (70)Now we use the ansatz x ( z, t ) → X ( z − vt ) and apply (70) for the special direction η ( z, t ) = X (cid:48) ( z − vt ) . Usingalso the approximation X ( z − vt ) ≈ X ( z ) − vX (cid:48) ( z ) for small v we get (after a change of variables z → z + vt ) v (cid:90) R d z X (cid:48) ( z ) g (cid:48) ( X ( z ); (cid:15) ) = δ ∆ U δx [ X (cid:48) ( z )] . (71) Deﬁned as lim γ → γ − (∆ U ( x ( · , · ) + γη ( · , · )) − ∆ U ( x ( · , · ))) August 1, 2018 DRAFT6

We then proceed to compute the right-hand side of (70). We split the potential functional into two parts: the “singlesystem potential” U s ( x ( · , · )) that remains if we ignore the coupling effect, and the “interaction potential” U i ( x ( · , · )) that captures the effect of coupling. That is, ∆ U = U s + U i , with U s ( x ( · , · )) = (cid:90) R d z (cid:104) { x ( z, t ) g ( x ( z, t ); (cid:15) ) − G ( x ( z, t ); (cid:15) ) − F ( g ( x ( z, t ); (cid:15) ) }− { x ( z ) g ( x ( z ); (cid:15) ) − G ( x ( z ); (cid:15) ) − F ( g ( x ( z ); (cid:15) ); (cid:15) ) } (cid:105) , U i ( x ( · , · )) = (cid:90) R d z (cid:104) { F ( g ( x ( z, t ); (cid:15) ); (cid:15) ) − F ( (cid:90) d u g ( x ( z − u, t ); (cid:15) ) }− { F ( g ( x ( z ); (cid:15) ); (cid:15) ) − F (cid:16) (cid:90) d u g ( x ( z − u ); (cid:15) ); (cid:15) (cid:1) } (cid:105) . The computation of each functional derivative at x ( z, t ) → X ( z − vt ) in the direction X (cid:48) ( z − vt ) yields δ U s δX [ X (cid:48) ( z )] = (cid:90) R d z X (cid:48) ( z ) (cid:16) X ( z ) g (cid:48) ( X ( z ); (cid:15) ) − g (cid:48) ( X ( z ); (cid:15) ) f ( g ( X ( z ); (cid:15) ); (cid:15) ) (cid:17) = (cid:90) R d z dd z (cid:110) X ( z ) g ( X ( z ); (cid:15) ) − G ( X ( z ); (cid:15) ) − F ( g ( X ( z ); (cid:15) ); (cid:15) ) (cid:111) = (cid:104) U s ( X ( z )) (cid:105) + ∞−∞ = U s ( x bad ; (cid:15) ) − U s ( x good ; (cid:15) ) . (72)and δ U i δX [ X (cid:48) ( z )] = (cid:90) R d zX (cid:48) ( z ) (cid:110) f ( g ( X ( z ); (cid:15) ); (cid:15) ) g (cid:48) ( X ( z )) − f ( (cid:90) d ug ( X ( z − u ); (cid:15) ); (cid:15) ) (cid:90) d ug (cid:48) ( X ( z − u ); (cid:15) ) (cid:111) = (cid:90) R d z dd z (cid:110) F ( g ( X ( z ); (cid:15) ); (cid:15) ) − F (cid:0) (cid:90) d u g ( X ( z − u ); (cid:15) ); (cid:15) (cid:1)(cid:111) = 0 . (73)Replacing (72) and (73) in (70), we obtain the velocity formula (68). D. Generalized LDPC (GLDPC) Codes

A GLDPC code is a code represented by a bipartite graph, such that the rules of the check nodes do not dependon parity (as do usual LDPC codes) but on a primitive BCH code. An attractive property of BCH codes is that theycan be designed to correct a chosen number of errors. For instance, one can design a BCH code so that it correctsall patterns of at most e erasures on the BEC, and all error patterns of weight at most e on the binary symmetricchannel (BSC). We consider a GLDPC code with degree-2 variable nodes and degree- n check nodes whose rulesare given by a primitive BCH code of blocklength n .We give a short description of a BCH code of blocklength n and minimum distance d = 2 e +1 (see [34] for moredetails. A BCH code is a cyclic code over a ﬁnite ﬁeld GF( b β ) where b is a prime power and β is an integer. Let a be a primitive element of GF( b β ). Then each element of GF( b β ) can be written in the form a i , i ∈ N . For eachelement a i we can deﬁne a minimal polynomial m i ( · ) which is the monic polynomial over GF( b ) with smallestdegree. The generator polynomial θ ( · ) over GF( b ) of the BCH code is deﬁned as the least common multiple of m ( · ) , . . . , m d ( · ) . August 1, 2018 DRAFT7

Consider transmission on the BEC or BSC and denote by (cid:15) the channel parameter. The density evolution recursionshave been derived in [35] for both channels, based on a bounded distance decoder for the BCH code. For n and e ﬁxed, we can write the update equations of the message passing algorithm as (61) with  f ( x ; (cid:15) ) = (cid:15)x,g ( x ) = (cid:80) n − i = e (cid:0) n − i (cid:1) x i (1 − x ) n − i − . Here, we have (cid:15) max = x max = y max = 1 . Moreover, one checks easily by differentiation (with respect to x ) that thepotential function U GLDPC ( x ) of the system is given by U GLDPC ( x ) = en g ( x ; (cid:15) ) − g (cid:48) ( x ; (cid:15) ) n x (1 − x ) − (cid:15) g ( x ; (cid:15) ) . For numerical implementation purposes, it is useful to note that g (cid:48) ( x ) = x e − (1 − x ) n − e − B ( e, n − e ) , where B ( a, b ) = ( a − b − a + b − denotes the Beta function, and that g ( x ) is equal to the regularized incomplete EulerBeta function so that g ( x ) = 1 B ( e, n − e ) (cid:90) x d s s e − (1 − s ) n − e − . This potential has x = 0 as a trivial stationary point (equivalently, this is a trivial ﬁxed point of DE as canbe seen from the expressions of f and g ) and develops a non-trivial minimum at x BP (cid:54) = 0 for (cid:15) > (cid:15) BP . Notethat (cid:15) BP is found as usual as the ﬁrst horizontal inﬂexion point. The MAP threshold is given by (cid:15) MAP such that U GLDPC ( x BP ) = U GLDPC (0) = 0 .The formula for the velocity of the soliton appearing for coupled GLDPC codes is found from (68). The energygap for (cid:15) BP ≤ (cid:15) ≤ (cid:15) MAP is now ∆ E = U GLDPC ( x BP ) − U GLDPC (0) . Figure 10 shows the velocities (normalized by w ) forthe spatially coupled GLPDC code with n = 15 and e = 3 , when the coupling parameters satisfy L c + w = 500 and w = 3 . We plot the velocities for (cid:15) ∈ [ (cid:15) BP , (cid:15) MAP ] = [0 . , . . We observe that the formula for the velocityprovides a very good estimation of the empirical velocity v e . E. Compressive Sensing

Let s be a length- n signal vector where the components are i.i.d. copies of a random variable S . We take m linear measurements of the signal and assume that the measurement matrix has i.i.d Gaussian elements distributedlike N (0 , / √ n ) . We deﬁne δ = m/n as the measurement ratio and ﬁx it to a constant value when n → ∞ . Therelation between δ and the parameter (cid:15) deﬁned in Section VII-A is (cid:15) = δ − . We assume that the power of thevariable S is normalized to 1; that is, E [ S ] = 1 . We also assume that each component of the signal s is corruptedby independent Gaussian noise of variance σ = 1 / snr . To recover s one implements the so-called approximatemessage passing (AMP) algorithm.It is well-known that the analysis of the AMP algorithm is given by state evolution [10]. Let Y = √ snr S + Z where Z ∼ N (0 , , let ˆ S ( Y ) = E S | Y [ S | Y ] the minimum mean square estimator, and set mmse ( snr ) = E S,Y [( S − ˆ S ( Y, snr )) ] , August 1, 2018 DRAFT8

Channel parameter epsilon

Velocity

Empirical velocityAnalytical velocity

Fig. 10. We consider a GLDPC code with n = 15 and e = 3 , with spatial length L + w = 500 and uniform coupling window with w = 3 . Weplot the normalized velocities v GLDPC and v e as a function of the channel parameter (cid:15) when (cid:15) is between the BP threshold (cid:15) s = (cid:15) BP = 0 . and the potential threshold (cid:15) c = (cid:15) MAP ≈ . . for the mmse function. The state evolution equations (which track the mean squared error of the AMP estimate)then correspond to the recursion (61) with  f ( x, δ ) = mmse ( snr − x ) ,g ( x, δ ) = snr − snr + xδ . Here x is interpreted as the mean square error predicted by the AMP estimate of the signal. State evolutionis initialized with x = 1 which corresponds to no knowledge about the signal. We will take δ as the controlparameter. Note that we have δ max = 1 , x max = mmse (0) , y max = g ( x max ) . The potential function is equal to U CS ( x ) = − x snr + xδ + δ ln (cid:16) x snr δ (cid:17) − I (cid:16) S ; √ snr S + Z (cid:17) + 2 I (cid:16) S ; (cid:115) snr + xδ S + Z (cid:17) , where I ( A ; B ) is the mutual information between two random variables A and B . To check that this potential givesback the correct state evolution equation as a stationarity condition, we simply differentiate it with respect to x thanks to the well known relation dd snr I ( S ; √ snr S + Z ) = mmse ( snr ) .To illustrate the potential function in a concrete case we take the Bernoulli-Gaussian distribution as the priordistribution over the signal components q ( s ) = (1 − ρ ) δ ( s ) + ρ e − s / √ π , Figure 11 shows the potential function for ρ = 0 . , snr = 10 , and several values of δ (the measurement fraction).We observe that, for δ > δ AMP = 0 . , there is a unique minimum x good which is a ﬁxed point of state evolutionwhen it is initialized to x = 1 . In this phase, there are enough measurements so that the reconstruction of the signalis good and the mean square error is small. At δ AMP = 0 . a horizontal inﬂexion point develops in the potentialfunction. For δ < δ AMP a second minimum appears at a higher mean square error x bad and the reconstruction of theAMP algorithm is bad. The optimal threshold corresponding to the minimum mean square error estimator is foundwhen δ is such that the two minima of the potential are at the same height, namely δ opt is given by the solution ofthe equation U SC ( x bad ) = U SC ( x good ) . For our example one ﬁnds δ opt = 0 . . This threshold is reached by the AMPalgorithm on the spatially coupled system. August 1, 2018 DRAFT9 -8 -6 -4 -2 x -0.100.10.20.30.40.5 U CS (x) δ = 0.220 δ = 0.208 δ = 0.170 δ = 0.157 Fig. 11. Potential function for compressive sensing with a Gaussian-Bernoulli prior. The sparsity parameter is ρ = 0 . and the signal to noiseratio snr = 10 . We show the potential for several values of the measurement fraction δ . For δ > δ AMP we have a single minimum x good . At δ AMP = 0 . there is a horizontal inﬂexion point and for smaller measurement fractions a second minimum x bad appears. At δ opt = 0 . thegap ∆ E vanishes. Measurement ratio delta

Velocity

Empirical velocityAnalytical velocity

Fig. 12. We consider the compressive sensing problem with snr = 10 and Gaussian-Bernoulli prior for the signal components with sparsityparameter ρ = 0 . . We have L c + w = 250 and uniform coupling window with w = 4 . We plot the normalized velocities v CS and v e as afunction of the measurement fraction δ when δ is between the potential threshold δ opt = 0 . and δ AMP = 0 . . Fix δ ∈ [ δ opt , δ AMP ] = [0 . , . . In this regime spatially coupled state evolution develops a soliton whichrepresents the proﬁle of mean square error along the spatial direction. The formula for the velocity v CS of thissoliton is obtained from (68) where the energy gap is now ∆ E = U SC ( x bad ) − U SC ( x good ) . Figure 12 shows thevelocities (normalized by w ) for the spatially coupled compressive sensing system with snr = 10 , ρ = 0 . and with the coupling parameters satisfy L c + w = 250 and w = 4 . We plot in Figure 12 the velocities for δ ∈ [ δ opt , δ AMP ] = [0 . , . . It is clear that the formula for the velocity provides a good estimation of theempirical velocity v e . VIII. C ONCLUSION

Our formulas for the velocities of the solitons that appear in spatially coupled codes for BMS channels and forgeneral spatially coupled scalar systems (e.g. compressive sensing) rest on the approximation of the discrete systemby a continuum one. We believe that this approximation becomes exact in an asymptotic limit of inﬁnite spatiallength and window size (keeping the order L c (cid:29) w (cid:29) ). It is an interesting open problem to quantify the quality August 1, 2018 DRAFT0 of this approximation already for L c inﬁnite and w ﬁnite but large. The numerical results tend to indicate that theapproximation is already quite good for small values of w , when it is of the order of a few positions.Another important and interesting open question concerns the proof of the ansatz, namely proving that the shapeof the soliton is unique and independent of the initial condition on the proﬁle. Settling this question would showthat the velocity of the soliton is independent of the size of the seed that initiates decoding or signal reconstructionat the boundaries.The formulas for the velocity involve the whole shape of the soliton in the denominator. For optimization purposesit would be desirable to have a good approximation (or bound) on the denominator that would involve only primitivequantities related to the underlying uncoupled ensemble (such as the degree distributions, the single system potential,etc). Such an approximation scheme has been proposed in [25] for the special case of the transmission over theBEC where it works quite well close to the MAP threshold. It would be desirable to ﬁnd an extension to the moregeneral situations considered in the present paper. A PPENDIX AF UNCTIONAL DERIVATIVES

In this appendix we derive equations (21) and (32).

A. Derivation of Equ. (21)We calculate the functional derivative of ∆ W ( x ) in a direction η ( z, t ) as follows. δ ∆ W δ x [ η ( z, t )] = ∂∂γ ∆ W ( x + γη ) (cid:12)(cid:12)(cid:12) γ =0 = ∂∂γ (cid:90) R d z P ( z, x + γη ) (cid:12)(cid:12)(cid:12) γ =0 , where the function P ( · , · ) is deﬁned in (19). Then, taking the derivative with respect to γ yields (cid:90) R d z (cid:110) H ( ρ (cid:24) ( x ( z, t )) (cid:24) η ( z, t )) + H ( ρ (cid:48) (cid:24) ( x ( z, t )) (cid:24) η ( z, t )) − H ( ρ (cid:24) ( x ( z, t )) (cid:24) η ( z, t )) − H ( x ( z, t ) (cid:24) ρ (cid:48) (cid:24) ( x ( z, t )) (cid:24) η ( z, t )) − H (cid:16) c ( z ) (cid:16) λ (cid:16) (cid:16) (cid:90) d s ρ (cid:24) ( x ( z + s, t )) (cid:17) (cid:16) (cid:104) (cid:90) d u ρ (cid:48) (cid:24) ( x ( z + u, t )) (cid:24) η ( z + u, t ) (cid:105)(cid:17)(cid:111) . We notice that the ﬁrst and third terms in the integral cancel out due to the commutativity of the operator (cid:24) . Byrearranging the averaging functions in the last term, we obtain (cid:90) R d z (cid:110) H ( ρ (cid:48) (cid:24) ( x ( z, t )) (cid:24) η ( z, t )) − H ( x ( z, t ) (cid:24) ρ (cid:48) (cid:24) ( x ( z, t )) (cid:24) η ( z, t )) − H (cid:16) (cid:90) d u c ( z − u ) (cid:16) λ (cid:16) (cid:16) (cid:90) d s ρ (cid:24) ( x ( z − u + s, t )) (cid:17) (cid:16) (cid:104) ρ (cid:48) (cid:24) ( x ( z, t )) (cid:24) η ( z, t ) (cid:105)(cid:17)(cid:111) . August 1, 2018 DRAFT1

By noticing that y = (cid:82) d u c ( z − u ) (cid:16) λ (cid:16) ( (cid:82) d s ρ (cid:24) ( x ( z − u + s, t ))) is a probability measure and a = ρ (cid:48) (cid:24) ( x ( z, t )) (cid:24) η ( z, t ) is a difference of probability measures, we can use the second duality rule in (4) H ( y (cid:24) a )+ H ( y (cid:16) a ) = H ( a ) to rewrite the above as, freely using the commutativity of (cid:24) , (cid:90) R d z (cid:110) − H ( x ( z, t ) (cid:24) ρ (cid:48) (cid:24) ( x ( z, t )) (cid:24) η ( z, t )) + H (cid:16)(cid:104) (cid:90) d u c ( z − u ) (cid:16) λ (cid:16) (cid:16) (cid:90) d s ρ (cid:24) ( x ( z − u + s, t )) (cid:17)(cid:105) (cid:24) ρ (cid:48) (cid:24) ( x ( z, t )) (cid:24) η ( z, t ) (cid:17)(cid:111) = (cid:90) R d z ρ (cid:48) (cid:24) ( x ( z, t )) (cid:24) η ( z, t ) (cid:24) (cid:16) (cid:90) d u c ( z − u ) (cid:16) λ (cid:16) (cid:16) (cid:90) d s ρ (cid:24) ( x ( z − u + s, t )) (cid:17) − x ( z, t ) (cid:17) . B. Derivation of Equ. (32)We calculate the functional derivative of W s ( X ) in the direction X (cid:48) ( z ) as follows. δ W s δ X [ X (cid:48) ( z )] = ∂∂γ W s ( X + γ X (cid:48) ) (cid:12)(cid:12)(cid:12) γ =0 = ∂∂γ (cid:90) R d z P s ( z, X + γ X (cid:48) ) (cid:12)(cid:12)(cid:12) γ =0 = (cid:90) R d z (cid:110) H ( ρ (cid:24) ( X ( z )) (cid:24) X (cid:48) ( z )) + H ( ρ (cid:48) (cid:24) ( X ( z )) (cid:24) X (cid:48) ( z )) − H ( ρ (cid:24) ( X ( z )) (cid:24) X (cid:48) ( z )) − H ( X ( z ) (cid:24) ρ (cid:48) (cid:24) ( X ( z )) (cid:24) X (cid:48) ( z )) − H ( c ( z ) (cid:16) λ (cid:16) ( ρ (cid:24) ( X ( z ))) (cid:16) [ ρ (cid:48) (cid:24) ( X ( z )) (cid:24) X (cid:48) ( z )]) (cid:111) . We notice here that on the right side of the last equality, the ﬁrst and third terms under the integral cancel out.Using the second duality rule in (4), and noticing that X ( z ) is a probability measure and ρ (cid:48) (cid:24) ( X ( z )) (cid:24) X (cid:48) ( z ) is adifference of probability measures, we can rewrite the functional derivative as (cid:90) R d z (cid:110) H ( X ( z ) (cid:16) [ ρ (cid:48) (cid:24) ( X ( z )) (cid:24) X (cid:48) ( z )]) − H ( c ( z ) (cid:16) λ (cid:16) ( ρ (cid:24) ( X ( z ))) (cid:16) [ ρ (cid:48) (cid:24) ( X ( z )) (cid:24) X (cid:48) ( z )]) (cid:111) . A PPENDIX BD ERIVATION OF EXPRESSION (58)Our goal in this appendix is to show that (57) reduces to (58) when δ → . We ﬁrst reorganize (57) as follows(up to multiplication by /δ ) ψ (cid:16) ( r − ψ − (1 − p ( z )) + 2 ψ − (1 − p ( z + δ )) (cid:17) − ψ (cid:16) ( r − ψ − (1 − p ( z )) + ψ − (1 − p ( z + δ ))+ ψ − (1 − p ( z )) (cid:17) + ψ (cid:16) ( r − ψ − (1 − p ( z )) + 2 ψ − (1 − p ( z )) (cid:17) . When we Taylor expand each entropy ψ ( · · · ) around ( r − ψ − (1 − p ( z )) to second order we observe that the ﬁrst order terms cancel and what remains is − ψ (cid:48)(cid:48) (cid:0) ( r − ψ − (1 − p ( z )) (cid:1)(cid:110) ψ − (1 − p ( z + δ )) − ψ − (1 − p ( z )) − (cid:0) ψ − (1 − p ( z + δ )) − ψ − (1 − p ( z )) (cid:1) (cid:111) . This is equal to − ψ (cid:48)(cid:48) (cid:0) ( r − ψ − (1 − p ( z )) (cid:1) × (cid:16) ψ − (1 − p ( z + δ )) − ψ − (1 − p ( z )) (cid:17) . August 1, 2018 DRAFT2

Next, we write ψ − (1 − p ( z + δ )) as ψ − (1 − p ( z ) − ( p ( z + δ ) − p ( z ))) and Taylor expand ψ − ( · · · ) around − p ( z ) to obtain − ( p ( z + δ ) − p ( z )) (( ψ − ) (cid:48) (1 − p ( z ))) × ψ (cid:48)(cid:48) (cid:0) ( r − ψ − (1 − p ( z )) (cid:1) . Multiplying by /δ , taking the limit δ → , and using the relation ( ψ − ) (cid:48) ( · · · ) = 1 / ( ψ (cid:48) ( ψ − ( · · · ))) we obtain − (cid:32) d p ( z )d z (cid:33) ψ (cid:48)(cid:48) (cid:0) ( r − ψ − (1 − p ( z )) (cid:1)(cid:16) ψ (cid:48) ( ψ − (1 − p ( z ))) (cid:17) . A CKNOWLEDGMENT

The authors would like to thank Ruediger Urbanke and Markus Stinner for discussions and suggestions onapplications to ﬁnite-length scaling laws, and Tongxin Li for interactions on compressive sensing during his summerinternship in EPFL. R

EFERENCES[1] A. J. Felstrom and K. S. Zigangirov, “Time-varying periodic convolutional codes with low-density parity-check matrix,”

IEEE Transactionson Information Theory , pp. 2181–2190, 1999.[2] S. Kudekar, T. Richardson, and R. Urbanke, “Spatially coupled ensembles universally achieve capacity under belief propagation,”

IEEETransactions on Information Theory , vol. 59, no. 12, pp. 7761–7813, 2013.[3] S. Kumar, A. J. Young, N. Macris, and H. D. Pﬁster, “Threshold saturation for spatially coupled LDPC and LDGM codes on BMSchannels,”

IEEE Transactions on Information Theory , vol. 60, no. 12, pp. 7389–7415, 2014.[4] M. Lentmaier, A. Sridharan, K. S. Zigangirov, and D. J. Costello Jr, “Terminated LDPC convolutional codes with thresholds close tocapacity,” in

International Symposium on Information Theory Proceedings (ISIT) . IEEE, 2005, pp. 1372–1376.[5] M. Lentmaier, A. Sridharan, D. J. Costello Jr, and K. Zigangirov, “Iterative decoding threshold analysis for LDPC convolutional codes,”

IEEE Transactions on Information Theory , vol. 56, no. 10, pp. 5274–5289, 2010.[6] V. Aref, N. Macris, R. Urbanke, and M. Vuffray, “Lossy source coding via spatially coupled LDGM ensembles,” in

Information TheoryProceedings (ISIT), 2012 IEEE International Symposium on . IEEE, 2012, pp. 373–377.[7] V. Aref, N. Macris, and M. Vuffray, “Approaching the Rate-Distortion Limit With Spatial Coupling, Belief Propagation, and Decimation,”

IEEE Transactions on Information Theory 61 (7), 3954-3979 , vol. 61, no. 7, pp. 3954–3979, 2015.[8] S. Kudekar and H. D. Pﬁster, “The effect of spatial coupling on compressive sensing,” in . IEEE, 2010, pp. 347–353.[9] F. Krzakala, M. M´ezard, F. Sausset, Y. Sun, and L. Zdeborov´a, “Probabilistic reconstruction in compressed sensing: algorithms, phasediagrams, and threshold achieving matrices,”

Journal of Statistical Mechanics: Theory and Experiment , vol. 2012, no. 08, p. P08009, 2012.[10] D. L. Donoho, A. Javanmard, and A. Montanari, “Information-theoretically optimal compressed sensing via spatial coupling andapproximate message passing,” in

International Symposium on Information Theory Proceedings (ISIT) . IEEE, 2012, pp. 1231–1235.[11] S. H. Hassani, N. Macris, and R. Urbanke, “Coupled graphical models and their thresholds,” in

Proceedings of Information TheoryWorkshop IEEE (Dublin) , 2010, pp. 1–5.[12] ——, “Threshold saturation in spatially coupled constraint satisfaction problems,”

Journal of Statistical Physics , vol. 150, no. 5, pp.807–850, 2013.[13] D. Achlioptas, S. H. Hassani, N. Macris, and R. Urbanke, “Bounds for random constraint satisfaction problems via spatial coupling,” in

ACM-SIAM Symposium on Discrete Algorithms (SODA) , January 2016.[14] S. H. Hassani, N. Macris, and R. Urbanke, “Chains of mean ﬁeld models,”

Journal of Statistical Mechanics: Theory and Experiment , vol.P02011, 2012.

August 1, 2018 DRAFT3 [15] F. Caltagirone, S. Franz, R. G. Morris, and L. Zdeborov´a, “Dynamics and termination cost of spatially coupled mean-ﬁeld models,”

PhysicalReview E , vol. 89, no. 1, p. 012102, 2014.[16] S. Kudekar, T. Richardson, and R. Urbanke, “Wave-like solutions of general one-dimensional spatially coupled systems,” preprintarXiv:1208.5273 , 2012.[17] V. Aref, L. Schmalen, and S. ten Brink, “On the convergence speed of spatially coupled LDPC ensembles,” in

IEEE, 2013, pp. 342–349.[18] S.-Y. Chung and G. D. J. Forney, “On the capacity of low-density parity-check codes,” in

International Symposium on Information TheoryProceedings (ISIT) . IEEE, 2001, p. 320.[19] S.-Y. Chung, “On the construction of some capacity-approaching coding schemes,” Ph.D. dissertation, MIT, 2000.[20] P. Olmos and R. Urbanke, “A scaling law to predict the ﬁnite-length performance of spatially-coupled LDPC codes,”

IEEE Transactionson Information Theory , vol. 61, no. 6, pp. 3164–3184, 2014.[21] A. Yedla, Y.-Y. Jian, P. S. Nguyen, and H. D. Pﬁster, “A simple proof of Maxwell saturation for coupled scalar recursions,”

IEEETransactions on Information Theory , vol. 60, no. 11, pp. 6943–6965, 2014.[22] K. Takeuchi, T. Tanaka, and T. Kawabata, “A phenomenological study on threshold improvement via spatial coupling,”

IEICE Transactionson Fundamentals of Electronics, Communications and Computer Sciences , vol. 95, no. 5, pp. 974–977, 2012.[23] R. El-Khatib, N. Macris, and R. Urbanke, “Displacement convexity, a useful framework for the study of spatially coupled codes,” in

Information Theory Workshop (ITW), Sevilla, Spain . IEEE, 2013, pp. 1–5.[24] R. El-Khatib, N. Macris, T. Richardson, and R. Urbanke, “Analysis of coupled scalar systems by displacement convexity,” in

InternationalSymposium on Information Theory Proceedings (ISIT), Honolulu, Hawaii . IEEE, 2014, pp. 2321–2325.[25] R. El-Khatib and N. Macris, “The velocity of the decoding wave for spatially coupled codes on bms channels,” in

Information TheoryProceedings (ISIT), 2016 IEEE International Symposium on . IEEE, 2016, pp. 2119–2123.[26] ——, “The velocity of the propagating wave for general coupled scalar systems,” in

Information Theory Workshop, Cambridge, UK , 2016.[27] T. Richardson and R. Urbanke,

Modern coding theory . Cambridge University Press, 2008.[28] A. Giurgiu, N. Macris, and R. Urbanke, “Spatial coupling as a proof technique and three applications,”

IEEE Trans. Inform. Theory ,vol. 62, no. 10, pp. 5281–5295, 2016.[29] ——, “How to prove the Maxwell conjecture via spatial coupling - A proof of concept,” in

International Symposium on InformationTheory Proceedings (ISIT) . IEEE, 2012, pp. 458–462.[30] ——, “And now to something completely different: Spatial coupling as a proof technique,” in

International Symposium on InformationTheory Proceedings (ISIT) . IEEE, 2013, pp. 2443–2447.[31] S.-Y. Chung, T. Richardson, and R. Urbanke, “Analysis of sum-product decoding of low-density parity-check codes using a Gaussianapproximation,”

IEEE Transactions on Information Theory , vol. 47, no. 2, pp. 657–670, 2001.[32] J. Thorpe, “Low-density parity-check (LDPC) codes constructed from protographs,”

IPN progress report , vol. 42, no. 154, pp. 42–154,2003.[33] M. Stinner, L. Barletta, and P. M. Olmos, “Finite-length scaling based on belief propagation for spatially coupled LDPC codes,” arXivpreprint arXiv:1604.05111 , 2016.[34] V. Pless,

Introduction to the theory of error-correcting codes . John Wiley & Sons, 2011, vol. 48.[35] Y.-Y. Jian, H. D. Pﬁster, and K. R. Narayanan, “Approaching capacity at high rates with iterative hard-decision decoding,” in

InternationalSymposium on Information Theory Proceedings (ISIT) . IEEE, 2012, pp. 2696–2700.. IEEE, 2012, pp. 2696–2700.