Probabilistic Eigenvalue Shaping for Nonlinear Fourier Transform Transmission

Abstract

We consider a nonlinear Fourier transform (NFT)-based transmission scheme, where data is embedded into the imaginary part of the nonlinear discrete spectrum. Inspired by probabilistic amplitude shaping, we propose a probabilistic eigenvalue shaping (PES) scheme as a means to increase the data rate of the system. We exploit the fact that for an NFT-based transmission scheme the pulses in the time domain are of unequal duration by transmitting them with a dynamic symbol interval and find a capacity-achieving distribution. The PES scheme shapes the information symbols according to the capacity-achieving distribution and transmits them together with the parity symbols at the output of a low-density parity-check encoder, suitably modulated, via time-sharing. We furthermore derive an achievable rate for the proposed PES scheme. We verify our results with simulations of the discrete-time model as well as with split-step Fourier simulations.

arXiv [cs.IT], Aug.

Andreas Buchberger, Alexandre Graell i Amat, Senior Member, IEEE, Vahid Aref, Member, IEEE, and Laurent Schmalen, Senior Member, IEEE

Index Terms: Discrete spectrum, nonlinear Fourier transform (NFT), probabilistic shaping, soliton communication.

I. INTRODUCTION

Pulse propagation in optical fibers is severely impaired by nonlinear effects that should be either compensated or utilized for the design of the communication system. The nonlinear Fourier transform (NFT) [1] provides a method to transform a signal from the time domain into a nonlinear frequency domain (spectrum), where the channel acts as a multiplicative filter on the signal. The nonlinear spectrum consists of a continuous and a discrete part. Both parts can be used to transmit information, either separately or jointly, and several schemes have been presented in theory and practice [1]–[6]. However, very little is known so far about the probability density function (PDF) of the received signal in the nonlinear spectral domain when it is contaminated by channel noise. In [7], a simplified communication system modulating only the imaginary part of the eigenvalues in the discrete nonlinear spectrum was presented. For this scheme, an approximation for the conditional PDF of the channel can be obtained in closed form.

In general, for a given channel, the capacity-achieving distribution is not known and is often different from the conventional distribution with equispaced signal points and uniform signaling. Hence, some form of shaping is required [8]. Two popular methods of shaping are probabilistic shaping and geometric shaping. In geometric shaping, the capacity-achieving distribution is mimicked by optimizing the position of the constellation points for equiprobable signaling [9], whereas probabilistic shaping uses uniformly spaced constellation points and approximates the capacity-achieving distribution by assigning different probabilities to different constellation points [8].

The main drawback of probabilistic shaping is its practical implementation. An abundance of probabilistic shaping schemes have been presented, most suffering from high decoding complexity, low flexibility in adapting the spectral efficiency, or error propagation. For a literature review on probabilistic shaping, we refer the reader to [10, Section II]. Recently, a new scheme called probabilistic amplitude shaping (PAS) has been proposed in [10]. Compared to other shaping schemes, PAS yields high flexibility and close-to-capacity performance over a wide range of spectral efficiencies for the additive white Gaussian noise (AWGN) channel while still allowing bit-metric decoding. Although originally introduced for the AWGN channel, PAS can be applied to other channels with a symmetric capacity-achieving input distribution, assuming a sufficiently high spectral efficiency.

In this paper, we consider an NFT-based transmission scheme similar to the one presented in [7], where data is embedded into the imaginary part of the nonlinear discrete spectrum. As a means to increase the data rate, we demonstrate that the concept of PAS can be adapted to this NFT-based transmission system. In particular, we propose a probabilistic eigenvalue shaping (PES) scheme, enabling similar low complexity and bit-metric decoding as PAS. We take advantage of the dependence of the pulse length on the data for the NFT-based transmission system and transmit each pulse as soon as the previous one has been transmitted rather than with a fixed interval as in [7], yielding an increased data rate. Accordingly, we find the capacity-achieving input distribution, maximizing the time-scaled mutual information (MI). For ease of notation, we refer to the maximized MI as capacity, noting that it is in fact the constrained capacity of a system transmitting first-order solitons. The PES scheme then shapes the information symbols according to the capacity-achieving distribution by a distribution matcher (DM). The information symbols are also encoded by a low-density parity-check (LDPC) encoder and the parity symbols at the output of the encoder are suitably modulated. The resulting sequence of modulated symbols and the sequence at the output of the DM are transmitted via time-sharing. We further derive an achievable rate for such a PES scheme. We demonstrate via discrete-time Monte-Carlo and split-step Fourier (SSF) simulations that PES performs at around … from capacity using off-the-shelf LDPC codes. The proposed PES scheme yields a significant improvement of up to twice the data rate compared to an unshaped system as in [7].

It is important to note that although first-order solitons do not outperform conventional coherent systems due to their spectrally inefficient pulse shape compared to a Nyquist pulse shape, they have some other advantages. For instance, first-order soliton transmission does not require chromatic dispersion (CD) compensation or digital backpropagation (DBP), as dispersion and nonlinearity are balanced and hence compensated. This work attempts to approach the limits of current NFT-based systems. To improve the spectral efficiency further, one should use higher-order solitons together with the continuous part of the nonlinear spectrum [4]. However, the channel equalization will not be as easy as for first-order solitons and the channel model is not yet fully known.

The remainder of the paper is organized as follows. In Section II, we describe pulse propagation in an optical fiber and the NFT-based transmission scheme. In Section III, we optimize the input distribution and in Section IV, we introduce and describe the proposed PES scheme and derive an achievable rate. In Section V, we present numerical results for PES, both from Monte-Carlo simulation and SSF simulation, and in Section VI we draw some conclusions.

Notation: The following notation is used throughout the paper. $\Re\{\cdot\}$ and $\Im\{\cdot\}$ denote the real and the imaginary part of a complex number, respectively, and $j = \sqrt{-1}$ denotes the imaginary unit. Vectors are typeset in bold, e.g., $\boldsymbol{x}$, random variables (RVs) are capitalized, e.g., $X$, and hence vectors of RVs are capitalized bold, e.g., $\boldsymbol{X}$. The PDF of an RV $X$ is written as $p_X(x)$ and its expectation as $\mathrm{E}_X\{x\}$. The conditional PDF of $Y$ given $X$ is denoted as $p_{Y|X}(y|x)$. The probability mass function (PMF) of an RV $X$ is denoted by $P_X(x)$. The transpose of a vector or matrix is given as $(\cdot)^{\mathsf{T}}$. A set is denoted by a capitalized Greek letter, e.g., $\Lambda$, and its cardinality by $|\Lambda|$. We write $\log_a(\cdot)$ for the logarithm of base $a$ and $\ln(\cdot)$ for the natural logarithm.

This work was funded by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 676448.
A. Buchberger is with the Department of Electrical Engineering, Chalmers University of Technology, Gothenburg, SE-412 96, Sweden and Nokia Bell Labs, Lorenzstr. 10, 70435 Stuttgart, Germany, e-mail: [email protected]
A. Graell i Amat is with the Department of Electrical Engineering, Chalmers University of Technology, Gothenburg, SE-412 96, Sweden, e-mail: [email protected]
V. Aref and L. Schmalen are with Nokia Bell Labs, Lorenzstr. 10, 70435 Stuttgart, Germany, e-mail: {firstname.lastname}@nokia-bell-labs.com.

II. NONLINEAR FOURIER TRANSFORM-BASED TRANSMISSION SYSTEM

A. Pulse Propagation and the Nonlinear Fourier Transform

Pulse propagation in optical fibers is governed by a partial differential equation, the stochastic nonlinear Schrödinger equation (NLSE),

$$ j\frac{\partial u(\tau,\ell)}{\partial \ell} + j\frac{\alpha}{2}u(\tau,\ell) - \frac{\beta_2}{2}\frac{\partial^2 u(\tau,\ell)}{\partial \tau^2} + \gamma u(\tau,\ell)|u(\tau,\ell)|^2 = n(\tau,\ell), $$

where $u(\tau,\ell)$ denotes the envelope of the electrical field as a function of the position $\ell$ along the fiber and time $\tau$, $\alpha$ the attenuation, $\beta_2$ the second-order dispersion, $\gamma$ the nonlinearity parameter, and $n(\tau,\ell)$ is a white Gaussian process in time and in space with spectral density $\sigma_0^2$. The spectral density depends on the system and for distributed Raman amplification is given as $\sigma_0^2 = \alpha K_T h\nu$, where $K_T$ is the temperature-dependent phonon occupancy factor and $h\nu$ is the average photon energy [7]. A general closed-form solution of the stochastic NLSE does not exist. In some special cases, e.g., for noise-free and lossless fibers, special solutions such as solitons exist. Furthermore, we consider the NLSE in normalized form in the focusing regime, i.e., $\beta_2 < 0$, under the assumption of ideal distributed Raman amplification, i.e., $\alpha = 0$,

$$ j\frac{\partial q(t,z)}{\partial z} + \frac{\partial^2 q(t,z)}{\partial t^2} + 2q(t,z)|q(t,z)|^2 = 0, \qquad (1) $$

where $t = \tau/\sqrt{|\beta_2|L/2}$, $z = \ell/L$, $q = u\sqrt{\gamma L}/\sqrt{2}$, and $L$ is the length of the fiber. In this case, the NLSE is an integrable partial differential equation for which a pair of operators, called a Lax pair, can be found. The eigenvalues of such an operator remain invariant during noiseless propagation and the Lax pair can be used to solve the partial differential equation. Solutions of (1) can be uniquely represented in terms of their eigenvalues via the so-called NFT.
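As a quick sanity check on the normalized NLSE (1), the following sketch evaluates the left-hand side of (1) by central finite differences for a fundamental soliton. The propagating form $q(t,z) = 2\eta\,\mathrm{sech}(2\eta t)\,e^{j4\eta^2 z}$ with amplitude parameter $\eta$ is the standard textbook solution and is an assumption here, as are the function names, the step size `h`, and the sample value of `eta`; none of them are taken from the paper.

```python
import cmath
import math


def soliton(t, z, eta):
    """Fundamental soliton 2*eta*sech(2*eta*t)*exp(4j*eta^2*z).

    The phase convention exp(4j*eta^2*z) is an assumption (standard for
    j*q_z + q_tt + 2*q*|q|^2 = 0); eta plays the role of Im{lambda}.
    """
    return 2.0 * eta / math.cosh(2.0 * eta * t) * cmath.exp(4j * eta**2 * z)


def nlse_residual(t, z, eta, h=1e-3):
    """Central-difference estimate of j*q_z + q_tt + 2*q*|q|^2 at (t, z)."""
    q = soliton(t, z, eta)
    q_z = (soliton(t, z + h, eta) - soliton(t, z - h, eta)) / (2.0 * h)
    q_tt = (soliton(t + h, z, eta) - 2.0 * q + soliton(t - h, z, eta)) / h**2
    return 1j * q_z + q_tt + 2.0 * q * abs(q) ** 2


if __name__ == "__main__":
    eta = 0.5  # hypothetical amplitude parameter
    worst = max(abs(nlse_residual(0.1 * k, 0.3, eta)) for k in range(-50, 51))
    print(f"max |residual| on the grid: {worst:.2e}")
```

The residual is limited only by the finite-difference error, which illustrates that dispersion and nonlinearity balance exactly for this pulse shape.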
For a given position $z$, the NFT of a signal $q(t)$ (we drop the position $z$ for simplicity of presentation) with support on the time interval $t \in [t_1, t_2]$ is calculated by solving the partial differential equation

$$ \frac{\partial \boldsymbol{v}(t,\lambda)}{\partial t} = \begin{pmatrix} -j\lambda & q(t) \\ -q^*(t) & j\lambda \end{pmatrix} \boldsymbol{v}(t,\lambda), \qquad (2) $$

where $\boldsymbol{v}(t,\lambda) = \bigl(v_1(t,\lambda)\ \ v_2(t,\lambda)\bigr)^{\mathsf{T}}$ is the eigenvector of the auxiliary operator, with boundary conditions

$$ \boldsymbol{v}^{(1)}(t,\lambda) \to \begin{pmatrix} 0 & 1 \end{pmatrix}^{\mathsf{T}} e^{j\lambda t} \ \text{as } t \to t_2, \qquad \boldsymbol{v}^{(2)}(t,\lambda) \to \begin{pmatrix} 1 & 0 \end{pmatrix}^{\mathsf{T}} e^{-j\lambda t} \ \text{as } t \to t_1, $$

and $\lambda$ is the spectral component. Solving (2) gives rise to the continuous and discrete nonlinear spectrum

$$ \hat{q}(\lambda) = \frac{b(\lambda)}{a(\lambda)}, \quad \lambda \in \mathbb{R}, \qquad \tilde{q}(\lambda_i) = \frac{b(\lambda_i)}{\mathrm{d}a(\lambda)/\mathrm{d}\lambda\,|_{\lambda=\lambda_i}}, \quad \lambda_i \in \mathbb{C}^+, $$

respectively, where $a(\lambda) = \lim_{t\to t_2} v_1^{(2)}(t,\lambda)e^{j\lambda t}$, $b(\lambda) = \lim_{t\to t_2} v_2^{(2)}(t,\lambda)e^{-j\lambda t}$, and $\lambda_i$ are the zeros of $a(\lambda)$, $\lambda_i \in \mathbb{C}^+$, a finite set of isolated complex zeros, referred to as eigenvalues. Hence, the NFT represents the signal in the nonlinear spectral domain, where the influence of the channel on the signal is a multiplicative filter.

As a counterpart to the NFT, which transforms a signal from the time domain to the nonlinear spectral domain, the inverse nonlinear Fourier transform (INFT) transforms a signal from the nonlinear spectral domain to the time domain. For an in-depth mathematical description of the INFT, we refer the interested reader to [1].

B. Soliton Transmission

As in [7], we embed information in the imaginary part of the discrete spectrum, also referred to as eigenvalues. Hence, the input of the channel is an RV $X \in \Lambda = \{\lambda_1, \ldots, \lambda_M\}$, where $\Lambda$ is the set of eigenvalues, $\lambda_i$ is the $i$th eigenvalue, and $M$ is the order of the modulation. The eigenvalues $\{\lambda_i\}$ are assumed to be ordered in ascending order by their imaginary parts. Furthermore, the output of the channel is an RV $Y \in \Psi$, where $\Psi = \{y \in \mathbb{C} : \Re\{y\} = 0, \Im\{y\} \geq 0\}$. A block diagram is depicted in Fig. 1.

Fig. 1. Block diagram of the NFT-based system: $\lambda$ → INFT → $q(t,0)$ → Channel → $q(t,1)$ → NFT → $\psi$.

The information embedded in a single eigenvalue $\lambda \in \Lambda$ is transformed to a time-domain signal $q(t,0)$ via the INFT, where the transmitter is located at position $z = 0$ along the fiber. At position $z = 1$, the receiver calculates the discrete spectrum $\psi \in \Psi$ from the received signal $q(t,1)$ via the NFT. The time-domain signal corresponds to first-order solitons, i.e.,

$$ q(t,0) = 2\Im\{\lambda\}\,\mathrm{sech}(2\Im\{\lambda\}t). $$

For the NFT to be valid, the signal must have finite support, i.e., before transmitting the next pulse, the previous one must have returned to zero. As the pulses in general have infinite tails, we truncate them when they fall below a threshold close to zero. We define the pulse over the smallest support containing a fraction $(1-\delta)$ of the energy of the pulse and hence, we can formally define the pulse width as follows.

Definition 1. The pulse width of $\lambda$ is defined as the smallest support containing a fraction $(1-\delta)$ of the energy of the pulse,

$$ T(\lambda,\delta) \triangleq \frac{1}{2\Im\{\lambda\}}\ln\left(\frac{2}{\delta} - 1\right), $$

where $0 < \delta < 1$.

The value of the cutoff parameter $\delta$ must be chosen such that soliton-soliton interactions are negligible. For longer transmission distances, $\delta$ decreases, i.e., the pulses must be spaced further apart. Furthermore, the condition

$$ e^{-2\Im\{\lambda\}T(\lambda,\delta)} = e^{-\ln\left(\frac{2}{\delta}-1\right)} \ll 1 \qquad (3) $$

must be fulfilled [7].

At this point, it is important to comment on the memorylessness of the system emanating from the absence of soliton-soliton interactions. A pulse train of well-separated first-order solitons was investigated in [7] for launch powers of −… and ….45 dBm and transmission over 500 km and … . It was shown via SSF simulations that the correlation between the symbols at the receiver is essentially zero, concluding that the channel is indeed memoryless in the transmission range of 500 km to … and the transmit power range of −… to ….45 dBm, for which the model (4) is applicable. While this approach is not a rigorous proof, the results indicate that memorylessness is a valid assumption. Although the transmission scheme is different in [7], the underlying condition that any two pulses need to be sufficiently separated is the same. Hence, we can treat the NFT-based transmission system in this work as a memoryless channel.

In a practical system, we assume distributed Raman amplification, which compensates for the lossy fiber and allows the NFT to be used to relate the input and the output, and amplifier-induced spontaneous emission (ASE) noise with received power spectral density $\sigma^2$. The conditional PDF of such a system has been derived via a perturbative approach and the Fokker-Planck equation method [11] and is used to design a communication system in [7]. It is given by

$$ p_{Y|X}(\psi|\lambda) = \frac{2}{\sigma^2}\sqrt{\frac{\Im\{\psi\}}{\Im\{\lambda\}}}\; e^{-\frac{2(\Im\{\lambda\} + \Im\{\psi\})}{\sigma^2}}\; I_1\!\left(\frac{4\sqrt{\Im\{\lambda\}\Im\{\psi\}}}{\sigma^2}\right), \qquad (4) $$

where $\psi$ is the received symbol as in Fig. 1, and $I_1(\cdot)$ is the modified Bessel function of the first kind of order one. The power spectral density of the received ASE noise $\sigma^2$ is normalized and relates to real-world units as $\sigma^2 = \gamma\sqrt{L^3}\,\sigma_0^2 / \bigl(2\sqrt{2|\beta_2|}\bigr)$. The signal-to-noise ratio (SNR) is defined as $\mathrm{SNR} \triangleq \mathrm{E}_X\{\Im\{\lambda\}\}/\sigma^2$. It is important to note that the model (4) assumes the noise intensity to be small such that it can be treated as a perturbation to the soliton. Hence, the model is only applicable if the signal energy is not of the same order as that of the noise. Furthermore, for very high signal powers, (4) is no longer valid either, since the impact of inelastic scattering effects (i.e., stimulated Raman or Brillouin scattering) is not considered within the first-order perturbation approach. For a detailed derivation of the model, we refer the reader to [11].

Fig. 2. Comparison of a pulse sequence with static symbol intervals and dynamic symbol intervals (normalized amplitude versus normalized time; curves: fixed, dynamic).

In [7], the shortest possible symbol interval is defined by the pulse duration of $\lambda_1$, i.e., the longest pulse. However, this tends to be inefficient since, especially for short pulses, the guard interval between two consecutive pulses is longer than necessary and thereby limits the data rate. Here, we exploit the effect of varying pulse lengths and transmit each pulse as soon as the previous one has returned to zero. This concept is depicted in Fig. 2, where pulse sequences with fixed and varying symbol interval are compared.
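To make the comparison between fixed and dynamic symbol intervals concrete, the sketch below computes the total duration of a short pulse train under both policies, using the pulse width of Definition 1, $T(\lambda,\delta) = \ln(2/\delta - 1)/(2\Im\{\lambda\})$. The constellation values, the symbol sequence, and the cutoff $\delta = 0.01$ are hypothetical illustration values, not parameters from the paper, and the helper names are assumptions. Only eigenvalues with strictly positive imaginary part are handled.

```python
import math


def pulse_width(eta, delta):
    """Pulse width T(lambda, delta) of Definition 1 for eta = Im{lambda} > 0."""
    return math.log(2.0 / delta - 1.0) / (2.0 * eta)


def total_duration(sequence, constellation, delta, dynamic):
    """Duration of a pulse train given the imaginary parts of its eigenvalues.

    dynamic=True : each pulse occupies exactly its own width.
    dynamic=False: every symbol slot is as long as the longest pulse of the
                   constellation, i.e., the one with the smallest eigenvalue.
    """
    if dynamic:
        return sum(pulse_width(eta, delta) for eta in sequence)
    slot = pulse_width(min(constellation), delta)
    return len(sequence) * slot


if __name__ == "__main__":
    delta = 0.01                            # hypothetical cutoff parameter
    constellation = [0.5, 1.0, 1.5, 2.0]    # hypothetical Im{lambda} values
    sequence = [2.0, 0.5, 1.5, 2.0, 1.0]    # hypothetical transmitted symbols
    fixed = total_duration(sequence, constellation, delta, dynamic=False)
    dyn = total_duration(sequence, constellation, delta, dynamic=True)
    print(f"fixed: {fixed:.3f}, dynamic: {dyn:.3f}, ratio: {dyn / fixed:.3f}")
```

The dynamic schedule is never longer than the fixed one, and the saving grows with the share of short pulses (large eigenvalues) in the sequence.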
Fig. 2 clearly shows the advantage of a varying pulse interval and also demonstrates the aforementioned inefficiencies. The data rate of a system with varying symbol intervals depends on the distribution of the data. Thus, we define the average symbol interval as follows.

Definition 2. The average symbol interval is

$$ \bar{T}(X) \triangleq \sum_{k=1}^{M} p_X(\lambda_k)T(\lambda_k) = \mathrm{E}_X\{T(\lambda)\}. $$

In [7], only eigenvalues with an imaginary part larger than zero are used. We extend this by allowing $\Im\{\lambda_1\} = 0$. In the time domain, this results in a pulse with amplitude zero, i.e., we do not transmit anything. We define its corresponding duration to be the same as the duration of the shortest pulse, $T(\lambda_1 = 0) \triangleq T(\lambda_M)$.

As any practical system can handle only a maximum peak power and a maximum bandwidth, we enforce a peak power constraint, which relates to a maximum eigenvalue constraint. Especially in systems with lumped amplification and erbium-doped fiber amplifiers (EDFAs), such a constraint is required as eigenvalues fluctuate depending on their amplitude, which decreases the performance [12].

We note that the varying symbol interval introduces additional challenges on detection. In particular, an erroneously detected symbol may lead to error propagation, insertion errors (detection of symbols when none was transmitted), deletion errors (not detecting a transmitted symbol), or the loss of synchronization. To calculate the capacity, however, we neglect these effects. Hence, the results can be seen as an upper bound on the performance.

III. CAPACITY ACHIEVING DISTRIBUTION

From Fig. 2, it is intuitive that pulses with short duration should be transmitted more frequently than pulses with long duration. However, shorter pulses are more perturbed by noise than longer pulses. Hence, the optimal input distribution to the channel as described by the conditional PDF (4) is not the conventional uniform distribution. The channel capacity is obtained by maximizing the MI,

$$ I(X;Y) \triangleq \mathrm{E}_{X,Y}\left\{ \log_2\left( \frac{p_{Y|X}(Y|X)}{\sum_{\tilde{\lambda}\in\Lambda} p_{Y|X}(Y|\tilde{\lambda})\,p_X(\tilde{\lambda})} \right) \right\}, $$

over all possible input distributions $p_X(\lambda)$. Here, due to the variable transmission duration, we need to consider the MI under a variable cost constraint $\bar{T}(\cdot)$ [13],

$$ \mathbb{I}(X;Y) \triangleq \frac{I(X;Y)}{\bar{T}(X)}. \qquad (5) $$

To emphasize that the cost of a symbol is its corresponding pulse duration, we refer to the MI in the form of (5) as time-scaled MI. We can therefore define the capacity as

$$ C \triangleq \sup_{p_X(\lambda)} \mathbb{I}(X;Y), \qquad (6) $$

where we set the supremum to zero if the set of distributions therein is empty. The capacity-achieving distribution, denoted by $p_X^*(\lambda)$, is in the set for which the supremum is non-zero.

As the MI $I(X;Y)$ is concave in $p_X(\lambda)$ and $\bar{T}(X)$ is linear in $p_X(\lambda)$ and positive, the time-scaled MI $\mathbb{I}(X;Y)$ is quasiconcave [14, Table 2.5.2]. We can solve (6) and obtain the corresponding capacity-achieving distribution numerically. Exemplary results of the capacity-achieving distribution are shown in Fig. 3. We note that the lowest and highest amplitudes are always used with equal and high probability. For low SNRs, only these are used, i.e., on-off keying (OOK) is optimal. Furthermore, the capacity-achieving distribution is discrete and is of exponential-like shape with the exception of a point mass at zero, as can be seen in Fig. 3.

Fig. 3. Optimal distribution $p_X(\lambda)$ over $\Im\{\lambda\}$ for different SNRs: (a) SNR = 12.14 dB, (b) SNR = 19.71 dB, (c) SNR = 29.51 dB.

Note that $C$ assumes memorylessness, which does not necessarily hold due to the variable symbol interval. Hence, $C$ is, in fact, the constrained capacity under the assumption of a memoryless channel and the constraint of transmitting only first-order solitons. However, for notational simplicity, we refer to it simply as capacity with its corresponding capacity-achieving distribution.

In the case of a noiseless channel, it is possible to derive a closed-form solution to (6) under the assumption of a finite discretization.

Lemma 1. Let $\lambda_1, \lambda_2, \ldots, \lambda_M$ be $M \geq 2$ eigenvalues with $0 \leq \Im\{\lambda_1\} < \Im\{\lambda_2\} < \ldots < \Im\{\lambda_M\}$ and let $T(\lambda_k)$ be the time of transmitting a pulse with eigenvalue $\lambda_k$. Let $r$ be the unique real positive root of the polynomial $\sum_{k=1}^{M} x^{-T(\lambda_k)} - 1$. Then, in the noiseless case, the capacity is obtained as $C = \log_2(r)$ and the capacity-achieving distribution is given by

$$ P_X^{\diamond}(\lambda_k) = e^{-\ln(r)T(\lambda_k)}, \quad k = 1, \ldots, M. \qquad (7) $$

Proof. Suppose that the $k$-th eigenvalue is transmitted with probability $P_k$. For any fixed average symbol interval $\bar{T}(X) = \sum_k P_k T(\lambda_k)$, where $T(\lambda_M) \leq \bar{T}(X) \leq T(\lambda_1)$, we are interested in the distribution that maximizes the entropy while leading to the average symbol duration $\bar{T}(X)$. It is known that this distribution takes the form [15, Ch. 12]

$$ P_k = \frac{e^{-\theta T(\lambda_k)}}{\xi(\theta)}, \qquad (8) $$

where $\xi(\theta) = \sum_i e^{-\theta T(\lambda_i)}$ ensures that $\sum_k P_k = 1$ and $\theta$ has to be selected such that $\sum_k P_k T(\lambda_k) = \bar{T}$. In the noiseless case, the MI is given by $I(X;Y) = H(X)$. The entropy $H(X)$ then is

$$ H(X) =: H(\theta) = -\sum_{k=1}^{M} P_k \log_2\left(\frac{e^{-\theta T(\lambda_k)}}{\xi(\theta)}\right) = \frac{1}{\ln(2)}\sum_{k=1}^{M} P_k\bigl(\theta T(\lambda_k) + \ln(\xi(\theta))\bigr) = \frac{\theta\bar{T}(X)}{\ln(2)} + \log_2(\xi(\theta)). $$

The time-scaled MI hence takes the form

$$ \mathbb{I}(X;Y) = \frac{\theta}{\ln(2)} + \frac{\log_2(\xi(\theta))}{\bar{T}(X)} = \frac{\theta}{\ln(2)} + \frac{\log_2(\xi(\theta))}{\sum_k P_k T(\lambda_k)} = \frac{\theta}{\ln(2)} + \log_2\Bigl(\sum_k e^{-\theta T(\lambda_k)}\Bigr)\frac{\sum_k e^{-\theta T(\lambda_k)}}{\sum_k e^{-\theta T(\lambda_k)}T(\lambda_k)}. $$

In order to maximize $\mathbb{I}(X;Y)$, we find the optimal parameter $\theta$ by setting $\xi(\theta) = 1$. This can be seen by setting the derivative of $\mathbb{I}(X;Y)$ to zero, with

$$ \frac{\partial}{\partial\theta}\mathbb{I}(X;Y) = \log_2\Bigl(\sum_k e^{-\theta T(\lambda_k)}\Bigr)\left(\frac{\sum_k e^{-\theta T(\lambda_k)}}{\sum_k T(\lambda_k)e^{-\theta T(\lambda_k)}}\right)^{2}\mathrm{var}(T(\lambda)), $$

where $\mathrm{var}(T(\lambda))$ denotes the variance of $T(\lambda_k)$ for the given $\theta$. By assumption, as all $T(\lambda_k)$ are different, the middle part of this expression is strictly positive and $\mathrm{var}(T(\lambda)) > 0$. Hence, this derivative can only be zero if $\sum_k e^{-\theta T(\lambda_k)} = 1$. The optimal $\theta$ is hence found by setting $\xi(\hat{\theta}) = 1$. Consider the polynomial

$$ f(x) = \sum_{k=1}^{M} x^{-T(\lambda_k)} - 1. $$

As this polynomial is monotonically decreasing for positive $x$, with $\lim_{x\to 0^+} f(x) = +\infty$ and $\lim_{x\to+\infty} f(x) = -1$, $f(x)$ has exactly one positive real root. Let $r$ be the unique positive real root of $f(x)$. Then $\hat{\theta} = \ln(r)$. Inserting $\hat{\theta}$ into $\mathbb{I}(X;Y)$ and (8) proves the lemma.

Fig. 4. Time-scaled MI $\mathbb{I}(X;Y)$ in bit/symbol/norm. time versus SNR in dB of the optimal distribution for linearly spaced constellations with $M \in \{2, 4, 8, 16\}$ points (colored, solid with markers), and of a system as in [7] (dotted with markers). As a reference, the capacity $C$ is plotted as well (black, solid without markers). For the cutoff parameter, $\delta = 0.…$ was used.

Fig. 5. Block diagram of the PES scheme: the data $\boldsymbol{u}$ is mapped by the DM to $\boldsymbol{\lambda}$; the binary image $\mathrm{bi}(\cdot)$ of $\boldsymbol{\lambda}$ is LDPC encoded and the parity bits are mapped by $s(\cdot)$ to $\boldsymbol{\lambda}_{\mathrm{par}}$; time-sharing yields $\boldsymbol{c} = [\boldsymbol{\lambda}, \boldsymbol{\lambda}_{\mathrm{par}}]$.

We clearly see that (7) is of exponential shape with an additional point mass at zero. Furthermore, we note that the shape of the distribution is mostly caused by the variable pulse duration.
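The noiseless result of Lemma 1 is easy to reproduce numerically. The sketch below finds the positive root $r$ of $\sum_k x^{-T(\lambda_k)} = 1$ by bisection and returns $C = \log_2(r)$ together with the distribution $r^{-T(\lambda_k)}$ from (7). The bisection routine, its tolerance, and the example durations are illustrative assumptions; the paper does not state how the root is computed.

```python
import math


def noiseless_capacity(durations, tol=1e-12):
    """Return (C, pmf) with C = log2(r) and pmf_k = r**(-T_k), following Lemma 1.

    `durations` holds the pulse durations T(lambda_k) > 0 of M >= 2 symbols.
    The root r of f(x) = sum_k x**(-T_k) - 1 is bracketed and bisected;
    f is decreasing for x > 0 and f(1) = M - 1 > 0.
    """
    def f(x):
        return sum(x ** (-t) for t in durations) - 1.0

    lo, hi = 1.0, 2.0
    while f(hi) > 0.0:          # enlarge the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol * hi:   # plain bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    r = 0.5 * (lo + hi)
    return math.log2(r), [r ** (-t) for t in durations]


if __name__ == "__main__":
    durations = [3.0, 1.5, 1.0, 0.75]   # hypothetical pulse durations
    capacity, pmf = noiseless_capacity(durations)
    mean_t = sum(p * t for p, t in zip(pmf, durations))
    entropy = -sum(p * math.log2(p) for p in pmf)
    print(f"C = {capacity:.4f}, H(X)/T = {entropy / mean_t:.4f}")
```

For equal durations the routine returns the uniform distribution, and in general the ratio of entropy to average duration evaluated at the returned distribution coincides with $\log_2(r)$, as the proof requires.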
The noise then determines the optimal location and optimal number of constellation points.

For a transmission system, the MI is an upper bound on the achievable rate. In Fig. 4 we evaluate the time-scaled MI for various input distributions for a cutoff parameter $\delta = 0.…$. The capacity is depicted with a black solid line. To reduce the complexity of implementation, we constrain the constellation $\Lambda$ to $M$ linearly spaced points from $\lambda_1 = 0$ to $\lambda_M$, i.e.,

$$ \lambda_i = \frac{(i-1)\lambda_M}{M-1} \quad \text{for } i = 1, \ldots, M, $$

and plot the corresponding time-scaled MI in colored solid lines with markers. We note that the time-scaled MI is very close to the capacity curve until it saturates. Increasing the modulation order $M$ shows a significant increase in the time-scaled MI. For comparison purposes, we also plot the time-scaled MI for a system with fixed symbol duration and conventional uniform distribution on a linearly spaced constellation as in [7]. We observe that the rate saturates at very low values and that increasing the modulation order $M$ shows only slight improvement.

IV. PROBABILISTIC EIGENVALUE SHAPING

In the previous section, we observed a significant gap between the time-scaled MI of the system in [7] and the capacity. This gap is referred to as the shaping gap. In order to close it, we propose a PES system as shown in Fig. 5, inspired by PAS [10].

In the PAS scheme, the sequence of uniformly distributed data bits is mapped by a DM to a sequence of positive amplitudes with a half-Gaussian distribution. The binary image of this sequence is encoded by a systematic forward error correction (FEC) code, resulting in uniformly distributed parity bits, which are then used to map the sequence of half-Gaussian distributed symbols to a stream of Gaussian distributed symbols.

As the capacity-achieving distribution $p_X^*(\lambda)$ is not symmetric, PAS cannot be directly applied here. However, in order to keep the benefits of PAS, we wish to apply the DM before the FEC. We describe PES in the following with reference to Fig. 5. The binary data sequence $\boldsymbol{u}$ of length $k_s$ bits is mapped by the DM to a sequence of eigenvalues $\boldsymbol{\lambda} \in \Lambda^{n_s}$ of length $n_s$ distributed according to $p_X^*(\lambda)$. The constant composition distribution matcher (CCDM) can be used for that purpose [16]. It is asymptotically optimal as its rate $R_s$ approaches the entropy of the desired channel input $X$,

$$ R_s = \frac{k_s}{n_s} \to H(X) \quad \text{as } n_s \to \infty. $$
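The convergence $R_s \to H(X)$ can be illustrated by counting constant-composition sequences. The sketch below rounds a target PMF to integer symbol counts, counts the sequences of that composition with an exact multinomial coefficient, and takes $k_s$ as the largest number of input bits that can be addressed. The rounding rule in `quantize` is a simple assumption made for illustration and is not the quantization used in [16]; the example PMF is hypothetical.

```python
import math


def entropy(pmf):
    """Entropy in bits of a PMF given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0.0)


def quantize(pmf, n):
    """Round n*pmf to integer counts summing to n (largest-remainder rule)."""
    counts = [math.floor(p * n) for p in pmf]
    order = sorted(range(len(pmf)), key=lambda i: pmf[i] * n - counts[i], reverse=True)
    for i in order[: n - sum(counts)]:
        counts[i] += 1
    return counts


def ccdm_rate(pmf, n):
    """Rate k/n of a constant-composition matcher with output length n."""
    counts = quantize(pmf, n)
    sequences, remaining = 1, n
    for c in counts:                      # exact multinomial coefficient
        sequences *= math.comb(remaining, c)
        remaining -= c
    k = sequences.bit_length() - 1        # floor(log2(number of sequences))
    return k / n


if __name__ == "__main__":
    target = [0.5, 0.25, 0.25]            # hypothetical target PMF
    for n in (4, 100, 10000):
        print(n, ccdm_rate(target, n), entropy(target))
```

For short blocks the rate loss is noticeable, while for block lengths in the thousands the rate is within a small fraction of a bit of the entropy, which is why the gap is neglected for large block sizes.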

Fig. 6. Resulting constellations $\Lambda_{\mathrm{par}}$ for the parity symbols for different SNRs (SNR in dB on the horizontal axis): (a) $M = 4$, (b) $M = 8$, (c) $M = 16$. Note that the highest and the lowest eigenvalue is always occupied for every modulation order.

For large block sizes, the gap between $R_s$ and $H(X)$ is sufficiently small and can be neglected. Note that some of the possible eigenvalues may occur with probability zero. We consider the modulation order $M$ to be a power of two such that we can define its binary image. The binary image of $\boldsymbol{\lambda}$, $\mathrm{bi}(\boldsymbol{\lambda})$, is then encoded by a systematic encoder with information block length $k_c$, code length $n_c$, and rate $R_c = k_c/n_c$. The code is denoted by $\mathcal{C}$, with $|\mathcal{C}| = 2^{k_c}$. The parity bits at the output of the encoder are mapped to a sequence of eigenvalues $\boldsymbol{\lambda}_{\mathrm{par}}$ with elements in $\Lambda_{\mathrm{par}}$, with modulation order $M_{\mathrm{par}} = |\Lambda_{\mathrm{par}}|$ and $\Lambda_{\mathrm{par}} \subseteq \Lambda$, by the block $s(\cdot)$ in Fig. 5 such that they are uniformly distributed.

Assuming that a high code rate $R_c$ is used, we accept a small penalty with respect to the optimal channel input distribution and transmit $\boldsymbol{\lambda}$ and $\boldsymbol{\lambda}_{\mathrm{par}}$ via time-sharing. The major difference of PES compared to PAS is the fact that the channel input distribution is not the optimal distribution due to the time-sharing with the sequence $\boldsymbol{\lambda}_{\mathrm{par}}$. Consequently, this causes a performance degradation. However, PES is highly flexible as the spectral efficiency can be adapted by the DM and the code rate $R_c$, and a single code can be used. Note that every eigenvalue is protected by the code as FEC is performed after the DM, and decoding and demapping can be performed independently. Thus, PES shares these advantages with PAS.

We wish for a high code rate $R_c$ to keep the performance degradation due to the time-sharing low. More precisely, we wish to maximize the number of symbols distributed according to $p_X^*(\lambda)$. The ratio between information symbols and coded symbols, denoted by $R_{\mathrm{ts}}$, is an indication for the expected performance degradation,

$$ R_{\mathrm{ts}} = \frac{\frac{n_c R_c}{\log_2(M)}}{\frac{n_c R_c}{\log_2(M)} + \frac{n_c(1-R_c)}{\log_2(M_{\mathrm{par}})}} = \frac{R_c\log_2(M_{\mathrm{par}})}{\log_2(M)(1-R_c) + R_c\log_2(M_{\mathrm{par}})}. \qquad (9) $$

A. Parity symbols

The parity symbols at the output of the FEC encoder are uniformly distributed. In Fig. 4, we observed that OOK with uniform signaling, i.e., $\Lambda_{\mathrm{par}} = \{\lambda_1, \lambda_M\}$ and $M_{\mathrm{par}} = 2$, is optimal for low SNR as it achieves capacity, and performs reasonably well for high SNR. However, we note from Fig. 4 that for a higher-order modulation, even with uniform signaling, higher rates are possible. Hence, here we consider a scenario where $M_{\mathrm{par}} > 2$. We further increase the rate by only using a subset of $\Lambda$ and by picking the eigenvalues such that they are not uniformly spaced.

Example 1. Consider the information symbol alphabet $\Lambda = \{\lambda_1, \ldots, \lambda_8\}$ with $M = 8$. For $\Lambda_{\mathrm{par}}$, we could pick four of these eigenvalues, each used with probability $0.25$, so that $M_{\mathrm{par}} = 4$.

To find the function $s(\cdot)$ that maps the parity symbols onto $\lambda \in \Lambda_{\mathrm{par}}$, we use a greedy algorithm as described in Algorithm 1. It starts with OOK, i.e., $\Lambda_{\mathrm{par}} = \{\lambda_1, \lambda_M\}$. For each of the remaining symbols $\lambda \in \Lambda \setminus \Lambda_{\mathrm{par}}$, it calculates the time-scaled MI of $\lambda \cup \Lambda_{\mathrm{par}}$, finds the symbol $\lambda$ for which this time-scaled MI is maximized, and adds it to $\Lambda_{\mathrm{par}}$. All candidate symbols with an imaginary part greater than or equal to that of $\lambda$ are then removed, i.e., the eigenvalues $\{\lambda' \in \Lambda : \Im\{\lambda'\} \geq \Im\{\lambda\}\}$ are removed. This process is repeated until there are no symbols left. We then choose the set of symbols that gives the highest time-scaled MI as $\Lambda_{\mathrm{par}}$. We note that this procedure does not guarantee an optimal solution. However, for $M = \{…, …\}$ an exhaustive search gives the same result as that of Algorithm 1.

In Fig. 6, we show $\Lambda_{\mathrm{par}}$ for different modulation orders and SNRs. For $M = 4$, we note that for low SNR OOK gives the best result. Increasing the SNR results in a third level being added. The same behavior is observed for $M = 8$. Compared to $M = 4$, the third level is introduced at a slightly lower SNR. This results from the fact that for $M = 8$, different constellation points are available. For $M = 16$, we note that again a third level appears when increasing the SNR. When further increasing it, this third level moves to an eigenvalue with larger imaginary part and consequently a fourth level at an eigenvalue with lower imaginary part appears. This behavior can be observed repeatedly. To map the binary parity bits to the constellation points, we require $M_{\mathrm{par}}$ to be a power of two. As this is not always the case (see Fig. 6), we pick the largest power of two that is smaller than or equal to the number of constellation points given by Algorithm 1.

Algorithm 1. Algorithm to calculate the signal points for the parity symbols. With a slight abuse of notation, we denote the time-scaled MI of a set $\Lambda_{\mathrm{par}}$ by $\mathbb{I}(\Lambda_{\mathrm{par}})$. We assume the symbols in the set to be uniformly distributed.

Input: Constellation $\Lambda$
Output: Constellation $\Lambda_{\mathrm{par}}$
  $\Lambda_{\mathrm{placed}} = \{\lambda_1, \lambda_M\}$
  $\Lambda_{\mathrm{par}} = \{\lambda_1, \lambda_M\}$
  $\Lambda_{\mathrm{not\ placed}} = \Lambda \setminus \Lambda_{\mathrm{par}}$
  while $\Lambda_{\mathrm{not\ placed}} \neq \emptyset$ do
    for all $\lambda_i \in \Lambda_{\mathrm{not\ placed}}$ do
      Calculate $\mathbb{I}(\Lambda_{\mathrm{placed}} \cup \lambda_i)$
    end for
    $\lambda_{\max} := \arg\max \mathbb{I}(\cdot)$
    $\Lambda_{\mathrm{placed}} = \Lambda_{\mathrm{placed}} \cup \lambda_{\max}$
    if $\mathbb{I}(\Lambda_{\mathrm{placed}}) > \mathbb{I}(\Lambda_{\mathrm{par}})$ then
      $\Lambda_{\mathrm{par}} = \Lambda_{\mathrm{placed}}$
    end if
    $\Lambda_{\mathrm{not\ placed}} = \Lambda_{\mathrm{not\ placed}} \setminus \{\lambda : \lambda \in \Lambda_{\mathrm{not\ placed}}, \Im\{\lambda\} \geq \Im\{\lambda_{\max}\}\}$
  end while
  return $\Lambda_{\mathrm{par}}$

B. Achievable Rate of Probabilistic Eigenvalue Shaping

To characterize the performance of PES, we derive its achievable rate, denoted by R_ps. We assume that the channel is memoryless and that the decoder performs bit-metric decoding.

Theorem 1.

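The achievable rate of PES is

\[
R_{\mathrm{ps}} = R_{\mathrm{ts}} \left( H(X) - \sum_{i=1}^{m} H\!\left(X^{\mathrm{B}}_{i} \mid Y^{\mathrm{B}}_{i}\right) \right) + (1 - R_{\mathrm{ts}}) \left( m_{\mathrm{par}} - \sum_{i=1}^{m_{\mathrm{par}}} H\!\left(X^{\mathrm{B}}_{\mathrm{par},i} \mid Y^{\mathrm{B}}_{\mathrm{par},i}\right) \right). \tag{10}
\]

Proof.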

The achievable rate for PAS has been derived in [17]. For a system employing time-sharing, the resulting achievable rate is the average of the achievable rates of the two transmission schemes, weighted by R_ts and 1 − R_ts, respectively.

In Fig. 7, we plot the capacity and the achievable rate (10) for different code rates R_c = { / , / , / , / , / , / , / , / , / , / , / } and modulation orders for a cutoff parameter δ = 0 . . Λ_par and hence M_par are chosen according to the results of Algorithm 1. For each modulation order, we notice that the curves cross at a certain SNR. For SNRs below this point, the lowest code rate (corresponding to the highest curve) gives the best performance, whereas for SNRs above this point, the highest code rate (corresponding to the highest curve) gives the best performance. We note the influence of time-sharing, which results in a gap between the achievable rate and capacity. The gap increases for lower code rates R_c, as the channel input distribution deviates more from the optimal one.
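As a small numerical illustration of (10), the following Python sketch evaluates the time-sharing average for given entropies. The function and argument names are assumptions made for illustration, and all input values in the example are made-up placeholders; in practice the conditional entropies would be estimated from the channel model.

```python
from typing import Sequence


def achievable_rate_pes(
    r_ts: float,
    h_x: float,
    h_bits_given_y: Sequence[float],
    h_par_bits_given_y: Sequence[float],
) -> float:
    """Evaluate the right-hand side of (10).

    r_ts:               time-sharing factor R_ts, between 0 and 1.
    h_x:                entropy H(X) of the shaped information symbols.
    h_bits_given_y:     the m bit-wise conditional entropies of the shaped symbols.
    h_par_bits_given_y: the m_par bit-wise conditional entropies of the parity symbols.
    """
    if not 0.0 <= r_ts <= 1.0:
        raise ValueError("r_ts must lie in [0, 1]")
    m_par = len(h_par_bits_given_y)
    shaped_term = h_x - sum(h_bits_given_y)          # rate of the shaped part
    parity_term = m_par - sum(h_par_bits_given_y)    # rate of the uniform parity part
    return r_ts * shaped_term + (1.0 - r_ts) * parity_term


# Placeholder numbers, chosen only to exercise the formula.
rate = achievable_rate_pes(
    r_ts=0.75,
    h_x=2.5,
    h_bits_given_y=[0.125, 0.25, 0.125],
    h_par_bits_given_y=[0.25, 0.25],
)
print(rate)
```

The two bracketed terms of (10) are computed separately so that the contribution of the shaped symbols and that of the parity symbols can be inspected individually.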

[Figure 7: R_ps in bit/symbol/norm. time versus SNR in dB for M = 4, M = 8, and M = 16, together with the capacity C.]

Fig. 7. Achievable rates for different code rates with Λ_par according to Algorithm 1 for a cutoff parameter δ = 0 . .

V. NUMERICAL EVALUATION

In this section, we evaluate the performance of the PES scheme via discrete-time Monte-Carlo and SSF simulations. For the mapping bi(·) (see Fig. 5), we use Gray labeling. For the FEC, we use the binary LDPC codes of the DVB-S2 standard with code length n_c = 64800 and code rates R_c = { / , / , / , / , / , / , / , / , / , / , / }. For the parity symbols, we use the constellation arising from Algorithm 1, depicted in Fig. 6.

A. Detection

For the SSF simulation, we simulate a continuous signal and hence require a detector. We use the following method to deal with the variable pulse durations: We set a threshold θ sufficiently higher than the noise. Once the magnitude of the signal rises above θ, we save the time as t_start, and when the magnitude of the signal falls below θ, we save the time as t_end. We then extend the interval bounded by t_start and t_end, i.e., t̃_start = t_start − δ_t and t̃_end = t_end + δ_t. Calculating the NFT over the interval [t̃_start, t̃_end] using the spectral method [1, Part II, Section IV] and only considering the imaginary part of the discrete eigenvalue gives the received symbol y. This approach requires that the SNR is sufficiently high. As the model has the same requirement due to the perturbation approach, this requirement is fulfilled.

It may happen that, due to noise, a received pulse never rises above the threshold θ. In this case, the shortest duration is assumed (i.e., the duration of the pulse with amplitude zero). This scenario can be avoided by choosing the threshold sufficiently lower than the lowest amplitude. Furthermore, due to the shape of the capacity-achieving distribution, lower amplitudes are less likely, which makes this scenario rarer still.

To find the best threshold, we tested the performance for different values of θ and found that a threshold at of the lowest non-zero amplitude of the constellation works best. We observed that small deviations of the threshold do not affect the performance significantly, whereas setting the threshold too high (missing symbols with low amplitude) or too low (detecting a symbol where there is none) leads to performance degradation. Furthermore, we assume synchronization sequences spread sufficiently far apart in order not to impact the rate. We assume synchronization to be ideal such that it is guaranteed that error propagation is limited.

TABLE I
SIMULATION PARAMETERS.

Parameter                      Value
Span length l_span             80 km
Second order dispersion β      − . 137 ps km −
Nonlinearity parameter γ       . − km −
Attenuation α                  . −
Shortest pulse T_short         . 83 ns
Longest pulse T_long           .
Bandwidth B                    .
Avg. transmit power P          − . 86 dBm
Cutoff parameter δ             .

B. Numerical Results
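Before presenting the results, the threshold rule of Section V-A can be sketched in a few lines of Python. This is an illustrative sketch under simplifying assumptions: a single sampled pulse, hypothetical function and parameter names, and a noiseless sech-shaped test pulse. It only determines the extended interval [t̃_start, t̃_end] and does not compute the NFT.

```python
import math
from typing import Optional, Sequence, Tuple


def detect_pulse_interval(
    times: Sequence[float],
    samples: Sequence[complex],
    theta: float,
    delta_t: float,
) -> Optional[Tuple[float, float]]:
    """Return (t_start - delta_t, t_end + delta_t) for the first pulse found.

    t_start is the first sample time at which |signal| rises above theta and
    t_end the first later sample time at which it falls below theta. Returns
    None if the signal never rises above theta; the caller would then fall
    back to the shortest pulse duration, as described in the text.
    """
    t_start = None
    t_end = None
    for t, q in zip(times, samples):
        magnitude = abs(q)
        if t_start is None:
            if magnitude > theta:
                t_start = t
        elif magnitude < theta:
            t_end = t
            break
    if t_start is None:
        return None
    if t_end is None:
        # Signal still above threshold at the end of the record.
        t_end = times[-1]
    return t_start - delta_t, t_end + delta_t


# Noiseless test pulse sech(t), sampled every 0.01 time units (assumed values).
times = [-10.0 + 0.01 * k for k in range(2001)]
samples = [1.0 / math.cosh(t) for t in times]
interval = detect_pulse_interval(times, samples, theta=0.5, delta_t=0.5)
print(interval)
```

Returning `None` rather than a default interval keeps the fallback policy for missed pulses outside the detector, so that it can be changed without touching the threshold logic.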

We perform Monte-Carlo simulations of the discrete-time model (4) and show the results in Fig. 8, where we plot the transmission rate at a bit error rate (BER) of 10 − for M = 4, 8, and 16. The highest transmission rate for each modulation order corresponds to the highest code rate R_c. We notice that the gap to capacity for M = 4 is smaller than for M = 8 and M = 16. If we consider ΔM = M − M_par, i.e., the difference between the modulation orders of Λ and Λ_par, we note that for a low M, ΔM is low as well. For example, for M = 4, ΔM ≤ 2. Hence, the rate loss due to time-sharing is small. For M = 16, the gap to capacity is smaller than for M = 8. Considering the relevant SNR range, we note that ΔM is smaller for M = 16 than for M = 8, which explains the smaller rate loss.

We also simulated the transmission over a fiber using SSF simulations transmitting a train of solitons. We consider a single-mode fiber (SMF) with parameters as in Table I and two different amplification schemes: distributed Raman amplification and lumped amplification using EDFAs. For both schemes, the peak power constraint is chosen such that the effect of the EDFAs can be neglected, i.e., λ_max = 2  . We employ the detection scheme described in Section V-A and choose the cutoff parameter δ = 0 . , i.e., . of the energy is contained in the pulse, for which the condition (3) is fulfilled. This then leads to a similar cutoff parameter as in [7]. For each modulation order M = {4, 8, 16}, we determine the furthest distance over which we achieve a BER of less than 10 − and consider the rate gain compared to an unshaped system as in [7]. This results for M = {4, 8, 16} in transmission over , , and at a rate gain of 20 %, 26 %, and 95 %, respectively. The results do not differ for distributed and lumped amplification, as this is ensured by the peak power constraint.

[Figure 8: I(X;Y) in bit/symbol/norm. time versus SNR in dB for M = 4, M = 8, and M = 16, together with the capacity C.]

Fig. 8. Performance of time-sharing with parity symbols according to Algorithm 1. The rate points correspond to a performance at BER = 10 − . The highest transmission rate for each modulation order corresponds to the highest code rate R_c.

VI. CONCLUSION

In this paper, we presented a probabilistic shaping scheme for an NFT-based transmission system embedding information in the imaginary part of the discrete spectrum. It shapes the information symbols according to the capacity-achieving distribution and transmits them via time-sharing together with the uniformly distributed, suitably modulated parity symbols. We exploited the fact that the pulses of the signal in the time domain are of unequal length to improve the data rate compared to [7]. We used the time-scaled MI and derived the capacity-achieving distribution in closed form for the noiseless case and numerically in the general case. We showed that probabilistic eigenvalue shaping significantly improves the performance of an NFT-based transmission scheme and can almost double the data rate. As a possible extension of our work, the continuous spectrum can be used to increase the spectral efficiency [4].

ACKNOWLEDGMENTS

The authors would like to thank the anonymous reviewers for their feedback and comments, which helped to improve this paper significantly. In particular, we would like to acknowledge one of the reviewers for proposing an elegant way to prove Lemma 1, which is included in this paper.

REFERENCES

[1] M. I. Yousefi and F. R. Kschischang, “Information transmission using the nonlinear Fourier transform, part I-III,” IEEE Trans. Inf. Theory, vol. 60, no. 7, pp. 4312–4369, Jul. 2014.
[2] Z. Dong, S. Hari, T. Gui, K. Zhong, M. I. Yousefi, C. Lu, P. K. A. Wai, F. R. Kschischang, and A. P. T. Lau, “Nonlinear frequency division multiplexed transmissions based on NFT,” IEEE Photon. Technol. Lett., vol. 27, no. 15, pp. 1621–1623, Aug. 2015.
[3] V. Aref, H. Bülow, K. Schuh, and W. Idler, “Experimental demonstration of nonlinear frequency division multiplexed transmission,” in Proc. 41st Eur. Conf. Opt. Commun. (ECOC), Valencia, Spain, Sep. 2015, pp. 1–3.
[4] V. Aref, S. T. Le, and H. Bülow, “Demonstration of fully nonlinear spectrum modulated system in the highly nonlinear optical transmission regime,” in Proc. 42nd Eur. Conf. Opt. Commun. (ECOC), Düsseldorf, Germany, Sep. 2016, pp. 1–3.
[5] A. Geisler and C. Schaeffer, “Experimental nonlinear frequency division multiplexed transmission using eigenvalues with symmetric real part,” in Proc. 42nd Eur. Conf. Opt. Commun. (ECOC), Düsseldorf, Germany, Sep. 2016, pp. 1–3.
[6] S. Hari, M. I. Yousefi, and F. R. Kschischang, “Multieigenvalue communication,” J. Lightw. Technol., vol. 34, no. 13, pp. 3110–3117, Jul. 2016.
[7] N. A. Shevchenko, S. A. Derevyanko, J. E. Prilepsky, A. Alvarado, P. Bayvel, and S. K. Turitsyn, “Capacity lower bounds of the noncentral chi-channel with applications to soliton amplitude modulation,” IEEE Trans. Commun., to appear.
[8] G. D. Forney, R. Gallager, G. Lang, F. Longstaff, and S. Qureshi, “Efficient modulation for band-limited channels,” IEEE J. Sel. Areas Commun., vol. 2, no. 5, pp. 632–647, Sep. 1984.
[9] F.-W. Sun and H. C. A. van Tilborg, “Approaching capacity by equiprobable signaling on the Gaussian channel,” IEEE Trans. Inf. Theory, vol. 39, no. 5, pp. 1714–1716, Sep. 1993.
[10] G. Böcherer, F. Steiner, and P. Schulte, “Bandwidth efficient and rate-matched low-density parity-check coded modulation,” IEEE Trans. Commun., vol. 63, no. 12, pp. 4651–4665, Dec. 2015.
[11] S. A. Derevyanko, S. K. Turitsyn, and D. A. Yakushev, “Fokker-Planck equation approach to the description of soliton statistics in optical fiber transmission systems,” J. Opt. Soc. Am. B, vol. 22, no. 4, pp. 743–752, Apr. 2005.
[12] M. Zafruullah, M. Waris, and M. K. Islam, “Simulation and design of EDFAs for long-haul soliton based communication systems,” in Proc. Asia-Pacific Conf. Commun. (APCC), Penang, Malaysia, Sep. 2003.
[13] S. Verdú, “On channel capacity per unit cost,” IEEE Trans. Inf. Theory, vol. 36, no. 5, pp. 1019–1030, Sep. 1990.
[14] I. M. Stancu-Minasian, Fractional Programming, 1st ed. Dordrecht, The Netherlands: Kluwer Academic Publishers, 1997.
[15] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ, USA: Wiley, 2006.
[16] P. Schulte and G. Böcherer, “Constant composition distribution matching,” IEEE Trans. Inf. Theory, vol. 62, no. 1, pp. 430–434, Jan. 2016.
[17] G. Böcherer, “Achievable rates for probabilistic shaping,”