Temporal Energy Analysis of Symbol Sequences for Fiber Nonlinear Interference Modelling via Energy Dispersion Index

Abstract

The stationary statistical properties of independent, identically distributed (i.i.d.) input symbols provide insights on the induced nonlinear interference (NLI) during fiber transmission. For example, kurtosis is known to predict the modulation format-dependent NLI. These statistical properties can be used in the design of probabilistic amplitude shaping (PAS), which is a popular scheme that relies on an amplitude shaper for increasing spectral efficiencies of fiber-optic systems. One property of certain shapers used in PAS -- including constant-composition distribution matchers -- that is often overlooked is that a time-dependency between amplitudes is introduced. This dependency results in symbols that are non-i.i.d., which have time-varying statistical properties. Somewhat surprisingly, the effective signal-to-noise ratio (SNR) in PAS has been shown to increase when the shaping blocklength decreases. This blocklength dependency of SNR has been attributed to time-varying statistical properties of the symbol sequences, in particular, to variation of the symbol energies. In this paper, we investigate the temporal energy behavior of symbol sequences, and introduce a new metric called energy dispersion index (EDI). EDI captures the time-varying statistical properties of symbol energies. Numerical results show strong correlations between EDI and effective SNR, with absolute correlation coefficients above 99% for different transmission distances.


PREPRINT, FEBRUARY 25, 2021


Kaiquan Wu, Student Member, IEEE, Gabriele Liga, Member, IEEE, Alireza Sheikh, Member, IEEE, Frans M. J. Willems, Life Fellow, IEEE, and Alex Alvarado, Senior Member, IEEE


Index Terms: Constant-Composition Distribution Matching, Fiber Channel Models, Fiber Nonlinearities, Probabilistic Amplitude Shaping.

This work is supported by the Netherlands Organisation for Scientific Research via the VIDI Grant ICONIC (project number 15685). The work of Alex Alvarado is supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 757791). The work of G. Liga is supported by the EuroTechPostdoc programme under the European Union's Horizon 2020 research and innovation programme (Marie Skłodowska-Curie grant agreement No. 754462). The authors are with the Information and Communication Theory Lab, Signal Processing Systems Group, Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven 5600 MB, The Netherlands (e-mails: {k.wu, g.liga, a.sheikh, f.m.j.willems, a.alvarado}@tue.nl). A. Sheikh is with imec, Holst Centre, High Tech Campus 31, 5656 AE Eindhoven, The Netherlands (email: [email protected]).

I. INTRODUCTION

Constellation shaping (CS) and forward error correction (FEC) are two crucial elements to realize near capacity-achieving transmission for the additive white Gaussian noise (AWGN) channel. In recent years, a novel coded modulation framework called probabilistic amplitude shaping (PAS) [1] has attracted wide attention for its capacity-achieving performance. PAS elegantly integrates FEC and probabilistic shaping (PS) [2], where the PS functionality is enabled by an amplitude shaper. One popular amplitude shaper is the constant-composition distribution matcher (CCDM) [3]. Other well-known shapers include the multiset-partition distribution matcher (MPDM) [4], the product distribution matcher (PDM) [5], enumerative sphere shaping (ESS) [6], [7], and shell mapping [8].

In the context of fiber optical communications, numerous studies about PAS have been carried out in simulation [9]–[11] and experiments [12], [13]. Record spectral efficiencies (SEs) have been achieved in field trials [14], [15]. However, unlike in the AWGN channel, SEs in the nonlinear fiber channel are limited by the Kerr effect. This effect causes nonlinear interference (NLI), which becomes a substantial part of the total noise experienced by the transmitted signals [16].

Mitigation of NLI can be achieved by optimizing statistical properties of the transmitted symbol sequences.
Given a number of constellation points, it is possible to change the constellation geometry [17], [18] or the probability mass function (PMF) of the constellation symbols [9], [19], [20]. These techniques are often referred to as geometric and probabilistic shaping, respectively. In both cases, by assuming symbols to be independent and identically distributed (i.i.d.), the stationary statistical properties of symbols are optimized. In the former, the support of the PMF is optimized, while in the latter, the probabilities are optimized.

An alternative approach is to manipulate the temporal structure of the symbol sequence, which is believed to exert great influence on the NLI [21], [22]. The NLI can be modeled as inter-symbol interference (ISI) [23], i.e., the NLI experienced by a transmitted symbol only depends on adjacent symbols. The number of interfering symbols is usually referred to as channel memory. Therefore, transmitted symbols that comply with certain temporal structures could be used to suppress NLI. One example of this approach is the so-called temporal shaping introduced in [22], which generates correlated symbols from a ball-shaped constellation. By taking the correlation between symbol energies into account, [22] showed that transmitting correlated symbols leads to a new correlated term compared to i.i.d. symbol sequences (see [22, Eqs. (5) and (9)]). Temporal shaping was later realized using a finite state machine [24]. Although improved tolerance to nonlinearities can be achieved by controlling the temporal structure of the symbol sequences, a guiding principle for the optimization of the temporal structure is still unknown. This could be due to the fact that the statistical analysis of non-i.i.d. symbol sequences is in general more difficult than that of their i.i.d. counterparts.

PAS generates dependent symbols because the employed shaper imposes hidden temporal structure on the amplitude blocks.
Recently, several studies based on the PAS architecture have observed the impact of the resulting temporal structure on the effective SNR. The effective SNR has been shown to depend on the shaping blocklength: using short shaping blocklengths has been shown to offer significant effective SNR gains due to a weaker presence of nonlinearities [11], [25]–[27]. In [28], the joint impact of carrier phase recovery and shaping blocklength on the nonlinear shaping gain was studied. In addition, the symbol mapping strategy, i.e., the way the shaped amplitudes are mapped to multi-dimensional symbols, has also been shown to be a crucial factor in the effective SNR performance of the system. It was observed in [25], [29] that instead of using four amplitude shapers independently for the I/Q and X/Y dimensions, using one amplitude shaper across four dimensions improves the effective SNR.

The assumption of i.i.d. input symbols can be justified in scenarios such as (i) uncoded transmission, and (ii) systems employing a random interleaver that breaks dependencies between symbols. Based on the i.i.d. assumption, state-of-the-art Gaussian noise (GN) and enhanced Gaussian noise (EGN) models provide insights about the connection between the stationary statistical properties of the transmitted signal and the NLI. The GN model concludes that the NLI power scales as the cube of the average symbol energy [30]–[32]. By relaxing the Gaussianity assumption of the GN model [33], the EGN model shows that the standardized fourth moment (also known as kurtosis) of the transmitted symbols is an important metric in predicting the modulation-dependent NLI [34]–[36]. To mitigate the NLI, an optimized PMF was designed in [20] with the help of kurtosis. In [37], the blockwise composition of the symbol sequences having low kurtosis was selected for transmission.

Footnote: The sixth-order moment reflects self-channel interference, which is marginal in long-distance transmission using wavelength-division multiplexing (WDM).

To address the scenario of non-i.i.d. input symbols, the heuristic finite-memory GN model was proposed in [21], which considers a time-windowed symbol energy within the channel memory for the computation of the NLI. Recently, to analyze the temporal structure of symbol sequences, it was shown in [25], [38] that symbol sequences with fewer clusters of identical symbols offer reduced NLI. A heuristic metric called run ratio was also proposed in [25]. The perturbative time-domain model of [39] can in principle also be used to analyze non-i.i.d. input symbols.

The discussion above leaves multiple open questions about the effects of the temporal structure of the symbol sequences on the NLI. Perhaps the most important open problem is that a metric that accounts for the time-varying statistical properties of symbol sequences and enables a precise NLI prediction is still unknown. This paper is devoted to presenting a detailed explanation for the blocklength dependency of the effective SNR, and to introducing a metric that is able to capture the effect of time-varying statistics on the NLI.

The contributions of this paper are threefold. First, we show that the temporal structure of the symbol sequence shaped by CCDM yields correlated symbols. The second contribution is to introduce a novel metric called energy dispersion index (EDI) to evaluate the variance of the symbol energies within a window.
This metric is inspired by the channel models in [21], [22] and is a function of the autocorrelation function of the symbol energy's variance within a window. An almost perfect correlation between the EDI and the effective SNR for different transmission distances is observed in numerical analysis using CCDM. Lastly, we also give analytical expressions for the autocorrelation and the EDI of the QAM symbol sequences generated by CCDM.

The rest of the paper is organized as follows. A review of the channel memory estimation and relevant channel models for the optical fiber channel is presented in Sec. II. In Sec. III, the statistical properties of symbol sequences shaped by CCDM are studied. The EDI, which is the main contribution of the paper, is presented along with the simulation results in Sec. IV and Sec. V. Conclusions are drawn in Sec. VI.

II. MODELING NLI WITH WINDOWED ENERGY

In this section, the estimation of memory in the fiber-optic channel and the effective SNR considering the NLI are reviewed. We then discuss the effects of the temporal energy behavior of symbol sequences on the NLI, from the perspective of channel models with memory [21], [22], [39]. We end this section with a review of metrics available in the literature that predict the NLI induced by input symbols.

A. Channel Memory and Effective SNR

In a nonlinear fiber channel, amplifiers along with conventional linear digital signal processing (DSP) effectively compensate chromatic dispersion (CD) and attenuation. In this paper, we consider the channel output after DSP steps excluding nonlinearity compensation. Thus, the residual noise comes mainly from amplified spontaneous emission (ASE) and NLI. The ASE noise is a random process independent of the signal, whereas the NLI noise is dependent on the signal. The interplay between CD and fiber nonlinearity induces nonlinear interactions among a number of past and future symbols. The number of interfering symbols is often referred to as the channel memory.

For a fiber-optic system with group velocity dispersion $\beta_2$ and optical bandwidth $\Delta\omega$, the dispersive length $L_D$ is defined as [21, Sec. II-B]

$$L_D = \frac{1}{\Delta\omega^2 |\beta_2|}, \tag{1}$$

where $L_D$ indicates the distance scale over which pulse broadening effects become significant. Given the propagation distance $L$, the one-sided channel memory $M$ is roughly approximated as [21, Sec. II-B]

$$M \approx L / L_D. \tag{2}$$
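As a rough numerical illustration of (1) and (2), the following Python sketch evaluates the dispersive length and the one-sided memory. It assumes that (1) reads $L_D = 1/(\Delta\omega^2 |\beta_2|)$, which is the dimensionally consistent form. The function names and the fiber parameters (a dispersion of $-21.7$ ps$^2$/km, a 50 GHz optical bandwidth, and a 1000 km link) are hypothetical values chosen for illustration and are not taken from the paper.

```python
import math


def dispersive_length(beta2: float, delta_omega: float) -> float:
    """Dispersive length L_D = 1 / (delta_omega^2 * |beta2|) in metres.

    beta2 is the group velocity dispersion in s^2/m and delta_omega is the
    optical bandwidth in rad/s.
    """
    return 1.0 / (delta_omega ** 2 * abs(beta2))


def one_sided_memory(distance: float, beta2: float, delta_omega: float) -> float:
    """Rough one-sided channel memory M ~ L / L_D for a link of `distance` metres."""
    return distance / dispersive_length(beta2, delta_omega)


# Hypothetical example parameters (assumptions, not values from the paper).
BETA2 = -21.7e-27                     # s^2/m, i.e. -21.7 ps^2/km
DELTA_OMEGA = 2 * math.pi * 50e9      # rad/s, i.e. a 50 GHz optical bandwidth
LINK_LENGTH = 1000e3                  # m

L_D = dispersive_length(BETA2, DELTA_OMEGA)
M = one_sided_memory(LINK_LENGTH, BETA2, DELTA_OMEGA)
```

Under this reading, doubling the bandwidth reduces $L_D$ by a factor of four, while $M$ grows linearly with the propagation distance.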

At time instant $i$, given the transmitted symbol $X_i$, a vector consisting of $X_i$ along with its $2M$ neighbouring symbols is denoted as $\boldsymbol{X}_{i-M}^{i+M} = [X_{i-M}, \ldots, X_i, X_{i+1}, \ldots, X_{i+M}]$. By considering the NLI and the ASE as AWGN, the received symbol $Y_i$ can be expressed as

$$Y_i = X_i + Z_{\mathrm{ASE},i} + Z_{\mathrm{NLI},i}(\boldsymbol{X}_{i-M}^{i+M}). \tag{3}$$

Due to the nonlinear interference, the NLI term $Z_{\mathrm{NLI},i}$ is expressed as a function of $\boldsymbol{X}_{i-M}^{i+M}$.

The SNR is usually evaluated under the implicit assumption of ergodic signal and noise processes. Therefore, the empirical estimates of the signal power and noise variance converge to the corresponding statistical values. Hence, the effective SNR including the NLI is defined as

$$\mathrm{SNR}_{\mathrm{eff}} \triangleq \frac{\mathbb{E}[|X|^2]}{\mathrm{Var}[Z_{\mathrm{ASE}}] + \mathrm{Var}[Z_{\mathrm{NLI}}]} \tag{4}$$

$$= \frac{\mathbb{E}[|X|^2]}{\mathbb{E}[|Y - X|^2]} \approx \frac{\mu_{|X|^2}}{\mu_{|Y-X|^2}}, \tag{5}$$

where $\mu_{|X|^2}$ and $\mu_{|Y-X|^2}$ denote the corresponding empirical means. Often the r.h.s. of (5) is used to approximate the effective SNR in simulations and experiments.

B. Fiber Channel Models with Finite Memory

We review related channel models that consider the temporal effects of the transmitted symbols and also fit the general model in (3). A first-order perturbation model can be derived from the nonlinear Schrödinger equation in the discrete-time domain [39]. Assuming a single channel at time instant $i = 0$, $Z_{\mathrm{NLI},0}$ is modeled as [39, Eq. (57)]

$$Z_{\mathrm{NLI},0}(\boldsymbol{X}_{-\infty}^{+\infty}) = \gamma \sum_{h=-\infty}^{+\infty} \sum_{j=-\infty}^{+\infty} \sum_{l=-\infty}^{+\infty} S_{h,j,l} X_h X_j X_l^*, \tag{6}$$

where $\gamma$ is the nonlinear coefficient. The products of the symbol triplets are weighted by complex perturbation terms $S_{h,j,l}$ that quantify self-channel interference. In (6) we use the notation $\boldsymbol{X}_{-\infty}^{+\infty}$ to emphasize that (6) is in theory an infinite-memory channel.

Notation: Throughout this paper, random variables are denoted by uppercase letters $X$ and their (deterministic) outcomes by the same letter in lowercase $x$. Sequences are denoted by boldface letters $\boldsymbol{X}$ (random) or $\boldsymbol{x}$ (deterministic). We use subscripts and superscripts to denote the boundaries of a random sequence, i.e., $\boldsymbol{X}_i^j = [X_i, X_{i+1}, \ldots, X_j]$. Expectations, variances and autocovariances are denoted by $\mathbb{E}[\cdot]$, $\mathrm{Var}[\cdot]$, and $\mathrm{Cov}[\cdot]$, respectively. The probability of $X_i = x$ is denoted by $P_{X_i}(x)$. Conditional probability is denoted as $P_{Y_i|X_i}(y|x)$. The imaginary unit is $\jmath \triangleq \sqrt{-1}$.

Footnote: The cross-channel interference can be expressed in the same form as the symbol triplet product times the complex perturbation terms (see [39, Eq. (60)]).

The perturbation terms $S_{h,j,l}$ exhibit very different magnitudes depending on their indices (see for example [40, Fig. 5]). To capture the dominant terms contributing to the NLI, we consider the terms with $j = l$ as previously done in [22, Sec. II-B]. Furthermore, we truncate the indices within the channel memory $M$. Hence, (6) is approximated as

$$Z_{\mathrm{NLI},0}(\boldsymbol{X}_{-M}^{+M}) \approx \gamma \sum_{h=-M}^{M} X_h \sum_{l=-M}^{M} S_{h,l,l} |X_l|^2, \tag{7}$$

which shows that the NLI noise $Z_{\mathrm{NLI},0}$ depends on the weighted symbol energies within the vector $\boldsymbol{X}_{-M}^{+M}$.

For the transmission of non-i.i.d. symbol sequences, the sum of the ASE and the NLI in (3) is approximated in [21, Eq. (8)] as

$$Z_{\mathrm{ASE},0} + Z_{\mathrm{NLI},0}(\boldsymbol{X}_{-M}^{+M}) \approx \tilde{Z} \sqrt{P_{\mathrm{ASE}} + \eta \left( \frac{\sum_{k=-M}^{M} |X_k|^2}{2M+1} \right)^{3}}, \tag{8}$$

where $\tilde{Z}$ is a zero-mean unit-variance circularly symmetric complex Gaussian random variable, $P_{\mathrm{ASE}}$ is the ASE noise variance, and $\eta$ is a real, non-negative constant quantifying the NLI. When compared to (7), (8) is a more radical approximation in the sense that the energies of all interfering symbols within a window of length $2M+1$ are assumed to make an equal contribution to the NLI.

In general, the expressions (7) and (8) show that the sum of (weighted) symbol energies within a window plays an important role in determining the NLI. Transmitting correlated symbols can therefore suppress the NLI. These observations motivate us to focus on the symbol energies and their temporal correlations.

C. Related Metrics for NLI

One well-known metric for showing variations of symbol energies is the peak-to-average power ratio (PAPR). PAPR has been widely used for the analysis of orthogonal frequency-division multiplexing (OFDM) fiber transmission [41]. PAPR partially reflects the energy variation of symbols and thus can be regarded as a rough indication of the nonlinearity tolerance of the transmitted signal [42, Sec. II-D], [43, Sec. II]. PAPR is defined as

$$\Theta \triangleq \frac{|X|^2_{\max}}{\mathbb{E}[|X|^2]}, \tag{9}$$

where $|X|^2_{\max}$ is the maximum symbol energy in the symbol sequence.

Another important metric (originating from the EGN model) is kurtosis. The EGN model predicts strong NLI if the transmitted symbol sequence has high kurtosis. For zero-mean constellations, kurtosis is defined as

$$\Phi \triangleq \frac{\mathbb{E}[|X|^4]}{\mathbb{E}^2[|X|^2]}. \tag{10}$$

To explain the blocklength-dependent effective SNR, run ratio was proposed in [25, Sec. III-B]. Symbol sequences with high run ratio are considered to be more prone to the NLI. Run ratio is defined as

$$R_r \triangleq \frac{1}{T} \left( \sum_{i=1}^{T-1} \bar{\delta}(x_{i-1}, x_i) \right), \tag{11}$$

where $T$ is the number of transmitted symbols, and $\bar{\delta}(x_{i-1}, x_i)$ is a binary decision function whose value depends on whether $x_{i-1} \neq x_i$ (see [25, Sec. III-B]).
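The first two metrics above can be sketched in a few lines of Python. The sketch assumes that (9) is the maximum symbol energy divided by the mean symbol energy and that (10) is the fourth moment divided by the squared second moment. The function names and the two test constellations (uniform 16-QAM and QPSK, each listed once as a sequence) are illustrative assumptions and are not taken from the paper.

```python
from itertools import product


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def papr(symbols):
    """Peak-to-average power ratio: max |x|^2 over mean |x|^2 of the sequence."""
    energies = [abs(x) ** 2 for x in symbols]
    return max(energies) / mean(energies)


def kurtosis(symbols):
    """Standardized fourth moment E[|X|^4] / E[|X|^2]^2 (zero-mean symbols)."""
    energies = [abs(x) ** 2 for x in symbols]
    return mean(e ** 2 for e in energies) / mean(energies) ** 2


# Illustrative sequences: every point of a uniform constellation appears once.
qam16 = [complex(i, q) for i, q in product((-3, -1, 1, 3), repeat=2)]
qpsk = [complex(i, q) for i, q in product((-1, 1), repeat=2)]

papr_qam16 = papr(qam16)          # 18 / 10
kurtosis_qam16 = kurtosis(qam16)  # 132 / 100
kurtosis_qpsk = kurtosis(qpsk)    # constant-modulus symbols give 1
```

Both quantities depend only on the empirical distribution of the symbol energies, so neither changes when the symbols of a sequence are reordered in time.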

Fig. 1. Transmitter schematic diagram for a PAS architecture using 1D symbol mapping.

III. QAM SYMBOLS SHAPED BY CCDM

This section first reviews the generation of QAM symbol sequences in the PAS architecture. We then show that the symbols generated by CCDM are statistically dependent on each other. Motivated by the results in Sec. II-B, which showed that the correlation among symbols could lead to a reduced NLI, we propose to analyze the autocorrelation of the symbol energies. This analysis will later be used in Sec. IV for the derivation of the EDI.

A block diagram of a PAS transmitter is illustrated in Fig. 1. To obtain a nonuniformly distributed QAM symbol sequence $\boldsymbol{X}$, two sequences of shaped pulse-amplitude modulation (PAM) symbols $\boldsymbol{X}_I$ and $\boldsymbol{X}_Q$ are independently generated in the in-phase and quadrature dimensions, respectively. The resulting QAM symbols are then $\boldsymbol{X} = \boldsymbol{X}_I + \jmath \boldsymbol{X}_Q$. In this paper, similar to [29, Sec. II-C], we refer to this mapping strategy as 1D symbol mapping. When obtained from independent information bit sequences, $\boldsymbol{X}_I$ and $\boldsymbol{X}_Q$ are also independent of each other.

A. System Model

As shown in Fig. 1, the generation of PAM symbols relies on a systematic FEC engine and an amplitude shaper. A fixed-length amplitude shaper encodes $k$ information bits to an amplitude codeword of blocklength $n$, and thus the shaping rate is $k/n$. The amplitude set is denoted as $\mathcal{A} = \{\Delta, 3\Delta, 5\Delta, \ldots\}$, where $\Delta$ is a scaling factor.

For simplicity of exposition, let us consider the in-phase dimension and one FEC codeword. As shown in Fig. 1, a FEC codeword consists of $N$ amplitude codewords. After a serial-to-parallel conversion, $Nk$ information bits go into the amplitude shaper, and $\zeta Nn$ bits go into the FEC, where $\zeta$ denotes the fraction of information in sign bits (see [1, Eq. (29)]). The amplitude shaper converts $Nk$ bits into $N$ amplitude codewords, which we call an amplitude sequence $\boldsymbol{A}_I = [A_0, A_1, \ldots, A_{Nn-1}]$. The amplitudes in $\boldsymbol{A}_I$ are then labeled into bits (see Fig. 1) and fed into the FEC along with the $\zeta Nn$ bits. The labeling used is the binary-reflected Gray code [44]. The FEC parity bits together with the $\zeta Nn$ bits serve as sign bits $\boldsymbol{S}_I$ to yield the PAM symbol sequence $\boldsymbol{X}_I = [X_{I,0}, X_{I,1}, \ldots, X_{I,Nn-1}]$, where $X_{I,i} = (-1)^{S_{I,i}} A_{I,i}$ ($i = 0, 1, \ldots, Nn-1$). The same procedure is conducted in the quadrature branch to generate $\boldsymbol{X}_Q$.

Footnote: The 1D symbol mapping is also called inter-pairing in [25, Sec. II-B].

The amplitude shaper imposes constraints on the amplitudes of each amplitude codeword. In this paper, the amplitude shaper we consider is CCDM. Due to the constant-composition (CC) constraint, given blocklength $n$, the number of occurrences of each amplitude $a \in \mathcal{A}$ in every codeword is a constant denoted by $n_a$, and thus $n = \sum_{a \in \mathcal{A}} n_a$. For any realization of the amplitude sequence, the relative frequency of amplitude $a$ is $n_a/n$. Therefore, the amplitude codewords generated by CCDM are simply permuted versions of each other.

In this paper, we make two assumptions about the CCDM codebook.
In the following section, these assumptions will allow us to consider the CCDM codewords as a genuine process of drawing amplitudes without replacement [3, Sec. IV]. The first assumption is that the codewords are equally likely, which is justified in the scenario where the information bits are independent and uniformly distributed. The second assumption is that the CCDM codebook contains all possible amplitude permutations. The total number of permutations is given by the multinomial coefficient of the composition (see [45, Eq. (5)]), which is denoted as $N_C$. For $k$ input bits, CCDM selects $2^k$ permutations as codewords. In general, $2^k < N_C$ because (i) $k$ may be chosen as a variable, for example to achieve rate adaptivity, or (ii) $N_C$ is usually not exactly a power of two. In this paper, we always use the largest possible value of $k$, and thus (i) above can be ignored. On the other hand, due to (ii) above, CCDM does not always generate all $N_C$ permutations. In this paper, we use emulated CCDM codewords that include all amplitude permutations. We will show later in the paper that this approximation is very good when compared to “exact” CCDM, where not all permutations are used.
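The counting argument above can be made concrete with a short Python sketch. It computes the multinomial coefficient $N_C$ of a composition, the largest $k$ with $2^k \le N_C$, and one emulated codeword drawn uniformly from all permutations. The function names and the dictionary representation of a composition (amplitude mapped to $n_a$) are assumptions made for illustration; the sketch is not the arithmetic-coding CCDM of [3].

```python
import random
from math import factorial


def num_permutations(composition):
    """Multinomial coefficient N_C = n! / prod(n_a!) of a composition.

    `composition` maps each amplitude a to its number of occurrences n_a.
    """
    n = sum(composition.values())
    result = factorial(n)
    for count in composition.values():
        result //= factorial(count)  # stays an integer at every step
    return result


def max_input_bits(composition):
    """Largest k such that 2**k <= N_C, i.e. floor(log2(N_C))."""
    return num_permutations(composition).bit_length() - 1


def emulated_codeword(composition, rng):
    """Draw one codeword uniformly from all permutations of the composition.

    Shuffling the multiset of amplitudes is equivalent to drawing the
    amplitudes one by one without replacement.
    """
    pool = [a for a, count in composition.items() for _ in range(count)]
    rng.shuffle(pool)
    return pool


# Illustrative compositions with amplitude PMF [3/4, 1/4] over {1, 3}.
short_comp = {1: 3, 3: 1}   # blocklength n = 4
long_comp = {1: 6, 3: 2}    # blocklength n = 8
codeword = emulated_codeword(short_comp, random.Random(0))
```

For the $n = 8$ composition, $N_C = 28$ is not a power of two, so only $2^4 = 16$ of the 28 permutations would be addressed by $k = 4$ input bits, which is the situation described in (ii).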

B. Statistical Dependency Among CC Amplitudes

The dependency of amplitudes in a FEC frame is illustrated in Fig. 2. For notational simplicity, in what follows we drop the subscripts of $\boldsymbol{A}$ and $\boldsymbol{S}$ that indicate in-phase or quadrature. Furthermore, we pay little attention to sign bits since they only determine the polarity of PAM symbols. In terms of the amplitude sequence $\boldsymbol{A}$, Fig. 2 schematically shows that the amplitudes from different amplitude codewords of length $n$ are independent of each other. Fig. 2 also shows that amplitudes within an amplitude codeword are mutually dependent.

The amplitude dependency within amplitude codewords comes from the CC constraint. The arithmetic encoding of a CCDM can be modeled as a process of drawing without replacement, in which the probability of drawing an amplitude is always updated by subtracting the number of previously drawn amplitudes from the composition. The CC constraint further implicitly ensures that the sum of amplitudes in a CCDM codeword is a constant, i.e.,

$$\sum_{j=0}^{n-1} A_j = \sum_{a \in \mathcal{A}} a n_a. \tag{12}$$

For any two different time instants $i$ and $i+\tau$ within the same codeword, we then have

$$A_i = -A_{i+\tau} - \sum_{\substack{j=0 \\ j \neq i, i+\tau}}^{n-1} A_j + \sum_{a \in \mathcal{A}} a n_a. \tag{13}$$

Expression (13) shows that the amplitudes $A_i$ and $A_{i+\tau}$ within CCDM codewords exhibit a linear relationship. This observation shows that amplitudes within an amplitude codeword are indeed linearly dependent.

Fig. 2. Illustration of FEC frame and amplitude dependency. One FEC frame is formed by $nN$ amplitudes and $nN$ sign bits. Amplitudes within the same codeword are dependent, while amplitudes from different codewords are independent.

In the next example, we use a CCDM shaping trellis to explain the statistical dependency between amplitudes. The trellis shows all possible amplitude sequences and their accumulated energies. We construct the trellis based on the following rules:

• Each path in the trellis represents a unique amplitude sequence.
• At time instant $i$, the vertical position of the nodes represents the accumulated energy $E_i^{\mathrm{acc}} \in \mathcal{E}$, where
$$E_i^{\mathrm{acc}} \triangleq \sum_{t=0}^{i-1} |A_t|^2, \tag{14}$$
$E_0^{\mathrm{acc}} = 0$, and $\mathcal{E}$ is the set of possible accumulated energy levels.
• The numbers labeling the trellis states represent the value of $P_{A_i|E_i^{\mathrm{acc}}}(a|e)$, i.e., the conditional probability of amplitude $A_i = a$ given the accumulated energy $E_i^{\mathrm{acc}} = e$ for $i = 0, 1, \ldots$, where $a \in \mathcal{A}$ and $e \in \mathcal{E}$.
• At time instant $i$, the edges indicate the amplitude $A_i$. The steeper the slope of the edge, the larger the amplitude. The number next to the edges is $P_{A_i,E_i^{\mathrm{acc}}}(a, e)$, i.e., the probability of the paths starting from accumulated energy $e$ using an amplitude $a$.

Example 1 (CCDM Trellis):

Assume the use of CCDM with blocklength $n = 4$ and a composition of three amplitudes 1 and one amplitude 3 (i.e., $n_1 = 3$ and $n_3 = 1$), and thus the amplitude PMF is $P_A = [3/4, 1/4]$. Fig. 3 shows the trellis of the first amplitude codeword, whose behavior will repeat for the subsequent codewords. The set of accumulated energy states is $\mathcal{E} = \{0, 1, 2, 3, 9, 10, 11, 12\}$. To explain the trellis, consider state $E_2^{\mathrm{acc}} = 10$ and the three connected edges (shown with boldface text in Fig. 3). Since two paths reach this state, $P_{E_2^{\mathrm{acc}}}(10) = P_{A_1,E_1^{\mathrm{acc}}}(3, 1) + P_{A_1,E_1^{\mathrm{acc}}}(1, 9) = 1/4 + 1/4 = 1/2$. At this node, we can only choose amplitude 1 for $a_2$, and thus $P_{A_2|E_2^{\mathrm{acc}}}(3|10) = 0$ and $P_{A_2|E_2^{\mathrm{acc}}}(1|10) = 1$. The joint probability of amplitude $a = 1$ starting from $e = 10$ is thus $P_{A_2,E_2^{\mathrm{acc}}}(1, 10) = P_{A_2|E_2^{\mathrm{acc}}}(1|10) P_{E_2^{\mathrm{acc}}}(10) = 1 \times 1/2 = 1/2$.
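The probabilities in Example 1 can be checked by brute force. The following Python sketch enumerates all permutations of the composition, treats them as equally likely codewords (the two codebook assumptions of Sec. III-A), and evaluates the state, joint, and conditional probabilities of the trellis with exact fractions. The function names are hypothetical and only serve this illustration.

```python
from fractions import Fraction
from itertools import permutations


def codebook(composition):
    """All distinct permutations of a composition {amplitude: count}."""
    pool = [a for a, count in composition.items() for _ in range(count)]
    return sorted(set(permutations(pool)))


def accumulated_energy(word, i):
    """E_i^acc: sum of squared amplitudes before time instant i."""
    return sum(a * a for a in word[:i])


def state_probability(words, i, e):
    """P(E_i^acc = e) when all codewords are equally likely."""
    hits = sum(1 for w in words if accumulated_energy(w, i) == e)
    return Fraction(hits, len(words))


def joint_probability(words, i, a, e):
    """P(A_i = a, E_i^acc = e): the label next to a trellis edge."""
    hits = sum(1 for w in words if w[i] == a and accumulated_energy(w, i) == e)
    return Fraction(hits, len(words))


def conditional_probability(words, i, a, e):
    """P(A_i = a | E_i^acc = e): the label inside a trellis node."""
    return joint_probability(words, i, a, e) / state_probability(words, i, e)


words = codebook({1: 3, 3: 1})                    # n = 4, P_A = [3/4, 1/4]
p_state = state_probability(words, 2, 10)         # two paths reach E_2^acc = 10
p_cond = conditional_probability(words, 2, 1, 10)
p_joint = joint_probability(words, 2, 1, 10)
# Marginal of A_2 = 1, summing the joint probability over all energy states.
p_marginal = sum(joint_probability(words, 2, 1, e) for e in (2, 10))
```

Summing the joint probabilities over the energy states at any time instant gives back $n_a/n$, so the marginal of each amplitude does not depend on $i$ even though the conditional probabilities do.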

Fig. 3. CCDM shaping trellis with blocklength $n = 4$ for amplitude set $\mathcal{A} = \{1, 3\}$ with $P_A = [3/4, 1/4]$. The amplitude edges are labeled with $P_{A_i,E_i^{\mathrm{acc}}}(a, e)$. Each node contains $P_{A_i|E_i^{\mathrm{acc}}}(3|e)$ on the left (in red) and $P_{A_i|E_i^{\mathrm{acc}}}(1|e)$ on the right (in blue).

In Example 1, the amplitudes within a CCDM codeword are statistically dependent on each other. This can be seen from Fig. 3, where $P_{A_i|E_i^{\mathrm{acc}}}(a|e) = P_{A_i}(a)$ is not always satisfied. The reason for this dependency is that $A_0 + A_1 + A_2 + A_3 = 6$ (see (12)), and any pair of amplitudes exhibits a linear relationship. On the other hand, for i.i.d. amplitude sequences, $P_{A_i|E_i^{\mathrm{acc}}}(a|e) = P_{A_i}(a)$ is satisfied for all $a \in \mathcal{A}$.

Another property of i.i.d. amplitude sequences is the time-independent marginal probability of an amplitude. As can be seen in Example 1, the marginal probability of an amplitude within a CCDM codeword is also constant, i.e., for all $i \in \mathbb{Z}$,

$$\sum_{e \in \mathcal{E}} P_{A_i,E_i^{\mathrm{acc}}}(a, e) = P_{A_i}(a) = P_A(a) = \frac{n_a}{n}. \tag{15}$$

The property in (15) is due to the set of codewords being permutation invariant, as shown in [46, Lemma 1]. For example, $P_{A_2}(1) = P_{A_2,E_2^{\mathrm{acc}}}(1, 10) + P_{A_2,E_2^{\mathrm{acc}}}(1, 2) = 1/2 + 1/4 = 3/4$. The property in (15) holds for all CC amplitude sequences and will be used in the statistical analysis later in the paper.

A closer look at Example 1 also reveals that when multiple CCDM codewords are transmitted, the trellis extends in the time domain, and the probabilistic model of the first amplitude codeword repeats with period $n$. This repetition is studied in the next example.

Example 2 (Trellis Repetition and Blocklength):

Fig. 4 shows a comparison of the CCDM shaping trellises for blocklengths $n = 4$ and $n = 8$. Since both cases have the same $P_A$, it can be seen that $E_4^{\mathrm{acc}}/4 = 12/4 = 3$ for $n = 4$, which is equal to $E_8^{\mathrm{acc}}/8 = 24/8 = 3$ for $n = 8$ (see Fig. 4). As a result, the green dotted lines in the middle of the trellises show that their accumulated energies grow at the same average rate. We also note that for blocklength $n = 8$, the trellis broadens (in the energy direction) and includes the trellis for $n = 4$. The vertical arrows between the black dashed lines show that the variation of the accumulated energy growth for $n = 8$ is larger than for $n = 4$.

Fig. 4. CCDM shaping trellis comparison for blocklengths $n = 4$ and $n = 8$, with $P_A = [3/4, 1/4]$: (a) trellis for $n = 4$; (b) trellis for $n = 8$. Both panels show the accumulated energy $E_i^{\mathrm{acc}}$ versus the time instant $i$ and mark the energy variation. The trellis branches marked by gray dotted lines show amplitude sequences never generated by CCDM.

An important implication of the two examples above is that by introducing dependencies between amplitudes, CCDM artificially constrains the dispersion of the accumulated energy of the amplitude sequences (see black arrows in Fig. 4). For a fixed PMF $P_A$, the accumulated energy dispersion is more constrained with shorter blocklengths.

C. Statistics of Symbol Energies with CC Amplitudes

In this section, we will study the statistical properties of the QAM symbol sequences defined by CC amplitudes from the perspective of stochastic processes. In particular, we will show that the process of symbol energies is wide-sense cyclostationary (WSCS). The analysis in this section will lay the ground for the definition and formulation of the EDI metric in Sec. IV. We start by briefly reviewing some of the key properties of stochastic processes.

Definition 1 (First-order stationarity [47, Ch. 9-1]): A stochastic process $\boldsymbol{X}$ is called first-order stationary if $P_{X_i} = P_{X_{i+\tau}}$ holds for any delay $\tau$.

Definition 2 (Autocorrelation [48, Def. 13.13]): The autocorrelation of a stochastic process $\boldsymbol{X}$ as a function of time slot $i$ and delay $\tau$ is defined as

$$R_X(i, \tau) \triangleq \mathbb{E}[X_i X_{i+\tau}]. \tag{16}$$

Definition 3 (Autocovariance [48, Def. 13.12]): The autocovariance of a stochastic process $\boldsymbol{X}$ is defined as

$$\mathrm{Cov}[X_i, X_{i+\tau}] \triangleq \mathbb{E}\left[(X_i - \mathbb{E}[X_i])(X_{i+\tau} - \mathbb{E}[X_{i+\tau}])\right]. \tag{17}$$

Moreover, for first-order stationary processes,

$$\mathrm{Cov}[X_i, X_{i+\tau}] = R_X(i, \tau) - \mathbb{E}^2[X]. \tag{18}$$

Definition 4 (Wide-sense Cyclostationarity [47, Ch. 10-4]): A stochastic process $\boldsymbol{X}$ is called wide-sense cyclostationary with period $n$ if $\mathbb{E}[X_i] = \mathbb{E}[X_{i+mn}]$ and $R_X(i, \tau) = R_X(i+mn, \tau)$ hold for every $m \in \mathbb{Z}$.

In the definitions above, we used $\boldsymbol{X}$ to denote a generic stochastic process. In what follows, we focus on the process of symbol energies defined as

$$\boldsymbol{E} \triangleq [\ldots, E_{i-1}, E_i, E_{i+1}, \ldots], \tag{19}$$

where

$$E_i \triangleq |X_i|^2 = A_{I,i}^2 + A_{Q,i}^2. \tag{20}$$

For a symbol energy process $\boldsymbol{E}$, we will refer to every $n$ symbol energies as a block, which is constructed from two amplitude codewords (see Fig. 1). In addition, the autocorrelation of $\boldsymbol{E}$ is

$$R_E(i, \tau) \triangleq \mathbb{E}[E_i E_{i+\tau}]. \tag{21}$$

Lemma 1:

The QAM symbol energies $\boldsymbol{E}$ defined by CC amplitudes form a first-order stationary process.

Proof: As shown in (15), the CC amplitude sequences $\boldsymbol{A}$ are first-order stationary processes. Since a QAM symbol can be decomposed into two independent dimensions, the process of the energies $\boldsymbol{E}$ in (19) is also first-order stationary.

Lemma 1 shows that the probability distribution of the symbol energy $P_{E_i}$ is constant for any $i \in \mathbb{Z}$. From this first-order stationarity property in Lemma 1, it follows that

$$\mathbb{E}[E_i] = \mathbb{E}[|X_i|^2] = \mathbb{E}[|X|^2], \tag{22}$$

$$\mathrm{Var}[E_i] = \mathrm{Var}[|X_i|^2] = \mathrm{Var}[|X|^2], \tag{23}$$

where

$$\mathbb{E}[|X|^2] = 2\mathbb{E}[A^2], \tag{24}$$

$$\mathrm{Var}[|X|^2] = 2\mathbb{E}[A^4] - 2\mathbb{E}^2[A^2], \tag{25}$$

and (25) was obtained using

$$\mathbb{E}[|X|^4] = 2\mathbb{E}[A^4] + 2\mathbb{E}^2[A^2]. \tag{26}$$

The second and fourth order moments of $A$ in (24)–(26) can be computed by using the amplitude PMF $P_A(a) = n_a/n$, i.e.,

$$\mathbb{E}[A^2] = \sum_{a \in \mathcal{A}} a^2 \frac{n_a}{n}, \tag{27}$$

$$\mathbb{E}[A^4] = \sum_{a \in \mathcal{A}} a^4 \frac{n_a}{n}. \tag{28}$$

Theorem 2:

For the sequence of QAM symbol energies $E$ defined by CC amplitudes, if $E_i$ and $E_{i+\tau}$ ($\tau \neq 0$) belong to the same block, we have
$$R_E(i,\tau) = \rho < \mathbb{E}^2[|X|^2], \tag{29}$$
where
$$\rho \triangleq \frac{n\,\mathbb{E}^2[|X|^2] - \mathbb{E}[|X|^4]}{n-1}, \tag{30}$$
and
$$\mathrm{Cov}[E_i, E_{i+\tau}] = -\frac{\mathrm{Var}[|X|^2]}{n-1} < 0. \tag{31}$$

Footnote: We avoid using “symbol energy codeword”, since the mapping between $k$ input bits and $n$ symbol energies is not one-to-one.

Proof:

See Appendix A.

In Theorem 2, the negative autocovariance in (31) shows that the dependency between amplitudes takes the form of an inverse correlation. This negativity will also be used to prove Corollary 7 in Sec. IV. In addition, from (31) we obtain $\lim_{n \to \infty} \mathrm{Cov}[E_i, E_{i+\tau}] = 0$, indicating that using long blocklengths weakens the linear dependency between symbol energies.

Lemma 3:

The process of QAM symbol energies $E$ defined by CC amplitudes is a wide-sense cyclostationary (WSCS) process.

Proof:

The first condition of WSCS processes, $\mathbb{E}[E_i] = \mathbb{E}[E_{i+mn}]$, is satisfied because of (22). The second condition of WSCS processes, $R_E(i,\tau) = R_E(i+mn,\tau)$, follows from the fact that the same probabilistic model repeats for every block of $n$ symbols (see Fig. 2).

In practice, we are interested in the average autocorrelation of WSCS processes over the cyclostationarity period. For the process under investigation, such an average autocorrelation $\overline{R}_E(\tau)$ is defined as
$$\overline{R}_E(\tau) \triangleq \frac{1}{n}\sum_{i=0}^{n-1} R_E(i,\tau) \tag{32}$$
$$= \frac{1}{n}\sum_{i=0}^{n-1} \mathbb{E}[E_i E_{i+\tau}], \tag{33}$$
where (33) follows from (21). The average autocorrelation $\overline{R}_E(\tau)$ in (32) can be interpreted as a quantity that indicates the average linear dependency of all possible pairs of symbol energies separated by $\tau$ symbols.

The following theorem gives an analytical expression for the average autocorrelation in (33) for the considered QAM symbol sequences.

Theorem 4:

The average autocorrelation $\overline{R}_E(\tau)$, $\tau \in \mathbb{Z}$, for CCDM QAM symbol sequences with blocklength $n$ generated using a composition such that $P_A(a) = n_a/n$, $a \in \mathcal{A}$, can be expressed as
$$\overline{R}_E(\tau) = \begin{cases} \mathbb{E}[|X|^4], & \text{if } \tau = 0, \\ \dfrac{|\tau|\,\mathbb{E}^2[|X|^2] + (n-|\tau|)\,\rho}{n}, & \text{if } 1 \leq |\tau| < n, \\ \mathbb{E}^2[|X|^2], & \text{if } |\tau| \geq n, \end{cases} \tag{34}$$
where $\rho$ is given in (30).

Proof:

The function $R_E(i,\tau)$ for different values of $\tau$ and $i$ is illustrated in Table I, where for simplicity only $\tau \geq 0$ is shown. When $\tau = 0$, $R_E(i,0) = \mathbb{E}[|X|^4]$, as shown in the first column in Table I. For $\tau \neq 0$, $E_{i+\tau}$ and $E_i$ are either in the same block and thus correlated, or in two different blocks and thus independent of each other. For example, when $i = 0$, $E_0$ is only correlated with the following $n-1$ symbol energies, yielding $R_E(0,\tau) = \rho$ for $\tau = 1, 2, \ldots, n-1$ (see Fig. 2 and the blue cells in the $i = 0$ row of Table I). When $\tau = n$, $E_n$ is the energy of the first symbol in the second block, and thus $E_0$ and $E_n$ are independent, which gives $R_E(0,n) = \mathbb{E}^2[|X|^2]$. This is shown by the red cells in the $i = 0$ row of Table I. The other rows in Table I can be calculated in a similar way. $\overline{R}_E(\tau)$ is the column-wise average of $R_E(i,\tau)$.

Fig. 5. Simulation results of $\hat{R}_E(\tau)$ in (35) (markers) by using exact or emulated CCDM codewords, and analytical results of $\overline{R}_E(\tau)$ given by Theorem 4 (solid lines), for $n = 10$, $n = 20$, $n = 50$, $n = 100$ and i.i.d. symbols, with normalized ($\mathbb{E}[|X|^2] = 1$) $P_A$ = [0 . , . , . , . ]. For all cases $\overline{R}_E(0) = \mathbb{E}[|X|^4]$ = 1 . .

Note that $\overline{R}_E(\tau)$ is an even function and that $\overline{R}_E(\tau)$ in (34) is completely determined by the blocklength $n$ and the second- and fourth-order moments of $|X|$. The latter can be calculated via (24)–(25) and (27)–(28).

Furthermore, $\overline{R}_E(\tau)$ can be approximated by the sample autocorrelation function $\hat{R}_E(\tau)$ in a Monte Carlo simulation. With $T$ samples, $\hat{R}_E(\tau)$ is defined as
$$\hat{R}_E(\tau) \triangleq \frac{1}{T}\sum_{t=0}^{T-1} |x_t|^2\,|x_{t+\tau}|^2. \tag{35}$$
When $T \to \infty$, $\hat{R}_E(\tau) \to \overline{R}_E(\tau)$. This ergodicity follows from Slutsky's theorem [47, Thm. 12-2], due to the fact that as $\tau' \to \infty$, the correlation between symbol energies vanishes and thus $\mathrm{Cov}[E_i E_{i+\tau}, E_{i+\tau'} E_{i+\tau+\tau'}] \to 0$.

Example 3 (Average Autocorrelation):

Fig. 5 shows the average autocorrelation $\overline{R}_E(\tau)$ in Theorem 4 and its estimate $\hat{R}_E(\tau)$ for 64QAM symbol sequences (normalized to $\mathbb{E}[|X|^2] = 1$). Four different shaping blocklengths are compared, which use the same distribution $P_A$ = [0 . , . , . , . ] and have the same $\overline{R}_E(0) = \mathbb{E}[|X|^4]$ = 1 . (not shown in Fig. 5). I.i.d. sequences with the same distribution are also considered. Fig. 5 shows $\hat{R}_E(\tau)$ in (35) for sequences generated using exact CCDM, emulated CCDM, and for a system with a symbol-level interleaver to emulate the i.i.d. property. Fig. 5 shows that the emulated results differ from exact CCDM only for $n = 10$, where only a fraction of the total number of permutations $N_C$ is used as codewords. Except for this minor mismatch for $n = 10$, $\overline{R}_E(\tau)$ in Theorem 4 approximates well the true autocorrelation function in all other cases. It can also be seen in Fig. 5 that for i.i.d. symbol sequences or $\tau \geq n$, $\overline{R}_E(\tau) = \mathbb{E}^2[|X|^2] = 1$. In the cases of CC symbol sequences with $0 < \tau < n$, due to

$R_E(i,\tau)$ being smaller than $\mathbb{E}^2[|X|^2]$, the average symbol energy dependency manifests itself as a deviation between $\overline{R}_E(\tau)$ and $\mathbb{E}^2[|X|^2]$ (see (29) and the second case in (34)). For each blocklength, as $\tau$ increases, the symbol energy dependency gradually decreases and vanishes when $\tau = n$, such that $E_i$ and $E_{i+\tau}$ always belong to two independent blocks. Moreover, as $n$ increases, the curves approach $\mathbb{E}^2[|X|^2]$ (i.e., the deviation decreases), which implies a weaker dependency.

TABLE I
THE AUTOCORRELATION $R_E(i,\tau)$ AND ITS AVERAGE $\overline{R}_E(\tau)$ FOR $\tau \geq 0$. THE THREE CASES OF $R_E(i,\tau)$ IN (55) CORRESPOND TO THE FIRST COLUMN (IN GRAY), THE UPPER LEFT PART (IN BLUE) AND THE BOTTOM RIGHT (IN RED), RESPECTIVELY.

$$
\begin{array}{c|ccccccccc}
i \backslash \tau & 0 & 1 & 2 & \cdots & n-2 & n-1 & n & n+1 & \cdots \\ \hline
0 & \mathbb{E}[|X|^4] & \rho & \rho & \cdots & \rho & \rho & \mathbb{E}^2[|X|^2] & \mathbb{E}^2[|X|^2] & \cdots \\
1 & \mathbb{E}[|X|^4] & \rho & \rho & \cdots & \rho & \mathbb{E}^2[|X|^2] & \mathbb{E}^2[|X|^2] & \mathbb{E}^2[|X|^2] & \cdots \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots & \\
n-2 & \mathbb{E}[|X|^4] & \rho & \mathbb{E}^2[|X|^2] & \cdots & \mathbb{E}^2[|X|^2] & \mathbb{E}^2[|X|^2] & \mathbb{E}^2[|X|^2] & \mathbb{E}^2[|X|^2] & \cdots \\
n-1 & \mathbb{E}[|X|^4] & \mathbb{E}^2[|X|^2] & \mathbb{E}^2[|X|^2] & \cdots & \mathbb{E}^2[|X|^2] & \mathbb{E}^2[|X|^2] & \mathbb{E}^2[|X|^2] & \mathbb{E}^2[|X|^2] & \cdots \\ \hline
\overline{R}_E(\tau) & \mathbb{E}[|X|^4] & \frac{(n-1)\rho + \mathbb{E}^2[|X|^2]}{n} & \frac{(n-2)\rho + 2\mathbb{E}^2[|X|^2]}{n} & \cdots & \frac{2\rho + (n-2)\mathbb{E}^2[|X|^2]}{n} & \frac{\rho + (n-1)\mathbb{E}^2[|X|^2]}{n} & \mathbb{E}^2[|X|^2] & \mathbb{E}^2[|X|^2] & \cdots
\end{array}
$$

Fig. 6. Window energies $G_i^W = E_{i-W/2} + E_{i-W/2+1} + \cdots + E_{i+W/2}$ and $G_{i+1}^W = E_{i-W/2+1} + E_{i-W/2+2} + \cdots + E_{i+W/2+1}$ given symbol energy sequence $E$.

IV. ENERGY DISPERSION INDEX

In this section, EDI is introduced as a figure of merit to qualitatively predict the NLI power. In Sec. II-A, we showed that memory is one of the main phenomena affecting the NLI. In previous NLI metrics (see Sec. II-C), the effect of non-i.i.d. input sequences and their interaction with the channel memory was not considered. EDI brings together these two elements by capturing the statistical properties of the input sequence over a time window which is comparable to the channel memory.

A. Definition

EDI is designed to be a sliding window statistic with window length $W$. The windowed energy at time instant $i$, $G_i^W$, is defined as the total energy of $X_{i-W/2}^{i+W/2}$, i.e.,
$$G_i^W \triangleq \sum_{j=i-W/2}^{i+W/2} E_j = \sum_{j=i-W/2}^{i+W/2} |X_j|^2. \tag{36}$$
Fig. 6 illustrates the windowed energies $G_i^W$ and $G_{i+1}^W$ and shows the sliding window effect.

The windowed energy is a random variable, and thus, we define the windowed energy process as
$$[\ldots, G_{i-1}^W, G_i^W, G_{i+1}^W, \ldots]. \tag{37}$$
For CCDM QAM symbol sequences, the windowed energy process is the sum of $W+1$ time-shifted WSCS processes of symbol energies of period $n$, and thus the windowed energy process is also WSCS with the same period $n$ [49, Ch. 17.2-Prop. 1]. Therefore, $R_{G^W}(i,\tau)$ and $\mathbb{E}[G_i^W]$ are periodic with period $n$. Furthermore, the variance of the windowed energy $G_i^W$ (see (18)) is given by
$$\mathrm{Var}[G_i^W] = \mathrm{Cov}[G_i^W, G_i^W] = R_{G^W}(i,0) - \mathbb{E}^2[G_i^W]. \tag{38}$$
With periodic $R_{G^W}(i,0)$ and $\mathbb{E}[G_i^W]$, $\mathrm{Var}[G_i^W]$ in (38) also varies cyclically with period $n$.

Definition 5 (Energy Dispersion Index):

EDI is defined as
$$\Psi \triangleq \frac{\overline{\mathrm{Var}}[G^W]}{\overline{\mathbb{E}}[G^W]}, \tag{39}$$
where
$$\overline{\mathbb{E}}[G^W] \triangleq \frac{1}{n}\sum_{i=0}^{n-1} \mathbb{E}[G_i^W], \tag{40}$$
$$\overline{\mathrm{Var}}[G^W] \triangleq \frac{1}{n}\sum_{i=0}^{n-1} \mathrm{Var}[G_i^W]. \tag{41}$$
EDI in Definition 5 measures the windowed energy variations. EDI in (39) is defined as the ratio of the average windowed energy variance ($\overline{\mathrm{Var}}[G^W]$) to the average windowed energy mean ($\overline{\mathbb{E}}[G^W]$). As shown in (40) and (41), these two averages have the same form as the average autocorrelation $\overline{R}_E(\tau)$ in (32).

B. Alternative Formulations

EDI in Definition 5 can also be expressed in terms of the second-order statistics of the input symbols. In this section, we introduce such a formulation of the EDI for both CCDM QAM symbol sequences and i.i.d. symbol sequences. Note that we view the process of i.i.d. symbol sequences as a special case of a WSCS process with period $n = 1$.

Theorem 5:

The average windowed energy mean and average windowed energy variance for CCDM QAM symbol sequences can be expressed as
$$\overline{\mathbb{E}}[G^W] = (W+1)\,\mathbb{E}[|X|^2] \tag{42}$$
and
$$\overline{\mathrm{Var}}[G^W] = (W+1)\,\mathrm{Var}[|X|^2] - W(W+1)\,\mathbb{E}^2[|X|^2] + 2\sum_{\tau=1}^{W}(W-\tau+1)\,\overline{R}_E(\tau), \tag{43}$$
respectively, where $\overline{R}_E(\tau)$ is given by (34).

Fig. 7. Histogram of windowed energy $G^W$ for $W = 1000$ (x-axis: $G^W$; y-axis: probability). Normalized ($\mathbb{E}[|X|^2] = 1$) $P_A$ = [0 . , . , . , . ]. The average windowed energy mean is $\overline{\mathbb{E}}[G^W]$ = 1000 . Legend values, $\overline{\mathrm{Var}}[G^W]$ from Eq. (43) or (46) vs. estimated $\sigma^2_{G^W}$: $n = 500$: 110.27 vs. 108. ; $n = 2000$: 383.96 vs. 381. ; third entry: .91 vs. 654. .

Proof:

See Appendix B.

For i.i.d. symbol sequences, (42) also holds. On the other hand, $\mathrm{Var}[G_i^W]$ for i.i.d. symbol sequences is the sum of $W+1$ energy variances; therefore,
$$\overline{\mathrm{Var}}[G^W] = \frac{1}{n}\sum_{i=0}^{n-1}\sum_{j=i-W/2}^{i+W/2} \mathrm{Var}[E_j] \tag{44}$$
$$= \frac{1}{n}\sum_{i=0}^{n-1} (W+1)\,\mathrm{Var}[|X|^2] \tag{45}$$
$$= (W+1)\,\mathrm{Var}[|X|^2], \tag{46}$$
where (45) follows from (23).

Example 4 (Windowed Energy Histograms):

Fig. 7 illustrates how the probabilities of windowed energies depend on the blocklength $n$, based on a number of windowed energy samples. For simplicity, we only show the results of emulated CCDM amplitude codewords, since almost the same result is obtained when using exact CCDM. Fig. 7 shows that the mean value of $G_i^W$ is independent of $n$, as shown in (42). Fig. 7 also shows that an increase of $n$ results in a heavier tail, i.e., a larger probability of observing a large windowed energy. The windowed energy variance increases as the blocklength $n$ increases, which is in good agreement with our observations in Example 2. Fig. 7 also shows that the estimated windowed energy variance $\sigma^2_{G^W}$ is well-approximated by $\overline{\mathrm{Var}}[G^W]$ given in (43) and (46), where the discrepancies are caused by the limited number of samples.

Based on Theorem 5, the EDI of CCDM QAM symbol sequences can be obtained directly by substituting (43) and (42) in (39), i.e.,
$$\Psi(n,W) = \mathbb{E}[|X|^2]\,[\Phi - (W+1)] + \frac{2\sum_{\tau=1}^{W}(W-\tau+1)\,\overline{R}_E(\tau)}{(W+1)\,\mathbb{E}[|X|^2]}, \tag{47}$$
where $\overline{R}_E(\tau)$ and $\Phi$ are given by (34) and (10), respectively. The notation $\Psi(n,W)$ is used to emphasize the dependency of the EDI on the blocklength $n$ and window length $W$.

Apart from $n$ and $W$, in view of (10), (30) and (34), EDI in (47) is also determined by the kurtosis $\Phi$ as well as the second- and fourth-order moments of $|X|$. Furthermore, after dividing (46) by (42) and using (10), the EDI of i.i.d. symbols is obtained, which is given by
$$\Psi = \mathbb{E}[|X|^2]\,(\Phi - 1). \tag{48}$$
It can be seen that (48) is independent of $n$ and $W$, but is still a function of the kurtosis $\Phi$. Recall that the kurtosis term in the EGN model is derived under the assumption of i.i.d. symbols; hence, (48) means that EDI gives the same indication of the NLI as kurtosis does in the case of i.i.d. symbols.

We have derived a closed-form expression for the EDI in (47) (and for i.i.d. symbols in (48)).
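As an illustration, the closed-form expressions (30), (34), (47) and (48) can be evaluated numerically. The following is a minimal Python sketch under stated assumptions: the unnormalized amplitude set {1, 3, 5, 7}, the PMF [0.4, 0.3, 0.2, 0.1], the window length and all function names are illustrative choices and are not taken from the text above.

```python
def moments(amplitudes, pmf):
    """Return (E[|X|^2], E[|X|^4]) from the amplitude PMF, cf. (24) and (26)-(28)."""
    ea2 = sum(p * a**2 for a, p in zip(amplitudes, pmf))
    ea4 = sum(p * a**4 for a, p in zip(amplitudes, pmf))
    return 2 * ea2, 2 * ea4 + 2 * ea2**2


def avg_autocorr(tau, n, ex2, ex4):
    """Average autocorrelation of the symbol energies, cf. (34)."""
    tau = abs(tau)
    if tau == 0:
        return ex4
    if tau >= n:  # energies in different blocks (also covers i.i.d., n = 1)
        return ex2**2
    rho = (n * ex2**2 - ex4) / (n - 1)  # cf. (30)
    return (tau * ex2**2 + (n - tau) * rho) / n


def edi(n, W, ex2, ex4):
    """EDI for blocklength n and window length W, cf. (47)."""
    kurtosis = ex4 / ex2**2
    weighted = sum((W - tau + 1) * avg_autocorr(tau, n, ex2, ex4)
                   for tau in range(1, W + 1))
    return ex2 * (kurtosis - (W + 1)) + 2 * weighted / ((W + 1) * ex2)


def edi_iid(ex2, ex4):
    """EDI of i.i.d. symbols, cf. (48)."""
    return ex2 * (ex4 / ex2**2 - 1)


# Assumed example values (hypothetical, for illustration only).
ex2, ex4 = moments([1, 3, 5, 7], [0.4, 0.3, 0.2, 0.1])
for n in (10, 100, 1000, 10000):
    print(n, edi(n, W=100, ex2=ex2, ex4=ex4), edi_iid(ex2, ex4))
```

In this sketch, calling `edi` with `n = 1` reduces to `edi_iid`, in line with viewing i.i.d. sequences as a WSCS process with period 1.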
In the next section, we will investigate properties of the EDI, and compare them against the estimated EDI from Monte Carlo simulations.

C. Properties

In what follows, we first show that for certain values of blocklength $n$ and window length $W$, EDI depends linearly on $n$. This corresponds to a regime where $W$ (and thus, implicitly the channel memory) is larger than $n$. We then give bounds on EDI for arbitrary values of $n$ and $W$.

Theorem 6:

When $n \leq W + 2$, the EDI $\Psi(n,W)$ in (47) depends linearly on $n$ via
$$\Psi_{\mathrm{lin}}(n,W) = \frac{n+1}{3(W+1)}\,\mathbb{E}[|X|^2]\,(\Phi - 1). \tag{49}$$

Proof:

See Appendix C.
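The linear regime of Theorem 6 can also be checked numerically. The sketch below is a hedged Monte Carlo illustration, not the simulation setup used in this paper: it emulates CCDM by drawing uniformly random permutations of an assumed composition (amplitudes 1, 3, 5, 7 occurring 4, 3, 2, 1 times, so $n = 10$), estimates the EDI as sample variance over sample mean of the sliding-window energies, and compares it with (49) for an assumed $W = 20 \geq n - 2$. All names, the seed and the number of blocks are hypothetical.

```python
import random


def emulated_ccdm_energies(composition, num_blocks, rng):
    """Energies |X|^2 of QAM blocks whose I and Q amplitude codewords are
    independent, uniformly random permutations of the same composition."""
    base = [a for a, count in composition.items() for _ in range(count)]
    energies = []
    for _ in range(num_blocks):
        i_amps, q_amps = base[:], base[:]
        rng.shuffle(i_amps)
        rng.shuffle(q_amps)
        energies.extend(ai * ai + aq * aq for ai, aq in zip(i_amps, q_amps))
    return energies


def estimate_edi(energies, W):
    """Sample variance over sample mean of windowed energies (W + 1 symbols)."""
    window = sum(energies[:W + 1])
    sums = [window]
    for k in range(W + 1, len(energies)):
        window += energies[k] - energies[k - W - 1]  # slide by one symbol
        sums.append(window)
    mean = sum(sums) / len(sums)
    var = sum((g - mean) ** 2 for g in sums) / len(sums)
    return var / mean


def edi_linear(composition, W):
    """Linear-regime EDI, cf. (49), with moments computed as in (24)-(28)."""
    n = sum(composition.values())
    ea2 = sum(a**2 * c for a, c in composition.items()) / n
    ea4 = sum(a**4 * c for a, c in composition.items()) / n
    ex2, ex4 = 2 * ea2, 2 * ea4 + 2 * ea2**2
    return (n + 1) / (3 * (W + 1)) * (ex4 - ex2**2) / ex2


composition = {1: 4, 3: 3, 5: 2, 7: 1}  # assumed composition, n = 10
rng = random.Random(2021)
energies = emulated_ccdm_energies(composition, num_blocks=20000, rng=rng)
print(estimate_edi(energies, W=20), edi_linear(composition, W=20))
```

With a finite number of blocks the two printed values agree only approximately; the gap is expected to shrink as `num_blocks` grows.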

Corollary 7:

For any finite blocklength $n$, the EDI in (47) is upper- and lower-bounded as
$$0 \leq \Psi(n,W) \leq \mathbb{E}[|X|^2]\,(\Phi - 1). \tag{50}$$
The bounds are achieved for asymptotic values of $W$, i.e.,
$$\lim_{W \to 0} \Psi(n,W) = \mathbb{E}[|X|^2]\,(\Phi - 1), \tag{51}$$
$$\lim_{W \to \infty} \Psi(n,W) = 0. \tag{52}$$

Proof:

See Appendix D.

The lower bound in (52) can be intuitively understood as follows. When $W$ is much larger than $n$, the windows always include multiple complete blocks of QAM symbols, and the composition of amplitudes within a large window “hardens”, yielding a reduced fluctuation of the window energies. The upper bound in (51) is identical to the EDI of i.i.d. symbol sequences in (48). This upper bound indicates that as the window length decreases (less memory in the metric), the impact of symbol energy correlations on EDI decreases.

Note that the EDI for constant-modulus constellations (e.g., phase-shift keying) is identically 0, because the symbols have constant energy. The sliding window energy is thus constant. This reflects the fact that EDI is a metric specifically designed to capture NLI fluctuations in PAS systems with different shaping blocklengths, and such systems cannot be designed using constant-modulus constellations.

Fig. 8. Simulation results (markers) using (53), analytical results (solid lines) in (47) and (48), and linear EDI in (49) (in dB) vs. blocklength $n$ for $W = 10$, $W = 30$, $W = 100$ and $W = 1000$, with normalized ($\mathbb{E}[|X|^2] = 1$) $P_A$ = [0 . , . , . , . ]. The plot also marks the upper bound in (51), the points $n = W + 2$, and uniform signaling via (48).

Example 5 (EDI and Blocklength):

Fig. 8 shows the EDI (in dB) of symbol sequences for different values of $n$ and $W$, as well as the EDI for i.i.d. uniform 64QAM sequences. The analytical results of EDI are computed by using (47), (48) and (49), and the EDI is estimated as
$$\hat{\Psi} \triangleq \frac{\sigma^2_{G^W}}{\mu_{G^W}}, \tag{53}$$
where $\sigma^2_{G^W}$ and $\mu_{G^W}$ are the estimated windowed energy variance and the estimated windowed energy mean, respectively. Fig. 8 shows that all the simulation results $\hat{\Psi}$ match the analytical results $\Psi$ in (47) very well. This is the case even for short blocklength ($n = 10$), indicating that the CCDM emulation approach we took in this paper has little impact on the EDI in (47). Fig. 8 also shows the linear EDI expression $\Psi_{\mathrm{lin}}$ in (49). The points $n = W + 2$ separate each EDI curve into a region in which EDI depends linearly on the blocklength and a region in which it does not. As $n$ increases above $W + 2$, $\Psi$ begins to diverge from $\Psi_{\mathrm{lin}}$ in (49). For a fixed $n$, as $W$ increases, EDI approaches the lower bound in (52) (i.e., $-\infty$ in dB). On the other hand, EDI gradually reaches the upper bound in (51) as $W \to 0$.

It can be concluded from Example 5 that for CCDM QAM symbol sequences, if a symbol-level interleaver or a very long blocklength is used, the EDI will approach the upper bound in (51) that is determined by the kurtosis. Hence, EDI can be viewed as a windowed version of kurtosis. For the purpose of NLI prediction, the window length $W$ should be carefully chosen such that the channel memory effect is properly reflected, as will be shown in the following section.

We conclude this section by emphasizing that Example 5 (and also Example 3) showed that the CCDM emulation approach of considering all $N_C$ permutations as codewords gives very precise results. This assumption allowed us to find closed-form expressions for the average autocorrelation (see (34)) and the EDI (see (47)). In the next section, we therefore show results using the analytical expressions we have developed above, and thus, only consider emulated CCDM.

TABLE II
SIMULATION PARAMETERS.

Parameter | Value
Modulation | QAM
Pol.-mux. | Single
Wavelength ($\lambda$) | 1550 nm
Symbol rate | GBd
WDM spacing ($\Delta f$) | 50 GHz
WDM channels ($N_{\mathrm{ch}}$) | 5
Pulse shape | root-raised cosine
Roll-off |
Fiber length | km
Fiber loss | . dB/km
Dispersion parameter ($D$) | ps/nm/km
Nonlinear parameter ($\gamma$) | . dB
Oversampling factor |
× QAM symbols per run |
No. of simulation runs | 10

V. NUMERICAL RESULTS

In this section, we show that EDI can qualitatively predict the NLI magnitude when different blocklengths are used. To this end, we study the effective SNR in (4), where NLI is a substantial part of the total noise at relatively high power, and where NLI changes produce a change in effective SNR.

A. Simulation Setup

We consider an ideal single-polarization multi-span WDM fiber system with $N_{\mathrm{ch}} = 5$ channels. Nonlinear noise caused by the Kerr effect, as well as ASE and CD, are taken into consideration. The fiber propagation is simulated using the split-step Fourier method with a step size of m. Other key simulation parameters are displayed in Table II. The channel of interest is located at the center of the WDM spectrum, where the channel spacing is $\Delta f = 50$ GHz. The signal is generated with root-raised cosine pulse shaping. After propagation over each span of standard single-mode fiber with span length 80 km, the attenuation is ideally compensated by an Erbium-doped fiber amplifier (EDFA). At the receiver, the channel of interest is filtered with a matched filter, followed by CD compensation and sampling.

The one-sided channel memory $M$ is an important reference for the window length $W$ in the EDI. $M$ can be estimated using (1) and (2). In (2), the optical bandwidth is $\Delta\omega = N_{\mathrm{ch}}\Delta f$, while the group velocity dispersion $\beta_2$ is related to the dispersion parameter $D$ as shown in [16, Eq. (1.2.11)].

For shaped 64QAM transmission, the amplitude PMF $P_A$ = [0 . , . , . , . ] is used. Note that EDI can be used for an arbitrary PMF in principle. Quadrature phase-shift keying (QPSK) and uniform 64QAM are presented as baselines. QPSK is anticipated to provide optimal effective SNR performance, since constant-modulus constellations completely remove the modulation-dependent NLI [22]. We also consider shaped 64QAM symbol sequences using a randomly generated symbol-level interleaver to emulate i.i.d. symbol sequences. The symbol sequences are always normalized to $\mathbb{E}[|X|^2] = 1$.

An overview of the NLI-related metrics is given in Table III. PAPR and kurtosis do not depend on the blocklength, and thus,

these two metrics cannot predict the NLI differences caused by blocklength differences. Although run ratio can, to some extent, capture the blocklength dependency of the NLI, it does not take channel memory into account, and thus cannot adapt to different distances. By contrast, EDI uses a window length depending on the distance.

TABLE III
NLI METRICS OF THE EVALUATED SYMBOL SEQUENCES. THE RESULTS FOR $R_r$ ARE TAKEN FROM [25, FIG. 8].

Columns: CC PS 64QAM, i.i.d. PS 64QAM, Uniform 64QAM, QPSK
$\Theta$, PAPR: . 769, 3.769, 2.336, 1
$\Phi$, Kurtosis: . 653, 1 . 381, 1
$R_r$, Run Ratio: ≈ . 98, 0.978, 0.984, 0 .

With various blocklengths $n$, many pairs of effective SNR and EDI can be obtained. To quantify the correlation between effective SNR and EDI, Pearson's correlation coefficient [50, Ch. 11.1] is used, i.e.,
$$r_p \triangleq \frac{\mathrm{Cov}[\mathrm{SNR}_{\mathrm{eff}}, \hat{\Psi}]}{\sqrt{\mathrm{Var}[\mathrm{SNR}_{\mathrm{eff}}]\,\mathrm{Var}[\hat{\Psi}]}}. \tag{54}$$
Coefficient values of $+1$ or $-1$ indicate perfect correlation, while $0$ indicates no correlation.

B. EDI and Effective SNR

We first investigate how EDI is correlated to effective SNR. Fig. 9 displays effective SNR and EDI vs. blocklength. At each transmission distance, the optimal launch power found for $n = 10$ is used (see Fig. 12 ahead). The effective SNRs achieved for $n = 10$ and $n = 10000$ at these launch powers are shown with filled triangles in Fig. 9. EDI is calculated using the optimal window lengths $W^*$, which will be discussed in Fig. 10. Fig. 9 shows that the estimated EDIs (circles) are in good agreement with the analytical EDI in (47) (solid lines), despite slight fluctuations due to the limited number of transmitted symbols. Effective SNR almost follows the same trend as that of EDI, and their Pearson's linear correlation coefficients at the three distances exceed 99% in absolute value. Based on Theorem 6, the x-axes in Fig. 9 can be divided into two regions: $n < W^* + 2$ and $n > W^* + 2$. The blocklength-dependent region on the left shows that the effective SNR varies linearly with blocklength $n$. As $n$ increases and enters the region on the right, SNR begins to decrease slowly until exhibiting marginal differences, since for long blocklengths the NLI reduction brought by weakly-correlated symbol energies becomes insignificant. In this region, EDI is determined by kurtosis, and the EGN model is able to give accurate predictions of effective SNR, as demonstrated in [25, Fig. 7].

The optimal window length $W^*$ used in Fig. 9 was chosen such that EDI yields the highest correlation with effective SNR. To this end, $W^*$ is obtained by analyzing various window lengths $W$ at each transmission distance, whose absolute value of correlation coefficient $|r_p|$ is shown in Fig. 10. It can be seen that $|r_p|$ reaches its peak for values $W^*$, which is much smaller than the estimated channel memory $M$ given by (1)–(2). Incorrectly choosing $W$ can lead to smaller values of $|r_p|$, and thus the SNR prediction by EDI is less accurate. However, Fig. 10 also shows that even if $M$ is used as window length, EDI can still reflect the SNR variations with high correlation coefficients. Thus, in practice one can directly use the estimated channel memory $M$ as window length rather than finding the optimal one $W^*$. In general, Fig. 9 and Fig. 10 show that EDI and effective SNR are highly correlated with each other, indicated by a nearly perfect negative correlation. It can be concluded that EDI evaluated with the optimal window length is capable of reflecting the impact of blocklength and distance on the NLI.

Fig. 9. Effective SNR (left axis) and EDI (right axis) vs. blocklength $n$ after transmission distances of (a) 80 km ($W^* + 2 = 32$), (b) 320 km ($W^* + 2 = 152$) and (c) 1600 km ($W^* + 2 = 1002$). Curves: $\Psi_{\mathrm{lin}}$, Eq. (49); $\Psi$, Eq. (47); $\hat{\Psi}$, Eq. (53); $\mathrm{SNR}_{\mathrm{eff}}$, Eq. (5). The launch powers are (a) − . dBm, (b) − . dBm and (c) − . dBm. Error bars for effective SNRs represent 95% confidence interval. The EDI is shown in dB and inverted for convenience of comparison. The optimal window lengths $W^*$ shown in the figure are used for the EDI calculation. The correlation coefficients $r_p$ in (54) are (a) − . , (b) − . and (c) − . .

Fig. 10. Absolute value of Pearson's linear correlation coefficient $|r_p|$ in (54) between EDI and effective SNR vs. window length $W$ for 80 km, 320 km and 1600 km, whose launch powers are − . dBm, − . dBm and − . dBm, respectively. The optimal value of $W$ is denoted by $W^*$ ($W^* = 30$, $150$ and $1000$). The channel memories calculated using (1)–(2) are also indicated ($2M = 216$, $866$ and $4332$).

Fig. 11. The estimated channel memory $M$ calculated using (1)–(2) and the optimal window length $W^*$ ($|r_p| \geq$ . ) at transmission distances from km to km (x-axis: transmission distance [km]; y-axis: number of symbols; curves: channel memory $2M$ and optimal window length $W^*$).

Fig. 11 shows the relationship between $W^*$ and estimated channel memory $M$ at various transmission distances. All

$W^*$ are obtained with at least . absolute correlation coefficients $|r_p|$. It can be seen in Fig. 11 that $M$ scales linearly with distance, as $M$ is computed by using (2). Likewise, $W^*$ also increases approximately linearly but at a slower rate. The optimal window length $W^*$ is smaller than the estimated number of interfering symbols $M$, which can be explained by two facts. First, (2) is a rough estimation of the channel memory. Second, the symbol energies within the EDI window are assumed to have equal contributions to the NLI (as shown in (36), all symbol energies are weighted by 1). Therefore, $W^*$ generally represents an effective number of dominant interfering symbols that are involved in the NLI generation.

Fig. 12. Effective SNR vs. launch power after transmission distances of (a) 80 km, (b) 320 km and (c) 1600 km (x-axis: launch power [dBm]; y-axis: effective SNR [dB]). PAS 64QAM with blocklength $n = 10$ and $n = 10000$ are displayed. Results of uniform 64QAM, i.i.d. PS 64QAM symbols, and QPSK are also included as references. The circled SNR performance corresponds to the launch powers used in Fig. 9 (a)–(c), respectively. Annotated SNR differences: (a) . 91 dB, (b) . 83 dB, (c) . 64 dB.

TABLE IV
EDI $\Psi$ (IN dB) OF THE EVALUATED SYMBOL SEQUENCES

Columns: Distance; CC PS 64QAM, $n = 10$; CC PS 64QAM, $n = 10000$; i.i.d. PS; Uniform
Rows as extracted:
− . − . − . − . −∞
320 km − . − . − . − . −∞
− . − . − . − . −∞

To conclude, we show effective SNR vs. launch power in Fig. 12, which shows that the impact of the shaping blocklength on effective SNR can be as large as that of the modulation format. For simplicity, CCDM 64QAM symbol sequences are only shown with ultra-short blocklength $n = 10$ and long blocklength $n = 10000$, in general representing “good” and “bad” NLI-tolerant blocklengths. The EDI of the different symbol sequences is given in Table IV. The filled triangles in Fig. 12 for $n = 10$ and $n = 10000$ correspond to the same effective SNR markers as shown in Fig. 9. The first observation from Fig. 12 (a)–(c) is that QPSK exhibits the best effective SNR for the distances under consideration, as it has the smallest EDI. Secondly, the effective SNRs of i.i.d. 64QAM symbol sequences and the shaped symbol sequence with $n = 10000$ almost coincide for all distances. Meanwhile, their EDIs have marginal differences. Thirdly, the effective SNR of uniform 64QAM falls between the effective SNRs of shaped 64QAM using $n = 10$ and $n = 10000$, and so do their EDIs. Lastly, Fig. 12 shows that the SNR gains offered by CCDM symbol sequences using $n = 10$ instead of $n = 10000$ decrease from . dB to . dB as the transmission distance increases.

VI. CONCLUSIONS

This paper proposed a new heuristic metric called energy dispersion index (EDI) to predict the impact of blocklength on the effective SNR for CCDM-coded systems. EDI is a measure of the windowed energy dispersion which captures the interaction between the statistical properties of a CCDM input sequence and the channel memory with respect to the received NLI magnitude. Numerical results show that the effective SNR is highly correlated to the EDI of the transmitted symbol sequence, with absolute correlation coefficients above 99%.

Being a heuristic metric, the EDI requires future analytical substantiation, possibly leading to a refined version thereof. One possible improvement of the EDI accuracy could be appropriately weighting the symbol energies within the EDI window to reflect their uneven contributions to the NLI. In this paper, we only studied EDI for CCDM, and thus, a performance analysis of EDI for other shaping algorithms or other constellations is still pending. All these open problems are left for future work. Nevertheless, we believe that EDI can facilitate the development of NLI-tolerant signaling schemes that aim to optimize time-varying statistical properties of the input symbol sequence.

ACKNOWLEDGMENTS

The authors would like to thank Dr. Yunus Can Gültekin and Sebastiaan Goossens (Eindhoven University of Technology) for fruitful discussions on shaping techniques.

APPENDIX A
PROOF OF THEOREM 2

Consider the autocorrelation $R_E(i,\tau)$ in (21). For $i = 0, 1, \ldots, n-1$ (one period), $R_E(i,\tau)$ is given by
$$R_E(i,\tau) = \begin{cases} \mathbb{E}[|X|^4], & \text{if } \tau = 0, \\ \rho, & \text{if } \tau \neq 0,\ -1 < i+\tau < n, \\ \mathbb{E}^2[|X|^2], & \text{if } \tau \neq 0,\ i+\tau \geq n \text{ or } i+\tau \leq -1. \end{cases} \tag{55}$$
The second case in (55) is when two different symbol energies belong to the same block. In what follows we will prove that in this case symbol energies are equally correlated and yield the same autocorrelation $\rho$ in (30), and also the same autocovariance. By using Lemma 1, (16) and (18), for all $i, j \in \{0, 1, \ldots, n-1\}$ with $i \neq j$ and $\tau = j - i$, we can write
$$\mathrm{Cov}[E_i, E_j] = R_E(i, j-i) - \mathbb{E}^2[E] \tag{56}$$
$$= \mathbb{E}\big[(A_{I,i}^2 + A_{Q,i}^2)(A_{I,j}^2 + A_{Q,j}^2)\big] - 4\,\mathbb{E}^2[A^2] \tag{57}$$
$$= 2\,\mathbb{E}[A_i^2 A_j^2] - 2\,\mathbb{E}^2[A^2] \tag{58}$$
$$= 2\sum_{a,b \in \mathcal{A}} P_{A_i,A_j}(a,b)\,a^2 b^2 - 2\,\mathbb{E}^2[A^2], \tag{59}$$
where (57) follows from (20), (22) and (24), and (58) from the fact that amplitudes in the I/Q branches in Fig. 1 are independent from each other. It can be seen in (59) that $R_E(i, j-i)$ and $\mathrm{Cov}[E_i, E_j]$ are determined by the joint probability $P_{A_i,A_j}$, i.e.,
$$P_{A_i,A_j}(a,b) = P_{A_i}(a) \cdot P_{A_j|A_i}(b|a), \tag{60}$$
where $P_{A_i}(a) = n_a/n$ (see (15)). For a given $A_i = a$, the amplitude composition at time $j$ is updated, and thus,
$$P_{A_j|A_i}(b|a) = \begin{cases} \dfrac{n_a - 1}{n-1}, & \text{if } b = a, \\ \dfrac{n_b}{n-1}, & \text{if } b \neq a. \end{cases} \tag{61}$$
The joint probability $P_{A_i,A_j}$ in (60) is thus independent of $i$ and $j$, and so are $R_E(i, j-i) = \rho$ and $\mathrm{Cov}[E_i, E_j]$.

Finally, based on the fact that $\rho$ is a constant, we compute $\rho$ and the autocovariance. Similar to (12), the total energy of a symbol energy block is a constant, i.e.,
$$\sum_{i=0}^{n-1} E_i = 2\sum_{i=0}^{n-1} A_i^2 = 2\sum_{a \in \mathcal{A}} a^2 n_a. \tag{62}$$

In analogy to (13), (62) shows that these $n$ symbol energies also satisfy a linear relationship. By taking the expectation on both sides of (62), we have

\begin{equation}
\sum_{i=0}^{n-1} \mathbb{E}[E_i] = 2 \sum_{a \in \mathcal{A}} a^2 n_a.
\tag{63}
\end{equation}

Subtracting (63) from (62) yields

\begin{equation}
\sum_{i=0}^{n-1} \left(E_i - \mathbb{E}[E_i]\right) = 0,
\tag{64}
\end{equation}

which can be rewritten as

\begin{equation}
E_j - \mathbb{E}[E_j] = - \sum_{\substack{j'=0 \\ j' \neq j}}^{n-1} \left(E_{j'} - \mathbb{E}[E_{j'}]\right).
\tag{65}
\end{equation}

By using (17) and (65), $\mathrm{Cov}[E_i, E_j]$ is expanded, i.e.,

\begin{align}
\mathrm{Cov}[E_i, E_j] &= \mathbb{E}\Bigg[\left(E_i - \mathbb{E}[E_i]\right)\Bigg(-\sum_{\substack{j'=0 \\ j' \neq j}}^{n-1} \left(E_{j'} - \mathbb{E}[E_{j'}]\right)\Bigg)\Bigg] \tag{66} \\
&= -\mathbb{E}\left[\left(E_i - \mathbb{E}[E_i]\right)^2\right] - \sum_{\substack{j'=0 \\ j' \neq j,i}}^{n-1} \mathbb{E}\left[\left(E_i - \mathbb{E}[E_i]\right)\left(E_{j'} - \mathbb{E}[E_{j'}]\right)\right] \tag{67} \\
&= -\mathrm{Var}[E_i] - \sum_{\substack{j'=0 \\ j' \neq j,i}}^{n-1} \mathrm{Cov}[E_i, E_{j'}]. \tag{68}
\end{align}

Since all $n-2$ covariances in the sum are equal, (68) can be rewritten as

\begin{equation}
\mathrm{Cov}[E_i, E_j] + (n-2)\,\mathrm{Cov}[E_i, E_j] = -\mathrm{Var}[E_i].
\tag{69}
\end{equation}

By using (23) in (69), the equality in (31) is obtained. The inequality in (31) follows from the fact that the variance $\mathrm{Var}\left[|X|^2\right]$ is positive. The last step in the proof is to show that the autocorrelation $R_E(i,\tau) = \rho$ is given by (30). This follows from substituting (31) into (18). Because of the negative autocovariance, $\rho$ is smaller than $\mathbb{E}^2\left[|X|^2\right]$. This completes the proof.

APPENDIX B
PROOF OF THEOREM

The mean of $G^W$ is

\begin{align}
\mathbb{E}\left[G^W\right] &= \frac{1}{n} \sum_{i=0}^{n-1} \sum_{j=i-W/2}^{i+W/2} \mathbb{E}\left[|X_j|^2\right] \tag{70} \\
&= \frac{1}{n} \sum_{i=0}^{n-1} \sum_{j=i-W/2}^{i+W/2} \mathbb{E}\left[|X|^2\right] \tag{71} \\
&= (W+1)\,\mathbb{E}\left[|X|^2\right], \tag{72}
\end{align}

where (70) uses the linearity of expectation, and (71) follows from (22).

Fig. 13. An illustration of possible combinations of $j$ and $j'$ for the double summations of the last term in (76). The dots along each diagonal direction represent the $W - \tau + 1$ possible pairs of $j$ and $j'$ when $j' - j > 0$.

To prove (43), we begin with $\mathrm{Var}\left[G_i^W\right]$. Using (36), Lemma 1, and [48, Thm. 9.2], we have

\begin{align}
\mathrm{Var}\left[G_i^W\right] &= \sum_{j=i-W/2}^{i+W/2} \mathrm{Var}\left[|X_j|^2\right] + 2 \sum_{j=i-W/2}^{i+W/2-1} \sum_{j'=j+1}^{i+W/2} \mathrm{Cov}[E_j, E_{j'}] \tag{73} \\
&= (W+1)\,\mathrm{Var}\left[|X|^2\right] + 2 \sum_{j=i-W/2}^{i+W/2-1} \sum_{j'=j+1}^{i+W/2} \mathrm{Cov}[E_j, E_{j'}] \tag{74} \\
&= (W+1)\,\mathrm{Var}\left[|X|^2\right] + 2 \sum_{j=i-W/2}^{i+W/2-1} \sum_{j'=j+1}^{i+W/2} \left[R_E(j, j'-j) - \mathbb{E}^2\left[|X|^2\right]\right] \tag{75} \\
&= (W+1)\,\mathrm{Var}\left[|X|^2\right] - W(W+1)\,\mathbb{E}^2\left[|X|^2\right] + 2 \sum_{j=i-W/2}^{i+W/2-1} \sum_{j'=j+1}^{i+W/2} R_E(j, j'-j), \tag{76}
\end{align}

where (74) follows from (23), and (75) from (18). The first two terms in (76) are known, since $\mathrm{Var}\left[|X|^2\right]$ and $\mathbb{E}\left[|X|^2\right]$ are given in (25) and (24). Meanwhile, $R_E(j, j'-j)$ depends on the time instant $j$ and the delay $\tau = j' - j > 0$. Therefore, based on (41), only the last term in (76) needs to be considered for averaging over $n$ time instants, i.e.,

\begin{align}
\frac{2}{n} \sum_{i=0}^{n-1} \Bigg[\sum_{j=i-W/2}^{i+W/2-1} \sum_{j'=j+1}^{i+W/2} R_E(j, j'-j)\Bigg] \tag{77} \\
= 2 \sum_{\tau=1}^{W} \frac{1}{n} \sum_{i=0}^{n-1} \sum_{j=i-W/2}^{i+W/2-\tau} R_E(j, \tau) \tag{78} \\
= 2 \sum_{\tau=1}^{W} (W - \tau + 1)\, R_E(\tau), \tag{79}
\end{align}

where (78) can be explained by observing Fig. 13, which shows the possible pairs of $j$ and $j'$ and the corresponding values of $\tau$. To obtain (79), we observe in Fig. 13 that for each value of $\tau$, there are $W - \tau + 1$ pairs in the corresponding diagonal, whose $R_E(j, \tau)$ are averaged. Hence, (79) is obtained.
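The counting argument behind (79) can be checked with a short Python sketch, given here as an illustration only. It assumes an even window parameter $W$, so that the window spans the $W+1$ indices $i - W/2, \ldots, i + W/2$; the function name is hypothetical.

```python
from collections import Counter


def pairs_per_delay(i, W):
    """Count the pairs (j, j') with j < j' inside the window centred at i.

    Assumes W is even, so the window covers indices i - W/2, ..., i + W/2.
    Returns a Counter mapping each delay tau = j' - j to its number of pairs.
    """
    lo, hi = i - W // 2, i + W // 2
    counts = Counter()
    for j in range(lo, hi):              # j = i - W/2, ..., i + W/2 - 1
        for jp in range(j + 1, hi + 1):  # j' = j + 1, ..., i + W/2
            counts[jp - j] += 1          # delay tau = j' - j > 0
    return counts


if __name__ == "__main__":
    W = 6
    counts = pairs_per_delay(i=10, W=W)
    for tau in range(1, W + 1):
        print(f"tau = {tau}: {counts[tau]} pairs, W - tau + 1 = {W - tau + 1}")
```

The per-delay counts add up to $W(W+1)/2$, which is the number of terms in the double summation of (76) and explains the factor $W(W+1)$ multiplying $\mathbb{E}^2\left[|X|^2\right]$ there.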
Finally, $\mathrm{Var}\left[G^W\right]$ in (43) is simply the sum of the first two terms in (76) and (79), which completes the proof.

APPENDIX C
PROOF OF THEOREM

When $n \leq W + 2$, the computation of $\mathrm{Var}\left[G_i^W\right]$ can be simplified by considering the symbol energy "pattern" inside the window. This leads to a simpler expression for the EDI in (49).

Given a window length $W$, the window has $W + 1$ symbol energies. This window in general covers multiple symbol energy blocks, and thus we can write

\begin{equation}
W + 1 = un + r,
\tag{80}
\end{equation}

where $u \in \mathbb{N}$ is the maximum number of complete symbol energy blocks covered by the window, and $r$ is the remainder when $W + 1$ is divided by $n$ ($0 \leq r \leq n - 1$).

As the window slides, the symbol energy "pattern" covered by the window varies cyclically with a period $n$, which is illustrated in Fig. 14. The pattern shows that the sum of the symbol energies with a shaded background is constant, whereas the energy of the incomplete blocks located at the two edges of the window is random. Therefore, the constant part of the energy does not contribute to the window energy variance, and thus can be removed in the computation of the variance. For example, the pattern at the top of Fig. 14 consists of $u$ complete blocks at the right side, while at its left edge there are $r$ symbols from the adjacent block. Hence, the variance of the top pattern is simplified by only considering these $r$ symbols. Such a simplification can be done by using (76). After substituting $W + 1$ with $r$, and $R_E(j, j'-j)$ with $\rho$, we have

\begin{equation}
\mathrm{Var}\left[G_i^W\right] = r\,\mathrm{Var}\left[|X|^2\right] - r(r-1)\,\mathbb{E}^2\left[|X|^2\right] + r(r-1)\,\rho.
\tag{81}
\end{equation}

As long as $W + 1 \geq n - 1$, it can be concluded from Fig. 14 that: (i) two edges with random symbol energies contribute to the variance, and (ii) these $n$ patterns keep $0, 1, \ldots, n-1$ mutually correlated symbol energies at both edges, respectively.
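The decomposition in (80) and the edge variance in (81) can be sketched in Python as follows. This is an illustration under stated assumptions: the function names and the numbers in the usage lines are hypothetical, and $\rho$ is computed as $\mathbb{E}^2\left[|X|^2\right] - \mathrm{Var}\left[|X|^2\right]/(n-1)$, which is what (69) together with (18) implies for two energies in the same block.

```python
from fractions import Fraction


def split_window(W, n):
    """Return (u, r) with W + 1 = u * n + r and 0 <= r <= n - 1, as in (80)."""
    return divmod(W + 1, n)


def edge_variance(r, n, var_e, mean_e):
    """Variance (81) of the sum of r symbol energies taken from one block.

    var_e  : Var[|X|^2] of a single symbol energy
    mean_e : E[|X|^2] of a single symbol energy
    rho is derived from (69) and (18): rho = mean_e^2 - var_e / (n - 1).
    """
    rho = mean_e**2 - Fraction(var_e) / (n - 1)
    return r * var_e - r * (r - 1) * mean_e**2 + r * (r - 1) * rho


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the functions.
    toy_var, toy_mean, n = Fraction(1312, 9), Fraction(46, 3), 6
    print(split_window(W=10, n=4))
    for r in range(n + 1):
        print(r, edge_variance(r, n, toy_var, toy_mean))
```

With these definitions, a single energy ($r = 1$) returns $\mathrm{Var}\left[|X|^2\right]$, while a complete block ($r = n$) returns zero, consistent with the constant total block energy in (62).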
Under these circumstances, $\mathrm{Var}\left[G^W\right]$ is obtained by averaging the variance over the $n$ patterns, i.e.,

\begin{align}
\mathrm{Var}\left[G^W\right] &= \frac{2}{n} \sum_{i=0}^{n-1} \left[ i\,\mathrm{Var}\left[|X|^2\right] - i(i-1)\,\mathbb{E}^2\left[|X|^2\right] + i(i-1)\,\rho \right] \tag{82} \\
&= \frac{2\,\mathrm{Var}\left[|X|^2\right]}{n(n-1)} \sum_{i=0}^{n-1} \left(ni - i^2\right) \tag{83} \\
&= \frac{n+1}{3}\,\mathrm{Var}\left[|X|^2\right], \tag{84}
\end{align}

where (83) follows from (18) and (31). By substituting (84) and (42) into (39), the EDI in (49) is obtained.

Fig. 14. An illustration of four symbol energy patterns as the window slides to the right through the symbol energy sequence. The shaded area represents $u$ complete blocks.

APPENDIX D
PROOF OF COROLLARY

We first consider $\mathrm{Var}\left[G^W\right]$, whose bounds directly determine the bounds on EDI. For any finite blocklength $n$, the $\mathrm{Var}\left[G^W\right]$ of the CCDM QAM symbol sequence satisfies

\begin{equation}
0 \leq \mathrm{Var}\left[G^W\right] \leq (W+1)\,\mathrm{Var}\left[|X|^2\right].
\tag{85}
\end{equation}

The variance is always nonnegative, hence the left inequality in (85) is true. Based on (74), due to the negative autocovariance from (31), we have

\begin{align}
\mathrm{Var}\left[G_i^W\right] &= (W+1)\,\mathrm{Var}\left[|X|^2\right] + 2 \sum_{j=i-W/2}^{i+W/2-1} \sum_{j'=j+1}^{i+W/2} \mathrm{Cov}[E_j, E_{j'}] \tag{86} \\
&\leq (W+1)\,\mathrm{Var}\left[|X|^2\right], \tag{87}
\end{align}

where (87) holds with equality when $W = 0$. After using (41), we have the right inequality in (85). By using (85) and (42) in (39), the inequalities for the EDI in (50) are obtained.

We now show how the bounds in (50) are attained or approached as a function of $W$. In terms of the upper bound in (51), by setting $W = 0$, the window only encompasses one symbol energy, and thereby $\mathrm{Var}\left[G^W\right] = \mathrm{Var}\left[|X|^2\right]$ and $\mathbb{E}\left[G^W\right] = \mathbb{E}\left[|X|^2\right]$. With (39) and using (10), (51) is obtained. From (49), it can be seen that for $n < W + 2$, the EDI tends to $0$ for $W \to \infty$. This completes the proof.

REFERENCES

[1] G. Böcherer, F. Steiner, and P. Schulte, “Bandwidth efficient and rate-matched low-density parity-check coded modulation,” IEEE Trans. Commun., vol. 63, no. 12, pp. 4651–4665, Dec. 2015.
[2] J. Cho and P. J. Winzer, “Probabilistic constellation shaping for optical fiber communications,” J. Lightw. Technol., vol. 37, no. 6, pp. 1590–1607, Mar. 2019.
[3] P. Schulte and G. Böcherer, “Constant composition distribution matching,” IEEE Trans. Inf. Theory, vol. 62, no. 1, pp. 430–434, Jan. 2016.
[4] T. Fehenberger, D. S. Millar, T. Koike-Akino, K. Kojima, and K. Parsons, “Multiset-partition distribution matching,” IEEE Trans. Commun., vol. 67, no. 3, pp. 1885–1893, Mar. 2019.
[5] F. Steiner, P. Schulte, and G. Böcherer, “Approaching waterfilling capacity of parallel channels by higher order modulation and probabilistic amplitude shaping,” in . Princeton, NJ, USA, 21–23 Mar. 2018.
[6] F. M. J. Willems and J. Wuijts, “A pragmatic approach to shaped coded modulation,” in Proc. Symp. Commun. Veh. Technol. Benelux. Delft, The Netherlands, Oct. 1993.
[7] Y. C. Gültekin, W. J. van Houtum, S. Şerbetli, and F. M. J. Willems, “Constellation shaping for IEEE 802.11,” in Proc. IEEE Int. Symp. Pers., Indoor Mobile Commun. Montreal, QC, Canada, Oct. 2017.
[8] A. K. Khandani and P. Kabal, “Shaping multidimensional signal spaces – Part I: Optimum shaping, shell mapping,” IEEE Trans. Inf. Theory, vol. 39, no. 6, pp. 1799–1808, Nov. 1993.
[9] J. Cho, S. Chandrasekhar, R. Dar, and P. J. Winzer, “Low-complexity shaping for enhanced nonlinearity tolerance,” in Proc. Eur. Conf. Opt. Commun. Düsseldorf, Germany, Sep. 2016, Paper W1C.2.
[10] T. Fehenberger, A. Alvarado, G. Böcherer, and N. Hanik, “On probabilistic shaping of quadrature amplitude modulation for the nonlinear fiber channel,” J. Lightw. Technol., vol. 34, no. 21, pp. 5063–5073, Nov. 2016.
[11] A. Amari, S. Goossens, Y. C. Gültekin, O. Vassilieva, I. Kim, T. Ikeuchi, C. M. Okonkwo, F. M. J. Willems, and A. Alvarado, “Introducing enumerative sphere shaping for optical communication systems with short blocklengths,” J. Lightw. Technol., vol. 37, no. 23, pp. 5926–5936, Dec. 2019.
[12] F. Buchali, F. Steiner, G. Böcherer, L. Schmalen, P. Schulte, and W. Idler, “Rate adaptation and reach increase by probabilistically shaped 64-QAM: An experimental demonstration,” J. Lightw. Technol., vol. 34, no. 7, pp. 1599–1609, Apr. 2016.
[13] A. Ghazisaeidi, I. F. de Jauregui Ruiz, R. Rios-Müller, L. Schmalen, P. Tran, P. Brindel, A. C. Meseguer, Q. Hu, F. Buchali, G. Charlet et al., “Advanced C+L-band transoceanic transmission systems based on probabilistically shaped PDM-64QAM,” J. Lightw. Technol., vol. 35, no. 7, pp. 1291–1299, Apr. 2017.
[14] J. Cho, X. Chen, S. Chandrasekhar, G. Raybon, R. Dar, L. Schmalen, E. Burrows, A. Adamiecki, S. Corteselli, Y. Pan et al., “Trans-atlantic field trial using high spectral efficiency probabilistically shaped 64-QAM and single-carrier real-time 250-Gb/s 16-QAM,” J. Lightw. Technol., vol. 36, no. 1, pp. 103–113, Jan. 2018.
[15] S. L. Olsson, J. Cho, S. Chandrasekhar, X. Chen, E. C. Burrows, and P. J. Winzer, “Record-high 17.3-bit/s/Hz spectral efficiency transmission over 50 km using probabilistically shaped PDM 4096-QAM,” in Proc. Opt. Fiber Commun. Conf. San Diego, CA, USA, Mar. 2018, Paper Th4C.5.
[16] G. Agrawal, Nonlinear Fiber Optics, Third Edition. Academic Press, Jan. 2001.
[17] Z. Qu and I. B. Djordjevic, “Geometrically shaped 16QAM outperforming probabilistically shaped 16QAM,” in Proc. Eur. Conf. Opt. Commun. IEEE, Sep. 2017, Paper Th.2.F.4.
[18] K. Gümüş, A. Alvarado, B. Chen, C. Häger, and E. Agrell, “End-to-end learning of geometrical shaping maximizing generalized mutual information,” in Proc. Opt. Fiber Commun. Conf. San Diego, CA, USA, Mar. 2020, Paper W3D.4.
[19] J. Renner, T. Fehenberger, M. P. Yankov, F. Da Ros, S. Forchhammer, G. Böcherer, and N. Hanik, “Experimental comparison of probabilistic shaping methods for unrepeated fiber transmission,” J. Lightw. Technol., vol. 35, no. 22, pp. 4871–4879, Nov. 2017.
[20] E. Sillekens, D. Semrau, G. Liga, N. A. Shevchenko, Z. Li, A. Alvarado, P. Bayvel, R. I. Killey, and D. Lavery, “A simple nonlinearity-tailored probabilistic shaping distribution for square QAM,” in Proc. Opt. Fiber Commun. Conf. San Diego, CA, USA, Mar. 2018, Paper M3C.4.
[21] E. Agrell, A. Alvarado, G. Durisi, and M. Karlsson, “Capacity of a nonlinear optical channel with finite memory,” J. Lightw. Technol., vol. 32, no. 16, pp. 2862–2876, Aug. 2014.
[22] R. Dar, M. Feder, A. Mecozzi, and M. Shtaif, “On shaping gain in the nonlinear fiber-optic channel,” in IEEE Int. Symp. Inform. Theory. Honolulu, HI, USA, June 2014, pp. 2794–2798.
[23] R. Dar, M. Feder, A. Mecozzi, and M. Shtaif, “Time varying ISI model for nonlinear interference noise,” in Optical Fiber Communication Conference. San Diego, CA, USA, Mar. 2014, Paper W2A.62.
[24] M. P. Yankov, K. J. Larsen, and S. Forchhammer, “Temporal probabilistic shaping for mitigation of nonlinearities in optical fiber systems,” J. Lightw. Technol., vol. 35, no. 10, pp. 1803–1810, May 2017.
[25] T. Fehenberger, D. S. Millar, T. Koike-Akino, K. Kojima, K. Parsons, and H. Griesser, “Analysis of nonlinear fiber interactions for finite-length constant-composition sequences,” J. Lightw. Technol., vol. 38, no. 2, pp. 457–465, Jan. 2020.
[26] S. Goossens, S. Van der Heide, M. van den Hout, A. Amari, Y. C. Gültekin, O. Vassilieva, I. Kim, T. Ikeuchi, F. M. J. Willems, A. Alvarado et al., “First experimental demonstration of probabilistic enumerative sphere shaping in optical fiber communications,” in . Fukuoka, Japan, July 2019.
[27] T. Fehenberger, H. Griesser, and J.-P. Elbers, “Mitigating fiber nonlinearities by short-length probabilistic shaping,” in Proc. Opt. Fiber Commun. Conf. San Diego, CA, USA, Mar. 2020, Paper Th1I.2.
[28] S. Civelli, E. Forestieri, and M. Secondini, “Interplay of probabilistic shaping and carrier phase recovery for nonlinearity mitigation,” Sep. 2020. [Online]. Available: https://arxiv.org/abs/2009.01135
[29] P. Skvortcov, I. D. Phillips, W. Forysiak, T. Koike-Akino, K. Kojima, K. Parsons, and D. Millar, “Huffman-coded sphere shaping for extended-reach single-span links,” IEEE J. Sel. Top. Quantum Electron., Jan. 2021, (early access).
[30] P. Poggiolini, A. Carena, V. Curri, G. Bosco, and F. Forghieri, “Analytical modeling of nonlinear propagation in uncompensated optical transmission links,” IEEE Photon. Technol. Lett., vol. 23, no. 11, pp. 742–744, June 2011.
[31] P. Poggiolini, G. Bosco, A. Carena, V. Curri, Y. Jiang, and F. Forghieri, “The GN-model of fiber non-linear propagation and its applications,” J. Lightw. Technol., vol. 32, no. 4, pp. 694–721, Feb. 2014.
[32] A. Carena, V. Curri, G. Bosco, P. Poggiolini, and F. Forghieri, “Modeling of the impact of nonlinear propagation effects in uncompensated optical coherent transmission links,” J. Lightw. Technol., vol. 30, no. 10, pp. 1524–1539, May 2012.
[33] R. Dar, M. Feder, A. Mecozzi, and M. Shtaif, “Properties of nonlinear noise in long, dispersion-uncompensated fiber links,” Opt. Express, vol. 21, no. 22, pp. 25 685–25 699, Nov. 2013.
[34] R. Dar, M. Feder, A. Mecozzi, and M. Shtaif, “Inter-channel nonlinear interference noise in WDM systems: modeling and mitigation,” J. Lightw. Technol., vol. 33, no. 5, pp. 1044–1053, Mar. 2015.
[35] A. Carena, G. Bosco, V. Curri, Y. Jiang, P. Poggiolini, and F. Forghieri, “On the accuracy of the GN-model and on analytical correction terms to improve it,” Jan. 2014. [Online]. Available: https://arxiv.org/abs/1401.6946
[36] P. Poggiolini, G. Bosco, A. Carena, V. Curri, Y. Jiang, and F. Forghieri, “A simple and effective closed-form GN model correction formula accounting for signal non-Gaussian distribution,” J. Lightw. Technol., vol. 33, no. 2, pp. 459–473, Jan. 2015.
[37] T. Fehenberger and A. Alvarado, “Analysis and optimisation of distribution matching for the nonlinear fibre channel,” in Proc. Eur. Conf. Opt. Commun. Dublin, Ireland, Sep. 2019, Poster Session 1.
[38] T. Fehenberger, “On the impact of finite-length probabilistic shaping on fiber nonlinear interference,” June 2020. [Online]. Available: https://arxiv.org/abs/2006.07004
[39] A. Mecozzi and R. Essiambre, “Nonlinear Shannon limit in pseudolinear coherent systems,” J. Lightw. Technol., vol. 30, no. 12, pp. 2011–2024, June 2012.
[40] Z. Tao, L. Dou, W. Yan, L. Li, T. Hoshida, and J. C. Rasmussen, “Multiplier-free intrachannel nonlinearity compensating algorithm operating at symbol rate,” J. Lightw. Technol., vol. 29, no. 17, pp. 2570–2576, Sep. 2011.
[41] J. Armstrong, “OFDM for optical communications,” J. Lightw. Technol., vol. 27, no. 3, pp. 189–204, Feb. 2009.
[42] B. Chen, A. Alvarado, S. van der Heide, M. van den Hout, H. Hafermann, and C. Okonkwo, “Analysis and experimental demonstration of orthant-symmetric four-dimensional 7 bit/4D-sym modulation for optical fiber communication,” Mar. 2020. [Online]. Available: https://arxiv.org/abs/2003.12712
[43] O. Geller, R. Dar, M. Feder, and M. Shtaif, “A shaping algorithm for mitigating inter-channel nonlinear phase-noise in nonlinear fiber systems,” J. Lightw. Technol., vol. 34, no. 16, pp. 3884–3889, May 2016.
[44] F. Gray, “Pulse code communication,” US Patent 2 632 058, Mar. 1953.
[45] Y. C. Gültekin, T. Fehenberger, A. Alvarado, and F. M. J. Willems, “Probabilistic shaping for finite blocklengths: Distribution matching and sphere shaping,” Entropy, vol. 22, no. 5, p. 581, May 2020.
[46] Y. C. Gültekin, W. J. Van Houtum, A. Koppelaar, and F. M. J. Willems, “Comparison and optimization of enumerative coding techniques for amplitude shaping,” IEEE Commun. Lett., Dec. 2020, (early access).
[47] A. Papoulis and S. U. Pillai, Probability, random variables, and stochastic processes, Fourth Edition. Tata McGraw-Hill Education, 2002.
[48] R. D. Yates and D. J. Goodman, Probability and stochastic processes: a friendly introduction for electrical and computer engineers, Third Edition. John Wiley & Sons, 2014.
[49] G. B. Giannakis, “Cyclostationary signal analysis,” in Digital Signal Processing Handbook, First Edition. V. K. Madisetti and D. Williams, Eds, Boca Raton, FL: CRC, 1998.
[50] J. D. Gibbons and S. Chakraborti,