Compressed Shaping: Concept and FPGA Demonstration

Abstract

Probabilistic shaping (PS) has been widely studied and applied to optical fiber communications. The encoder of PS expands the number of bit slots and controls the probability distribution of channel input symbols. So far, not only studies focused on PS but also most works on optical fiber communications have assumed source uniformity (i.e., equal probability of marks and spaces). However, the source information is in general nonuniform, unless bit scrambling or other source coding techniques are performed to balance the bit probability. Interestingly, one can exploit the source nonuniformity to reduce the entropy of the channel input symbols with the PS encoder, which leads to a smaller required signal-to-noise ratio at a given input logic rate. This benefit is equivalent to a combination of data compression and PS, and thus we call this technique compressed shaping. In this work, we explain its theoretical background in detail and verify the concept by both numerical simulation and a field programmable gate array (FPGA) implementation of such a system. In particular, we find that compressed shaping can reduce power consumption in forward error correction decoding by up to 90% in nonuniform source cases. The additional hardware resources required for compressed shaping are not significant compared with forward error correction coding, and an error insertion test is successfully demonstrated with the FPGA.



Tsuyoshi Yoshida, Member, IEEE, Koji Igarashi, Member, IEEE, Magnus Karlsson, Fellow, OSA; Senior Member, IEEE, and Erik Agrell, Fellow, IEEE


Index Terms: Coding, data compression, distribution matching, entropy, implementation, modulation, optical fiber communication, probabilistic shaping, source coding.

I. INTRODUCTION

Traffic demands are growing with the deployment of fifth-generation (5G) and beyond-5G mobile communication systems. Optical fiber communications play a key role in the communication infrastructure because of their high capacity. In the past, the modulation formats used in optical fiber communications were binary or quaternary, e.g., on–off keying, binary phase-shift keying, or quaternary phase-shift keying, without forward error correction (FEC) or with hard-decision FEC [1]. However, the latest 400 Gb/s standards [2], [3] utilize 16-ary quadrature amplitude modulation (QAM) with soft-decision (SD) FEC under bit-interleaved coded modulation (BICM) [4]–[7]. Furthermore, constellation shaping [8], or more specifically, probabilistic shaping (PS) [9], [10], has attracted wide research interest due to its capacity-approaching performance [11]–[14]. In particular, reverse concatenation [21]–[23], where the shaping encoding (also known as distribution matching, DM) [15]–[20] is done outside the FEC encoding, made PS deployable in terms of implementation capability.

This work was presented in part at OFC 2019 [31] and ECOC 2019 [32]. T. Yoshida is with Information Technology R&D Center, Mitsubishi Electric Corporation, Kamakura, 247-8501, Japan. He also belongs to Graduate School of Engineering, Osaka University, Suita, 565-0871, Japan (e-mail: [email protected]). K. Igarashi is with Graduate School of Engineering, Osaka University, Suita, 565-0871, Japan. M. Karlsson is with the Dept. of Microtechnology and Nanoscience and E. Agrell is with the Dept. of Electrical Engineering, both at Chalmers University of Technology, SE-41296 Gothenburg, Sweden. This work was partly supported by "Massively Parallel and Sliced Optical Network (MAPLE)," the Commissioned Research of National Institute of Information and Communications Technology (NICT), Japan (project no. 20401). Copyright (c) 2021 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].

An optimal encoder, which minimizes the rate loss in the conversion process, can theoretically be realized in two steps if the block length is large enough: source coding (often called "data compression") first, and channel coding (i.e., constellation shaping and FEC coding) next. Information-theoretic coding and modulation techniques have yielded significant performance improvements in recent years, almost closing the gap to the Shannon channel capacity [24]. In contrast, coding for dynamically variable source information has rarely been investigated for fiber-optic communication systems, which aggregate massive user traffic in frames. In the standard [25], simple bit scrambling (flipping bits by an exclusive OR operation with a pseudorandom bit sequence (PRBS)) is implemented to balance the mark (logic ‘1’) and space (logic ‘0’) counts instead of applying any source coding. We often assume that the source bits are uniformly distributed and independent, although the true source entropy before bit scrambling is variable and depends on the user traffic, e.g., due to the existence of idle frames in the media access control protocol [25]. Data compression and shaping are almost inverse operations: the former converts a nonuniform information sequence into a shorter uniform one, while the latter does the opposite.
Simultaneous realization of data compression and shaping is not only an interesting research topic but also a key technique for more efficient communications in practice. Thus, in this paper, we propose and investigate compressed shaping, which combines the benefits of data compression and shaping. Similar ideas have been studied in the context of joint source–channel coding in communication theory [26], [27], but an application to fiber-optic communication is presented here for the first time, to the best of our knowledge.

Compressed shaping is enabled by a shaping encoding that is sensitive to the source entropy. Just as data compression allocates short bit patterns to frequently occurring source words, compressed shaping allocates low-energy amplitudes to such frequent source words. This compression feature is similar to burst signalling in time-division multiple access [28], i.e., optical power variation depending on the traffic. Compressed shaping is a fixed-length to fixed-length conversion, and the average energy of the channel input symbols is reduced for source information sequences with small entropy. No operational mode change, such as updating the source statistics based on prior knowledge, is needed, although such an update would be a form of data compression. Our previously proposed look-up table (LUT)-based hierarchical DM [18] and follow-up works [29], [30] are applicable for this purpose without significantly increased complexity, requiring only a reordering of the LUT entries. Compared with state-of-the-art DM schemes such as constant-composition DM (CCDM) [15], the proposed technique can reduce the rate losses associated with source and channel coding or, by relaxing the FEC performance, reduce the power consumption in the FEC at a given information rate and signal-to-noise ratio (SNR). We also report a field programmable gate array (FPGA) implementation and real-time evaluation results for compressed shaping 16- and 64-QAM, with a system throughput of 113 Gb/s for 64-QAM.

This work extends [31]–[33] by providing a more detailed theoretical background of compressed shaping. The system throughput is increased by separating the clock domain into a data-processing domain and a control domain, because the control circuitry was the bottleneck when increasing the clock frequency. The accuracy of the power consumption estimation is improved by introducing a dynamic simulation. Even for shaping encoding alone, there have been very few other reports on FPGA implementations [34], [35]. Furthermore, except for [32], there are no reports on static/dynamic power consumption estimates or on real-time evaluations of FPGA implementations that include both shaping encoding and decoding.

The rest of the paper is organized as follows. The principle of compressed shaping is explained in Sec. II, and numerical simulations are presented in Sec. III. An FPGA implementation example of compressed shaping is described in Sec. IV, and real-time demonstrations are summarized in Sec. V. Finally, Sec. VI concludes the paper.

II. COMPRESSED SHAPING: BASIC PRINCIPLES

While conventional PS systems employ full bit scrambling and assume source uniformity, the proposed compressed shaping scrambles only the sign bits and applies source-sensitive amplitude shaping to achieve better performance when the source entropy is small. In this section, we first review the historical source uniformity assumption in conventional systems and discuss the rate loss under uniform and nonuniform source conditions. Then we compare compressed shaping with bit-sequence data compression and PS. Finally, we show the system model and characterize entropy bounds in compressed shaping systems.
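As a rough illustration of scrambling only the sign bits, the Python sketch below XORs the sign bits with a PRBS and passes the amplitude bits through unchanged, so that any nonuniformity in the amplitude bits survives for the shaping encoder. The PRBS7 generator polynomial $x^7 + x^6 + 1$, the seed, and the function names are illustrative assumptions, not the scrambler specified in [25] or used in this work.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a PRBS7 sequence (x^7 + x^6 + 1, Fibonacci LFSR).

    The polynomial and seed are illustrative assumptions.
    """
    state = seed & 0x7F
    out = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        out.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return out


def scramble_signs_only(sign_bits, amp_bits):
    """XOR only the sign bits with the PRBS; amplitude bits keep their statistics."""
    prbs = prbs7(len(sign_bits))
    scrambled = [s ^ p for s, p in zip(sign_bits, prbs)]
    return scrambled, list(amp_bits)


def descramble_signs(scrambled_signs):
    """XOR again with the same, synchronized PRBS to undo the scrambling."""
    prbs = prbs7(len(scrambled_signs))
    return [s ^ p for s, p in zip(scrambled_signs, prbs)]


# Idle-frame-like source: all-zero sign and amplitude bits (hypothetical example).
signs, amps = [0] * 127, [0] * 127
u_s, s_a = scramble_signs_only(signs, amps)
print(sum(u_s), sum(s_a))  # prints: 64 0
```

One full PRBS7 period contains 64 ones and 63 zeros, so the scrambled sign bits are nearly balanced, while the all-zero amplitude bits (small entropy) are left for the source-sensitive shaping encoder to exploit.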

A. Rate loss in conventional systems

When investigating coding and modulation techniques, source uniformity is usually assumed. This is because a mark ratio of 1/2 is produced by transcoding or bit scrambling, even if the true source information is nonuniform, which is often the case due to dynamically variable client traffic. Optical fiber communication systems gather many media access control frames from client traffic. There are usually idle frames, which are transcoded by 64B/66B line coding into the zero codeword except for the control bits [25]. Due to the existence of such idle frames, the source bits $S_i \in \{0, 1\}$ can have more ‘0’s than ‘1’s, which makes the binary source entropy $H(S_i) < 1$, where $H(\cdot)$ denotes entropy. In conventional systems, the (serial) source bits $S_i$ are parallelized to form a source bit sequence $[S_1 \ldots S_{k_{\mathrm{bs}}}]$, after which bit scrambling converts $[S_1 \ldots S_{k_{\mathrm{bs}}}]$ into a scrambled bit sequence $[U_1 \ldots U_{k_{\mathrm{bs}}}]$. The sequence length $k_{\mathrm{bs}}$ is chosen to match the applied bit-scrambling protocol. An arbitrary (randomly selected) bit in this scrambled sequence has the distribution $P_U = (1/k_{\mathrm{bs}}) \sum_{i=1}^{k_{\mathrm{bs}}} P_{U_i}$, which is assumed to be uniform with $H(U) = 1$, even when the corresponding (serial) unscrambled bit distribution $P_S = (1/k_{\mathrm{bs}}) \sum_{i=1}^{k_{\mathrm{bs}}} P_{S_i}$ yields an entropy $H(S) < 1$. Bit scrambling is essential in binary modulation for maintaining direct current levels in electronic devices and for recovering the clock signal at the receiver. Non-PS QAM systems also utilize bit scrambling to maintain the mark ratio. In general, neither $S_i$ nor $U_i$ ($i = 1, 2, \ldots, k_{\mathrm{bs}}$) is guaranteed to be independent and identically distributed (i.i.d.), so $H([S_1 \ldots S_{k_{\mathrm{bs}}}]) \le k_{\mathrm{bs}} H(S)$ and $H([U_1 \ldots U_{k_{\mathrm{bs}}}]) \le k_{\mathrm{bs}} H(U)$ by the concavity of entropy. However, $[U_1 \ldots U_{k_{\mathrm{bs}}}]$ is assumed to be i.i.d. and uniform in most research works on channel coding.

In PS, the goal is to reduce the symbol entropy at a given information rate in order to reduce the required SNR for quasi-error-free operation over channels approximated by the Gaussian channel. The scrambled bit sequences $[U_1 \ldots U_{k_{\mathrm{bs}}}]$ are rearranged into sequences of length $k$, and each such length-$k$ bit sequence is mapped into a sequence of amplitudes $[A_1 \ldots A_n]$. The sequence lengths $k$ and $n$ are selected to match the PS scheme, regardless of $k_{\mathrm{bs}}$. Since $U_i$ is in general not i.i.d., neither is $A_i$, and thus $H([A_1 \ldots A_n]) \le n H(A)$, where $P_A = (1/n) \sum_{i=1}^{n} P_{A_i}$. On the other hand, $A_i$ is assumed to be i.i.d. in a mismatched (memoryless) receiver. The PS performance is typically quantified with a rate loss

$$R_{\mathrm{loss}}(U, A) = H(A) - \frac{k}{n}, \qquad (1)$$

where $R_{\mathrm{loss}}(\cdot, \cdot)$ denotes a rate loss in a sequence conversion, and $R_{\mathrm{loss}}(U, A) \ge 0$ for i.i.d. uniform $U_i$. The CCDM [15], which has been the state-of-the-art PS coding, has an almost negligible rate loss when $n$ is sufficiently large. The CCDM generates a fixed probability mass function (PMF) $P_A$ of output amplitudes $A$ regardless of the DM encoder input bit sequence and its statistics. For a fixed PMF $P_A$ with a nonconstant-composition DM, $U_i$ is in general required to be uniformly distributed, which can be achieved by bit scrambling.

B. Rate loss in source to amplitude conversion

The CCDM shows a negligible rate loss $R_{\mathrm{loss}}(U, A)$ in (1) with a sufficiently large block length in PS for a uniform source, but not for nonuniform sources, in which case the channel input symbol entropy or average symbol energy can be further reduced by exploiting nonconstant-composition DM. During the conversion from a source bit sequence to an amplitude symbol sequence, bit scrambling acts against symbol entropy reduction because it maximizes the binary entropy ($H(U) = 1$) even if $H(S) < 1$. Instead, we propose to exploit this source nonuniformity. In the case of $H(S) < 1$, the rate loss in the conversion from a nonuniform source bit sequence $[S_1 \ldots S_k]$ into an amplitude sequence $[A_1 \ldots A_n]$ can in general be expressed as

$$R_{\mathrm{loss}}([S_1 \ldots S_k], A) = H(A) - \frac{H([S_1 \ldots S_k])}{n}, \qquad (2)$$

which, since $H([S_1 \ldots S_k]) \le k H(S)$, is bounded as

$$R_{\mathrm{loss}}([S_1 \ldots S_k], A) \ge R_{\mathrm{loss}}(S, A), \qquad (3)$$

$$R_{\mathrm{loss}}(S, A) = H(A) - \frac{H(S)\,k}{n}. \qquad (4)$$

Obviously, CCDM is not optimum (with or without bit scrambling) because $P_A$ and $H(A)$ are constant regardless of the source distribution. The general rate loss bound $R_{\mathrm{loss}}(S, A)$ in (4) becomes significantly larger than $R_{\mathrm{loss}}(U, A)$ in (1) when $H(S)$ is small. A nonconstant-composition DM could be source sensitive, i.e., realize a smaller output entropy $H(A)$ for a smaller input entropy $H(S)$, resulting in a smaller $R_{\mathrm{loss}}(S, A)$ than with CCDM. Even in such cases, at least the sign bits should be bit-scrambled for direct current level management and clock recovery. To enhance the performance under nonuniform source information, data compression is another option, at the expense of possibly large digital signal processing circuit resources, since it has to adapt to the time variation of $H(S)$.

C. Proposed bit sequence conversion

This section explains the principle of the proposed compressed shaping in comparison with well-known data compression and PS. Tab. I shows small examples of bit sequence conversions using (a) data compression, (b) PS, and (c) compressed shaping.

Tab. I(a) assumes a nonuniform source input bit sequence $S = [S_1 S_2]$ with a fixed length $k = 2$. The bit sequence conversion is given by Huffman coding [36], which allocates a short output string to a high-probability input word. The output bit sequences $C$ have variable lengths $n_C$ from 1 to 3 in this example, so the fixed input length $k = 2$ is shortened to $n_{\mathrm{avg}} = \mathbb{E}[n_C] = 1.7$ on average, where $\mathbb{E}$ denotes expectation. Such a conversion is useful to reduce the required storage size. Huffman coding is an invertible data compression; however, it is usually not suitable for high-throughput data communications because of the latency and storage required by the variable-length conversion process.

Tab. I(b) assumes a uniform source input bit sequence $U = [U_1 U_2]$ with a fixed length $k = 2$, which is converted into an amplitude sequence $A = [A_1 A_2 A_3]$ with a fixed length $n = 3$ and amplitude elements $A_i \in \{1, 3\}$. Candidate output amplitude sequences are sorted in ascending order of average energy per codeword $E_A = \|A\|^2/n$. The output amplitude sequence $A = 111$ has the smallest $E_A$ of 1, and $A = 113$, 131, and 311 have the second smallest $E_A$ of 11/3. Amplitude sequences $A = 133$, 313, 331, and 333 are not chosen as output strings in this codebook because of their large $E_A$. The average output energy is then $E_{\mathrm{avg}} = \mathbb{E}[E_A] = 3$. With a codebook of uniform output amplitudes 1 and 3 and length 2 (i.e., 11, 13, 31, and 33), $E_{11} = 1$, $E_{13} = E_{31} = 5$, $E_{33} = 9$, and $E_{\mathrm{avg}} = 5$. Compared with this uniform output amplitude case, the exemplified PS reduces $E_{\mathrm{avg}}$ from 5 to 3 (about 2.2 dB), leading to a smaller required SNR at a given information rate over the Gaussian channel. To shape the output amplitude probabilities (i.e., to reduce the output amplitude entropies), we thus need more output bit slots than input bit slots.

Tab. I(c) exemplifies the proposed compressed shaping, which assumes a nonuniform source input bit sequence $S = [S_1 S_2]$ as in Tab. I(a). The source bit sequence $S$ with a fixed length $k = 2$ is converted into an amplitude sequence $A = [A_1 A_2 A_3]$ with a fixed length $n = 3$. When generating the codebook, we need two sortings: (i) the input bit sequences $S$ are sorted in descending order of probability $P_S$, and (ii) the output amplitude sequences $A$ are sorted in ascending order of average energy $E_A$. By allocating output amplitude sequences with small energy to input bit sequences with high probability, the average output energy $E_{\mathrm{avg}}$ becomes 7/3, about 1.1 dB lower than in Tab. I(b). Both conventional PS and compressed shaping are fixed-length to fixed-length conversions in these examples and are therefore suitable for high-throughput data communication, which arranges data into fixed-length frames. No adaptation of the codebook to different source statistics is needed.

TABLE I
BIT SEQUENCE CONVERSIONS

(a) Data compression
Input bits S   P_S    Output bits C   P_C    Length n_C
00             0.50   1               0.50   1
01             0.30   01              0.30   2
10             0.15   001             0.15   3
11             0.05   000             0.05   3
Average                                      1.7

(b) PS
Input bits U   P_U    Output amplitudes A   P_A    Avg. energy E_A
00             0.25   111                   0.25   1
01             0.25   113                   0.25   3.67
10             0.25   131                   0.25   3.67
11             0.25   311                   0.25   3.67
Average                                            3

(c) Compressed shaping
Input bits S   P_S    Output amplitudes A   P_A    Avg. energy E_A
00             0.50   111                   0.50   1
01             0.30   113                   0.30   3.67
10             0.15   131                   0.15   3.67
11             0.05   311                   0.05   3.67
Average                                            2.33

D. Proposed system model

Fig. 1. System model for the proposed compressed shaping.

Fig. 1 shows the system model for the proposed compressed shaping. Here we use hierarchical DM [18] as an example of the shaping encoder/decoder, but in general any nonconstant-composition DM can be used. The source information bits are parallelized into a source bit sequence $S = [S_1 \ldots S_{k_{\mathrm{bs}}+k}]$, which is separated into a source sign bit sequence $S_{\mathrm{s}} = [S_{\mathrm{s},1} \ldots S_{\mathrm{s},k_{\mathrm{bs}}}]$ and a source amplitude bit sequence $S_{\mathrm{a}} = [S_{\mathrm{a},1} \ldots S_{\mathrm{a},k}]$. The source sign bit sequence $S_{\mathrm{s}}$ is bit-scrambled into $U_{\mathrm{s}} = [U_{\mathrm{s},1} \ldots U_{\mathrm{s},k_{\mathrm{bs}}}]$ by taking the exclusive OR with a PRBS, to balance the numbers of ‘0’s and ‘1’s. The source amplitude bit sequence $S_{\mathrm{a}}$ is processed by a bit-flipping function, which flips all input bits and appends a parity bit ‘1’ if there are more ‘1’s than ‘0’s, and otherwise just appends a parity bit ‘0’, because a large mark ratio is not desirable in the compressed shaping scheme. (There may be situations with a predominance of ‘0’s, due to many idle frames, or of ‘1’s, due to an alarm indication signal (AIS) [25]; in both cases, the source entropy is small.) The bit-flipping encoder output bit sequence $F = [F_1 \ldots F_{k+1}]$ then contains at least as many ‘0’s as ‘1’s. The sequence $F$ is then processed by a hierarchical DM, which consists of hierarchically connected small look-up tables (LUTs) as shown in Fig. 2 [18], [38]. There are $L$ layers and $T_\ell$ LUTs in each layer $\ell$. Each LUT in layer $\ell$ receives $s_\ell$ bits from the input interface of the DM and $r_\ell$ bits from layer $\ell + 1$, and transmits $r_{\ell-1}$ bits to each of $t_{\ell-1}$ LUTs in layer $\ell - 1$, i.e., $u_\ell = t_{\ell-1} r_{\ell-1}$ transmitted bits in total. To determine the one-to-one correspondence of input and output words in each small LUT, we sort the input words in descending order of the number of ‘0’s and the output amplitudes in ascending order of average energy, as in the small example shown in Tab. I(c). The output amplitude sequence from the hierarchical DM is $A = [A_{\mathrm{c},1} \ldots A_{\mathrm{c},n}]$, where each element is a two-dimensional vector $A_{\mathrm{c},i} \in \{0, 1, \ldots, 2^{m_{\mathrm{a}}/2} - 1\}^2$, $m_{\mathrm{a}}$ is the number of bit tributaries for a two-dimensional amplitude, and $P_{A_{\mathrm{c}}} = (1/n) \sum_{i=1}^{n} P_{A_{\mathrm{c},i}}$. The amplitude sequence $A$ is represented by a bit sequence $B_{\mathrm{a}} = [B_{\mathrm{a},1} \ldots B_{\mathrm{a},n m_{\mathrm{a}}}]$. From $B_{\mathrm{a}}$ and $U_{\mathrm{s}}$, an FEC parity bit sequence $B_{\mathrm{fp}}$ is generated by a systematic FEC encoder with code rate $R_{\mathrm{c}}$. Then $U_{\mathrm{s}}$ and $B_{\mathrm{fp}}$ are concatenated into a sign bit sequence $B_{\mathrm{s}} = [B_{\mathrm{s},1} \ldots B_{\mathrm{s},n m_{\mathrm{s}}}]$, where $m_{\mathrm{s}}$ denotes the number of sign-bit tributaries. The number of elements in $S_{\mathrm{s}}$ or $U_{\mathrm{s}}$ is $k_{\mathrm{bs}} = n(m_{\mathrm{s}} R_{\mathrm{c}} - m_{\mathrm{a}}(1 - R_{\mathrm{c}}))$ and that in $B_{\mathrm{fp}}$ is $(1 - R_{\mathrm{c}}) n (m_{\mathrm{s}} + m_{\mathrm{a}})$. Finally, channel input QAM symbols $X = [X_{\mathrm{c},1} \ldots X_{\mathrm{c},n}]$ are generated from $B_{\mathrm{s}}$ and $B_{\mathrm{a}}$, where $P_{X_{\mathrm{c}}} = (1/n) \sum_{i=1}^{n} P_{X_{\mathrm{c},i}}$.

The channel output QAM symbols $Y = [Y_{\mathrm{c},1} \ldots Y_{\mathrm{c},n}]$ are demapped into an L-value sequence $L = [L_1 \ldots L_{(m_{\mathrm{s}}+m_{\mathrm{a}})n}]$ by memoryless bit-metric decoding [11], where $Y_{\mathrm{c},i} \in \mathbb{R}^2$ and $\mathbb{R}$ denotes the set of real numbers. The receiver-side processing reverses that on the transmitter side. QAM symbol demapping and FEC decoding are performed to recover the scrambled source sign bit sequence $\hat{U}_{\mathrm{s}}$ and the shaped amplitude bit sequence $\hat{B}_{\mathrm{a}}$. The recovered scrambled source sign bit sequence $\hat{U}_{\mathrm{s}}$ is descrambled into the source sign bit sequence $\hat{S}_{\mathrm{s}}$. The shaping decoder, which is a hierarchical DM decoder here, converts $\hat{B}_{\mathrm{a}}$ into the flipped bit sequence $\hat{F}$. The bit flipping is then reversed based on the parity bit: if the parity bit is ‘1’, all bits are flipped and the parity bit is removed; otherwise, the parity bit is just removed. This yields the source amplitude bit sequence $\hat{S}_{\mathrm{a}}$.
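A toy Python sketch of two transmitter-side ideas above, under simplifying assumptions: the bit-flipping encoder/decoder with its parity bit, and a single LUT built by the two sortings of Tab. I(c) (input words by descending probability, output amplitude sequences by ascending energy). A real hierarchical DM connects many such LUTs across layers and sorts LUT input words by their number of ‘0’s; the single three-symbol LUT, the amplitude alphabet {1, 3}, and all function names here are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def flip_encode(bits):
    """Flip all bits and append parity 1 if '1's outnumber '0's; else append parity 0."""
    ones = sum(bits)
    if ones > len(bits) - ones:
        return [1 - b for b in bits] + [1]
    return list(bits) + [0]


def flip_decode(bits):
    """Undo flip_encode: the last bit is the parity bit."""
    *payload, parity = bits
    return [1 - b for b in payload] if parity == 1 else list(payload)


def energy(amps):
    """Average energy per amplitude, E_A = ||A||^2 / n."""
    return Fraction(sum(a * a for a in amps), len(amps))


def build_lut(word_probs, alphabet=(1, 3), n=3):
    """Pair the most probable input words with the lowest-energy amplitude sequences."""
    words = sorted(word_probs, key=lambda w: word_probs[w], reverse=True)
    candidates = sorted(product(alphabet, repeat=n), key=energy)  # stable sort
    return dict(zip(words, candidates))


def average_energy(lut, word_probs):
    """E_avg = sum over input words of P_S(word) * E_A(LUT[word])."""
    return sum(word_probs[w] * energy(a) for w, a in lut.items())


nonuniform = {"00": Fraction(1, 2), "01": Fraction(3, 10),
              "10": Fraction(3, 20), "11": Fraction(1, 20)}
uniform = {w: Fraction(1, 4) for w in nonuniform}

lut = build_lut(nonuniform)
print(lut["00"], lut["11"])                         # (1, 1, 1) (3, 1, 1)
print(average_energy(lut, nonuniform))              # 7/3, as in Tab. I(c)
print(average_energy(build_lut(uniform), uniform))  # 3, as in Tab. I(b)
print(flip_encode([1, 1, 0, 1]))                    # [0, 0, 1, 0, 1]
```

Exact fractions are used so that the averages match Tab. I without rounding; with a uniform source the same sorting rule reproduces the PS codebook of Tab. I(b), which illustrates why no codebook adaptation is needed.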
Finally, the source sign bit sequence $\hat{S}_{\mathrm{s}}$ and the source amplitude bit sequence $\hat{S}_{\mathrm{a}}$ are concatenated into the source information bit sequence $\hat{S}$.

We now summarize the entropy and rate loss in this compressed shaping system. The entropy of the channel input symbol is given by

$$H(X_{\mathrm{c}}) = m_{\mathrm{s}} + H(A_{\mathrm{c}}). \qquad (5)$$

If $S_1, \ldots, S_{k_{\mathrm{bs}}+k}$ are i.i.d. with a distribution $P_S = P_{S_{\mathrm{a}}} = \frac{1}{k_{\mathrm{bs}}+k} \sum_{i=1}^{k_{\mathrm{bs}}+k} P_{S_i}$, then the rate loss in the sequence conversion from $S_{\mathrm{a}}$ to $A_{\mathrm{c}}$ is, similarly to (4),

$$R_{\mathrm{loss}}(S_{\mathrm{a}}, A_{\mathrm{c}}) = H(A_{\mathrm{c}}) - \frac{H(S)\,k}{n}. \qquad (6)$$

In order to characterize the performance in Sec. III, we here define the minimum entropy of a channel input symbol as

$$H_{\mathrm{LB}} = m_{\mathrm{s}} + \frac{H(S)\,k}{n}. \qquad (7)$$

The time-variable PMF $P_{A_{\mathrm{c}}}$ can cause practical issues with respect to electrical amplitude, optical power, and SNR control in fiber-optic communication systems, although the variation of $P_{A_{\mathrm{c}}}$ per wavelength channel can be statistically relaxed by multiplexing many channels. The analysis of such issues and the development of appropriate control methods are deferred to future work.

III. SIMULATIONS

To verify the concept of compressed shaping, we performed numerical simulations based on the system model in Sec. II-D. In this section, nonuniform, independent source information bits $S$ were generated for simplicity from independent, uniformly distributed pseudorandom numbers produced by the Mersenne twister. For a given target source mark ratio $P_S(1)$, the uniformly distributed pseudorandom numbers ranging from 0 to 1 were binarized with a threshold level $1 - P_S(1)$, i.e., generating a logic ‘0’ for a random number between 0 and $1 - P_S(1)$ and a logic ‘1’ otherwise. In this simulation, we kept the source mark ratio $P_S(1)$ static within each simulation batch, which represents a short period.

Fig. 2. Schematic of hierarchical DM encoding [38, Fig. 2].

Fig. 3 shows the average two-dimensional symbol energy $E$ as a function of the lower bound entropy of channel input symbols $H_{\mathrm{LB}}$ in (7) for various PS-QAM formats and six source mark ratios $P_S(1)$, with the minimum Euclidean distance set to 2. For compressed shaping with hierarchical DM, we employed five QAM base constellations, including the 8-QAM and 128-QAM constellations described below. The PS overhead was kept approximately the same for each base constellation for a given FEC code rate, and a separate PS codeword length (number of QAM symbols) was used for each base constellation. Such granular base constellations and shallow shaping help to avoid excessive increases in peak-to-average power ratio and power consumption [37], [38] and penalties from nonideal FEC performance [39]. For 8-QAM, the constellation C in [40] was used so that the constellation is symmetric around the imaginary axis, which allows the uniformly distributed FEC parity bits to be placed on the sign bits without changing $P_{|X_{\mathrm{c}}|}$. The bit labeling for 128-QAM was based on [41, Fig. 3].
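The source generation above and the horizontal axis of Fig. 3 can be sketched in a few lines of Python. CPython's `random` module is itself a Mersenne twister (MT19937), which matches the generator named in the text; the seed, the number of bits, and the example values of $m_{\mathrm{s}}$, $k$, and $n$ are hypothetical and only illustrate (7).

```python
import math
import random


def generate_source(n_bits, mark_ratio, seed=2021):
    """i.i.d. source bits: '0' if u < 1 - P_S(1), else '1', with u ~ U[0, 1)."""
    rng = random.Random(seed)  # CPython's random is the MT19937 Mersenne twister
    threshold = 1.0 - mark_ratio
    return [0 if rng.random() < threshold else 1 for _ in range(n_bits)]


def binary_entropy(p):
    """H(S) in bits for a binary source with P_S(1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def h_lb(m_s, mark_ratio, k, n):
    """Lower-bound entropy of the channel input symbol, (7): m_s + H(S) k / n."""
    return m_s + binary_entropy(mark_ratio) * k / n


bits = generate_source(100_000, mark_ratio=0.2)
p_hat = sum(bits) / len(bits)
# Hypothetical parameters: m_s = 2 sign bits per 2-D symbol, k/n = 3/2.
print(round(p_hat, 3), round(h_lb(2, p_hat, k=3, n=2), 3))
```

Because $H(S)$ falls as $P_S(1)$ moves away from 0.5, $H_{\mathrm{LB}}$ shrinks toward $m_{\mathrm{s}}$ for highly nonuniform sources, which is the direction in which compressed shaping reduces $E$ in Fig. 3.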
For comparison, CCDM-based PS-16-QAM and PS-64-QAM were also evaluated, using the same PS overhead and codeword length as with compressed shaping. We also evaluated the performance of PS-4096-QAM with an ideal Maxwell–Boltzmann input distribution and perfect data compression.

With CCDM, the energy $E$ in Fig. 3 is constant for all $P_S(1)$, because its output PMF $P_{A_{\mathrm{c}}}$ does not depend on the incoming bits. With compressed shaping, on the other hand, $E$ decreases with decreasing $P_S(1)$ (and $H(S)$) for each base constellation, although there are significant performance gaps to the ideal case (black solid line in Fig. 3), especially for high-order QAM. Fig. 4 exemplifies the PMF $P_A$ for compressed shaping 64-QAM for various source mark ratios $P_S(1)$. Higher source nonuniformity (i.e., smaller $P_S(1)$) results in deeper PS: compared with the uniform source case, we observed reduced $P_A(3)$, $P_A(5)$, and $P_A(7)$ and increased $P_A(1)$. The transmitted PMFs of one-dimensional amplitudes $A$ and two-dimensional channel input symbols $X_{\mathrm{c}}$ at a given source mark ratio $P_S(1)$ are denoted as $P_A(P_S(1))$ and $P_{X_{\mathrm{c}}}(P_S(1))$, respectively.

Fig. 3. Average two-dimensional symbol energy $E$ for various PS-QAM formats and source mark ratios $P_S(1)$ as a function of the lower bound entropy of channel input symbols $H_{\mathrm{LB}} = m_{\mathrm{s}} + H(S)k/n$.

Fig. 4. An example of a one-dimensional amplitude PMF $P_A$ by compressed shaping 64-QAM, for various source mark ratios $P_S(1)$. In situations with dynamically variable $P_S(1)$, the demapper assumes a constant $P_S(1) = 0.5$, as explained in the text.

We then simulated the required SNR over the Gaussian channel with a DVB-S2 low-density parity check (LDPC) code [42] and a limited maximum number of decoding iterations. Fig. 5 shows the simulated required SNR with the maximum number of FEC decoding iterations. We simulated both mismatched and matched decoding. The true transmitted two-dimensional symbol PMF is denoted by $P_{X_{\mathrm{c}}}(P_S(1))$, and the transmitted symbol PMF assumed in the soft demapping by $Q_{X_{\mathrm{c}}}(P_S(1))$. In the matched case, $Q_{X_{\mathrm{c}}}(P_S(1))$ was set to $P_{X_{\mathrm{c}}}(P_S(1))$ for all $P_S(1)$, while in the mismatched case, $Q_{X_{\mathrm{c}}}(P_S(1))$ was set to the fixed PMF $P_{X_{\mathrm{c}}}(0.5)$ regardless of the true $P_S(1)$. This is because, under dynamically variable source conditions, the true transmitted symbol PMF $P_{X_{\mathrm{c}}}(P_S(1))$ is hard to track in deployable systems. As shown in Fig. 5, the required SNR can be reduced by compressed shaping, in contrast to the fixed required SNR of CCDM and of uniform QAM with bit scrambling, which results from their fixed $P_{X_{\mathrm{c}}}$ ($= P_{X_{\mathrm{c}}}(0.5)$). The SNR penalty due to the mismatch between $P_{X_{\mathrm{c}}}(P_S(1))$ and $Q_{X_{\mathrm{c}}}(P_S(1))$ is not significant except at very small $P_S(1)$.

Fig. 5. Simulated required SNR for compressed shaping QAM under matched (dotted lines) and mismatched decoding (solid lines). In the mismatched case, we set the transmitted PMF assumed in the demapper to the one for a uniform source ($P_S(1) = 0.5$), for every source mark ratio $P_S(1)$. Dashed and long-dashed lines correspond to CCDM and uniform QAM, respectively.

The better performance of compressed shaping compared with conventional PS can be converted into lower power consumption. We quantified the power consumption in FEC decoding, because it dominates the power consumption among all coding functions. Fig. 6 shows the relative power consumption of the FEC decoding for PS-QAM with six source mark ratios $P_S(1)$ (marked by circles, squares, diamonds, triangles, crosses, and pluses), so there are six curves for each QAM order with compressed shaping. The power consumption is assumed to be proportional to the average number of decoding iterations, which is almost proportional to the toggle rate in the logic circuitry. The vertical axis in Fig. 6 is normalized by the maximum number of decoding iterations. The soft demapping was assumed to be mismatched as in Fig. 5, i.e., $Q_{X_{\mathrm{c}}}(P_S(1)) = P_{X_{\mathrm{c}}}(0.5)$. While CCDM consumes a fixed power even if $P_S(1)$ is reduced (as does non-PS signaling with bit scrambling, which is not shown in Fig. 6), compressed shaping significantly reduces the power, by up to 90%, for highly nonuniform source probabilities, because its smaller required SNR leads to fewer decoding iterations than with conventional bit-scrambled PS.

Note that the additional complexity of compressed shaping is not significant when hierarchical DM is employed for shaping encoding/decoding, as shown in the next section.

IV. FPGA IMPLEMENTATION

We implemented compressed shaping in a single FPGA chip on a Xilinx® Virtex® UltraScale+™ VCU118 evaluation board (XCVU9P). Fig. 7 shows the functional block diagram of the implemented circuitry. The source generator outputs source information bits based on a given target mark ratio $P_S(1)$; its schematic is illustrated in Fig. 8. Each source information bit is selected from one of three candidates: 0) the logic bit ‘0’, 1) a PRBS bit, and 2) the logic bit ‘1’. The mask signal used for the selection is generated from the target source mark ratio $P_S(1)$ given by the user. If $P_S(1) \le 0.5$, the mask signal takes on values 0 or 1 such that the average fraction of logic ‘1’s in $S$ is $P_S(1)$, and if $P_S(1) > 0.5$, the mask similarly takes on values 1 or 2. Fig. 9 shows an example mask signal for a target $P_S(1)$ of 0.4. For simplicity, we classified the source bits into 20 groups (32 bits per group), applied a ‘0’ mask window to four groups, and slid the window every clock cycle.

Fig. 6. Simulated relative power consumption of the FEC decoding, which is assumed to be proportional to the average number of decoding iterations, as a function of SNR for PS-QAM. Solid and black dashed lines correspond to compressed shaping and CCDM, respectively.

The bit-flipping encoder counts the numbers of ‘0’s and ‘1’s. When the number of ‘1’s is larger than that of ‘0’s, it flips all bits in the clock cycle and appends a parity bit ‘1’; otherwise, it just appends a parity bit ‘0’. The shaping encoding/decoding is realized by hierarchical DM with a fixed total codeword length for the shaped two-dimensional amplitudes. The number of shaped information bits per two-dimensional amplitude, $k/n$, is generally flexible for compressed shaping employing hierarchical DM, but in this implementation it is fixed, with one value for 16-QAM and another for 64-QAM.
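The mask-based source generator described above can be modelled in software as follows. This is a behavioral Python sketch, not the FPGA circuit: Python's `random` stands in for the PRBS, the window is assumed to slide by one group per clock cycle, and the rule for the number of masked groups is inferred from the example of four ‘0’-masked groups out of 20 giving $P_S(1) = 0.4$. All names are hypothetical.

```python
import random

N_GROUPS = 20     # source bits per clock cycle are split into 20 groups
GROUP_BITS = 32   # 32 bits per group, i.e., 640 bits per clock cycle


def masked_groups(mark_ratio):
    """Number of groups forced to a constant bit, and that bit, for a target P_S(1)."""
    if mark_ratio <= 0.5:
        # g groups of '0' plus (20 - g) PRBS groups: P_S(1) = (20 - g) / 40
        return round(N_GROUPS * (1 - 2 * mark_ratio)), 0
    # g groups of '1' plus (20 - g) PRBS groups: P_S(1) = (20 + g) / 40
    return round(N_GROUPS * (2 * mark_ratio - 1)), 1


def source_cycle(cycle, mark_ratio, rng):
    """Source bits generated in one clock cycle, with a sliding mask window."""
    n_masked, const_bit = masked_groups(mark_ratio)
    start = cycle % N_GROUPS  # assumption: the window moves one group per cycle
    masked = {(start + j) % N_GROUPS for j in range(n_masked)}
    bits = []
    for group in range(N_GROUPS):
        if group in masked:
            bits.extend([const_bit] * GROUP_BITS)
        else:
            # stand-in for the PRBS: uniform random bits
            bits.extend(rng.getrandbits(1) for _ in range(GROUP_BITS))
    return bits


rng = random.Random(0)
stream = [b for c in range(1000) for b in source_cycle(c, 0.4, rng)]
print(masked_groups(0.4), sum(stream) / len(stream))  # (4, 0) and a ratio near 0.4
```

Sliding the window spreads the forced constant bits evenly over all group positions, so no bit tributary is permanently stuck at one logic level.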
Before the receiver-side processing, an error insertion function inserts bit errors at a given bit error rate (BER) ahead of the shaping decoder. The shaping decoder is also implemented with a hierarchical DM decoder, and the bit-flipping decoder recovers the source bits. When the parity bit is '…', it flips all bits in that clock cycle and removes the parity bit. Otherwise, it just removes the parity bit.

There are several monitoring functions, for the PMF $P_{A_c}$ at the shaping encoder output, the assumed post-FEC BER, and the system output BER. Note that hierarchical DM mainly consists of LUTs, which are implemented with random access memory (RAM). Because RAM is sensitive to unwanted bit flips caused by radiation-induced soft errors, we implemented soft error protection circuitry.

Fig. 7. Block diagram of the FPGA implementation of compressed shaping.

Fig. 8. Schematic of the source generation function. The mask signal selects the output source bit from logic '…', '…', or a bit given by the PRBS, based on the target source mark ratio $P_S(1)$.

Fig. 9. Example of mask signals for given clock cycle indices and group indices for a target source mark ratio $P_S(1) = …$. In each clock cycle index or each group index, 20% of the generated bits are set directly to '…' and 80% are taken from a uniform PRBS, which implies 20% + 80%/2 = 60% '…'s on average.

The clock domain was initially single, and the clock frequency $f_{\mathrm{clk}}$ for the fitting (FPGA synthesis) was 90 MHz [32]. Later we found that the soft error protection circuitry, which consists of flip-flops and a selector tree for refreshing the RAM contents, was a bottleneck to raising the clock frequency. We therefore separated the clock domain into one for data processing and one for control. This allowed $f_{\mathrm{clk}} = 240$ MHz for the data processing, giving higher throughput. Assuming a suitable FEC concatenation (not implemented here), the system throughputs for compressed shaping 16- and 64-QAM would be … and 42 Gb/s at $f_{\mathrm{clk}} = 90$ MHz, and … and 113 Gb/s at $f_{\mathrm{clk}} = 240$ MHz, respectively. The number of bits per PS codeword was the same in both cases, i.e., there were half as many PS-64-QAM symbols as PS-16-QAM symbols.

A flexible choice of $(n, k)$ in an FPGA implementation of hierarchical DM is left as potential future work. The LUT contents can be reconfigured in software or firmware without increasing the RAM size.

Tab. II shows the utilized hardware resources at a data-signal clock frequency of $f_{\mathrm{clk}} = 90$ or 240 MHz.

TABLE II
UTILIZATION OF KEY RESOURCES IN FPGA FOR COMPRESSED SHAPING AT $f_{\mathrm{clk}}$ = 90 OR 240 MHz
Category | Element | Available | Utilization at 90 MHz | Utilization at 240 MHz
System logic cell | LUT as logic | … | …17% | 24.…
System logic cell | Register | … | …49% | 21.…
Memory | Block RAM | … | …65% | 4.…
Memory | Ultra RAM | … | Not used | Not used
… | DSP slice | … | Not used | Not used

Fig. 10 depicts the utilized area of the FPGA chip, which has three dies. The used resource elements were mainly located in the left and right dies, and the center die was mainly used for connections between the two. The register elements were mainly used by the soft error protection circuitry for storing the RAM contents. Of the approximately 290,000 utilized LUT-as-logic elements, … were used by external functions such as the source generator and the BER/PMF monitors, and … were used for bit-flipping encoding. The rest were used for other combinational logic. The data processing of the hierarchical DM used a total of … block RAM elements. No ultra RAM or DSP slice elements were used.

As a benchmark, a 400ZR FEC was implemented in Xilinx® Virtex® UltraScale™ FPGAs [43], with a clock frequency of 125 MHz and a system throughput of 200 Gb/s. That figure includes external functions, i.e., source generation, mapping, demapping, interleaving, de-interleaving, and noise loading. Even allowing for these many external functions in the reference, compressed shaping uses very few hardware resources.

Tab. III shows the estimated dynamic power consumption for compressed shaping 64-QAM at $f_{\mathrm{clk}} = 240$ MHz. In our previous report [32], we estimated the power consumption statically, based on a default toggle rate of …. Here we improved the estimation accuracy by taking realistic node switching activities into account. These activities were based on register-transfer-level simulation waveforms (in the so-called switching activity interchange format, SAIF) over … clock cycles, with the source mark ratio $P_S(1)$ set to ….

The Tx data and Rx data blocks in Tab. III are essential for data communications. Among them, most of the power was consumed by the preprocessing (including bit-flipping encoding) and the DM encoder core in the Tx data block, and by the DM decoder core in the Rx data block. The DM encoder and decoder cores mainly consisted of block RAMs for the LUTs. The bit-flipping encoding counted the logic '…'s using adders, leading to relatively large power consumption. Other pre- and postprocessing functions, including the lane reordering, consumed little power. Note that the power consumption of either the Tx or the Rx data block was smaller than that of the external functions, e.g., source generation and BER monitoring. The control blocks had nonnegligible power consumption. They provided soft error protection for the DM encoder and decoder cores by holding copies of the entire RAM contents in registers and refreshing the RAM intermittently. This power can be expected to be lower in an ASIC implementation, where the activation ratio can be made small.

V. REAL-TIME DEMONSTRATION

Fig. 10. Utilized area (green) of a single FPGA chip for compressed shaping.

TABLE III
SIMULATED DYNAMIC POWER CONSUMPTION OF COMPRESSED SHAPING AT $f_{\mathrm{clk}}$ = 240 MHz
Block | Function | Power (mW)
External functions | Source generation | …
External functions | Error insertion | …
External functions | BER monitor | …
External functions | Clock generation | …
External functions | Sub-total | …
Tx data | Pre-processing | …
Tx data | DM encoder core | …
Tx data | Post-processing | …
Tx data | Delay adjustment | …
Tx data | Sub-total | …
Rx data | Pre-processing | …
Rx data | DM decoder core | …
Rx data | Post-processing | …
Rx data | Delay adjustment | …
Rx data | Sub-total | …
Tx control | | …
Rx control | | …
Tx/Rx control | | …
Total | | …

We performed real-time evaluations of compressed shaping using the FPGA evaluation board at $f_{\mathrm{clk}} = 90$ MHz. First, the transmitter-side functions were verified. Histograms of the two-dimensional amplitude $A_c$ for compressed shaping 16- and 64-QAM were measured over … two-dimensional amplitude samples for various source mark ratios $P_S(1)$. The obtained histograms were interpreted as PMFs $P_{A_c}$, and their entropies were computed. Fig. 11 shows $H(A_c)$ as a function of $P_S(1)$. The entropy $H(A_c)$ is maximum for $P_S(1) = 0.5$. The two-dimensional PS rate losses $R_{\mathrm{loss}}(U, A_c)$ are … and 0.064 bpcu for the example PS-16-QAM and PS-64-QAM schemes, respectively.

When $P_S(1)$ deviates from 0.5, i.e., when $H(S)$ decreases, $H(A_c)$ also decreases monotonically. Because of the bit-flipping encoding, $H(A_c)$ is symmetric around $P_S(1) = 0.5$. In contrast, the CCDM-based schemes show a constant $H(A_c)$, which is independent of the source distribution. Scaled binary entropy functions $H(S) \max_{P_S(1)} H(A_c)$ are also depicted in Fig. 11 with black dotted lines for the two cases. The gap between $H(A_c)$ and $H(S) \max_{P_S(1)} H(A_c)$ is a data compression rate loss. This loss is ideally zero, but here it grows as $H(S)$ decreases, due to the nonideal, simple processing of compressed shaping. The bit sequence conversion loss in compressed shaping, $R_{\mathrm{loss}}(S_a, A_c)$ in (6), is the sum of the rate losses in PS and in data compression. Despite the increased rate loss in data compression, $H(A_c)$ itself is reduced, which substantially reduces the required SNR.

Fig. 11. Measured two-dimensional amplitude entropy $H(A_c)$ for compressed shaping as a function of the target source mark ratio $P_S(1)$, based on more than … two-dimensional amplitude samples. For comparison, the dashed lines show the binary entropy function scaled to the maximum entropies of the two-dimensional amplitudes.

Next, the receiver-side functions were verified. We turned on the error insertion function to introduce sparse bit errors between the transmitter and the receiver. As reported in [18], sparse errors create the worst system output BER at a given post-FEC BER. We swept the assumed post-FEC BER from … to … and examined $P_S(1)$ values of …, …, and … for compressed shaping 16- and 64-QAM. After the receiver-side processing, the system output BER was measured. Fig. 12 shows the system output BER as a function of the assumed post-FEC BER in the back-to-back error insertion test. Compressed shaping decoding increased the BER by a factor of only around …. This is because hierarchical DM can still partially decode correctly when there are incoming bit errors. As predicted in [18], this BER increase factor is significantly smaller than for other DM techniques. For example, CCDM decoding with a PS codeword length of around … bits results in a more than …-fold higher BER. We could not implement CCDM in the FPGA for comparison because of its high complexity.

Fig. 12. Measured system output BER for compressed shaping as a function of the assumed post-FEC BER in back-to-back error insertion tests.

Due to time constraints, we performed … hours of long-term measurement only for $P_S(1) = …$. More than … bit errors were observed after compressed shaping decoding. The system output BER was … at a post-FEC BER of …. This shows that the proposed compressed shaping causes neither an error floor nor an excessive error increase. The required post-FEC BER therefore remains around … to satisfy a required system output BER of ….

VI. SUMMARY

We proposed and demonstrated compressed shaping, i.e., the application of hierarchical DM to simultaneous source data compression and probabilistic shaping, which is an example of joint source–channel coding. When the source entropy is reduced, compressed shaping reduces the channel input symbol entropy, the symbol energy, and the required SNR. Simulation results showed its smaller required SNR and its reduced power consumption compared with CCDM. We implemented compressed shaping, which mainly consists of hierarchical DM encoding/decoding, in a single FPGA. We also estimated its dynamic power consumption based on simulated waveforms. The system throughput reached … and 113 Gb/s for compressed shaping 16- and 64-QAM, respectively. Real-time evaluation results showed the expected performance in both encoding and decoding. Compressed shaping works at a very small BER of around … without any error floor. Its decoding increases the BER by a factor of only around …, which is small compared to other DM schemes.

ACKNOWLEDGMENT

We thank Kyo Inoue of Osaka University for assistance in the research.

REFERENCES

[1] K. Roberts, M. O’Sullivan, K.-T. Wu, H. Sun, A. Awadalla, D. J. Krause, and C. Laperle, “Performance of dual-polarization QPSK for optical transport systems,” J. Lightw. Technol. …
… IEEE Trans. Commun., vol. 40, no. 3, pp. 873–884, May 1992.
[5] G. Caire, G. Taricco, and E. Biglieri, “Bit-interleaved coded modulation,” IEEE Trans. Inf. Theory, vol. 44, no. 3, pp. 927–946, May 1998.
[6] A. Guillén i Fàbregas, A. Martinez, and G. Caire, “Bit-interleaved coded modulation,” Found. Trends Commun. Inf. Theory, vol. 5, nos. 1/2, pp. 1–153, 2008.
[7] L. Szczecinski and A. Alvarado, Bit-Interleaved Coded Modulation: Fundamentals, Analysis, and Design. New York, NY, USA: Wiley, 2015.
[8] G. D. Forney, Jr. and L.-F. Wei, “Multidimensional constellations—Part I: introduction, figure of merit, and generalized cross constellation,” IEEE J. Selected Areas Commun., vol. 7, no. 6, pp. 877–892, Aug. 1989.
[9] A. R. Calderbank and L. H. Ozarow, “Nonequiprobable signaling on the Gaussian channel,” IEEE Trans. Inf. Theory, vol. 36, no. 4, pp. 726–740, Jul. 1990.
[10] F. R. Kschischang and S. Pasupathy, “Optimal nonuniform signaling for Gaussian channels,” IEEE Trans. Inf. Theory, vol. 39, no. 3, pp. 913–929, May 1993.
[11] G. Böcherer, F. Steiner, and P. Schulte, “Bandwidth efficient and rate-matched low-density parity-check coded modulation,” IEEE Trans. Commun., vol. 63, no. 12, pp. 4651–4665, Dec. 2015.
[12] F. Buchali, F. Steiner, G. Böcherer, L. Schmalen, P. Schulte, and W. Idler, “Rate adaptation and reach increase by probabilistically shaped 64-QAM: an experimental demonstration,” J. Lightw. Technol., vol. 34, no. 7, pp. 1599–1609, Apr. 2016.
[13] G. Böcherer, P. Schulte, and F. Steiner, “Probabilistic shaping and forward error correction for fiber-optic communication systems,” IEEE/OSA J. Lightw. Technol., vol. 37, no. 2, pp. 230–244, Jan. 2019.
[14] J. Cho and P. J. Winzer, “Probabilistic constellation shaping for optical fiber communications,” IEEE/OSA J. Lightw. Technol., vol. 37, no. 6, pp. 1590–1607, Mar. 2019.
[15] P. Schulte and G. Böcherer, “Constant composition distribution matching,” IEEE Trans. Inf. Theory, vol. 62, no. 1, pp. 430–434, Jan. 2016.
[16] Y. C. Gültekin, F. M. J. Willems, W. J. van Houtum, and S. Şerbetli, “Approximate enumerative sphere shaping,” in Proc. IEEE Int. Symp. Inf. Theory, Vail, CO, USA, Jun. 2018, pp. 676–680.
[17] T. Fehenberger, D. S. Millar, T. Koike-Akino, K. Kojima, and K. Parsons, “Multiset-partition distribution matching,” IEEE Trans. Commun., vol. 67, no. 3, pp. 1885–1893, Mar. 2019.
[18] T. Yoshida, M. Karlsson, and E. Agrell, “Hierarchical distribution matching for probabilistically shaped coded modulation,” J. Lightw. Technol., vol. 37, no. 6, pp. 1579–1589, Mar. 2019.
[19] P. Schulte and F. Steiner, “Divergence-optimal fixed-to-fixed length distribution matching with shell mapping,” IEEE Wireless Commun. Lett., vol. 8, no. 2, pp. 620–623, Apr. 2019.
[20] J. Cho, “Prefix-free code distribution matching for probabilistic constellation shaping,” IEEE Trans. Commun., vol. 68, no. 2, pp. 670–682, Feb. 2020.
[21] W. G. Bliss, “Circuitry for performing error correction calculations on baseband encoded data to eliminate error propagation,” IBM Techn. Discl. Bul., vol. 23, pp. 4633–4634, 1981.
[22] J. L. Fan and J. M. Cioffi, “Constrained coding techniques for soft iterative decoders,” in Proc. Global Telecommunications Conference (GLOBECOM), Rio de Janeiro, Brazil, Dec. 1999, vol. 1(B), pp. 723–727.
[23] I. B. Djordjevic and B. V. Vasic, “Constrained coding techniques for the suppression of intrachannel nonlinear effects in high-speed optical transmission,” J. Lightw. Technol., vol. 24, no. 1, pp. 411–419, Jan. 2006.
[24] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal …
… IEEE Trans. Commun., vol. 39, no. 6, pp. 838–846, Jun. 1991.
[27] J. Kliewer and R. Thobaben, “Parallel concatenated joint source–channel coding,” IEE Electron. Lett., vol. 39, no. 23, pp. 1664–1665, Nov. 2003.
[28] F. Vacondio, O. Bertran-Pardo, Y. Pointurier, J. Fickers, A. Ghazisaeidi, G. de Valicourt, J.-C. Antona, P. Chanclou, and S. Bigo, “Flexible TDMA access optical networks enabled by burst-mode software defined coherent transponders,” in Proc. Eur. Conf. Opt. Comm. (ECOC), London, UK, Sep. 2013, Paper We.1.F.2.
[29] S. Civelli and M. Secondini, “Hierarchical distribution matching: a versatile tool for probabilistic shaping,” in Proc. Opt. Fib. Commun. Conf. (OFC), San Diego, CA, USA, Mar. 2020, Paper Th1G.4.
[30] S. Civelli and M. Secondini, “Hierarchical distribution matching for probabilistic amplitude shaping,” Entropy, vol. 22, no. 9, pp. 958–984, Aug. 2020.
[31] T. Yoshida, M. Karlsson, and E. Agrell, “Joint source-channel coding via compressed distribution matching in fiber-optic communications,” in Proc. Opt. Fib. Commun. Conf. (OFC), San Diego, CA, USA, Mar. 2019, Paper M4B.6.
[32] T. Yoshida, M. Binkai, S. Koshikawa, S. Chikamori, K. Matsuda, N. Suzuki, M. Karlsson, and E. Agrell, “FPGA implementation of distribution matching and dematching,” in Proc. Eur. Conf. Opt. Commun. (ECOC), Dublin, Ireland, Sep. 2019, Paper M.2.D.2.
[33] T. Yoshida and K. Igarashi, “Probabilistic constellation shaping and quasi data compression in fiber-optic communications,” The Institute of Electronics, Information and Communication Engineers (IEICE) Trans. Commun., vol. J103-B, no. 9, pp. 361–371, Sep. 2020.
[34] Q. Yu, S. Corteselli, and J. Cho, “FPGA implementation of prefix-free code distribution matching for probabilistic constellation shaping,” in Proc. Opt. Fib. Commun. Conf. (OFC), San Diego, CA, USA, Mar. 2020, Paper Th1G.7.
[35] Q. Yu, S. Corteselli, and J. Cho, “FPGA implementation of rate-adaptable prefix-free code distribution matching for probabilistic constellation shaping,” IEEE/OSA J. Lightw. Technol., DOI: 10.1109/JLT.2020.3035039, Nov. 2020.
[36] D. A. Huffman, “A method for the construction of minimum-redundancy codes,” in Proc. I.R.E., Sep. 1952, pp. 1098–1102.
[37] S. Zhang, Z. Qu, F. Yaman, E. Mateo, T. Inoue, K. Nakamura, Y. Inada, and I. B. Djordjevic, “Flex-rate transmission using hybrid probabilistic and geometric shaped 32QAM,” in Proc. Opt. Fib. Commun. Conf. (OFC), San Diego, CA, USA, Mar. 2018, Paper M1G.3.
[38] T. Yoshida, E. Agrell, and M. Karlsson, “Hierarchical distribution matching with massively parallel interfaces for fiber-optic communications,” in Proc. International Zurich Seminar on Information and Communication (IZS), Zurich, Switzerland, Feb. 2020, pp. 16–20.
[39] J. Cho, S. L. I. Olsson, S. Chandrasekhar, and P. Winzer, “Information rate of probabilistically shaped QAM with non-ideal forward error correction,” in Proc. Eur. Conf. Opt. Comm. (ECOC), Roma, Italy, Sep. 2018, Paper Th.1.H.5.
[40] L. Schmalen, A. Alvarado, and R. Rios-Müller, “Performance prediction of nonbinary forward error correction in optical transmission experiments,” J. Lightw. Technol., vol. 35, no. 4, pp. 1015–1027, Feb. 2017.
[41] S. ten Brink and R. Mahadevappa, “Implementation aspects of high-speed wireless LAN systems,” Conference Record of the Thirty-Eighth Asilomar Conference on Signals, Systems and Computers …