Data compression with low distortion and finite blocklength
Victoria Kostina, Member, IEEE

V. Kostina is with the California Institute of Technology (e-mail: [email protected]). This research is supported in part by the National Science Foundation (NSF) under Grant CCF-1566567, and by the Simons Institute for the Theory of Computing. Parts of this paper were presented at the Allerton conference [1], [2].
Abstract—This paper considers lossy source coding of $n$-dimensional memoryless sources and shows an explicit approximation to the minimum source coding rate required to sustain a probability of exceeding distortion $d$ no greater than $\epsilon$, which is simpler than known dispersion-based approximations. Our approach takes inspiration from the celebrated classical result stating that the Shannon lower bound to the rate-distortion function becomes tight in the limit $d \to 0$. We formulate an abstract version of the Shannon lower bound that recovers both the classical Shannon lower bound and the rate-distortion function itself as special cases. Likewise, we show that a nonasymptotic version of the abstract Shannon lower bound recovers all previously known nonasymptotic converses. A necessary and sufficient condition for the Shannon lower bound to be attained exactly is presented. It is demonstrated that whenever that condition is met, the rate-dispersion function is given simply by the varentropy of the source. Remarkably, all finite alphabet sources with balanced distortion measures satisfy that condition in the range of low distortions. Most continuous sources violate that condition. Still, we show that lattice quantizers closely approach the nonasymptotic Shannon lower bound, provided that the source density is smooth enough and the distortion is low. This implies that fine multidimensional lattice coverings are nearly optimal in the rate-distortion sense even at finite $n$. The achievability proof technique is based on a new bound on the output entropy of lattice quantizers in terms of the differential entropy of the source, the lattice cell size and a smoothness parameter of the source density. The technique avoids both the usual random coding argument and the simplifying assumption of the presence of a dither signal.

Index Terms—Lossy source coding, lattice coding, rate-distortion function, Shannon's lower bound, low distortion, high resolution, finite blocklength regime, dispersion, rate-dispersion function, nonasymptotic analysis.
I. INTRODUCTION
We showed in [3] that for the compression of a stationary memoryless source under a single-letter distortion measure, the minimum achievable source coding rate $R(n,d,\epsilon)$ compatible with blocklength $n$ and probability $\epsilon$ of exceeding distortion $d$ is given by

$$R(n,d,\epsilon) = R(d) + \sqrt{\frac{V(d)}{n}}\, Q^{-1}(\epsilon) + O\!\left(\frac{\log n}{n}\right), \quad (1)$$

where $Q^{-1}$ is the inverse of the complementary Gaussian cdf, $R(d)$ is the rate-distortion function, and $V(d)$ is the rate-dispersion function of the source. The rate-dispersion function quantifies the overhead over the rate-distortion function incurred by the finite blocklength constraint. Dropping the $O\!\left(\frac{\log n}{n}\right)$ remainder term in (1), we obtain a simple approximation to the minimum achievable coding rate. That approximation provides good accuracy even at short blocklengths, as evidenced by the numerical results in [3].

In this contribution, we derive a simplification of (1) in the regime of low $d$, which corresponds to the practically relevant regime of high resolution data compression. The interest in pursuing such a simplification stems from the fact that closed-form formulas for $R(d)$ and $V(d)$ are rarely available. Indeed, as shown in [3], both the rate-dispersion and the rate-distortion function are described parametrically in terms of the solution to the rate-distortion convex minimization problem, defined for a source distribution $P_X$ and a distortion measure $\mathsf d\colon \mathcal X \times \mathcal Y \mapsto \mathbb R_+$ as

$$R_X(d) \triangleq \inf_{P_{Y|X}\colon\ \mathbb E[\mathsf d(X,Y)]\le d} I(X;Y). \quad (2)$$

The rate-distortion and the rate-dispersion function are given by the mean and the variance of the $d$-tilted information, the random variable defined for a distribution $P_X$ and a distortion measure $\mathsf d$ as

$$\jmath_X(x,d) \triangleq \log\frac{1}{\mathbb E\left[\exp\{\lambda^\star d - \lambda^\star \mathsf d(x, Y^\star)\}\right]}, \quad (3)$$

where $\lambda^\star = -R_X'(d)$ is the negative of the slope of the rate-distortion function of $X$ at distortion $d$, and the expectation is with respect to the unconditional distribution of $Y^\star$, the random variable that attains the minimum in (2), i.e. $R_X(d) = I(X;Y^\star)$. (We assume for now that such a random variable exists.) Although the convexity of the problem in (2) often allows for an efficient numerical computation of its optimum [4], closed-form expressions are available only in special cases. In those cases, the distortion measure is carefully tailored to match the source.

The absence of an explicit expression for the $d$-tilted information motivates a closer look into the behavior of (3). This paper shows that under regularity conditions and as long as $d$ is small enough, the $d$-tilted information in a random variable $X \in \mathbb R^n$ is closely approximated by, with high probability,

$$\jmath_X(X,d) \approx \log\frac{1}{f_X(X)} - \phi(d), \quad (4)$$

where $f_X$ is the source density, and $\phi(d)$ is a term that depends only on the distortion measure and distortion threshold $d$. ($\phi(d)$ is strictly increasing in $d$.) For example, for the mean-square error (MSE) distortion,

$$\phi(d) = \frac n2 \log(2\pi e d). \quad (5)$$

If the source alphabet is finite and all columns of the distortion matrix $\{\mathsf d(x,y)\}_{x,y}$ consist of the same set of entries (balanced distortion measure), an even stronger claim holds, namely,

$$\jmath_X(X,d) = \log\frac{1}{P_X(X)} - \phi(d) \quad \text{a.s.}, \quad (6)$$

as long as $d \le d_c$, where $d_c > 0$ is a function of $P_X$ and the distortion measure only.

The value of $\jmath_X(x,d)$ can be loosely interpreted as the amount of information that needs to be stored about $x$ in order to restore it with distortion $d$ [3]. The explicit nature of (4) illuminates the tension between the likelihood of $x$ and the target distortion: the likelier the realization $x$ is, the fewer bits are required to store it; the lower the tolerable $d$ is, the more bits are required in order to represent the source with that distortion. This intuitively pleasing insight is not afforded by the general formula (3).

To gain further understanding of the form of (4), recall that the Shannon lower bound [5] states that the rate-distortion function is bounded below by the difference between the differential entropy of the source and a term that depends only on the distortion measure and distortion threshold $d$:

$$R_X(d) \ge \underline R_X(d) = h(X) - \phi(d), \quad (7)$$

where $h(X)$ is the differential entropy of the source. Due to its simplicity and because it becomes increasingly tight in the limit of low distortion [6], [7], the Shannon lower bound is often used as a convenient proxy for $R_X(d)$. The statement in (4) can be viewed as a nonasymptotic refinement of those results. More precisely, this paper proposes a nonasymptotic version of the Shannon lower bound, valid at any $d$, and demonstrates that at low $d$, the bound can be approached by a lattice quantizer followed by a lossless coder. A careful analysis of those bounds reveals that for a class of difference distortion measures and stationary memoryless sources with sufficiently smooth densities, as $d \to 0$ and $n \to \infty$, the nonasymptotically achievable source coding rate admits the following expansion:

$$R(n,d,\epsilon) = \underline R(d) + \sqrt{\frac Vn}\, Q^{-1}(\epsilon) + O\!\left(\sqrt d\right) + O\!\left(\frac{\log n}{n}\right), \quad (8)$$

where $\underline R(d) = \underline R_X(d)$ is Shannon's lower bound for $P_X$, the single-letter distribution of the source, and

$$V \triangleq \mathrm{Var}\left[\log f_X(X)\right]. \quad (9)$$

Thus, similar to (1), $\underline R(d)$ and $V$ are given by the mean and the variance of the random variable on the right side of (4) (particularized to $P_X$). The term $O(\sqrt d)$ in (8), which is always nonnegative, is the penalty due to the density of the source not being completely flat within each lattice quantization cell. Naturally, this term vanishes as the sizes of quantization cells decrease, and its magnitude depends on the smoothness of the source density.

Since (8) is attained by lattice quantization, lattice quantizers are nearly optimal at high resolution even at finite blocklength. The implication for engineering practice is that, in a search for good codes, it is unnecessary to consider more complex structures than lattices if the goal is high resolution digital representation of the original analog signal. Due to the regularity of the code vector locations, lattice quantizers offer a great reduction in the complexity of encoding algorithms (e.g. [8], [9]). Therefore, both their performance and their regular algebraic structure make lattices a particularly appealing choice for an efficient analog-to-digital conversion.
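To make the expansion concrete, the following minimal sketch (Python) evaluates the right side of (8) with the remainder terms dropped, for an i.i.d. Gaussian source under MSE distortion — a case where the Shannon lower bound is tight, so $\underline R(d) = \frac12\log_2(\sigma^2/d)$ and $V = \frac12\log_2^2 e$ bits². The parameter values are illustrative.

```python
import math
from statistics import NormalDist

def rate_approx(n, d, eps, sigma2=1.0):
    """Right side of (8) without the O(sqrt(d)) and O(log n / n) terms,
    for an i.i.d. N(0, sigma2) source under MSE distortion (bits/sample).
    For this source the Shannon lower bound is tight:
    R(d) = h(X) - (1/2) log2(2*pi*e*d) = (1/2) log2(sigma2 / d)."""
    R_d = 0.5 * math.log2(sigma2 / d)
    # V = Var[log2(1/f_X(X))]; for a Gaussian, V = (1/2) * (log2 e)^2 bits^2
    V = 0.5 * math.log2(math.e) ** 2
    Qinv = NormalDist().inv_cdf(1.0 - eps)      # Q^{-1}(eps)
    return R_d + math.sqrt(V / n) * Qinv

for n in (100, 1000, 10000):
    print(n, round(rate_approx(n, d=0.01, eps=0.1), 4))
```

As $n$ grows, the dispersion term shrinks as $1/\sqrt n$ and the output approaches $\underline R(d)$.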
This paper also develops new results on the rate-distortion performance of lattice quantization of continuous sources with memory. We prove that for a class of sources satisfying a smoothness condition, variable-length lattice quantization attains Shannon's lower bound in the limit of $n \to \infty$ and $d \to 0$, even if the source is nonergodic or nonstationary. Furthermore, if the source density is log-concave, we show that Shannon's lower bound is attained at a speed $O\!\left(\frac{1}{\sqrt n}\right)$ with increasing blocklength, providing the first result of this sort for lossy compression of sources with memory.

The key to our study of lattice quantization is an explicit nonasymptotic bound on the probability distribution (and, in particular, the entropy) observed at the output of a lattice quantizer for $X$. The bound is a function of the lattice cell size and a smoothness parameter of the source density. The bound provides an estimate of the speed of convergence in the classical results by Rényi [10, Theorem 4] and Csiszár [11].

Another essential ingredient of our development is a new, abstract formulation of the Shannon lower bound that encompasses the classical Shannon lower bound as a special case and that does not impose any symmetry conditions on the distortion measure. An appropriate choice of an auxiliary measure in the abstract Shannon lower bound makes it equal to the rate-distortion function. Likewise, a nonasymptotic version of the abstract Shannon lower bound recovers all previously known nonasymptotic converses.

We state necessary and sufficient conditions for the abstract Shannon lower bound to hold with equality. In particular, those conditions allow us to establish the validity of (6) for finite alphabet sources with balanced distortion. Inserting (6) into (1), we conclude that for discrete memoryless sources with balanced distortion, for all $d \le d_c$, the nonasymptotic fundamental limit is given simply by

$$R(n,d,\epsilon) = R(d) + \sqrt{\frac Vn}\, Q^{-1}(\epsilon) + O\!\left(\frac{\log n}{n}\right), \quad (10)$$

which is a sharpened version of (8) without the $O(\sqrt d)$ term. More generally, (10) holds for any stationary memoryless source whose rate-distortion function meets the Shannon lower bound with equality.

Notable prior contributions to the understanding of lattice quantizers in large dimension include the works by Rogers [12], Gersho [13], Zamir and Feder [14] and Linder and Zeger [15]. Rogers [12] showed the existence of efficient lattice coverings of space. Using a heuristic approach, Gersho [13] studied tessellating vector quantizers, i.e. quantizers whose regions are congruent with some tessellating convex polytope $P$. (A polytope $P$ is called tessellating if there exists a partition of $\mathbb R^n$ consisting of translated and rotated copies of $P$.) Although every lattice quantizer is a tessellating quantizer, the converse is not true. Gersho [13] showed heuristically that in the limit of low distortion, tessellating vector quantizers approach the $n$-dimensional Shannon lower bound to within a term of order $O\!\left(\frac{\log n}{n}\right)$. Relying on a conjecture by Gersho, Linder and Zeger [15] streamlined the proof of Gersho's result and reported that the minimum entropy among all $n$-dimensional tessellating vector quantizers approaches the $n$-letter Shannon lower bound in the limit of low $d$, provided that the Gersho conjecture is true.
Zamir and Feder [14] considered the setting in which a signal called a dither is added to the input signal prior to quantization, namely, dithered quantization, and showed an upper bound on the achievable conditional (on the dither) output entropy of dithered lattice quantizers. Their result relied on a rather restrictive assumption on the source density, violated even by the Gaussian distribution. That assumption was later relaxed by Linder and Zeger [15]. Zamir and Feder [16] went on to study the properties of a vector uniformly distributed over a lattice quantization cell, and showed that the normalized second moment of the optimal lattice quantizer approaches that of a ball, $\frac{1}{2\pi e}$, as the dimension increases. While the assumption of the availability of the dither signal both at the encoder and the decoder greatly simplifies the analysis and improves the performance somewhat by smoothening the underlying densities, it can also substantially complicate the engineering implementation. This paper does not consider dithered quantization.

Historically, theoretical performance analysis of lossy compressors proceeded in two disparate directions: bounds derived from Shannon theory [17], and bounds derived from high resolution approximations [18], [19]. The former provides asymptotic results as the sources are coded using longer and longer blocks. The latter assumes a fixed input block size but estimates the performance as the encoding rate becomes arbitrarily large. This paper fuses the two approaches to study the performance of high resolution block compressors from the Shannon theory viewpoint.

So as not to clutter notation, in those statements in which the Cartesian structure of the space is unimportant, we will denote random vectors simply by $X$, $Y$, etc., omitting the dimension $n$. Wherever necessary, we will make the dimension explicit, writing $X^n$, $Y^n$ in lieu of $X$, $Y$. For a stationary memoryless process $X_1, X_2, \ldots$, we denote by $X$ the random variable that is distributed the same as $X_i$, $i = 1, 2, \ldots$. All logarithms are to an arbitrary common base. The Euclidean norm is denoted by $\|\cdot\|$ and the $L_p$-norm is denoted by $\|\cdot\|_p$.

The rest of the paper is organized as follows. Section II presents the abstract Shannon lower bound together with its nonasymptotic counterpart, shows a necessary and sufficient condition for the abstract Shannon lower bound to be attained with equality, and demonstrates (6). Section III presents a nonasymptotic (in both $n$ and $d$) analysis of lattice quantization and shows an upper bound on the output entropy of lattice quantizers. Section IV presents an asymptotic analysis of the bounds in Section II and Section III. For a class of sources with memory, Section IV-A shows that lattice quantization attains the Shannon lower bound in the limit of large $n$ and small $d$. For stationary memoryless sources, Section IV-B presents a refined asymptotic analysis that quantifies how fast that limit is approached and establishes (8). For clarity of presentation, the exposition in Section III and Section IV is restricted to the MSE distortion. The generalization to non-MSE distortion measures is postponed until Section V.
II. THE ABSTRACT SHANNON LOWER BOUND AND ITS NONASYMPTOTIC COUNTERPART

A. The abstract Shannon lower bound
The (informational) rate-distortion function in (2) admits the following parametric representation.
Theorem 1 (Parametric representation of $R_X(d)$, [20]). Suppose that the following basic assumptions are satisfied.

(a) $R_X(d)$ is finite for some $d$, i.e. $d_{\min} < \infty$, where
$$d_{\min} \triangleq \inf\{d\colon R_X(d) < \infty\}. \quad (11)$$

(b) The distortion measure is such that there exists a finite set $E \subset \mathcal Y$ such that
$$\mathbb E\left[\min_{y\in E} \mathsf d(X,y)\right] < \infty. \quad (12)$$

Then, for each $d > d_{\min}$, it holds that
$$R_X(d) = \max_{g(x),\,\lambda}\left\{-\mathbb E[\log g(X)] - \lambda d\right\}, \quad (13)$$
where the maximization is over $g(x) \ge 0$ and $\lambda \ge 0$ satisfying the constraint
$$\mathbb E\left[\frac{\exp(-\lambda\,\mathsf d(X,y))}{g(X)}\right] \le 1 \quad \forall y \in \mathcal Y. \quad (14)$$

Remark 1. The maximization over $g(x) \ge 0$ in (13) can be restricted to only $0 \le g(x) \le 1$ [20]. Equality in (14) holds for $P_{Y^\star}$-a.s. $y$.

Remark 2. The $d$-tilted information (defined in (3), [3]) can be alternatively defined as
$$\jmath_X(x,d) = -\log g^\star(x) - \lambda^\star d, \quad (15)$$
where the pair $(g^\star(\cdot), \lambda^\star)$ attains the maximum in (13). So,
$$R_X(d) = \mathbb E[\jmath_X(X,d)]. \quad (16)$$
Furthermore, if the infimum in (2) is attained by some $Y^\star$, i.e. $R_X(d) = I(X;Y^\star)$, then
$$g^\star(x) = \mathbb E\left[\exp(-\lambda^\star\,\mathsf d(x, Y^\star))\right] \quad (17)$$
leads to the definition in (3).

For finite alphabet sources, a parametric representation of $R_X(d)$ is contained in Shannon's paper [5]; both Gallager's [21, Theorem 9.4.1] and Berger's [22] texts contain parametric representations of $R_X(d)$ for discrete and continuous sources. However, it was Csiszár [20] who gave rigorous proofs of (13) in the following much more general setting: $X$ belongs to a general abstract probability space, and the existence of the conditional distribution $P_{Y^\star|X}$ attaining $R_X(d)$ is not required.

Here, we leverage the result of Csiszár to state a generalization of the Shannon lower bound to abstract probability spaces. Each choice of $\lambda \ge 0$ and $g$ satisfying (14) gives rise to a lower bound to $R_X(d)$. The Shannon lower bound corresponds to a particular choice of $(\lambda, g)$.

Let $\mu$ be a measure on $\mathcal X$ such that the distribution of $X$ is absolutely continuous with respect to $\mu$. Denote the density of the distribution of $X$ with respect to $\mu$ (Radon-Nikodym derivative) by
$$f_{X\|\mu}(x) \triangleq \frac{dP_X}{d\mu}(x), \quad (18)$$
and the corresponding log-likelihood ratio by
$$\imath_\mu(x) \triangleq \log\frac{d\mu}{dP_X}(x). \quad (19)$$
The differential entropy with respect to $\mu$ can be defined as
$$h_\mu(X) \triangleq \mathbb E[\imath_\mu(X)] \quad (20)$$
$$= -D(P_X\|\mu). \quad (21)$$

If $X$ is a continuous random variable, a natural choice for $\mu$ is the Lebesgue measure. Then, the density in (18) is known as the probability density function, and $h_\mu(X)$ is simply $h(X)$, the differential entropy of $X$. If $X$ is a discrete random variable, a natural choice for $\mu$ is the counting measure. Then, the density in (18) is the probability mass function, and $h_\mu(X)$ is equal to $H(X)$, the entropy of $X$.

It is easy to verify that the choice of $\lambda$ and $g$ in Table I satisfies (14).

TABLE I: The choice of $(g(x), \lambda)$ in (13) that leads to the abstract Shannon lower bound in Theorem 2.

$$\Sigma \triangleq \sup_{y\in\mathcal Y} \int \exp(-\lambda\,\mathsf d(x,y))\, d\mu(x) = \int \exp(-\lambda\,\mathsf d(x, y_\lambda))\, d\mu(x) \quad \text{(with $y_\lambda$ attaining the supremum, when it exists)}$$
$$\frac{dP_{X|Y^\star=y}}{d\mu}(x) \triangleq \frac{\exp(-\lambda\,\mathsf d(x,y))}{\int \exp(-\lambda\,\mathsf d(x,y))\, d\mu(x)}$$
$$\phi_\mu(d) \triangleq \log\Sigma + \lambda d$$
$$g(x) = f_{X\|\mu}(x)\,\Sigma$$
$$\lambda > 0\colon \text{arbitrary}$$
The Shannon lower bound can now be generalized to abstract spaces and arbitrary distortion measures as follows.

Theorem 2 (the abstract Shannon lower bound). Fix a measure $\mu$ such that the distribution of $X$ is absolutely continuous with respect to $\mu$. For all $d > d_{\min}$,
$$R_X(d) \ge h_\mu(X) - \phi_\mu(d). \quad (22)$$

Theorem 2 provides a family of lower bounds parameterized by the choice of base measure $\mu$. In the classical Shannon lower bound, $\mu$ is the Lebesgue measure (or a counting measure, if the alphabet is discrete) and the distortion measure satisfies a symmetry condition, so that the integral in the definition of $\Sigma$ in Table I does not depend on the choice of $y$. Shannon's original derivation [17] applied to continuous sources under the mean-square error distortion, and it did not use a parametric representation of $R_X(d)$. A decade later, Pinkston [23] derived a version of the bound for finite alphabet sources with a distortion measure such that all the columns of the per-letter distortion matrix $\{\mathsf d(x,y)\}_{x,y}$ consist of the same set of entries. A generalization of the discrete Shannon lower bound to distortion measures not satisfying any symmetry conditions was put forth by Gray [24]. The bound in Theorem 2 is more general than these results and recovers them as special cases.

The right side of (22) can be made equal to $R_X(d)$ by choosing $\mu$ to satisfy
$$\imath_\mu(x) = \jmath_X(x,d). \quad (23)$$
To verify that the choice of $\mu$ in (23) results in equality in (22), observe using (20) that
$$h_\mu(X) = \mathbb E[\jmath_X(X,d)], \quad (24)$$
and that
$$\phi_\mu(d) = \log\Sigma + \lambda^\star d \quad (25)$$
$$= \sup_{y\in\mathcal Y} \log\mathbb E\left[\exp\left(-\lambda^\star\,\mathsf d(X,y) + \jmath_X(X,d)\right)\right] + \lambda^\star d \quad (26)$$
$$= 0, \quad (27)$$
where to obtain (27) we used (14), (15) and Remark 1.

The long-standing appeal of the Shannon lower bound is that one can obtain a tight bound on the rate-distortion function even without the knowledge of the distribution that attains it, as (23) demands. For an illustration of such a calculation, suppose that $\mathcal X$ is a set endowed with a group operation "+" satisfying the group axioms. Then, it makes sense to talk about $x + y$ and $x - y = x + (-y)$, where $-y$ is the inverse of $y$ (according to the group operation). Distortion measures of the form
$$\mathsf d(x,y) = \mathsf d(x - y) \quad (28)$$
are called difference distortion measures. If $\mathcal X = \mathbb R^n$ and $\mathsf d$ is a difference distortion measure, then letting $\mu$ be the Lebesgue measure, we obtain
$$\Sigma = \int \exp(-\lambda\,\mathsf d(x - y))\, dx, \quad (29)$$
regardless of the choice of $y$. So, we may set $y = 0$ and obtain the classical Shannon lower bound:
$$R_X(d) \ge \underline R_X(d) = h(X) - \phi(d) \quad (30)$$
(see Table II for the definition of $\phi(d)$). In the same fashion, if $\mathcal X$ is a discrete group, letting $\mu$ be the counting measure on $\mathcal X$, we notice that
$$\Sigma = \sum_{x\in\mathcal X} \exp(-\lambda\,\mathsf d(x - y)) \quad (31)$$
for all $y \in \mathcal X$. Therefore, we may let $y = 0$ (the identity element of group $\mathcal X$) and obtain Pinkston's variant of the Shannon lower bound [23]. See Table II.

Throughout the paper, we denote by
$$\underline R(d) = \mathbb E\left[\underline{\jmath}_X(X,d)\right] \quad (32)$$
the classical Shannon lower bound, obtained with the choice of the auxiliary measure $\mu$ in Table II. The random variable $\underline{\jmath}_X(X,d)$ is also defined in Table II.

The calculation in Table II tacitly assumes that there exists a solution $\lambda > 0$ to
$$\mathbb E[\mathsf d(Z_\lambda)] = d. \quad (33)$$
An important question is under what conditions this solution exists.
For continuous $X$, Linkov [6, Lemma 1] showed that (33) has a unique solution for all sufficiently small $d$, as long as $\mathsf d(\cdot)$ satisfies the following mild regularity conditions:

(34) $\mathsf d(r) = 0$ only at $r = 0$, and $\mathsf d(r)$ is nondecreasing;
(35) there exists $\nu > 0$ such that $\lim_{r\to 0} r^{-\nu}\,\mathsf d(r) < \infty$;
(36) $\int_{\mathbb R_+} \mathsf d(r)\exp(-\mathsf d(r))\, dr < \infty$.

TABLE II: In the case where $\mathsf d$ is a difference distortion measure, the classical Shannon lower bound is obtained by letting the base measure $\mu$ in Table I be the counting measure if $\mathcal X$ is a discrete group, and the Lebesgue measure if $\mathcal X = \mathbb R^n$.

If $\mathcal X$ is a discrete group, $X \in \mathcal X$:
$$\Sigma = \sum_{z\in\mathcal X} \exp(-\lambda\,\mathsf d(z)), \qquad P_{Z_\lambda}(z) = \frac{\exp(-\lambda\,\mathsf d(z))}{\sum_{z\in\mathcal X}\exp(-\lambda\,\mathsf d(z))},$$
$$\phi(d) = H(Z_\lambda) = \log\sum_{z\in\mathcal X}\exp(\lambda d - \lambda\,\mathsf d(z)), \qquad g(x) = P_X(x)\exp(\phi(d) - \lambda d),$$
$$\underline{\jmath}_X(x,d) \triangleq \log\frac{1}{P_X(x)} - \phi(d).$$

If $X \in \mathbb R^n$ is continuous:
$$\Sigma = \int_{\mathbb R^n} \exp(-\lambda\,\mathsf d(z))\, dz, \qquad f_{Z_\lambda}(z) = \frac{\exp(-\lambda\,\mathsf d(z))}{\int_{\mathbb R^n}\exp(-\lambda\,\mathsf d(z))\, dz},$$
$$\phi(d) = h(Z_\lambda) = \log\int_{\mathbb R^n}\exp(\lambda d - \lambda\,\mathsf d(z))\, dz, \qquad g(x) = f_X(x)\exp(\phi(d) - \lambda d),$$
$$\underline{\jmath}_X(x,d) \triangleq \log\frac{1}{f_X(x)} - \phi(d).$$

In both cases, $\lambda > 0$ is the solution to the equation $\mathbb E[\mathsf d(Z_\lambda)] = d$.
For discrete sources, we show that if
$$\mathsf d(0) = 0, \quad \mathsf d(z) > 0,\ z \ne 0, \quad (37)$$
and $|\mathcal X| = m$, then (33) has a solution for all $d \in (0, \mathbb E[\mathsf d(Z_0)]]$. Indeed, observe using (37) that
$$\mathbb E[\mathsf d(Z_0)] = \frac1m\sum_{z\in\mathcal X}\mathsf d(z), \quad (38)$$
$$\lim_{\lambda\to+\infty}\mathbb E[\mathsf d(Z_\lambda)] = 0. \quad (39)$$
Since $\mathbb E[\mathsf d(Z_\lambda)]$ is continuous as a function of $\lambda$ on $[0, +\infty)$, it follows that (33) has a solution for all $d \in (0, \mathbb E[\mathsf d(Z_0)]]$.

We proceed to list several examples of the calculation of the Shannon lower bound for difference distortion measures.
Example. In the special case of $\mathcal X = \mathbb R^n$ and mean-square error distortion, we recover the classical Shannon lower bound [17] as follows. Let $\mathsf d$ be the mean-square error distortion:
$$\mathsf d(x,y) = \frac1n\|x - y\|^2. \quad (40)$$
A straightforward calculation using Table II reveals that
$$\lambda = \frac{n}{2d}\log e, \quad (41)$$
$$\phi(d) = \frac n2\log(2\pi e d), \quad (42)$$
so if $X$ is a continuous real-valued random vector of length $n$,
$$\underline R_X(d) = h(X) - \frac n2\log(2\pi e d). \quad (43)$$
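As a quick numerical sanity check of (41)-(42) in the scalar case $n = 1$: with $\lambda = \frac{1}{2d}$ (working in nats), the density $f_{Z_\lambda}$ of Table II is the $\mathcal N(0, d)$ density, and $\phi(d) = \log\Sigma + \lambda d$ evaluates to $\frac12\ln(2\pi e d)$. The sketch below (Python) verifies this by crude numerical integration; the grid is illustrative.

```python
import math

def phi_mse(d):
    """Evaluates phi(d) = log Sigma + lambda*d from Table II for n = 1 and
    MSE distortion, in nats, with lambda = 1/(2d) per (41). Sigma is the
    integral of exp(-lambda * z^2) over R, computed by a Riemann sum."""
    lam = 1.0 / (2.0 * d)
    dz = 1e-4
    zs = (i * dz for i in range(1, 200000))          # grid over (0, 20]
    sigma = dz * (1.0 + 2.0 * sum(math.exp(-lam * z * z) for z in zs))
    return math.log(sigma) + lam * d

d = 0.05
print(phi_mse(d))                                    # numerical value
print(0.5 * math.log(2 * math.pi * math.e * d))      # closed form (42)
```

The two printed values agree to within the discretization error of the grid.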
Example. For the weighted mean-square error distortion measure
$$\mathsf d(x,y) = \frac1n\|W(x - y)\|^2, \quad (44)$$
where $W$ is an invertible $n\times n$ matrix, Shannon's lower bound is given by
$$\underline R_X(d) = h(X) - \frac n2\log(2\pi e d) + \log|\det W|. \quad (45)$$
Example. Let $\mathsf d$ be the scaled $L_p$-norm distortion:
$$\mathsf d(x,y) = n^{-\frac sp}\|x - y\|_p^s, \quad (46)$$
where $s > 0$. A direct calculation using Table II shows that Shannon's lower bound is given by
$$\underline R_X(d) = h(X) + \frac ns\log\frac1d - \frac np\log n - \log b_{n,p} + \frac ns\log\frac{n}{se} - \log\Gamma\!\left(\frac ns + 1\right), \quad (47)$$
where $b_{n,p}$ is the volume of a unit $L_p$ ball:
$$b_{n,p} \triangleq \frac{\left(2\,\Gamma\!\left(\frac1p + 1\right)\right)^n}{\Gamma\!\left(\frac np + 1\right)}. \quad (48)$$
Example. Assume that the alphabet is finite, $|\mathcal X| = m$, and consider the symbol error distortion
$$\mathsf d(z) = 1\{z \ne 0\}. \quad (49)$$
Then,
$$\underline R_X(d) = H(X) - h(d) - d\log(m - 1), \quad (50)$$
where $h(\cdot)$ denotes the binary entropy function.
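The bound (50) is easy to compare against the exact $R_X(d)$ computed by the standard Blahut-Arimoto iteration (cf. the computational approach referenced in [4]). The following sketch (Python) traces the rate-distortion curve for an illustrative 4-letter source; for $d$ below the threshold identified in Theorem 5 below, the two quantities coincide up to numerical precision. The source distribution and slope values are assumptions of the example, not part of the paper.

```python
import numpy as np

def blahut_arimoto(P_x, D, lam, iters=2000):
    """Given a slope parameter lam >= 0 and distortion matrix D (m x m),
    returns a point (d, R(d)) on the rate-distortion curve, in bits."""
    m = len(P_x)
    q = np.full(m, 1.0 / m)                  # output distribution P_Y
    A = np.exp(-lam * D)                     # exp(-lambda * d(x, y))
    for _ in range(iters):
        W = A * q                            # unnormalized P_{Y|X}
        W /= W.sum(axis=1, keepdims=True)
        q = P_x @ W                          # updated output marginal
    d = float(P_x @ (W * D).sum(axis=1))
    R = float(np.sum(P_x[:, None] * W * np.log2(W / q)))
    return d, R

m = 4
P_x = np.array([0.4, 0.3, 0.2, 0.1])
D = 1.0 - np.eye(m)                          # symbol error distortion (49)
H = -(P_x * np.log2(P_x)).sum()
for lam in (2.0, 4.0, 8.0):
    d, R = blahut_arimoto(P_x, D, lam)
    slb = H - (-d*np.log2(d) - (1-d)*np.log2(1-d)) - d*np.log2(m - 1)
    print(f"d={d:.4f}  R(d)={R:.4f}  SLB (50)={slb:.4f}")
```

Here $(m-1)\min_x P_X(x) = 0.3$, so all three printed operating points fall in the low-distortion regime where the Shannon lower bound is attained with equality.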
B. The nonasymptotic abstract Shannon lower bound

As it turns out, the abstract Shannon lower bound in Theorem 2 has a nonasymptotic kin expressed in terms of the Neyman-Pearson function. The optimal performance achievable among all randomized tests $P_{W|X}\colon \mathcal A \to \{0,1\}$ between measures $P$ and $Q$ on $\mathcal A$ is denoted by ($W = 1$ indicates that the test chooses $P$):
$$\beta_\alpha(P, Q) = \min_{P_{W|X}\colon\ P[W=1]\ge\alpha} Q[W = 1]. \quad (51)$$
Note that the Neyman-Pearson function $\beta_\alpha(P,Q)$ is well defined even if $P$ and $Q$ are not probability measures.

An $(M, d, \epsilon)$ fixed-length lossy compression code is a pair of mappings $\mathsf f\colon \mathcal X \mapsto \{1, \ldots, M\}$ and $\mathsf g\colon \{1, \ldots, M\} \mapsto \mathcal Y$ such that
$$P[\mathsf d(X, \mathsf g(\mathsf f(X))) > d] \le \epsilon. \quad (52)$$

Theorem 3.
Let $P_X$ be the source distribution defined on the alphabet $\mathcal X$. Any $(M,d,\epsilon)$ code must satisfy, for any measure $\mu$ on $\mathcal X$:

a) $$M \ge \frac{\beta_{1-\epsilon}(P_X, \mu)}{\sup_{y\in\mathcal Y}\mu[\mathsf d(X,y)\le d]}; \quad (53)$$

b) $$\epsilon \ge \sup_{\gamma>0}\left\{P\left[\imath_\mu(X) - \phi_\mu(d) \ge \log M + \gamma\right] - \exp(-\gamma)\right\}, \quad (54)$$

where $\mu[\mathsf d(X,y)\le d]$ denotes the $\mu$-measure of the set $\{x\colon \mathsf d(x,y)\le d\}$.

Proof. The inequality in (53) is due to [3, Theorem 8]. To show (54), note that for all $\zeta > 0$ (e.g. [25]),
$$\zeta\,\beta_{1-\epsilon}(P_X, \mu) \ge P[\imath_\mu(X) \ge -\log\zeta] - \epsilon. \quad (55)$$
On the other hand, by Markov's inequality, the $\mu$-volume of the distortion $d$-ball is linked to $\phi_\mu(d)$ as follows:
$$\mu[\mathsf d(X,y)\le d] = \int d\mu(x)\, 1\{\mathsf d(x,y)\le d\} \quad (56)$$
$$\le \int d\mu(x)\exp(\lambda d - \lambda\,\mathsf d(x,y)) \quad (57)$$
$$\le \sup_{y\in\mathcal Y}\int d\mu(x)\exp(\lambda d - \lambda\,\mathsf d(x,y)) \quad (58)$$
$$= \exp(\phi_\mu(d)). \quad (59)$$
Applying (55) and (59) to (53), we conclude that for all $\zeta > 0$,
$$\epsilon \ge P[\imath_\mu(X) \ge -\log\zeta] - \zeta M\exp(\phi_\mu(d)). \quad (60)$$
Re-parameterizing (60) through
$$\zeta = \frac1M\exp(-\phi_\mu(d) - \gamma), \quad (61)$$
we obtain (54).

As is clear from the proof of Theorem 3, the bound in (54) is a weakening of the bound in (53).

Note the striking parallels between Theorem 3 and the abstract Shannon lower bound in Theorem 2. Both bounds require a choice of the base measure $\mu$. The optimal binary hypothesis test in (53) is a function of the log-likelihood ratio $\imath_\mu(X)$ only, whose expectation is equal to $h_\mu(X)$, the first term in (22). Furthermore, the denominator on the right side of (53) is linked to $\phi_\mu(d)$, the second term in (22), through (59).

The similarities between Theorem 2 and Theorem 3 become even more apparent if we look at the bound in (54). In a typical application of (54), $\gamma$ is chosen so that its contribution to both terms in (54) is negligible, and thus the excess-distortion probability is bounded through the distribution of $\imath_\mu(X) - \phi_\mu(d)$ as
$$\epsilon \gtrsim P[\imath_\mu(X) - \phi_\mu(d) \ge \log M]. \quad (62)$$

As discussed above, the abstract Shannon lower bound in Theorem 2 attains its largest value, the rate-distortion function, with the choice of $\mu$ in (23). The same choice of $\mu$ in (53) and (54) leads to
$$M \ge \frac{\beta_{1-\epsilon}(P_X, \mu^\star)}{\sup_{y\in\mathcal Y}\mathbb E\left[\exp(\jmath_X(X,d))\,1\{\mathsf d(X,y)\le d\}\right]}, \qquad d\mu^\star = \exp(\jmath_X(x,d))\, dP_X, \quad (63)$$
$$\epsilon \ge \sup_{\gamma>0}\left\{P[\jmath_X(X,d) \ge \log M + \gamma] - \exp(-\gamma)\right\}. \quad (64)$$

The bound in (64) is just [3, Theorem 7]. This bound is first- and second-order optimal; that is, for memoryless sources and separable distortion measures, the converse part of the result in (1) can be recovered using (64). The bound in (63), which is new, is always at least as tight as (64), as the proof of Theorem 3 shows. In the special cases of the binary source with Hamming distortion and the Gaussian source with mean-square error distortion, (63) reduces to the bounds in [3, Theorem 20] and [3, Theorem 36]. A bound that is numerically tighter than [3, Theorem 20] and [3, Theorem 36] in some cases was recently proposed by Palzer and Timo [26]. The bound in [26] involves an optimization over an auxiliary scalar, while (63) provides the tightest known general converse bound to date that does not require an optimization over auxiliary parameters.
C. The necessary and sufficient condition

The following result pins down the necessary and sufficient conditions for equality in (22) to hold.
Theorem 4. Assume that the infimum in (2) is achieved by some $P_{Y^\star|X}$. Then, the following statements are equivalent.

A. The rate-distortion function is equal to Shannon's lower bound:
$$R_X(d) = h_\mu(X) - \phi_\mu(d). \quad (65)$$

B. For $P_X$-a.s. $x$,
$$\jmath_X(x,d) = \imath_\mu(x) - \phi_\mu(d). \quad (66)$$

C. The backward conditional distribution that achieves $R_X(d)$ satisfies, for $P_{Y^\star}$-a.s. $y$,
$$\frac{dP_{X|Y^\star=y}}{d\mu}(x) = \frac{\exp(-\lambda^\star\,\mathsf d(x,y))}{\Sigma}. \quad (67)$$

Proof. B ⇒ A is trivial. To show A ⇒ B, note that the existence of $P_{Y^\star|X}$ that achieves the infimum in (2) implies differentiability of $R_X(d)$ [20]. It follows that the maximum in (13) is attained by a unique $g(x)$ [20]. Since (65) establishes that the $g(x)$ that attains the maximum in (13) is that in Table I, (66) is immediate. To show B ⇔ C, recall the following equivalent representation of $\jmath_X(x,d)$ [3]:
$$\jmath_X(x,d) = \log\frac{dP_{X|Y^\star=y}}{dP_X}(x) + \lambda^\star\,\mathsf d(x,y) - \lambda^\star d. \quad (68)$$
Equality in (68) holds for $P_{Y^\star}$-a.s. $y$. Comparing (66) and (68), we conclude the equivalence B ⇔ C.

The necessary and sufficient conditions in Theorem 4 assume a particularly simple form for difference distortion measures and the choice of $\mu$ as in Table II. In that case, clause C can be replaced by

C′. There exists a random variable $Y^\star$ such that
$$X = Y^\star + Z_\lambda, \quad (69)$$
where $Y^\star$ is independent of $Z_\lambda$, and $Z_\lambda$ is defined in Table II.

Example. If $X$ is equiprobable on a finite group, (65) always holds. Indeed, in that case, an equiprobable $Y^\star$ satisfies (69). (That is, $P_{X|Y^\star}$ is such that $P_X P_{Y^\star|X} = P_{X|Y^\star} P_{Y^\star}$.)
Example. For a binary $X$ with bias $p$ under the Hamming distortion measure, $Z_\lambda$ is binary with bias $d$, and (65) is satisfied for all $0 \le d \le p$ by $Y^\star$ with bias $q$, where $q(1-d) + (1-q)d = p$.
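As a quick numerical companion to this example (a sketch with illustrative values), the bias of $Y^\star$ is $q = \frac{p-d}{1-2d}$, which is a valid probability exactly in the range $0 \le d \le p \le \frac12$ claimed above:

```python
# Solve q*(1 - d) + (1 - q)*d = p for the bias q of Y*; the additive
# decomposition X = Y* + Z_lambda (mod 2) of (69) requires 0 <= q <= 1,
# which holds precisely when d <= p (<= 1/2).
def bias_of_Y_star(p, d):
    q = (p - d) / (1 - 2 * d)
    assert 0.0 <= q <= 1.0, "Shannon lower bound not attained at this d"
    return q

p, d = 0.3, 0.1
q = bias_of_Y_star(p, d)
print(q, q * (1 - d) + (1 - q) * d)   # the second value reproduces p
```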
Example. The Gaussian source with mean-square error distortion satisfies the conditions of Theorem 4; indeed, $X = Y^\star + Z_\lambda$, where $X \sim \mathcal N(0, \sigma^2\mathrm I)$, $Y^\star \sim \mathcal N(0, (\sigma^2 - d)\mathrm I) \perp\!\!\!\perp Z_\lambda \sim \mathcal N(0, d\,\mathrm I)$.

Theorem 4 extends a result by Gerrish and Schultheiss [27], who showed that for the compression of a continuous random vector under the mean-square error distortion, the Shannon lower bound gives the actual value of the rate-distortion function if and only if $X$ can be written as the sum of two independent random vectors $X = Y^\star + Z$, where $Z \sim \mathcal N(0, d\,\mathrm I)$. Theorem 4 also generalizes the backward-channel condition for equality in the Shannon lower bound given in [22, Theorem 4.3.1]. Unlike these classical results, Theorem 4 applies to abstract sources and does not enforce any symmetry assumptions on the distortion measure.

Most continuous probability distributions do not meet the condition in (69). In particular, an $X$ with an indecomposable distribution cannot satisfy (69) for any difference distortion measure. In contrast, as the following result shows, for finite alphabet sources the classical Shannon lower bound (i.e. taking $\mu$ to be the counting measure) is always attained with equality, as long as the target distortion $d$ is not too large.

Theorem 5 (Pinkston [23]). Let $X \in \mathcal X$, where $\mathcal X$ is a finite alphabet. Let the distortion measure $\mathsf d\colon \mathcal X\times\mathcal X \mapsto \mathbb R_+$ be such that $\mathsf d(x,x) = 0$, $\mathsf d(x,y) > 0$ for all $x \ne y$, and all columns of the distortion matrix $\{\mathsf d(x,y)\}_{x,y}$ consist of the same set of entries (balanced distortion measure). Then, there exists a $d_c > 0$ such that the classical Shannon lower bound is satisfied with equality for all
$$0 \le d \le d_c. \quad (70)$$
Example. For the symbol error distortion, equality in (50) holds for all
$$0 \le d \le (m - 1)\min_{x\in\mathcal X} P_X(x). \quad (71)$$

Difference distortion measures satisfying (37) are included in the assumptions of Theorem 5. Generalizations of Pinkston's result are found in the works of Gray [24], [28], [29], who showed in [24] that the rate-distortion function equals the Shannon lower bound in the range of small distortions for stationary ergodic finite alphabet sources, generalizing and simplifying the proofs of Gray's previous results in [28] (binary Markov source with bit error rate distortion and the Gauss-Markov source) and [29] (finite state finite alphabet Markov sources).

Leveraging the necessary and sufficient conditions in Theorem 4, we conclude that under the conditions of Theorem 5, the $d$-tilted information is given by (6). Applying (6) to (1), we conclude that for the compression of a discrete memoryless source under a balanced distortion measure, the minimum achievable rate compatible with excess probability $\epsilon$ at distortion $d$ satisfies (10) for all $d \le d_c$.

As mentioned above, continuous sources rarely meet the classical Shannon lower bound with equality, and thus (10) does not hold in general. Nevertheless, as we will see, lattice quantization of continuous sources often approaches (10). This striking phenomenon is the major focus of the remainder of the paper. The next section introduces the topic by discussing lattice coverings of space.
III. NONASYMPTOTIC ANALYSIS OF LATTICE QUANTIZATION

A. Lattices: definitions
Let $\mathcal C$ be a non-degenerate lattice in $\mathbb R^n$:
$$\mathcal C \triangleq \{c = \mathrm G i\colon i \in \mathbb Z^n\}, \quad (72)$$
where the $n\times n$ generator matrix $\mathrm G$ is non-singular. The nearest-neighbor $\mathcal C$-quantizer is the mapping $q_{\mathcal C}\colon \mathbb R^n \mapsto \mathcal C$ defined by
$$q_{\mathcal C}(x) \triangleq \operatorname*{argmin}_{c\in\mathcal C}\|x - c\|, \quad (73)$$
and the Voronoi cell $\mathcal V_{\mathcal C}(c)$ is the set of all points quantized to $c$:
$$\mathcal V_{\mathcal C}(c) \triangleq \{x\in\mathbb R^n\colon q_{\mathcal C}(x) = c\}. \quad (74)$$
The ties in (73) are resolved so that the resulting Voronoi cells are congruent. We denote by $V_{\mathcal C}$ the cell volume of lattice $\mathcal C$:
$$V_{\mathcal C} \triangleq \mathrm{Vol}(\mathcal V_{\mathcal C}(0)) \quad (75)$$
$$= |\det\mathrm G|. \quad (76)$$
The radius of the Voronoi cell, i.e. the minimum radius of a ball containing $\mathcal V_{\mathcal C}(0)$, is called the covering radius of lattice $\mathcal C$:
$$r_{\mathcal C} \triangleq \max_{x\in\mathbb R^n}\|x - q_{\mathcal C}(x)\|. \quad (77)$$
The covering efficiency of a lattice $\mathcal C$ is measured by the normalized ratio of the volume of the ball of radius $r_{\mathcal C}$ to the volume of the Voronoi cell:
$$\rho_{\mathcal C} \triangleq \frac{r_{\mathcal C}\, b_n^{\frac1n}}{V_{\mathcal C}^{\frac1n}}, \quad (78)$$
where $b_n$ is the volume of a unit ball:
$$b_n \triangleq \frac{\pi^{\frac n2}}{\Gamma\!\left(\frac n2 + 1\right)}. \quad (79)$$
By definition,
$$\rho_{\mathcal C} \ge 1, \quad (80)$$
and the closer $\rho_{\mathcal C}$ is to $1$, the more sphere-like the Voronoi cells of $\mathcal C$ are and the better the lattice $\mathcal C$ is for covering.
B. The distribution at the output of lattice quantizers

The purpose of this subsection is to express the distribution at the output of lattice quantizers in terms of the distribution of the raw data $X$ and the sizes of the quantization cells. We will characterize both the information at the output of the quantizer, that is, the random variable
$$\imath(q_{\mathcal C}(X)) = \log\frac{1}{P_{q_{\mathcal C}(X)}(q_{\mathcal C}(X))}, \quad (81)$$
and the corresponding entropy, $H(q_{\mathcal C}(X))$, given by the expectation of (81). This characterization holds the key to studying the fundamental limits of data compression with lattices. Indeed, it is well known that $L^\star_S$, the minimum average length required to losslessly encode a discrete random variable $S$, is bounded in terms of the entropy of $S$ as [30], [31]
$$H(S) - \log(H(S) + 1) - \log e \le L^\star_S \quad (82)$$
$$\le H(S). \quad (83)$$
Likewise, in fixed-length almost lossless data compression, the probability of error $\epsilon$ is bounded in terms of the distribution of the information random variable as (e.g. [32])
$$P[\imath(S) \ge \log M + \gamma] - \exp(-\gamma) \le \epsilon \quad (84)$$
$$\le P[\imath(S) > \log M], \quad (85)$$
where $M$ is the number of distinct values at the output of the compressor.

For a lattice $\mathcal C \subset \mathbb R^n$, denote the $n$-dimensional random vector $X_{\mathcal C}$ by
$$X_{\mathcal C} \triangleq q_{\mathcal C}(X) + U_{\mathcal C}, \quad (86)$$
where the random vector $U_{\mathcal C}$ is uniform on $\mathcal V_{\mathcal C}(0)$, independent of $X$. See Fig. 1.
Fig. 1: Example: densities of $X$ and $X_{\mathcal C}$ for $n = 1$.
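As an illustration of (72)-(77) and (86), the following sketch (Python) implements the nearest-neighbor quantizer for the scaled cubic lattice $\delta\mathbb Z^n$ — the one lattice for which (73) reduces to coordinatewise rounding — and estimates $H(q_{\mathcal C}(X))$ by Monte Carlo for a Gaussian source, for comparison with the identity (89) derived below. All parameters are illustrative.

```python
import numpy as np
from collections import Counter

def quantize_cubic(x, delta):
    """Nearest-neighbor quantizer (73) for the cubic lattice delta*Z^n.
    Here the Voronoi cells are cubes, V_C = delta**n, r_C = delta*sqrt(n)/2."""
    return np.round(x / delta) * delta

rng = np.random.default_rng(0)
n, delta, N = 2, 0.5, 200_000
X = rng.standard_normal((N, n))                   # i.i.d. N(0, I_n) source
cells = Counter(map(tuple, np.round(X / delta).astype(int)))
p = np.array(list(cells.values())) / N
H_out = -(p * np.log2(p)).sum()                   # entropy of q_C(X), bits

h_X = n * 0.5 * np.log2(2 * np.pi * np.e)         # h(X) for N(0, I_n)
print(H_out, h_X - n * np.log2(delta))            # equal up to D(X||X_C) >= 0
```

The small gap between the two printed values is exactly the divergence term $D(X\|X_{\mathcal C})$ in (89), which vanishes as $\delta \to 0$.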
12 log n + O (1) . (93)It follows that the difference between the first three terms in(92) and Shannon’s lower bound is of order log n + O (1) ,as n → ∞ . The term D ( X k X C ) is the penalty due to f X not being completely flat. Intuitively, this term decreases asthe quantization cells shrink, an effect we will explore indetail shortly (see Theorem 7 and Theorem 8 below). Theterm n log ρ C is the penalty due to the lattice cells not beingperfect spheres. To understand how large this term can be,note that the thinnest lattice covering in dimensions 1 to 5is proven to be A ∗ n (Voronoi’s principal lattice of the firsttype) [33], which has covering efficiency ρ A ∗ n = b n n ( n + 1) n s n ( n + 2)12( n + 1) , (94)so for n = 1 , , . . . , , H (cid:0) q A ∗ n ( X ) (cid:1) = h ( X ) − n log √ nd + n n ( n + 2)12( n + 1)+ 12 log( n + 1) + D ( X k X C ) . (95)Actually, A ∗ n is the thinnest lattice covering known in alldimensions n ≤ . But A ∗ has covering efficiency ≈ . and is inferior to the Leech lattice Λ , for which ρ Λ ≈ . .More generally, the following result demonstrates theexistence of covering-efficient lattices. Theorem 6 (Rogers [12, Theorem 5.9]) . For each n ≥ ,there exists an n -dimensional lattice C n with coveringefficiency n log ρ C n ≤ log √ πe (log n + log log n + c ) , (96) where c is a constant. A natural question to ask next is the following: whatis the minimum H ( q C ( X )) attainable among all latticequantizers? For a distortion measure d : R n × R n R + , denote the minimum entropy at the output of a latticequantizer for the random vector X ∈ R n by L X ( d ) , inf C : d ( X,q C ( X )) ≤ d a.s. H ( q C ( X )) . (97)The definition in (97) parallels the definition of d -entropy[34]: H X ( d ) , inf q : R n R n d ( X,q ( X )) ≤ d a.s. H ( q ( X )) , (98)with the distinction that in (98) the infimization is performedover all mappings q : R n R n and not just lattice quan-tizers. For that reason, we call the function in (97) lattice d -entropy . Note that R X ( d ) ≤ H X ( d ) ≤ L X ( d ) . (99)Using (92), if h ( X ) > −∞ , we can characterize thelattice d -entropy under the mean-square error distortion (40)as, L X ( d ) = h ( X ) − n log √ nd − log b n + inf C : r C ≤√ nd { D ( X k X C ) + n log ρ C } . (100)Since the term inside the infimum in (100) is nonnegative, L X ( d ) is lower-bounded by the first three terms in (100).Applying (93) we see that these three terms are within log n + O (1) information units from Shannon’s lowerbound, so L X ( d ) ≥ h ( X ) − n log √ πed + 12 log n + O (1) . (101)On the other hand, upper-bounding the infimum in (100) bypicking any lattice C that satisfies the Rogers condition (96),we obtain L X ( d ) ≤ h ( X ) − n log √ πed + D ( X k X C ) + O (log n ) . (102)As we will see shortly, for small d , the term D ( X k X C ) isalso small. Intuitively, this means that at large n and small d , good lattice quantizers are almost as good as the bestoptimal quantizer.In the rest of this section, we explore the behavior of h ( X C ) and D ( X k X C ) . The next result, Theorem 7 below,concerns itself with the behavior of h ( X C ) in the limitof increasing point density, or vanishing cell volume. Asevident from (76), a scaling of G by V n | det G | n results inthe lattice of cell volume V . Fixing G and consideringlattices generated by V n | det G | n G , we obtain a continuum oflattices parameterized by V . We will be interested in thebehavior of h ( X C ) as V → . 
Clearly, as quantization cells become smaller, the distribution of $X_{\mathcal C}$ becomes a better approximation of the distribution of $X$. Theorem 7 below, due to Csiszár [11], formalizes this intuition by relating the differential entropy of $X_{\mathcal C}$ to the differential entropy of $X$. For a vector $x^n \in \mathbb R^n$, $\lfloor x^n\rfloor$ denotes the vector of integer parts of its components, that is, $\lfloor x^n\rfloor = (\lfloor x_1\rfloor, \ldots, \lfloor x_n\rfloor)$.

Theorem 7 (Csiszár [11]). Let $X$ be a random variable and let $\mathcal C$ be a lattice in $\mathbb R^n$. Assume that
$$H(\lfloor X\rfloor) < \infty. \quad (103)$$
For any sequence of lattices with vanishing cell volume,
$$\lim_{V_{\mathcal C}\to0} h(X_{\mathcal C}) = h(X), \quad (104)$$
where $X_{\mathcal C}$ is defined in (86).

Theorem 7 holds even if $X$ does not have a density; in that case, $h(X) = -\infty$. If $h(X) > -\infty$, using (88), we can rewrite (104) as
$$\lim_{V_{\mathcal C}\to0} D(X\|X_{\mathcal C}) = 0. \quad (105)$$
Theorem 7 also holds for the more general case of non-lattice partitions of $\mathbb R^n$ into sets of equal volume.

Assumption (103) is needed to ensure that the tails of $f_X(x)\log f_X(x)$ are well behaved. If the probability density function $f_X$ is continuous and is supported on a compact set, then one can show that (104) holds in the following elementary manner (cf. [35, Sec. 8.3]). Applying the mean value theorem to
$$f_{X_{\mathcal C}}(c - u) = \mathbb E[f_X(c - U_{\mathcal C})], \quad (106)$$
for each $c \in \mathcal C$ we note the existence of $u_c \in \mathcal V_{\mathcal C}(0)$ such that
$$f_X(c - u_c) = f_{X_{\mathcal C}}(c - u), \quad \forall u \in \mathcal V_{\mathcal C}(0). \quad (107)$$
It follows that $h(X_{\mathcal C})$ is the Riemann sum for $f_X(x)\log f_X(x)$ and the partition generated by the Voronoi cells of $\mathcal C$ labeled by $c - u_c$, $c \in \mathcal C$, that is,
$$h(X_{\mathcal C}) = \sum_{c\in\mathcal C} V_{\mathcal C}\, f_{X_{\mathcal C}}(c)\log\frac{1}{f_{X_{\mathcal C}}(c)} \quad (108)$$
$$= \sum_{c\in\mathcal C} V_{\mathcal C}\, f_X(c - u_c)\log\frac{1}{f_X(c - u_c)}. \quad (109)$$
Convergence to $h(X)$ follows by the definition of the Riemann integral.

A sufficient condition for (103) to hold is
$$\mathbb E\left[\log\left(\sqrt n + \|X\|\right)\right] < \infty, \quad (110)$$
which in turn holds for any source vector with $\mathbb E[\|X\|^\alpha] < \infty$ for some $\alpha > 0$. This sufficient condition was proved by Wu and Verdú [36, Proposition 1] for scalar $X$; Koch [37] noticed that it continues to hold in the vector case as well.

Prior to Csiszár, the validity of (104) under a more restrictive assumption was proved by Rényi [10, Theorem 4]. Csiszár [11] showed the validity of (104) under the following assumption. Suppose there exists some Borel measurable partition $\{\mathcal B_1, \mathcal B_2, \ldots\}$ of $\mathbb R^n$ into sets of finite Lebesgue measure such that the following two conditions are satisfied.

1) $$\sum_i P_X(\mathcal B_i)\log\frac{1}{P_X(\mathcal B_i)} < \infty. \quad (111)$$

2) There exist $\rho > 0$ and $s \in \mathbb N$ such that for all $k$, the distance between $\mathcal B_k$ and $\mathcal B_\ell$, $k \ne \ell$, is greater than $\rho$ for all but at most $s$ indices $\ell$.

Recently, Koch [37] showed that the rate-distortion function is infinite for all $d > 0$ if $H(\lfloor X\rfloor) = \infty$, as long as the difference distortion measure has the form $\|\cdot\|^r$, where $\|\cdot\|$ is an arbitrary norm and $r > 0$. Thus, the assumption (103) is as general as Csiszár's assumption in most cases of interest.

The strength of Theorem 7 is that it requires only a very mild assumption on the source density, namely, (103). The weakness is that it does not offer any estimate of the speed of convergence to the limit in (104) (or, equivalently, in (105)); such an estimate will be crucial in our study of the behavior of the output distribution of lattice quantizers in the limit of increasing dimension.
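The convergence (104) is easy to observe numerically. The sketch below (Python) computes $h(X_{\mathcal C}) = H(q_{\mathcal C}(X)) + \log V_{\mathcal C}$ exactly for the scalar lattice $\Delta\mathbb Z$ and a Gaussian source, whose cell probabilities are Gaussian increments; the choice of source and cell sizes is illustrative.

```python
import math

def h_XC(delta, sigma=1.0):
    """h(X_C) = H(q_C(X)) + log2(delta) for the scalar lattice delta*Z
    and X ~ N(0, sigma^2); cell k carries probability Phi((k+.5)*delta)
    - Phi((k-.5)*delta), and cells at +-k have equal mass by symmetry."""
    Phi = lambda t: 0.5 * (1 + math.erf(t / (sigma * math.sqrt(2))))
    H, k = 0.0, 0
    while True:
        pk = Phi((k + 0.5) * delta) - Phi((k - 0.5) * delta)
        if pk < 1e-15 and k > 0:
            break
        contrib = -pk * math.log2(pk) if pk > 0 else 0.0
        H += contrib if k == 0 else 2 * contrib
        k += 1
    return H + math.log2(delta)

h_X = 0.5 * math.log2(2 * math.pi * math.e)       # exact h(X), bits
for delta in (1.0, 0.5, 0.1, 0.01):
    print(delta, h_XC(delta), h_X)                # h(X_C) -> h(X) as V_C -> 0
```

The printed values decrease monotonically toward $h(X) \approx 2.047$ bits, illustrating both (104) and the nonnegativity of $D(X\|X_{\mathcal C})$.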
Naturally, for the relative entropy $D(X\|X_{\mathcal C})$ to be small, the probability density function of $X$ should not change too abruptly within a single quantization cell. The following smoothness condition will be instrumental in quantifying the variability of $f_X$.

Definition 1 ($v$-regular density). Let $v\colon \mathbb R^n \mapsto \mathbb R_+$. A differentiable probability density function $f_X$ is called $v$-regular if
$$\|\nabla f_X(x)\| \le v(x)\, f_X(x), \quad \forall x \in \mathbb R^n. \quad (112)$$

All differentiable probability density functions are $v$-regular for some $v$. The function $v(x)$ measures how fast the probability density function $f_X(x)$ varies as $x$ varies. In the extreme case of $X$ uniform on $\Omega$, an open subset of $\mathbb R^n$, we have $v(x) \equiv 0$, $\forall x \in \Omega$. In general, the function $v(x) \ge 0$ can be thought of as a measure of the distance between $f_X$ and the uniform distribution. In fact, $v(x)$ is closely related to the total variation of $f_X(x)$:
$$\mathrm{TV}(f_X) \triangleq \int_{\mathbb R^n}\|\nabla f_X(x)\|\, dx \quad (113)$$
$$\le \mathbb E[v(X)]. \quad (114)$$
Using (114), we see that a differentiable $f_X$ has finite variation if and only if (112) holds with equality for $f_X$-a.s. $x$ and some function $v$ such that $\mathbb E[v(X)] < \infty$; the total variation of $f_X$ is then given by $\mathbb E[v(X)]$.

Another way to look at (112) is to observe that at any $x$ with $f_X(x) > 0$, (112) is equivalent to
$$\|\nabla\log f_X(x)\| \le v(x)\log e. \quad (115)$$
Thus, $v(x)$ can be taken to be the norm of the gradient of the natural logarithm of $f_X(x)$, which results in equality in (115) and is thereby optimal. For example, if $X \sim \mathcal N(0, \sigma^2\mathrm I)$, then the optimal choice is $v(x) = \frac{\|x\|}{\sigma^2}$.

Since the function $v(x)$ quantifies how much the density of $X$ can change within a single quantization cell, it will be useful in bounding the entropy (and information) at the output of a lattice quantizer for $X$ in terms of the sizes of lattice quantization cells.

Definition 1 presents a generalization of a smoothness condition recently suggested by Polyanskiy and Wu [38], who considered densities satisfying (115) with
$$v(x) = c_0\|x\| + c_1 \quad (116)$$
for some $c_0 \ge 0$, $c_1 > 0$. A wide class of $(c_0\|x\| + c_1)$-regular densities is identified in [38]. In particular, the density of $B + Z$, with $B \perp\!\!\!\perp Z$ and $Z \sim \mathcal N(0, \sigma^2\mathrm I)$, is $\frac{1}{\sigma^2}\left(\|x\| + \mathbb E[\|B\|]\right)$-regular. Likewise, if the density of $Z$ is $(c_0\|x\| + c_1)$-regular, then that of $B + Z$, where $\|B\| \le b$ a.s., $B \perp\!\!\!\perp Z$, is $(c_0\|x\| + c_0 b + c_1)$-regular. Furthermore, if $X$ has a $(c_0\|x\| + c_1)$-regular density and a finite second moment, then its differential entropy is finite.

Regularity of a product density is easily established if the marginal densities are regular, as the following result details.

Proposition 1. If $f_{X^n} = f_{X_1}\cdots f_{X_n}$ and $f_{X_i}$ is $v_i$-regular, then $f_{X^n}$ is $v$-regular, where
$$v(x^n) = \left\|\left(v_1(x_1), \ldots, v_n(x_n)\right)\right\|. \quad (117)$$

Proof. Since $f_{X_i}$ is $v_i$-regular, we have
$$|f'_{X_i}(x_i)| \le v_i(x_i)\, f_{X_i}(x_i). \quad (118)$$
Therefore,
$$\|\nabla f_{X^n}(x^n)\| = \left\|\left[\frac{f'_{X_1}(x_1)}{f_{X_1}(x_1)}, \ldots, \frac{f'_{X_n}(x_n)}{f_{X_n}(x_n)}\right]\right\|\, f_{X_1}(x_1)\cdots f_{X_n}(x_n) \quad (119)$$
$$\le v(x^n)\, f_{X^n}(x^n). \quad (120)$$

We now state the main result of Section III, which provides upper bounds on $\imath(q_{\mathcal C}(x))$ and $H(q_{\mathcal C}(X))$ for regular densities.

Theorem 8.
Let $X$ be a random variable with $h(X) > -\infty$ and a $v$-regular density. Let $\mathcal C$ be a lattice in $\mathbb R^n$. Then the information random variable and the entropy at the output of the lattice quantizer can be bounded as
$$\imath(q_{\mathcal C}(x)) \le \log\frac{1}{f_X(x)} - \log V_{\mathcal C} + 2 r_{\mathcal C}\, v_{\mathcal C}(x)\log e, \quad (121)$$
$$H(q_{\mathcal C}(X)) \le h(X) - \log V_{\mathcal C} + 2 r_{\mathcal C}\,\mathbb E[v_{\mathcal C}(X)]\log e, \quad (122)$$
where $r_{\mathcal C}$ is the lattice covering radius defined in (77), and $v_{\mathcal C}(x)$ is given by
$$v_{\mathcal C}(x) = \max_{u\in\mathcal V_{\mathcal C}(q_{\mathcal C}(x))} v(u). \quad (123)$$
Furthermore, if $v(x) = v(\|x\|)$ is convex and nondecreasing, then (121) and (122) can be strengthened by replacing (123) with
$$v_{\mathcal C}(x) = \frac12 v(\|x\|) + \frac12 v(\|x\| + 2 r_{\mathcal C}), \quad (124)$$
and if $v(x) = c_0\|x\| + c_1$, then (124) particularizes as
$$v_{\mathcal C}(x) = c_0\|x\| + c_0 r_{\mathcal C} + c_1. \quad (125)$$

Proof. Observe that
$$|\log f_X(a) - \log f_X(b)| = \left|\int_0^1\left(\nabla\log f_X(t a + (1-t)b),\ a - b\right) dt\right| \quad (126)$$
$$\le \|a - b\|\log e\int_0^1 v(t a + (1-t)b)\, dt \quad (127)$$
$$\le \max_{0\le t\le1} v(t a + (1-t)b)\,\|a - b\|\log e, \quad (128)$$
where $(\cdot,\cdot)$ denotes the scalar product, and (127) holds by the Cauchy-Schwarz inequality. Using (107) and (128), and noting that any two points of the same Voronoi cell are within distance $2 r_{\mathcal C}$ of each other, we evaluate the information in $q_{\mathcal C}(x)$ as
$$\imath(q_{\mathcal C}(x)) = \log\frac{1}{f_{X_{\mathcal C}}(x)} - \log V_{\mathcal C} \quad (129)$$
$$= \log\frac{1}{f_X(x)} - \log V_{\mathcal C} + \log\frac{f_X(x)}{f_X(q_{\mathcal C}(x) - u_c)} \quad (130)$$
$$\le \log\frac{1}{f_X(x)} - \log V_{\mathcal C} + 2 r_{\mathcal C}\, v_{\mathcal C}(x)\log e, \quad (131)$$
which is equivalent to (121), and (122) is immediate upon taking the expectation of (121).

If $v(x) = v(\|x\|)$ is convex and nondecreasing, we strengthen (128) by applying Jensen's inequality to (127):
$$|\log f_X(a) - \log f_X(b)| \le \frac{\log e}{2}\left(v(\|a\|) + v(\|b\|)\right)\|a - b\| \quad (132)$$
$$\le \log e\left(v(\|a\|) + v(\|a\| + 2 r_{\mathcal C})\right) r_{\mathcal C}, \quad (133)$$
where to get (133) we used the triangle inequality, the assumption that $v$ is nondecreasing, and the fact that $\|a - b\| \le 2 r_{\mathcal C}$ for $a$ and $b$ from the same quantization cell. Modifying (131) accordingly results in a strengthening of (121) and (122) with $v_{\mathcal C}$ in (124).

Comparing (89) and (122), we see that Theorem 8 establishes
$$D(X\|X_{\mathcal C}) \le 2 r_{\mathcal C}\,\mathbb E[v_{\mathcal C}(X)]\log e. \quad (134)$$
Note that $v(x) = v_{\mathcal C}(x) \equiv 0$ if and only if $X$ is uniform. In that case, $D(X\|X_{\mathcal C}) = 0$, and the third term in (122) vanishes. Otherwise, the third term in (122) is positive. It becomes larger if $f_X$ varies noticeably within each quantization cell, and it vanishes as the sizes of the quantization cells become smaller ($r_{\mathcal C} \to 0$). Thus, Theorem 8 allows one to quantify the convergence rate in Csiszár's Theorem 7.

Succeeding a lattice quantizer $\mathcal C$ by an optimal lossless coder (see Fig. 2) and keeping only the $M$ most likely realizations of the output of the lossless coder, one obtains, according to (85), an $(M,d,\epsilon)$ code with
$$\epsilon \le P[\imath(q_{\mathcal C}(X)) > \log M]. \quad (135)$$
Fig. 2: Separated architecture of lattice quantization: a lattice quantizer followed by a lossless coder.
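Before proceeding, the following sketch (Python) checks the entropy bound (122) numerically for a scalar Gaussian source quantized on $\Delta\mathbb Z$, using the optimal $v(x) = |x|/\sigma^2$ and the particularization (125). It is a sanity check under illustrative parameters, not part of the proof.

```python
import math

sigma, delta = 1.0, 0.25
r_C = delta / 2                        # covering radius (77) of a cell
log2e = math.log2(math.e)

# exact output entropy H(q_C(X)) from Gaussian cell probabilities
Phi = lambda t: 0.5 * (1 + math.erf(t / (sigma * math.sqrt(2))))
H = 0.0
for k in range(-200, 201):
    pk = Phi((k + 0.5) * delta) - Phi((k - 0.5) * delta)
    if pk > 0:
        H -= pk * math.log2(pk)

# right side of (122): the N(0, sigma^2) density is v-regular with
# v(x) = |x| / sigma^2, so by (125) v_C(x) = (|x| + r_C) / sigma^2
h_X = 0.5 * math.log2(2 * math.pi * math.e * sigma**2)
E_vC = (sigma * math.sqrt(2 / math.pi) + r_C) / sigma**2   # E|X| = sigma*sqrt(2/pi)
bound = h_X - math.log2(delta) + 2 * r_C * E_vC * log2e

print(H, bound)    # H(q_C(X)) <= bound; the gap shrinks with delta
```

Rerunning with smaller $\Delta$ shows the slack term $2 r_{\mathcal C}\,\mathbb E[v_{\mathcal C}(X)]\log e$ vanishing linearly in $r_{\mathcal C}$, as Theorem 8 predicts.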
Applying the upper bound on $\imath(q_{\mathcal C}(X))$ in (121) to (135), we conclude that there exists an $(M,d,\epsilon)$ lattice code with
$$\epsilon \le P\left[\log\frac{1}{f_X(X)} - \log V_{\mathcal C} + 2 r_{\mathcal C}\, v_{\mathcal C}(X)\log e > \log M\right] \quad (136)$$
$$\le P\left[\log\frac{1}{f_X(X)} - \log V_{\mathcal C} + \gamma > \log M\right] + P\left[2 r_{\mathcal C}\, v_{\mathcal C}(X)\log e > \gamma\right], \quad (137)$$
where (137) holds for any $\gamma \ge 0$ by the union bound. Applying (91), (93) and (96) to (137), we obtain
$$\epsilon \le P\left[\underline{\jmath}(X,d) + \gamma + O(\log n) > \log M\right] + P\left[2 r_{\mathcal C}\, v_{\mathcal C}(X)\log e > \gamma\right], \quad (138)$$
where $\underline{\jmath}(X,d)$ is defined in Table II. Furthermore, as we will show in Section IV, under regularity conditions $\gamma = n\,O(\sqrt d)$ can be chosen so that the second term in (138) is negligible, implying that
$$\epsilon \lesssim P\left[\underline{\jmath}(X,d) + n\,O(\sqrt d) + O(\log n) > \log M\right], \quad (139)$$
which provides a matching upper bound for (62). Together, (62) and (139) say that the excess distortion probability of the best code is given roughly by the complementary cdf of $\underline{\jmath}(X,d)$ evaluated at $\log M$, the logarithm of the code size. In other words, as advertised in (4), $\underline{\jmath}(X,d)$ approximates the amount of information that needs to be stored about $X$ in order to restore it with distortion $d$.
IV. ASYMPTOTIC ANALYSIS OF LATTICE QUANTIZATION

A. First order analysis
Lattice coverings of space become more efficient as the dimension increases. In this section we study the fundamental rate-distortion tradeoffs attainable by lattice quantizers in the limit of large dimension $n$. This analysis is afforded by the bounds presented in Section II and Section III. Section IV-A presents the first-order (Shannon-type) asymptotic results comparing the behavior of lattice quantizers in the limit of infinite $n$ to the Shannon lower bound. A refined, second-order analysis quantifying how fast this asymptotic limit is approached is presented in Section IV-B below.

The rate-distortion function can be defined as follows.

Definition 2.
The rate-distortion function for the compression of a sequence of random variables $X_1, X_2, \ldots$ is defined by
$$R(d) \triangleq \limsup_{n\to\infty}\frac1n H_{X^n}(d), \quad (140)$$
where $H_{X^n}(d)$ is the $d$-entropy of the vector $X^n$, defined in (98).

The lattice rate-distortion function can be defined as follows.
Definition 3.
The lattice rate-distortion function for the compression of a sequence of random variables $X_1, X_2, \ldots$ is defined by
$$L(d) \triangleq \limsup_{n\to\infty}\frac1n L_{X^n}(d). \quad (141)$$

The operational meaning of (140) is the minimum average rate asymptotically compatible with maximal distortion $d$. Indeed, substituting $S = q(X^n)$ into (82) and (83) and dividing through by $n$, we conclude that (140) is equivalent to the operational definition
$$R(d) = \limsup_{n\to\infty}\frac1n\inf_{q\colon \mathbb R^n\mapsto\mathbb R^n,\ \mathsf d(X^n,\, q(X^n))\,\le\, d\ \text{a.s.}} L^\star_{q(X^n)}. \quad (142)$$
Similarly, (141) is equivalent to the operational definition
$$L(d) = \limsup_{n\to\infty}\frac1n\inf_{\mathcal C\colon\ \mathsf d(X^n,\, q_{\mathcal C}(X^n))\,\le\, d\ \text{a.s.}} L^\star_{q_{\mathcal C}(X^n)}. \quad (143)$$
For convenience, denote the limsup of the normalized $n$-dimensional Shannon lower bounds by
$$\underline R(d) \triangleq \limsup_{n\to\infty}\frac1n\underline R_{X^n}(d), \quad (144)$$
where recall that $\underline R_X(d)$ denotes the classical Shannon lower bound for $X$.

The first result in this section provides a characterization of the lattice rate-distortion function for sources with regular densities.

Theorem 9.
Consider a random process $X_1, X_2, \ldots$. The lattice rate-distortion function under the mean-square error distortion satisfies
$$\underline R(d) = \bar h - \frac12\log(2\pi e d) \quad (145)$$
$$\le R(d) \quad (146)$$
$$\le L(d), \quad (147)$$
where
$$\bar h \triangleq \limsup_{n\to\infty}\frac1n h(X^n). \quad (148)$$
Furthermore, suppose that the density $f_{X^n}$ is $(c_0\|x^n\| + c_1\sqrt n)$-regular with some $c_0 \ge 0$, $c_1 \ge 0$, and that there exists a constant $\alpha > 0$ such that
$$\mathbb E[\|X^n\|] \le \sqrt n\,\alpha. \quad (149)$$
Then, as $d \to 0$, the lattice rate-distortion function is upper bounded by
$$L(d) \le \underline R(d) + O(\sqrt d). \quad (150)$$

Proof. The inequality in (146) is obtained by applying (99) and the Shannon lower bound to the expression under the limsup in (140), and taking $n$ to infinity. The inequality in (150) is obtained by applying the bound
$$D(X\|X_{\mathcal C}) \le 2 r_{\mathcal C}\left(c_0\,\mathbb E[\|X\|] + c_0 r_{\mathcal C} + c_1\sqrt n\right)\log e, \quad (151)$$
which is a particularization of (134), to (102), normalizing by $n$ and taking a limsup in $n$.

Theorem 9 establishes that for a wide class of sources with sufficiently smooth densities, which includes nonstationary and nonergodic sources, the lattice rate-distortion function approaches Shannon's lower bound at a speed $O(\sqrt d)$ as $d \to 0$.

Theorem 9 implies that for sources with regular densities,
$$\lim_{d\to0}\limsup_{n\to\infty}\left[\frac1n L_{X^n}(d) + \frac12\log d\right] = \bar h - \frac12\log(2\pi e). \quad (152)$$
A weaker result, namely,
$$\limsup_{n\to\infty}\lim_{d\to0}\left[\frac1n L_{X^n}(d) + \frac12\log d\right] = \bar h - \frac12\log(2\pi e), \quad (153)$$
can be obtained under a weaker assumption on the source distribution: for (153) to hold, only $H(\lfloor X^n\rfloor) < \infty$ is required. Indeed, applying Csiszár's result (105) to (102) and taking the limit in $d$, we obtain
$$\lim_{d\to0}\left[L_X(d) + \frac n2\log d\right] = h(X) - \frac n2\log(2\pi e) + O(\log n). \quad (154)$$
Dividing by $n$ and taking $n$ to infinity leads to (153). The reason Csiszár's result in Theorem 7 is insufficient to prove (152) is that even though it establishes that $D(X\|X_{\mathcal C})$ converges to $0$ as $d \to 0$ for any fixed $n$, it leaves unaddressed the behavior of $D(X\|X_{\mathcal C})$ as $n$ grows.

Equality in (154) implies that for $X \in \mathbb R^n$ with $H(\lfloor X\rfloor) < \infty$,
$$\lim_{d\to0}\frac{L_X(d)}{\frac n2\log\frac1d} = 1, \quad (155)$$
which can be viewed as a lattice counterpart of Rényi information dimension [10].

While the result in (152) is new, statements similar to (153) and (154) are found in the existing literature. The counterpart of (153) for dithered lattice quantization is contained in Zamir's text [39]. The following result was shown by Linkov [6] and revisited, under progressively more general assumptions, by Linder and Zamir [7] and by Koch [37], who showed that as long as $H(\lfloor X\rfloor) < \infty$, it holds that
$$\lim_{d\to0}\left[R_X(d) + \frac n2\log d\right] = h(X) - \frac n2\log(2\pi e), \quad (156)$$
where $X \in \mathbb R^n$, and $R_X(d)$ is the minimal mutual information quantity defined in (2). Regarding the operational meaning of (156) in the context of $n$-dimensional quantization, we note the following observations, which highlight the difference between dithered and non-dithered quantization.

• Koch and Vazquez-Vilar [40] recently showed that if one replaces $R_X(d)$ in (156) by the minimum output entropy attainable by an $n$-dimensional quantizer operating at average distortion $d$, then the resulting limit as $d \to 0$ is strictly greater than the right side of (156).

• The reasoning in [6], [7], [37] reveals that
$$\lim_{d\to0}\left[I(X; X+Z) + \frac n2\log d\right] = h(X) - \frac n2\log(2\pi e), \quad (157)$$
where the choice of $Z$ satisfies $\mathbb E[\mathsf d(X, X+Z)] \le d$. Since, operationally, $I(X; X+Z)$ corresponds to the quantization rate (see e.g. [39]) of $X$ dithered by $Z$, there exists an $n$-dimensional dithered quantizer operating at average distortion $d$ whose rate satisfies (157).

Remark 3. If, instead of requiring that $P\left[\frac1n\|X - q_{\mathcal C}(X)\|^2 \le d\right] = 1$ as in (97), we ask only that $\mathbb E\left[\frac1n\|X - q_{\mathcal C}(X)\|^2\right] \le d$, then the analog of the result in (154) can be alternatively obtained as follows. Denote the minimum of normalized second moments over all $n$-dimensional lattices by
$$G^\star_n \triangleq \min_{\mathcal C_n}\frac{\mathbb E\left[\|U_{\mathcal C_n}\|^2\right]}{n\, V_{\mathcal C_n}^{2/n}}, \quad (158)$$
where $U_{\mathcal C_n}$ is uniform on $\mathcal V_{\mathcal C_n}(0)$.
Remark 3. If, instead of requiring that $\mathbb P\big[\frac 1n \|X - q_C(X)\|^2 \leq d\big] = 1$ as in (97), we ask only that $\mathbb E\big[\frac 1n \|X - q_C(X)\|^2\big] \leq d$, then the analog of the result in (154) can be alternatively obtained as follows. Denote the minimum of normalized second moments over all $n$-dimensional lattices by

\[ G^\star_n \triangleq \min_{C_n} \frac{\mathbb E[\|U_{C_n}\|^2]}{n\, V_{C_n}^{2/n}}, \tag{158} \]

where $U_{C_n}$ is uniform on $V_{C_n}(0)$. In [16, (25)], it is shown that $G^\star_n$ converges to $\frac{1}{2\pi e}$ at a rate

\[ \frac 12 \log(2\pi e\, G^\star_n) = O\Big(\frac{\log n}{n}\Big), \tag{159} \]

a result which Zamir and Feder attributed to Poltyrev. Let $C^\star_{n,d}$ be the lattice whose normalized second moment equals $G^\star_n$, rescaled so that its mean-square error (with respect to $X^n$) is $d$. Using $C^\star_{n,d}$ in [15, Theorem 1], one concludes that

\[ \lim_{d\to 0}\Big( H\big(q_{C^\star_{n,d}}(X)\big) + n\log\sqrt d \Big) = h(X) + \frac n2 \log G^\star_n. \tag{160} \]

Substituting (159) into (160), one obtains the same asymptotics as in (154). A gentle modification of the above argument (apparent from [16, (26)] and [15, Lemma 1]) leads to (154) for the maximal distortion criterion as well.
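As a numerical aside (not from the paper), the normalized second moment (158) is straightforward to evaluate for the scalar lattice $\mathbb Z$, for which $G_1 = 1/12$; the resulting $\frac 12\log_2(2\pi e G_1) \approx 0.2546$ bits is the familiar high-resolution penalty of scalar uniform quantization under the average distortion criterion:

```python
import numpy as np

rng = np.random.default_rng(1)

# Normalized second moment (158) of the scalar lattice Z:
# the cell is (-1/2, 1/2], volume V = 1, so G_1 = E[U^2] = 1/12.
u = rng.uniform(-0.5, 0.5, 1_000_000)
g1_mc = np.mean(u**2)                      # Monte Carlo ~ 1/12
g1 = 1.0 / 12.0
gap_bits = 0.5 * np.log2(2 * np.pi * np.e * g1)
print(g1_mc, g1, gap_bits)                 # gap ~ 0.2546 bits; per (159) it
                                           # shrinks like O(log n / n) for C*_n
```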
Second order analysis

The minimum achievable coding rate at a given blocklength and a given excess-distortion probability is defined as

\[ R(n, d, \epsilon) \triangleq \frac 1n \min\{\log M \colon \exists\, (M, d, \epsilon)\ \text{code for}\ X \in \mathbb R^n\}. \tag{161} \]

Theorem 10, stated next, provides a refined approximation to $R(n, d, \epsilon)$ at a given blocklength and a given low distortion.

Theorem 10. Let $X \in \mathbb R$ have a $(c_1|x| + c_2)$-regular density $f_X$ such that $\mathbb E[|\log f_X(X)|^3] < \infty$ and $\mathbb E[X^4] < \infty$. For the compression of the source consisting of i.i.d. copies of $X$ under the mean-square error distortion, it holds that

\[ R(n,d,\epsilon) = R(d) + \sqrt{\frac Vn}\, Q^{-1}(\epsilon) + O\big(\sqrt d\big) + O\Big(\frac{\log n}{n}\Big), \tag{162} \]

where $R(d)$ and $V$ are given by the mean and variance of

\[ \jmath_X(X, d) = \log\frac{1}{f_X(X)} - \log\sqrt{2\pi e d}, \tag{163} \]

respectively; the $O(\sqrt d)$ term in (162) is nonnegative,

\[ 0 \leq O\big(\sqrt d\big), \tag{164} \]

and the $O\big(\frac{\log n}{n}\big)$ term satisfies

\[ O\Big(\frac 1n\Big) \leq O\Big(\frac{\log n}{n}\Big) \tag{165} \]
\[ \leq \log\big(2\sqrt{2\pi e}\big)\, \frac{\log n}{n} + O\Big(\frac{\log\log n}{n}\Big). \tag{166} \]

Furthermore, (162) is attained by lattice quantization.
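To make (162)–(163) concrete, the following sketch (my own illustration; the parameter values are arbitrary) evaluates the dispersion approximation for an i.i.d. N(0,1) source, estimating the mean and variance of the $d$-tilted information (163) by Monte Carlo and dropping the remainder terms:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def rate_approx(n, d, eps, samples=1_000_000):
    """Approximate R(n, d, eps) via (162)-(163) for an i.i.d. N(0,1) source,
    without the O(sqrt(d)) and O(log n / n) remainder terms (bits/sample)."""
    x = rng.standard_normal(samples)
    # d-tilted information (163), in bits
    j = -np.log2(norm.pdf(x)) - 0.5 * np.log2(2 * np.pi * np.e * d)
    R_d = j.mean()                 # ~ 0.5*log2(1/d); exact for the Gaussian
    V = j.var()                    # ~ 0.5*(log2 e)^2, the Gaussian varentropy
    return R_d + np.sqrt(V / n) * norm.isf(eps)   # norm.isf = Q^{-1}

print(rate_approx(n=500, d=0.01, eps=0.1))
```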
Proof of the converse part: We show that as long as $X \in \mathbb R$ has a density (regularity is not required for the converse), the minimum rate required to quantize $X \in \mathbb R^n$, a vector of $n$ i.i.d. copies of $X$, is at least

\[ n R(n,d,\epsilon) \geq n R(d) + \sqrt{n V}\, Q^{-1}(\epsilon) + O(1). \tag{167} \]

The proof consists of the analysis of the converse bound in (53). Letting $\mu$ be the Lebesgue measure on $\mathbb R^n$ and letting $d$ be the mean-square error, observe that regardless of the choice of $y \in \mathbb R^n$, $\mu[d(X,y) \leq d]$ is equal to the volume of the Euclidean ball of radius $\sqrt{nd}$, i.e.

\[ \log\mu[d(X,y)\leq d] = \log b_n + n\log\sqrt{nd} \tag{168} \]
\[ = n\log\sqrt{2\pi e d} - \frac 12 \log n + O(1), \tag{169} \]

where to get (169) we invoked (93). Furthermore, the Neyman–Pearson function expands as [41, Lemma 58], [3, (251)]

\[ \log\beta_{1-\epsilon}(P_X, \mu) = n h(X) + \sqrt{n V}\, Q^{-1}(\epsilon) - \frac 12 \log n + O(1). \tag{170} \]

According to (53), for any $(M,d,\epsilon)$ code, $\log M$ is lower bounded by the difference between (170) and (169), which is exactly (167).
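The expansion (168)–(169) can be verified numerically. In the sketch below (mine), log2_ball_volume computes $\log_2 b_n$ exactly via the log-gamma function, and the difference between (168) and the leading terms of (169) indeed stays bounded as $n$ grows, which is the $O(1)$ in (169):

```python
import numpy as np
from scipy.special import gammaln

def log2_ball_volume(n):
    # log2 b_n, with b_n = pi^(n/2) / Gamma(n/2 + 1)
    return (0.5 * n * np.log(np.pi) - gammaln(0.5 * n + 1)) / np.log(2)

def lhs_rhs(n, d):
    lhs = log2_ball_volume(n) + 0.5 * n * np.log2(n * d)              # (168)
    rhs = 0.5 * n * np.log2(2 * np.pi * np.e * d) - 0.5 * np.log2(n)  # (169)
    return lhs, lhs - rhs        # the difference settles near a constant

for n in [10, 100, 1000, 10000]:
    print(n, *lhs_rhs(n, d=0.01))
```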
Proof of the achievability part: The proof consists of the analysis of the bound on the excess-distortion probability of lattice quantizers in (137). Assume that $X \in \mathbb R^n$ is a vector of $n$ i.i.d. copies of the scalar random variable $X$. According to Proposition 1, the density of $X$ is $(c_1\|x\| + c_2\sqrt n)$-regular.

First, consider the case of a non-uniform distribution: $\mathrm{Var}[\log f_X(X)] > 0$. The second term in (137) is equal to

\[ \mathbb P[2 r_C v_C(X)\log e > \gamma] = \mathbb P\big[2\sqrt{nd}\,(c_1\|X\| + c_1\sqrt{nd} + c_2\sqrt n)\log e > \gamma\big]. \tag{171} \]

Denote the constant

\[ \alpha \triangleq \mathbb E[X^2]. \tag{172} \]

By Chebyshev's inequality (using $\mathbb E[X^4] < \infty$; absorbing the constant, for all $n$ large enough),

\[ \mathbb P\big[\|X\|^2 > 2n\alpha\big] \leq \frac 1n. \tag{173} \]

So, the choice

\[ \gamma = 2n\sqrt d\, \big(c_1\sqrt{2\alpha} + c_1\sqrt d + c_2\big)\log e \tag{174} \]

ensures that the probability in (171) is upper bounded by $\frac 1n$. To analyze the first term in (137), note that according to the Berry–Esseen theorem, for all $0 < \epsilon' < 1$,

\[ \mathbb P\Big[\log\frac{1}{f_X(X)} > n h(X) + \sqrt{n\,\mathrm{Var}[\log f_X(X)]}\, Q^{-1}(\epsilon')\Big] \leq \epsilon' + \frac{B}{\sqrt n}, \tag{175} \]

where

\[ B = \frac{6\, \mathbb E\big[|\log f_X(X) + h(X)|^3\big]}{\mathrm{Var}[\log f_X(X)]^{3/2}} \tag{176} \]

is the Berry–Esseen constant, finite by the assumptions $\mathrm{Var}[\log f_X(X)] > 0$ and $\mathbb E[|\log f_X(X)|^3] < \infty$. Therefore, letting

\[ \log M = n h(X) - \log V_C + \sqrt{n\,\mathrm{Var}[\log f_X(X)]}\, Q^{-1}(\epsilon') + \gamma, \tag{177} \]

we conclude that

\[ \mathbb P\Big[\log\frac{1}{f_X(X)} - \log V_C + \gamma > \log M\Big] \leq \epsilon' + \frac{B}{\sqrt n}. \tag{178} \]

Finally, choosing $\epsilon'$ as

\[ \epsilon' = \epsilon - \frac{B}{\sqrt n} - \frac 1n, \tag{179} \]

we conclude that the sum of both terms in (137) does not exceed $\epsilon$. It follows that there exists an $(M, d, \epsilon)$ code with $M$ given in (177) and $\epsilon$ given in (179). Letting $C$ be a lattice satisfying (96) and applying (91), (93) and (96) to (177), we express (177) as

\[ \log M = n h(X) - n\log\sqrt{2\pi e d} + \sqrt{n\,\mathrm{Var}[\log f_X(X)]}\, Q^{-1}(\epsilon) + \log\big(2\sqrt{2\pi e}\big)\log n + O(\log\log n) + n\, O\big(\sqrt d\big), \tag{180} \]

which concludes the proof of the achievability part of (162) for non-uniform $X$.

If $X$ is uniform on a compact set, then $\log\frac{1}{f_X(X)} = n h(X)$ a.s., $v_C(x) \equiv 0$, and (137) implies that there exists an $(M,d,\epsilon)$ code with

\[ \epsilon = 1\{n h(X) - \log V_C > \log M\}. \tag{181} \]

Choosing

\[ \log M = n h(X) - \log V_C \tag{182} \]

results in $\epsilon = 0$. It follows that if $X$ is uniform, then there exists an $(M, d, 0)$ code with

\[ \log M = n h(X) - n\log\sqrt{2\pi e d} + \log\big(2\sqrt{2\pi e}\big)\log n + O(\log\log n). \tag{183} \]

It can be shown [2] that (167) continues to hold for finite alphabet sources. The $O(1)$ lower bound on the third-order term in (167) presents an improvement, for the cases where Shannon's lower bound is tight, over the general $-\frac 12 \log n + O(1)$ lower bound shown in [3].
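The rate choice (177)–(179) is explicit enough to tabulate. The sketch below (my illustration, for an i.i.d. N(0,1) source) evaluates the dominant terms of (180), normalized by $n$; the lattice-dependent $O(\log\log n)$ and the $n\,O(\sqrt d)$ remainders are dropped, and $n$ must be large enough that $\epsilon'$ in (179) stays positive:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

def achievable_rate_terms(n, d, eps, samples=500_000):
    """Dominant terms of (180), divided by n, for an i.i.d. N(0,1) source."""
    x = rng.standard_normal(samples)
    info = -np.log2(norm.pdf(x))            # log2(1/f_X(X)), one letter
    h, var = info.mean(), info.var()
    b = 6 * np.mean(np.abs(info - h) ** 3) / var ** 1.5   # Berry-Esseen (176)
    eps_prime = eps - b / np.sqrt(n) - 1.0 / n            # (179)
    assert eps_prime > 0, "n too small for this eps"
    return (h - 0.5 * np.log2(2 * np.pi * np.e * d)
            + np.sqrt(var / n) * norm.isf(eps_prime)
            + np.log2(2 * np.sqrt(2 * np.pi * np.e)) * np.log2(n) / n)

print(achievable_rate_terms(n=100_000, d=0.01, eps=0.1))
```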
We conclude this section with a result that provides an estimate of the speed of convergence to $R(d)$ for sources with memory. At this level of generality, even a first-order asymptotic analysis is highly nontrivial, and no second-order results exist to date. Theorem 11 below shows that for a class of sources with memory, the rate of approach to the rate-distortion function is of order $\frac{1}{\sqrt n}$. Even though Theorem 11 does not specify the constant in front of $\frac{1}{\sqrt n}$, it presents a step forward in the notoriously difficult problem of quantifying the rate-distortion tradeoffs of sources with memory. Theorem 11 is an easy implication of the approach developed in Section II and Section III.

Theorem 11. Let the random process $X_1, X_2, \ldots$ be such that the density $f_{X^n}$ is log-concave and $(c_1\|x^n\| + c_2\sqrt n)$-regular with some $c_1 \geq 0$, $c_2 \geq 0$, and that the expectation of the norm of $X^n$ is bounded as in (149). For the compression of $X_1, X_2, \ldots$ under the mean-square error distortion, it holds that

\[ R(n,d,\epsilon) = R(d) + \frac{q(\epsilon)}{\sqrt n} + O\big(\sqrt d\big) + O\Big(\frac{\log n}{n}\Big), \tag{184} \]

where $R(d)$ is the Shannon lower bound expression given in (145), and

\[ -\sqrt{\frac{1}{1-\epsilon}} \leq q(\epsilon) \leq \sqrt{\frac 1\epsilon}. \tag{185} \]

Moreover, (184) is attained by lattice quantization.
Proof of the converse part: We weaken (54) by choosing $\gamma = \frac 12 \log n$ and letting $\mu$ be the Lebesgue measure to deduce that the parameters of any $(M, d, \epsilon)$ code must satisfy the inequality

\[ \epsilon \geq 1 - \mathbb P\Big[\log\frac{1}{f_X(X)} - \phi(d) < \log M + \frac 12 \log n\Big] - \frac{1}{\sqrt n}. \tag{186} \]

For

\[ \log M = h(X) - \phi(d) - \sqrt{\frac{\mathrm{Var}[\log f_X(X)]}{1 - \epsilon' - \frac{1}{\sqrt n}}} - \frac 12 \log n, \tag{187} \]

we observe that due to Chebyshev's inequality, the probability in the right side of (186) is upper bounded by $1 - \epsilon' - \frac{1}{\sqrt n}$. We conclude that $\epsilon \geq \epsilon'$, which implies the validity of the converse part of (184) when combined with the recent result of Fradelizi et al. [42, Theorem 2.3], which states that as long as $X \in \mathbb R^n$ has a log-concave density,

\[ \mathrm{Var}[\log f_X(X)] \leq n. \tag{188} \]
Proof of the achievability part: The proof mimics the proof of the achievability part of Theorem 10, replacing the application of the Berry–Esseen theorem by Chebyshev's inequality. Namely, due to (149), (173) continues to hold. By Chebyshev's inequality,

\[ \mathbb P\Big[\log\frac{1}{f_X(X)} - h(X) \geq \sqrt{\frac{\mathrm{Var}[\log f_X(X)]}{\epsilon'}}\Big] \leq \epsilon'. \tag{189} \]

Letting

\[ \log M = h(X) - \log V_C + \sqrt{\frac{\mathrm{Var}[\log f_X(X)]}{\epsilon'}} + \gamma, \tag{190} \]
\[ \epsilon' = \epsilon - \frac 1n, \tag{191} \]

where $\gamma$ is chosen as in (174), we conclude that the sum of both terms in (137) does not exceed $\epsilon$. The proof is complete upon applying (188).
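The varentropy bound (188) of Fradelizi et al. is easy to probe numerically. A sketch of mine for two log-concave product densities, with variances reported in squared nats to match the statement of (188):

```python
import numpy as np

rng = np.random.default_rng(4)
n, samples = 50, 200_000

# i.i.d. N(0,1): ln f(X) = -n/2*ln(2*pi) - ||X||^2/2, so Var = n/2 <= n
x = rng.standard_normal((samples, n))
logf_gauss = -0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum(x**2, axis=1)

# i.i.d. Exp(1): ln f(X) = -sum(X), so Var = n <= n (equality case)
y = rng.exponential(1.0, (samples, n))
logf_exp = -np.sum(y, axis=1)

print(np.var(logf_gauss), np.var(logf_exp), n)   # ~ n/2, ~ n, budget n
```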
V. BEYOND MSE DISTORTION
Section III discussed lattices that are good for covering with respect to the Euclidean norm, and accordingly, the asymptotic analysis in Section IV focused on the mean-square error distortion. This section summarizes how to generalize those results to a wider class of distortion measures. We consider distortion measures of the form

\[ d(x, y) = d\big(n^{-1/p}\|W(x-y)\|_p\big) \tag{192} \]

where $W$ is an $n \times n$ invertible matrix, $\|\cdot\|_p$ is the $L_p$ norm in $\mathbb R^n$, $1 \leq p \leq \infty$, and $d \colon \mathbb R_+ \to \mathbb R_+$ is right-continuous (with a slight abuse of notation, $d$ denotes both the scalar function and the distortion measure). The scaling by $n^{-1/p}$ in (192) is chosen so that the distortion does not have a tendency to increase with increasing dimension $n$.

Example. The scaled weighted $L_p$ norm distortion fits the framework of (192):

\[ d(x,y) = n^{-s/p}\|W(x-y)\|_p^s, \tag{193} \]

where $s > 0$. Plugging $s = 2$ and $p = 2$ into (193), one recovers the MSE distortion measure. An interesting special case is that of the $L_\infty$ norm, which corresponds to the distortion measure

\[ d(x^n, y^n) = \max_{1\leq i\leq n} |x_i - y_i|^s. \tag{194} \]
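A literal transcription of (193) (a sketch; the particular $W$, $x$, $y$ below are arbitrary illustrations):

```python
import numpy as np

def weighted_lp_distortion(x, y, W, p=2.0, s=2.0):
    """Scaled weighted L_p distortion (193): n^(-s/p) * ||W(x-y)||_p^s.
    For p = inf, the scaling n^(-s/p) is 1, recovering (194) when W = I."""
    n = x.size
    z = W @ (x - y)
    norm = np.max(np.abs(z)) if np.isinf(p) else np.sum(np.abs(z)**p)**(1/p)
    scale = 1.0 if np.isinf(p) else n ** (-s / p)
    return scale * norm ** s

n = 4
x, y, W = np.ones(n), np.zeros(n), np.eye(n)
print(weighted_lp_distortion(x, y, W, p=2, s=2))       # MSE: 1.0
print(weighted_lp_distortion(x, y, W, p=np.inf, s=2))  # max |x_i - y_i|^2: 1.0
```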
Example. The weighted MSE distortion measure also fits the framework of (192):

\[ d(x,y) = \frac 1n \|W(x-y)\|_2^2. \tag{195} \]

Defining the nearest-neighbor quantizer and the lattice covering radius in terms of the weighted (by $W$) $L_p$ norm, rather than the Euclidean norm, one can generalize Section III to distortion measures of type (192). The maximum distortion is related to the (weighted $L_p$) covering radius as

\[ r_C = n^{1/p}\, r(d), \tag{196} \]
\[ r(d) \triangleq \inf\{r \geq 0 \colon d(r) \leq d\}. \tag{197} \]

If $d \colon \mathbb R_+ \to \mathbb R_+$ is invertible, then simply $r(d) = d^{-1}(d)$. For example, the distortion measure in (193) corresponds to $d(r) = r^s$; therefore, $r(d) = \sqrt[s]{d}$. The lattice cell volume can be expressed as

\[ \log V_C = n\log r_C + \log b_{n,p} + \log|\det W^{-1}| - n\log\rho_C, \tag{198} \]

where $b_{n,p}$ is the volume of a unit $L_p$ ball:

\[ b_{n,p} \triangleq \frac{\big(2\,\Gamma(\frac 1p + 1)\big)^n}{\Gamma(\frac np + 1)}. \tag{199} \]

A curious special case is that of the $L_\infty$ norm, which corresponds to the distortion measure in (194): since an $L_\infty$ ball is simply a cube, the cubic lattice quantizer attains the best covering efficiency $\rho_C = 1$. Substituting (198) into (89), we express the entropy at the output of the weighted $L_p$ quantizer based on lattice $C$ as

\[ H(q_C(X)) = h(X) - n\log r_C - \log b_{n,p} - \log|\det W^{-1}| + n\log\rho_C + D(X\|X_C). \tag{200} \]
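The unit-ball volume (199) and the cell-volume decomposition (198) are directly computable. In the sketch below (mine), the covering-efficiency term is left as a parameter, and logdetW stands for $\log_2|\det W|$, so $\log|\det W^{-1}|$ enters with a minus sign:

```python
import numpy as np
from scipy.special import gammaln

def log2_bnp(n, p):
    """log2 of the unit L_p ball volume (199)."""
    if np.isinf(p):
        return float(n)  # cube of side 2: volume 2^n
    return (n * (np.log(2) + gammaln(1/p + 1)) - gammaln(n/p + 1)) / np.log(2)

def log2_cell_volume(n, p, r_C, logdetW=0.0, log_rho=0.0):
    """Cell-volume decomposition (198), all quantities in log2."""
    return n * np.log2(r_C) + log2_bnp(n, p) - logdetW - n * log_rho

# n=2, p=2, unit covering radius, W = I, rho = 1: cell volume = pi (unit disk)
print(2 ** log2_cell_volume(2, 2, r_C=1.0))  # ~ 3.1416
```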
By Stirling's approximation, as $n \to \infty$, (199) expands as

\[ \log b_{n,p} = n\log c_p - \frac np \log n - \frac 12 \log n + O(1), \quad p < \infty, \tag{201} \]
\[ \log b_{n,\infty} = n\log c_\infty = n\log 2, \tag{202} \]

where

\[ c_p \triangleq 2\,\Gamma\Big(\frac 1p + 1\Big)(pe)^{1/p}, \quad p < \infty, \tag{203} \]
\[ c_\infty \triangleq \lim_{p\to\infty} c_p = 2. \tag{204} \]

To study the covering efficiency of lattices with respect to $d$, we invoke the following result of Rogers, which complements Rogers' Theorem 6:

Theorem 12 (Rogers [12, Theorem 5.8], generalization of Theorem 6). For each $n \geq 3$, there exists an $n$-dimensional lattice $C_n$ with covering efficiency (with respect to any norm)

\[ n\log\rho_{C_n} \leq \log n\,(\log_2\log n + c), \tag{205} \]

where $c$ is a constant.

When particularized to the Euclidean norm, Theorem 12 presents a weakened version of Theorem 6. It follows from (200), (201), (202) and Theorem 12 that for $1 \leq p \leq \infty$, the lattice $d$-entropy with respect to the distortion measure $d$ satisfies, as $n \to \infty$,

\[ L_X(d) \geq h(X) - n\log r(d) - n\log c_p - \log|\det W^{-1}| + \frac 12 \log n + O(1), \tag{206} \]
\[ L_X(d) \leq h(X) - n\log r(d) - n\log c_p - \log|\det W^{-1}| + D(X\|X_C) + O(\log n). \tag{207} \]

Plugging $r(d) = \sqrt d$, $c_2 = \sqrt{2\pi e}$ and $W = \mathrm I$ into (206) and (207), one recovers the corresponding bounds for the mean-square error distortion, namely, (101) and (102).

It is enlightening to compare (206) with Shannon's lower bound. For the distortion measure in (193), a direct calculation using Table II shows that Shannon's lower bound is given by, for $n \to \infty$,

\[ R_X(d) = h(X) + \frac ns \log\frac 1d - \frac np \log n - \log b_{n,p} + \frac ns \log\frac{n}{se} - \log\Gamma\Big(\frac ns + 1\Big) - \log|\det W^{-1}| \tag{208} \]
\[ = h(X) + \frac ns \log\frac 1d - n\log c_p - \log|\det W^{-1}| + O(1), \tag{209} \]

which up to terms of order $\log n + O(1)$ is the same as a particularization of (206) to the distortion in (193). More generally, if $d(\cdot)$ is differentiable at $0$ and $0 < d'(0) < \infty$, then by Taylor's approximation,

\[ r(d) = \frac{d}{d'(0)} + o(d). \tag{210} \]

If $d'(0) = \ldots = d^{(s-1)}(0) = 0$ and $0 < d^{(s)}(0) < \infty$, then

\[ r(d) = \sqrt[s]{\frac{s!\,d}{d^{(s)}(0)}} + o\big(\sqrt[s]{d}\big). \tag{211} \]

Suppose further that $d(\cdot)$ in the right side of (192) satisfies Linkov's regularity conditions (34)–(36). Then, [6, Corollaries 1, 2] imply that

\[ R_X(d) = h(X) + \frac ns \log\frac{d^{(s)}(0)}{s!\,d} - n\log c_p - \log|\det W^{-1}| + n\,o(1) + O(1), \tag{212} \]

where $o(1)$ denotes a term that vanishes (uniformly in $n$) as $d \to 0$, and $O(1)$ denotes a term that is bounded by a constant. Again, up to the remainder terms, this coincides with (206).

Next, we study the behavior of (207), which requires the following notion of regularity with respect to a weighted $L_q$ distance: a differentiable probability density function $f_X$ is $v$-regular if

\[ \|W^{-\top}\nabla f_X(x)\|_q \leq v(x)\, f_X(x), \quad \forall x \in \mathbb R^n. \tag{213} \]
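A small check of the constants (203)–(204) introduced above (my own sketch): $c_2$ should equal $\sqrt{2\pi e}$, the value that makes (206)–(207) reduce to the MSE bounds (101)–(102), and $c_p$ should tend to $c_\infty = 2$:

```python
import numpy as np
from scipy.special import gamma

def c_p(p):
    """The constant (203): c_p = 2*Gamma(1/p + 1)*(p*e)^(1/p)."""
    return 2 * gamma(1/p + 1) * (p * np.e) ** (1/p)

print(c_p(2), np.sqrt(2 * np.pi * np.e))   # both ~ 4.1327
for p in [1, 2, 8, 64, 1024]:
    print(p, c_p(p))                       # tends to c_inf = 2
```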
Theorem 8 generalizes as follows:

Theorem 13. Let $1 \leq p \leq \infty$ and let $\frac 1p + \frac 1q = 1$. Let $X$ be a random variable with $h(X) > -\infty$ and a $v$-regular density (according to (213)). Let $C$ be a lattice in $\mathbb R^n$. Then the information random variable and the entropy at the output of the lattice quantizer formed for the distortion in (192) can be bounded as in (121) and (122), respectively, with $V_C$ in (198). Furthermore, if $v(x) = v(\|Wx\|_p)$ is convex and nondecreasing, then (121) and (122) can be strengthened by replacing (123) with

\[ v_C(x) = \frac 12 v(\|Wx\|_p) + \frac 12 v(\|Wx\|_p + 2 r_C), \tag{214} \]

and if $v(x) = c_1\|Wx\|_p + c_2$, then (214) particularizes as

\[ v_C(x) = c_1\|Wx\|_p + c_1 r_C + c_2. \tag{215} \]
Proof. The reasoning leading up to (128) is adjusted as follows:

\[ |\log f_X(a) - \log f_X(b)| = \left| \int_0^1 \big\langle W^{-\top}\nabla\log f_X(ta + (1-t)b),\, W(a-b) \big\rangle\, dt \right| \tag{216} \]
\[ \leq \|W(a-b)\|_p \log e \int_0^1 v(ta + (1-t)b)\, dt \tag{217} \]
\[ \leq \max_{0\leq t\leq 1} v(ta + (1-t)b)\, \|W(a-b)\|_p \log e, \tag{218} \]

where (217) is by Hölder's inequality. The proof of (214) and (215) is identical to the proof of (124) and (125).

We are now prepared to state the generalizations of the asymptotic results in Section IV to non-MSE distortion measures. Theorem 9 generalizes to the distortion measure in (192) as follows.
Theorem 14. Consider a random process $X_1, X_2, \ldots$ and a sequence of distortion measures given by (192) with $W = W_n$ such that the limit

\[ \omega \triangleq \lim_{n\to\infty} \frac 1n \log|\det W_n^{-1}| \tag{219} \]

exists and is finite. The lattice rate-distortion function satisfies

\[ \bar h - \log r(d) - \log c_p - \omega \leq L(d), \tag{220} \]

where $\bar h$ is defined in (148). Furthermore, suppose that $p \geq 1$ and that the density $f_{X^n}$ is $(c_1\|W_n x^n\|_p + c_2 n^{1/p})$-regular with some $c_1 \geq 0$, $c_2 \geq 0$, and that there exists a constant $\alpha > 0$ such that

\[ \mathbb E[\|W_n X^n\|_p] \leq n^{1/p}\alpha. \tag{221} \]

Then, as $d \to 0$, the lattice rate-distortion function is upper bounded by

\[ L(d) \leq \bar h - \log r(d) - \log c_p - \omega + O(r(d)). \tag{222} \]
Proof. The lower bound in (220) follows from (206). To show (222), we apply the bound

\[ D(X\|X_C) \leq 2 n^{1/p} r(d)\, \big(c_1 \mathbb E[\|W_n X\|_p] + c_1 n^{1/p} r(d) + c_2 n^{1/p}\big)\log e, \tag{223} \]

which is a particularization of (134), to (102), we normalize by $n$ and we take a limsup in $n$.

Using (209), we see that for the distortion measure in (193), Shannon's lower bound is given by

\[ R(d) = \bar h + \frac 1s \log\frac 1d - \log c_p - \omega, \tag{224} \]

which coincides with (220). More generally, if $d(\cdot)$ satisfies Linkov's conditions (34)–(36), observe using (210), (211) and (212) that

\[ R(d) = \bar h - \log r(d) - \log c_p - \omega + o(1), \quad d \to 0. \tag{225} \]

It follows from Theorem 14 that for a large class of distortion measures, the lattice rate-distortion function approaches the Shannon lower bound as $d \to 0$:

\[ L(d) = R(d) + o(1), \quad d \to 0. \tag{226} \]

Theorem 10 generalizes as follows.

Theorem 15 (Generalization of Theorem 10). Let $p \geq 1$. Assume that the density of $X$ satisfies

\[ |f'_X(x)| \leq (c_1|x| + c_2) f_X(x), \quad \forall x \in \mathbb R, \tag{227} \]

where $c_1 \geq 0$ and $c_2 \geq 0$, and that $\mathbb E[|\log f_X(X)|^3] < \infty$ and $\mathbb E[|X|^{2q}] < \infty$, where $\frac 1p + \frac 1q = 1$. Consider a sequence of distortion measures of type (192) with $d(\cdot)$ satisfying Linkov's conditions (34)–(36), and $W = W_n$ such that

\[ \frac 1n \log|\det W_n^{-1}| = \omega + O\Big(\frac{\log n}{n}\Big) \tag{228} \]

for some $\omega \in \mathbb R$, and such that the minimum singular value of $W_n$ is bounded below by some $\sigma > 0$. For the compression of the source consisting of i.i.d. copies of $X$ under such a distortion measure, it holds that

\[ R(n,d,\epsilon) = R(d) + \sqrt{\frac Vn}\, Q^{-1}(\epsilon) + o(1) + O\Big(\frac{\log n}{n}\Big), \tag{229} \]

where $R(d)$ and $V$ are given by the mean and the variance of

\[ \jmath_X(X, d) = \log\frac{1}{f_X(X)} - \log r(d) - \log c_p - \omega, \tag{230} \]

respectively, and $o(1)$ denotes a term that vanishes uniformly in $n$ as $d \to 0$. For $d$ in (193), $o(1)$ can be refined to $O\big(\sqrt[s]{d}\big)$. Lattice quantization attains (229).

Proof. The only observation required for the proof of Theorem 10 to apply is the following. If $X$ is a vector of $n$ i.i.d. copies of $X$, by (227) and Proposition 1 it holds that

\[ \|\nabla f_X(x)\|_q \leq \big(c_1\|x\|_q + c_2 n^{1/q}\big) f_X(x) \tag{231} \]

(note that Proposition 1 applies to any norm). It follows that

\[ \|W_n^{-\top}\nabla f_X(x)\|_q \leq \frac 1\sigma \Big(\frac{c_1}{\sigma}\|W_n x\|_q + c_2 n^{1/q}\Big) f_X(x), \tag{232} \]

that is, $X$ has a regular density in the sense of (213), and Theorem 13 can be applied in the same manner in which Theorem 8 is used in the proof of Theorem 10.

Note that (229) does not require the distortion measure to be separable.
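To illustrate (224) (my own example, not worked out in the paper): take $s = p = 1$, $W = \mathrm I$ (so $\omega = 0$) and an i.i.d. unit-scale Laplacian source, for which $h = \log_2(2e)$; then (224) returns $R(d) = \log_2(1/d)$, which is the exact rate-distortion function of the Laplacian under absolute-error distortion for $d \leq 1$, an instance where the Shannon lower bound is tight:

```python
import math
import numpy as np

def slb_224(h_bar, d, s, p, omega=0.0):
    """Shannon lower bound (224) for the distortion family (193), in bits."""
    c_p = 2 * math.gamma(1/p + 1) * (p * math.e) ** (1/p)
    return h_bar + np.log2(1/d) / s - np.log2(c_p) - omega

# i.i.d. Laplacian (unit scale) under absolute error: s = p = 1, h = log2(2e)
h_lap = np.log2(2 * np.e)
for d in [0.5, 0.1, 0.01]:
    print(d, slb_224(h_lap, d, s=1, p=1), np.log2(1 / d))  # SLB = exact R(d)
```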
VI. CONCLUSION

Shannon's lower bound provides a powerful tool for the study of the rate-distortion function. We started the discussion by presenting an abstract Shannon lower bound in Theorem 2 and its nonasymptotic analog in Theorem 3. Theorem 4 states the necessary and sufficient condition for the Shannon lower bound to be attained exactly. According to Pinkston's Theorem 5, all finite alphabet sources with balanced distortion measures satisfy that condition for a range of low distortions. Whenever the Shannon lower bound is attained exactly, the $d$-tilted information in $x$ admits a simple representation as the difference between the information in $x$ and a term that depends only on the tolerated distortion $d$ (see (66)). This implies in particular that the rate-dispersion function of a discrete memoryless source with a balanced distortion measure is given simply by the varentropy of the source, as long as the target distortion is low enough.

Although continuous sources rarely attain Shannon's lower bound exactly, they often approach it closely at low distortions. For a class of sources whose densities satisfy a smoothness condition, Theorem 8 presents a new bound on the output entropy of lattice quantizers in terms of the differential entropy of the source and the size of the lattice cells. The gap between the lattice achievability bound in Theorem 8 and the Shannon lower bound can be explicitly bounded in terms of the target distortion, the source dimension and the lattice covering efficiency. Theorem 8 also presents a bound on the information random variable at the output of lattice quantizers. That latter bound is particularly useful for quantifying the nonasymptotic fundamental limits of lattice quantization.

Leveraging the bound in Theorem 8, we evaluated the best performance theoretically attainable by variable-length lattice quantization of general (i.e. not necessarily ergodic or stationary) real-valued sources in the limit of large dimension (Theorem 9). For high-resolution quantization of stationary memoryless sources whose densities satisfy a smoothness condition, we showed a Gaussian approximation expansion of the minimum achievable source coding rate (Theorem 10). The appeal of the new expansion is its explicit nature and a simpler form compared to the more general result in [3]. Going beyond memoryless sources, we showed that for a class of sources with memory, the Shannon lower bound is approached at a speed $O\big(\frac{1}{\sqrt n}\big)$ with increasing blocklength (Theorem 11). The engineering implication is that as long as the dimension $n$ is not too small and the target distortion is not too large, the separated architecture of a lattice quantizer followed by a lossless coder displayed in Fig. 2 is nearly optimal. Using a lattice with covering efficiency $\rho_C$ induces a penalty of $\log\rho_C$ on the attainable nonasymptotic coding rate. If the simplest uniform scalar quantizer is used, the penalty due to its covering inefficiency is still only $\frac 12 \log\frac{\pi e}{2} \approx 1.05$ bits per sample.
VII. ACKNOWLEDGEMENT

I would like to thank the Simons Institute for the Theory of Computing in Berkeley for providing the nurturing environment where this work started amid the discussions in the Spring of 2014; Dr. Tamás Linder for his insightful comments, and in particular for describing an alternative way to obtain (154), now included in Remark 3; Dr. Tobias Koch for pointing out the references [37], [40]; and Dr. Yury Polyanskiy for mentioning the result in [42].
REFERENCES

[1] V. Kostina, "Data compression with low distortion and finite blocklength," in Proceedings 53rd Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, Oct. 2015.
[2] ——, "When is Shannon's lower bound tight?" in Proceedings 54th Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, Oct. 2016.
[3] V. Kostina and S. Verdú, "Fixed-length lossy compression in the finite blocklength regime," IEEE Transactions on Information Theory, vol. 58, no. 6, pp. 3309–3338, June 2012.
[4] R. Blahut, "Computation of channel capacity and rate-distortion functions," IEEE Transactions on Information Theory, vol. 18, no. 4, pp. 460–473, Jul. 1972.
[5] C. E. Shannon, "Coding theorems for a discrete source with a fidelity criterion," IRE Int. Conv. Rec., vol. 7, no. 1, pp. 142–163, Mar. 1959; reprinted with changes in Information and Decision Processes, R. E. Machol, Ed. New York: McGraw-Hill, 1960, pp. 93–126.
[6] Y. N. Linkov, "Evaluation of ε-entropy of random variables for small ε," Problems of Information Transmission, vol. 1, pp. 18–26, 1965.
[7] T. Linder and R. Zamir, "On the asymptotic tightness of the Shannon lower bound," IEEE Transactions on Information Theory, vol. 40, no. 6, pp. 2026–2031, Nov. 1994.
[8] J. H. Conway and N. Sloane, "Fast quantizing and decoding algorithms for lattice quantizers and codes," IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 227–232, Mar. 1982.
[9] A. Gersho and R. M. Gray, Vector quantization and signal compression. Springer Science & Business Media, 2012, vol. 159.
[10] A. Rényi, "On the dimension and entropy of probability distributions," Acta Mathematica Academiae Scientiarum Hungarica, vol. 10, no. 1–2, pp. 193–215, Mar. 1959.
[11] I. Csiszár, "Generalized entropy and quantization problems," in Trans. Sixth Prague Conf. Inform. Theory, Statist. Decision Functions, Random Processes. Prague, Czechoslovakia: Akademia, Sep. 1971, pp. 159–174.
[12] C. A. Rogers, Packing and covering. Cambridge University Press, 1964, no. 54.
[13] A. Gersho, "Asymptotically optimal block quantization," IEEE Transactions on Information Theory, vol. 25, no. 4, pp. 373–380, Jul. 1979.
[14] R. Zamir and M. Feder, "On universal quantization by randomized uniform/lattice quantizers," IEEE Transactions on Information Theory, vol. 38, no. 2, pp. 428–436, Mar. 1992.
[15] T. Linder and K. Zeger, "Asymptotic entropy-constrained performance of tessellating and universal randomized lattice quantization," IEEE Transactions on Information Theory, vol. 40, no. 2, pp. 575–579, Mar. 1994.
[16] R. Zamir and M. Feder, "On lattice quantization noise," IEEE Transactions on Information Theory, vol. 42, no. 4, pp. 1152–1159, Jul. 1996.
[17] C. E. Shannon, "Probability of error for optimal codes in a Gaussian channel," Bell Syst. Tech. J., vol. 38, no. 3, pp. 611–656, Mar. 1959.
[18] W. R. Bennett, "Spectra of quantized signals," Bell Syst. Tech. J., vol. 27, pp. 446–472, 1948.
[19] P. Zador, "Asymptotic quantization error of continuous signals and the quantization dimension," IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 139–149, Mar. 1982.
[20] I. Csiszár, "On an extremum problem of information theory," Studia Scientiarum Mathematicarum Hungarica, vol. 9, no. 1, pp. 57–71, Jan. 1974.
[21] R. Gallager, Information theory and reliable communication. John Wiley & Sons, Inc., New York, 1968.
[22] T. Berger, Rate distortion theory. Prentice-Hall, Englewood Cliffs, NJ, 1971.
[23] J. Pinkston, "An application of rate-distortion theory to a converse to the coding theorem," IEEE Transactions on Information Theory, vol. 15, no. 1, pp. 66–71, Jan. 1969.
[24] R. M. Gray, "Information rates of stationary ergodic finite-alphabet sources," IEEE Transactions on Information Theory, vol. 17, no. 5, pp. 516–523, Sep. 1971.
[25] Y. Polyanskiy, "6.441: Information theory lecture notes," Dep. Electrical Engineering and Computer Science, M.I.T., 2012.
[26] L. Palzer and R. Timo, "A converse for lossy source coding in the finite blocklength regime," in Proceedings International Zurich Seminar on Communications, Mar. 2016, pp. 15–19.
[27] A. Gerrish and P. Schultheiss, "Information rates of non-Gaussian processes," IEEE Transactions on Information Theory, vol. 10, no. 4, pp. 265–271, Oct. 1964.
[28] R. Gray, "Information rates of autoregressive processes," IEEE Transactions on Information Theory, vol. 16, no. 4, pp. 412–421, July 1970.
[29] ——, "Rate distortion functions for finite-state finite-alphabet Markov sources," IEEE Transactions on Information Theory, vol. 17, no. 2, pp. 127–134, Mar. 1971.
[30] N. Alon and A. Orlitsky, "A lower bound on the expected length of one-to-one codes," IEEE Transactions on Information Theory, vol. 40, no. 5, pp. 1670–1672, Sep. 1994.
[31] A. Wyner, "An upper bound on the entropy series," Information and Control, vol. 20, no. 2, pp. 176–181, Mar. 1972.
[32] S. Verdú, "ELE528: Information theory lecture notes," Princeton University, 2009.
[33] J. H. Conway and N. J. A. Sloane, Sphere packings, lattices and groups. Springer Science & Business Media, 2013, vol. 290.
[34] E. C. Posner, E. R. Rodemich, and H. Rumsey, "Epsilon-entropy of stochastic processes," The Annals of Mathematical Statistics, vol. 38, no. 4, pp. 1000–1020, Aug. 1967.
[35] T. M. Cover and J. A. Thomas, Elements of information theory, 2nd ed. John Wiley & Sons, 2012.
[36] Y. Wu and S. Verdú, "Rényi information dimension: Fundamental limits of almost lossless analog compression," IEEE Transactions on Information Theory, vol. 56, no. 8, pp. 3721–3748, Aug. 2010.
[37] T. Koch, "The Shannon lower bound is asymptotically tight," IEEE Transactions on Information Theory, vol. 62, no. 11, pp. 6155–6161, Nov. 2016.
[38] Y. Polyanskiy and Y. Wu, "Wasserstein continuity of entropy and outer bounds for interference channels," IEEE Transactions on Information Theory, vol. 62, no. 7, pp. 3992–4002, July 2016.
[39] R. Zamir, Lattice Coding for Signals and Networks: A Structured Coding Approach to Quantization, Modulation, and Multiuser Information Theory. Cambridge University Press, 2014.
[40] T. Koch and G. Vazquez-Vilar, "Rate-distortion bounds for high-resolution vector quantization via Gibbs's inequality," arXiv preprint arXiv:1507.08349, 2015.
[41] Y. Polyanskiy, H. V. Poor, and S. Verdú, "Channel coding rate in the finite blocklength regime," IEEE Transactions on Information Theory, vol. 56, no. 5, pp. 2307–2359, May 2010.
[42] M. Fradelizi, M. Madiman, and L. Wang, "Optimal concentration of information content for log-concave densities," in High Dimensional Probability VII. Springer, 2016, pp. 45–60.
Victoria Kostina (S'12–M'14) joined Caltech as an Assistant Professor of Electrical Engineering in the fall of 2014. She holds a Bachelor's degree from Moscow Institute of Physics and Technology (2004), where she was affiliated with the Institute for Information Transmission Problems of the Russian Academy of Sciences, a Master's degree from the University of Ottawa (2006), and a PhD from Princeton University (2013). Her PhD dissertation on information-theoretic limits of lossy data compression received the Princeton Electrical Engineering Best Dissertation award. She is also a recipient of the Simons-Berkeley research fellowship (2015). Victoria Kostina's research spans information theory, coding, wireless communications and control.