Coded Computing with Noise
Royee Yosibash and Ram Zamir
EE-Systems Department, Tel Aviv University, Israel
Email: [email protected], [email protected]
Abstract—Distributed computation is a framework used to break down a complex computational task into smaller tasks and distribute them among computational nodes. Erasure-correction codes have recently been introduced and have become a popular workaround to the well-known "straggling nodes" problem, in particular by matching linear coding to linear computation tasks. We observe that decoding tends to amplify the computation "noise", i.e., the numerical errors at the computation nodes. We use noise amplification as a performance measure to compare various erasure-correction codes, and in particular polynomial codes (of which Reed–Solomon codes and other popular codes are a subset). We show that noise amplification can be significantly reduced by a clever selection of the sampling points and powers of the polynomial code.
Index Terms—distributed computation, erasure codes, polynomial codes, noise amplification, DFT, frames, difference set, equiangular tight frames, Jacobi/MANOVA distribution.
I. INTRODUCTION
In recent years, algorithms have struggled with the running time of large-scale, computationally complex tasks that require many consecutive calculations. A common practice for decreasing the running time of such algorithms is to use a large distributed system comprised of individual computational nodes (described in detail in Section III). However, these large systems present new challenges that the system designer has to mitigate. One of the more significant challenges is the "stragglers" – computational nodes that, unexpectedly, have a significantly higher response time than their non-straggling counterparts. These straggler nodes create a computational bottleneck that delays the final computational product needed from all the system's nodes. Taking this uncertainty into account calls for the system engineer to find some sort of "back-up" scheme in order to ensure high-quality service. One such method is implementing a coding technique taken from the realm of information and coding theory. In information-theoretic terms, these straggler nodes are considered as "erasures" – a symbol in a transmitted stream that, instead of arriving correctly through the channel, is lost, and the receiver knows only through side information that the symbol's real value is unknown. The purpose of Section IV is to explain how these erasures and a coding scheme fit into a distributed computing model. Some of the codes that have been studied include general maximum distance separable (MDS) codes [6], Reed–Solomon (RS) or Bose–Chaudhuri–Hocquenghem (BCH) codes [9], and general polynomial codes [7]. In order to evaluate which code is best suited for distributed computation, many research groups have chosen performance measures such as computational complexity of recovery [7] or average run time [6] in order to show that a certain code is a good solution.

This paper discusses another perspective of coding and decoding that was left rather unexplored: noisy calculation. Section V proposes a model that introduces some form of noise to the computation due to finite word-length. The existence of such noise raises the question of the noise amplification arising in the decoding process, which becomes even more complex under the assumption of stragglers. Using aspects of frame theory and random matrix theory, this paper presents how the decoding scheme amplifies the "noisy" computations returned from the computational nodes, with respect to the random aspect induced by stragglers. In Section VI we discuss recent theoretical results in frame theory and random matrix theory that provide frames serving as good/bad benchmarks for the noise amplification performance measure. We also present how new variations of polynomial codes over the complex field can be constructed with design guidelines taken from these benchmarks. In Section VII we demonstrate via simulations that the noise amplification of these codes follows the theoretical expectations, and that one of the suggested types of polynomial codes has near-identical amplification to the benchmark for good noise amplification.

II. NOTATION
In order not to cause confusion due to coinciding notations between frame theory and code theory, the notation in this paper is set as defined in this section. Any one-dimensional variable $a$ is written with a roman letter. A vector or set of one-dimensional elements $\mathbf{v}$ is written in bold lowercase letters. All vectors are represented as column vectors, so for vectors $\dim(\mathbf{v}) = h \times 1$. A matrix $\mathbf{A} = \{A_{j,i}\}$, in bold capital letters, is defined as having $\dim(\mathbf{A}) = h \times \ell$. The parameter $\gamma$ is called the frame aspect ratio, or the redundancy ratio, and is defined for $\mathbf{A}$ as $\gamma = \frac{h}{\ell}$. The $\ell$ column vectors of $\mathbf{A}$ are denoted $\mathbf{a}_i$, and the vector cross-correlation is defined as

$$c_{i_1,i_2} = \langle \mathbf{a}_{i_1}, \mathbf{a}_{i_2} \rangle = \sum_{j=1}^{h} A^{*}_{j,i_1} A_{j,i_2} \qquad (1)$$

where $^{*}$ is the notation for conjugate transpose (complex conjugation for scalars). Matrices/frames are said to have "unit-norm columns" if $c_{i,i} = 1$ for all $i \in [\ell]$. If all $c_{i,i} \neq 0$, then a matrix/frame can be transformed into one with unit-norm columns by dividing each $\mathbf{a}_i$ by $\sqrt{c_{i,i}}$.
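To make the notation concrete, here is a minimal NumPy sketch (the matrix and its dimensions are arbitrary placeholders, not anything from the paper) that computes the cross-correlations of equation (1) and transforms a matrix into one with unit-norm columns.

```python
import numpy as np

rng = np.random.default_rng(0)
h, l = 6, 4                      # dim(A) = h x l, aspect ratio gamma = h / l
A = rng.normal(size=(h, l)) + 1j * rng.normal(size=(h, l))

# Cross-correlations of equation (1): C[i1, i2] = <a_i1, a_i2> = sum_j conj(A[j, i1]) * A[j, i2]
C = A.conj().T @ A

# Transform A into a unit-norm-column matrix by dividing each column a_i by sqrt(c_{i,i})
norms = np.sqrt(np.real(np.diag(C)))
A_unit = A / norms               # broadcasting divides column i by norms[i]

print(np.allclose(np.diag(A_unit.conj().T @ A_unit), 1.0))   # True: c_{i,i} = 1 for all i
```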
III. THE NOISELESS SETUP

We consider a distributed computation setup in which a function $f$ is to be computed over a data set $A$, defined over some arbitrary field, $A \in \mathbb{F}^{h \times \ell}$. The output of $f$ over the data set $A$ lies in the same field but may have different dimensions. The task of computing $f(A)$ is too complex for a single node, so it is broken down into $m$ simpler tasks, denoted $g_1, \dots, g_m$, each operating on $A$ or on a subset of elements of $A$, denoted $A_1, \dots, A_m$. The master node utilizes the worker nodes by sending the functions $g_i$ with the corresponding subsets $A_i$ to the nodes, so that each node $i$ receives $g_i$ and $A_i$. Each node computes the simpler task $g_i(A_i)$ and returns the answer to the master node. The restriction on the function set $\{g_i\}$ and the subsets $\{A_i\}$ is that the master node has to be able to reconstruct $f(A)$ from $\{g_i(A_i)\}$ alone. A block diagram of a distributed computation setup is given in Figure 1.

Fig. 1. The distributed computation setup without coding
In a distributed computation setup, a coding scheme is a function that operates on the $m$ subsets $\{A_i\}$ to create $n > m$ new subsets, denoted $\{A'_i\}$, and $n$ new functions $\{g'_i\}$. These subsets and functions are sent in the same manner to $n$ nodes. After receiving the answers, the decoding scheme is a function that operates on $\{g'_i(A'_i)\}$ and converts them back into $\{g_i(A_i)\}$, from which the master node computes $f(A)$. The decoder might not need all $n$ answers, and may be able to retrieve all $\{g_i(A_i)\}$ from only $k$ answers, $m \le k < n$. In order to simplify, we only discuss cases in which all $g_i$'s are identical and unchanged when performed on $A'_i$; therefore, the coding scheme only operates on $\{A_i\}$. The encoding and decoding schemes slightly alter the setup described in Figure 1, and the new setup is described in Figure 2.

In order to satisfy the prerequisite of reconstructing all $\{g(A_i)\}$ from the returned $\{g(A'_i)\}$, without creating a coding scheme custom made for each function $g$, the function and the coding scheme have to be commutative operators. For this reason, we restrict our coding schemes to linear codes and the function $g$ to be linear, and hence commutative with these linear transforms of the subsets $\{A_i\}$.

Fig. 2. The distributed computation setup with coding

The linear transformation of the $m$ elements of $\{A_i\}$ to the $n$ elements of $\{A'_i\}$ can be represented in matrix form by the code generator matrix $F^T$, with $\dim(F^T) = n \times m$ and $A_i \in \mathbb{F}^{h \times \ell}$:

$$F^T \left[A_1, \cdots, A_m\right]^T = \left[A'_1, \cdots, A'_n\right]^T \qquad (2)$$

A good example of a problem that might benefit from a distributed computation solution is the matrix multiplication problem: for some matrix $A$ with $\dim(A) = h \times \ell$ and a vector $\mathbf{x}$ with $\dim(\mathbf{x}) = h \times 1$, the distributed model uses $n$ computational nodes in order to calculate $A^T\mathbf{x}$. The master node sends each node $i$ some matrix $T_i$ along with the vector $\mathbf{x}$, and receives back a result in the form of a vector $\mathbf{r}_i$. After gathering enough answers from the nodes to compute the product $A^T\mathbf{x}$, the master considers the computation as completed successfully. A "naïve" approach, which does not yet take stragglers into account, is to divide the columns of $A$ (the rows of $A^T$) into $n$ equal parts, creating $n$ matrices that satisfy $A^T = [A_1^T, \cdots, A_n^T]^T$, with each matrix $A_i$ having $\dim(A_i) = h \times \lceil \ell/n \rceil$ (padding with zeros if $n$ does not divide $\ell$). Each computational node $i$ then calculates $\mathbf{r}_i = A_i^T\mathbf{x}$ and transmits the result back to the master node. In turn, the master node stacks all the results to get $A^T\mathbf{x} = [(\mathbf{r}_1)^T, \cdots, (\mathbf{r}_n)^T]^T$. Assuming the nodes calculate at the same pace and neglecting communication delays, it is easy to see that the master node has increased the speed of the calculation of $A^T\mathbf{x}$ by a factor of $n$. To this point, the motivation for coding was not obvious; the introduction of stragglers is what makes coding an important part of distributed computation.
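As a minimal numerical sketch of the naïve splitting described above (with arbitrary illustrative dimensions; the variable names are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
h, l, n = 8, 12, 3                       # arbitrary illustrative sizes, n divides l here
A = rng.normal(size=(h, l))
x = rng.normal(size=(h, 1))

# Naive (uncoded) split: divide the columns of A (rows of A^T) among the n nodes
blocks = np.split(A, n, axis=1)          # A_i with dim(A_i) = h x (l/n)

# Each node i computes r_i = A_i^T x; the master stacks the partial results
r = [Ai.T @ x for Ai in blocks]
result = np.vstack(r)

print(np.allclose(result, A.T @ x))      # True: the stacked answers equal A^T x
```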
IV. THE ERASURE MODEL

An erasure, as described before, is the event of a computational node not returning a response in due time (or at all), so that the decoder deems it unresponsive. This can be seen as the decoder having side information on which of the $n$ nodes are valid and which are erased. The number of remaining nodes is denoted $k$, and the set of node indices that have not been erased is denoted $\mathbf{k} \subseteq [n]$.

Returning to the model of computing $f(A)$ using computational nodes, the introduction of stragglers creates a major drawback of this distributed setup: the calculation time is lower bounded by the slowest node. If any node is straggling, and all nodes are needed to complete the calculation (the naïve approach), then the effectiveness of this method is dramatically reduced. In a system with a very large $n$, the desired run time, which would have decreased with $n$, is hindered by this lower bound if the system suffers at least one straggler node (the probability of which increases with $n$). As alluded to before, one way to mitigate the straggler problem is to use a coding scheme with erasure-resilient codes.

Continuing with the example of the matrix multiplication problem, it is easy to show a simple coding scheme that mitigates a single straggler: for $n = 3$ computational nodes tasked to solve $A^T\mathbf{x}$, and assuming that no more than one node will straggle, we recreate $A^T\mathbf{x}$ by dividing $A^T$ into $A^T = [A_1^T, A_2^T]^T$ and then encoding these two matrices as three matrices with the following erasure code:

$$F = \begin{bmatrix} I_{\dim(A_i)} & 0_{\dim(A_i)} & I_{\dim(A_i)} \\ 0_{\dim(A_i)} & I_{\dim(A_i)} & I_{\dim(A_i)} \end{bmatrix} \qquad (3)$$

$$F^T \left[A_1^T, A_2^T\right]^T = \left[A_1^T, A_2^T, \left(A_1^T + A_2^T\right)\right]^T = \left[A_1'^T, A_2'^T, A_3'^T\right]^T \qquad (4)$$

It is easy to see that after receiving any two out of the three calculation results $\left[A_1'^T\mathbf{x}, A_2'^T\mathbf{x}, A_3'^T\mathbf{x}\right]$, one can recreate both $A_1^T\mathbf{x}$ and $A_2^T\mathbf{x}$ and then output the solution of the original task, $A^T\mathbf{x} = [A_1^T\mathbf{x}, A_2^T\mathbf{x}]$.

It is useful to represent erasures as an operator that acts on $F$. The operator uses the $(n - k)$ elements of $\mathbf{k}' = \{x \in [n] : x \notin \mathbf{k}\}$ and nullifies the columns of $F$ with the same indices (equivalently, the rows with the same indices in $F^T$). This operator is simply a "column retain matrix" $P$, which right-multiplies $F$ to create the equivalent code generator matrix $F_s = F \cdot P$. The matrix $P$ is constructed by taking a $k \times k$ identity matrix and inserting rows of all zeros so that $P$ has zero rows at the indices of the elements of $\mathbf{k}'$. One can also create a slightly different $P$ matrix by taking an $n \times n$ identity matrix and changing every column whose index is in $\mathbf{k}'$ into a column of all zeros. This alternative keeps the erased columns as zeroed columns instead of omitting them; while it is an equivalent representation of erasures, the zeroed columns are redundant and have no use in the decoding scheme. The new frame created by the erasures has a new sub-frame aspect ratio of $\beta = \frac{k}{m}$.

As mentioned before, the decoder does not necessarily know how many nodes will straggle. We could also assume that the erasure events form an i.i.d. Bernoulli process with probability $p$, from which it immediately follows that the expected value of $k$ is

$$E[k] = n \cdot (1 - p) \qquad (5)$$

In the latter sections we compare frames, and so we hold $k$ constant; the random variation is in which elements comprise the set $\mathbf{k}$.
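The following sketch, under the same arbitrary dimensions as before, illustrates the single-straggler code of equations (3)-(4) together with a column-retain matrix $P$; the surviving set is a hypothetical choice for illustration.

```python
import numpy as np

def column_retain(n, survivors):
    """n x k column-retain matrix P: right-multiplying F by P keeps the surviving columns."""
    return np.eye(n)[:, sorted(survivors)]

rng = np.random.default_rng(2)
h, l = 8, 12
A = rng.normal(size=(h, l))
x = rng.normal(size=(h, 1))
A1, A2 = np.split(A, 2, axis=1)                      # A^T = [A_1^T, A_2^T]^T

# Encoded tasks per equation (4): A'_1 = A_1, A'_2 = A_2, A'_3 = A_1 + A_2
tasks = [A1, A2, A1 + A2]
answers = [Ai.T @ x for Ai in tasks]                 # what the three worker nodes return

# Block version of F (equation (3)) and its erased counterpart F_s = F P
I = np.eye(l // 2)
F = np.block([[I, np.zeros_like(I), I],
              [np.zeros_like(I), I, I]])             # dim(F) = l x (3l/2)
survivors = [0, 2]                                   # node 1 straggles (any 2 of 3 suffice)
P = column_retain(3, survivors)
F_s = F @ np.kron(P, np.eye(l // 2))                 # column retain, acting block-wise

# Decode the two surviving answers back to the original task with the pseudo-inverse of F_s^T
received = np.vstack([answers[i] for i in survivors])
decoded = np.linalg.pinv(F_s.T) @ received
print(np.allclose(decoded, A.T @ x))                 # True: the original task is recovered
```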
V. THE NOISY SETUP

Prior to this paper, many papers have discussed the straggler problem in the framework described above. We choose to discuss another perspective that was left unexplored: noisy computation. Assume a simple model of a distributed computation system designed with an encoding method chosen to better handle stragglers. After the task is transmitted to each computational node, node $i$ returns the output $\mathbf{u}_i = A_i\mathbf{x}$ with some noise, denoted $\mathbf{z}$. While the computation noise might not be independent of the value of $\mathbf{u}_i$, we choose to approximate the noise as an additive i.i.d. process in order to simplify the model. The returned transmission is therefore

$$\mathbf{r}_i = \mathbf{u}_i + \mathbf{z} = A_i\mathbf{x} + \mathbf{z} \qquad (6)$$

After gathering enough $\mathbf{r}_i$'s, the reconstruction of the original task is done, under a high-SNR assumption, by using the pseudo-inverse of the encoding matrix:

$$A_{dec} = \left((F_s^T)^{*} F_s^T\right)^{-1} (F_s^T)^{*} \qquad (7)$$

The noise amplification is defined as the MSE divided by the variance of the i.i.d. noise; it equivalently defines the degradation in SNR. Therefore, when decoding with the pseudo-inverse of the encoding matrix, the noise amplification is

$$\mathrm{Noise\ Amplification} = \frac{MSE}{\sigma_z^2} = \frac{1}{k}\,\mathrm{trace}\!\left(A_{dec}A_{dec}^{*}\right) = \frac{1}{k}\,\mathrm{trace}\!\left(\left((F_s^T)^{*}F_s^T\right)^{-1}\right) \qquad (8)$$
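A minimal sketch of the pseudo-inverse decoder (7) and of the noise-amplification formula (8) as stated above; the frame and the surviving set are random illustrative choices, and the $1/k$ normalization simply follows (8).

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 4, 8, 6

# A unit-norm-column frame F (m x n); F^T is the code generator matrix
F = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))
F /= np.linalg.norm(F, axis=0)

survivors = sorted(rng.choice(n, size=k, replace=False))
F_s = F[:, survivors]                        # erased frame, dim m x k

# Pseudo-inverse decoder of equation (7)
FsT = F_s.T
A_dec = np.linalg.inv(FsT.conj().T @ FsT) @ FsT.conj().T

# Noise amplification per equation (8)
amplification = np.real(np.trace(A_dec @ A_dec.conj().T)) / k
print(amplification)
```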
Lemma 1 [2]. The noise amplification of any unit-norm-column matrix $A$ with $\dim(A) = h \times \ell$ is lower bounded by its frame aspect ratio $\gamma$.

Because the noise amplification is directly related to $\mathrm{trace}\big(((F_s^T)^{*}F_s^T)^{-1}\big)$, we choose to make all the frames discussed unit-norm, in order to later compare them properly under the noise amplification performance measure. It is important to note that transforming a code generator matrix into a unit-norm one has no effect on the "wellness" of the coding scheme, as the "new" generator matrix admits a near-identical decoding scheme to the original one (with the non-normalized generator matrix).

Proof.
Dividing all columns of the generator matrix by their respective normalization factors $\sqrt{c_{i,i}}$ is equivalent to creating a new source of $m$ elements – each new element being the same element as in the previous source set, only scaled by the normalization factor of the corresponding column index. So if the "new" source elements can be decoded by the original coding scheme, then recovering the original source elements from the decoded "new source" is trivial – which proves that the operation retains a viable decoding scheme. ∎

From this point forward, all code generator matrices discussed, unless specified otherwise, are assumed to have unit-norm columns.
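The argument above can be illustrated with a short sketch: decoding with the unit-norm generator returns the source elements scaled by the known column norms, and undoing that scaling recovers the original source (all matrices below are arbitrary illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 6
FT = rng.normal(size=(n, m)) * np.array([1.0, 2.5, 0.3])   # generator F^T with non-unit column norms
src = rng.normal(size=(m, 1))                               # m source elements

codeword = FT @ src                                         # encode with the original generator

# Unit-norm version of the generator and the induced "new source"
norms = np.linalg.norm(FT, axis=0)
FT_unit = FT / norms                                        # divide column j by its norm
new_src = np.linalg.pinv(FT_unit) @ codeword                # decoded "new source" = norms * src

recovered = new_src / norms[:, None]                        # undo the scaling -> original source
print(np.allclose(recovered, src))                          # True
```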
VI. CODES AS FRAMES

A. Known frames
When choosing noise amplification as the performance measure of one type of frame or another, it is important to give examples of the types of frames that serve as benchmarks for good/bad performance in this measure. Frames based on choosing a subset $\mathbf{s} \subseteq [n]$ of the rows of the $DFT(n)$ matrix can have very different noise amplifications after erasures. The first example is choosing $\mathbf{s} = [m]$ (the first $m$ consecutive rows) of the DFT matrix. The pattern created is recognized as the matrix form of a low-pass filter, as the truncated $(n - m)$ rows of the $DFT(n)$ matrix are those that give weight to the $(n - m)$ highest frequencies. A band-pass/notch filter frame is created in the same way, by choosing $\mathbf{s}$ as a consecutive series in $[n]$ with a cyclic shift modulo $n$. All these frames have the same Gram matrix, because they are identical up to scaling factors that are roots of unity (and cancel out in the Gram matrix), and so they have the same noise amplification. These frames also turn out to be very noise-amplifying [2], [8], [10].

A better frame in terms of noise amplification is a matrix $A$ with $\dim(A) = m \times n$ whose entries are i.i.d. Gaussian random variables with variance $1/m$. Any sub-matrix formed by any $k$ of the $n$ columns has, asymptotically for large $m$, columns of norm 1. It has been shown in [11] that the eigenvalue distribution of the Gram matrix of this frame converges to the Marchenko–Pastur (MP) density

$$f_{MP}(x) = \frac{\sqrt{(x - \lambda^{MP}_{-})(\lambda^{MP}_{+} - x)}}{2\pi\beta x}\cdot \mathbb{1}_{(\lambda^{MP}_{-},\,\lambda^{MP}_{+})} \qquad (9)$$

$$\lambda^{MP}_{\pm} = \left(1 \pm \sqrt{\beta}\right)^2 \qquad (10)$$

As noted in [2], the noise amplification of this type of frame is asymptotically $(\beta - 1)^{-1}$. While this result is better than the low-pass filter, we can improve on both by using the DFT matrix but choosing $\mathbf{s}$ to be a difference set. This sub-frame of the DFT was proved to be an ETF (Equiangular Tight Frame) [12]. It has been suggested that the Gram-matrix eigenvalues of ETFs are asymptotically distributed according to the MANOVA distribution [3]:

$$f_{MANOVA}(x) = \frac{\sqrt{(x - r_{-})(r_{+} - x)}}{2\pi\beta x (1 - \gamma x)} \cdot \mathbb{1}_{(r_{-},\, r_{+})} \qquad (11)$$

$$r_{\pm} = \left(\sqrt{(1 - \gamma)\beta} \pm \sqrt{1 - \gamma\beta}\right)^2 \qquad (12)$$

Notice that for $\gamma \to 0$ the MANOVA distribution converges to the MP distribution. ETFs seem to have better noise amplification [3] than most (if not all) frames with the same dimensions, and so they are a benchmark for good noise amplification. In [1] it is shown that the eigenvalue distribution of a random selection of rows $\mathbf{s} \subseteq [n]$ of the DFT matrix converges almost surely to the MANOVA distribution.
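As a small numerical illustration of these benchmarks, the sketch below compares the noise amplification of equation (8) for two sub-frames of $DFT(n)$ under random erasures: one built from $m$ consecutive rows (low-pass) and one built from a difference set of rows. The parameters $n = 21$, $m = 5$, $k = 10$ and the $(21, 5, 1)$ difference set $\{3, 6, 7, 12, 14\}$ are our illustrative choices, not values used in the paper.

```python
import numpy as np

def dft_subframe(n, rows):
    """Unit-norm-column frame made of the selected rows of the DFT(n) matrix."""
    omega = np.exp(2j * np.pi / n)
    F = omega ** np.outer(np.array(rows), np.arange(n))    # |rows| x n
    return F / np.linalg.norm(F, axis=0)

def noise_amplification(F, survivors):
    """Equation (8) for the erased frame F_s (columns of F indexed by the survivors)."""
    FsT = F[:, survivors].T
    gram = FsT.conj().T @ FsT
    return np.real(np.trace(np.linalg.inv(gram))) / len(survivors)

n, m, k = 21, 5, 10
lowpass_rows = list(range(m))                  # consecutive rows: low-pass frame
diffset_rows = [3, 6, 7, 12, 14]               # a (21, 5, 1) difference set -> ETF [12]

rng = np.random.default_rng(5)
amp = {"low-pass": [], "difference set": []}
for _ in range(500):
    survivors = rng.choice(n, size=k, replace=False)
    amp["low-pass"].append(noise_amplification(dft_subframe(n, lowpass_rows), survivors))
    amp["difference set"].append(noise_amplification(dft_subframe(n, diffset_rows), survivors))

for name, vals in amp.items():
    print(name, np.mean(vals))                 # the difference-set frame amplifies far less on average
```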
B. Polynomial codes represented as frames

Now that the motivation for discussing erasure codes and the noise amplification arising from these codes is clear, let us define and discuss the following codes and their generator matrices:
Definition 1. Polynomial code
Given two parameters $(n, m) \in \mathbb{N}$ and two sets, $\mathbf{s} \in \mathbb{F}^n$ (here $\mathbb{F}$ is either some Galois field or an infinite field) and $\mathbf{z} \subseteq [n-1] \cup \{0\}$ with $m$ elements, a polynomial code is a linear transformation of $m$ elements $A_j$ in the field $\mathbb{F}^{h \times \ell}$, in the following manner:

$$A'_i = \sum_{j=1}^{m} A_j\, s_i^{z_j} \qquad (13)$$

where the $A'_i$ are $n$ encoded elements defined over the same field as the $A_j$. The $n$ elements of $\mathbf{s}$ are also called the sample points, and the $m$ elements of $\mathbf{z}$ are also called the polynomial powers. This linear transformation can also be described in matrix form:

$$\begin{bmatrix} A'_1 \\ \vdots \\ A'_n \end{bmatrix} = \underbrace{\begin{bmatrix} s_1^{z_1} & s_1^{z_2} & \cdots & s_1^{z_m} \\ \vdots & \vdots & & \vdots \\ s_{n-1}^{z_1} & s_{n-1}^{z_2} & \cdots & s_{n-1}^{z_m} \\ s_n^{z_1} & s_n^{z_2} & \cdots & s_n^{z_m} \end{bmatrix}}_{F^T_{PC}} \begin{bmatrix} A_1 \\ \vdots \\ A_m \end{bmatrix} \qquad (14)$$

While polynomial codes are defined over an arbitrary field, we will continue to discuss only polynomial codes defined over the complex field. It is important to note that all sample points in $\mathbf{s}$ must be distinct and that $0 \notin \mathbf{s}$. Also, notice that if $\mathbf{z} = [m-1] \cup \{0\}$ then $F_{PC}$ is a Vandermonde matrix; otherwise it is a generalized Vandermonde matrix [4]. In the case of $\mathbf{z} = [m-1] \cup \{0\}$, the code is the well-known Reed–Solomon code over the complex field.

We will now define and discuss a few key examples in this family of polynomial codes:
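A short sketch of the generator matrix of equations (13)-(14); the sample points and powers below are arbitrary placeholders chosen only to show the construction.

```python
import numpy as np

def polynomial_code_generator(s, z):
    """Generalized Vandermonde generator F_PC^T of equation (14): entry (i, j) = s_i ** z_j."""
    s = np.asarray(s, dtype=complex)
    z = np.asarray(z)
    return s[:, None] ** z[None, :]          # n x m

n, m = 6, 3
s = np.exp(2j * np.pi * np.arange(n) / n)    # n distinct nonzero sample points
z = np.array([0, 1, 2])                      # consecutive powers -> Vandermonde (Reed-Solomon over C)
F_PC_T = polynomial_code_generator(s, z)

A = np.arange(m) + 1.0                       # m (scalar) source elements, for illustration
encoded = F_PC_T @ A                         # A'_i = sum_j A_j * s_i ** z_j, as in (13)
print(encoded.shape)                         # (n,)
```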
Definition 2. Polynomial code with uniform sampling of the unit circle (USPC)
We define a polynomial code with uniform sampling as a polynomial code with $\mathbf{z}$ consisting of $m$ consecutive members of $[n-1] \cup \{0\}$ and with the sample points in $\mathbf{s}$ given by

$$s_j = \exp\!\left(\frac{2\pi i}{n}\cdot(j-1)\right) = \omega^{\,j-1} \qquad (15)$$

For $\mathbf{z} = [m-1] \cup \{0\}$, these samples create the following code generator matrix:

$$F^T_{USPC} = \frac{1}{\sqrt{m}} \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & \omega & \cdots & \omega^{m-1} \\ \vdots & \vdots & & \vdots \\ 1 & \omega^{n-1} & \cdots & \omega^{(n-1)(m-1)} \end{bmatrix} \qquad (16)$$

If $\mathbf{z} \neq [m-1] \cup \{0\}$ (but still consecutive), then each row $i$ of the frame is the corresponding row of $F^T_{USPC}$ multiplied by a unimodular factor $s_i^{z_1}$; this scaling does not affect the Gram matrix, so the unit-norm transform of the frame is equivalent, in terms of noise amplification, to the unit-norm transform of $F^T_{USPC}$. Notice that the frame defined in equation (16) is identical to the $DFT(n)$ matrix with the latter $(n - m)$ columns omitted and then multiplied by $\sqrt{n/m}$ (due to the frame normalization factor). As mentioned in Section VI-A, this is recognized as a low-pass filter, and as a frame it has been shown to be very noise-amplifying.
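The stated relation between the unit-norm USPC generator (16) and the truncated $DFT(n)$ matrix can be checked directly; $n$ and $m$ below are arbitrary, and the DFT matrix is written with the same sign convention as (16).

```python
import numpy as np

n, m = 12, 5
omega = np.exp(2j * np.pi / n)

# Equation (16): unit-norm USPC generator (n x m)
F_uspc_T = omega ** np.outer(np.arange(n), np.arange(m)) / np.sqrt(m)

# Unitary DFT(n) matrix (same sign convention as (16)) with the latter (n - m) columns omitted
W = omega ** np.outer(np.arange(n), np.arange(n)) / np.sqrt(n)
print(np.allclose(F_uspc_T, np.sqrt(n / m) * W[:, :m]))    # True
```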
Definition 3. Polynomial code with non-uniform sampling of the unit circle (NUSPC)
We define a polynomial code with non-uniform sampling of the unit circle as a variation of the code with uniform sampling. The NUSPC is defined by the parameters $(n, m, b, r) \in \mathbb{N}$ and a set $\mathbf{y}$ of $\frac{r}{b}$ elements with $\mathbf{y} \subseteq [r-1] \cup \{0\}$. With these parameters, the NUSPC sample point set is

$$\mathbf{s} = \left\{ \omega^{\frac{y_j + r\cdot\alpha}{b}} : y_j \in \mathbf{y},\ \alpha \in \mathbb{N} \right\} \qquad (17)$$

Notice that the constraint on the number of elements in $\mathbf{y}$ is imposed so that the number of unique sample points in $\mathbf{s}$ remains $n$. We can also see that a USPC is a special case of a NUSPC, created by choosing $y_j$ that are all equally spaced in $[r-1] \cup \{0\}$. This means the NUSPC also has the potential to be very noise-amplifying.
Definition 4. Polynomial code with uniform sampling of the unit circle and non-consecutive powers
The two codes defined in Definitions 2 and 3 vary the choice of sample points. This choice creates variations/irregularities in the dimension that is subject to erasures. We would also like to introduce irregularities in the dimension that is not subject to erasures. We define this type of polynomial code as one that uses the same samples as in Definition 2, but chooses the set $\mathbf{z}$ so that it does not contain a consecutive series in $[n-1] \cup \{0\}$. The discussion in Subsection VI-A implies the following lemma:

Lemma 2. If $\mathbf{z}$ is a difference set, and the sample points are as described in (15), then this polynomial code is an ETF.

More generally, we expect that even if $\mathbf{z}$ is chosen wisely yet is not a difference set (e.g., randomly), the code will still have noise amplification that is close to the good benchmark of ETFs.
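A quick numerical check of Lemma 2 on a small case: with $n = 7$, $m = 3$ and the difference set $\mathbf{z} = \{1, 2, 4\}$ in $\mathbb{Z}_7$ as the powers (an illustrative choice on our part), all pairs of distinct frame vectors have the same absolute inner product, i.e., the frame is equiangular.

```python
import numpy as np

n, m = 7, 3
z = np.array([1, 2, 4])                        # a (7, 3, 1) difference set in Z_7
omega = np.exp(2j * np.pi / n)

# Unit-norm polynomial-code frame: column i is (1/sqrt(m)) * (omega**(i*z_1), ..., omega**(i*z_m))
F = omega ** np.outer(z, np.arange(n)) / np.sqrt(m)    # m x n

G = np.abs(F.conj().T @ F)                     # absolute values of the Gram matrix
off_diag = G[~np.eye(n, dtype=bool)]
print(np.allclose(off_diag, off_diag[0]))      # True: equiangular
print(off_diag[0], np.sqrt((n - m) / (m * (n - 1))))   # both equal the Welch bound
```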
VII. NUMERICAL RESULTS

In order to show the performance of the codes described in Section VI-B, we show empirical results for codes with $m = 50$ and $n = m/\gamma$, with $\gamma$ in a wide range. The number of erased nodes out of the total number of nodes was set to a fixed fraction (a constant $k/n$ ratio). For each choice of $m$, $\gamma$, and code type, 200 random codes fitting the description of a valid code in Definitions 2, 3 and 4 were drawn, and the noise amplification was averaged over the trials. The best noise amplification of a code with the given set of $m$, $\gamma$ was then chosen. The comparison is shown in the graph in Figure 3.

Fig. 3. Comparison of noise amplification vs. $\gamma^{-1}$ for different polynomial codes

As theorized in Subsection VI-B, it is clear that the polynomial code created by choosing non-consecutive powers outperforms polynomial codes with consecutive powers. We also see that, as expected, the code created by choosing non-consecutive powers closely mimics the noise amplification of a matrix with eigenvalues drawn from the MANOVA probability density function. In relation to the discussion around Definition 4, we conclude that introducing irregularities in the dimension that is not subject to erasures is preferable to only introducing irregularities in the same dimension as the erasures. This phenomenon is interesting and should be investigated in future work.

We also see that using non-uniform sampling of the unit circle does not seem to have a clear advantage over uniform sampling for $\gamma$'s much greater than the $k/n$ ratio. The reason for this behavior is that the worst-case noise amplifications under uniform sampling are much higher than in the non-uniform sampling case; but outside of these few outliers (which also considerably drive up the mean noise amplification), uniform sampling has lower noise amplification than most variations of the non-uniform sampling codes. In an average $\log(\mathrm{Noise\ Amplification})$ measure, the uniform sampling code outperforms the non-uniform sampling code for all $\gamma$'s tested.
VIII. FUTURE WORK

While we defined polynomial codes over an arbitrary field $\mathbb{F}$, we chose to analyze codes defined over the complex field in order to draw from theoretical results in frame theory and random matrix theory. In the future we hope to analyze codes defined over finite fields, where the design guidelines are not as immediate. As discussed in Section V, the noise model might be too simplistic, and future work might expand it to some other form of non-additive noise.
IX. ACKNOWLEDGMENTS

We would like to thank Itzhak Tamo for the insightful discussion on suitable erasure codes for distributed computation systems. This research was partially supported by a grant from the Israel Science Foundation.
REFERENCES

[1] B. Farrell, Limiting empirical singular value distribution of restrictions of the discrete Fourier transform, Journal of Fourier Analysis and Applications, 17.4 (2011): 733-753.
[2] M. Haikin and R. Zamir, Analog coding of a source with erasures, 2016 IEEE International Symposium on Information Theory (ISIT), IEEE, 2016.
[3] M. Haikin, R. Zamir and M. Gavish, Frame moments and Welch bound with erasures, 2018 IEEE International Symposium on Information Theory (ISIT), IEEE, 2018.
[4] E. R. Heineman, Generalized Vandermonde determinants, Transactions of the American Mathematical Society, 31.3 (1929): 464-476.
[5] K. S. Kedlaya and C. Umans, Fast polynomial factorization and modular composition, SIAM Journal on Computing, 40.6 (2011): 1767-1802.
[6] K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos and K. Ramchandran, Speeding up distributed machine learning using codes, IEEE Transactions on Information Theory, 64.3 (2017): 1514-1529.
[7] S. Li and A. S. Avestimehr, Coded Computing: Mitigating Fundamental Bottlenecks in Large-Scale Distributed Computing and Machine Learning, (2020), pp. 66-102.
[8] A. Mashiach and R. Zamir, Noise-shaped quantization for nonuniform sampling.
[9] N. Raviv, I. Tamo, R. Tandon and A. G. Dimakis, Gradient coding from cyclic MDS codes and expander graphs, IEEE Transactions on Information Theory, 66.12 (2020): 7475-7489.
[10] D. Seidner and M. Feder, Noise amplification of periodic nonuniform sampling, IEEE Trans. Signal Process., vol. 48, no. 1, pp. 275-277, Jan. 2000.
[11] A. M. Tulino and S. Verdú, Random Matrix Theory and Wireless Communications, Now Publishers Inc., 2004.
[12] P. Xia, S. Zhou and G. B. Giannakis, Achieving the Welch bound with difference sets, IEEE Transactions on Information Theory, 51.5 (2005).