[PDF] Statistical Robust Chinese Remainder Theorem for Multiple Numbers

Abstract

Generalized Chinese Remainder Theorem (CRT) is a well-known approach to solve ambiguity resolution related problems. In this paper, we study the robust CRT reconstruction for multiple numbers from a view of statistics. To the best of our knowledge, it is the first rigorous analysis on the underlying statistical model of CRT-based multiple parameter estimation. To address the problem, two novel approaches are established. One is to directly calculate a conditional maximum a posteriori probability (MAP) estimation of the residue clustering, and the other is based on a generalized wrapped Gaussian mixture model to iteratively search for MAP of both estimands and clustering. Residue error correcting codes are introduced to improve the robustness further. Experimental results show that the statistical schemes achieve much stronger robustness compared to state-of-the-art deterministic schemes, especially in heavy-noise scenarios.

Full PDF

IIEEE TRANSACTIONS ON SIGNAL PROCESSING 1

Statistical Robust Chinese Remainder Theorem forMultiple Numbers

Hanshen Xiao, Nan Du, Zhikang T. Wang and Guoqiang Xiao

Abstract —Generalized Chinese Remainder Theorem (CRT) isa well-known approach to solve ambiguity resolution relatedproblems. In this paper, we study the robust CRT reconstructionfor multiple numbers from a view of statistics. To the best ofour knowledge, it is the ﬁrst rigorous analysis on the underlyingstatistical model of CRT-based multiple parameter estimation.To address the problem, two novel approaches are established.One is to directly calculate a conditional maximum a posterioriprobability (MAP) estimation of the residue clustering, and theother is based on a generalized wrapped Gaussian mixture modelto iteratively search for MAP of both estimands and clustering.Residue error correcting codes are introduced to improve therobustness further. Experimental results show that the statisticalschemes achieve much stronger robustness compared to state-of-the-art deterministic schemes, especially in heavy-noise scenarios.

Index Terms —Chinese Remainder Theorem (CRT), AmbiguityResolution, Generalized Gaussian Mixture Model, Maximum aposteriori probability (MAP),

I. I

NTRODUCTION

Pioneered by Xia’s remarkable works [21], [20], there is a richline of works to advance the understanding of number theorybased sparse sensing. Due to physical limitation, estimationswith integer ambiguity solution are frequently encounteredin many practical scenarios. Such cases include frequencydetermination of undersampled waveforms [5], [8], [19], [28],[29], phase unwrapping [9], [23] etc., which can be modeledby solving some Diophantine equations. Therefore, algebraicapproaches can be applied as an alternative way to thoseclassic problems. Representatively, Chinese Remainder Theo-rem (CRT) based reconstruction and co-prime or nested basedsampling/arrays [11], [13], [14] are two successful examples.In particular, due to the nature of the distributed representationof a number with its residues, CRT based reconstructionshave been further used in applications such as localizationestimation in wireless networks [6], [15], detection of movingtargets using multi-frequency antenna array Synthetic ApertureRadar (SAR) [3], [4], [16], [31], which can be even executeddistributively. Generally speaking, the underlying problem canbe described as follows.

Problem of interests:

Consider a set of N numbers, Y [1: N ] = { Y , Y , ..., Y N } , and L ﬁxed moduli m [1: L ] = Hanshen Xiao is with CSAIL and the EECS Department, MIT, Cambridge,USA. E-mail: [email protected] Du is with Department of Statistics, Harvard University, Cambridge,USA. E-mail: [email protected] T. Wang is with the Department of Physics, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan. E-mail: [email protected] Xiao is with the College of Computer and Information Science,Southwest University, Chongqing, China. E-mail: [email protected] { m , m , ..., m L } , which are all assumed to be integers tem-porarily. For each m l , one may observe an unordered set R [1: N ] ,l = { R l , R l , ..., R Nl } , where R il = (cid:104) Y i + ∆ il (cid:105) m l ,i.e., the residue of Y i modulo m l is perturbed by a noise ∆ il . Here, (cid:104) A (cid:105) B denotes the residue of A modulo B and ∆ il are assumed to be independently and identically distributed(i.i.d.) Gaussian noises for each i . R [1: N ] ,l are assumed to beunordered, which implies that the correspondences betweenthe elements R il in R [1: N ] ,l and Y [1: N ] are unknown. Theultimate goal is to robustly reconstruct Y [1: N ] using R [1: N ] ,l , l = 1 , , ..., L .The model described above captures a large class ofproblems. Suppose a sinusoidal signal of multiple frequen-cies Y [1: N ] is undersampled with multiple rates, m [1: L ] , andFourier transform is conducted on the L sample sequences toget frequency spectrums, respectively. From the locations ofpeaks in the spectrums, which may be perturbed with noise,one can estimate the residues of frequency Y i modulo m l .However, the correspondence relationship between the peaks,represented by R il , in the spectrum and Y [1: N ] are unknowndue to the modulo operation, where the disambiguation isshown to be a nontrivial problem [7], [22], [25]. The distanceestimation via multi-frequency phase measurement can also bethe case as above, where Y i stands for the distance while m l represents the carrier wavelength [6], [15]. Prior Art : On the whole, the underlying challengesare twofold: the correspondence ambiguity and perturbation.Though there have been limited interactions cutting acrossboth, each subproblem has been well studied separately.For the single number case, i.e., N = 1 , the numberreconstruction is usually called robust CRT (RCRT), whereresidues are perturbed with errors. Error control of Hamming-weighted errors in residue codes dated back to 1960s [12] andthe ﬁrst polynomial time decoding scheme was proposed in[1]. Nonetheless, in our case, small errors may occur acrossall observations, { R il } , and we are more interested in errorsbounded with inﬁnity norm. To this end, the ﬁrst closed-formRCRT for errors with a bounded magnitude was proposedin [19]. Generalized versions can be found in [29], [30]. Bydeploying non-co-prime moduli, where m l = Γ M l such that { M l , l = 1 , ..., L } are pairwise co-prime and Γ can be a realnumber, it is demonstrated that when | ∆ il | < Γ4 for each i and l , the reconstruction error is upper bounded by Γ4 as well[19]. The proof has been shown in [26], where such boundand modulus selections are optimal.For multiple numbers with errorless residues, the recon-struction is previously termed generalized CRT (GCRT). Themain focus is the largest dynamic range D for Y [1: N ] such a r X i v : . [ s t a t . O T ] A ug EEE TRANSACTIONS ON SIGNAL PROCESSING 2 that for arbitrary Y [1: N ] ∈ [0 , D ) N , they can be uniquelydetermined from their unordered residues modulo m [1: L ] . Theﬁrst generic lower bound of D was given in [20], and furthersharpened by [10]. In particular, when the residues of Y [1: N ] modulo m l are distinct for l ∈ { , , ..., L } , polynomial timeGCRT exists [24]. So far, the closed form of D is only knownwhen N = 2 [17].To tackle correspondence ambiguity and perturbation simul-taneously, generalized robust CRT(GRCRT) has been devel-oped in [7] for N = 2 and later generalized to arbitrary N [25]. Analogously, under the same setup, it is found when Γ (cid:81) Ll =1 M l = O ( D N ) and max i,l | ∆ il | < Γ4 N , Y [1: N ] can beuniquely and robustly reconstructed with deviation boundedby Γ4 N [25]. Motivation:

CRT suggests that, given a set of moduli m [1: L ] , there is a bijection between a non-negative number X and its residues modulo m [1: L ] when X is less than theleast common multiple (lcm) of m [1: L ] . Said another way, thelcm of m [1: L ] is the maximal utilization of moduli for numberreconstruction. However, when N ≥ , the severe limitation ofprior works mainly arises from the large redundancy requiredto disambiguate residues and tolerate perturbation. When themoduli are in a form m l = Γ M l , the robust deterministicreconstruction relies on the assumption that all | ∆ il | shouldbe bounded by Γ4 N . On the other hand, either in [25] or[10], it trades off the utilization rate of moduli by shrinkingthe dynamic range D to O ( lcm ( m [1: L ] ) /N ) , to uniquelydetermine the correspondences between R [1: N ] ,l and Y [1: N ] ,where the number of moduli L is proportional to the numberof estimands N . Consequently, as N increases, which incurs alarger L , Γ has to be sharply enlarged as well to meet the errorbound in considerable probability. Apparently, it is a paradoxthat existing schemes behave even worse with more samplesobtained.Clearly, the unknown correspondences between residues andreconstructed numbers are the essential bottleneck. When thecorrespondences are known, the reconstruction of N numbersis simpliﬁed to apply RCRT for a single number N times.Indeed, as explained in [19], RCRT matches the maximalmoduli utilization rate. Therefore, it is natural to ask whetherGRCRT can also achieve such maximal moduli utilization rate.In this paper, resorting to statistics, we answer this questionafﬁrmatively. Contribution and Organization:

To the best of our knowl-edge, the underlying statistical model of GRCRT has not beensystematically studied. The most closely related work is themaximum likelihood estimation (MLE) based RCRT exploredin [18], [6] for a single number. In this paper,1) We show GRCRT can be described by a generalizedwrapped Gaussian Mixture Model (GMM) with extrainformation on sampling. A systematic statistical analy-sis is presented.2) Any successful estimation depends both on reliablestatistical inference and a computationally efﬁcient im-plementation. We propose two efﬁcient algorithms toaddress the problem. In Algorithm 1, we ﬁrst derivethe maximum a posteriori probability (MAP) of residueclustering under Assumption 1 in a semi-closed form and the problem is thus reduced to N independent con-ventional RCRT. In addition, inspired by K -means clus-tering, we further propose Algorithm 2 as an iterativescheme to approximate the MAP of both reconstructed Y i and residue clustering in general.3) We show that the tradeoff amongst the three primaryparameters, N , L and Γ , can be further improved by in-corporating error correcting codes against outliers. Thor-ough simulation results show that the statistical schemessigniﬁcantly improve the performance compared withdeterministic methods, especially for the high-noisecase. For the extremely low-noise case, the deterministicmethods may outperform the proposed methods, whichis consistent with the theoretical analysis.The rest of the paper is organized as follows. In PartII, the background and methology of the proposed schemesare presented. In part III, the MAP of residue clustering isanalyzed and we prove the optimal solution can be expressedin a semi-closed form under Assumption 1. Part IV develops aframework of generalized GMM and an expectation maximiza-tion (EM) based scheme is proposed to approximate the MAPof both clustering and estimands. In Part V, the simulationresults of the performance comparison and parameter tradeoffare presented. Residue codes are further introduced to tolerateclustering errors. We conclude and provide future prospects inPart VI. II. B ACKGROUND & M

ETHODOLOGY

First of all, we specify the problem formally. Given moduli m [1: L ] = { m l = Γ M l | l = 1 , ..., L } , where M l are pair-wise co-prime, there are N numbers, denoted by Y [1: N ] = { Y i | i = 1 , ..., N } , to be reconstructed. For the l th samplerwith a modulus m l as sampling rate, an unordered sampleset, { R il , i = 1 , , ..., N } , is obtained, where R il is theresidue of Y i interfered with noise ∆ il modulo m l , i.e., R il = (cid:104) Y i +∆ il (cid:105) m l . Here, ∆ il , i = 1 , , ..., N , is i.i.d. Gaussian noisefollowing N (0 , σ l ) . We deﬁne R [1: N ] ,l = ( R l , R l , ..., R Nl ) for short. Furthermore, we assume Y [1: N ] are independentlyand uniformly distributed in [0 , D ) , where D = Γ (cid:81) Ll =1 M l ,i.e., the lcm of m [1: L ] , termed as the dynamic range. In thefollowing, let ˆ Y [1: N ] denote the estimations of Y [1: N ] . Remark 1.

It is noted that the deﬁnition of erroneous residuesin our paper is a generalization of Wang and Xia’s prior works[19], which assumes R il = (cid:104) Y i (cid:105) m l + ∆ il . Such deﬁnition ig-nores the cases when (cid:104) Y i (cid:105) m l + ∆ il < or (cid:104) Y i (cid:105) m l + ∆ il ≥ m l . Robustness:

We ﬁrst ﬂesh out how existing works achieverobustness. Different from the binary systems, the residuenumber systems are very sensitive to residue errors, wherea small error occurring in one residue may cause a largedeviation in reconstruction by trivially applying CRT. It ismainly resulted from the non-weighted nature of residuerepresentation.Therefore, the elegant idea applied in Wang’s work [19] isto recover the quotient of Y i divided by Γ , i.e., (cid:98) Y i Γ (cid:99) . Indeed,once (cid:98) Y i Γ (cid:99) is correctly reconstructed, we can escape the restrainof modulo operations and the rest things are trivial to estimate EEE TRANSACTIONS ON SIGNAL PROCESSING 3 (cid:104) Y i (cid:105) Γ . Geometric explanations can be found in [25] [28]. Thefollow-up works on GRCRT also follow the same idea. Theimplementation of GRCRT [7] [25] can be simply concludedas two steps:1) First, convert GRCRT to GCRT by constructing newresidue set ˆ R l = {(cid:104)(cid:98) Y i Γ (cid:99)(cid:105) M l , i = 1 , , ..., N } ;2) Apply GCRT to ﬁnd the correspondence relationship ofelements between ˆ R l and Y [1: N ] and then reconstruct { Y i } .However, in order to get ˆ R l , according to [7], [25], it isrequired that for each i , max l ∆ il − min l ∆ il < Γ2 N = δ. (1)For simplicity, let us assume that the variances σ l are the samefor each l as σ temporarily here to ease the analysis. Sincethe errors ∆ il are i.i.d. Gaussian noises, the probability that(1) holds is (cid:18) (cid:90) ∞−∞ p ( x )(Φ( x + 2 δ ) − Φ( x )) L − dx (cid:19) N (2)where p and Φ are the probability density and cumulativedistribution function of a Gaussian N (0 , σ ) , respectively.Clearly, (2) can be further upper bounded by (Φ( δ ) − Φ( − δ )) N ( L − (3)As N increases, with ﬁxed σ , (3) decays exponentially in anorder of O ( N ) . Methology:

In Fig. 1, it provides a more intuitive view withrespect to the algebraic structure of residue representation.With modulo operation, the real axis R is folded and wrappedinto a circle, of which the length equals to the modulus. Whenthe moduli are in such a form m l = Γ M l , the following holds: µ i = (cid:104)(cid:104) Y i (cid:105) m l (cid:105) Γ = (cid:104) Y i (cid:105) Γ (4)As a property shared by all residues of Y i modulo m [1: L ] ,we term µ i the common residue of Y i . The operation ofmodulo Γ can be viewed as a projection in residue space,shown in Fig. 1. Obviously, if { µ i } are distinct, they canbe used to ﬁnd the correspondences between R [1: N ] ,l and Y [1: N ] . However, with the occurrence of errors, the strategyfails to provide correct determination. However, it inspires usto estimate the correspondences from clustering (cid:104) R il (cid:105) Γ . Thisis the key idea of proposed reconstruction schemes, where weonly deal with µ i instead of searching across [0 , D ) . On theother hand, CRT plays a role to aggregate the residues across m [1: L ] to ﬁnd out the number they represent on the outer circlemodulo Γ (cid:81) Ll =1 M l . The two key operations, projections to thecircle modulo Γ and CRT, which will frequently appear in thefollowing context, are illustrated in Fig. 1.Throughout the rest of the paper, we will show that inorder to achieve the maximal possible dynamic range, all thestatistical analyses on Y i can be elegantly replaced by those We ﬁnd that the assumptions that all | ∆ il | < Γ4 N in [7], [25] can berelaxed to (1). L is indeed linear proportional to N since it is required that the value ofthe lcm of LN moduli should be bigger than Y i . 𝑚𝑜𝑑 Γ 𝑚𝑜𝑑 Γ𝑀 ’ 𝑚𝑜𝑑 Γ𝑀 ( … 𝑚𝑜𝑑 Γ * 𝑀 +,+-’ Proj : . CRT

Fig. 1: Illustration for residue projection and CRTTABLE I: List of Notations

Notations Explanation L The number of samplings / moduli selected m l Moduli selected N The number of real numbers to be reconstructed Y i Real number to be reconstructed K l Permutation variable for each sampling R il Raw observations ∆ il Gaussian noise in observation µ i Common residue, residue of Y i modulo Γ r il Residue of observation R il modulo Γˆ Y i Estimation of Y i ˆ µ i Estimation of common residue of the erroneous common residues, r il = (cid:104) R il (cid:105) Γ . For theconvenience of readers, all constantly used notations are listedin Table I.III. A LGORITHM O NE : M AXIMUM A P OSTERIORI E STIMATION F OR R ESIDUE C LUSTERING

In this section, we will introduce our non-informative priorand describe the problem as a Bayesian statistical model. Wefurther show that under Assumption 1, the MAP of residueclustering is in a semi-closed form and can be determined from O ( N L ) candidates. Relying on the MAP of residue clustering,it is reduced to N conventional RCRT for a single number.For N real numbers, Y [1: N ] = { Y i , i = 1 , ..., N } uniformlydistributed in [0 , D ) , on achieving the maximal dynamic range, D is set as D = Γ × (cid:81) Ll =1 M l . For brevity, all noisyresidues sampled with L samplers are represented by R [1: L ] =( R [1: N ] , , R [1: N ] , , ..., R [1: N ] ,L ) . To specify the problem, weintroduce K l , l = 1 , , .., L , as a set of i.i.d. N -permutationvariables, which represents the underlying correspondencesbetween real numbers and residues. It is assumed that thepermutation variable K [1: L ] = ( K , K , ..., K L ) subjects touniform distribution. Under a speciﬁc K [1: L ] , it implies thatwe assume { R K l ( i ) ,l , l = 1 , , ..., L } are the residues of Y i .We decompose Y i as Y i = k i Γ + µ i , where µ i := (cid:104) Y i (cid:105) Γ denotes the residue of Y i modulo Γ , and k i denotes thecorresponding quotient. Since Y i follows a prior of uniformdistribution in [0 , D ) , k i is an integer random variable uni-formly distributed within { , , , ..., D Γ − } , and µ i uniformly EEE TRANSACTIONS ON SIGNAL PROCESSING 4 distributed within [0 , Γ) . Similarly, we decompose R il as j il Γ + r il , where r il := (cid:104) R il (cid:105) Γ denotes residue of R il modulo Γ , and j il denotes the quotient accordingly. We therefore moveall parameters and observations onto a ’smaller (inner) circle’(refer to Fig. 1) modulo Γ . Accordingly, we estimate K [1: L ] with MAP, denoted by ˆ K [1: L ] , i.e., ˆ K [1: L ] := arg max K [1: L ] p ( K [1: L ] | R [1: L ] ) ∝ arg max K [1: L ] p ( R [1: L ] | K [1: L ] ) ∝ arg max K [1: L ] (cid:90) Y ... (cid:90) Y N p ( R [1: L ] | Y [1: N ] , K [1: L ] ) dY ...dY N (5)where A ∝ B denotes that for two probability density A and B , A = cB for some constant c . The complexity ofdirectly solving the above objective function is prohibitivelyhigh, where there exist exponential many, L × N ! , candidatesof K [1: L ] . On the other hand, given a speciﬁc K [1: L ] , theintegration in (5) can be simpliﬁed to calculating the followingequation, (cid:90) Y i p ( R [1: L ] | K [1: L ] , Y i ) dY i ∝ (cid:90) Γ0 D Γ (cid:88) k i =0 L (cid:89) l =1 ∞ (cid:88) j Kl ( i ) l = −∞ p ( j K l ( i ) l Γ + r K l ( i ) l | k i Γ + µ i ) dµ i ∝ (cid:90) Γ0 D Γ (cid:88) k i =0 L (cid:89) l =1 ∞ (cid:88) j Kl ( i ) l = −∞ √ πσ l e − ( rKl ( i ) l − µi +( jKl ( i ) l − ki )Γ)22 σ l dµ i ∝ (cid:90) Γ0 L (cid:89) l =1 ∞ (cid:88) j (cid:48) Kl ( i ) l = −∞ √ πσ l e − ( rKl ( i ) l − µi + j (cid:48) Kl ( i ) j Γ)22 σ l dµ i (6)Here, j K l ( i ) l enumerates all integers in Z , so does j (cid:48) K l ( i ) j = j K l ( i ) l − k i . It is noted that when L = 1 , we can get: ∞ (cid:88) j (cid:48) Kl ( i ) l = −∞ (cid:90) Γ0 √ πσ l e − rKl ( i ) l − µi + j (cid:48) Kl ( i ) l Γ)22 σ l dµ i = (cid:90) ∞−∞ √ πσ l e − rKl ( i ) l − µi )22 σ l dµ i (7)This motivates us to think about whether we can removethe product term on l in (6) and simplify it into a closed-form formula as (7) under some mild assumptions. In thefollowing, we introduce Assumption 1, under which a poly-nomial time algorithm is creatively proposed to determinis-tically derive the MAP estimation for K [1: L ] . We start fromintroducing some notations for noise distributing intervals :for each i = 1 , , ..., N , we deﬁne an clockwise interval I i as I i = [ µ i + min l ∆ il , µ i + max l ∆ il ] , i.e., starting from µ i +min l ∆ il to µ i +max l ∆ il clockwise, which are illustratedin Fig. 2. In addition, let | I i | denote the length of the directedinterval I i , i.e., | I i | := max l ∆ il − min l ∆ il . Assumption 1.

There exists some point τ on the circle modulo Γ such that it is not within any interval I i and | I i | < Γ2 for i = 1 , , ..., N . Fig. 2: Illustration for the noise intervalFig. 3: Illustration for straightened circle cut at τ Remark 2.

As illustrated in Fig. 2, τ can be an arbitrarypoint on the circle which is not within any intervals I i . Onthe other hand, even if K [1: L ] is determined and the problemis simpliﬁed to N independent RCRT for a single number, westill need further limitations on ∆ il to guarantee successfulreconstructions. Robustness is proved to be achieved in [19],[26] when | I i | < Γ2 for i = 1 , , ..., N . That is the reason whywe assume | I i | < Γ2 for each i in Assumption 1. Assumption 1 provides the convenience in analysis wherethe circle can be virtually cut at point τ , and straightened intoa line where the order of ∆ il for each i is still preserved. Fig.3 is an illustration for the above operation if we cut the circlein Fig. 2 at τ and straighten it to a line. The order of r il on theline corresponds to the order ∆ il accordingly in an ascendingorder. Speciﬁcally, for each i , we denote { r ( il ) , l = 1 , , ..., L } as a clockwise order statistic of { r il , l = 1 , , ..., L } . Here, ( il ) denotes a permutation on the index { l, l = 1 , , ..., L } foreach i , such that r ( il ) is l th element of { r il , l = 1 , , ..., L } clockwise distributed on the circle starting from τ , illustratedin Fig. 2. Lemma 1.

For each i , the errors { ∆ ( il ) } are corresponding tothe subsequence r ( i , r ( i , ..., r ( iL ) , which are in ascendingorder, i.e., { ∆ ( il ) } are the order statistic.Proof. According to Assumption 1, τ is not within any I i .Therefore, I i , the directed interval deﬁned, is clockwise dis-tributed starting from (cid:104) µ i + min l ∆ il (cid:105) Γ to (cid:104) µ i + max l ∆ il (cid:105) Γ and τ is in the complementary part, [0 , Γ) /I i . It is obvious that r ( i is closest to τ in counterclockwise direction, while r ( iL ) is the closest one in clockwise direction among all r i [1: L ] . Foreach i , r ( il ) is exactly in the order, so corresponding { ∆ ( il ) } are arranged in ascending order as well.Lemma 1 shows that if τ is known, starting from τ , foreach i , the clockwise order of { r il } on the circle is exactlythe order of { ∆ il } in an ascending order accordingly. In orderto intuitively understand the relative positions of { r il } , weconvert the distribution of them on a circle to the one on EEE TRANSACTIONS ON SIGNAL PROCESSING 5 an axis (cutting the circle at τ and stretching it into a line),which will ease the following analysis. To this end, we givethe following deﬁnition and lemma. Deﬁnition 1.

When ≤ τ ≤ min r ( il ) or max r ( il ) ≤ τ < Γ ,for i = 1 , , ..., N and l = 1 , , ..., L , ˜ r il = r il (8) Otherwise, (cid:40) ˜ r il = r il , when r il ≤ τ ˜ r il = r il − Γ , when r il > τ (9) Lemma 2.

Under Assumption 1, given { r il } and { K l } for i =1 , , ..., N and l = 1 , , ..., L , I i can be uniquely determined.Proof. All samples { r il } are divided into N subsets accordingto { K l } , each of which includes the L error residues of Y i ,represented by { r K l ( i ) l , l = 1 , , ..., L } . When Assumption 1holds, there should exist a clockwise directed interval overthe circle starting from r K l ( i ) l and ending at r K l ( i ) l forsome l (cid:54) = l ∈ { , , ..., L } such that the length is smallerthan Γ2 . All the remaining samples { r K l ( i ) l , l = 1 , , ..., L, l (cid:54) = l , l } lie in the interval I i . Clearly, r K l ( i ) l is clockwiseneighboring to r K l ( i ) l . We claim such an interval is unique.Otherwise, we assume there are two indices l and l suchthat the clockwise directed interval, I (cid:48) i , starting from r K l ( i ) l to r K l ( i ) l also has a length smaller than Γ2 , which contains { r K l ( i ) l , l = 1 , , ..., L, l (cid:54) = l , l } . Thus, the interval I (cid:48) i includes the complement part of I i . Since | I i | is smaller than Γ2 , therefore, | I (cid:48) i | ≥ Γ −| I i | > Γ2 , which incurs a contradiction.Thus, our claim holds.To proceed from Lemma 2, we use A standing for As-sumption 1 for simplicity, shown in the following formulas.We further modify our objective function as follows: ˆ K [1: L ] := arg max K [1: L ] p ( K [1: L ] | R [1: L ] , A ∝ arg max K [1: L ] p ( K [1: L ] | R [1: L ] , A × p ( R [1: L ] | A )= arg max K [1: L ] p ( R [1: L ] | A , K [1: L ] ) × p ( K [1: L ] | A ) (10)In the following, we prove Assumption 1 is independent ofpermutation K [1: L ] . For any K [1: L ] , we have Pr( A | K [1: L ] ) = (cid:90) R [1: L ] p ( A , R [1: L ] | K [1: L ] ) d R [1: L ] = (cid:90) R [1: L ] p ( A | K [1: L ] , R [1: L ] ) × p ( R [1: L ] | K [1: L ] ) d R [1: L ] = (cid:90) R [1: L ] p ( A | K [1: L ] , R [1: L ] ) × p ( R [1: L ] ) d R [1: L ] (11) Also, Pr( A

1) = (cid:90) R [1: L ] (cid:88) K [1: L ] p ( A , K [1: L ] , R [1: L ] ) × p ( K [1: L ] ) d R [1: L ] = (cid:88) K [1: L ] (cid:90) R [1: L ] p ( A | K [1: L ] , R [1: L ] ) × p ( R [1: L ] | K [1: L ] ) × p ( K [1: L ] ) d R [1: L ] (12)Since we assume K [1: L ] are uniformly distributed, it suf-ﬁces to show independence that (cid:82) R p ( A | K [1: L ] , R [1: L ] ) × p ( R [1: L ] | K [1: L ] ) d R [1: L ] remains constant across all K [1: L ] .On the other hand, as the conditional probability den-sity of p ( R [1: L ] | K [1: L ] ) is a normal distribution, we know (cid:82) R [1: L ] p ( A | K [1: L ] , R [1: L ] ) × p ( R [1: L ] | K [1: L ] ) d R [1: L ] is con-stant across all K [1: L ] . Thus, Assumption 1 is independent ofpermutation K [1: L ] .If K [1: L ] is a correct residue classiﬁcation, assuming thatthe cutting point is τ and following the notations given inDeﬁnition 1, a closed form of (5) can be derived as follows. Lemma 3.

When

Pr( K [1: L ] , R [1: L ] , A (cid:54) = 0 , Pr( R [1: L ] | A , K [1: L ] ) = Pr( r [1: L ] | A , K [1: L ] ) ∝ N (cid:89) i =1 (cid:90) ∞−∞ e − (cid:80) Ll =1 w l ( x − ˜ r Kl ( i ) l ) dx (13)where w l is the weight determined by σ l , i.e., w l = σ l . Proof.

First we need to clarify, given r [1: L ] and K [1: L ] , if thereare multiple candidates of cutting point τ , the value of (13) isinvariant to different selections of τ . For a ﬁxed i and differentpossible τ (cid:54)∈ I i , the relative positions of { ˜ r K l ( i ) l } do notchange, as proved in Lemma 1. The only difference is thatthere may be a uniform shift on { ˜ r K l ( i ) l } , i.e., two differentcutting points may result in two different groups { ˜ r K l ( i ) l } and { ˜ r (cid:48) K l ( i ) l } whereas ˜ r K l ( i ) l − ˜ r (cid:48) K l ( i ) l equals a constant: a multipleof Γ . However, replacing { ˜ r K l ( i ) l } with { ˜ r (cid:48) K l ( i ) l } in (13), thevalue of (13) does not change due to the integral on x alongthe R .With lemma 2 and Assumption 1, for each i , given some µ i ∈ [0 , Γ) , if we sort { ˜ r K l ( i ) l } in an ascending order,accordingly { µ i +∆ K l ( i ) l } are also sorted ascendingly. There-fore, the errors { ∆ K l ( i ) l , l = 1 , , ..., L } must be in a form { µ i − ˜ r K l ( i )1 + g Γ , µ i − ˜ r K l ( i )2 + g Γ , ..., µ i − ˜ r K l ( i ) L + g Γ } ,for some g ∈ Z . On the other hand, since µ i are i.i.d. uniformlydistributed in [0 , Γ) , we can conclude that, for each i , underthe residue classiﬁcation K [1: L ] , the probability density of r il is proportional to (cid:82) ∞−∞ e − (cid:80) Ll =1 w l ( x − ˜ r Kl ( i ) l ) dx . Due to theindependence of µ i , (13) follows.Furthermore, we show in the following that there exists anefﬁcient scheme to determine the optimal solutions of (5).Before proceeding, we introduce the following notations forclarity. For any given τ ∈ [0 , Γ) , let γ τ ( i ) l , l ∈ { , , ..., L } ,denote the i th item of { ˜ r il , i = 1 , , ..., N } sorted in ascendingorder. EEE TRANSACTIONS ON SIGNAL PROCESSING 6

Theorem 1.

The MAP estimation of K [1: L ] under Assumption1 is to determine a cutting point τ ∈ [0 , Γ) such that arg max τ N (cid:88) i =1 [ ( (cid:80) Ll =1 ˜ r K l ( i ) l w l ) (cid:80) Ll =1 w l − L (cid:88) l =1 w l ˜ r K l ( i ) l ] (14) and the optimal clustering strategy is to group { γ τ ( i ) l , l =1 , , ..., L } : i.e., clustering the i th largest elements among eachset { ˜ r [1: N ] l } together for each i .Proof. For K [1: L ] = ( K , K , ..., K L ) of any correct residueclassiﬁcation, (13) can be further simpliﬁed as, N (cid:89) i =1 (cid:90) ∞−∞ e − [( (cid:80) Ll =1 w l ) x − (cid:80) Ll =1 ˜ r Kl ( i ) l w l x + (cid:80) Ll =1 w l ˜ r Kl ( i ) l ] dx = N (cid:89) i =1 (cid:90) ∞−∞ exp [ − ( L (cid:88) l =1 w l )( x − (cid:80) Ll =1 ˜ r K l ( i ) l w l (cid:80) Ll =1 w l ) + ( (cid:80) Ll =1 ˜ r K l ( i ) l w l ) (cid:80) Ll =1 w l − L (cid:88) l =1 w l ˜ r K l ( i ) l ] dx ∝ N (cid:88) i =1 [ ( (cid:80) Ll =1 ˜ r K l ( i ) l w l ) (cid:80) Ll =1 w l − L (cid:88) l =1 w l ˜ r K l ( i ) l ] (15)Given ˜ r il , (cid:80) Ni =1 (cid:80) Ll =1 w l ˜ r K l ( i ) l in (15) is a constant. Then,we only need to focus on N (cid:88) i =1 ( L (cid:88) l =1 ˜ r K l ( i ) l w l ) (16)For the rest, we ﬁrst prove that the residue cluster followingthe rule of grouping C τi = { γ τ ( i )1 , γ τ ( i )2 , ..., γ τ ( i ) L } for each i achieves the maximum value of (16). This is a generalizationof the following inequality. For two pairs of numbers a ≤ a and b ≤ b , we have ( a + b ) + ( a + b ) ≥ ( a + b ) + ( a + b ) (17)In general, for two sequences, { γ τ ( i )1 , γ τ ( i )2 , ..., γ τ ( i ) L } and { γ τ ( j )1 , γ τ ( j )2 , ..., γ τ ( j ) L } , both of which are sorted in non-decreasing order, the rearrangement inequality [2] tells that γ τ ( K (1))1 γ τ (1)2 + γ τ ( K (2))1 γ τ (2)2 + ...γ τ ( K ( N ))1 γ τ ( N )2 ≤ γ τ (1)1 γ τ (1)2 + γ τ (2)1 γ τ (2)2 + ...γ τ ( N )1 γ τ ( N )2 (18)where K can be any permutation on { , , ..., N } . Said anotherway, the maximum value of (16) is achieved when the orderis preserved. According to (18), the optimal value of (16) isobtained following the clustering strategy claimed: we groupthe i th largest elements among { ˜ r [1: N ] l } for each l together.Since there are N L candidate cutting points, where one mayselect τ = r il for each i and l , what we prove above presentsthe local optimal classiﬁcation strategy for a τ . Therefore, inthe worst case, by enumerating all the N L candidate cuttingpoints, we can ﬁnd the ﬁnal optimal solution to (5).To conclude, the complexity of computing the MAP forresidue clustering under Assumption 1 is reduced to ﬁnd outthe optimal τ from N L candidates { r il } . We conclude theproposed algorithm as follows. Algorithm 1

Conditional MAP Estimation of Classiﬁcation

Input:

Given moduli m l = Γ M l and the residues observed R il , i = 1 , , ..., N, l = 1 , , ..., L .1. Calculate r il = (cid:104) R il (cid:105) Γ ;2. Calculate (cid:101) r il according to Deﬁnition 1.3. Derive the permutation K l according to Theorem 1, i.e.,ﬁnd the best τ .4. Apply the conventional RCRT for a single number to get { ˆ Y i } . Output: ˆ Y i , i = 1 , , ..., N . Example 1.

Consider m = 5 × , m = 5 × , Y = 11 and Y = 18 . For simplicity w = w = 1 , i.e., noises arein a same level perturbing the samples with sampling rate m and m . Two observations are obtained R = { , } and R = { , } . Accordingly, one can derive r = { , } and r = { , } . Under Assumption 1, τ can be selected from { , , , } . Here we specify the two cases where τ = 1 and τ = 3 . When τ = 1 , recalling Deﬁnition 1, ˜ r = { , − } and ˜ r = { , − } . According to Theorem 1, the clustering strategyis to group the smallest ones, i.e., {− , − } , and group thelargest ones, i.e., { , } , in ˜ r and ˜ r , respectively. In thisscenario, the loss function in (14) equals − . When τ = 3 , ˜ r = { , − } and ˜ r = { , } . Similarly, we group {− , } and { , } together. The value of loss function in (14) is − then. Similarly, when we set τ = 4 or τ = 0 , the values of (14)are the same: − , which achieves the maximal of (14) amongthe four cases of τ . Thus, we ﬁnd the MAP of clusteringby grouping { , } and { , } as the residues of Y and Y ,respectively. The following reconstruction is applying RCRTfor a single number on the two residues sets, respectively. IV. A

LGORITHM T WO : B AYESIAN W RAPPED G AUSSIAN M IXTURE M ODEL AND T WO -S TEP M AXIMIZATION F AST A LGORITHM

In last section, we studied a conditional MAP of residueclustering. It is noted that after the permutations K [1: L ] areestimated, we still need to apply conventional RCRT for asingle number to derive the ﬁnal reconstruction of Y i [19]. It istherefore an interesting question that whether we can estimateboth permutation K [1: L ] and Y [1: N ] at the same time.In this section, we develop a two-step searching algorithmto ﬁgure out the estimations of both. Coming with a slightcompromise in computational complexity, the method pro-posed in this section can achieve stronger robustness comparedto Algorithm 1. As mentioned above, if we further considerthe problem on the ’small circle’ modulo Γ , we would ﬁndit similar to the Gaussian Mixture Model (GMM), wherethe differences lie on the wrapped gaussian distribution fornoisy R il and prior knowledge with respect to the samplegeneration. Inspired with the techniques to solve GMM, inthe following, we will treat both Y [1: N ] and K [1: L ] as thetargets of estimation instead of K [1: L ] only, and develop aMAP estimation for both variables at the same time. From τ = 0 , , are all cutting points in this example and lead to the sameclustering. EEE TRANSACTIONS ON SIGNAL PROCESSING 7 (6), it is not hard to observe that

Pr( K [1: L ] , Y [1: N ] | R [1: L ] ) =Pr( K [1: L ] , µ [1: N ] | r [1: L ] ) , which implies that we only need todeal with the MAP of ( K [1: L ] , µ [1: N ] ) instead.To this end, the objective function becomes { ˆ K [1: L ] , ˆ µ [1: N ] } := arg max K [1: N ] , µ [1: N ] Pr( K [1: L ] , µ [1: N ] | r [1: L ] ) ∝ arg max K [1: N ] , µ [1: N ] Pr( r [1: L ] | K [1: L ] , µ [1: N ] ) (19)We propose an iterative method to solve the above equation.It proceeds as follows: after initializing µ (0)[1: N ] , for ( t + 1) th iteration, • Step one: given µ ( t )[1: N ] , deducing: K ( t +1)[1: L ] = arg max K [1: L ] Pr( r [1: L ] | K [1: N ] , µ t [1: N ] ) (20) • Step two: given K ( t +1)[1: L ] , deducing: µ ( t +1)[1: N ] = arg max µ [1: N ] Pr( r [1: L ] | K ( t +1)[1: L ] , µ [1: N ] ) (21)In the remaining part of this section, we will propose a fastalgorithm to solve each step and prove that it will converge tostead state. We start from deducing a fast algorithm for stepone. Similar to equation (6), we have Pr( r [1: L ] | K [1: L ] , µ [1: N ] ) ∝ L (cid:89) l =1 N (cid:89) i =1 ∞ (cid:88) j Kl ( i ) l = −∞ p ( j K l ( i ) l Γ + r K l ( i ) l | k i Γ + µ i ) ∝ L (cid:89) l =1 N (cid:89) i =1 ∞ (cid:88) j Kl ( i ) l = −∞ √ πσ l e − ( rKl ( i ) l − µi +( jKl ( i ) l − ki )Γ)22 σ l ∝ L (cid:89) l =1 N (cid:89) i =1 ∞ (cid:88) j (cid:48) Kl ( i ) l = −∞ √ πσ l e − ( rKl ( i ) l − µi + j (cid:48) Kl ( i ) j Γ)22 σ l (22)Since K l are independently and randomly distributed, wemay simplify (22) to ﬁnd an optimal K ( t +1) l for each l : K ( t +1) l := arg max K l N (cid:89) i =1 ∞ (cid:88) j (cid:48) il = −∞ e − ( ril − µtKl ( i )+ j (cid:48) il Γ)22 σ l (23)In general, since (23) is hard to solve, here we apply theapproximation method used in [18]. We deﬁne d Γ ( a, b ) :=min j ∈ Z | a − b + j Γ | as the distance of any two real numbers a and b . When the σ l is much smaller than Γ , (23) can beapproximated as K ( t +1) l = arg min K l N (cid:88) i =1 d ( r il , µ tK l ( i ) ) , (24)since exp ( − d ( r Kl ( i ) l ,µ i ) σ l ) dominates the term (cid:80) ∞ j (cid:48) Kl ( i ) l = −∞ exp ( − ( r Kl ( i ) l − µ i + j (cid:48) Kl ( i ) l Γ) σ l ) . Although solving(24) is seemingly of exponential complexity, we will showthere exists an O ( N ) -time algorithm in the following theorem. Let r ( i ) l , i = 1 , , ..., N , denote the i th element ofincreasingly sorted sequence { r (1) l , r (2) l , ..., r ( N ) l } of r [1: N ] ,l .Similarly, µ t [ i ] denotes the i th element of increasingly sortedsequence { µ t [1] , µ t [2] , ..., µ t [ N ] } of µ t [1: N ] . Theorem 2.

There exists some ζ ∈ { , , ..., N } such that thefollowing matching strategy: ( r ( (cid:104) i + ζ (cid:105) N ) l , ˆ µ [ i ] ) , i = 1 , , ..., N ,minimize (24).Proof. Let ω i and θ i within [0 , π ) denote the angles of r ( i ) l and µ t [ i ] distributed on the ’small circle’ modulo Γ ,respectively. Therefore, both { ω i } and { θ i } are also in anascending order. Correspondingly, the difference of anglesbetween any pair ( µ t [ i ] , r ( i ) l ) is proportional to min {| ω i − θ i | , π − | ω i − θ i |} . To give its geometrical interpretation,consider two concentric circles, as shown in Fig. 4, where { µ t [ i ] } are distributed on the outer circle and { r ( i ) l } are on theinner one. We deﬁne ξ i ,i to represent an angle differencefrom µ t [ i ] to r ( i ) l in a counterclockwise direction as positiveand otherwise as negative, which is in ( − π, π ] , illustratedin Fig. 4. Suppose that there is a line connecting each pair ( µ t [ i ] , r ( i ) l ) under the optimal choice, which is also denotedby ξ i ,i , we will prove that those N lines do not intersecteach other .We prove it by contradiction. Suppose there exist i and i such that the two lines ξ i ,K ( i ) and ξ i ,K ( i ) cross eachother, without loss of generality, we set θ i = 0 , as twoconcentric circles rotating simultaneously will not affect thedistributions of θ i and ω i on the two circles. Also we set θ i ∈ (0 , π ) . Note that when θ i = π , it is impossible to resultin crossing. Therefore, when two lines cross, it falls into oneof the following four cases: • ξ i ,K ( i ) ≥ and ξ i ,K ( i ) ≥ , i.e., ≤ θ i <ω K ( i ) < ω K ( i ) ≤ π and < θ i < ω K ( i ) • ξ i ,K ( i ) ≥ and ξ i ,K ( i ) < , i.e., ≤ θ i <ω K ( i ) ≤ π and π + θ i < ω K ( i ) < π and ≤ θ i < π • ξ i ,K ( i ) < and ξ i ,K ( i ) ≥ , i.e., ≤ θ i < θ i < π and π < ω K ( i ) < ω K ( i ) ≤ θ i + π • ξ i ,K ( i ) < and ξ i ,K ( i ) < , i.e., ≤ θ i < θ i < π and π + θ i ≤ ω K ( i ) < ω K ( i ) < π If two lines, ξ i ,K ( i ) and ξ i ,K ( i ) , cross, we willprove that the interchange of K ( i ) and K ( i ) willdecrease the value of the right side of (24). Because [ d ( r i l , µ K ( i ) ) + d ( r i l , µ K ( i ) )] is proportional to [ ξ i ,K ( i ) + ξ i ,K ( i ) ] , we will use the latter instead of theformer in the following discussion.For case 1) and 4), we have ξ i ,K ( i ) + ξ i ,K ( i ) =( ω K ( i ) − θ i ) +( ω K ( i ) − θ i ) . When K ( i ) and K ( i ) areswitched, we have [( ω K ( i ) − θ i ) + ( ω K ( i ) − θ i ) ] − [( ω K ( i ) − θ i ) + ( ω K ( i ) − θ i ) ]= 2( θ i − θ i )( ω K ( i ) − ω K ( i ) ) > (25)Next for case 2) and 3), we have ξ i ,K ( i ) + ξ i ,K ( i ) =( ω K ( i ) − θ i − π ) + ( ω K ( i ) − θ i ) . Similarly, switching EEE TRANSACTIONS ON SIGNAL PROCESSING 8 K ( i ) and K ( i ) results in [( ω K ( i ) − θ i − π ) + ( ω K ( i ) − θ i ) ] − [( ω K ( i ) − θ i − π ) + ( ω K ( i ) − θ i ) ]= 2( θ i − θ i )( ω K ( i ) − ω K ( i ) ) + 4 π ( ω K ( i ) − ω K ( i ) ) > (26)Thus, under four cases, we have proved that when two linesdo not cross, the right side of (24) gets a smaller value.We now move to the second step: given residue clustering,how to ﬁgure out the optimal common residue? It is evidentthat with given residue clustering, the optimal issue is reducedto N independent estimations for a single common residue.This problem has been previously studied in [18], where itproved that the optimal estimation can be determined in O ( L ) complexity. For completeness, we present the skeleton of [18]as follows with a simpliﬁed proof.With given clustering K t +1[1: L ] , we need to ﬁgure out theoptimal µ t +1[1: N ] , i.e., µ t +1 i = arg min x ∈ [0 , Γ) L (cid:88) l =1 d ( x, r k l ( i ) l ) (27)where x denotes a point on the ’small circle’ modulo Γ .For simplicity, we assume that { γ (1) , γ (2) , ..., γ ( L ) } denote { r k l ( i ) l } in an ascending order. Based on the deﬁnition of d Γ ,there must exist b l ∈ { , ± } , such that L (cid:88) l =1 w l d ( µ t +1 i , γ l ) = L (cid:88) l =1 w ( l ) ( µ t +1 i − γ l − b l Γ) (28)As d Γ ( µ t +1 i , γ ( l ) ) ≤ Γ2 , we consider the interval I µ t +1 i =[ µ t +1 i − Γ2 , µ t +1 i + Γ2 ) . Without loss of generality, we as-sume µ t +1 i ≥ Γ2 . Thus, there must some ( j ) such that γ ( j ) , γ ( j )+1 , ..., γ ( L ) , γ (1) + Γ , ..., γ ( j − + Γ , which are alsoin an ascending order, all belong to I µ t +1 i . In such case, d Γ ( µ t +1 i , γ ( l ) ) = | µ t +1 i − γ ( l ) | if l ≥ j ; otherwise d Γ ( µ t +1 i , γ ( l ) ) = γ ( l ) + Γ − µ t +1 i . Substitute the above into(27), then we have µ t +1 i = (cid:80) Ll =1 w ( L ) γ ( l ) (cid:80) Ll =1 w ( l ) + (cid:80) jl =1 w ( l ) γ ( l ) (cid:80) Ll =1 w ( l ) . Since j ∈ [1 : L ] , therefore, µ t +1 i ∈ { (cid:80) Ll =1 w ( L ) γ ( l ) (cid:80) Ll =1 w ( l ) + (cid:80) jl =1 w ( l ) γ ( l ) (cid:80) Ll =1 w ( l ) , j = 0 , , ..., L − } , (29)which implies that in Step two, the complexity of estimatingeach µ t +1 i is O ( L ) , and total complexity is O ( N L ) . As asummary, given the estimations of µ t , we can ﬁgure out theoptimal clustering K t +1[1: L ] with O ( N L ) complexity accordingto Theorem 2. Relying on the estimations of K [1: L ] , we canfurther determine the optimal µ t +1 still in O ( N L ) complexity.It noted that the number of candidates of K [1: L ] is ﬁnite andthus the algorithm will always converge to some stationarysate. We conclude such iterative searching as follows. Example 2.

Consider m = 5 × , m = 5 × , m = 5 × , Y = 11 , Y = 18 and Y = 64 . We set w = w = w = 1 . Three observations are obtained R = { , , . } , R = { , , . } and R = { . , . , . } . Accord-ingly, one can derive r = { , , . } , r = { , , . } , Algorithm 2

MAP of Classiﬁcation and Common Residues

Input:

Given moduli m l = Γ M l and the residue observed R il , i = 1 , , ..., N and l = 1 , , ..., L .1. Calculate r il = (cid:104) (cid:101) r il (cid:105) M l .2. Initialize { ˆ µ i , i = 1 , , ..., N } .3. Begin iteration t from to T : • Step-1: Given { ˆ µ t − i , i = 1 , , ..., N } , determine the opti-mal clustering, { K tl , l = 1 , , ..., L } , following Theorem2. • Step-2: Given { K tl , l = 1 , , ..., L } , update { ˆ µ t − i , i =1 , , ..., N } by (27).4. Calculate q il = [ R K Tl ( i ) l − (cid:101) r K Tl ( i ) l Γ ] (30)and reconstruct quotient Q i from q il with moduli M l viaconventional CRT.5. Reconstruct ˆ Y i = Q i Γ + ˆ µ i ( T ) . Output: ˆ Y i , i = 1 , , ..., N . r = { . , . , . } . We initialize ˆ µ with r = { , , . } ,i.e., ˆ µ = 2 , ˆ µ = 4 and ˆ µ = 4 . . K is clear and for K ,according to theorem 2, the optimal matching must be oneof the following three cases in a rotation manner: (a) (0 → , (3 → , (3 . → . ; (b) (0 → . , (3 → , (3 . → and (c) (0 → , (3 → . , (3 . → . Clearly, (b) minimizes(24). Similarly, we ﬁnd that (0 . → , (4 . → , (4 . → . is optimal for K . Given K [1: L ] , we proceed to estimate µ [1: N ] . Here we only take µ as an example. With K [1: L ] , { , , . } are grouped together. µ must be one of thethree { . , . , . × } and we ﬁnd that µ = . minimizes (27). V. R

OBUSTNESS S TRENGTHENING AND S IMULATION

In this section, we will introduce error correcting codes to fur-ther strengthen the robustness of proposed statistical RCRTs.As we stressed earlier, to ﬁnd correct K [1: L ] plays the keyrole in reconstruction. Even if only one residue is not correctlyclustered, it may compromise estimation performance heavilyin CRT systems. A natural question is that when perfectresidue clustering is not achievable, whether robust reconstruc-tion is still possible. Fortunately, a previous work [27] has pro-vided a positive answer to this question, as it will implementrobust reconstruction under few residues with arbitrary errors.Assume that L moduli are used, where { m l = Γ M l } are inan ascending order, and L is the smallest positive integer L ≤ L , such that lcm ( m , m , ..., m L ) = Γ (cid:81) L l =1 M l > D .Said another way, if no errors exist in residues, Y i can besufﬁciently recovered from the residues of any L moduli. Bytaking a wrong classiﬁcation with an arbitrary error happeningto that residue, from [27], we have the following theorem: Theorem 3 ( [27] ) . Given K [1: L ] , where at least ( L −(cid:98) L − L (cid:99) )residues { R i [1: L ] } of Y i are correctly clustered and max l ∆ il − min l ∆ il < Γ2 hold for those correctly clustered residues { R i [1: L ] } for each i , then there exists a robust reconstructionscheme for ˆ Y i such that | ˆ Y i − Y i | ≤ . EEE TRANSACTIONS ON SIGNAL PROCESSING 9

Fig. 4: Illustration for the Step one of Algorithm 2

Remark 3.

It is worthy mentioning that with ﬁxed L , increas-ing L , i.e., with more moduli (samplers), does not guarantee tocontinuously improve the performance of reconstruction sincea larger L always degrades the clustering accuracy. Moreover,besides the threshold ( L − (cid:98) L − L (cid:99) ) requirement of K [1: L ] ,Theorem 3 also assumes that max l ∆ il − min l ∆ il < Γ2 forcorrectly clustered residues, which fails in a higher probabilitywith a larger L as analyzed in Section II. In general, the reconstruction performance depends on fac-tors including N , L , L , Γ , the noise and also the desir-able computation power, of which the relationships are toocomplicated to be concisely expressed. However, if the noiseis limited, adding redundant residues properly can alwaysimprove the performance. Besides, the other advantages ofthe error correction mechanism will be clear for the followingmajority voting based estimations.In the rest of the section, we will show the mechanism tofully utilize the samples from multiple samplers. A naturalidea is to regroup the moduli into different sets. Then, weuse the residues from each set to implement Algorithm 1or 2. Roughly speaking, the basic requirement is that eachset should include at least L moduli, of which the lcm isbigger than D in order to achieve a valid reconstruction. Thus,we can obtain several estimated { ˆ Y i } from different sets. Ifthere are κ such moduli sets, we can then correspondinglypick the N most frequent numbers from all κN reconstructednumbers as the output. However, even if both correct residueclusterings are obtained with two different sets, noise may stillcause a slight difference between two estimated ˆ Y i of the same Y i , where the above idea can not be used straightforwardly.Instead, in our scheme, we focus on the N most frequentquotients. In the simulation, we compare proposed algorithms withthe deterministic RCRT for multiple numbers in [25]. Thefollowing simulation results show the comparison amongStatistical RCRT-1, short for Algorithm 1 (MAP of residueclustering), Statistical RCRT-2, short for Algorithm 2 (MAP Even under correct residue classiﬁcation, the common residues may beuniformly shifted by Γ with the proposed scheme, depending on the choiceof the cutting point τ . Correspondingly, the difference of two reconstructionsof the quotient (cid:98) Y i Γ (cid:99) may be . However, this can be easily distinguished andhere we assume they share the same estimated quotient. of both residue clustering and common residue estimation),and deterministic one [25]. Here, we set L = 2 . Moduli areselected in a form where Γ = 100 and { M , M , ..., M L } are the sequence of primes starting from 23. Accordingly, Y [1: L ] are randomly selected from [0 , , where D =100 × × . Referring to the requirements of moduli in[25], the lcm of all moduli is larger than the product of { Y i } and we set L = L N = 2 N . We assume the variance ofnoise σ l = σ = 10 − SNR/ for each l . As for the simulationshown in Fig. 5, for both Algorithm 1 & 2, we utilize residuesfrom each pair of moduli, in total (cid:0) N (cid:1) many groups for thesimulations.We deﬁne that Y i is robustly reconstructed if the reconstruc-tion error is upper bounded by Γ . Since the reconstructionof N numbers is ﬁnally converted to N independent recon-struction processes for each Y i in both proposed statisticalRCRTs, we deﬁne the success rate on average as the expectedrate that a Y i can be robustly recovered. Similarly, the perfectreconstruction rate is the probability that all { Y i } are robustlyrecovered. The two metrics may be of different interest indifferent applications. SNR is limited within [ − , and N ∈ { , , , , } . We run 1000 simulations to estimate thesuccess rate in each scenario.The simulation results for both Algorithm 1 and Algorithm 2compared to deterministic RCRT in [25] are presented in Fig. 5(a) and (b), respectively. From Fig. 5, Algorithm 2 outperformsAlgorithm 1 as analyzed before, since Algorithm 2 is in aniterative manner, which may face a little more computationaloverhead. Heuristically, due to the non-weighted nature ofCRT, if errors happen to both residue clusterings in twodifferent moduli sets, the resultant ˆ Y i and quotients (cid:98) ˆ Y i Γ (cid:99) asso-ciated will be dramatically different in very high probability.Therefore, when estimated (cid:98) ˆ Y i Γ (cid:99) has been reconstructed at leasttwice across the sets, it is of high conﬁdence to be selectedin the majority voting. Our simulations coincide with suchintuition.The distributions of the iteration number for different N in Algorithm 2 are shown in the top subﬁgure of Fig. 6.Generally, Algorithm 2 can reach a stationary state within iterations, mostly concentrated between and . To bemore detailed, the other successive ﬁve subﬁgures in Fig. 6show how the noise level inﬂuences the number of iterations. EEE TRANSACTIONS ON SIGNAL PROCESSING 10 -40 -35 -30 -25 -20 -15 -10 -5 0

SNR P r obab ili t y o f S u cc e ss N=2 and L=4

Deterministic RCRTProposed Statistical RCRT-1 Successful Reconstruction on AverageProposed Statistical RCRT-1 Perfect Reconstruction-40 -35 -30 -25 -20 -15 -10 -5 0

SNR P r obab ili t y o f S u cc e ss N=4 and L=8

Deterministic RCRTProposed Statistical RCRT-1 Successful Reconstruction on AverageProposed Statistical RCRT-1 Perfect Reconstruction-40 -35 -30 -25 -20 -15 -10 -5 0

SNR P r obab ili t y o f S u cc e ss N=6 and L=12

Deterministic RCRTProposed Statistical RCRT-1 Successful Reconstruction on AverageProposed Statistical RCRT-1 Perfect Reconstruction-40 -35 -30 -25 -20 -15 -10 -5 0

SNR P r obab ili t y o f S u cc e ss N=8 and L=16

Deterministic RCRTProposed Statistical RCRT-1 Successful Reconstruction on AverageProposed Statistical RCRT-1 Perfect Reconstruction-40 -35 -30 -25 -20 -15 -10 -5 0

SNR P r obab ili t y o f S u cc e ss N=10 and L=20

Deterministic RCRTProposed Statistical RCRT-1 Successful Reconstruction on AverageProposed Statistical RCRT-1 Perfect Reconstruction (a) Proposed Statistical RCRT-1 and Deterministic RCRT [25] -40 -35 -30 -25 -20 -15 -10 -5 0

SNR P r obab ili t y o f S u cc e ss N=2 and L=4

Deterministic RCRTProposed Statistical RCRT-2 Successful Reconstruction on AverageProposed Statistical RCRT-2 Perfect Reconstruction-40 -35 -30 -25 -20 -15 -10 -5 0

SNR P r obab ili t y o f S u cc e ss N=4 and L=8

Deterministic RCRTProposed Statistical RCRT-2 Successful Reconstruction on AverageProposed Statistical RCRT-2 Perfect Reconstruction-40 -35 -30 -25 -20 -15 -10 -5 0

SNR P r obab ili t y o f S u cc e ss N=6 and L=12

Deterministic RCRTProposed Statistical RCRT-2 Successful Reconstruction on AverageProposed Statistical RCRT-2 Perfect Reconstruction-40 -35 -30 -25 -20 -15 -10 -5 0

SNR P r obab ili t y o f S u cc e ss N=8 and L=16

Deterministic RCRTProposed Statistical RCRT-2 Successful Reconstruction on AverageProposed Statistical RCRT-2 Perfect Reconstruction-40 -35 -30 -25 -20 -15 -10 -5 0

SNR P r obab ili t y o f S u cc e ss N=10 and L=20

Deterministic RCRTProposed Statistical RCRT-2 Successful Reconstruction on AverageProposed Statistical RCRT-2 Perfect Reconstruction (b) Proposed Statistical RCRT-2 and Deterministic RCRT [25]

Fig. 5: Comparison for Success Rate of Robust Reconstruction between the proposed two statistical RCRT and DeterministicRCRT in [25]Here ’low SNR’ stands for the cases where SNR is within [ − , − and ’high SNR’ refers to SNR within [ − , .Clearly, low SNR may incur a higher complexity, while thenumber of iterations in high SNR cases is within 3 rounds onaverage.Finally, we give two examples to show how error correctingtechniques can improve the performance, where N is setto and , respectively. Given L = 4 , we compare theperformances of Algorithm 2 with and without incorporatingerror correction , which is shown in Fig. 7. It is notedthat all the four moduli need to be simultaneously used inAlgorithm 2 to apply error correction in reconstruction; whilewithout error correction, the moduli are regrouped into (cid:0) (cid:1) ,i.e., 6, sets and the original Algorithm 2 is applied on eachset with majority voting strategy afterwards. In the case with N = 2 (the upper one in Fig. 7), error correction does notprovide a better tradeoff. That is because when N is small,clusterings produced are accurate enough. In such scenario,it is more reasonable to generate more modulus sets for From Theorem 3, here we can tolerate at most one clustering error since L = 2 and (cid:98) L − L (cid:99) = 1 . the reconstructions of numbers respectively to the majorityvoting. However, as N increases, clustering errors happen ina sharply increasing rate, where merely a larger number ofreconstructions from different sets do not beneﬁt the successrate of majority voting so much. The lower one in Fig. 7 showsthat given N = 6 , the error correction based Statistical RCRT-2 outperforms the original one when SNR is bigger than -37.5.VI. C ONCLUSION AND P ROSPECTS

In this paper, we present the ﬁrst statistical based approachesto efﬁciently solve the robust reconstruction of multiple num-bers from unordered residues. Compared with deterministicschemes, the proposed statistical RCRT methods signiﬁcantlyimprove the performance of reconstruction, which can be fur-ther strengthened with error correcting techniques. However,in extremely low noise cases, the performance of deterministicschemes can be better especially when Γ is small. Therefore,it would be of great interest to investigate the tradeoff betweenstatistical inference and deterministic error tolerance.Another problem that remains open is how to determine theoptimal size of modulus set for reconstruction. We believe itis nontrivial to describe the tradeoff between the clustering EEE TRANSACTIONS ON SIGNAL PROCESSING 11 N = N = N = N = N = Number of Iterations D en s i t y SNR

HighLow

Fig. 6: The Number of Average Iterations of Algorithm 2 -40 -35 -30 -25 -20 -15 -10 -5 0

SNR S u cc e ss R a t e Probability of Success Comparison when N=2

Statistical RCRT-2 Successful Reconstruction on AverageStatistical RCRT-2 Successful Reconstruction on Average with CorrectionStatistical RCRT-2 Perfect ReconstructionStatistical RCRT-2 Perfect Reconstruction with Correction -41 -40 -39 -38 -37 -36 -35 -34 -33 -32 -31

SNR S u cc e ss R a t e Probability of Success Comparison when N=6

Fig. 7: Proposed Statistical RCRT-2 with Error Correctionaccuracy compromise and the additional robustness gainedfrom redundant moduli for error correction.R

EFERENCES[1] Oded Goldreich, Dana Ron, and Madhu Sudan. Chinese remainderingwith errors. In

Proceedings of the thirty-ﬁrst annual ACM symposiumon Theory of computing , pages 225–234. ACM, 1999.[2] Godfrey Harold Hardy, John Edensor Littlewood, and George Pólya.

Inequalities . Cambridge university press, 1988.[3] Gang Li, Jia Xu, Ying-Ning Peng, and Xiang-Gen Xia. Bistatic linearantenna array sar for moving target detection, location, and imagingwith two passive airborne radars.

IEEE Transactions on Geoscienceand Remote Sensing , 45(3):554–565, 2007.[4] Gang Li, Jia Xu, Ying-Ning Peng, and Xiang-Gen Xia. Location andimaging of moving targets using nonuniform linear antenna array sar.

IEEE Transactions on Aerospace and Electronic Systems , 43(3), 2007.[5] Xiaoping Li, Tingzhu Huang, Qunying Liao, and Xiang-Gen Xia.Optimal estimates of two common remainders for a robust generalizedchinese remainder theorem.

IEEE Transactions on Signal Processing ,67(7):1824–1837, 2019.[6] Xiaoping Li, Wenjie Wang, Weile Zhang, and Yunhe Cao. Phase-detection-based range estimation with robust chinese remainder theo-rem.

IEEE Transactions on Vehicular Technology , 65(12):10132–10137,2016.[7] Xiaoping Li, Xiang-Gen Xia, Wenjie Wang, and Wei Wang. A robustgeneralized chinese remainder theorem for two integers.

IEEE Trans-actions on Information Theory , 62(12):7491–7504, 2016.[8] Xiaowei Li, Hong Liang, and Xiang-Gen Xia. A robust chineseremainder theorem with its applications in frequency estimation fromundersampled waveforms.

IEEE Transactions on Signal Processing ,57(11):4314–4322, 2009.[9] Xiaowei Li and Xiang-Gen Xia. A fast robust chinese remainder theorembased phase unwrapping algorithm.

IEEE Signal Processing Letters ,15:665–668, 2008.[10] Huiyong Liao and Xiang-Gen Xia. A sharpened dynamic range ofa generalized chinese remainder theorem for multiple integers.

IEEEtransactions on information theory , 53(1):428–433, 2006.[11] Piya Pal and Palghat P Vaidyanathan. Nested arrays: A novel approach toarray processing with enhanced degrees of freedom.

IEEE Transactionson Signal Processing , 58(8):4167–4181, 2010.[12] Jeremy J Stone. Multiple-burst error correction with the chineseremainder theorem.

Journal of the Society for Industrial and AppliedMathematics , 11(1):74–81, 1963.[13] Zhao Tan, Yonina C Eldar, and Arye Nehorai. Direction of arrivalestimation using co-prime arrays: A super resolution viewpoint.

IEEETransactions on Signal Processing , 62(21):5565–5576, 2014.[14] Palghat P Vaidyanathan and Piya Pal. Sparse sensing with co-prime sam-plers and arrays.

IEEE Transactions on Signal Processing , 59(2):573–586, 2010.[15] Chen Wang, Qinye Yin, and Hongyang Chen. Robust chinese remaindertheorem ranging method based on dual-frequency measurements.

IEEETransactions on Vehicular Technology , 60(8):4094–4099, 2011.

EEE TRANSACTIONS ON SIGNAL PROCESSING 12 [16] Genyuan Wang, Xiang-Gen Xia, Victor C Chen, and RL Fielder. Detec-tion, location, and imaging of fast moving targets using multifrequencyantenna array sar.

IEEE Transactions on Aerospace and ElectronicSystems , 40(1):345–355, 2004.[17] Wei Wang, Xiaoping Li, Xiang-Gen Xia, and Wenjie Wang. The largestdynamic range of a generalized chinese remainder theorem for twointegers.

IEEE Signal Processing Letters , 22(2):254–258, 2015.[18] Wenjie Wang, Xiaoping Li, Wei Wang, and Xiang-Gen Xia. Maximumlikelihood estimation based robust chinese remainder theorem for realnumbers and its fast algorithm.

IEEE Transactions on Signal Processing ,63(13):3317–3331, 2015.[19] Wenjie Wang and Xiang-Gen Xia. A closed-form robust chineseremainder theorem and its performance analysis.

IEEE Transactionson Signal Processing , 58(11):5655–5666, 2010.[20] X-G Xia. An efﬁcient frequency-determination algorithm from multipleundersampled waveforms.

IEEE Signal Processing Letters , 7(2):34–37,2000.[21] Xiang-Gen Xia. On estimation of multiple frequencies in undersampledcomplex valued waveforms.

IEEE Transactions on Signal Processing ,47(12):3417–3419, 1999.[22] Xiang-Gen Xia and Kejing Liu. A generalized chinese remaindertheorem for residue sets with errors and its application in frequencydetermination from multiple sensors with low sampling rates.

IEEESignal Processing Letters , 12(11):768–771, 2005.[23] Xiang-Gen Xia and Genyuan Wang. Phase unwrapping and a robustchinese remainder theorem.

IEEE Signal Processing Letters , 14(4):247–250, 2007.[24] Hanshen Xiao, Cas Cremers, and Hari Krishna Garg. Symmetricpolynomial & crt based algorithms for multiple frequency determinationfrom undersampled waveforms. In

Signal and Information Processing(GlobalSIP), 2016 IEEE Global Conference on , pages 202–206. IEEE,2016.[25] Hanshen Xiao, Yufeng Huang, Yu Ye, and Guoqiang Xiao. Robustness inchinese remainder theorem for multiple numbers and remainder coding.

IEEE Transactions on Signal Processing , 66(16):4347–4361, 2018.[26] Hanshen Xiao and Guoqiang Xiao. Notes on crt-based robust frequencyestimation.

Signal Processing , 133:13–17, 2017.[27] Hanshen Xiao and Guoqiang Xiao. On solving ambiguity resolutionwith robust chinese remainder theorem for multiple numbers.

IEEETransactions on Vehicular Technology , 68(5):5179–5184, 2019.[28] Li Xiao, Xiang-Gen Xia, and Haiye Huo. Towards robustness in residuenumber systems.

IEEE Transactions on Signal Processing , 65(6):1497–1510, 2017.[29] Li Xiao, Xiang-Gen Xia, and Wenjie Wang. Multi-stage robust chi-nese remainder theorem.

IEEE Transactions on Signal Processing ,62(18):4772–4785, 2014.[30] Guangwu Xu. On solving a generalized chinese remainder theorem inthe presence of remainder errors. arXiv preprint arXiv:1409.0121 , 2014.[31] Jia Xu, Zu-Zhen Huang, Zhi-Rui Wang, Li Xiao, Xiang-Gen Xia, andTeng Long. Radial velocity retrieval for multichannel sar moving targetswith time–space doppler deambiguity.