Complexity analysis of the Controlled Loosening-up (CLuP) algorithm
Mihailo Stojnic*

*e-mail: [email protected]

Abstract
In our companion paper [22] we introduced a powerful mechanism, the Controlled Loosening-up (CLuP), for handling MIMO ML-detection problems. The algorithm turned out to have many remarkable features, and one of them, its computational complexity, we discuss in more detail in this paper. As explained in [22], CLuP is an iterative procedure where each iteration amounts to solving a simple quadratic program. This clearly implies that the key contributing factor to its overall computational complexity is the number of iterations needed to achieve a required precision. As was also hinted in [22], that number seems to be fairly low, and in some of the most interesting scenarios it is often not even larger than 10. Here we provide a careful analysis based on the Random Duality Theory that indeed indicates that a very small number of iterations is sufficient to achieve excellent performance. A solid set of results obtained through numerical experiments is presented as well and shown to be in nice agreement with what the theoretical analysis predicts. Also, as was the case in [22], we again focus only on the core CLuP algorithm, but we do mention on several occasions that the concepts we introduce here are as remarkably general as those introduced in [22] and can be utilized in the analysis of a large number of classes of algorithms applicable in the most diverse of scientific fields. Many results in these directions will be presented in several of our companion papers.
Index Terms: Controlled Loosening-up (CLuP); ML detection; MIMO systems; Algorithms; Random duality theory.

1 Introduction

In [22] we revisited the MIMO ML detection problem and introduced the so-called Controlled Loosening-up (CLuP) mechanism to solve it. Since [22] is the introductory paper on this subject, we used it only to introduce the most basic features of the CLuP algorithm and deferred the discussion of many of its key advanced properties to separate papers. One of these properties, the so-called computational complexity, will be the topic of the main discussion in this paper.

From the discussion presented in [22] it was rather clear that the main concepts behind CLuP are very general and applicable to many different problems and to the algorithms used for solving them. Consequently, it was then also clear that instead of MIMO ML detection we could have chosen quite a few other problems to introduce the main ideas behind CLuP. However, given ML detection's importance and popularity in various scientific and engineering communities, ranging from signal processing and information theory to statistics and machine learning, we selected it as a convenient choice to quickly convey CLuP's basics across all of these fields. To keep the presentation easy to follow and smoothly connected to the results already presented in [22], here we follow the same pattern and use MIMO ML detection as the underlying problem of interest. Given that we have already revisited this problem in [22], we will here try to avoid repeating many of the specifics already mentioned in [22] and instead focus on some of the key differences.

However, to ensure that we have the needed problem setup properly introduced, we do start with a brief description of the model:

y = A x_{sol} + \sigma v, \qquad (1)

where A \in R^{m \times n} is the so-called system matrix (typically assumed known at the output in the so-called coherent scenarios, which we will assume here), x_{sol} \in R^n and v \in R^m are the signal and noise vectors, and \sigma is a noise scaling factor. Finally, y \in R^m is the vector at the output of the system. Many practical systems can be modeled this way, with multi-antenna systems probably being the most popular and well-known example in the fields of information theory and wireless communications. Similarly to what we did in [22], we here also consider a statistical scenario where the elements of v and A are assumed to be zero-mean unit-variance i.i.d. Gaussian random variables. Also, when it comes to the system's dimensions, we will consider the so-called linear regime, i.e. the regime where n and m are large but m = \alpha n with \alpha > 0 a constant. The ML estimate of x_{sol} is then

\hat{x} = \arg\min_{x \in X} \|y - Ax\|_2, \qquad (2)

where, as in [22], we assume the typical binary scenario, which means that X = \{-1/\sqrt{n}, 1/\sqrt{n}\}^n. While we will here be interested in the binary scenario, we do also mention that the above ML problem is very popular in many other considerations (its LASSO/SOCP variants are among the most fundamental problems in statistics, machine learning, and compressed sensing; for more on this see, e.g., [1-3, 10, 13, 25, 26]). Depending on whether one is interested in solving the optimization problem in (2) exactly or approximately, quite a few excellent algorithms have been developed throughout the literature over the years (see, e.g., [5, 6, 9, 27]).
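To make the setup concrete, the following is a small Python sketch of the model (1) and of solving (2) exactly by brute force. It is of course feasible only for very tiny n and serves purely as a ground-truth reference; all names are illustrative.

```python
import itertools
import numpy as np

def ml_detect(y, A):
    # exhaustive search over X = {-1/sqrt(n), 1/sqrt(n)}^n, cf. (2)
    n = A.shape[1]
    best_x, best_val = None, np.inf
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        x = np.array(signs) / np.sqrt(n)
        val = np.linalg.norm(y - A @ x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x

# model (1): y = A x_sol + sigma v, with i.i.d. standard normal A and v
rng = np.random.default_rng(0)
n, alpha, sigma = 10, 0.8, 0.1
m = int(alpha * n)
A = rng.standard_normal((m, n))
x_sol = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)
y = A @ x_sol + sigma * rng.standard_normal(m)
x_ml = ml_detect(y, A)   # at this noise level typically recovers x_sol
```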
We leave a detailed discussion of the prior work for review papers and here just briefly mention that some of the very best results that relate in particular to the type of problems we are interested in here can be found in, e.g., [4, 7, 8, 23, 24]. As in [22] (and earlier in [23, 24]), here the goal will be to approach the exact solution. To that end, [22] introduced the following remarkably simple iterative procedure (the above-mentioned CLuP): let x^{(0)} \in X = \{-1/\sqrt{n}, 1/\sqrt{n}\}^n (x^{(0)} can be either randomly generated or designed in a specific way) and consider the following:

x^{(i+1)} = \frac{x^{(i+1,s)}}{\|x^{(i+1,s)}\|_2} \quad \mbox{with} \quad x^{(i+1,s)} = \arg\min_x -(x^{(i)})^T x \quad \mbox{subject to} \quad \|y - Ax\|_2 \le r, \; x \in [-1/\sqrt{n}, 1/\sqrt{n}]^n, \qquad (3)

where of course, as discussed in [22], r is the key parameter, the so-called radius. As was also discussed in great detail in [22], the choice of r is of fundamental importance for the success of the algorithm. Figure 1 illustrates both the theoretical and the simulated CLuP performance. All the details regarding the figure can be found in [22]; here we mention only the key points that we view as most relevant for the discussion presented in this paper. Namely, as one can see from the figure, as r increases from its minimal possible value r_{plt} (see [22] for details and a precise definition of r_{plt}), the CLuP performance gets closer to the exact ML, and already for r = r_{sc} r_{plt} = 1.3 r_{plt} it is almost exactly where the predicted ML one is. One should also note the appearance of the so-called vertical line of corrections. While we skip detailing the meaning of this line and refer instead to [22], we do mention that in this paper we will be interested in the regimes above this line, where no major corrections discussed in [22] are expected to take place. This ensures that we can focus on one problem at a time, the computational complexity, and leave all others discussed in [22] aside.

[Figure 1: p_{err} as a function of 1/\sigma; \alpha = 0.8. The figure (see [22]) marks the line of mild or no corrections and compares the theoretical curves p_{err}^{(plt)}, \hat{p}_{err}^{(CLuP)} for r_{sc} = 1 and several other values of r_{sc}, the ultimate CLuP curve, and \hat{p}_{err}^{(ml)} (1FL ML (1RSB)), against the simulated \hat{p}_{err}^{(CLuP)} for several r_{sc}.]

Speaking of computational complexity, in Table 1 we show how the CLuP algorithm really performs through the iterations. We selected in particular the above-mentioned r = r_{sc} r_{plt} = 1.3 r_{plt} scenario and chose a rather moderately small n = 800. Already after a handful of iterations all quantities settle at the fifth decimal level (the meaning of all relevant quantities is rather clear; we just add that \hat{s}^{(k)} is the value of the CLuP objective after the k-th iteration; it also goes without saying that all quantities given in the table are the expected values which, due to an overwhelming concentration, basically represent also the values themselves).

Table 1: Change in p_{err}^{(k)}, \hat{s}^{(k)}, \|x^{(k,s)}\|_2^2, and (x_{sol})^T x^{(k,s)} as k grows; \alpha = 0.8, r_{sc} = 1.3, n = 800. Columns: k (iteration), p_{err}^{(k)}, \hat{s}^{(k)}, \hat{d}_2^{(k)} = \|x^{(k,s)}\|_2^2, \hat{d}_1^{(k)} = (x_{sol})^T x^{(k,s)}, for k = 1, ..., 10 (the numerical entries did not survive the source extraction).

In summary, the mechanisms behind the CLuP algorithm introduce many remarkable properties. Two of them we particularly single out below with respect to the MIMO ML (we leave out many other ones for the discussions regarding a host of other algorithms that we designed utilizing similar mechanisms). Namely, as one of the most fundamentally important problems at the intersection of information theory, signal processing, statistics, machine learning, and many other areas, MIMO ML has many great features that are typically of interest. Regarding the problem in (2), two of them are probably by far the most dominant.

MIMO ML's two most fundamental theoretical/practical needs: 1) to solve the problem in (2) exactly; 2) to solve (2) with the minimal needed complexity (ideally a polynomial one).

This of course has been known for a long time as the heart of the matter when it comes to MIMO ML. As the above results and ultimately [22] indicate, CLuP manages to do rather well with respect to both of these features.
CLuP's behavior regarding the MIMO ML's needs: 1) CLuP does approach the exact ML; 2) not only does CLuP achieve the optimal performance, it does so through a fixed number of the simplest possible quadratic programming iterations.

We will organize the presentation by splitting it into several parts. First we revisit the characterization of the algorithm's first iteration. Then we move to the main part, the analysis of the second iteration. As will soon be clear, this is the most important step of the analysis, and already this very step contains all the key conceptual and technical details needed for all other steps, which we briefly touch upon afterwards. In parallel we also provide a solid set of results obtained through simulations. As we will see, they will be of use in both the theoretical and the practical aspects of the discussion. Towards the end we of course provide a few concluding remarks.
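Since each iteration in (3) is a simple convex program, the entire CLuP procedure takes only a few lines of code. Below is a minimal sketch, assuming the cvxpy package as a generic conic/quadratic solver; the radius r (e.g. r = r_{sc} r_{plt}) must be supplied externally, and all names are illustrative rather than a prescribed implementation.

```python
import numpy as np
import cvxpy as cp

def clup(y, A, r, n_iter=10, seed=0):
    m, n = A.shape
    rng = np.random.default_rng(seed)
    # random corner of the scaled hypercube as the starting point x^(0)
    x = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)
    for _ in range(n_iter):
        xv = cp.Variable(n)
        # the simple per-iteration program from (3): maximize the overlap
        # with the previous iterate over the radius-r feasible set
        cp.Problem(cp.Maximize(x @ xv),
                   [cp.norm(y - A @ xv, 2) <= r,
                    xv >= -1 / np.sqrt(n),
                    xv <= 1 / np.sqrt(n)]).solve()
        xs = xv.value
        x = xs / np.linalg.norm(xs)   # normalization step in (3)
    # hard decisions on the binary (+-1/sqrt(n)) symbols
    return np.sign(xs) / np.sqrt(n)
```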
2 First iteration performance analysis

To analyze the computational complexity it is of course sufficient to just determine the number of iterations needed to run the algorithm to achieve the desired precision. However, we will here take a bit different and substantially more general approach. Namely, we will characterize the behavior of all critical components/parameters of the problem in each iteration. To start things off we of course first consider the first iteration. Many of the results that we present below follow as direct consequences of the main random duality theory concepts that we developed in a long line of work [12-16, 18, 19]. In fact, some of the procedures will be very closely related to some of the procedures presented in [22]. We will therefore try to skip repeating as many unnecessary details as possible and instead focus on the key differences between the analysis considered here and the corresponding ones considered in [22]. We also mention that not only are analyses similar to the one we present below of interest here, this very same analysis is of interest in [22] as well. However, in [22] such an analysis is not done for the purpose of discussing the algorithm's computational complexity but is rather utilized to ensure that the algorithm misses a specific stationary point right at the beginning.

We recall that CLuP's first step (iteration) amounts to determining x^{(1)} as

x^{(1)} = \frac{x^{(1,s)}}{\|x^{(1,s)}\|_2} \quad \mbox{with} \quad x^{(1,s)} = \arg\min_x -(x^{(0)})^T x \quad \mbox{subject to} \quad \|y - Ax\|_2 \le r, \; x \in [-1/\sqrt{n}, 1/\sqrt{n}]^n. \qquad (4)

The analysis of the above optimization problem is another simple exercise within the Random Duality Theory (RDT). We will study way more involved problems than this one in one of our companion papers, and we may on occasion revisit this one again there in a bit more detail. Here though we will just sketch the RDT analysis, relying on what we presented in [22] and in a host of our earlier papers [11-21]. The above problem is of course very similar to the problems considered in [22], and a majority of the discussion that applied there will be in place here again. As we are now not focused on all the key parameters that we considered in [22], some steps can be done even faster. To that end we first quickly rewrite (4) (with z = x_{sol} - x) as

\min_z -(x^{(0)})^T (x_{sol} - z) \quad \mbox{subject to} \quad \|\sigma v + A z\|_2 \le r, \; z \in [0, 2/\sqrt{n}]^n, \qquad (5)

and further as

\min_z (x^{(0)})^T z \quad \mbox{subject to} \quad \|\sigma v + A z\|_2 \le r, \; z \in [0, 2/\sqrt{n}]^n. \qquad (6)

Relying on the same concentration strategy that we introduced in [11-21] and considered in [22], we here also set \|z\|_2^2 = c_{1,z} and (x^{(0)})^T z = s_1. Then the following is the object of interest:

\xi_{p,1}(\alpha, \sigma, c_{1,z}, s_1) = \lim_{n \to \infty} \frac{1}{\sqrt{n}} E \min_z \|\sigma v + A z\|_2 \quad \mbox{subject to} \quad \|z\|_2^2 = c_{1,z}, \; (x^{(0)})^T z = s_1, \; z \in [0, 2/\sqrt{n}]^n. \qquad (7)

Now we are in position to mimic what we presented in the early sections of [22]. However, we do mention right here that below we will try to avoid as many of the unnecessary repetitive explanations as possible and instead focus on the key differences. As in [22], we again start with the RDT's first step, which amounts to forming the deterministic Lagrange dual.
1. First step – Forming the deterministic Lagrange dual
Again following a large body of our earlier work (see, e.g., [12-16, 18, 19]) we have

\min_z \max_{\|\lambda\|_2 = 1, \gamma, \nu} \lambda^T [A \; v] [z; \sigma] + \nu ((x^{(0)})^T z - s_1) + \gamma (\|z\|_2^2 - c_{1,z}) \quad \mbox{subject to} \quad z \in [0, 2/\sqrt{n}]^n. \qquad (8)

As in [22], given that we are interested in a statistical and large dimensional scenario, \nu and \gamma will concentrate and, being scalars, can be discretized, so that the resulting optimization over these two quantities can be taken outside:

\max_{\gamma, \nu} \min_z \max_{\|\lambda\|_2 = 1} \lambda^T [A \; v] [z; \sigma] + \nu ((x^{(0)})^T z - s_1) + \gamma (\|z\|_2^2 - c_{1,z}) \quad \mbox{subject to} \quad z \in [0, 2/\sqrt{n}]^n. \qquad (9)
2. Second step – Forming the Random dual
Following further the principles of the analysis presented in [22], we again introduce the so-called random dual. Let \bar{Z} = [0, 2/\sqrt{n}]^n. In this particular case the random dual is the following problem:

\max_{\gamma, \nu} \min_{z \in \bar{Z}} \max_{\|\lambda\|_2 = 1} \lambda^T g \sqrt{\|z\|_2^2 + \sigma^2} + \|\lambda\|_2 (h^T z + h_0 \sigma) + \nu ((x^{(0)})^T z - s_1) + \gamma (\|z\|_2^2 - c_{1,z}), \qquad (10)

where we again have, as earlier, that the components of the newly introduced m- and n-dimensional vectors g and h are i.i.d. standard normals, and h_0 is yet another standard normal independent of all other random variables. One here also observes that the minus sign in front of the second term, typically present in some of the analysis in [22], is not that important here and we remove it. Let \xi^{(1)}_{RD}(\alpha, \sigma; c_{1,z}, s_1, \gamma, \nu) be the following:

\lim_{n \to \infty} \frac{1}{\sqrt{n}} E \min_{z \in \bar{Z}} \max_{\|\lambda\|_2 = 1} \lambda^T g \sqrt{\|z\|_2^2 + \sigma^2} + \|\lambda\|_2 (h^T z + h_0 \sigma) + \nu ((x^{(0)})^T z - s_1) + \gamma (\|z\|_2^2 - c_{1,z}). \qquad (11)
3. Third step – Handling the Random dual
Finally, in the third step we proceed with the analysis of the above random dual, following once again step by step the strategy outlined in [12-16, 18, 19]. The inner optimization over \lambda is again very easy and one has

\min_{c_{1,z}} \max_{\gamma, \nu} \min_z \|g\|_2 \sqrt{c_{1,z} + \sigma^2} + (h^T z + h_0 \sigma) + \nu ((x^{(0)})^T z - s_1) + \gamma (\|z\|_2^2 - c_{1,z}) \quad \mbox{subject to} \quad z \in [0, 2/\sqrt{n}]^n. \qquad (12)

Following again, say, [12] we define

f_{box,1}(h; c_{1,z}, s_1) = \max_{\gamma, \nu} \min_z h^T z + \nu ((x^{(0)})^T z - s_1) + \gamma (\|z\|_2^2 - c_{1,z}) \quad \mbox{subject to} \quad z \in [0, 2/\sqrt{n}]^n. \qquad (13)

With the addition of the s_1 constraint this is now literally identical to the box constrained problem considered in [12], and we can immediately use the solution given there, with a little bit of modification similar to the ones presented in [22], to account for s_1 and \nu. Basically, instead of (110) from [12] one now has

f_{box,1}(h; c_{1,z}, s_1) = \max_{\gamma, \nu} \left( \frac{1}{\sqrt{n}} \sum_{i=1}^n f^{(1)}_{box,1}(h_i, \gamma, \nu) - \nu s_1 \sqrt{n} - \gamma c_{1,z} \sqrt{n} \right), \qquad (14)

where

f^{(1)}_{box,1}(h_i, \gamma, \nu) = \begin{cases} 0, & h_i + \nu x^{(0)}_i \ge 0 \\ -(h_i + \nu x^{(0)}_i)^2/(4\gamma), & -4\gamma \le h_i + \nu x^{(0)}_i \le 0 \\ 2(h_i + \nu x^{(0)}_i) + 4\gamma, & h_i + \nu x^{(0)}_i \le -4\gamma, \end{cases} \qquad (15)

and \gamma and \nu are \sqrt{n}-scaled versions of \gamma and \nu from (13), while x^{(0)}_i in (15) is also \sqrt{n}-scaled (basically it is just the sign of the initial x^{(0)}_i). Utilizing further the strategies outlined in [22], one also has for the optimizing z_i

z_i = \frac{1}{\sqrt{n}} \min\left( \max\left( 0, -\frac{h_i + \nu x^{(0)}_i}{2\gamma} \right), 2 \right). \qquad (16)

Assuming that the initial x^{(0)} has \rho n components equal to 1/\sqrt{n} and (1-\rho) n components equal to -1/\sqrt{n}, and after solving the integrals, one has

E f^{(1)}_{box,1}(h_i, \gamma, \nu) = \rho I_{1,1}(\gamma, \nu) + (1-\rho) I_{1,1}(\gamma, -\nu) + \rho I_{2,1}(\gamma, \nu) + (1-\rho) I_{2,1}(\gamma, -\nu), \qquad (17)

where

I_{1,1}(\gamma, \nu) = -\left( e^{-0.5(4\gamma+\nu)^2} (\nu - 4\gamma) + \sqrt{\pi/2} (\nu^2 + 1) \mathrm{erf}((4\gamma+\nu)/\sqrt{2}) - \sqrt{\pi/2} (\nu^2 + 1) \mathrm{erf}(\nu/\sqrt{2}) - e^{-0.5\nu^2} \nu \right) / (4\sqrt{2\pi}\gamma)
I_{2,1}(\gamma, \nu) = (4\gamma + 2\nu) \, 0.5 \, \mathrm{erfc}((4\gamma+\nu)/\sqrt{2}) - \sqrt{2/\pi} \, e^{-0.5(4\gamma+\nu)^2}. \qquad (18)

Finally, a combination of (10)-(18) gives

\xi^{(1)}_{RD}(\alpha, \sigma; c_{1,z}, s_1, \gamma, \nu) = \sqrt{\alpha} \sqrt{c_{1,z} + \sigma^2} + E f^{(1)}_{box,1}(h_i, \gamma, \nu) - \nu s_1 - \gamma c_{1,z}. \qquad (19)

The following theorem summarizes what we presented above.

Theorem 1. (CLuP – RDT estimate – first iteration) Let \xi_{p,1}(\alpha, \sigma, c_{1,z}, s_1) and \xi^{(1)}_{RD}(\alpha, \sigma; c_{1,z}, s_1, \gamma, \nu) be as in (7) and (19), respectively. Then

\xi_{p,1}(\alpha, \sigma, c_{1,z}, s_1) = \max_{\gamma, \nu} \xi^{(1)}_{RD}(\alpha, \sigma; c_{1,z}, s_1, \gamma, \nu). \qquad (20)

Consequently,

\min_{c_{1,z}} \xi_{p,1}(\alpha, \sigma, c_{1,z}, s_1) = \min_{c_{1,z}} \max_{\gamma, \nu} \xi^{(1)}_{RD}(\alpha, \sigma; c_{1,z}, s_1, \gamma, \nu). \qquad (21)

Proof.
Follows from the above derivation, the general RDT concepts presented in [12-16, 18, 19], and the fact that the strong random duality trivially holds.

Given that the strong random duality is in place, one can continue further and obtain the exact estimates for all other relevant quantities. We formalize that below.
Given that the above presentation may have gone a bit too deep into the mathematical analysis, we below present a few key steps one needs to perform to actually calculate pretty much all quantities of interest. Of course, we do emphasize that all of that is possible precisely because of the above analysis.
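As a quick illustration of how Theorem 1 is used in practice, the following hedged sketch numerically evaluates \xi^{(1)}_{RD} from (19) and performs the inner maximization over (\gamma, \nu) from (20). The closed-form integrals (18) are replaced by direct numerical integration of the piecewise function (15), and \rho = 1/2 is assumed for the initialization; all names are illustrative.

```python
import numpy as np
from scipy import integrate, optimize

def f1_box(u, gamma):
    # per-coordinate optimum of (13): u = h_i + nu * x_i^(0), cf. (15)
    if u >= 0:
        return 0.0
    if u >= -4 * gamma:
        return -u * u / (4 * gamma)
    return 2 * u + 4 * gamma

def Ef1(gamma, nu, rho=0.5):
    phi = lambda h: np.exp(-h * h / 2) / np.sqrt(2 * np.pi)
    Ep = integrate.quad(lambda h: f1_box(h + nu, gamma) * phi(h), -10, 10)[0]
    Em = integrate.quad(lambda h: f1_box(h - nu, gamma) * phi(h), -10, 10)[0]
    return rho * Ep + (1 - rho) * Em    # mixture over x_i^(0) = +/- 1, cf. (17)

def xi1_RD(gamma, nu, alpha, sigma, c1z, s1, rho=0.5):
    # cf. (19)
    return (np.sqrt(alpha) * np.sqrt(c1z + sigma**2)
            + Ef1(gamma, nu, rho) - nu * s1 - gamma * c1z)

def xi1(alpha, sigma, c1z, s1, rho=0.5):
    # inner maximization over (gamma, nu) in Theorem 1, cf. (20);
    # gamma > 0 is enforced through the exp-reparametrization
    res = optimize.minimize(
        lambda p: -xi1_RD(np.exp(p[0]), p[1], alpha, sigma, c1z, s1, rho),
        x0=np.zeros(2))
    return -res.fun
```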
Summarized formalism to handle the CLuP’s first iteration
First consider the following optimization problem:

\{\hat{\nu}^{(1)}, \hat{\gamma}^{(1)}, \hat{c}^{(1)}_{1,z}, \hat{s}^{(1)}_1\} = \arg\min_{s_1} s_1 \quad \mbox{subject to} \quad \min_{0 \le c_{1,z} \le 4} \max_{\gamma, \nu} \xi^{(1)}_{RD}(\alpha, \sigma; c_{1,z}, s_1, \gamma, \nu) = r. \qquad (22)

Let

s_{x,1}(\gamma, \nu) = \frac{\nu}{4\gamma} \left( \mathrm{erf}(\nu/\sqrt{2}) - \mathrm{erf}((4\gamma+\nu)/\sqrt{2}) \right) + \frac{1}{2\gamma\sqrt{2\pi}} \left( e^{-0.5\nu^2} - e^{-0.5(4\gamma+\nu)^2} \right)
s_{xsq,1}(\gamma, \nu) = -I_{1,1}(\gamma, \nu)/\gamma
s_{x,2}(\gamma, \nu) = \mathrm{erfc}((4\gamma+\nu)/\sqrt{2})
s_{xsq,2}(\gamma, \nu) = 2 s_{x,2}(\gamma, \nu). \qquad (23)

Utilizing (16) we have

\sqrt{n} E z_i = \rho s_{x,1}(\hat{\gamma}^{(1)}, \hat{\nu}^{(1)}) + (1-\rho) s_{x,1}(\hat{\gamma}^{(1)}, -\hat{\nu}^{(1)}) + \rho s_{x,2}(\hat{\gamma}^{(1)}, \hat{\nu}^{(1)}) + (1-\rho) s_{x,2}(\hat{\gamma}^{(1)}, -\hat{\nu}^{(1)})
n E z_i^2 = \rho s_{xsq,1}(\hat{\gamma}^{(1)}, \hat{\nu}^{(1)}) + (1-\rho) s_{xsq,1}(\hat{\gamma}^{(1)}, -\hat{\nu}^{(1)}) + \rho s_{xsq,2}(\hat{\gamma}^{(1)}, \hat{\nu}^{(1)}) + (1-\rho) s_{xsq,2}(\hat{\gamma}^{(1)}, -\hat{\nu}^{(1)}). \qquad (24)

Given that x_i = x_{sol,i} - z_i, we finally also have

\sqrt{n} E x_i = 1 - \sqrt{n} E z_i
n E x_i^2 = n E z_i^2 + 2 \sqrt{n} E x_i - 1, \qquad (25)

and

E((x_{sol})^T x) = 1 - \sqrt{n} E z_i
E \|x\|_2^2 = n E z_i^2 + 2 E((x_{sol})^T x) - 1, \qquad (26)

with \sqrt{n} E z_i and n E z_i^2 as in (24). Of course, given that the strong random duality is in full power, the above quantities are actually the concentrating values, and the concentration is exponential in n. Finally, utilizing (16) one can also get the estimate for the probability of error:

p^{(1)}_{err} = 1 - P\left( z_i \le \frac{1}{\sqrt{n}} \right) = 1 - \left( \rho \, \frac{1}{2} \mathrm{erfc}\left( \frac{-2\hat{\gamma}^{(1)} - \hat{\nu}^{(1)}}{\sqrt{2}} \right) + (1-\rho) \, \frac{1}{2} \mathrm{erfc}\left( \frac{-2\hat{\gamma}^{(1)} + \hat{\nu}^{(1)}}{\sqrt{2}} \right) \right). \qquad (27)

2.2 Numerical results – first iteration

In this section we present a set of numerical results that relate to the above analysis. A numerical analysis is needed both for the theoretical values discussed above and for their simulated counterparts. We start with the theoretical predictions and the numerical evaluations of the critical parameters discussed in the analysis presented above. In Table 2 we show the theoretical values for various system parameters obtained based on Theorem 1 for two different values of SNR, 1/\sigma = 10 [db] and 1/\sigma = 13 [db].

Table 2: Theoretical values for various system parameters obtained utilizing Theorem 1. Columns: 1/\sigma [db], \hat{\nu}^{(1)}, \hat{\gamma}^{(1)}, \hat{c}^{(1)}_{1,z}, \hat{s}^{(1)}_1, \xi^{(1)}_{RD}, p^{(1)}_{err}, \|x^{(1,s)}\|_2^2, (x_{sol})^T x^{(1,s)}; rows for 1/\sigma = 10 and 13 [db] (the numerical entries did not survive the source extraction).

To give a bit of a feeling as to what kind of precision level the random duality theory typically achieves even for moderate problem dimensions, we show in Table 3 the corresponding simulated values. The simulated values are obtained for \alpha = 0.8, n = 400, and r_{sc} = 1.3; here r is set via \xi^{(1)}_{RD} = r = r_{sc} r_{plt}. We also recall from [22] that this choice of r_{sc} in particular ensures that one circumvents one of the stationary points right at the beginning, after the first iteration.

Table 3: Theoretical/simulated values for various system parameters obtained based on Theorem 1. Columns: 1/\sigma [db], \hat{s}^{(1)}_1, \xi^{(1)}_{RD}, p^{(1)}_{err}, \|x^{(1,s)}\|_2^2, (x_{sol})^T x^{(1,s)}; entries are theory/simulation pairs (the numerical entries did not survive the source extraction).

Looking at the tables one can read off the concrete values of all the critical parameters. One remarkable property of the random duality theory we will not repeatedly stress throughout the paper but do mention here: whenever we determine a certain expected value of a quantity associated with a vector as a whole, that value will be the concentrating point, and the underlying concentration is rather overwhelming (exponential in n).

In Table 4 we also show how the results change as the dimension n changes. As mentioned above, already for n = 400 the results are fairly close to the theoretical predictions, and for n = 1600 they are almost identical across all key parameters. Also, to make the notation a bit easier, in the above derivation we skipped utilizing indices to emphasize that we are discussing the first iteration. In the tables we added them as superscripts with a rather obvious meaning. Throughout the paper we will try to maintain the same approach, and when not necessary we may actually skip adding iteration or other indices.

Table 4: Simulated values for p^{(1)}_{err}, \hat{s}^{(1)}_1, \|x^{(1,s)}\|_2^2, and (x_{sol})^T x^{(1,s)}; \alpha = 0.8, r_{sc} = 1.3, n \in \{100, 200, 400, 800, 1600\}, with an n \to \infty theory row for comparison (the numerical entries did not survive the source extraction).
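The simulated entries of Tables 3 and 4 can be reproduced along the following lines: run only the first CLuP step over independent (A, v, x^{(0)}) instances and average the tracked quantities. This is a hedged sketch (cvxpy assumed as the solver, the radius r supplied externally, names illustrative):

```python
import numpy as np
import cvxpy as cp

def simulate_first_iteration(r, n=400, alpha=0.8, sigma=10**(-13/20),
                             n_trials=50, seed=0):
    m = int(alpha * n)
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_trials):
        A = rng.standard_normal((m, n))
        # for bookkeeping convenience x_sol is taken as all ones over sqrt(n)
        x_sol = np.ones(n) / np.sqrt(n)
        y = A @ x_sol + sigma * rng.standard_normal(m)
        x0 = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)
        xv = cp.Variable(n)
        cp.Problem(cp.Maximize(x0 @ xv),
                   [cp.norm(y - A @ xv, 2) <= r,
                    xv >= -1 / np.sqrt(n), xv <= 1 / np.sqrt(n)]).solve()
        x1s = xv.value
        stats.append([np.mean(np.sign(x1s) != np.sign(x_sol)),  # p_err^(1)
                      x0 @ x1s,                                 # s_1-related overlap
                      x1s @ x1s,                                # ||x^(1,s)||_2^2
                      x_sol @ x1s])                             # (x_sol)^T x^(1,s)
    return np.mean(stats, axis=0)
```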
3 Second iteration performance analysis

The above analysis is of course related to the first iteration of the algorithm. In this section we discuss the second iteration. This is the major, key step in the discussion of the complexity of the entire algorithm, and all other steps are basically just a simple generalization of the foundational concepts needed for this second step. Some of the considerations that we present below will be similar to those presented when we discussed the first iteration. On the other hand, some of them will be very different. To keep the exposition easy to follow we will try to rely on what we presented above as much as possible. However, we will also try to avoid repetitive re-explaining of the already introduced concepts and, as usual, will put the emphasis on the key differences.

We start by recalling that CLuP's second iteration amounts to determining x^{(2)} as

x^{(2)} = \frac{x^{(2,s)}}{\|x^{(2,s)}\|_2} \quad \mbox{with} \quad x^{(2,s)} = \arg\min_x -(x^{(1)})^T x \quad \mbox{subject to} \quad \|y - Ax\|_2 \le r, \; x \in [-1/\sqrt{n}, 1/\sqrt{n}]^n. \qquad (28)

We then closely follow what was done earlier and first rewrite (28) as

\min_z -(x^{(1)})^T (x_{sol} - z) \quad \mbox{subject to} \quad \|\sigma v + A z\|_2 \le r, \; z \in [0, 2/\sqrt{n}]^n, \qquad (29)

and further as

\min_z (x^{(1)})^T z \quad \mbox{subject to} \quad \|\sigma v + A z\|_2 \le r, \; z \in [0, 2/\sqrt{n}]^n. \qquad (30)

As in the previous section, we will again rely on the same concentration strategy introduced in [11-21] and considered in [22], and set \|z\|_2^2 = c_{2,z} and (x^{(1,s)})^T z = s_2. Analogously to (7) one can then view the following optimization problem as the object of interest:

\xi_{p,2}(\alpha, \sigma, c_{2,z}, s_2) = \lim_{n \to \infty} \frac{1}{\sqrt{n}} E \min_z \|\sigma v + A z\|_2 \quad \mbox{subject to} \quad \|z\|_2^2 = c_{2,z}, \; (x^{(1,s)})^T z = s_2, \; z \in [0, 2/\sqrt{n}]^n. \qquad (31)

This problem of course structurally completely matches the one in (7). So, conceivably, one could proceed with an analysis similar to the one presented in the previous section right after (7). That is possible as an approximation, but it is likely to lead to inaccurate estimates already at the level of the second iteration. Those potential inaccuracies would have a chance to become even more pronounced after propagating through later iterations (plus one would likely have to face structurally similar problems in later iterations as well, and if similar approximations were to be made again they might introduce another set of inaccuracies on their own that could also become more pronounced through the iterations that follow). So, where is actually the core of the problem? Namely, while the problems in (7) and (31) do look almost identical, they also have one key difference. Instead of the x^{(0)} that one has in (7), in (31) one has x^{(1)}. If x^{(1)} were a constant or randomly generated, the approach from the previous section could be used; it would just have to be slightly adjusted. However, the problem one faces here is much bigger. It is not just that x^{(1)} is different because of the way it is generated; its randomness actually depends on the problem structure and can not in principle be separated from it, as it was in the previous section for x^{(0)}.

We will eventually provide a way to estimate \xi_{p,2}(\alpha, \sigma, c_{2,z}, s_2). However, a lot of work will be needed before reaching the point of being able to do that. So, instead of jumping directly to (31), one can rewrite (30) in the following way:

\min_{z, z^{(1)}} (x^{(1)})^T z \quad \mbox{subject to} \quad \|\sigma v + A z\|_2 \le r, \; z \in [0, 2/\sqrt{n}]^n, \; \|\sigma v + A z^{(1)}\|_2 \le r, \; \|z^{(1)}\|_2^2 = \hat{c}^{(1)}_{1,z}, \; (x^{(0)})^T z^{(1)} = \hat{s}^{(1)}_1, \; z^{(1)} \in [0, 2/\sqrt{n}]^n, \qquad (32)

where the bottom portion of the constraints is pretty much artificially added for completeness.
From the analysis in the previous section it is clear that there is not really much freedom to optimize over z^{(1)}. One can now proceed with the standard RDT steps. As usual, the first one amounts to forming the deterministic Lagrange dual.
1. First step – Forming the deterministic Lagrange dual
Also as usual, we once again follow a large body of our earlier work and obtain (see, e.g., [12-16, 18, 19])

\min_{z, z^{(1)}} \max_{\gamma, \gamma_1 \ge 0} (x^{(1)})^T z + \gamma (\|\sigma v + A z\|_2 - r) + \gamma_1 (\|\sigma v + A z^{(1)}\|_2 - r)
\mbox{subject to} \quad z \in [0, 2/\sqrt{n}]^n, \; \|z^{(1)}\|_2^2 = \hat{c}^{(1)}_{1,z}, \; (x^{(0)})^T z^{(1)} = \hat{s}^{(1)}_1, \; z^{(1)} \in [0, 2/\sqrt{n}]^n. \qquad (33)

After a couple of cosmetic changes one also has

\min_{z, z^{(1)}} \max_{\gamma, \gamma_1 \ge 0} \max_{\|\lambda\|_2 = 1, \|\lambda_1\|_2 = 1} (x^{(1)})^T z + \gamma z_{sc} \lambda^T [A \; v][z; \sigma]/z_{sc} - \gamma r + \gamma_1 z^{(1)}_{sc} \lambda_1^T [A \; v][z^{(1)}; \sigma]/z^{(1)}_{sc} - \gamma_1 r
\mbox{subject to} \quad z \in [0, 2/\sqrt{n}]^n, \; z_{sc} = \sqrt{\|z\|_2^2 + \sigma^2}, \; \|z^{(1)}\|_2^2 = \hat{c}^{(1)}_{1,z}, \; (x^{(0)})^T z^{(1)} = \hat{s}^{(1)}_1, \; z^{(1)} \in [0, 2/\sqrt{n}]^n, \; z^{(1)}_{sc} = \sqrt{\|z^{(1)}\|_2^2 + \sigma^2}. \qquad (34)

Relying on the concentration of \gamma and \gamma_1, one further has

\max_{\gamma, \gamma_1 \ge 0} \min_{z, z^{(1)}} \max_{\|\lambda\|_2 = 1, \|\lambda_1\|_2 = 1} (x^{(1)})^T z + \gamma z_{sc} \lambda^T [A \; v][z; \sigma]/z_{sc} - \gamma r + \gamma_1 z^{(1)}_{sc} \lambda_1^T [A \; v][z^{(1)}; \sigma]/z^{(1)}_{sc} - \gamma_1 r,
\mbox{subject to the same constraints as in (34).} \qquad (35)
2. Second step – Forming the Random dual
Continuing to follow the analysis presented in [22] and the principles introduced earlier in, e.g., [12-16, 18, 19], we can again introduce the so-called random dual. However, things are much more subtle this time. Let \bar{Z} = [0, 2/\sqrt{n}]^n. One then considers the following object:

f_{RD} = (x^{(1)})^T z + \gamma z_{sc} f_{RD,2} - \gamma r + \gamma_1 z^{(1)}_{sc} f_{RD,1} - \gamma_1 r,

where

f_{RD,2} = \lambda^T (q^{(1)} g + \sqrt{1 - (q^{(1)})^2} g^{(1)}) + ((p^{(1)} h + \sqrt{1 - (p^{(1)})^2} h^{(1)})^T z/z_{sc} + (p^{(1)} h_0 + \sqrt{1 - (p^{(1)})^2} h^{(1)}_0) \sigma/z_{sc}), \qquad (36)

and

f_{RD,1} = \lambda_1^T g + h^T z^{(1)}/z^{(1)}_{sc} + h_0 \sigma/z^{(1)}_{sc}, \qquad (37)

and, as usual, the components of all of g, h, g^{(1)}, h^{(1)}, and h^{(1)}_0 are i.i.d. standard normals. It is not that hard to guess that the corresponding random dual is then

\min_{q^{(1)}} \max_{p^{(1)}} \max_{\gamma, \gamma_1 \ge 0} \min_{z, z^{(1)}} \max_{\|\lambda\|_2 = 1, \|\lambda_1\|_2 = 1} f_{RD}
\mbox{subject to} \quad z \in [0, 2/\sqrt{n}]^n, \; z_{sc} = \sqrt{\|z\|_2^2 + \sigma^2}, \; \|z^{(1)}\|_2^2 = \hat{c}^{(1)}_{1,z}, \; (x^{(0)})^T z^{(1)} = \hat{s}^{(1)}_1, \; z^{(1)} \in [0, 2/\sqrt{n}]^n, \; z^{(1)}_{sc} = \sqrt{\|z^{(1)}\|_2^2 + \sigma^2},
[z; \sigma]^T [z^{(1)}; \sigma]/z_{sc}/z^{(1)}_{sc} = q^{(1)}, \; \lambda^T \lambda_1 = p^{(1)}. \qquad (38)

There are a couple of things we should now add. If one looks solely at f_{RD,1}, it seems perfectly fine on its own. It in fact is exactly the portion of the random dual that corresponds to the z^{(1)} portion of the objective in (35) (an easy comparison with what was done in the previous section quickly confirms that). Analogously, one then expects that a similar object should be the portion of the random dual that corresponds to the z portion of the objective in (35). That is of course exactly f_{RD,2}. Everything would be rather smooth if there were no p^{(1)}, q^{(1)}. The question is of course where these come from. That would obviously be very hard to understand right now and even way harder to explain without going into the heavy machinery of the underlying fundamentals of the random duality theory itself. As such a discussion would take over the main topic of this paper, the complexity analysis of the CLuP algorithm, we leave it for a separate paper where we will revisit some of the random duality theory fundamentals. Here though, we just briefly mention that the meaning of the p^{(1)}, q^{(1)} parameters is the following: q^{(1)} is, roughly speaking, the presumed concentrating value of the so-called optimal achieving [z; \sigma]-cross-overlap, i.e.

q^{(1)} \approx [z; \sigma]^T [z^{(1)}; \sigma]/z_{sc}/z^{(1)}_{sc}, \qquad (39)

and p^{(1)} is, roughly speaking, the presumed concentrating value of the so-called optimal achieving \lambda-cross-overlap, i.e.

p^{(1)} \approx \lambda^T \lambda_1. \qquad (40)

Of course, as one may guess, things are actually way more complicated, since the above-mentioned concentrating values are not just over the standard randomness but also over a certain so-called Gibbsian measures randomness as well (both z's and both \lambda's above are in such situations running over all allowed z's and \lambda's). Those types of measures and their randomness appear within the random duality theory as some of the most crucial objects and are way harder mathematical concepts than the regular Gaussian ones discussed above. As mentioned above, to avoid being sidetracked by all these mathematical complications, we leave more detailed discussions in these directions for separate companion papers. One can then proceed with handling (38). That is in principle simple if one follows what we did in the previous section and earlier in [22], and ultimately in, e.g., [12-16, 18, 19].
However, just looking at the problem one quickly observes that there are quite a few variables to optimize over. So, instead of directly working with (38) we will in a way emulate what we did in [22] and work through a few shortcuts. In equation (8) of [22] we essentially had the type of problem that we have here in (38). Instead of solving it directly, in [22] we created a shortcut mechanism starting from equation (9) and continuing further through Section 2.1.1 (later on, in Section 3.2.2, we revisited the problem from equation (8) and solved it directly as well). Here we will try to mimic the idea from equation (9) and Section 2.1.1 of [22]. We do mention though that, while in [22] this turned out to be the best mechanism, here one may be able to find even better ones. However, as we will see later on, even the mechanism that we present below works very well. There will be two main steps in the analysis process.
3. Third step – Rehandling the Random dual of the first iteration
To start things off we will first separately handle the following problem:

\min_{z^{(1)}} \max_{\|\lambda_1\|_2 = 1} z^{(1)}_{sc} f_{RD,1} \quad \mbox{subject to} \quad \|z^{(1)}\|_2^2 = \hat{c}^{(1)}_{1,z}, \; (x^{(0)})^T z^{(1)} = \hat{s}^{(1)}_1, \; z^{(1)} \in [0, 2/\sqrt{n}]^n, \; z^{(1)}_{sc} = \sqrt{\|z^{(1)}\|_2^2 + \sigma^2}, \qquad (41)

and utilize the obtained z^{(1)}_{sc}. Not only does this emulate what we did in [22], it in a way also emulates the natural flow of the CLuP algorithm. It is now beyond trivial to recognize that the solution of (41) is exactly what we obtained in the previous section. To be more precise, from (16) one actually has

z^{(1)}_i = \frac{1}{\sqrt{n}} \min\left( \max\left( 0, -\frac{h_i + \hat{\nu}^{(1)} x^{(0)}_i}{2\hat{\gamma}^{(1)}} \right), 2 \right), \qquad (42)

and

x^{(1,s)}_i = \frac{1}{\sqrt{n}} - z^{(1)}_i = \frac{1}{\sqrt{n}} \left( 1 - \min\left( \max\left( 0, -\frac{h_i + \hat{\nu}^{(1)} x^{(0)}_i}{2\hat{\gamma}^{(1)}} \right), 2 \right) \right), \qquad (43)

where from this point on, to lighten the writing, we take x_{sol} to be the vector of all ones scaled by 1/\sqrt{n}.

4. Fourth step – Handling the real Random dual of the second iteration

Finally we are in position to complete the last piece of magic. That amounts to solving the following problem:

\min_z \max_{\|\lambda\|_2 = 1} z_{sc} f_{RD,2} \quad \mbox{subject to} \quad z \in [0, 2/\sqrt{n}]^n, \; z_{sc} = \sqrt{\|z\|_2^2 + \sigma^2}. \qquad (44)

Relying again on the concentration concept discussed on many occasions in this and the previous section (as well as in [22]) and earlier in [12-16, 18, 19], and connecting to (31), we have

\min_z \max_{\|\lambda\|_2 = 1} z_{sc} f_{RD,2} \quad \mbox{subject to} \quad z \in [0, 2/\sqrt{n}]^n, \; z_{sc} = \sqrt{\|z\|_2^2 + \sigma^2}, \; \|z\|_2^2 = c_{2,z}, \; (x^{(1,s)})^T z = s_2. \qquad (45)

The above is in principle the core of the mechanism. However, to ensure that we can track the behavior of all critical parameters, we will also add another concentrating constraint, x_{sol}^T z = 1^T z/\sqrt{n} = s_3 (1 is of course the n-dimensional column vector of all ones), so that we actually have

\min_z \max_{\|\lambda\|_2 = 1} z_{sc} f_{RD,2} \quad \mbox{subject to} \quad z \in [0, 2/\sqrt{n}]^n, \; z_{sc} = \sqrt{\|z\|_2^2 + \sigma^2}, \; \|z\|_2^2 = c_{2,z}, \; (x^{(1,s)})^T z = s_2, \; \frac{1}{\sqrt{n}} 1^T z = s_3, \; \lambda^T \hat{\lambda}_1 = p^{(1)}, \qquad (46)

where \hat{\lambda}_1 = g/\|g\|_2 is obtained trivially from (37) and (41). One should also keep in mind that

q^{(1)} = [z; \sigma]^T [z^{(1)}; \sigma]/z_{sc}/z^{(1)}_{sc} = \frac{s_3 - s_2 + \sigma^2}{\sqrt{c_{2,z} + \sigma^2} \sqrt{\hat{c}^{(1)}_{1,z} + \sigma^2}}. \qquad (47)

Finally, after plugging back the value of f_{RD,2} from (36), one has

\min_z \max_{\|\lambda\|_2 = 1} z_{sc} \lambda^T (q^{(1)} g + \sqrt{1 - (q^{(1)})^2} g^{(1)}) + (p^{(1)} h + \sqrt{1 - (p^{(1)})^2} h^{(1)})^T z
\mbox{subject to} \quad z \in [0, 2/\sqrt{n}]^n, \; z_{sc} = \sqrt{\|z\|_2^2 + \sigma^2}, \; \|z\|_2^2 = c_{2,z}, \; (x^{(1,s)})^T z = s_2, \; 1^T z = \sqrt{n} s_3, \; \lambda^T g = \|g\|_2 p^{(1)}, \qquad (48)

where the last term in f_{RD,2} in (36), (p^{(1)} h_0 + \sqrt{1 - (p^{(1)})^2} h^{(1)}_0)\sigma, is neglected. Choosing

\hat{\lambda} = \frac{p^{(1)} g + \sqrt{1 - (p^{(1)})^2} g^{(1)}}{\|p^{(1)} g + \sqrt{1 - (p^{(1)})^2} g^{(1)}\|_2}, \qquad (49)

and averaging over g and g^{(1)}, we from (48) then obtain

\min_z \sqrt{\alpha n} \sqrt{\|z\|_2^2 + \sigma^2} \left( q^{(1)} p^{(1)} + \sqrt{1 - (q^{(1)})^2} \sqrt{1 - (p^{(1)})^2} \right) + (p^{(1)} h + \sqrt{1 - (p^{(1)})^2} h^{(1)})^T z
\mbox{subject to} \quad z \in [0, 2/\sqrt{n}]^n, \; \|z\|_2^2 = c_{2,z}, \; (x^{(1,s)})^T z = s_2, \; 1^T z = \sqrt{n} s_3. \qquad (50)

After writing the Lagrange dual one obtains a problem very similar to (12):

\max_{\gamma, \nu, \nu_2} \min_z L(\gamma, \nu, \nu_2) \quad \mbox{subject to} \quad z \in [0, 2/\sqrt{n}]^n, \qquad (51)

where

L(\gamma, \nu, \nu_2) = \sqrt{\alpha n} \sqrt{\|z\|_2^2 + \sigma^2} \left( q^{(1)} p^{(1)} + \sqrt{1 - (q^{(1)})^2} \sqrt{1 - (p^{(1)})^2} \right) + (h^{(1,p)})^T z + \gamma (\|z\|_2^2 - c_{2,z}) + \nu ((x^{(1,s)})^T z - s_2) + \nu_2 (1^T z - \sqrt{n} s_3), \qquad (52)

with h^{(1,p)} = p^{(1)} h + \sqrt{1 - (p^{(1)})^2} h^{(1)}. Similarly to (12), we refer to the expected value of the above objective scaled by 1/\sqrt{n} as \xi^{(2)}_{RD}(\alpha, \sigma; p^{(1)}, q^{(1)}, c_{2,z}, s_2, s_3, \gamma, \nu, \nu_2).
Then, analogously to (13) (and following in the footsteps of, say, [12]), we define

f_{box,2}(h^{(1,p)}; c_{2,z}, s_2, s_3) = \max_{\gamma, \nu, \nu_2} \min_z (h^{(1,p)})^T z + \gamma (\|z\|_2^2 - c_{2,z}) + \nu ((x^{(1,s)})^T z - s_2) + \nu_2 \left( \frac{1}{\sqrt{n}} 1^T z - s_3 \right) \quad \mbox{subject to} \quad z \in [0, 2/\sqrt{n}]^n. \qquad (53)

The only thing that is different now compared to (13) is the extra constraint related to s_3. This though changes nothing with respect to the methodology that we applied after (13), and we can utilize the solution obtained there with a slight modification to account for s_3 and \nu_2. That essentially means that instead of (14) (and earlier (110) from [12]) one now has

f_{box,2}(h^{(1,p)}; c_{2,z}, s_2, s_3) = \max_{\gamma, \nu, \nu_2} \left( \frac{1}{\sqrt{n}} \sum_{i=1}^n f^{(1)}_{box,2}(h^{(1,p)}_i, \gamma, \nu, \nu_2) - \nu s_2 \sqrt{n} - \nu_2 s_3 \sqrt{n} - \gamma c_{2,z} \sqrt{n} \right), \qquad (54)

where

f^{(1)}_{box,2}(h^{(1,p)}_i, \gamma, \nu, \nu_2) = \begin{cases} 0, & h^{(1,p)}_i + \nu x^{(1,s)}_i + \nu_2 \ge 0 \\ -(h^{(1,p)}_i + \nu x^{(1,s)}_i + \nu_2)^2/(4\gamma), & -4\gamma \le h^{(1,p)}_i + \nu x^{(1,s)}_i + \nu_2 \le 0 \\ 2(h^{(1,p)}_i + \nu x^{(1,s)}_i + \nu_2) + 4\gamma, & h^{(1,p)}_i + \nu x^{(1,s)}_i + \nu_2 \le -4\gamma, \end{cases} \qquad (55)

with the same scaling discussion as after (15) applying here to \gamma, \nu, \nu_2, and x^{(1,s)}_i. Following further what was done in the previous section (and earlier outlined in [22]), one can also determine the optimizing z_i and x^{(2,s)}_i:

z^{(2)}_i = \frac{1}{\sqrt{n}} \min\left( \max\left( 0, -\frac{h^{(1,p)}_i + \nu x^{(1,s)}_i + \nu_2}{2\gamma} \right), 2 \right)
x^{(2,s)}_i = \frac{1}{\sqrt{n}} - z^{(2)}_i = \frac{1}{\sqrt{n}} \left( 1 - \min\left( \max\left( 0, -\frac{h^{(1,p)}_i + \nu x^{(1,s)}_i + \nu_2}{2\gamma} \right), 2 \right) \right), \qquad (56)

where we also recall from (43)

x^{(1,s)}_i = \frac{1}{\sqrt{n}} - z^{(1)}_i = \frac{1}{\sqrt{n}} \left( 1 - \min\left( \max\left( 0, -\frac{h_i + \hat{\nu}^{(1)} x^{(0)}_i}{2\hat{\gamma}^{(1)}} \right), 2 \right) \right). \qquad (57)

As earlier, if we assume that the initial x^{(0)} has \rho n components equal to 1/\sqrt{n} and (1-\rho) n components equal to -1/\sqrt{n}, we have for the objective

E f^{(1)}_{box,2}(h_i, h^{(1)}_i, \gamma, \nu, \nu_2) = \rho I^{(2)}_1(\gamma, \nu, \nu_2, \hat{\nu}^{(1)}, \hat{\gamma}^{(1)}) + (1-\rho) I^{(2)}_1(\gamma, \nu, \nu_2, -\hat{\nu}^{(1)}, \hat{\gamma}^{(1)}), \qquad (58)

where

I^{(2)}_1(\gamma, \nu, \nu_2, \hat{\nu}^{(1)}, \hat{\gamma}^{(1)}) = \int \int \left( (h^{(1,p)}_i + \nu x^{(1,s)}_i + \nu_2) z^{(2)}_i + \gamma (z^{(2)}_i)^2 \right) \frac{e^{-((h^{(1)}_i)^2 + h_i^2)/2}}{2\pi} \, dh^{(1)}_i \, dh_i \qquad (59)

(with z^{(2)}_i and x^{(1,s)}_i understood in their \sqrt{n}-scaled versions), and, for \gamma > 0,

\xi^{(2)}_{RD}(\alpha, \sigma; p^{(1)}, q^{(1)}, c_{2,z}, s_2, s_3, \gamma, \nu, \nu_2) = \sqrt{\alpha} \sqrt{c_{2,z} + \sigma^2} \left( q^{(1)} p^{(1)} + \sqrt{1 - (q^{(1)})^2} \sqrt{1 - (p^{(1)})^2} \right) + E f^{(1)}_{box,2}(h_i, h^{(1)}_i, \gamma, \nu, \nu_2) - \nu s_2 - \nu_2 s_3 - \gamma c_{2,z}. \qquad (60)

Now we are in position to give a brief summary of the entire formalism that we presented above. It is essentially analogous to what we discussed after Theorem 1. While we will try to emulate all the ideas from the previous section, there are quite a few new elements here that need to be incorporated, and we will try to emphasize all of that in the formalism below. Similarly to what we did in Section 2.1, we below present the critical steps needed to actually calculate all the quantities of interest. Of course, the above analysis is at the core of all the underlying mechanisms that basically enable performing these steps.
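Since (56)-(59) are just expectations over the pair (h_i, h^{(1)}_i), they are straightforward to estimate by Monte Carlo once the parameters are given. A hedged sketch, working in the \sqrt{n}-scaled coordinates (so the z's live in [0, 2] and the x's in [-1, 1]) and treating all parameter values as externally supplied:

```python
import numpy as np

def second_iteration_mc(gamma, nu, nu2, p1, g1, v1, x0=1.0,
                        n_samples=10**6, seed=0):
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(n_samples)    # first-iteration Gaussian h_i
    h1 = rng.standard_normal(n_samples)   # fresh Gaussian h_i^(1)
    # (57): first-iteration optimizer (scaled), for the x^(0)_i = x0 branch;
    # the rho-mixture over x0 = +/-1 reproduces (58) and (77)
    z1 = np.clip(-(h + v1 * x0) / (2 * g1), 0.0, 2.0)
    x1s = 1.0 - z1
    # effective correlated Gaussian h^(1,p) from (36)/(52)
    h1p = p1 * h + np.sqrt(1.0 - p1**2) * h1
    # (56): second-iteration optimizer and iterate
    z2 = np.clip(-(h1p + nu * x1s + nu2) / (2 * gamma), 0.0, 2.0)
    x2s = 1.0 - z2
    # (59) objective expectation, plus the critical parameters of (79) and
    # the probability of a correct decision, cf. (78)
    Ef = np.mean((h1p + nu * x1s + nu2) * z2 + gamma * z2**2)
    return Ef, np.mean(x2s), np.mean(x2s**2), np.mean(x1s * z2), np.mean(x2s > 0)
```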
Summarized formalism to handle the CLuP’s second iteration
Differently from Section 2.1, here we will split the presentation into several separate parts.
I) First part – Handling the first iteration
The first part of the formalism essentially reflects the above analysis by recognizing that what it effectively accomplished was rehandling CLuP's first iteration. That basically means that one first solves the following problem:

\phi^{(1)}_a = \{\hat{\nu}^{(1)}, \hat{\gamma}^{(1)}, \hat{c}^{(1)}_{1,z}, \hat{s}^{(1)}_1\} = \arg\min_{s_1} s_1 \quad \mbox{subject to} \quad \min_{0 \le c_{1,z} \le 4} \max_{\gamma, \nu} \xi^{(1)}_{RD}(\alpha, \sigma; c_{1,z}, s_1, \gamma, \nu) = r, \qquad (61)

to obtain the set of parameters \{\hat{\nu}^{(1)}, \hat{\gamma}^{(1)}, \hat{c}^{(1)}_{1,z}, \hat{s}^{(1)}_1\} that enter the second iteration. Recalling on (23),

s_{x,1}(\gamma, \nu) = \frac{\nu}{4\gamma} \left( \mathrm{erf}(\nu/\sqrt{2}) - \mathrm{erf}((4\gamma+\nu)/\sqrt{2}) \right) + \frac{1}{2\gamma\sqrt{2\pi}} \left( e^{-0.5\nu^2} - e^{-0.5(4\gamma+\nu)^2} \right)
s_{xsq,1}(\gamma, \nu) = -I_{1,1}(\gamma, \nu)/\gamma
s_{x,2}(\gamma, \nu) = \mathrm{erfc}((4\gamma+\nu)/\sqrt{2})
s_{xsq,2}(\gamma, \nu) = 2 s_{x,2}(\gamma, \nu), \qquad (62)

one then has from (26) the first-iteration values of the first two critical parameters related to the propagation of the vector x through the CLuP algorithm that we particularly keep track of:

\hat{d}^{(1)}_1 \triangleq E((x_{sol})^T x^{(1,s)}) = 1 - (\rho s_{x,1}(\hat{\gamma}^{(1)}, \hat{\nu}^{(1)}) + (1-\rho) s_{x,1}(\hat{\gamma}^{(1)}, -\hat{\nu}^{(1)}) + \rho s_{x,2}(\hat{\gamma}^{(1)}, \hat{\nu}^{(1)}) + (1-\rho) s_{x,2}(\hat{\gamma}^{(1)}, -\hat{\nu}^{(1)}))
\hat{d}^{(1)}_2 \triangleq E\|x^{(1,s)}\|_2^2 = \rho s_{xsq,1}(\hat{\gamma}^{(1)}, \hat{\nu}^{(1)}) + (1-\rho) s_{xsq,1}(\hat{\gamma}^{(1)}, -\hat{\nu}^{(1)}) + \rho s_{xsq,2}(\hat{\gamma}^{(1)}, \hat{\nu}^{(1)}) + (1-\rho) s_{xsq,2}(\hat{\gamma}^{(1)}, -\hat{\nu}^{(1)}) + 2\hat{d}^{(1)}_1 - 1. \qquad (63)

Moreover, from (42) and (43) we also have

z^{(1)}_i = \frac{1}{\sqrt{n}} \min\left( \max\left( 0, -\frac{h_i + \hat{\nu}^{(1)} x^{(0)}_i}{2\hat{\gamma}^{(1)}} \right), 2 \right), \qquad (64)

and

x^{(1,s)}_i = \frac{1}{\sqrt{n}} - z^{(1)}_i = \frac{1}{\sqrt{n}} \left( 1 - \min\left( \max\left( 0, -\frac{h_i + \hat{\nu}^{(1)} x^{(0)}_i}{2\hat{\gamma}^{(1)}} \right), 2 \right) \right), \qquad (65)

and from (27) the third critical parameter that we keep track of through the iterations, the probability of error:

p^{(1)}_{err} = 1 - P\left( z^{(1)}_i \le \frac{1}{\sqrt{n}} \right) = 1 - \left( \rho \, \frac{1}{2} \mathrm{erfc}\left( \frac{-2\hat{\gamma}^{(1)} - \hat{\nu}^{(1)}}{\sqrt{2}} \right) + (1-\rho) \, \frac{1}{2} \mathrm{erfc}\left( \frac{-2\hat{\gamma}^{(1)} + \hat{\nu}^{(1)}}{\sqrt{2}} \right) \right). \qquad (66)

The fourth critical parameter is of course the value of the objective, and after a cosmetic change it is \hat{s}^{(1)} \triangleq \hat{s}^{(1)}_1. One can then basically say that, in addition to the solution after the first iteration, x^{(1,s)}, the following set of critical plus auxiliary parameters is the output of the first iteration:

\phi^{(1)} = \{p^{(1)}_{err}, \hat{s}^{(1)}, \hat{d}^{(1)}_2, \hat{d}^{(1)}_1, \hat{\nu}^{(1)}, \hat{\gamma}^{(1)}, \hat{c}^{(1)}_{1,z}\}, \qquad (67)

where for simplicity we also emphasize in words:

p^{(1)}_{err} – probability of error after the first iteration
\hat{s}^{(1)} = E((x^{(0)})^T x^{(1,s)}) – objective value after the first iteration
\hat{d}^{(1)}_2 = E\|x^{(1,s)}\|_2^2 – squared norm after the first iteration
\hat{d}^{(1)}_1 = E(x_{sol}^T x^{(1,s)}) – inner product with x_{sol} after the first iteration. \qquad (68)

It is probably not necessary to reemphasize, but for completeness we add that the last three quantities are viewed as expected values and, since we are interested in large dimensional scenarios, they are due to the overwhelming concentrations also the concentrating points.
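For concreteness, the first-part bookkeeping can be coded directly from the closed forms above; a hedged sketch using the reconstructed expressions (62) and (66), with \rho as a parameter, computing \hat{d}^{(1)}_1 and p^{(1)}_{err} (the remaining quantities in (63) follow analogously through I_{1,1}):

```python
import numpy as np
from scipy.special import erf, erfc

def s_x1(g, v):
    # closed form of the middle-regime contribution to sqrt(n) E z_i, cf. (62)
    return (v / (4 * g) * (erf(v / np.sqrt(2)) - erf((4 * g + v) / np.sqrt(2)))
            + (np.exp(-v**2 / 2) - np.exp(-(4 * g + v)**2 / 2))
              / (2 * g * np.sqrt(2 * np.pi)))

def s_x2(g, v):
    # boundary-regime (z_i at its cap 2/sqrt(n)) contribution, cf. (62)
    return erfc((4 * g + v) / np.sqrt(2))

def first_iteration_parameters(g1, v1, rho=0.5):
    # d_1^(1) from (63) and p_err^(1) from (66)
    sx = (rho * s_x1(g1, v1) + (1 - rho) * s_x1(g1, -v1)
          + rho * s_x2(g1, v1) + (1 - rho) * s_x2(g1, -v1))   # = sqrt(n) E z_i
    d1 = 1.0 - sx
    p_err = 1.0 - (rho * 0.5 * erfc((-2 * g1 - v1) / np.sqrt(2))
                   + (1 - rho) * 0.5 * erfc((-2 * g1 + v1) / np.sqrt(2)))
    return d1, p_err
```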
II) Second part – Handling the second iteration

Once the first iteration is handled, one utilizes its parameters to basically run the second iteration. The strategy is conceptually, to a degree, similar to what we presented above for the first iteration. One starts by solving the following optimization problem:

\phi^{(2)}_a = \arg\min_{s_2, s_3} \frac{-\hat{d}^{(1)}_1 + s_2}{\sqrt{\hat{d}^{(1)}_2}} \quad \mbox{subject to} \quad \min_{q^{(1)}} \max_{p^{(1)}} \min_{0 \le c_{2,z} \le 4} \max_{\gamma, \nu, \nu_2} \xi^{(2)}_{RD}(\alpha, \sigma; p^{(1)}, q^{(1)}, c_{2,z}, s_2, s_3, \gamma, \nu, \nu_2) = r, \qquad (69)

where

\phi^{(2)}_a = \{\hat{p}^{(1)}, \hat{q}^{(1)}, \hat{\nu}^{(2)}, \hat{\nu}^{(2)}_2, \hat{\gamma}^{(2)}, \hat{c}^{(2)}_{2,z}, \hat{s}^{(2)}_2, \hat{s}^{(2)}_3\}. \qquad (70)

With a few exceptions, the above seems a rather natural extension of (61). The main changes are the readjustment of the objective and the appearance of p^{(1)} and q^{(1)}. Besides this, the strategy in (69) essentially remains the same as in (61). This practically means that one wants to minimize the objective while keeping the optimized \xi^{(2)}_{RD}(\alpha, \sigma; c_{2,z}, s_2, s_3, \gamma, \nu, \nu_2) equal to r. The difference is that now the objective is not the s_1 we had before but a rather different object, which we explain below.

Now, looking carefully at (28)-(31) and everything that followed afterwards, one can observe that instead of the real objective -(x^{(1)})^T (x_{sol} - z), its a bit more convenient version s_2 = (x^{(1,s)})^T z was utilized. To readjust for this we simply note that from (63) one easily has

\frac{-(x^{(1,s)})^T (x_{sol} - z)}{\|x^{(1,s)}\|_2} = \frac{-\hat{d}^{(1)}_1 + s_2}{\sqrt{\hat{d}^{(1)}_2}}. \qquad (71)

To be a bit more in alignment with what was done in the first part (in particular in (61)), one may rewrite (69) in the following way:

\phi^{(2)}_a = \arg\min_{s, s_2, s_3} s \quad \mbox{subject to} \quad \min_{q^{(1)}} \max_{p^{(1)}} \min_{0 \le c_{2,z} \le 4} \max_{\gamma, \nu, \nu_2} \xi^{(2)}_{RD}(\alpha, \sigma; p^{(1)}, q^{(1)}, c_{2,z}, s_2, s_3, \gamma, \nu, \nu_2) = r, \; s = \frac{-\hat{d}^{(1)}_1 + s_2}{\sqrt{\hat{d}^{(1)}_2}}. \qquad (72)

Now, carefully observing (72) further, one can also note that what it basically does is that, instead of a parameter s_2 (which is natural to the above discussion), it actually reintroduces the parameter s (the value of the objective) as a probably more natural object for following the algorithm's flow. One can actually continue that way with the second-iteration analogues of the other two critical parameters that we mentioned above in the summary of the first iteration's formalism. Namely, if one defines, analogously to (63),

\hat{d}^{(2)}_1 \triangleq E((x_{sol})^T x^{(2,s)}) = 1 - \hat{s}^{(2)}_3
\hat{d}^{(2)}_2 \triangleq E\|x^{(2,s)}\|_2^2 = \hat{c}^{(2)}_{2,z} + 2 E((x_{sol})^T x^{(2,s)}) - 1, \qquad (73)

then (72) can be repositioned as

\phi^{(2)}_b = \arg\min_{s, d^{(2)}_1, d^{(2)}_2} s \quad \mbox{subject to} \quad \min_{q^{(1)}} \max_{p^{(1)}} \min_{0 \le c_{2,z} \le 4} \max_{\gamma, \nu, \nu_2} \xi^{(2)}_{RD}(\alpha, \sigma; p^{(1)}, q^{(1)}, c_{2,z}, s_2, s_3, \gamma, \nu, \nu_2) = r,
s = \frac{-\hat{d}^{(1)}_1 + s_2}{\sqrt{\hat{d}^{(1)}_2}}, \; s_3 = 1 - d^{(2)}_1, \; c_{2,z} = d^{(2)}_2 - 2 d^{(2)}_1 + 1, \qquad (74)

where

\phi^{(2)}_b = \{\hat{p}^{(1)}, \hat{q}^{(1)}, \hat{\nu}^{(2)}, \hat{\nu}^{(2)}_2, \hat{\gamma}^{(2)}, \hat{s}^{(2)}, \hat{d}^{(2)}_2, \hat{d}^{(2)}_1\}. \qquad (75)

Finally, if one for a moment recalls (47), then (74) can also be rewritten as

\phi^{(2)}_b = \arg\min_{s, d^{(2)}_1, d^{(2)}_2} s \quad \mbox{subject to} \quad \max_{p^{(1)}} \min_{0 \le c_{2,z} \le 4} \max_{\gamma, \nu, \nu_2} \xi^{(2)}_{RD}(\alpha, \sigma; p^{(1)}, q^{(1)}, c_{2,z}, s_2, s_3, \gamma, \nu, \nu_2) = r,
s = \frac{-\hat{d}^{(1)}_1 + s_2}{\sqrt{\hat{d}^{(1)}_2}}, \; s_3 = 1 - d^{(2)}_1, \; c_{2,z} = d^{(2)}_2 - 2 d^{(2)}_1 + 1, \; q^{(1)} = \frac{s_3 - s_2 + \sigma^2}{\sqrt{c_{2,z} + \sigma^2} \sqrt{\hat{c}^{(1)}_{1,z} + \sigma^2}}. \qquad (76)

We also recall that \xi^{(2)}_{RD}(\alpha, \sigma; p^{(1)}, q^{(1)}, c_{2,z}, s_2, s_3, \gamma, \nu, \nu_2) is given through (57)-(60), and that is where the remaining auxiliary parameters from the first iteration, \hat{\nu}^{(1)} and \hat{\gamma}^{(1)}, come into play as well. Similarly to (66), one also has

p^{(2)}_{err} = 1 - (\rho p_{cor}(\hat{\nu}^{(1)}) + (1-\rho) p_{cor}(-\hat{\nu}^{(1)})), \qquad (77)

where

p_{cor}(\hat{\nu}^{(1)}) = \int \int \frac{\mathrm{sign}(x^{(2,s)}_i) + 1}{2} \frac{e^{-((h^{(1)}_i)^2 + h_i^2)/2}}{2\pi} \, dh^{(1)}_i \, dh_i. \qquad (78)

For completeness we also add that (with x^{(1,s)}_i, x^{(2,s)}_i, and z^{(2)}_i understood in their \sqrt{n}-scaled versions)

\hat{d}^{(2)}_{2,+}(\hat{\nu}^{(1)}) = \int \int (x^{(2,s)}_i)^2 \frac{e^{-((h^{(1)}_i)^2 + h_i^2)/2}}{2\pi} \, dh^{(1)}_i \, dh_i
\hat{d}^{(2)}_{1,+}(\hat{\nu}^{(1)}) = \int \int x^{(2,s)}_i \frac{e^{-((h^{(1)}_i)^2 + h_i^2)/2}}{2\pi} \, dh^{(1)}_i \, dh_i
\hat{s}^{(2)}_{2,+}(\hat{\nu}^{(1)}) = \int \int x^{(1,s)}_i z^{(2)}_i \frac{e^{-((h^{(1)}_i)^2 + h_i^2)/2}}{2\pi} \, dh^{(1)}_i \, dh_i, \qquad (79)

and

\hat{d}^{(2)}_2 = \rho \hat{d}^{(2)}_{2,+}(\hat{\nu}^{(1)}) + (1-\rho) \hat{d}^{(2)}_{2,+}(-\hat{\nu}^{(1)})
\hat{d}^{(2)}_1 = \rho \hat{d}^{(2)}_{1,+}(\hat{\nu}^{(1)}) + (1-\rho) \hat{d}^{(2)}_{1,+}(-\hat{\nu}^{(1)})
\hat{s}^{(2)}_2 = \rho \hat{s}^{(2)}_{2,+}(\hat{\nu}^{(1)}) + (1-\rho) \hat{s}^{(2)}_{2,+}(-\hat{\nu}^{(1)}). \qquad (80)

Finally, similarly to the end of the summary of the first part, here one can also say that, in addition to the solution after the second iteration, x^{(2,s)}, the following set of critical plus auxiliary parameters is the output of the second iteration:

\phi^{(2)} = \{p^{(2)}_{err}, \hat{s}^{(2)}, \hat{d}^{(2)}_2, \hat{d}^{(2)}_1, \hat{\nu}^{(2)}, \hat{\nu}^{(2)}_2, \hat{\gamma}^{(2)}, \hat{p}^{(1)}, \hat{q}^{(1)}, \hat{c}^{(2)}_{2,z}, \hat{s}^{(2)}_2, \hat{s}^{(2)}_3\}, \qquad (81)

where we again for simplicity use words to emphasize:

p^{(2)}_{err} – probability of error after the second iteration
\hat{s}^{(2)} = E((x^{(1)})^T x^{(2,s)}) – objective value after the second iteration
\hat{d}^{(2)}_2 = E\|x^{(2,s)}\|_2^2 – squared norm after the second iteration
\hat{d}^{(2)}_1 = E(x_{sol}^T x^{(2,s)}) – inner product with x_{sol} after the second iteration. \qquad (82)

It goes again without much discussion that the last three quantities are viewed as expected/concentrating values.

3.2 Numerical results – second iteration

To follow in the footsteps of the discussion regarding the analysis of CLuP's first iteration, we present in this section a set of numerical results that relate to the above analysis, in particular to CLuP's second iteration. As in Section 2.2, a numerical analysis is needed for both the theoretical and the simulated values. We again start with the theoretical predictions. To that end, we first recall that the input to the analysis of the second iteration is the first iteration's output

\phi^{(1)} = \{p^{(1)}_{err}, \hat{s}^{(1)}, \hat{d}^{(1)}_2, \hat{d}^{(1)}_1, \hat{\nu}^{(1)}, \hat{\gamma}^{(1)}, \hat{c}^{(1)}_{1,z}\} \qquad (83)

(the concrete numerical values did not survive the source extraction). From (81), for the second iteration's output set of parameters we have

\phi^{(2)} = \{p^{(2)}_{err}, \hat{s}^{(2)}, \hat{d}^{(2)}_2, \hat{d}^{(2)}_1, \hat{\nu}^{(2)}, \hat{\nu}^{(2)}_2, \hat{\gamma}^{(2)}, \hat{p}^{(1)}, \hat{q}^{(1)}, \hat{c}^{(2)}_{2,z}, \hat{s}^{(2)}_2, \hat{s}^{(2)}_3\}. \qquad (84)

In Table 5 we show the theoretical values for some of the system parameters obtained based on the above analysis for SNR 1/\sigma = 13 [db], \alpha = 0.8, and r_{sc} = 1.3.

Table 5: Theoretical values for various system parameters obtained utilizing the above analysis. Columns: 1/\sigma [db], \hat{\nu}^{(2)}, \hat{\nu}^{(2)}_2, \hat{\gamma}^{(2)}, \hat{p}^{(1)}, \hat{s}^{(2)}, \xi^{(2)}_{RD}, p^{(2)}_{err}, \|x^{(2,s)}\|_2^2, (x_{sol})^T x^{(2,s)} (the numerical entries did not survive the source extraction).

One can then easily obtain the remaining parameters from \phi^{(2)}, i.e. \{\hat{q}^{(1)}, \hat{c}^{(2)}_{2,z}, \hat{s}^{(2)}_2, \hat{s}^{(2)}_3\}. In Table 6 we show the results obtained utilizing both the equality constraints in (76) as well as (79) and (80) (the two routes are distinguished in the original by bold and purple bold).

Table 6: Theoretical values for \{\hat{q}^{(1)}, \hat{c}^{(2)}_{2,z}, \hat{s}^{(2)}_2, \hat{s}^{(2)}_3\} obtained utilizing (76) as well as (79) and (80). Columns: 1/\sigma [db]; \hat{s}^{(2)}_2 = \hat{d}^{(1)}_1 + \hat{s}^{(2)} \sqrt{\hat{d}^{(1)}_2}; \hat{s}^{(2)}_3 = 1 - \hat{d}^{(2)}_1; \hat{c}^{(2)}_{2,z} = \hat{d}^{(2)}_2 - 2\hat{d}^{(2)}_1 + 1; \hat{q}^{(1)} = (\hat{s}_3 - \hat{s}_2 + \sigma^2)/(\sqrt{\hat{c}_{2,z} + \sigma^2} \sqrt{\hat{c}_{1,z} + \sigma^2}) (the numerical entries did not survive the source extraction).

Similarly to what we did in Section 2.2.1, we below provide a set of results obtained through numerical simulations. In Table 7 we show the simulated values that correspond to the above theoretical predictions. We obtained these values for \alpha = 0.8, n = 1600, and r_{sc} = 1.3. As earlier, r is set via \xi^{(2)}_{RD} = r = r_{sc} r_{plt}.

Table 7: Theoretical/simulated values for various system parameters obtained based on the above analysis. Columns: 1/\sigma [db], \hat{s}^{(2)}, \xi^{(2)}_{RD}, p^{(2)}_{err}, \|x^{(2,s)}\|_2^2, (x_{sol})^T x^{(2,s)}; entries are theory/simulation pairs (the numerical entries did not survive the source extraction).

In Table 8 we show what kind of effect the change of the problem dimension n has on all the key parameters. The simulated values are again very close to the theoretical predictions. In particular, already when n = 1600 all parameters are almost equal to the theoretical estimates, and one can also observe that as n grows almost all parameters get closer to the theoretical values.

Table 8: Simulated values for p^{(2)}_{err}, -\hat{s}^{(2)}, \hat{d}^{(2)}_2 = \|x^{(2,s)}\|_2^2, and \hat{d}^{(2)}_1 = (x_{sol})^T x^{(2,s)}; \alpha = 0.8, r_{sc} = 1.3, n \in \{100, 200, 400, 800, 1600\}, with an n \to \infty theory row (the numerical entries did not survive the source extraction).

Finally, in Table 9 we show the progress of all the critical parameters through the first two iterations, together with their simulated values (as in Table 7, \alpha = 0.8, r_{sc} = 1.3, and n = 1600).

Table 9: Theoretical/simulated values for the key system parameters through the first two iterations. Columns: k, 1/\sigma [db], -\hat{s}^{(k)}, \xi^{(k)}_{RD}, p^{(k)}_{err}, \hat{d}^{(k)}_2 = \|x^{(k,s)}\|_2^2, \hat{d}^{(k)}_1 = (x_{sol})^T x^{(k,s)} (the numerical entries did not survive the source extraction).

4 (k+1)-th iteration performance analysis

Given that the above discussion demonstrated that one can handle the algorithm's first two iterations, one naturally wonders whether it can be extended so that it eventually covers all iterations. The answer is yes. That is indeed in principle possible. Moreover, not much more needs to be added to the already introduced technical/strategic components of the analysis. However, the number of the running parameters starts to rapidly increase. That doesn't change much when it comes to conceptually handling all of them. What does become affected, though, are the numerical evaluations. We will below briefly sketch how one can extend the above analysis and then show a couple of shortcuts regarding the numerical evaluations. In the first part we will essentially closely follow the presentation of the previous section. Instead of discussing all the details, we will focus on the final results.

We of course start by restating the optimization problem underlying CLuP's (k+1)-th iteration. Namely, it boils down to finding x^{(k+1)} in the following way:

x^{(k+1)} = \frac{x^{(k+1,s)}}{\|x^{(k+1,s)}\|_2} \quad \mbox{with} \quad x^{(k+1,s)} = \arg\min_x -(x^{(k)})^T x \quad \mbox{subject to} \quad \|y - Ax\|_2 \le r, \; x \in [-1/\sqrt{n}, 1/\sqrt{n}]^n. \qquad (85)

To handle this problem we of course rely on RDT and what we presented in Section 3.
1. First step – Forming the deterministic Lagrange dual
Following closely what was done in Section 3, one can arrive at the following analogue of (35):

\max_{\gamma, \gamma_{j-1} \ge 0} \min_{z, z^{(j)}} \max_{\|\lambda\|_2 = 1, \|\lambda^{(j-1)}\|_2 = 1} (x^{(k)})^T z + \gamma z_{sc} \lambda^T [A \; v][z; \sigma]/z_{sc} - \gamma r + f_k
\mbox{subject to} \quad z \in [0, 2/\sqrt{n}]^n, \; z_{sc} = \sqrt{\|z\|_2^2 + \sigma^2}, \; \|z^{(j)}\|_2^2 = \hat{c}^{(j)}_{2,z}, \; (x^{(j-1)})^T z^{(j)} = \hat{s}^{(j)}_2, \; z^{(j)} \in [0, 2/\sqrt{n}]^n, \; z^{(j)}_{sc} = \sqrt{\|z^{(j)}\|_2^2 + \sigma^2}, \; 1 \le j \le k, \qquad (86)

where

f_k = \sum_{j=1}^{k} \left( \gamma_{j-1} z^{(j)}_{sc} (\lambda^{(j-1)})^T [A \; v][z^{(j)}; \sigma]/z^{(j)}_{sc} - \gamma_{j-1} r \right), \qquad (87)

and for completeness we also introduced \hat{c}^{(1)}_{2,z} = \hat{c}^{(1)}_{1,z}.
2. Second step – Forming the Random dual
To introduce the random dual we consider the following analogues of (36)-(38). First we look at the following object:

f_{RD} = (x^{(k)})^T z + \gamma z_{sc} f_{RD,k+1} - \gamma r + \sum_{j=1}^{k} \left( \gamma_{j-1} z^{(j)}_{sc} f_{RD,j} - \gamma_{j-1} r \right),

where

f_{RD,k+1} = \lambda^T g^{(k,q)} + (h^{(k,p)})^T z/z_{sc} + h^{(k,p)}_0 \sigma/z_{sc}, \qquad (88)

and

f_{RD,j} = (\lambda^{(j-1)})^T g^{(j-1,q)} + (h^{(j-1,p)})^T z^{(j)}/z^{(j)}_{sc} + h^{(j-1,p)}_0 \sigma/z^{(j)}_{sc}, \qquad (89)

and for each i

Q^{(k+1)} = E [g^{(0,q)}_i \; g^{(1,q)}_i \; \dots \; g^{(k,q)}_i]^T [g^{(0,q)}_i \; g^{(1,q)}_i \; \dots \; g^{(k,q)}_i]
P^{(k+1)} = E [h^{(0,p)}_i \; h^{(1,p)}_i \; \dots \; h^{(k,p)}_i]^T [h^{(0,p)}_i \; h^{(1,p)}_i \; \dots \; h^{(k,p)}_i]. \qquad (90)

h_0 doesn't really play much of a role, but one can for completeness assume that it is an extension of h indexed by 0, so that formally (90) holds for h_0 as well. Also, as expected, the components of all g's and h's are i.i.d. standard normals (the independence is over the index i; also, the g's and h's are independent of each other for any set of indices). Now one can define

Z^{(k+1)} = \left[ \frac{[z^{(1)}; \sigma]}{z^{(1)}_{sc}} \; \frac{[z^{(2)}; \sigma]}{z^{(2)}_{sc}} \; \dots \; \frac{[z^{(k)}; \sigma]}{z^{(k)}_{sc}} \; \frac{[z; \sigma]}{z_{sc}} \right] \qquad (91)

and

\Lambda^{(k+1)} = [\lambda^{(0)} \; \lambda^{(1)} \; \dots \; \lambda^{(k-1)} \; \lambda]. \qquad (92)

One then has, for the random dual that corresponds to (38), the following:

\min_{Q^{(k+1)}} \max_{P^{(k+1)}} \max_{\gamma, \gamma_{j-1} \ge 0} \min_{z, z^{(j)}} \max_{\|\lambda\|_2 = 1, \|\lambda^{(j-1)}\|_2 = 1} f_{RD}
\mbox{subject to} \quad z \in [0, 2/\sqrt{n}]^n, \; z_{sc} = \sqrt{\|z\|_2^2 + \sigma^2}, \; \|z^{(j)}\|_2^2 = \hat{c}^{(j)}_{2,z}, \; (x^{(j-1)})^T z^{(j)} = \hat{s}^{(j)}_2, \; z^{(j)} \in [0, 2/\sqrt{n}]^n, \; z^{(j)}_{sc} = \sqrt{\|z^{(j)}\|_2^2 + \sigma^2},
(Z^{(k+1)})^T Z^{(k+1)} = Q^{(k+1)}, \; (\Lambda^{(k+1)})^T \Lambda^{(k+1)} = P^{(k+1)}, \qquad (93)

where remarks similar to (39) and (40) remain in place. In other words, the elements of the matrices Q^{(k+1)} and P^{(k+1)} are basically the predicted concentrating points of all possible so-called optimal achieving [z; \sigma]- and \lambda-cross-overlaps, respectively. Similarly to what we had in the previous section, the concentrating values are not just over the standard randomness but also over a certain so-called Gibbsian measures randomness as well.
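On the numerical side, the correlated effective Gaussians h^{(j,p)} with the covariance structure (90) are easy to generate from a Cholesky factor of P^{(k+1)}; a minimal sketch (assuming P^{(k+1)} is supplied and positive definite, names illustrative):

```python
import numpy as np

def correlated_h(P, n, seed=0):
    # rows of the output are h^(0,p), ..., h^(k,p); each column i has
    # covariance P across the iteration index, i.i.d. over i, cf. (90)
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(P)                 # requires P positive definite
    W = rng.standard_normal((P.shape[0], n))
    return L @ W
```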
3. Third step – Rehandling the Random dual of the first k iterations

One starts with handling the first iteration,

\min_{z^{(1)}} \max_{\|\lambda^{(0)}\|_2 = 1} z^{(1)}_{sc} f_{RD,1} \quad \mbox{subject to} \quad \|z^{(1)}\|_2^2 = \hat{c}^{(1)}_{2,z}, \; (x^{(0)})^T z^{(1)} = \hat{s}^{(1)}_1, \; z^{(1)} \in [0, 2/\sqrt{n}]^n, \; z^{(1)}_{sc} = \sqrt{\|z^{(1)}\|_2^2 + \sigma^2}, \qquad (94)

then moves to the second, and so on. The key output quantities after the k-th iteration (some of which are also needed for the (k+1)-th iteration) are

x^{(j,s)}_i, \; z^{(j)}_i, \; \lambda^{(j-1)}, \; 1 \le j \le k, \qquad (95)

and

\phi^{(k)} = \{p^{(k)}_{err}, \hat{s}^{(k)}, \hat{d}^{(k)}_2, \hat{d}^{(k)}_1, \hat{\nu}^{(k)}, \hat{\nu}^{(k)}_2, \hat{\gamma}^{(k)}, \hat{P}^{(k)}, \hat{Q}^{(k)}, \hat{c}^{(k)}_{2,z}, \hat{s}^{(k)}_2, \hat{s}^{(k)}_3\}, \qquad (96)

where we point out that \hat{\nu}^{(k)} and \hat{s}^{(k)}_2 are (k-1)-dimensional vectors (as detailed in the summary below).
4. Fourth step – Handling the real Random dual of the (k+1)-th iteration

Following in the footsteps of what was done earlier, we finally have the following optimization problem (essentially the (k+1)-th iteration analogue of the second iteration's (46)):

\min_z \max_{\|\lambda\|_2 = 1} z_{sc} f_{RD,k+1}
\mbox{subject to} \quad z \in [0, 2/\sqrt{n}]^n, \; z_{sc} = \sqrt{\|z\|_2^2 + \sigma^2}, \; \|z\|_2^2 = c_{2,z}, \; (x^{(j,s)})^T z = s_{2,j}, \; 1 \le j \le k, \; \frac{1}{\sqrt{n}} 1^T z = s_3, \; (Z^{(k+1)})^T Z^{(k+1)} = Q^{(k+1)}, \; (\Lambda^{(k+1)})^T \Lambda^{(k+1)} = P^{(k+1)}, \qquad (97)

where the first k columns of both Z^{(k+1)} and \Lambda^{(k+1)} are obtained after the k-th iteration. Analogously to (47), one should here also keep in mind that

Q^{(k+1)}_{k+1,j} = [z; \sigma]^T [z^{(j)}; \sigma]/z_{sc}/z^{(j)}_{sc} = \frac{s_3 - s_{2,j} + \sigma^2}{\sqrt{c_{2,z} + \sigma^2} \sqrt{\hat{c}^{(j)}_{2,z} + \sigma^2}}. \qquad (98)

Taking f_{RD,k+1} from (88) and plugging it back into (97), we have

\min_z \max_{\|\lambda\|_2 = 1} z_{sc} (\lambda^T g^{(k,q)} + (h^{(k,p)})^T z/z_{sc} + h^{(k,p)}_0 \sigma/z_{sc}), \quad \mbox{subject to the same constraints as in (97)}. \qquad (99)

Neglecting the last term in the objective, we finally have the following analogue of (48):

\min_z \max_{\|\lambda\|_2 = 1} z_{sc} \lambda^T g^{(k,q)} + (h^{(k,p)})^T z
\mbox{subject to} \quad z \in [0, 2/\sqrt{n}]^n, \; z_{sc} = \sqrt{\|z\|_2^2 + \sigma^2}, \; \|z\|_2^2 = c_{2,z}, \; (x^{(j,s)})^T z = s_{2,j}, \; 1 \le j \le k, \; \frac{1}{\sqrt{n}} 1^T z = s_3, \; (\Lambda^{(k+1)})^T \Lambda^{(k+1)} = P^{(k+1)}. \qquad (100)

We will also denote

f^{(k+1)}_{sph} = \frac{1}{\sqrt{m}} E \max_{\|\lambda\|_2 = 1} \lambda^T g^{(k,q)} \quad \mbox{subject to} \quad (\Lambda^{(k+1)})^T \Lambda^{(k+1)} = P^{(k+1)}, \qquad (101)

and then rewrite (100) as

\min_z \sqrt{m} \, z_{sc} f^{(k+1)}_{sph} + (h^{(k,p)})^T z \quad \mbox{subject to} \quad z \in [0, 2/\sqrt{n}]^n, \; z_{sc} = \sqrt{\|z\|_2^2 + \sigma^2}, \; \|z\|_2^2 = c_{2,z}, \; (x^{(j,s)})^T z = s_{2,j}, \; 1 \le j \le k, \; \frac{1}{\sqrt{n}} 1^T z = s_3, \qquad (102)

which for all practical purposes is an analogue of (50). Now one can proceed as in the analysis of the second iteration right after (50) and write the resulting Lagrange dual to obtain a problem structurally similar to (12):

\max_{\gamma, \tilde{\nu}_j, \nu_2} \min_z L(\gamma, \tilde{\nu}_j, \nu_2) \quad \mbox{subject to} \quad z \in [0, 2/\sqrt{n}]^n, \qquad (103)

where

L(\gamma, \tilde{\nu}_j, \nu_2) = \sqrt{\alpha n} \sqrt{\|z\|_2^2 + \sigma^2} f^{(k+1)}_{sph} + (h^{(k,p)})^T z + \gamma (\|z\|_2^2 - c_{2,z}) + \sum_{j=1}^{k} \tilde{\nu}_j ((x^{(j,s)})^T z - s_{2,j}) + \nu_2 (1^T z - \sqrt{n} s_3). \qquad (104)

Following closely what we did earlier, we use \xi^{(k+1)}_{RD}(\alpha, \sigma; P^{(k+1)}, Q^{(k+1)}, c_{2,z}, s_{2,j}, s_3, \gamma, \tilde{\nu}_j, \nu_2) to denote the expected value of the above objective after it is scaled by 1/\sqrt{n}. One can then follow further the machinery of, say, [12] and, analogously to (53) (and earlier (13)), define

f_{box,k+1}(h^{(k,p)}; c_{2,z}, s_{2,j}, s_3) = \max_{\gamma, \tilde{\nu}_j, \nu_2} \min_z (h^{(k,p)})^T z + \gamma (\|z\|_2^2 - c_{2,z}) + \sum_{j=1}^{k} \tilde{\nu}_j ((x^{(j,s)})^T z - s_{2,j}) + \nu_2 \left( \frac{1}{\sqrt{n}} 1^T z - s_3 \right) \quad \mbox{subject to} \quad z \in [0, 2/\sqrt{n}]^n. \qquad (105)

Now, here is the key point. When one compares (53) to (13), the difference is an extra constraint related to s_3. On the other hand, when one compares (105) to (53), the difference is that instead of one constraint related to s_2 in (53), one here has k constraints related to the s_{2,j}. However, the same conclusion made after (53) applies here as well. In other words, one can still utilize the solution obtained after (53), with a few modifications to account for the s_{2,j} and \tilde{\nu}_j. One then effectively replaces (54) (and earlier (14) and ultimately (110) from [12]) with

f_{box,k+1}(h^{(k,p)}; c_{2,z}, s_{2,j}, s_3) = \max_{\gamma, \tilde{\nu}_j, \nu_2} \left( \frac{1}{\sqrt{n}} \sum_{i=1}^n f^{(1)}_{box,k+1}(h^{(k,p)}_i, \gamma, \tilde{\nu}_j, \nu_2) - \sum_{j=1}^{k} \tilde{\nu}_j s_{2,j} \sqrt{n} - \nu_2 s_3 \sqrt{n} - \gamma c_{2,z} \sqrt{n} \right), \qquad (106)

where

f^{(1)}_{box,k+1}(h^{(k,p)}_i, \gamma, \tilde{\nu}_j, \nu_2) = \begin{cases} 0, & h^{(k,p)}_i + \sum_{j=1}^{k} \tilde{\nu}_j x^{(j,s)}_i + \nu_2 \ge 0 \\ -(h^{(k,p)}_i + \sum_{j=1}^{k} \tilde{\nu}_j x^{(j,s)}_i + \nu_2)^2/(4\gamma), & -4\gamma \le h^{(k,p)}_i + \sum_{j=1}^{k} \tilde{\nu}_j x^{(j,s)}_i + \nu_2 \le 0 \\ 2(h^{(k,p)}_i + \sum_{j=1}^{k} \tilde{\nu}_j x^{(j,s)}_i + \nu_2) + 4\gamma, & h^{(k,p)}_i + \sum_{j=1}^{k} \tilde{\nu}_j x^{(j,s)}_i + \nu_2 \le -4\gamma, \end{cases} \qquad (107)

with the usual scaling discussion that we had after (55) applying here to \gamma, \tilde{\nu}_j, \nu_2, and x^{(j,s)}_i as well. Analogously to (56) and (57), one then also has for the optimizing z_i and x^{(k+1,s)}_i

z^{(k+1)}_i = \frac{1}{\sqrt{n}} \min\left( \max\left( 0, -\frac{h^{(k,p)}_i + \sum_{j=1}^{k} \tilde{\nu}_j x^{(j,s)}_i + \nu_2}{2\gamma} \right), 2 \right)
x^{(k+1,s)}_i = \frac{1}{\sqrt{n}} - z^{(k+1)}_i = \frac{1}{\sqrt{n}} \left( 1 - \min\left( \max\left( 0, -\frac{h^{(k,p)}_i + \sum_{j=1}^{k} \tilde{\nu}_j x^{(j,s)}_i + \nu_2}{2\gamma} \right), 2 \right) \right), \qquad (108)

where the x^{(j,s)}_i, 1 \le j \le k, are obtained after the k-th iteration as stated in (95). Analogously to (58) and (59), we then have

E f^{(1)}_{box,k+1}(h^{(k,p)}_i, \gamma, \tilde{\nu}_j, \nu_2) = \rho I^{(k+1)}_1(\gamma, \tilde{\nu}_j, \nu_2, \hat{\nu}^{(1)}) + (1-\rho) I^{(k+1)}_1(\gamma, \tilde{\nu}_j, \nu_2, -\hat{\nu}^{(1)}), \qquad (109)

where

I^{(k+1)}_1(\gamma, \tilde{\nu}_j, \nu_2, \hat{\nu}^{(1)}) = E \left( (h^{(k,p)}_i + \sum_{j=1}^{k} \tilde{\nu}_j x^{(j,s)}_i + \nu_2) z^{(k+1)}_i + \gamma (z^{(k+1)}_i)^2 \right), \qquad (110)

and, for \gamma > 0,

\xi^{(k+1)}_{RD}(\alpha, \sigma; P^{(k+1)}, Q^{(k+1)}, c_{2,z}, s_{2,j}, s_3, \gamma, \tilde{\nu}_j, \nu_2) = \sqrt{\alpha} \sqrt{c_{2,z} + \sigma^2} f^{(k+1)}_{sph} + E f^{(1)}_{box,k+1}(h^{(k,p)}_i, \gamma, \tilde{\nu}_j, \nu_2) - \sum_{j=1}^{k} \tilde{\nu}_j s_{2,j} - \nu_2 s_3 - \gamma c_{2,z}. \qquad (111)

Below we present a brief summary of the above analysis. Since it conceptually closely follows the summaries presented after the analysis of the first and the second iteration, we will try to make this summary as short as possible and basically rely on many ideas already introduced in earlier sections.
Summarized formalism to handle the CLuP's (k+1)-th iteration

As mentioned above, there are two parts that we recognize as critical in understanding the whole analysis mechanism.
I) First part – Handling the first k iterations

This basically assumes just a simple recognition that the whole mechanism is in a way inductive in nature for any $k>2$. So, to start the induction one assumes that the first $k$ iterations are doable (for $k=1$ and $k=2$ we have already shown that this is indeed the case) and continues further. To continue further one also recognizes the conclusion of the third step in the above discussion. That essentially amounts to recognizing that the key output quantities after the $k$-th iteration are

$$
x^{(j,s)}_i,\; z^{(j)}_i,\; \lambda^{(j-1)},\quad 1\leq j\leq k,
\tag{112}
$$

and

$$
\phi^{(k)}=\{p^{(k)}_{err},\hat{s}^{(k)},\hat{d}^{(k)}_2,\hat{d}^{(k)}_1,\hat{\nu}^{(k)},\hat{\nu}^{(k)}_2,\hat{\gamma}^{(k)},\hat{P}^{(k)},\hat{Q}^{(k)},\hat{c}^{(k)}_{2,z},\hat{s}^{(k)}_2,\hat{s}^{(k)}_3\},
\tag{113}
$$

where $\hat{\nu}^{(k)}$ and $\hat{s}^{(k)}_2$ are $(k-1)$-dimensional vectors ($\hat{\nu}^{(k)}$ is essentially the vector of the optimal $\tilde{\nu}_j$, $1\leq j\leq k-1$, obtained after the $k$-th iteration and, analogously, $\hat{s}^{(k)}_2$ is the vector of the optimal $s_{2,j}$, $1\leq j\leq k-1$, obtained after the $k$-th iteration). As usual, a particular emphasis is on

$$
\begin{array}{ll}
p^{(k)}_{err} & -\ \mbox{probability of error after the $k$-th iteration}\\
\hat{s}^{(k)}=\mathbb{E}((x^{(k-1)})^Tx^{(k,s)}) & -\ \mbox{objective value after the $k$-th iteration}\\
\hat{d}^{(k)}_2=\mathbb{E}\|x^{(k,s)}\|_2^2 & -\ \mbox{squared norm after the $k$-th iteration}\\
\hat{d}^{(k)}_1=\mathbb{E}x_{sol}^Tx^{(k,s)} & -\ \mbox{inner product with $x_{sol}$ after the $k$-th iteration.}
\end{array}
\tag{114}
$$

II) Second part – Handling the (k+1)-th iteration

We start with writing, analogously to (69) and (70),

$$
\begin{array}{rl}
\phi^{(k+1)}_a=\arg\min_{s_{2,j},s_3} & \frac{s_{2,k}-\hat{d}^{(k)}_1}{\sqrt{\hat{d}^{(k)}_2}}\\
\mbox{subject to} & \min_{Q^{(k+1)}}\max_{P^{(k+1)}}\min_{0\leq c_{2,z}\leq 1}\max_{\gamma,\tilde{\nu}_j,\nu}\;\xi^{(k+1)}_{RD}(\alpha,\sigma;P^{(k+1)},Q^{(k+1)},c_{2,z},s_{2,j},s_3,\gamma,\tilde{\nu}_j,\nu)=r,
\end{array}
\tag{115}
$$

where

$$
\phi^{(k+1)}_a=\{\hat{P}^{(k+1)},\hat{Q}^{(k+1)},\hat{\nu}^{(k+1)},\hat{\nu}^{(k+1)}_2,\hat{\gamma}^{(k+1)},\hat{c}^{(k+1)}_{2,z},\hat{s}^{(k+1)}_2,\hat{s}^{(k+1)}_3\},
\tag{116}
$$

and obviously $\hat{\nu}^{(k+1)}$ is the $k$-dimensional vector of the optimal $\tilde{\nu}_j$, $1\leq j\leq k$, and $\hat{s}^{(k+1)}_2$ is the $k$-dimensional vector of the optimal $s_{2,j}$, $1\leq j\leq k$. One can then repeat all the steps between (70) and (76) to arrive at

$$
\begin{array}{rl}
\phi^{(k+1)}_b=\arg\min_{s,d^{(k+1)}_1,d^{(k+1)}_2,s_{2,j}} & s\\
\mbox{subject to} & \max_{P^{(k+1)}}\min_{0\leq c_{2,z}\leq 1}\max_{\gamma,\tilde{\nu}_j,\nu}\;\xi^{(k+1)}_{RD}(\alpha,\sigma;P^{(k+1)},Q^{(k+1)},c_{2,z},s_{2,j},s_3,\gamma,\tilde{\nu}_j,\nu)=r\\
& s_{2,k}=\hat{d}^{(k)}_1+s\sqrt{\hat{d}^{(k)}_2}\\
& s_3=1-d^{(k+1)}_1\\
& c_{2,z}=d^{(k+1)}_2-d^{(k+1)}_1+1\\
& Q^{(k+1)}_{k+1,j}=\frac{s_3-s_{2,j}+\sigma^2}{\sqrt{c_{2,z}+\sigma^2}\sqrt{\hat{c}^{(j)}_{2,z}+\sigma^2}},
\end{array}
\tag{117}
$$

where

$$
\phi^{(k+1)}_b=\{\hat{P}^{(k+1)},\hat{Q}^{(k+1)},\hat{\nu}^{(k+1)},\hat{\nu}^{(k+1)}_2,\hat{\gamma}^{(k+1)},\hat{s}^{(k+1)},\hat{d}^{(k+1)}_2,\hat{d}^{(k+1)}_1,\hat{s}^{(k+1)}_2\}.
\tag{118}
$$

Following (77) and (78) one also has

$$
p^{(k+1)}_{err}=1-\left(\rho p_{cor}(\hat{\nu}^{(1)})+(1-\rho)p_{cor}(-\hat{\nu}^{(1)})\right),
\tag{119}
$$

where

$$
p_{cor}(\hat{\nu}^{(1)})=\mathbb{E}\left((\mbox{sign}(x^{(k+1,s)})+1)/2\right).
\tag{120}
$$

We skip rewriting the trivial analogues/adjustments of the considerations (79) and (80) and instead focus on the output of the $(k+1)$-th iteration. Besides the solution, $x^{(k+1,s)}$, the following set of critical plus auxiliary parameters is the output of the $(k+1)$-th iteration:

$$
\phi^{(k+1)}=\{p^{(k+1)}_{err},\hat{s}^{(k+1)},\hat{d}^{(k+1)}_2,\hat{d}^{(k+1)}_1,\hat{\nu}^{(k+1)},\hat{\nu}^{(k+1)}_2,\hat{\gamma}^{(k+1)},\hat{P}^{(k+1)},\hat{Q}^{(k+1)},\hat{c}^{(k+1)}_{2,z},\hat{s}^{(k+1)}_2,\hat{s}^{(k+1)}_3\},
\tag{121}
$$

where we again for simplicity use the wording to emphasize

$$
\begin{array}{ll}
p^{(k+1)}_{err} & -\ \mbox{probability of error after the $(k+1)$-th iteration}\\
\hat{s}^{(k+1)}=\mathbb{E}((x^{(k)})^Tx^{(k+1,s)}) & -\ \mbox{objective value after the $(k+1)$-th iteration}\\
\hat{d}^{(k+1)}_2=\mathbb{E}\|x^{(k+1,s)}\|_2^2 & -\ \mbox{squared norm after the $(k+1)$-th iteration}\\
\hat{d}^{(k+1)}_1=\mathbb{E}x_{sol}^Tx^{(k+1,s)} & -\ \mbox{inner product with $x_{sol}$ after the $(k+1)$-th iteration.}
\end{array}
\tag{122}
$$
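To make the above formalism more tangible, the following small Monte Carlo sketch evaluates the per-coordinate quantities of (107)–(108) and the error probability of (119)–(120) for given $\gamma$, $\tilde{\nu}_j$, and $\nu$. All the concrete values below, the placeholder $x^{(j,s)}_i$, and the convenience choice $x_{sol}=\frac{1}{\sqrt{n}}\mathbf{1}$ are illustrative assumptions, not the optimal quantities produced by the analysis.

```python
import numpy as np

# Toy evaluation of (107)-(108) and (119)-(120); all parameters are assumed
# placeholders, not the optimizing values from the random dual.
rng = np.random.default_rng(0)
n, k = 100_000, 3                       # sampled coordinates / earlier iterations
gamma, nu = 1.2, -0.5                   # assumed gamma and nu
nu_t = np.array([0.1, 0.2, 0.3])        # assumed nu_tilde_j, 1 <= j <= k
h = rng.standard_normal(n)              # standard normal h_i^{(k,p)}
x_js = rng.choice([-1.0, 1.0], (k, n)) / np.sqrt(n)  # placeholder x_i^{(j,s)}

b = h + nu_t @ x_js + nu                # b_i as in (107)
z = np.clip(-b / (2.0 * gamma), 0.0, 1.0 / np.sqrt(n))  # z_i^{(k+1)}, cf. (108)
x_next = 1.0 / np.sqrt(n) - 2.0 * z     # x_i^{(k+1,s)}, cf. (108)

# With x_sol = 1/sqrt(n) * ones (an assumed convenience), a component is
# correctly decided when x_i^{(k+1,s)} > 0, so p_cor follows (120).
p_cor = np.mean((np.sign(x_next) + 1.0) / 2.0)
print(f"p_err estimate: {1.0 - p_cor:.4f}")
```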
Numerical results – the (k+1)-th iteration

Looking carefully at the above analysis one quickly observes that in principle all the quantities of interest can be determined. However, quite a few of them need to be optimized, and quite a few numerical integrations may have to be performed along the lines of such optimizations. In a separate paper we will present a systematic way to determine all these parameters. To avoid being sidetracked by such a large number of numerical considerations in an introductory paper whose goal is to present the key concepts behind the complexity analysis, we here provide estimates obtained in a simpler and much faster way. Namely, instead of systematically handling all of the above parameters, for the third and higher iterations we utilized the random dual itself. Given that the random dual is a much simpler program than the original primal, one can run it on much larger dimensions. We have done so and manually estimated the matrices $P$ and $Q$ and the vectors $\hat{d}_1$ and $\hat{d}_2$ throughout the process. The estimates for the $P$ and $Q$ matrices that we obtained are the following (both matrices are symmetric with unit diagonals)

$$
P^{(5)}=\begin{bmatrix}
1 & & & & \\
0.70 & 1 & & & \\
\cdot & 0.95 & 1 & & \\
\cdot & \cdot & 0.99 & 1 & \\
\cdot & \cdot & \cdot & 0.996 & 1
\end{bmatrix}
\tag{123}
$$

and

$$
Q^{(5)}=\begin{bmatrix}
1 & & & & \\
0.825 & 1 & & & \\
\cdot & 0.955 & 1 & & \\
\cdot & \cdot & 0.995 & 1 & \\
\cdot & \cdot & \cdot & 0.999 & 1
\end{bmatrix}.
\tag{124}
$$

Moreover, in Table 10 we give the estimated values for the vectors $\hat{d}_1$ and $\hat{d}_2$. In Table 11 we complement these values for $\hat{d}^{(k)}_1$ and $\hat{d}^{(k)}_2$ with the estimated values for $p^{(k)}_{err}$ and $\hat{s}^{(k)}$ as well. Since all of these rely on some manual estimates they are a little bit different from the values that can be obtained from a more precise systematic numerical analysis.

Table 10: Estimates for $\hat{d}^{(k)}_1$ and $\hat{d}^{(k)}_2$; $\alpha=0.\ldots$, $r_{sc}=1.3$.

Table 11: Change in $p^{(k)}_{err}$, $\hat{s}^{(k)}$, $\hat{d}^{(k)}_2=\|x^{(k,s)}\|_2^2$, and $\hat{d}^{(k)}_1=(x_{sol})^Tx^{(k,s)}$ as $k$ grows (rows $k=1,2,\ldots$ together with the limiting values); $\alpha=0.\ldots$, $r_{sc}=1.3$.

Another thing that we should point out is that from (101) one has

$$
f^{(k+1)}_{sph}=\frac{1}{\sqrt{m}}\mathbb{E}\max_{\|\lambda\|_2=1}\lambda^Tg^{(k,q)}\quad\mbox{subject to}\quad(\Lambda^{(k+1)})^T\Lambda^{(k+1)}=P^{(k+1)}.
\tag{125}
$$

Given the statistics of $g$ and $h$ and their connection to $P^{(k+1)}$ and $Q^{(k+1)}$ from (90), one can through a little bit of work repose the above problem so that it becomes deterministic, basically a function of $P^{(k+1)}$ and $Q^{(k+1)}$, and then solve it either numerically or in some cases even in a closed form. However, we found further detailing of this procedure unnecessary, since it turns out that the structure of the optimal matrices $P^{(k+1)}$ and $Q^{(k+1)}$ in the case that we consider here is such that the resulting value of $f^{(k+1)}_{sph}$ is already for $k=2$ very close to 1. Namely, as mentioned above, for $k=1$ one has from (49) and (50)

$$
f^{(2)}_{sph}=\left(q^{(1)}p^{(1)}+\sqrt{1-(q^{(1)})^2}\sqrt{1-(p^{(1)})^2}\right)=\left(Q^{(2)}_{1,2}P^{(2)}_{1,2}+\sqrt{1-(Q^{(2)}_{1,2})^2}\sqrt{1-(P^{(2)}_{1,2})^2}\right)=0.\ldots
\tag{126}
$$

For $k=2$ one does not even need to be as precise as above. Instead, a trivial ad-hoc choice that in a way resembles the one that gives the above equation,

$$
A_q=Q^{(3)}_{1,3}\quad\mbox{and}\quad B_q=\frac{Q^{(3)}_{2,3}-Q^{(3)}_{1,2}Q^{(3)}_{1,3}}{\sqrt{1-(Q^{(3)}_{1,2})^2}},\qquad
A_p=P^{(3)}_{1,3}\quad\mbox{and}\quad B_p=\frac{P^{(3)}_{2,3}-P^{(3)}_{1,2}P^{(3)}_{1,3}}{\sqrt{1-(P^{(3)}_{1,2})^2}},
\tag{127}
$$

gives

$$
C_q=\sqrt{1-A_q^2-B_q^2}\quad\mbox{and}\quad C_p=\sqrt{1-A_p^2-B_p^2},
\tag{128}
$$

and finally

$$
f^{(3)}_{sph}\geq A_qA_p+B_qB_p+C_qC_p=0.\ldots
\tag{129}
$$
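For concreteness, here is a tiny sketch that evaluates the closed form (126) and the lower bound (127)–(129). The two $3\times 3$ matrices are placeholders with unit diagonals; their off-diagonal entries are assumed for illustration and are not the optimal $P^{(3)}$ and $Q^{(3)}$ from the analysis.

```python
import numpy as np

def f_sph_2(q, p):
    # closed form of (126): q p + sqrt(1-q^2) sqrt(1-p^2)
    return q * p + np.sqrt(1.0 - q**2) * np.sqrt(1.0 - p**2)

def f_sph_3_lower(Q, P):
    # (127)-(129): expand the third column in the orthonormal basis spanned
    # by the first two columns and pair up the coefficients.
    def coeffs(M):
        A = M[0, 2]
        B = (M[1, 2] - M[0, 1] * M[0, 2]) / np.sqrt(1.0 - M[0, 1] ** 2)
        C = np.sqrt(max(0.0, 1.0 - A**2 - B**2))
        return A, B, C
    Aq, Bq, Cq = coeffs(Q)
    Ap, Bp, Cp = coeffs(P)
    return Aq * Ap + Bq * Bp + Cq * Cp

# placeholder correlation matrices (assumed entries, unit diagonals)
Q3 = np.array([[1.0, 0.825, 0.80], [0.825, 1.0, 0.955], [0.80, 0.955, 1.0]])
P3 = np.array([[1.0, 0.70, 0.65], [0.70, 1.0, 0.95], [0.65, 0.95, 1.0]])
print(f"f_sph^(2)  = {f_sph_2(Q3[0, 1], P3[0, 1]):.4f}")
print(f"f_sph^(3) >= {f_sph_3_lower(Q3, P3):.4f}")
```

Even with such rough placeholder correlations both values land close to 1, which is the structural point exploited above.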
Simulations – the (k+1)-th iteration

As in earlier sections, we below provide a collection of results obtained through simulations. In Table 12 we show the CLuP's simulated performance over the first 6 iterations for $\alpha=0.\ldots$, $n=400$, and $r_{sc}=1.3$, with $r_{plt}$ chosen as before so that $\xi^{(k)}_{RD}=r=r_{sc}r_{plt}$.

Table 12: Change in $p^{(k)}_{err}$, $\hat{s}^{(k)}$, $\|x^{(k,s)}\|_2^2$, and $(x_{sol})^Tx^{(k,s)}$ as $k$ grows; $\alpha=0.\ldots$, $r_{sc}=1.3$, $n=400$.

One observes that within a very small number of iterations (basically just 6) the simulated performance approaches the theoretical one (which actually allows for any number of iterations). Finally, in Table 13 we show how these results compare to the above discussed random duality theory predictions, not only on the ultimate limiting level but also on a much more challenging per-iteration level. We again observe a very strong agreement.

Table 13: Change in $p^{(k)}_{err}$, $\hat{s}^{(k)}$, $\|x^{(k,s)}\|_2^2$, and $(x_{sol})^Tx^{(k,s)}$ as $k$ grows; $\alpha=0.\ldots$, $r_{sc}=1.3$, $n=400$; theory–computed / theory–estimated / simulated.
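Since, by (98), the entries of $Q$ are just normalized inner products of the noise-extended iterates, such estimates are easy to extract from simulation runs. A sketch of the estimator is below; the iterates fed into it are synthetic placeholders, and an estimate of $P$ would be obtained analogously from the $\lambda^{(k)}$'s.

```python
import numpy as np

def estimate_Q(zs, sigma):
    # Empirical Q: Q_{i,j} = (z^{(i)T} z^{(j)} + sigma^2) / (z_sc^{(i)} z_sc^{(j)}),
    # with z_sc^{(k)} = sqrt(||z^{(k)}||^2 + sigma^2), cf. (98).
    sc = [np.sqrt(z @ z + sigma**2) for z in zs]
    K = len(zs)
    Q = np.empty((K, K))
    for i in range(K):
        for j in range(K):
            Q[i, j] = (zs[i] @ zs[j] + sigma**2) / (sc[i] * sc[j])
    return Q

rng = np.random.default_rng(1)
n, sigma = 400, 0.3
# synthetic, increasingly similar iterates z^{(k)} in [0, 1/sqrt(n)]^n (placeholders)
base = rng.uniform(0.0, 1.0 / np.sqrt(n), n)
zs = [np.clip(base + rng.normal(0.0, 0.2 / (k + 1), n) / np.sqrt(n),
              0.0, 1.0 / np.sqrt(n)) for k in range(5)]
print(np.round(estimate_Q(zs, sigma), 3))
```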
Also, for the matrices $P$ and $Q$ we obtained through the simulations estimates $P^{(5)}$ and $Q^{(5)}$, given in (130) and (131), respectively, that parallel (123) and (124). Given a rather small problem size, $n=400$, the agreement with the theoretical predictions is again very good.

$r_{sc}$ effect

In Table 12 we showed the CLuP's simulated performance for $r_{sc}=1.3$. It turned out that only 6 iterations were enough to get very close to the ultimate theoretical predictions (such predictions do not impose any limit on the number of iterations). In Table 14 we recall the results from Table 1 and show the CLuP's simulated performance for $r_{sc}=1.5$. To achieve a bit better concentration we chose $n=800$ (here, as well as everywhere else where we discuss the numerical results, all quantities are assumed averaged; when we, for example, write $\|x^{(k,s)}\|_2^2$ in Table 14, what is really meant is $\mathbb{E}\|x^{(k,s)}\|_2^2$; given the overwhelming concentrations the two are basically the same thing). Looking at the results shown in the table, one observes that already after 10 iterations all key parameters achieve values that are almost identical to the theoretical predictions (the difference is basically on the fifth decimal). We do also emphasize that while we increased $n$ from 400 to 800, this is still a fairly small size, and it is rather remarkable how powerful/exact the RDT is when it comes to providing performance characterization estimates.

Table 14: Change in $p^{(k)}_{err}$, $\hat{s}^{(k)}$, $\|x^{(k,s)}\|_2^2$, and $(x_{sol})^Tx^{(k,s)}$ as $k$ grows; $\alpha=0.\ldots$, $r_{sc}=1.5$, $n=800$.

Finally, one can also observe that the larger number of iterations needed when $r_{sc}=1.5$ is not a consequence of $n$ increasing from 400 to 800 but rather of the increase in $r$. In fact, quite the opposite is true: as $n$ increases the number of iterations goes down. For example, our results for $n=400$ indicate that 6 iterations should be enough. However, it is very likely that for $n$ large enough 5 or possibly even 4 iterations would suffice to get an excellent performance. In Table 15 we show the simulated results obtained for $n=800$ and all other parameters as in the scenario that corresponds to Table 13.

Table 15: Change in $p^{(k)}_{err}$, $\hat{s}^{(k)}$, $\|x^{(k,s)}\|_2^2$, and $(x_{sol})^Tx^{(k,s)}$ as $k$ grows; $\alpha=0.\ldots$, $r_{sc}=1.3$, $n=800$; theory–computed / theory–estimated / simulated.

Comparing Table 15 to Table 13 one can now see that when $n=400$ CLuP achieves in 6 iterations roughly the same level of precision that it achieves in 5 iterations when $n=800$. Finally, we also mention that from Figure 1 one can observe that when $n=800$, with just 10 iterations, the CLuP is already super close not only to its own optimum but also to the ML. Before reaching the conclusion of the paper there are a couple of things that we would like to particularly emphasize.
I) Full characterization of each iteration versus the number of iterations scaling behavior
Namely, the analysis that we presented above is substantially different from typical complexity analyses. Typical complexity analyses usually try to determine an estimate for the number of iterations before the algorithm can terminate. Also, more often than not, such estimates do not insist on full exactness and instead focus on the so-called scaling behavior. In the above analysis we chose a different but far more general (and much harder) approach that essentially circumvents both of these points: it does not merely count how many iterations will be run before the algorithm terminates, and it does not use the "estimating scalings" type of analysis. Instead, we determine all the key performance characterization parameters for any iteration and then simply observe in which iteration they come close to the limiting values. This is, of course, what should ideally be the ultimate goal of any performance characterization analysis, but due to its hardness it is rarely (if ever) achievable. Instead, it is often approximated through the above "determining the scaling behavior of the number of iterations" approach.
II) CLuP’s structural simplicity – the ultimate goal
Besides the above-mentioned very small number of iterations that characterizes the discussed CLuP's behavior, one should also observe that the CLuP's overall structural simplicity is rather significant as well. Keeping this in mind and recalling the two main goals when it comes to the MIMO ML (exact solution and low complexity), one arrives at the following, so to speak, MIMO ML grand-challenge:
MIMO ML – ultimate grand-challenge

• Can one design an algorithm that is structurally simpler than CLuP, runs in a smaller number of iterations, and achieves (or beats) the exact MIMO ML in polynomial time?

Of course, at this point it seems rather inconceivable how one could hope to design a structure simpler and iterations-wise faster than CLuP. Still, many miracles are possible within mathematics, so one should not rule out the option that one day this kind of miracle turns out to be possible as well.
Conclusion

In our companion paper [22] we introduced a very powerful and relatively simple polynomial method for achieving the exact
ML-detection performance at the receiving end of MIMO systems. We referred to the method and all the technical concepts behind it as the Controlled Loosening-up (CLuP). Already through the introduction of the method in [22] it was relatively easy to see that it has a large number of rather remarkable features, achieved through an incredibly simple underlying structure. Since all of these features have fairly deep mathematical roots, in the prototype paper [22] we only provided a set of, so to speak, first-glance observations and left a sequence of more thorough discussions for a series of companion papers. This paper is one such paper and deals with a particularly important feature of the CLuP algorithm, the computational complexity.

As already mentioned in [22], one of the very best features of the CLuP algorithm is its computational complexity. Since CLuP is an iterative procedure with structurally fairly simple iterations, the main contributing factor to its overall complexity becomes the number of running iterations. What was also observed in [22] was that the number of iterations is not only acceptable (say, polynomial) but actually fairly often behaves way better than that. Namely, it turns out to be an incredibly small number that, in the regimes of interest, often does not even go above 10, no matter what the problem dimensions are. In this paper we provided a careful analysis of the algorithm's behavior through the iterations. The analysis that we provided is different from the standard complexity analyses that typically determine just the overall number of iterations and the complexity per iteration. Instead of going through such a rather standard route, we approached the problem in a substantially more general way and analyzed each iteration of the algorithm separately. As a result of such an approach we determined a full characterization of all of the algorithm's critical parameters for any of the iterations.

The presented analysis is deeply rooted in some of the most crucial concepts within the Random Duality Theory (RDT). In connection with the RDT, we first presented the analysis on the level of the algorithm's first iteration and then showed how one can transfer from the first to the second iteration. This is of course the key point, as we were then able to show how this particular iteration transfer can be utilized in all later transfers from the $k$-th to the $(k+1)$-th iteration (for any $k\geq 2$).

References

[1] F. Bunea, A. B. Tsybakov, and M. H. Wegkamp. Sparsity oracle inequalities for the lasso. Electronic Journal of Statistics, 1:169–194, 2007.

[2] S. S. Chen and D. Donoho. Examples of basis pursuit. Proceedings of Wavelet Applications in Signal and Image Processing III, 1995.

[3] D. Donoho, A. Maleki, and A. Montanari. The noise-sensitivity phase transition in compressed sensing. Available online at http://arxiv.org/abs/1004.1218.

[4] U. Fincke and M. Pohst. Improved methods for calculating vectors of short length in a lattice, including a complexity analysis. Mathematics of Computation, 44:463–471, April 1985.

[5] M. Goemans and D. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6):1115–1145, 1995.

[6] G. Golub and C. Van Loan. Matrix Computations. Johns Hopkins University Press, 3rd edition, 1996.

[7] B. Hassibi and H. Vikalo. On the sphere decoding algorithm. Part I: The expected complexity. IEEE Trans. on Signal Processing, 53(8):2806–2818, August 2005.

[8] J. Jalden and B. Ottersten. On the complexity of sphere decoding in digital communications. IEEE Trans. on Signal Processing, 53(4):1474–1484, April 2005.

[9] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. New York: Springer-Verlag, 2nd edition, 1993.

[10] N. Meinshausen and B. Yu. Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist., 37(1):246–270, 2009.

[11] M. Stojnic. Block-length dependent thresholds in block-sparse compressed sensing. Available online at http://arxiv.org/abs/0907.3679.

[12] M. Stojnic. Discrete perceptrons. Available online at http://arxiv.org/abs/1306.4375.

[13] M. Stojnic. A framework for performance characterization of LASSO algorithms. Available online at http://arxiv.org/abs/1303.7291.

[14] M. Stojnic. A performance analysis framework for SOCP algorithms in noisy compressed sensing. Available online at http://arxiv.org/abs/1304.0002.

[15] M. Stojnic. A problem dependent analysis of SOCP algorithms in noisy compressed sensing. Available online at http://arxiv.org/abs/1304.0480.

[16] M. Stojnic. Regularly random duality. Available online at http://arxiv.org/abs/1303.7295.

[17] M. Stojnic. Upper-bounding $\ell_1$-optimization weak thresholds. Available online at http://arxiv.org/abs/1303.7289.

[18] M. Stojnic. Various thresholds for $\ell_1$-optimization in compressed sensing. Available online at http://arxiv.org/abs/0907.3666.

[19] M. Stojnic. Recovery thresholds for $\ell_1$ optimization in binary compressed sensing. ISIT, IEEE International Symposium on Information Theory, pages 1593–1597, 13–18 June 2010. Austin, TX.

[20] M. Stojnic. Box constrained $\ell_1$ optimization in random linear systems – asymptotics. 2016. Available online at http://arxiv.org/abs/1612.06835.

[21] M. Stojnic. Box constrained $\ell_1$ optimization in random linear systems – finite dimensions. 2016. Available online at http://arxiv.org/abs/1612.06839.

[22] M. Stojnic. Controlled loosening-up (CLuP) – achieving exact MIMO ML in polynomial time. 2019. Available online at arXiv.

[23] M. Stojnic, Haris Vikalo, and Babak Hassibi. A branch and bound approach to speed up the sphere decoder. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, 3:429–432, March 2005.

[24] M. Stojnic, Haris Vikalo, and Babak Hassibi. Speeding up the sphere decoder with $H_\infty$ and SDP inspired lower bounds. IEEE Transactions on Signal Processing, 56(2):712–726, February 2008.

[25] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal Statistic. Society, B 58:267–288, 1996.

[26] S. van de Geer. High-dimensional generalized linear models and the lasso. Ann. Statist., 36(2):614–645, 2008.

[27] H. van Maaren and J. P. Warners. Bounds and fast approximation algorithms for binary quadratic optimization problems with application to MAX 2SAT.