Controlled Loosening-up (CLuP) -- achieving exact MIMO ML in polynomial time
Mihailo Stojnic∗

Abstract
In this paper we attack one of the most fundamental signal processing/information theory problems, widely known as the MIMO ML-detection. We introduce a powerful Random Duality Theory (RDT) mechanism that we refer to as the Controlled Loosening-up (CLuP) as a way of achieving the exact ML-performance in MIMO systems in polynomial time. We first outline the general strategy and then discuss the rationale behind the entire concept. A solid collection of results obtained through numerical experiments is presented as well and found to be in excellent agreement with what the theory predicts. As this is the introductory paper of a massively general concept that we have developed, we mainly focus on keeping things as simple as possible and put the emphasis on the most fundamental ideas. In our several companion papers we present various other complementary results that relate to both theoretical and practical aspects and their connections to a large collection of other problems and results that we have achieved over the years in Random Duality.
∗ e-mail: [email protected]

Index Terms: ML-detection; MIMO systems; Algorithms; Random duality theory.

The MIMO ML-detection is one of the most fundamental open problems at the intersection of a variety of scientific fields, most notably information theory, signal processing, statistics, and algorithmic optimization. Due to its enormous popularity it basically needs no introduction and we will consequently try to skip as much of the unnecessary repetitive introductory detailing as possible and focus only on the key points. To that end we start with a MIMO linear system which is typically modeled in the following way:

y = A x_sol + σ v.   (1)

In (1), x_sol ∈ R^n is the input vector of the system, A ∈ R^{m×n} is the system matrix, v ∈ R^m is the noise vector at the output of the system scaled by a factor σ, and y ∈ R^m is the output vector of the system. For example, in information theory a multi-antenna system is typically modelled through (1). In such a system n is the number of transmitting antennas, m is the number of receiving antennas, x_sol is the transmitted signal vector, A is the channel matrix, v is the noise vector at the receiving antennas, and y is the vector that is finally received. In this paper we will focus on a statistical and large dimensional MIMO setup which is also very typical in various applications in communications, control, and information theory. Namely, we will assume that the elements of both A and v are i.i.d. standard normals and that both n and m are large so that m = αn where α > 0. We will consider the so-called coherent scenario where the matrix A is known at the receiving end and one wonders how the transmitted signal x can be estimated given y and A. In such a scenario one then typically relies on the so-called ML estimate which, due to the Gaussianity of v, effectively boils down to solving the following problem

x̂ = arg min_{x∈X} ‖y − A x‖₂,   (2)

where X stands for the set of permissible x.
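To make the setup concrete, here is a minimal numerical sketch of the model (1) and of the exact ML estimator (2) evaluated by exhaustive search. The dimensions, σ, the seed, and the use of numpy are our illustrative choices, not part of the paper's setup; the exponential enumeration is exactly the barrier that CLuP is meant to break.

```python
import itertools
import numpy as np

def ml_bruteforce(y, A):
    """Exact ML (2) by enumerating all 2^n points of X = {-1/sqrt(n), 1/sqrt(n)}^n.
    Exponential cost -- usable only for tiny n."""
    n = A.shape[1]
    best, best_val = None, np.inf
    for signs in itertools.product([-1.0, 1.0], repeat=n):
        x = np.array(signs) / np.sqrt(n)
        val = np.linalg.norm(y - A @ x)
        if val < best_val:
            best, best_val = x, val
    return best, best_val

# A tiny random instance of the linear MIMO model (1)
rng = np.random.default_rng(0)
n, alpha, sigma = 10, 0.8, 0.05
m = int(alpha * n)
A = rng.standard_normal((m, n))
v = rng.standard_normal(m)
x_sol = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)
y = A @ x_sol + sigma * v
x_hat, res = ml_bruteforce(y, A)
```

By construction the brute-force residual can never exceed the residual at the true x_sol, which is a useful sanity check on any faster solver.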
For the simplicity of the exposition we will assume the standard binary scenario, i.e. X = {−1/√n, 1/√n}^n. However, we do mention that the mechanisms that we present below can easily be adapted to fit various other scenarios as well.

The optimization in (2) is of course well known and belongs to the class of least-squares problems often seen in many fields ranging from, say, statistics and machine learning to information theory, communications, and control. Over the last several decades many different techniques have been developed in these fields to attack such problems. The level of difficulty differs from one field to another and depends to a large degree on the structure of the set X. The experienced reader will immediately recognize that the above assumed structure of X makes the problem that we consider in this paper notoriously hard and quite likely among the hardest widely popular simple-to-state algorithmic problems. The key difficulty of course comes from the fact that the set X is discrete, and known continuous optimization techniques that run in an acceptable computational time (say, polynomial) typically fail to solve such problems exactly. Still, even this particular version of the problem has been the subject of extensive research over the last several decades and a lot of great work has been done to improve its general understanding. We leave the details of all the prior work to survey papers and here focus only on a couple of papers that are most directly related.

As an alternative to continuous heuristics that typically solve the problem only approximately (therefore inducing an additional residual error in the estimated x̂), in our own line of work initiated in [23, 24] we approached the problem looking for the exact solutions. We designed a branch-and-bound procedure that substantially improved over the state-of-the-art so-called Sphere-decoder (SD) algorithm of [4, 7, 8].
As a tree-search algorithm, its best feature was the ability to prune the search tree far more significantly than the original SD. That of course substantially lowered the computational complexity and brought it close to polynomial in a wide range of system parameters. Still, breaking the exponential/polynomial barrier remained an unreachable goal. This barrier is precisely what we attack below. However, as will soon become clear, some of the ideas have certain connections to the roots of the main ideas that we introduced in [23, 24], but the key components are actually completely different and will in fact mostly rely on the very powerful concept called Random Duality Theory (RDT) that we designed for handling a large class of optimization problems, among them the well-known LASSO/SOCP variants of (2) (see, e.g., [13–15]), typically seen in various settings in statistics, compressed sensing, and machine learning (see also, e.g., [1–3, 10, 25, 26]).

The presentation below will be split into several main parts. We will first introduce the main algorithm that will be utilized for solving (2). In the second part we will discuss its performance and the rationale behind the algorithm's structure. Finally, in the third part we will provide a substantial set of numerical results, both theoretical and practical, that will demonstrate the full power of the introduced concepts.
Let x^(0) be a randomly generated vector from X = {−1/√n, 1/√n}^n. We consider the following iterative procedure to solve (2): x^(i+1) = x^(i+1,s)/‖x^(i+1,s)‖₂ with

x^(i+1,s) = arg min_x −(x^(i))ᵀ x
subject to ‖y − A x‖₂ ≤ r
x ∈ [−1/√n, 1/√n]^n,   (3)

where r is a carefully chosen radius. We will refer to the above procedure as Controlled Loosening-up (CLuP). The procedure looks incredibly simple and one immediately wonders why it would have any chance to be successful. The general answer is very complicated, but here we will just briefly hint at why it actually may be a good idea to use the above procedure. First, in the constraint set of the inner optimization one recognizes a problem that in a way resembles the so-called polytope relaxation that we actually introduced as a first step (and later a lower bounding technique) in the branch-and-bound mechanism in [23, 24]. One should of course immediately note a couple of important points. First, it is not the polytope relaxation itself but rather a specifically constrained problem that has the discrete n-cube vertices set relaxed to a polytope (basically a full n-cube). Second, when we introduced the branch-and-bound mechanism in [23, 24] we immediately recognized that the polytope relaxation is a nice heuristic but on its own essentially hopeless when it comes to finding the exact solution of (2). Here though the idea is completely different. The discrete set is relaxed to a convex continuous one so that the optimization in (3) can be solved quickly in polynomial time. The key point is in carefully choosing r and hoping that such a careful choice may eventually lead to an ML solution.

Algorithm 1 Controlled Loosening-up (CLuP – achieving exact ML in polynomial time)
Input: received vector y ∈ R^m, system matrix A ∈ R^{m×n}, radius r, starting unknown vector x^(0) ∈ R^n, set of additional (convex) constraints A(x) (the empty set is fine as well), maximum number of iterations i_max, desired convergence precision δ_min. [CLuP(y, A, r, x^(0), A(x), i_max, δ)]
Output: estimated vector x^(i) ∈ R^n and its discretized variant x^(CLuP). [x^(i), x^(CLuP)]
  Initialize the convergence gap δ (to any value ≥ δ_min) and the iteration counter i ← 0. Set c₂^(0) ← δ.
  while i + 1 ≤ i_max and/or δ ≥ δ_min do
    Obtain x^(i+1,s) as the optimal solution of the following convex optimization problem
      x^(i+1,s) = arg min_x −(x^(i))ᵀ x
      subject to ‖y − A x‖₂ ≤ r
      x ∈ [−1/√n, 1/√n]^n
      A(x).
    Set x^(i+1) = x^(i+1,s)/‖x^(i+1,s)‖₂.
    Set c₂^(i+1) ← ((x^(i))ᵀ x^(i+1,s))².
    Set δ ← |√(c₂^(i+1)) − √(c₂^(i))|.
    Update the iteration counter i ← i + 1.
  end while
  x^(CLuP) ← sign(x^(i))/√n.

Before moving further with the discussion related to the choice of r, we highlight in Figure 1 the performance of the CLuP algorithm introduced above. We chose α = 0.8. The experienced reader will already recognize that with this choice we are getting into the regimes where the MIMO ML-detection problem starts to become very difficult and where the known techniques might start having problems trying to reach not only the exact solution but even a good approximate one. It is probably needless to say that as α decreases the problem becomes harder and harder. In Figure 1 we compare CLuP with three well-known heuristics that attack, at an approximate rather than exact level, this very same MIMO ML-detection problem. Although it is very well known we recall that: 1) the Ball heuristic relaxes X to the unit n-dimensional ball, 2) the Polytope heuristic relaxes X to the unit cube, and 3) the SDP heuristic relaxes [x; 1/√n][x; 1/√n]ᵀ to a full rank n-scaled unit diagonal positive semi-definite matrix. Of course, there are many other more sophisticated relaxations that one can quickly design. As this paper does not have a heuristic type of approach as its main topic we selected the above three as historically and conceptually probably the most relevant ones.
Figure 1: Comparison of p_err as a function of 1/σ; α = 0.8 (CLuP versus the Ball, SDP, and Polytope heuristics; the line of mild or no corrections is also shown).

It is rather simple to see that the above CLuP procedure will converge. To simplify writing we will assume that the converging solution is x and look at the structure of the resulting ending optimization

min_x −‖x‖₂
subject to ‖y − A x‖₂ ≤ r
x ∈ [−1/√n, 1/√n]^n.   (4)

To characterize the performance of the above optimization we of course rely on the Random Duality Theory (RDT) that we have developed in a long line of work [11–21]. Before formally redoing the RDT steps we note that (4) is structurally the same problem as the one in [13–15, 20, 21] with a tiny change in the set of constraints. Moreover, the same set of constraints we have already considered in [12, 19]. As was the case in [12–15, 19–21] we will again without a loss of generality assume that x_sol has a particular structure. Here, we will say that all its components are equal to 1/√n. We will also set

c₂ = ‖x‖₂², c₁ = (x_sol)ᵀ x,   (5)

and rewrite (4) in the following way

min_x −‖x‖₂
subject to ‖[A v][x_sol − x; σ]‖₂ ≤ r
x ∈ [−1/√n, 1/√n]^n.   (6)

The Lagrange dual of the above problem can be written as

min_x max_γ −‖x‖₂ + γ( max_{‖λ‖₂=1} λᵀ([A v][x_sol − x; σ]) − r )
x ∈ [−1/√n, 1/√n]^n,   (7)

and relying on the concentration of γ as

max_γ min_x max_{‖λ‖₂=1} −‖x‖₂ + γ λᵀ([A v][x_sol − x; σ]) − γ r
x ∈ [−1/√n, 1/√n]^n.   (8)

One can then apply the RDT and proceed in the standard fashion that we outlined in [11–21]. For the time being we will skip doing that and defer such a discussion for one of the later sections.
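For intuition, the CLuP iteration of Algorithm 1 can be sketched in a few lines. The sketch below is ours, not the paper's reference implementation: scipy's box-constrained least squares supplies the polytope residual that serves as the minimal radius r_plt of (26), the working radius is taken as r = r_sc·r_plt as in (27), SLSQP stands in for any polynomial-time convex solver, and warm-starting the solver at the strictly feasible polytope point is our robustness choice (the algorithm itself only fixes a random x^(0)).

```python
import numpy as np
from scipy.optimize import lsq_linear, minimize

def clup(y, A, r_sc=1.3, i_max=50, delta_min=1e-4, seed=0):
    """Sketch of Algorithm 1 (CLuP). The inner program of (3) -- a linear
    objective over the intersection of a ball and a box -- is convex."""
    m, n = A.shape
    b = 1.0 / np.sqrt(n)
    # Polytope (box) relaxation residual: the lower bound r_plt of (26).
    x_plt = lsq_linear(A, y, bounds=(-b, b)).x
    r = r_sc * np.linalg.norm(y - A @ x_plt)   # r = r_sc * r_plt, as in (27)
    rng = np.random.default_rng(seed)
    x = rng.choice([-b, b], size=n)
    x = x / np.linalg.norm(x)                  # random x^(0), normalized
    gap_old = 0.0
    for _ in range(i_max):
        # inner convex program of (3): min -(x^(i))^T x  s.t. ball + box
        res = minimize(lambda z, w=x: -(w @ z), x_plt,
                       constraints=[{'type': 'ineq',
                                     'fun': lambda z: r**2 - np.sum((y - A @ z)**2)}],
                       bounds=[(-b, b)] * n, method='SLSQP')
        x_s = res.x
        gap = x @ x_s                          # tracks the convergence gap
        x = x_s / np.linalg.norm(x_s)          # re-normalization step of (3)
        if abs(gap - gap_old) < delta_min:
            break
        gap_old = gap
    return np.sign(x) / np.sqrt(n)             # discretized x^(CLuP)
```

At high SNR and with r_sc slightly above 1 the feasible set of the inner problem is a small neighborhood of the transmitted vector, so the iterates align with x_sol within a few steps.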
Here, we will instead rely on the results that we have already created and quickly establish the solution by maximizing c₂ = ‖x‖₂² so that the objective of

min_{x: ‖x‖₂²=c₂} max_{‖λ‖₂=1} λᵀ([A v][x_sol − x; σ])
subject to x ∈ [−1/√n, 1/√n]^n   (9)

remains below r. The main point is that the optimization in (9) is virtually identical to the one already considered in [12]. For the purpose of tracking all the relevant quantities we will actually make it slightly different by adding the above mentioned c₁ = (x_sol)ᵀx constraint to obtain

max_{c₂} min_{c₁} min_x max_{‖λ‖₂=1} λᵀ([A v][x_sol − x; σ])
subject to x ∈ [−1/√n, 1/√n]^n
(x_sol)ᵀ x = c₁
‖x‖₂² = c₂.   (10)

Before proceeding with the RDT details we will also find it convenient to define

ξ_p(α, σ; c₁, c₂) ≜ lim_{n→∞} (1/√n) E min_x max_{‖λ‖₂=1} λᵀ([A v][x_sol − x; σ])
subject to x ∈ [−1/√n, 1/√n]^n
(x_sol)ᵀ x = c₁
‖x‖₂² = c₂.   (11)

What we will present below is basically a simple exercise within RDT and many steps can be done substantially faster. However, as it can be done through the utilization of a host of the results that we have already created, we will take a moment and do it in a systematic way described in [11–21].
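Definition (11) can be sanity-checked at finite n by straightforward Monte Carlo. The sketch below is our illustration only: sample sizes, the SLSQP solver, and the feasible starting point are arbitrary choices, and since the sphere constraint ‖x‖₂² = c₂ is nonconvex the solver only returns a local minimum — good enough for a rough sanity check, not for proofs.

```python
import numpy as np
from scipy.optimize import minimize

def xi_p_mc(alpha, sigma, c1, c2, n=40, trials=10, seed=0):
    """Finite-n Monte Carlo estimate of (11):
    (1/sqrt(n)) E min_x ||A(x_sol - x) + sigma*v||_2 over the box,
    with (x_sol)^T x = c1 and ||x||_2^2 = c2 enforced as constraints."""
    rng = np.random.default_rng(seed)
    b = 1.0 / np.sqrt(n)
    x_sol = np.full(n, b)          # the w.l.o.g. structure assumed above
    m = int(alpha * n)
    vals = []
    for _ in range(trials):
        A = rng.standard_normal((m, n))
        v = rng.standard_normal(m)
        obj = lambda x, A=A, v=v: np.linalg.norm(A @ (x_sol - x) + sigma * v)
        cons = [{'type': 'eq', 'fun': lambda x: x_sol @ x - c1},
                {'type': 'eq', 'fun': lambda x: x @ x - c2}]
        x0 = np.sqrt(c2) * x_sol   # satisfies the sphere constraint exactly
        res = minimize(obj, x0, constraints=cons,
                       bounds=[(-b, b)] * n, method='SLSQP')
        vals.append(obj(res.x) / np.sqrt(n))
    return float(np.mean(vals))
```

Estimates of this kind are what the numerical-results section compares against the closed-form ξ_RD characterization derived next.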
1. First step – Forming the deterministic Lagrange dual
We start with the first step, which is just a simple forming of the standard deterministic Lagrange dual of the optimization problem in (10) (see, e.g., [12–16, 18, 19])

max_{c₂} min_{c₁} min_x max_{‖λ‖₂=1, γ, ν} λᵀ([A v][x_sol − x; σ]) + ν((x_sol)ᵀx − c₁) + γ(‖x‖₂² − c₂)
subject to x ∈ [−1/√n, 1/√n]^n.   (12)

As we are interested in a statistical and large dimensional scenario, ν and γ will concentrate and as scalars can be discretized, and the resulting optimization over these two quantities can be taken outside

max_{c₂} min_{c₁} max_{γ,ν} min_x max_{‖λ‖₂=1} λᵀ([A v][x_sol − x; σ]) + ν((x_sol)ᵀx − c₁) + γ(‖x‖₂² − c₂)
subject to x ∈ [−1/√n, 1/√n]^n.   (13)
2. Second step – Forming the Random dual
In the second step we introduce the auxiliary program, the so-called random dual to the above primal (see, e.g., [12–16, 18, 19]). Let X̄ = [−1/√n, 1/√n]^n. Then the random dual is the following problem

max_{c₂} min_{c₁} max_{γ,ν} min_{x∈X̄} max_{‖λ‖₂=1} λᵀ g √(‖x_sol − x‖₂² + σ²) − ‖λ‖₂(hᵀ(x_sol − x) + h₀σ) + ν((x_sol)ᵀx − c₁) + γ(‖x‖₂² − c₂),   (14)

where g and h are m- and n-dimensional vectors, respectively, with i.i.d. standard normal components, and h₀ is yet another standard normal independent of all other random variables. The minus sign in front of the second term is irrelevant due to the rotational symmetry of h and is introduced to have what follows as similar as possible to some of our earlier results. Similarly to (11), let ξ_RD(α, σ; c₁, c₂, γ, ν) be the following

lim_{n→∞} (1/√n) E min_{x∈X̄} max_{‖λ‖₂=1} λᵀ g √(‖x_sol − x‖₂² + σ²) − ‖λ‖₂(hᵀ(x_sol − x) + h₀σ) + ν((x_sol)ᵀx − c₁) + γ(‖x‖₂² − c₂).   (15)
3. Third step – Handling the Random dual
In the third step we analyze the above random dual. We follow again step by step the strategy outlined in [12–16, 18, 19]. It effectively boils down to the Lagrangianization and the concentration of the introduced Lagrangian slack variables. We do mention though that, in the problem at hand, the first step could have been skipped as we mentioned earlier; however, we have done it for completeness as it is generally needed. Instead, one could have applied the Lagrangianization right away to arrive at (14). Since we have already done it we then proceed with the remaining steps. Now, we first observe that the inner optimization over λ is trivial and one gets

max_{c₂} min_{c₁} max_{γ,ν} min_x ‖g‖₂ √(1 − 2c₁ + c₂ + σ²) − (hᵀ(x_sol − x) + h₀σ) + ν((x_sol)ᵀx − c₁) + γ(‖x‖₂² − c₂)
subject to x ∈ [−1/√n, 1/√n]^n.   (16)

One can then follow say [12] and define

f_box(h; c₁, c₂) = max_{γ,ν} min_x hᵀx + ν((x_sol)ᵀx − c₁) + γ(‖x‖₂² − c₂)
subject to x ∈ [−1/√n, 1/√n]^n.   (17)

Had we not introduced the c₁ constraint, with a simple shift this would be literally identical to the box-constrained problem considered in [12] and we could immediately use the solution given there. However, as mentioned earlier, here we are choosing a bit more complicated route to emphasize the structure of some of the important quantities utilized in CLuP. Still, the optimization problem in (17) is very similar to the one in (109) in [12]. The solution of (17) is consequently very similar to (110) in [12] with a very small change to account for c₁ and ν. Basically, instead of (110) from [12] one now has

f_box(h; c₁, c₂) = max_{γ,ν} ( (1/√n) Σ_{i=1}^n f_box^(1)(h_i, γ, ν) ) − νc₁√n − γc₂√n,   (18)

where

f_box^(1)(h_i, γ, ν) = { −|h_i + ν| + γ,        h_i ≤ −2γ − ν
                       { −(h_i + ν)²/(4γ),     −2γ − ν ≤ h_i ≤ 2γ − ν
                       { −|h_i + ν| + γ,        h_i ≥ 2γ − ν,   (19)

and γ and ν are the √n-scaled versions of γ and ν from (17). Moreover, the optimizing x_i is

x_i = (1/√n) min( max( −1, −(h_i + ν)/(2γ) ), 1 ).   (20)

After solving the integrals one has

E f_box^(1)(h_i, γ, ν) = I₁ − I₂ + I₃,   (21)

where

I₁ = 0.5(ν + γ) erfc((ν + 2γ)/√2) − exp(−0.5(ν + 2γ)²)/√(2π)
I₂ = ( √(π/2)(ν² + 1) erf((2γ − ν)/√2) + √(π/2)(ν² + 1) erf((2γ + ν)/√2) + exp(−0.5(2γ − ν)²)(ν − 2γ) − exp(−0.5(2γ + ν)²)(ν + 2γ) ) / (4√(2π) γ)
I₃ = −0.5(ν − γ)(erf((ν − 2γ)/√2) + 1) − exp(−0.5(ν − 2γ)²)/√(2π).   (22)

Finally a combination of (14)–(22) gives

ξ_RD(α, σ; c₁, c₂, γ, ν) = √α √(1 − 2c₁ + c₂ + σ²) + I₁ − I₂ + I₃ − νc₁ − γc₂.   (23)

The following theorem summarizes what we presented above.

Theorem 1. (CLuP – RDT estimate) Let ξ_p(α, σ; c₁, c₂) and ξ_RD(α, σ; c₁, c₂, γ, ν) be as in (11) and (23), respectively. Then

ξ_p(α, σ; c₁, c₂) ≥ max_{γ,ν} ξ_RD(α, σ; c₁, c₂, γ, ν).   (24)

Consequently,

min_{c₁} ξ_p(α, σ; c₁, c₂) ≥ min_{c₁} max_{γ,ν} ξ_RD(α, σ; c₁, c₂, γ, ν).   (25)

Proof.
Follows from the above derivation and the general RDT concepts presented in [12–16, 18, 19].

Moreover, the inequalities in the above theorem are replaced with equalities when the strong random duality holds. As shown in [12–16] this certainly happens when the strong deterministic duality holds.

The above analysis can be utilized to do both, 1) complete the design of the CLuP and 2) characterize its performance. To complete the design of CLuP one needs to adequately choose the radius r. That is in general a very hard task and depends on the system parameters α and σ at the very least. Moreover, the dependence can be very complicated. In this introductory paper we will try to keep things as simple and elegant as possible and will discuss only the simplest possible choices.

First, we have a firm lower bound on r. It is given through the following optimization

r_plt ≜ lim_{n→∞} (1/√n) E min_x ‖y − A x‖₂
subject to x ∈ [−1/√n, 1/√n]^n.   (26)

This is of course nothing but the simple polytope relaxation of the original ML problem from (2). To make the results easily presentable we will define

r ≜ r_sc r_plt,   (27)

where r_sc is the so-called scaling radius, i.e. the multiple of the minimal possible one. As mentioned earlier, the CLuP's performance can be estimated through the above mechanism relying on

max_{c₂∈[0,1]} min_{c₁∈[0,(1+c₂)/2]} max_{γ,ν} ξ_RD(α, σ; c₁, c₂, γ, ν) ≤ r_sc r_plt.   (28)

Moreover, when the underlying functions behave nicely, one can further follow [12–16] and estimate various other performance features. For example, let ν̂^(CLuP) be the optimal ν in (28); then the probability of error p̂_err^(clup) is easy to obtain based on (20)

p̂_err^(clup) = P(x_i ≤ 0) = 1 − (1/2) erfc( ν̂^(CLuP)/√2 ).   (29)

We should also add that it is then relatively easy to see that the polytope relaxation is a trivial special case of the above formalism since for r_sc = 1 one has r = r_plt and

r_plt = min_{c₂∈[0,1]} min_{c₁∈[0,(1+c₂)/2]} ξ_p(α, σ; c₁, c₂).   (30)

Moreover, since (26) is a convex optimization problem one trivially has that the strong deterministic duality is in place, which then according to [12–16] implies that the strong random duality holds and consequently one has the exact equalities in (24) and (25). This then implies that

r_plt = min_{c₂∈[0,1]} min_{c₁∈[0,(1+c₂)/2]} max_{γ,ν} ξ_RD(α, σ; c₁, c₂, γ, ν),   (31)

and analogously to (29)

p_err^(plt) = P(x_i ≤ 0) = 1 − (1/2) erfc( ν̂^(plt)/√2 ),   (32)

where ν̂^(plt) is the optimal ν in (31). Of course, if one is solely interested in r_plt and p_err^(plt) they can be obtained trivially combining [12–16] and in particular as an immediate consequence of the results in [12], most notably its equations (109) and (110).

Figure 2: p_err as a function of 1/σ; α = 0.8 (CLuP – approaching the exact ML; shown are p_err^(plt) (theory), p̂_err^(CLuP) (theory) for r_sc = 1 and three larger values of r_sc, the ultimate CLuP level, and p̂_err^(ml) (theory, 1FL ML (1RSB))).

In Figure 2 we show the behavior of p_err. We of course attack the hard regimes where traditional techniques are typically hopeless in getting anywhere close to ML. That in the first place means the cases where α < 1. The plots in Figure 2 are obtained for moderately small α = 0.8. One first observes that the curves are moving from the polytope one to the ML one as r_sc grows. This is of course the key point. However, things are not as simple. For example, just the ML prediction itself is a notoriously hard thing to obtain. Also, at a second glance one sees that for different r_sc the curves seem to exhibit, so to say, a finite domain on the left side. Moreover, the dotted green line, which will be discussed later on, appears as well and stands for the so-called ultimate level of CLuP's calculated performance. This and many other phenomena that are actually hidden behind these plots may not be easy to understand right now. In the next section we give some hints as to what is happening. However, given that this is the introductory paper on this subject we want to keep things as simple as possible and will leave more complete discussions for some of our companion papers.

We start the discussion by first noting that the upper bound on r is not as trivial as the lower bound. In fact, it seems to be strongly related to the ML curve. To fully understand this it seems that one would have to have a pretty solid understanding of the ML curve itself. This is of course one of the most challenging problems at the intersection of signal processing and information theory. Below we start things off by first sketching what kind of estimates one obtains regarding the ML curve directly from the RDT. We first recall that

x̂ = arg min_{x∈X} ‖y − A x‖₂.   (33)

Merging the three RDT steps we have the following as a simple exercise.

ML RDT – three steps merged – Forming and handling deterministic and random duals
Since now x ∈ X we have c₂ = 1 and analogously to (12) we have as the primal version of (33)

min_{c₁} min_x max_{‖λ‖₂=1, ν} λᵀ([A v][x_sol − x; σ]) + ν((x_sol)ᵀx − c₁)
subject to x ∈ X.   (34)

Analogously to (11) we will also define

ξ_p^(ml)(α, σ; c₁) ≜ lim_{n→∞} (1/√n) E min_x max_{‖λ‖₂=1, ν} λᵀ([A v][x_sol − x; σ]) + ν((x_sol)ᵀx − c₁)
subject to x ∈ X.   (35)

Following further what was done earlier we have analogously to (15)

ξ_RD^(ml)(α, σ; c₁, ν) = lim_{n→∞} (1/√n) E min_{x∈X} max_{‖λ‖₂=1} λᵀ g √(‖x_sol − x‖₂² + σ²) − ‖λ‖₂(hᵀ(x_sol − x) + h₀σ) + ν((x_sol)ᵀx − c₁).   (36)

Optimizing over λ and x we further have

ξ_RD^(ml)(α, σ; c₁, ν) = √α √(2 − 2c₁ + σ²) − E|h_i + ν| − νc₁,   (37)

where ν is the √n-scaled version of ν from (36). Moreover, one has for the optimizing x_i

x_i = −sign(h_i + ν)/√n.   (38)

After solving the integral one obtains

ξ_RD^(ml)(α, σ; c₁, ν) = √α √(2 − 2c₁ + σ²) − νc₁ − ν erf(ν/√2) − √(2/π) exp(−ν²/2).   (39)

Taking the derivative over ν gives

dξ_RD^(ml)(α, σ; c₁, ν)/dν = −c₁ − erf(ν/√2) = 0,

and finally ν̂ = √2 erfinv(−c₁). Plugging this back in (39) we have

ξ_RD^(ml)(α, σ; c₁) = √α √(2 − 2c₁ + σ²) − √(2/π) exp(−(√2 erfinv(−c₁))²/2).   (40)

The following theorem is in a way an ML analogue to Theorem 1.

Theorem 2. (ML – RDT estimate) Let ξ_p^(ml)(α, σ; c₁) and ξ_RD^(ml)(α, σ; c₁, ν) be as in (35) and (36) (or (40)), respectively. Then

ξ_p^(ml)(α, σ; c₁) ≥ max_ν ξ_RD^(ml)(α, σ; c₁, ν) = √α √(2 − 2c₁ + σ²) − √(2/π) exp(−(√2 erfinv(−c₁))²/2).   (41)

Consequently,

min_{c₁} ξ_p^(ml)(α, σ; c₁) ≥ min_{c₁} max_ν ξ_RD^(ml)(α, σ; c₁, ν) = min_{c₁} ( √α √(2 − 2c₁ + σ²) − √(2/π) exp(−(√2 erfinv(−c₁))²/2) ).   (42)

Proof. Follows from the above derivation and the general RDT concepts presented in [12–16, 18, 19].

One also easily has the following estimate for the probability of error

p_err^(ml) = (1 − ĉ₁)/2,   (43)

where ĉ₁ is the optimal c₁ in (42). In Figure 3 we show a set of results that can be obtained through the above theorem. We again focus on the probability of error p_err and attack the same α = 0.8 scenario.

Figure 3: p_err^(ml) as a function of 1/σ; α = 0.8 (RDT – 0FL ML (0RSB) and 1FL ML (1RSB) estimates).

One first observes a vertical discontinuity in the resulting curve. This of course signals that certain corrections might be needed to the estimates that one obtains using the above theorem. We introduce these corrections through the so-called 1FL RDT (first level of full lifted random duality) and plot them as a dashed blue curve. These results are obtained through a general lifting random duality formalism that we will discuss in a separate paper. As the final results are very involved we here only draw the plot to indicate that the glitch in the original curve may indeed have to be corrected. We also mention in passing that in the companion paper we will also design a particular way of utilizing the statistical physics replica theory. It will turn out that its 1RSB version will fully match the 1FL RDT.
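Under our reconstruction of (40) and (43), the plain-RDT (0FL) ML prediction reduces to a one-dimensional minimization over c₁ and is easy to evaluate numerically. The grid density, the refinement routine, and the boundary clipping below are our illustrative choices; the coarse grid is there precisely because of the multiple local minima discussed around Figures 4 and 5.

```python
import numpy as np
from scipy.special import erfinv
from scipy.optimize import minimize_scalar

def xi_ml_rd(c1, alpha, sigma):
    """xi^(ml)_RD(alpha, sigma; c1) from (40), with nu already optimized out.
    Uses (sqrt(2)*erfinv(-c1))**2 / 2 == erfinv(c1)**2."""
    return (np.sqrt(alpha) * np.sqrt(2 - 2 * c1 + sigma**2)
            - np.sqrt(2 / np.pi) * np.exp(-erfinv(c1)**2))

def p_err_ml(alpha, sigma):
    """p^(ml)_err = (1 - c1_hat)/2 as in (43), c1_hat from (42)."""
    grid = np.linspace(1e-6, 1 - 1e-6, 2001)      # coarse scan over c1
    c0 = grid[np.argmin(xi_ml_rd(grid, alpha, sigma))]
    lo, hi = max(c0 - 1e-3, 1e-9), min(c0 + 1e-3, 1 - 1e-9)
    res = minimize_scalar(xi_ml_rd, bounds=(lo, hi), args=(alpha, sigma),
                          method='bounded')       # local refinement
    return (1 - res.x) / 2
```

As expected from the discussion above, the resulting error probability shrinks rapidly as σ decreases; near the critical SNR the coarse scan is what decides which of the competing local minima is global.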
Needless to say, the 0RSB will match the RDT prediction given above, to which we will sometimes also refer as 0FL RDT (zeroth level of full lifted random duality, or basically just the random duality itself).

Another interesting thing is the appearance of a second vertical line slightly above 10 [db]. While it seems obvious that the corrections might be needed for the smaller values of 1/σ, there is of course no guarantee that they may not be needed (say on a smaller scale) for the larger values of 1/σ as well. The line slightly above 10 [db] may in fact be the critical value of 1/σ for which mild corrections are needed. Namely, analyzing the function ξ_RD^(ml)(α, σ; c₁) given in (40) one finds that it starts having multiple local minima precisely at that value of 1/σ. In fact, in Figure 4 we show the behavior of ξ_RD^(ml) there. As can be seen, in addition to the global minimum there is also a clearly visible local minimum, and the two corresponding values of c₁ give two very different estimates of p_err^(ml). This local minimum is closely related to what eventually happens at 1/σ = 9.989 [db]. Namely, as 1/σ moves further down this local minimum becomes more and more pronounced. As Figure 5 indicates, it finally overtakes as the global minimum and one indeed has a very strong discontinuity.

Figure 4: ξ_RD^(ml) as a function of c₁; α = 0.8; 1/σ slightly above 10 [db] (RDT – 0FL (0RSB)); the global and the emerging local minimum are marked.

While we leave the details of the 1FL RDT for the companion papers, we do mention here that it substantially smoothens the glitch. Still, we do believe that higher levels of lifting actually achieve the exact performance (in fact, the second level is probably already getting close enough that visually distinguishing further improvements would be virtually impossible). However, from the practical viewpoint (and as we will see later on when we discuss the numerical results) the corrections at 1FL RDT are already very close to the simulated values. In fact, for 1/σ = 10 [db] the correction does exist but it is fairly small (virtually invisible in Figure 3). On the other hand, already for 1/σ = 11 [db] we were not able to find any noticeable corrections. This may indicate that the line of mild or no corrections might indeed be somewhere between 10 and 11 [db] (as mentioned above, quite possibly not even far away from the critical value slightly above 10 [db]). From this small discussion one can already see that the whole story is way more complicated than it may initially seem from the nice plots. This type of discussion is basically provided just as a hint as to what kind of miracles might be happening and how they may be related to CLuP, which is the main interest of this paper. We of course leave more thorough discussions regarding the ML performance for one of our companion papers.

Now that we did get a bit of a feeling as to what happens with the ML performance we will get back to the CLuP itself. We recall that the plots in Figure 2 seem to exhibit a finite domain on the left side, meaning that below certain values of the SNR 1/σ, certain scalings of r_plt might not be possible. We also recall the existence of a dashed green curve in Figure 2. These things are to a large degree connected to the ML performance and we will discuss them in a bit more detail below.
However, before doing so, we also observe several properties of the CLuP ξ_RD function that in a way may also be connected to the above discussed ML performance.

Figure 5: ξ_RD^(ml) as a function of c₁; α = 0.8; 1/σ = 9.989 [db] (RDT – 0FL (0RSB)); the global/local minimum, the local/global minimum, and the local maximum are marked.

ξ_RD local optima

We will focus on the SNR regime where the above discussion indicates that the ML corrections might be needed. So, we first start with 1/σ = 11 [db] (this is actually slightly above the above discussed line but it is a good starting point). We select a particular value of c₂ and in Figure 6 show how ξ_RD changes as a function of c₁. As it turns out there is no emerging local optimum (based on the shape of the curve, one might hypothetically assume that there might be some saddle points; however, given how complicated the underlying functions are, this seems rather unlikely). This is of course only a particular choice of c₂, which will correspond to a particular choice of r_sc and consequently r. However, we found no c₂ where a local optimum over c₁ emerges. One can now note that this is in nice agreement with the above ML discussion.

Figure 6: ξ_RD as a function of c₁; α = 0.8; 1/σ = 11 [db] (for the selected c₂); only the global minimum appears.

On the other hand, things are a little different as one moves the SNR down to 10 [db]. In Figure 7 we show how ξ_RD changes as a function of c₁ for c₂ = 0.985 and observe the emergence of a local minimum. Based on the optimizing values one again notes a very sharp difference in the estimated probabilities of error. However, we found no values of c₂ where the emerging local optimum overtakes and becomes the global minimum. This might indicate that, since it is an iterative algorithm, CLuP may have problems getting to the global optimum, but with a careful strategy might be able to avoid local traps as well.

Figure 7: ξ_RD as a function of c₁; α = 0.8; 1/σ = 10 [db]; c₂ = 0.985; a local minimum emerges alongside the global one.

As one moves the SNR further down to 9 [db] things become even more different. First, in Figure 8 we show the behavior for c₂ = 0.951 and observe the emergence of a local minimum. Then, in Figure 9 we show the behavior for c₂ = 0.96 and observe that the local minimum overtakes as the global one. This actually might pose a serious problem for the success of CLuP. Finally, in Figure 10 we show the behavior for c₂ = 0.975 and observe the disappearance of the desired local minimum, which might put CLuP in a position of no success. These are some interesting properties of ξ_RD. One should keep in mind though that the choice of r might be such that the optimal c₂ is not in the range where the above discussed properties of ξ_RD happen. Plus, one should of course always keep in mind that this is the regime below the above discussed line of corrections where various miracles are possible which can cause the properties of ξ_RD to change.

ξ_RD stationary points

While the above discussion goes into tiny details to understand the particular role of all key parameters, here we would like to emphasize that for completeness we have also proceeded in the standard RDT fashion mentioned right after (8). As one recalls, in (8) we had (writing γ₁ for the γ from (8) to distinguish it from the γ appearing inside ξ_RD)

max_{γ₁} min_x max_{‖λ‖₂=1} −‖x‖₂ + γ₁ λᵀ([A v][x_sol − x; σ]) − γ₁ r
x ∈ [−1/√n, 1/√n]^n.   (44)

Analogously to (23) one now has

ξ_RD,γ₁(α, σ; c₁, c₂, γ₁, γ, ν) = −√c₂ + γ₁( √α √(1 − 2c₁ + c₂ + σ²) + I₁ − I₂ + I₃ − νc₁ − γc₂ ) − γ₁ r,   (45)

where I₁, I₂, and I₃ are as given in (22). One can then utilize the following set of equations

dξ_RD,γ₁(α, σ; c₁, c₂, γ₁, γ, ν)/dc₁ = 0
dξ_RD,γ₁(α, σ; c₁, c₂, γ₁, γ, ν)/dc₂ = 0
dξ_RD,γ₁(α, σ; c₁, c₂, γ₁, γ, ν)/dν = 0
dξ_RD,γ₁(α, σ; c₁, c₂, γ₁, γ, ν)/dγ = 0
dξ_RD,γ₁(α, σ; c₁, c₂, γ₁, γ, ν)/dγ₁ = 0.   (46)

After solving over ν and γ₁ things can be a bit simplified since

ν = −√α/√(1 − 2c₁ + c₂ + σ²)
γ₁ = (1/(2√c₂)) / (−ν/2 − γ).   (47)

Finally, after solving over c₁, c₂, and γ we find two stationary-point solutions for 1/σ = 10 [db] and two more for 1/σ = 9 [db].   (48)
8; 1 /σ = 10[ db ]; c = 0 .
985 (RDT - 0FL (0RSB)) ξ RD = 0 . , c = 0 . , c = 0 . , ν = − . , γ = 0 . , γ = 0 . . (49)We have found no other stationary points and the above two actually exactly correspond to the two shownlater on in Figures 14 and 15. r through objective values Now we finally get to address the existence of a finite domain barrier on the left side of different r sc plotsin Figure 2. Basically as plots indicate, below certain values of SNR 1 /σ , ceratin scalings of r plt might notbe possible. This is of course directly related to the above discussion about the ML performance. Namely,as r sc (and consequently r ) grows, the CLuP optimal c grows as well. Due to the CLuP’s structure c can not grow above 1. This in turn effectively imposes the limit on r and r sc (of course, both, r and r sc are basically without upper limits; however, raising them above ceratin values may be useless for the wholeCLuP concept). What one might expect is that when optimal c = 1 is such that the achieving r is matchingthe optimal ξ ( ml ) p ( α, σ ) (obtained after the optimization over c ) then CLuP’s performance matches ML.Or alternatively, when one switches to the RDT terrain, one might expect that when optimal c = 1 issuch that the achieving r is matching the optimal ξ ( ml ) RD ( α, σ ) (obtained again after the optimization over c ) then CLuP’s performance matches ML. This though may not even be the best one can do. However,before getting to this we first in Figure 11 show the limiting upper values that r sc can take based on theabove reasoning. Namely, as the above suggests one can have as the upper limit r sc = ξ ( ml ) p /r plt . On theother hand, as we have mentioned when discussing the ML performance, in certain range of SNR one mightneed to correct the values for ξ ( ml ) p . For such a correction we utilize the values obtained through the 1FLRDT and refer to them as ξ (1 F LML ) p . 
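As a purely hypothetical numeric illustration of how a stationary-point system like (46) can be attacked, the sketch below drives a finite-difference gradient to zero with a Newton iteration. The two-variable function `xi_toy` is an illustrative stand-in (the actual ξ_RD,γ1 would require the integrals I1, I2, and I3 from (22), which are not reproduced here); all names and constants are placeholders, not the paper's quantities.

```python
# Locating a stationary point by driving all partial derivatives to zero,
# in the spirit of the system (46).  xi_toy is an illustrative stand-in
# objective, NOT the actual xi_RD,gamma1 of the paper.

def xi_toy(v):
    x, y = v
    return (x - 1.0) ** 2 + 2.0 * (y + 0.5) ** 2 + 0.3 * x * y

def grad(f, v, h=1e-6):
    """Central finite-difference gradient."""
    g = []
    for i in range(len(v)):
        vp, vm = list(v), list(v)
        vp[i] += h
        vm[i] -= h
        g.append((f(vp) - f(vm)) / (2.0 * h))
    return g

def hess2(f, v, h=1e-4):
    """Finite-difference 2x2 Hessian (two-variable case)."""
    H = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            vpp, vpm, vmp, vmm = list(v), list(v), list(v), list(v)
            vpp[i] += h; vpp[j] += h
            vpm[i] += h; vpm[j] -= h
            vmp[i] -= h; vmp[j] += h
            vmm[i] -= h; vmm[j] -= h
            H[i][j] = (f(vpp) - f(vpm) - f(vmp) + f(vmm)) / (4.0 * h * h)
    return H

def newton_stationary(f, v, steps=50):
    """Newton iteration v <- v - H^{-1} g; it converges to a nearby
    stationary point (minimum, maximum, or saddle), which is exactly
    what a system of first-order conditions like (46) asks for."""
    for _ in range(steps):
        g = grad(f, v)
        H = hess2(f, v)
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        dx = (H[1][1] * g[0] - H[0][1] * g[1]) / det
        dy = (H[0][0] * g[1] - H[1][0] * g[0]) / det
        v = [v[0] - dx, v[1] - dy]
    return v

v = newton_stationary(xi_toy, [0.0, 0.0])
```

Scanning several starting points with such an iteration is one generic way to collect all stationary points of a smooth objective, as was done analytically for (48) and (49).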
It is interesting to note that, based on Figure 11, some values of r_sc might be restricted, but all three values discussed earlier in Figure 2 remain permissible. In the following subsection we discuss a different limiting strategy.

r_sc through minimal p_err

The above choice of limiting r seems reasonable (in fact, when it comes to achieving ML, it may be the most reasonable). However, one can ignore ML for a moment and wonder what would be the best way to design CLuP so that it achieves the best possible performance. The immediate question would then be what would be the criteria to determine what the best possible performance is. There are of course many criteria that one can consider, but if we just stick with the probability of error p(CLuP)_err, then seemingly the most natural way would be to choose r_sc so as to minimize p(CLuP)_err. Recalling on (29), this essentially means that one should choose r_sc so that ν̂(CLuP) is minimized. The results that we obtained following this strategy are shown in Figures 11 and 12 (the choice c = 0.). The resulting p(CLuP)_err is exactly the dashed curve in Figure 2, to which we refer as the ultimate CLuP calculated performance. As Figure 2 is mainly concerned with the effect of changing r_sc rather than with this type of subtlety, we below in Figure 13 show once again this curve together with the ML one obtained through the 1FL RDT. Since the curves are close to each other, we also provide some of the numerical values in Table 1. The comparison of the values is not so interesting in the regime where 1/σ ≤ .[db], as there corrections are expected for p̂(ml)_err (we do not believe that they are significant but, as we will see later on when discussing the results obtained from numerical experiments, they are likely to push p̂(ml)_err a bit below the values given for p̂(CLuP)_err), and p̂(CLuP)_err may need to be readjusted as discussed below, depending on the way the appearance of local optima is handled. It is interesting though that for 1/σ ≥ .[db], p̂(CLuP)_err remains below p̂(ml)_err.

Figure 8: ξ_RD as a function of c; α = 0.8; 1/σ = 9[db]; c = 0.951 (RDT – 0FL (0RSB)); the plot marks the global and local minima together with the corresponding ξ, c1, c2, ν, γ, and p_err values.

Table 1: Numerical values for p̂(ml)_err and p̂(CLuP)_err that correspond to the data in Figure 13

1/σ [db]       8        9        10       11       12       13       14       15
p̂(ml)_err     .e−02    2.e−02   4.e−03   9.e−04   2.e−04   3.e−05   3.e−06   2.e−
p̂(CLuP)_err   .e−02    1.e−02   3.e−03   9.e−04   1.e−04   3.e−05   3.e−06   2.e−

r_sc through the appearance of local/global optima

In Figures 14 and 15 we present ξ as a function of c. One can now clearly see the appearance of the local optima, which for 1/σ = 9 even overtake as global optima. Moreover, one can restrict r_sc so that these regimes are not reached.
Figure 9: ξ_RD as a function of c; α = 0.8; 1/σ = 9[db]; c = 0.96 (RDT – 0FL (0RSB)); the plot marks a local maximum, a global/local minimum, and a local/global minimum, together with the corresponding ξ, c1, c2, ν, γ, and p_err values.
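The effect seen in Figures 7–10 can also be probed numerically once ξ_RD is evaluable on a grid of c values: local minima are found by a simple scan, and one can check whether an emerging local minimum has overtaken the global one. The double-well curve below is a deliberately simple stand-in for ξ_RD, with a tilt parameter playing the role of the SNR-dependent deformation; none of it is the actual ξ_RD.

```python
# Scan a sampled curve xi(c) for local minima and report which one is global.
# xi_stand_in is an illustrative double-well stand-in for xi_RD; the tilt
# parameter mimics how lowering the SNR lets one well overtake the other.

def local_minima(cs, xis):
    """Indices of strict interior local minima of a sampled curve."""
    return [i for i in range(1, len(xis) - 1)
            if xis[i] < xis[i - 1] and xis[i] < xis[i + 1]]

def xi_stand_in(c, tilt):
    # two wells near c = 0.3 and c = 0.8; tilt decides which one is deeper
    return (c - 0.3) ** 2 * (c - 0.8) ** 2 + tilt * c

cs = [i / 1000.0 for i in range(1001)]
report = {}
for tilt in (0.02, -0.02):
    xis = [xi_stand_in(c, tilt) for c in cs]
    mins = local_minima(cs, xis)
    global_i = min(mins, key=lambda i: xis[i])
    report[tilt] = (len(mins), cs[global_i])
```

For tilt = 0.02 the left well (near c ≈ 0.3) is global, while for tilt = −0.02 the right well takes over, mirroring how the local minimum in Figure 9 overtakes as the global one.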
As earlier calculations and Figures 14 and 15 indicate, there are clearly two stationary points that might be of interest when looking at CLuP's performance. One is obviously interested only in the one that is to the right in both figures. That of course immediately raises the question as to how CLuP performs when it comes to avoiding the so-called lower stationary point. The key to understanding that is CLuP's first step (iteration), which amounts to determining x(1) as x(1) = x(1,s)/‖x(1,s)‖₂ with

x(1,s) = arg min_x  −(x(0))^T x
         subject to  ‖y − A x‖₂ ≤ r
                     x ∈ [−1/√n, 1/√n]^n.   (50)

This is exactly the same problem that we considered in great detail in [22]. Moreover, after a bit of juggling one arrives at a more relevant version of it

ξ_p,1(α, σ, c1,z, s1) = lim_{n→∞} (1/√n) E min_z  ‖σ v + A z‖₂
                        subject to  ‖z‖₂ = c1,z
                                    (x(0))^T z = s1
                                    z ∈ [0, 2/√n]^n.   (51)

Figure 10: ξ_RD as a function of c; α = 0.8; 1/σ = 9[db]; c = 0.975 (RDT – 0FL (0RSB)); the plot marks the global minimum together with the corresponding ξ, c1, c2, ν, γ, and p_err values.

The following theorem is proven in [22].

Theorem 3. (CLuP – RDT estimate – first iteration [22]) Set

I(1)_box,1(γ, ν) = ρ I1,1(γ, ν) + (1 − ρ) I1,1(γ, −ν) + ρ I2,1(γ, ν) + (1 − ρ) I2,1(γ, −ν),   (52)

where

I1,1(γ, ν) = −(exp(−(.γ + ν)²))(ν − γ) + √(π/)(ν + 1) erf((2√γ + 1)/√ν)
            − √(π/)(ν + 1) erf(ν/√) − exp(−.ν²) ν/(4√(πγ))
I2,1(γ, ν) = (4γ + 2ν) . erfc((4γ + ν)/√) − exp(−(/γ + ν)²)/√π.   (53)

Moreover, set

ξ(1)_RD(α, σ; c1,z, s1, γ, ν) = √α √(c1,z² + σ²) + I(1)_box,1(γ, ν) − ν s1 − γ c1,z.   (54)

Let ξ_p,1(α, σ, c1,z, s1) be as in (51). Then

ξ_p,1(α, σ, c1,z, s1) = max_{γ,ν} ξ(1)_RD(α, σ; c1,z, s1, γ, ν).   (55)

Consequently,

min_{c1,z} ξ_p,1(α, σ, c1,z, s1) = min_{c1,z} max_{γ,ν} ξ(1)_RD(α, σ; c1,z, s1, γ, ν).   (56)

Proof.
Follows from the general RDT concepts presented in [12–16, 18, 19], the discussion presented in [22], and the fact that the strong random duality trivially holds.

The analysis in [22] proceeds further and, by utilizing the strong random duality, characterizes the exact estimates for all other quantities that may be of interest. To do so, it first solves the following optimization problem

{ν̂(1), γ̂(1), ĉ(1)_1,z, ŝ(1)_1} = arg min_{s1} s1  subject to  min_{ ≤ c1,z ≤ } max_{γ,ν} ξ(1)_RD(α, σ; c1,z, s1, γ, ν),   (57)

and then defines

s_x,1(γ, ν) = −ν/ /γ (.ν/√ − .ν + 4γ)/√ / /γ/√π (exp(−ν²/) − exp(−(4γ + ν)²/))
s_xsq,1(γ, ν) = −I1,1(γ, ν)/γ
s_x,2(γ, ν) = 2(.γ + ν)/√
s_xsq,2(γ, ν) = 2 s_x,2,   (58)

Figure 11: r_sc = ξ(1FLML)_p/r_plt, or r_sc chosen so that p(CLuP)_err is minimal, given as a function of 1/σ; α = 0.8.

to finally obtain

E((x_sol)^T x) = 1 − (ρ s_x,1(γ̂(1), ν̂(1)) + (1 − ρ) s_x,1(γ̂(1), −ν̂(1)) + ρ s_x,2(γ̂(1), ν̂(1)) + (1 − ρ) s_x,2(γ̂(1), −ν̂(1)))
E‖x‖² = ρ s_xsq,1(γ̂(1), ν̂(1)) + (1 − ρ) s_xsq,1(γ̂(1), −ν̂(1)) + ρ s_xsq,2(γ̂(1), ν̂(1)) + (1 − ρ) s_xsq,2(γ̂(1), −ν̂(1)) + 2 E((x_sol)^T x) − 1.   (59)

Moreover, [22] also gets the estimate for the probability of error

p_err,1 = 1 − P(z_i ≤ /√n) = 1 − ( ρ ((1/2) erfc((−γ̂(1) − ν̂(1))/√2)) + (1 − ρ) ((1/2) erfc((−γ̂(1) + ν̂(1))/√2)) ).   (60)

The theoretical values obtained based on Theorem 1 in [22] for various system parameters are shown in Table 2 for two different values of SNR, 1/σ = 10[db] and 1/σ = 13[db].

Table 2: Theoretical values for various system parameters obtained based on Theorem 3

1/σ [db]   ν̂1     γ̂1     ĉ1,z    ŝ1     ξ(1)_RD   p_err,1   ‖x‖²   (x_sol)^T x
10         .       .       .       −.      .         .         .       .
13         .       .       .       −.      .         .         .       .

As the level of precision that the Random Duality Theory achieves is often very impressive even for moderate problem dimensions, the corresponding simulated values are shown in Table 3. We chose α = 0.8 and n = 400. It is relatively easy to observe a very strong agreement between the theoretical and simulated values. Also, as the values in the table indicate, one has c1 = 0. and c2 = 0..

Figure 12: p_err as a function of c; 1/σ = 11[db]; α = 0.8; the plot marks the global minimum together with the corresponding ξ, c1, c2, ν, γ, and p_err values.

Table 3: Theoretical / simulated values for various system parameters obtained based on Theorem 3

1/σ [db]   ŝ1        ξ(1)_RD    p_err,1   ‖x‖²    (x_sol)^T x
10         −. / .    −. / .     . / .     . / .    . / .
13         −. / .    −. / .     . / .     . / .    . / .

0FL RDT to 1FL RDT
As we have mentioned earlier, the results that we presented utilizing the RDT for ML are expected to need some corrections in the low SNR regimes. We earlier showed a set of results that one can obtain utilizing the so-called 1FL RDT. They were related to p_err. In Figure 16 we show an analogous set of results for ξ. As can be seen, the value of the objective ξ is substantially lifted through the 1FL RDT mechanism. More importantly, the relatively low value of c = 0. at the ξ minimum for 0FL RDT is now replaced by a significantly larger one, c = 0..

In this section we present some of the numerical results to complement the above theoretical considerations.
We start with the ML performance. Since the original problem (2) is obviously hard, we implemented a fast bit-flipping algorithmic heuristic to solve it. While there is no guarantee that the solutions that we found are optimal, the results presented in Figure 17 indicate that they may actually be very close to the optimal ones. We should also add that, despite their excellent performance, the analysis of these types of algorithms remains a challenge. We also complement Figure 17 with Table 4, where some of the numerical values are given (the remaining small differences at low 1/σ would virtually disappear on the second level of RDT, 2FL RDT).

Figure 13: r_sc chosen such that p(CLuP)_err is minimal and given as a function of 1/σ; α = 0.8.

Table 4: Theoretical / simulated values for ξ and p_err; the p_err values correspond to the data in Figure 17

1/σ [db]   ξ(0FLML)_RD   ξ(1FLML)_RD   ξ – simulated   p(0FLML)_err   p(1FLML)_err   p_err – simulated
8          .e−01         3.e−01        3.e−01          1.e−01         9.e−02         4.e−
9          .e−01         3.e−01        3.e−01          1.e−01         2.e−02         1.e−
10         .e−01         2.e−01        2.e−01          4.e−03         4.e−03         3.e−
11         .e−01         2.e−01        2.e−01          9.e−04         9.e−04         8.e−
12         .e−01         2.e−01        2.e−01          2.e−04         2.e−04         2.e−
13         .e−01         2.e−01        1.e−02          3.e−05         3.e−05         4.e−

We now switch to the CLuP's performance. In Figure 18 we present the results obtained from numerical experiments for all three choices of r_sc that we considered earlier, i.e. for r_sc = {., ., .}. We mostly focus on the SNR regime above the first line of corrections where, based on the above analysis, one is to expect a good performance. The results for r_sc = {., .} were obtained using n = 400. To get a bit better concentration closer to the ML, for r_sc = 1. we used n = 800. We complement these results with the numerical values in Tables 5, 6, and 7. In addition to the probabilities of error, we in the tables also present the results for the two key CLuP parameters, c1 and c2. We again observe an excellent agreement between the theoretical predictions and the results obtained through numerical experiments. In particular, already for the rather moderately small value n = 800 the results are almost identical to the theoretical predictions.
Figure 14: ξ_RD as a function of c; 1/σ = 10[db]; a) left – full range of c; b) right – appearance of local optima; the panels mark the global and local minima (r_sc = 1., ξ_RD = 0.) together with the corresponding ξ, c1, c2, ν, γ, and p_err values.
Figure 15: ξ_RD as a function of c; 1/σ = 9[db]; a) left – full range of c; b) right – appearance of local optima; the panels mark the global, local, and local/global minima (r_sc = 1., ξ_RD = 0.) together with the corresponding ξ, c1, c2, ν, γ, and p_err values.

As the mechanisms that we presented in the previous sections are a somewhat complex interplay of many factors, in this section we provide a brief summary of the key points. However, as we have mentioned on multiple occasions, this is the introductory paper and we try to stay away from technical complications as much as possible.

To start things off, in Figure 19 we give the summary of the main performance feature discussed in the previous sections. That feature is of course the probability of error, p_err, and the figure itself is of course the highlighting Figure 1. We first have the 1FLML and the ultimate CLuP calculated performance curves that essentially characterize attacking the ML problem on the so-called exact level. We also plot the standard polynomial heuristics based on the (convex) relaxations of the given discrete set X. These include the ball, the polytope, and the SDP heuristics. As mentioned earlier, since these are convex problems, their performance is relatively easy to derive through the random duality. In fact, we have seen earlier that the polytope one is essentially trivial and a direct consequence of many results that we have already created, most notably those from [12–16]. The analysis of the ball relaxation is even more trivial and we show the plot without even bothering to explain all these trivialities. The SDP is also relatively easy to derive; however, the final results are a bit more involved and we will present them in a separate paper. All three of these heuristics were introduced essentially as first steps in the branch-and-bound mechanism designed in [23, 24].
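As a hypothetical concrete illustration of what such a relaxation looks like computationally, the sketch below solves the box relaxation of (2), min ‖y − Ax‖₂ over x ∈ [−1/√n, 1/√n]^n, with a plain projected-gradient iteration on small synthetic data; the dimensions, step-size choice, and sign rounding are illustrative choices, not the setup used for the figures.

```python
import random, math

random.seed(0)
m, n, sigma = 8, 4, 0.05
s = 1.0 / math.sqrt(n)

# toy instance of (1): y = A x_sol + sigma v, with x_sol in {-1/sqrt(n), 1/sqrt(n)}^n
A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
x_sol = [random.choice([-s, s]) for _ in range(n)]
y = [sum(A[i][j] * x_sol[j] for j in range(n)) + sigma * random.gauss(0, 1)
     for i in range(m)]

def residual(x):
    return [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]

def res_norm(x):
    return math.sqrt(sum(r * r for r in residual(x)))

# projected gradient on f(x) = 0.5*||y - A x||^2 over the box [-1/sqrt(n), 1/sqrt(n)]^n;
# step size 1/trace(A^T A) <= 1/lambda_max guarantees monotone descent
eta = 1.0 / sum(A[i][j] ** 2 for i in range(m) for j in range(n))
x = [0.0] * n
start = res_norm(x)
for _ in range(20000):
    r = residual(x)
    g = [-sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]  # gradient of f
    x = [min(s, max(-s, x[j] - eta * g[j])) for j in range(n)]      # project onto box

rounded = [s if xj >= 0 else -s for xj in x]  # sign rounding back to the discrete set
```

On its own such a relaxation only approximates (2); in the paper the relaxed solutions serve as comparison curves and as possible starting points x(0) for CLuP.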
At that time it was observed that they substantially trail the designed branch-and-bound mechanism and that on their own they are essentially of no use when it comes to solving the problem exactly. With the appearance of the random duality theory, these observations are also very simple to precisely mathematically characterize. As the random duality based theoretical characterizations in Figure 19 confirm, all three of these heuristics indeed substantially trail the ML and CLuP results. In fact, it is actually a known thing that as α gets smaller the failure of the typical convexity types of techniques gets more pronounced. This is not necessarily particularly true only for the problem at hand but for many similar ones as well. Basically, as α gets smaller the problem becomes much harder, and for α → 0 even CLuP can occasionally experience difficulties. However, there are ways to remedy that. They are based on designing a bit more sophisticated CLuP variants that we will discuss in separate papers. We do however mention right here that the performance gain over the standard convex techniques becomes even more substantial as α gets smaller.

Figure 16: ξ_RD as a function of c; α = 0.8; 1/σ = 9[db]; 0FL ML and 1FL ML RDT; the plot marks the global minima of the 0FL ML (0RSB) and 1FL ML (1RSB) curves.

Table 5: Theoretical / simulated values for c1, c2, and p(CLuP)_err; the p(CLuP)_err values correspond to the data in Figure 18; α = 0.8, r_sc = 1., n = 400

1/σ [db]   c1       c1 – simulated   c2       c2 – simulated   p(CLuP)_err   p(CLuP)_err – simulated
10         .e−01    8.e−01           8.e−01   8.e−01           1.e−02        2.e−
11         .e−01    8.e−01           8.e−01   8.e−01           7.e−03        8.e−
12         .e−01    8.e−01           9.e−01   9.e−01           2.e−03        3.e−
13         .e−01    8.e−01           9.e−01   9.e−01           8.e−04        1.e−
14         .e−01    8.e−01           9.e−01   9.e−01           2.e−04        3.e−
15         .e−01    8.e−01           9.e−01   9.e−01           3.e−05        7.e−

Another important thing that was clear from the above considerations is that there is a σ-dependent breaking point where one may need to modify the original CLuP setup. There are many ways how this can be done, and we will consider both simple and highly advanced modifications in separate papers. As this is the introductory paper, we insist on the simplest possible structure. In passing we just mention that a couple of simple modifications typically useful in the lower SNR regimes include restarting the algorithm a few times with a different x(0), as well as constraining additionally with x̂^T x ≥ ĉ to avoid local minima over c, where x̂ and ĉ are estimates for x and c. These can be obtained in various ways; one of them, for example, would be to utilize one of the above convex relaxation heuristics, say the polytope one. In particular, one can run say j
times the following

[x(i), x(CLuP,j)] = [CLuP(y, A, r, x(0,j), ∅, i_max, δ)]

for j different x(0,j), generated either randomly or in a specific way, and then choose the one that converges to the highest objective, i.e. produces the largest ‖x(CLuP,j)‖. Or, if one wants to be a bit more specific about avoiding particular c local optima, one can first obtain x(0) through a convex heuristic. For simplicity, say one again chooses the polytope one, i.e. sets x(0) = x_plt, and then runs

[x(i), x(CLuP)] = [CLuP(y, A, r, x_plt, {x_plt^T x ≥ c(plt)1}, i_max, δ)],

(where the estimate for c(plt)1 is obtained through the above RDT polytope relaxation characterization), to obtain a good starting point x(CLuP) that can potentially help avoid the local c optima in the second running of the standard CLuP

[x(i), x(CLuP)] = [CLuP(y, A, r, x(CLuP), ∅, i_max, δ)].

Figure 17: p(ml)_err as a function of 1/σ; α = 0.8; the plot shows p̂(ml)_err – theory (0FL ML (0RSB)), p̂(ml)_err – theory (1FL ML (1RSB)), and the simulated values.

Table 6: Theoretical / simulated values for c1, c2, and p(CLuP)_err; the p(CLuP)_err values correspond to the data in Figure 18; α = 0.8, r_sc = 1., n = 400

1/σ [db]   c1       c1 – simulated   c2       c2 – simulated   p(CLuP)_err   p(CLuP)_err – simulated
11         .e−01    9.e−01           9.e−01   9.e−01           2.e−03        2.e−
12         .e−01    9.e−01           9.e−01   9.e−01           7.e−04        9.e−
13         .e−01    9.e−01           9.e−01   9.e−01           1.e−04        1.e−
14         .e−01    9.e−01           9.e−01   9.e−01           2.e−05        3.e−

In fact, in Figure 18 for r_sc = 1. and 1/σ = 10[db] we have implemented this strategy, as there were around 10% of instances where CLuP's performance wouldn't be as expected. Since 1/σ = 10[db] is in the regime below one of the critical lines, this is in a way to be expected, if for no other reason then at least because the dimensions are finite and it may happen that one runs into bad instances where big dimensions may be needed for everything to kick in (of course, the other reasons that are way more likely to cause the problems we have discussed earlier). However, with these modifications the performance got back to
match exactly what the theory predicts. We also mention that for r_sc = 1. this strategy was not needed for 1/σ = 10[db], but it was needed for 1/σ = 9[db]. Still, it certainly wouldn't hurt to utilize it anyway.

Figure 18: p_err as a function of 1/σ; α = 0.8; the plot shows the theoretical p(plt)_err, the theoretical p̂(CLuP)_err for r_sc = 1 and r_sc = {., ., .}, the ultimate CLuP theoretical curve, the theoretical p̂(ml)_err (1FL ML (1RSB)), and the simulated p̂(CLuP)_err for the three r_sc values.

Table 7: Theoretical / simulated values for c1, c2, and p(CLuP)_err; the p(CLuP)_err values correspond to the data in Figure 18; α = 0.8, r_sc = 1., n = 800

1/σ [db]   c1       c1 – simulated   c2       c2 – simulated   p(CLuP)_err   p(CLuP)_err – simulated
11         .e−01    9.e−01           9.e−01   9.e−01           1.e−03        9.e−
12         .e−01    9.e−01           9.e−01   9.e−01           2.e−04        3.e−
13         .e−01    9.e−01           9.e−01   9.e−01           5.e−05        6.e−

Another important thing that we haven't touched upon until now is of course the overall complexity of the algorithm. The reason of course is that we will have a whole lot more to say on this topic; in fact, it will be the key topic in several of our companion papers. Here we just briefly mention that in the favorable regime (above the first line of corrections) the number of iterations needed for the algorithm to get to a 10− level of convergence (the objective difference between two successive iterates) was rarely over 20. Even this is a huge overestimate, as the typical number might in some scenarios be even less than 10. We should also emphasize that this is actually independent of n and that it depends almost exclusively on r_sc and σ. This of course ultimately means that the overall complexity basically matches the complexity of solving a quadratic program, which is clearly polynomial.

Also, we should add that here we considered the simplest possible version of the algorithm. As we have just discussed above, for example, instead of starting with an x(0) that is randomly generated, one can choose it as a solution of one of the convex/polynomial heuristics. We will also analyze these scenarios in one of our companion papers in detail. Here we just briefly mention that carefully choosing the starting x(0) can be beneficial both for handling the hard regimes below the lines of corrections, as mentioned above, and for lowering the complexity (basically the number of running iterations).

Various other options are possible as well. For example, one can choose way more sophisticated iterative updating strategies that include further modifying the objective or the constraint set. Moreover, one can also do multiple runs of CLuP with a different radius.
A particularly successful strategy that we have found is to successively increase the radius until one reaches a level close to ML. This type of strategy can then also be utilized so that, instead of keeping r fixed, one can have a sequence of radii r_i. Such a modification can substantially improve even a single running of CLuP in any regime. A relevant technical problem that we looked at is how to carefully design the sequence r_i so that the complexity is minimal and the overall performance optimal. A substantial improvement can be achieved through such a consideration as well. As we have stated on numerous occasions, since this is the introductory paper, we navigated the presentation accordingly and tried to stick with the simplest possible structure of the algorithm. Obviously, we have designed way more advanced ones and we will discuss them in separate companion papers.

Figure 19: Comparison of p_err as a function of 1/σ; α = 0.8; the plot shows the ball, SDP, and polytope relaxation heuristics against the ML and CLuP curves, together with the line of mild or no corrections.

Finally, as it is probably obvious from the entire presentation, the mechanism that we presented in this paper is in no way restricted to the MIMO ML problem discussed here. We selected this problem as the one where we showcase the concepts due to its enormous importance/relevance in many scientific fields, starting with information theory and signal processing, and then moving to statistics, machine learning, and many others. We have already applied the mechanism in all of these fields on a very large collection of problems. All of the above discussion and summarizing points apply to all of these problems as well, and quite a few additional features emerge due to the problems' specifics in the various different fields. We will systematically present all of these results in a large collection of companion papers.

Conclusion

In this paper we introduce a simple yet very powerful concept for achieving in polynomial time the exact ML performance in MIMO systems. We refer to the concept as the Controlled Loosening-up (CLuP). It turns out that CLuP performs remarkably well. In particular, not only does it achieve an excellent performance in terms of accuracy, it actually does so rather quickly, with a very small number of iterations. In fact, not only can an excellent performance be achieved through a number of iterations that is polynomial, but a very small fixed number of iterations (basically independent of the problem dimensions) suffices as well.

While the structure of the algorithm is very simple and the performance is excellent, the rationale and the technical foundation behind all of it do require a very careful discussion. We provided some of the key points that give a general picture as to how/why the entire mechanism actually functions. In particular, we observed that it is naturally connected to the ML performance itself. Consequently, quite a few technical features that relate to the ML do seem to find their role in the analysis and functioning of CLuP as well.

To be able to fully understand the underlying connection, we first provided a brief Random Duality Theory (RDT) based technical analysis of the ML and then switched to the corresponding one related to CLuP. While there are many elements of the analysis that are of great interest, we will here single out one that might be among the most important ones. In particular, there seem to be certain (SNR) regimes where the ML performance (or a very close approximation of it) might be easier to obtain than in other regimes. We provided a theoretical characterization of these regimes, as well as a discussion of how they may relate to CLuP's performance. We also discussed various ways as to how CLuP's performance can be reinforced in the hard regimes.

Finally, we provided a solid set of results obtained through numerical experiments and observed that they are in a very strong agreement with what the theory predicts. Since this is the introductory paper on this subject, we limit ourselves only to the simplest possible structure of the algorithm. However, we did hint on multiple occasions that we have already explored quite a few other possibilities for building further. All of these we will present in great detail in a collection of companion papers.
References

[1] F. Bunea, A. B. Tsybakov, and M. H. Wegkamp. Sparsity oracle inequalities for the lasso. Electronic Journal of Statistics, 1:169–194, 2007.
[2] S. S. Chen and D. Donoho. Examples of basis pursuit. Proceedings of Wavelet Applications in Signal and Image Processing III, 1995.
[3] D. Donoho, A. Maleki, and A. Montanari. The noise-sensitivity phase transition in compressed sensing. Available online at http://arxiv.org/abs/1004.1218.
[4] U. Fincke and M. Pohst. Improved methods for calculating vectors of short length in a lattice, including a complexity analysis. Mathematics of Computation, 44:463–471, April 1985.
[5] M. Goemans and D. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6):1115–1145, 1995.
[6] G. Golub and C. Van Loan. Matrix Computations. Johns Hopkins University Press, 3rd edition, 1996.
[7] B. Hassibi and H. Vikalo. On the sphere decoding algorithm. Part I: The expected complexity. IEEE Trans. on Signal Processing, 53(8):2806–2818, August 2005.
[8] J. Jalden and B. Ottersten. On the complexity of the sphere decoding in digital communications. IEEE Trans. on Signal Processing, 53(4):1474–1484, August 2005.
[9] M. Grotschel, L. Lovasz, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. New York: Springer-Verlag, 2nd edition, 1993.
[10] N. Meinshausen and B. Yu. Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist., 37(1):246–270, 2009.
[11] M. Stojnic. Block-length dependent thresholds in block-sparse compressed sensing. Available online at http://arxiv.org/abs/0907.3679.
[12] M. Stojnic. Discrete perceptrons. Available online at http://arxiv.org/abs/1306.4375.
[13] M. Stojnic. A framework for performance characterization of LASSO algorithms. Available online at http://arxiv.org/abs/1303.7291.
[14] M. Stojnic. A performance analysis framework for SOCP algorithms in noisy compressed sensing. Available online at http://arxiv.org/abs/1304.0002.
[15] M. Stojnic. A problem dependent analysis of SOCP algorithms in noisy compressed sensing. Available online at http://arxiv.org/abs/1304.0480.
[16] M. Stojnic. Regularly random duality. Available online at http://arxiv.org/abs/1303.7295.
[17] M. Stojnic. Upper-bounding ℓ1-optimization weak thresholds. Available online at http://arxiv.org/abs/1303.7289.
[18] M. Stojnic. Various thresholds for ℓ1-optimization in compressed sensing. Available online at http://arxiv.org/abs/0907.3666.
[19] M. Stojnic. Recovery thresholds for ℓ1 optimization in binary compressed sensing. ISIT, IEEE International Symposium on Information Theory, pages 1593–1597, 13–18 June 2010, Austin, TX.
[20] M. Stojnic. Box constrained ℓ1 optimization in random linear systems – asymptotics. 2016. Available online at http://arxiv.org/abs/1612.06835.
[21] M. Stojnic. Box constrained ℓ1 optimization in random linear systems – finite dimensions. 2016. Available online at http://arxiv.org/abs/1612.06839.
[22] M. Stojnic. Complexity analysis of the controlled loosening-up (CLuP) algorithm. 2019. Available online at arXiv.
[23] M. Stojnic, H. Vikalo, and B. Hassibi. A branch and bound approach to speed up the sphere decoder. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, 3:429–432, March 2005.
[24] M. Stojnic, H. Vikalo, and B. Hassibi. Speeding up the sphere decoder with H∞ and SDP inspired lower bounds. IEEE Transactions on Signal Processing, 56(2):712–726, February 2008.
[25] R. Tibshirani. Regression shrinkage and selection with the lasso. J. Royal Statist. Society, B 58:267–288, 1996.
[26] S. van de Geer. High-dimensional generalized linear models and the lasso. Ann. Statist., 36(2):614–645, 2008.
[27] H. van Maaren and J. P. Warners. Bound and fast approximation algorithms for binary quadratic optimization problems with application on MAX 2SAT.