Large systems of random linear equations with non-negative solutions: Characterizing the solvable and unsolvable phase
Stefan Landmann∗ and Andreas Engel
Institute of Physics, Carl von Ossietzky University of Oldenburg, D-26111 Oldenburg, Germany
(Dated: March 2, 2020)

Large systems of linear equations are ubiquitous in science. Quite often, e.g. when considering population dynamics or chemical networks, the solutions must be non-negative. Recently, it has been shown that large systems of random linear equations exhibit a sharp transition from a phase where a non-negative solution exists with probability one to one where typically no such solution may be found. The critical line separating the two phases was determined by combining Farkas' lemma with the replica method. Here, we show that the same methods remain viable to characterize the two phases away from criticality. To this end we analytically determine the residual norm of the system in the unsolvable phase and a suitable measure of robustness of solutions in the solvable one. Our results are in very good agreement with numerical simulations.
I. INTRODUCTION
Systems of linear equations are fundamental objects of study in linear algebra. They play an important role in many fields, ranging from physics to ecology and financial market analysis. In many situations, e.g. when examining the stability of stationary states of systems with a large number of degrees of freedom, one encounters systems with many equations.

In the study of large complex systems it is often not known in detail how the microscopic parts interact with each other. Fortunately, the macroscopic properties frequently do not depend on all of the microscopic details, and it has proven successful to model them by random variables. This is a sensible approach if so-called self-averaging quantities exist that depend only on the parameters of the distributions and not on the individual realizations. A classic example is spin-glass theory [1, 2], but problems from computational complexity [3], information theory [4], and artificial neural networks [5] have also been analyzed along these lines.

A natural question occurring when studying large systems of random linear equations concerns their solvability. In some cases this question can be answered easily. Consider a system of linear equations â^T x = b for the variables x_μ ∈ R, μ = 1, …, S, with a real N × S matrix â^T and a real inhomogeneity vector b ∈ R^N. In general, this system has a solution if the rank of the matrix, r(â^T), is the same as the rank of the augmented matrix, r(â^T | b). If the entries of the matrix are drawn independently from a continuous probability distribution it has full rank with probability one [6], and the system is almost surely solvable for S ≥ N, i.e. if there are at least as many variables as equations.

In many situations, e.g.
when considering models describing population dynamics [7–9], chemical networks [10, 11], financial markets [12], or game-theoretic settings [13], the searched-for variables are concentrations or probabilities and must therefore be non-negative. The question whether a solution x of the system of linear equations exists with all components x_μ non-negative is non-trivial if S ≥ N.

(∗ Correspondence: [email protected])

In [14] it was shown that this problem may be mapped onto a dual one by using Farkas' lemma [15]. For independent random entries a_{μi} and b_i and S = O(N) the dual problem is amenable to a replica analysis. Remarkably, in the limit N, S → ∞ the system is characterized by a sharp transition from a phase in which a non-negative solution exists with probability one, to one where typically no such solution can be found. The transition line only depends on the statistical properties of the system and is therefore self-averaging.

In [14] the mapping to the dual problem was used solely to determine the critical line between the two phases. In the present paper we show that it remains valuable also to characterize the system away from criticality, i.e. deep inside the solvable and unsolvable phase, respectively. In this way we get analytic expressions for the remaining variability of the system in the solvable phase as well as for the residual error in the unsolvable one.

The paper is organized as follows. In Section II the problem is defined, relevant notation is fixed, and suitable quantities to characterize the two different phases are introduced. Section III contains an intuitive explanation of Farkas' lemma, which is central to this paper. In Section IV we sketch the determination of the critical line as performed in [14]. We then turn to the characterization of the two different phases of the problem and start in Section V with the unsolvable phase. Section VI contains the corresponding analysis of the solvable phase. Finally, we summarize our findings in Section VII.
II. PROBLEM AND NOTATION

A. Large systems of random linear equations
We study a system of N random linear equations

â^T x = b,  x ≥ 0,  (1)

for S unknowns x_μ, where the crucial qualification x ≥ 0 stands for x_μ ≥ 0 for all μ = 1, …, S. We are hence only looking for solution vectors with non-negative components. The S × N matrix â has independent random entries a_{μi} drawn from a Gaussian distribution with average A and variance σ²,

⟨a_{μi}⟩ = A,  ⟨(a_{μi} − A)²⟩ = σ²,  (2)

and â^T denotes its transpose. The components b_i, i = 1, …, N of the inhomogeneity vector b are drawn from a Gaussian distribution with average B and variance γ²/N,

⟨b_i⟩ = B,  ⟨(b_i − B)²⟩ = γ²/N.  (3)

Here and in the following the brackets ⟨…⟩ denote the average over the a_{μi} and b_i. We are always interested in the large system limit N → ∞ with S = αN and α = O(1). Note the scaling of the variance ⟨(b_i − B)²⟩ ∼ 1/N with the system size. It is important for a controlled limit N → ∞ in the subsequent calculations.

B. Solvable and unsolvable phase
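Individual realizations of the ensemble (1)–(3) are easy to draw and test numerically. The sketch below is our own illustration, not code from the paper; it only assumes the non-negative least squares routine scipy.optimize.nnls that is also employed for the simulations in [17]. It samples one instance and computes min_{x ≥ 0} ‖â^T x − b‖, which vanishes exactly when a non-negative solution exists:

```python
import numpy as np
from scipy.optimize import nnls

def sample_system(N, alpha, A=1.0, B=1.0, sigma=1.0, gamma=1.0, seed=None):
    """Draw one realization of Eq. (1) with the statistics of Eqs. (2), (3)."""
    rng = np.random.default_rng(seed)
    S = int(alpha * N)
    a = rng.normal(A, sigma, size=(S, N))           # S x N matrix \hat{a}
    b = rng.normal(B, gamma / np.sqrt(N), size=N)   # variance gamma^2 / N
    return a, b

def residual_norm(a, b):
    """min_{x >= 0} ||a^T x - b||; zero iff a non-negative solution exists."""
    x, r = nnls(a.T, b)   # least squares under the constraint x >= 0
    return x, r

a, b = sample_system(N=100, alpha=2.0, seed=0)
x, r = residual_norm(a, b)
print(f"S = {len(x)} variables, residual = {r:.3e}")
```

Repeating this for many seeds and values of α gives the empirical fraction of solvable instances discussed below.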
For N → ∞ the system (1) exhibits a phase transition from a phase where a solution almost always exists to one where typically no such solution can be found. The transition line depends on the ratio α between the number of variables and the number of equations and on the parameters A, B, σ, γ of the probability distributions involved. The analytic determination of the transition line was performed in [14]. To make this paper self-contained and to set the stage for the ensuing calculations, the main steps of the corresponding calculations are sketched in Sec. IV. For more details the reader is referred to the original paper.

The central aim of the present work is to characterize the two phases of system (1). To this end we first need to find quantities able to describe relevant properties of the system away from criticality.

In the unsolvable phase no solution x to (1) exists, and it is intuitive to ask "how far" the system is from having a solution. This somewhat vague concept can be quantified by the minimal residual norm:

r := min_{x, x ≥ 0} ‖â^T x − b‖,  (4)

where ‖…‖ denotes the usual quadratic norm of vectors. As long as no solution x exists, r is non-zero. On approaching the solvable phase r should decrease and eventually tend to zero at the transition. We are interested in the detailed behavior of r as a function of α, A, B, σ, and γ.

FIG. 1. The geometry of problem (1) in the solvable phase. The inhomogeneity vector b (blue) belongs to the cone spanned by non-negative linear combinations (5) of the row vectors a_μ (black). Correspondingly, system (1) has a solution x. For the situation in the unsolvable phase see Fig. 4.

In the solvable phase the situation is complementary. Now a solution always exists. It is then natural to ask for the robustness or flexibility of the system: How strongly can it be changed or perturbed while still remaining solvable? There are several possibilities to phrase this idea mathematically.
One is to determine the minimal number of rows a_μ and corresponding variables x_μ by which the system (1) has to be reduced such that it is rendered unsolvable. We will discuss related measures in Section VI.

III. FARKAS' LEMMA
It is very useful to map the original problem (1) to a dual one with the help of Farkas' lemma [15]. To this end we consider what is called the non-negative cone of the row vectors a_μ of the matrix â. This cone is spanned by all linear combinations

c₁ a₁ + c₂ a₂ + ··· + c_{αN} a_{αN}  (5)

of these row vectors with non-negative coefficients c_μ, cf. Fig. 1. Clearly, if b falls into this cone, (1) has a solution, whereas no such solution exists when b lies outside the cone. In the latter case, however, a hyperplane must exist that separates b from the cone. If we denote the normal of this hyperplane by y ∈ R^N we may hence state that either a solution x to

â^T x = b with x ≥ 0  (6)

exists, or we may find a vector y satisfying

â y ≥ 0 and b · y < 0.  (7)

This duality is the pivotal point of our analysis. As we will show in the next sections, (7) may be analyzed using the replica method.

IV. DETERMINATION OF THE CRITICAL LINE
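On finite instances the Farkas alternative can be checked directly. The following sketch is our own illustration (not from the paper); it assumes scipy.optimize.linprog for the dual feasibility problem. Since any y with b·y < 0 can be rescaled, we may normalize the strict inequality to b·y ≤ −1: either the linear program finds a separating normal y, or the non-negative least squares fit of (1) has zero residual.

```python
import numpy as np
from scipy.optimize import linprog, nnls

def farkas_certificate(a, b):
    """Feasibility problem for the dual side of (7): find y with a y >= 0 and
    b.y <= -1. Returns such a separating normal y, or None if none exists."""
    S, N = a.shape
    A_ub = np.vstack([-a, b[None, :]])            # encodes -a y <= 0 and b.y <= -1
    b_ub = np.concatenate([np.zeros(S), [-1.0]])
    res = linprog(np.zeros(N), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * N, method="highs")
    return res.x if res.status == 0 else None

rng = np.random.default_rng(1)
N, S = 30, 45
a = rng.normal(1.0, 1.0, size=(S, N))
b = rng.normal(1.0, 1.0 / np.sqrt(N), size=N)
y = farkas_certificate(a, b)
_, r = nnls(a.T, b)
# For a generic instance exactly one of the two alternatives holds:
print("dual certificate found" if y is not None else f"primal solvable, r = {r:.2e}")
```

This dichotomy is exactly what the fractional-volume calculation below exploits in the limit N → ∞.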
Let us first see how we can determine the critical line of the system by studying the inequalities (7). If a vector y exists which fulfills all inequalities, the original system (1) does not have a solution. If there is no such vector y, a non-negative solution x exists.

To get rid of the trivial degeneracy of solutions y implied by y → ω y for any positive ω, we first require

‖y‖² = Σ_i y_i² = N.  (8)

Next we define the fractional volume Ω(â, b) of all vectors y which fulfill (7) for given â and b:

Ω(â, b) := [∫ ∏_i dy_i δ(Σ_i y_i² − N) Θ(−(1/√N) Σ_i b_i y_i) ∏_μ Θ((1/√N) Σ_i a_{μi} y_i)] / [∫ ∏_i dy_i δ(Σ_i y_i² − N)].  (9)

FIG. 2. Volume of allowed solutions y of (7) (gray shaded area). All vectors y lie on a sphere of radius √N. The inequality b · y < 0 restricts them to the half-sphere of vectors with negative overlap with b. At the same time, they need to have a positive scalar product with all row vectors a_μ.

Here Θ denotes Heaviside functions, and their arguments have been scaled such that their typical values remain O(1) for N → ∞.

The geometric interpretation of Ω is depicted in Figure 2. Due to (8) all vectors y lie on a sphere with radius √N. The inequality b · y < 0 restricts them to the half-sphere of vectors with negative overlap with b. At the same time, all y need to have a non-negative scalar product with every row vector of â. The more variables the system (1) has, the more constraints need to be fulfilled by y. Therefore, the fractional volume of solutions (gray shaded area) shrinks as the number of variables is increased. When the fractional volume becomes zero, there is no y satisfying all the inequalities. This implies that a solution to the system (1) exists.

The value of Ω(â, b) depends on the specific choice of the matrix â and the vector b. Rather than pinpointing the fractional volume for each individual realization of the randomness, we are interested in its typical value under the distributions defined in Section II.
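For a single realization at very small N the fractional volume (9) can be estimated by direct Monte-Carlo sampling of directions on the sphere (the radius √N cancels from the homogeneous inequalities). This is our own toy check, not part of the paper's analysis, and it is of course unusable for large N, where Ω is exponentially small:

```python
import numpy as np

def fractional_volume_mc(a, b, n_samples=100000, seed=None):
    """Monte-Carlo estimate of Eq. (9): the fraction of uniformly drawn
    sphere directions y with b.y < 0 and a y >= 0 componentwise."""
    rng = np.random.default_rng(seed)
    y = rng.standard_normal((n_samples, len(b)))
    y /= np.linalg.norm(y, axis=1, keepdims=True)   # uniform on the unit sphere
    ok = (y @ b < 0) & np.all(y @ a.T >= 0, axis=1)
    return ok.mean()

rng = np.random.default_rng(2)
N, S = 6, 3    # alpha = 1/2; tiny N keeps direct sampling feasible
a = rng.normal(1.0, 1.0, size=(S, N))
b = rng.normal(1.0, 1.0 / np.sqrt(N), size=N)
omega = fractional_volume_mc(a, b, seed=3)
print(f"Omega is approximately {omega:.4f}")
```

Each additional row of â can only shrink the admissible region, illustrating why Ω decreases with the number of variables.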
Since Ω is dominated by a product of many independent random terms, we cannot expect its average to give a good estimate for its typical value. Instead, this typical value is given by

Ω_typ ≃ exp(⟨log Ω⟩).  (10)

For given statistical parameters A, B, σ, γ the critical line of the transition is determined by finding the value of α for which the typical value of the volume Ω becomes zero. Therefore, our quantity of interest is the entropy:

S(α, A, B, σ, γ) := lim_{N→∞} (1/N) ⟨log Ω(â, b)⟩.  (11)

For its determination we use the replica trick [16], which is based on the identity

⟨log Ω⟩ = lim_{n→0} (⟨Ω^n⟩ − 1)/n.  (12)

The calculation is feasible for n ∈ N. The result has then to be continued to real n in order to accomplish the crucial limit n →
0. For n ∈ N we have

Ω^n(â, b) = ∫ ∏_{i,a} [dy_i^a / √(2πe)] ∏_a δ(Σ_i (y_i^a)² − N) ∏_{μ,a} Θ((1/√N) Σ_i a_{μi} y_i^a) ∏_a Θ(−(1/√N) Σ_i b_i y_i^a).  (13)

Here, a is the replica index, which runs from 1 to n, and the factors √(2πe) account for the denominator in (9). Replacing the δ- and Θ-functions by their integral representations, the arising integrals can be decoupled by introducing the order parameters

m^a = (1/√N) Σ_i y_i^a and q^{ab} = (1/N) Σ_i y_i^a y_i^b for a < b.  (14)

In the limit N → ∞ the integrals may then be calculated by the saddle-point method. Since the solution space of (7) is connected, the replica-symmetric ansatz for the saddle point is expected to yield correct results:

m^a = m,  q^{ab} = q for a < b.  (15)

Some of the resulting saddle-point equations are algebraic and can be used to eliminate the corresponding variables. With the abbreviations

Dt := dt/√(2π) e^{−t²/2},  H(x) := ∫_x^∞ Dt  (16)

as well as

κ := mA/σ,  λ := σB/(γA),  (17)

we finally arrive at [14]

S(α, λ) = extr_{q,κ} [ (1/2) log(1 − q) + q/(2(1 − q)) − λ²κ²/(2(1 − q)) + α ∫ Dt log H((√q t − κ)/√(1 − q)) ].  (18)

The order parameter q characterizes the typical overlap between two different solutions y from the solution space. It varies from q = 0 for α = 0 to q → 1 for α → α_c. The vanishing of the solution volume at the critical line hence corresponds to the limit q →
1. In this limit we have

∫ Dt log H((√q t − κ)/√(1 − q)) ∼ −1/(2(1 − q)) ∫_κ^∞ Dt (t − κ)² =: −I(κ)/(2(1 − q)),  (19)

and correspondingly get

S(α_c, λ) ∼ extr_{q,κ} [ 1/(2(1 − q)) − λ²κ²/(2(1 − q)) − α_c I(κ)/(2(1 − q)) ].  (20)

The critical value α_c of α is determined by the remaining saddle-point equations for q and κ:

1 − λ²κ² = α_c I(κ),  (21)
α_c H(κ) = 1.  (22)

Note that in this result the statistical properties of a_{μi} and b_i as specified by the parameters A, B, σ, γ only arise in the form of λ defined in (17), which gives the scaled ratio of the relative variances of a_{μi} and b_i.

Figure 3 shows the critical line (black) given by (21), (22) in comparison with simulation results. The color indicates the fraction of randomly drawn systems for which a non-negative solution could be found using a non-negative least squares solver [17]. There is good agreement between the analytical prediction and the numerics.

V. THE UNSOLVABLE PHASE
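The phase boundary that delimits this phase is straightforward to evaluate: eliminating α_c from Eqs. (21) and (22) leaves a single equation for κ. The sketch below is our own implementation; the closed form I(κ) = (1 + κ²)H(κ) − κφ(κ), with φ the Gaussian density, follows from Gaussian integration by parts.

```python
import numpy as np
from scipy.special import erfc
from scipy.optimize import brentq

H = lambda k: 0.5 * erfc(k / np.sqrt(2.0))                   # H(k) = int_k^inf Dt
phi = lambda k: np.exp(-k * k / 2.0) / np.sqrt(2.0 * np.pi)  # Gaussian density
I = lambda k: (1.0 + k * k) * H(k) - k * phi(k)              # I(k) of Eq. (19)

def critical_alpha(lam):
    """Solve Eqs. (21), (22): with alpha_c = 1/H(kappa) from (22), Eq. (21)
    becomes 1 - lam^2 kappa^2 = I(kappa)/H(kappa) for kappa alone."""
    g = lambda k: 1.0 - (lam * k) ** 2 - I(k) / H(k)
    kappa = brentq(g, 1e-9, 1.0 / lam - 1e-9)  # nontrivial root, 0 < kappa < 1/lam
    return 1.0 / H(kappa), kappa

alpha_c, kappa = critical_alpha(lam=1.0)
print(f"lambda = 1: alpha_c = {alpha_c:.4f} at kappa = {kappa:.4f}")
```

Scanning λ and plotting α_c(λ) reproduces the black line of Fig. 3.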
In the unsolvable phase of problem (1) no solution vector x may be found. In Section II we proposed the residual norm r defined in (4) as a measure of how far the system is from having a solution. We will show now how to determine the typical value of r from the dual problem (7). To this end it is useful to first give r a geometrical interpretation, see Fig. 4.

FIG. 3. The transition from the unsolvable to the solvable phase in dependence of the statistical properties of the random variables described by λ and the ratio α between the number of variables and the number of equations. The color indicates the fraction of randomly drawn systems (1) for which a non-negative solution could be found numerically. The black line shows the analytical prediction of the critical line given by Eqs. (21), (22). Parameters: N = 300; every data point shows the average over 50 realizations.

FIG. 4. In the unsolvable phase the inhomogeneity vector b lies outside the cone of possible non-negative linear combinations â^T x = Σ_μ x_μ a_μ. The minimal residual norm r = min_{x, x ≥ 0} ‖â^T x − b‖ is realized by a linear combination which lies on the face of the cone closest to b. Therefore, it is completely determined by the smallest angle φ_min between the cone and b.

In the unsolvable phase the inhomogeneity vector b lies outside the cone of possible non-negative linear combinations â^T x. The best approximation to b by a vector â^T x of the cone lies on a face of this cone and realizes the smallest possible angle φ_min between b and the surface of the cone. We hence get r = b sin(φ_min) with b = ‖b‖.

It remains to find a way to determine φ_min from the dual problem involving the vectors y. Since there is no solution to equation (1) there is at least one, and in fact several, hyperplanes separating the cone from b.
The angle between these hyperplanes and b ranges from 0 (when the hyperplane contains b) to φ_min (when the hyperplane contains the face of the cone closest to b). Any hyperplane making an angle with b larger than φ_min violates at least one of the constraints in â y ≥ 0. Thus, we can determine φ_min by finding the hyperplane fulfilling the constraints given by Farkas' lemma and making the largest possible angle with b. In terms of normal vectors y this is equivalent to

r = max_y (−(1/√N) y · b),  (23)

where the maximum is over all vectors y fulfilling

â y ≥ 0,  b · y < 0,  ‖y‖² = N.  (24)

This formulation allows us to determine the typical value of r by a slight extension of the methods described in Section IV, see also [18] for a related situation. The procedure is best explained by comparing Figs. 2 and 5. In order to determine r we have to find the maximal value η_max of

η := −(1/√N) y · b  (25)

in the shaded solution space Ω. We proceed as follows: We first fix a value of η, thereby constraining the vectors y to lie on a given latitude shown by the dotted blue line in Fig. 5. The part of this line inside Ω (full blue line) represents those vectors y that belong to Ω and make the required angle with b. As η increases the full blue line gets shorter and shorter. It shrinks to a point at the "most southern tip" of the solution space when the maximal possible value η_max of η has been reached.

FIG. 5. Solution volume Ω (gray shaded) as in Fig. 2 together with the locus of vectors y making a definite angle with b (blue line), cf. Eq. (25). At the maximum value of this angle the line hits the "southern tip" of the gray area.
We may therefore determine η_max by calculating the modified fractional volume Ω̃(â, b, η) including the constraint (25) and looking for the value of η for which it goes to zero.

The modified fractional volume Ω̃(â, b, η) is obtained from Ω(â, b) as defined in (9) by the mere replacement

Θ(−(1/√N) Σ_i b_i y_i) → δ(−(1/√N) Σ_i b_i y_i − η).  (26)

The explicit calculations are hence rather similar to Section IV. They yield the following expression for the corresponding entropy:

S̃(α, λ, η/γ) = lim_{N→∞} (1/N) ⟨log Ω̃(â, b, η)⟩
= extr_{q,κ} { (1/2) log(1 − q) + q/(2(1 − q)) − (η/γ + κλ)²/(2(1 − q)) + α ∫ Dt log H((√q t − κ)/√(1 − q)) + C },  (27)

where C is an irrelevant constant arising from the different normalizations of Ω(â, b) and Ω̃(â, b, η).

The limit Ω̃ → 0 again corresponds to q → 1 and yields

S̃(α, λ, η_max/γ) = extr_{q,κ} { 1/(2(1 − q)) − (η_max/γ + κλ)²/(2(1 − q)) − α I(κ)/(2(1 − q)) },  (28)

with the corresponding saddle-point equations

1 − (η_max/γ + κλ)² = α I(κ),
κλ (η_max/γ + κλ) = α (H(κ) − I(κ)).  (29)

Here H and I denote the functions defined in (16) and (19), respectively. From the numerical solutions of (29) we get our final result r = η_max.

Figure 6 compares results for r obtained in this way with numerical simulations for systems of size N = 1000. The data points were generated by averaging over 10 realizations of the randomness. For each realization the minimal residual norm r of the random system Eq. (1) was determined using a least squares solver [17]. There is very good agreement between numerics and analytical results. The qualitative behavior is as expected: In the unsolvable phase, α < α_c, the minimal residual norm is nonzero. It monotonically decreases as the system approaches α_c to eventually reach zero at α = α_c.

VI. THE SOLVABLE PHASE
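Before turning to the solvable phase, we sketch how the analytical curve of Fig. 6 is obtained in practice. Eqs. (29) are two equations for κ and the combination w := η_max/γ + κλ. The implementation below is our own (not from the paper); it uses the identity H(κ) − I(κ) = κ(φ(κ) − κH(κ)), which follows from the closed form of I(κ), to factor a spurious root at κ = 0 out of the second equation, and it starts the search at κ < 0 since the separating vector y has to oppose the positive mean of b.

```python
import numpy as np
from scipy.special import erfc
from scipy.optimize import root

H = lambda k: 0.5 * erfc(k / np.sqrt(2.0))
phi = lambda k: np.exp(-k * k / 2.0) / np.sqrt(2.0 * np.pi)
I = lambda k: (1.0 + k * k) * H(k) - k * phi(k)   # Eq. (19) in closed form

def residual_r(alpha, lam, gamma=1.0):
    """Typical residual norm r = eta_max from the saddle-point equations (29),
    solved for kappa and w := eta_max/gamma + kappa*lam."""
    def eqs(p):
        k, w = p
        return [1.0 - w * w - alpha * I(k),
                lam * w - alpha * (phi(k) - k * H(k))]   # (29b) with kappa factored out
    sol = root(eqs, x0=[-0.2, 0.6])   # kappa < 0 in the unsolvable phase
    k, w = sol.x
    return gamma * (w - k * lam), sol

r, sol = residual_r(alpha=1.0, lam=1.0)
print(f"alpha = 1, lambda = 1: r = {r:.4f}")
```

Sweeping α at fixed λ traces out the analytical curves of Fig. 6, with r → 0 as α → α_c.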
Similar to the previous section, we now try to characterize the solvable phase of (1) by using the dual picture (7). In the solvable phase it is not possible to find a vector y which fulfills all inequalities in (7) simultaneously. We may then ask for the best y, i.e. the one that violates the inequalities â y ≥ 0 the least.

FIG. 6. Comparison of analytical results and numerical simulations for the residual error r in the unsolvable phase. The standard deviations of the numerical results are smaller than the symbol size. Parameters: A = B = γ = 1, σ = 1, …, N = 1000, averaged over 10 realizations for each data point.

To formalize this idea let us define a cost function

v(y) = Σ_μ E(∆_μ),  ∆_μ := (1/√N) Σ_i a_{μi} y_i,  (30)

which assigns an 'energy' E to each violated constraint. Two plausible choices for such a cost function are compared in Fig. 7. The first one derives from the step function

E(∆_μ) = Θ(−∆_μ)  (31)

and simply counts the number of violated constraints. Its minimum over all choices of y hence corresponds to the minimal number of row vectors a_μ that have to be eliminated from â such that a solution y exists. Going back to the original problem, it therefore gives the minimal number of row vectors a_μ and corresponding variables x_μ by which the system (1) has to be reduced such that it does not possess a non-negative solution anymore. In view of Fig. 1 this number gives an estimate for the number of "crucial" row vectors a_μ that are indispensable for the cone to contain b.

The second cost function builds on the ramp function

E(∆_μ) = −Θ(−∆_μ) ∆_μ  (32)

and is sensitive not only to the violation of a constraint per se but also to the value of ∆_μ, i.e. to the strength of such a violation [19]. Referring again to Fig.
1, its minimal value characterizes the sensitivity of the system (1) to small variations in â and b, because it indicates whether b lies "well in the middle" of the non-negative cone of the a_μ or rather near to its boundary.

It is again possible to calculate the typical minimal values of both cost functions within the replica approach outlined above. To this end we introduce an auxiliary inverse temperature β and define the partition function

Z(â, b, β) = [∫ ∏_i dy_i δ(Σ_i y_i² − N) Θ(−(1/√N) Σ_i b_i y_i) exp{−β v(y)}] / [∫ ∏_i dy_i δ(Σ_i y_i² − N)].  (33)

It is instructive to compare this expression with the fractional volume, Eq. (9), which was used to examine the phase transition and the minimal residual norm in the unsolvable phase. There, only those y contributed to the integral which do not violate any constraint at all in â y ≥ 0, i.e., only those with v(y) = 0. The entropy (11) was hence similar to a microcanonical ground state entropy. Complementarily, Eq. (33) is like a canonical partition function to which all y contribute, also those violating the constraints, albeit suppressed by the Boltzmann factor exp{−β v(y)}. The role of the entropy in the microcanonical approach is now taken by the free energy

F(α, λ, β) := −lim_{N→∞} 1/(αNβ) ⟨log Z(â, b, β)⟩.  (34)

The minimal possible average cost per equation ⟨E_min⟩ corresponds to the ground state energy and is given as the low temperature limit of the free energy:

⟨E_min⟩ = ⟨v_min⟩/(αN) = lim_{β→∞} F(α, λ, β).  (35)

Its calculation is yet another variation of the one sketched in Section IV and proceeds along the lines of [20, 21].

FIG. 7. The two cost functions studied in this section: the step function defined in (31) (blue) and the ramp function (32) (orange).
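Anticipating the outcome of this calculation for the step cost function, the saddle-point equations (42), (43) derived below reduce ⟨E_min⟩ of Eq. (44) to a two-dimensional root search. The sketch below is our own implementation (not from the paper), written in the variables κ and s := √(2x); the partial Gaussian moments J₁, J₂ follow from integration by parts.

```python
import numpy as np
from scipy.special import erfc
from scipy.optimize import root

H = lambda k: 0.5 * erfc(k / np.sqrt(2.0))
phi = lambda k: np.exp(-k * k / 2.0) / np.sqrt(2.0 * np.pi)

def J1(k, u):
    """Partial Gaussian moment int_k^u Dt (t - k)."""
    return (phi(k) - phi(u)) - k * (H(k) - H(u))

def J2(k, u):
    """Partial Gaussian moment int_k^u Dt (t - k)^2."""
    return (1.0 + k * k) * (H(k) - H(u)) - k * phi(k) + (2.0 * k - u) * phi(u)

def e_min_step(alpha, lam):
    """Solve Eqs. (42), (43) below for kappa and s = sqrt(2x); Eq. (44) then
    gives the typical minimal fraction of violated constraints, H(kappa + s)."""
    def eqs(p):
        k, s = p
        return [alpha * J2(k, k + s) - (1.0 - (lam * k) ** 2),
                alpha * J1(k, k + s) - lam * lam * k]
    sol = root(eqs, x0=[0.5, 1.5])
    k, s = sol.x
    return H(k + s), sol

e_min, sol = e_min_step(alpha=6.0, lam=1.0)
print(f"alpha = 6, lambda = 1: <E_min> = {e_min:.4f}")
```

Well above α_c only a small fraction of the constraints has to be sacrificed, consistent with the left panel of Fig. 8.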
The same order parameters are defined as in (14), and under the assumption of replica symmetry we find

F(α, λ, β) = extr_{q,κ} { −log(1 − q)/(2αβ) − q/(2αβ(1 − q)) + λ²κ²/(2αβ(1 − q)) − G_E/β },  (36)

where

G_E = ∫ Dt log ∫ d∆ exp{ −βE(∆) − (∆/σ − κ + t√q)²/(2(1 − q)) }.  (37)

FIG. 8. The lines show the typical minimal energy ⟨E_min⟩ for the step cost function (left) and for the ramp cost function (right) as given by Eqs. (44) and (49), respectively. Parameters: A = B = γ = 1 and σ = 1, …, N = 300, averaged over 25 realizations. The error bars show the standard deviation.

The limit β → ∞ needed to project out the ground state is accompanied by q → 1 such that x := β(1 − q) remains of O(1). In this limit, therefore, the role of the saddle-point variable q is taken over by x. In this way G_E becomes

G_E = −β ∫ Dt [ E(∆⁰(t)) + (∆⁰(t)/σ − κ + t)²/(2x) ],  (38)

where

∆⁰(t) = argmin_∆ [ E(∆) + (∆/σ − κ + t)²/(2x) ].  (39)

For the step cost function (31) the minimum is realized by

∆⁰(t) = σ(κ − t) if t < κ,
∆⁰(t) = 0 if κ ≤ t ≤ κ + √(2x),
∆⁰(t) = σ(κ − t) if t > κ + √(2x).  (40)

The typical minimal cost per equation is then given by

⟨E_min⟩ = extr_{x,κ} { −1/(2αx) + λ²κ²/(2αx) + H(κ + √(2x)) + 1/(2x) ∫_κ^{κ+√(2x)} Dt (t − κ)² }.  (41)

This expression is similar to the one for the minimal fraction of misclassified patterns of a perceptron [20, 21]. Here, however, we have the additional term λ²κ²/(2αx) and the additional extremization in κ.

The saddle-point equations corresponding to (41) read

α ∫_κ^{κ+√(2x)} Dt (t − κ)² = 1 − λ²κ²,  (42)
α ∫_κ^{κ+√(2x)} Dt (t − κ) = λ²κ,  (43)

and (41) simplifies to

⟨E_min⟩ = H(κ + √(2x)).
(44)

Note that κ and x in this expression have to be determined from the numerical solution of the saddle-point equations (42) and (43).

Analytical results for ⟨E_min⟩ in the case of the step cost function as a function of α are shown in the left part of Figure 8. The overall behavior is as expected: For α below the critical value α_c the system is in the unsolvable phase, all constraints in (7) can be fulfilled, and ⟨E_min⟩ is zero. For α > α_c some of the constraints can no longer be fulfilled and ⟨E_min⟩ starts to grow monotonically. Since the determination of ⟨E_min⟩ by simulations is computationally very demanding and not indispensable for this work we have refrained from doing so.

For the ramp cost function (32) the calculation is similar. Eq. (40) is modified to

∆⁰(t) = σ(κ − t) if t < κ,
∆⁰(t) = 0 if κ ≤ t ≤ κ + σx,
∆⁰(t) = σ(σx + κ − t) if t > κ + σx,

and G_E acquires the form

G_E/β = −1/(2x) ∫_κ^{κ+σx} Dt (t − κ)² + σ(κ + σx/2) H(κ + σx) − σ/√(2π) exp{−(κ + σx)²/2}.  (45)

The typical minimal cost per equation becomes

⟨E_min⟩ = extr_{x,κ} { −1/(2αx) + λ²κ²/(2αx) − σ(κ + σx/2) H(κ + σx) + 1/(2x) ∫_κ^{κ+σx} Dt (t − κ)² + σ/√(2π) exp{−(κ + σx)²/2} },  (46)
FIG. 9. Distribution of violation strengths ∆ for the step function (left) and the ramp function (right). The height of the line at ∆ = 0 gives the prefactor of the respective δ-function. Parameters: A = B = γ = 1, σ = 2, α = 3.

complemented by the saddle-point equations

1 − λ²κ² = ασ²x² H(κ + σx) + α ∫_κ^{κ+σx} Dt (t − κ)²,  (47)
λ²κ = ασx H(κ + σx) + α ∫_κ^{κ+σx} Dt (t − κ).  (48)

The final result for the typical minimal cost per equation assumes the form

⟨E_min⟩ = −σ(κ + σx) H(κ + σx) + σ/√(2π) e^{−(κ+σx)²/2},  (49)

where again the values of the saddle-point variables κ and x as determined from the numerical solution of (47) and (48) have to be plugged into this expression.

In the right part of Figure 8 analytical and numerical results for ⟨E_min⟩ for the ramp cost function are compared. In this case it is straightforward to get numerical results, since the minimal value of v(y) can be determined using standard tools such as the scipy.optimize.minimize function in Python. Very good agreement between analytical and numerical results is found.

The results on the minimal cost per equation shown in Fig. 8 can be specified in more detail by determining the distribution p_min(∆) of the "violation strengths" ∆_μ defined in (30) in the limit β → ∞. In [21] it is shown that this distribution is given by

p_min(∆) = ∫ Dt δ(∆ − ∆⁰(t)).  (50)

In this way we find for the step cost function

p_min(∆) = δ(∆) ∫_κ^{κ+√(2x)} Dt + [Θ(∆) + Θ(−∆ − σ√(2x))] 1/(√(2π)σ) e^{−(κ−∆/σ)²/2}  (51)
and for the ramp function

p_min(∆) = δ(∆) ∫_κ^{κ+σx} Dt + Θ(∆) 1/(√(2π)σ) e^{−(κ−∆/σ)²/2} + Θ(−∆) 1/(√(2π)σ) e^{−(−∆/σ+σx+κ)²/2}.  (52)

Figure 9 compares these two distributions in the interesting regime α > α_c. The left part shows p_min for the step function. The δ-peak at ∆ = 0 is caused by constraints that are fulfilled as equalities, whereas the Gaussian part at ∆ > 0 stems from constraints fulfilled with a finite margin; violated constraints only occur with strengths ∆ < −σ√(2x), so that there is a gap in the negative part of the distribution. This gap is due to the fact that the step cost function does not differentiate between heavily violated constraints and those only slightly violated. It therefore prefers few "gross mistakes" to many tiny ones. In this way it selects the row vectors a_μ that are "most crucial" for the cone to contain the vector b, as discussed at the beginning of this section. In contrast, the distribution p_min(∆) for the ramp cost function is gapless, see the right part of Figure 9. Since the ramp function punishes strong violations of constraints more severely than slight violations, its optimal value is realized with many but small violations. It therefore characterizes some average distance of b from the boundary of the cone. No gap in the distribution is therefore to be expected. Note that the typical minimal cost per equation is recovered from the distributions p_min(∆) by

⟨E_min⟩ = ∫ d∆ E(∆) p_min(∆).  (53)

VII. CONCLUSION
Large systems of random linear equations exhibit intriguing properties if their variables are constrained to be non-negative. These systems show a sharp transition from a phase where a solution can be found with probability one to a phase where typically no such solution exists. The transition line depends only on the statistical properties of the systems and is hence self-averaging.

In the present paper we showed that the mapping used in [14] to determine the critical line is also the clue to characterizing the two phases away from criticality. To this end we introduced suitable quantities describing the phases and determined their typical values as functions of the statistical parameters of the random ensembles and the ratio between the number of variables and the number of equations. The unsolvable phase can be characterized by the typical residual norm of the system of equations. It measures how 'far away' the system is from being solvable. In the solvable phase two measures for the robustness of solutions were introduced and analyzed.

Our analytical investigations became possible by first mapping the problem onto a dual one using Farkas' lemma, and then analyzing this dual problem with the help of the replica method. Technically, our results depend on the correctness of the assumption of replica symmetry. In the unsolvable phase, α ≤ α_c, the solution space of the dual problem is connected and it is reasonable to expect replica symmetry to hold. On the other hand, the solvable phase is characterized by a disconnected solution space of the dual problem and replica symmetry breaking may come into play [20–22].

ACKNOWLEDGMENTS
We would like to thank Mattes Heerwagen and Sebastian Rosmej for fruitful discussions and Imre Kondor for interesting correspondence. Financial support from the German Science Foundation under project EN 278/10-1 is gratefully acknowledged.

[1] S. F. Edwards and P. W. Anderson, Journal of Physics F: Metal Physics 5, 965 (1975).
[2] D. Sherrington and S. Kirkpatrick, Physical Review Letters 35, 1792 (1975).
[3] R. Monasson, R. Zecchina, S. Kirkpatrick, B. Selman, and L. Troyansky, Nature 400, 133 (1999).
[4] M. Mezard and A. Montanari, Information, Physics, and Computation (Oxford University Press, 2009).
[5] A. Engel and C. Van den Broeck,
Statistical Mechanics of Learning (Cambridge University Press, 2001).
[6] X. Feng and Z. Zhang, Applied Mathematics and Computation, 689 (2007).
[7] R. M. May, Nature 238, 413 (1972).
[8] M. Eigen and P. Schuster, Naturwissenschaften, 7 (1978).
[9] M. Tikhonov and R. Monasson, Physical Review Letters 118, 048103 (2017).
[10] M. Polettini and M. Esposito, The Journal of Chemical Physics, 024117 (2014).
[11] J. Schnakenberg, Journal of Theoretical Biology, 389 (1979).
[12] I. Kondor, G. Papp, and F. Caccioli, Journal of Statistical Mechanics: Theory and Experiment, 123402 (2017).
[13] J. Berg and A. Engel, Physical Review Letters 81, 4999 (1998).
[14] S. Landmann and A. Engel, Physica A: Statistical Mechanics and its Applications, 122544 (in press).
[15] J. Farkas, Journal für die reine und angewandte Mathematik, 1 (1902).
[16] M. Mézard, G. Parisi, and M. Virasoro, Spin Glass Theory and Beyond (World Scientific Publishing Company, 1987).
[17] We used the non-negative least squares solver nnls from the scipy.optimize package in Python.
[18] A. Engel and C. Van den Broeck, Physical Review Letters 71, 1772 (1993).
[19] Both functions have an equivalent in the theory of neural networks: the step function corresponds to the Gardner-Derrida cost function [20], the ramp function to the perceptron cost function [21].
[20] E. Gardner and B. Derrida, Journal of Physics A: Mathematical and General 21, 271 (1988).
[21] M. Griniasty and H. Gutfreund, Journal of Physics A: Mathematical and General 24, 715 (1991).
[22] M. Bouten and B. Derrida, Journal of Physics A: Mathematical, Nuclear and General 27