L0 regularization-based compressed sensing with quantum-classical hybrid approach
Toru Aonishi, Kazushi Mimura, Masato Okada, Yoshihisa Yamamoto
Toru Aonishi ∗ School of Computing, Tokyo Institute of Technology, Yokohama, Kanagawa, Japan
Kazushi Mimura
Graduate School of Information Sciences, Hiroshima City University, Hiroshima, Japan and School of Computing, Tokyo Institute of Technology, Yokohama, Kanagawa, Japan
Masato Okada
Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan and School of Computing, Tokyo Institute of Technology, Yokohama, Kanagawa, Japan
Yoshihisa Yamamoto
Physics and Informatics Laboratories, NTT Research Inc., Palo Alto, CA, USA and E. L. Ginzton Laboratory, Stanford University, Stanford, CA, USA
(Dated: February 24, 2021)

Abstract

We propose a quantum-classical hybrid system consisting of coherent Ising machines (CIMs) and classical digital processors (CDPs), which performs alternating minimization for L0 regularization-based compressed sensing (CS). A truncated Wigner stochastic differential equation (W-SDE) is obtained from the master equation for the density operator of the network of degenerate optical parametric oscillators. Macroscopic equations in statistical mechanics are derived from the W-SDE to theoretically evaluate the performance of CIM L0 regularization-based CS. We show that the system performance in principle approaches the theoretical limit of L0 minimization-based CS derived by Nakanishi et al. (2016) and possibly exceeds that of the least absolute shrinkage and selection operator (LASSO). We also conduct numerical experiments confirming that in practical situations this hybrid system can exceed LASSO's estimation accuracy.

∗ Electronic address: [email protected]

I. INTRODUCTION

The least absolute shrinkage and selection operator (LASSO) [1] is a very efficient approach to solving various sparse signal reconstruction problems in exploration geophysics [2–5], magnetic resonance imaging [6–9], black hole observation [10] and materials informatics [11, 12]. It is formulated as

\hat{x} = \arg\min_{x \in \mathbb{R}^N} \left( \frac{1}{2} \| y - A x \|_2^2 + \lambda \| x \|_1 \right),   (1)

where x is an N-dimensional source signal, y is an M-dimensional observation signal, A is an M-by-N observation matrix, and λ is a regularization parameter. L1 regularization-based compressed sensing (CS), including LASSO, can be formulated as a convex optimization problem, for which many efficient heuristic algorithms are available [13–18]. Equation (1) is asymptotically equal to L1 minimization-based CS (minimize \| x \|_1 s.t.
y = A x) in the limit λ → +0, and thus in this limit LASSO can reconstruct a sparse source signal x with infinitesimal errors as long as the ratio of the number of non-zero elements of x to N, i.e. the sparseness a, is below a critical value a_c reported in [19], where a_c is less than the measurement compression ratio α defined as α = M/N.

On the other hand, L0 regularization-based CS can be formulated with the L0 norm instead of the L1 norm [20]:

\hat{x} = \arg\min_{x \in \mathbb{R}^N} \left( \frac{1}{2} \| y - A x \|_2^2 + \lambda \| x \|_0 \right).   (2)

It has been suggested that L0 regularization-based CS might outperform L1 regularization-based CS. This is because L1 regularization imposes a shrinkage on variables over a threshold (soft-thresholding), whereas L0 regularization does not impose such a shrinkage (hard-thresholding) [20]. Furthermore, Eq. (2) is asymptotically equal to L0 minimization-based CS (minimize \| x \|_0 s.t. y = A x) in the limit λ → +0, and thus in this limit L0 regularization-based CS can be expected to reconstruct x with infinitesimal errors as long as a is below a critical value a_c = α [21]. Note that it is impossible for any system to go beyond this performance limit, because the number of non-zero elements of x is the effective rank of a system of linear equations, and thus the system does not have a single unique solution when a > α.

L0 regularization-based CS defined in Eq. (2) can be equivalently reformulated as a two-fold optimization problem [20, 21]:

(\hat{r}, \hat{\sigma}) = \arg\min_{\sigma \in \{0,1\}^N} \arg\min_{r \in \mathbb{R}^N} \left( \frac{1}{2} \| y - A (\sigma \circ r) \|_2^2 + \lambda \| \sigma \|_0 \right).   (3)

Here, the element r_i in r represents the real-number value of the i-th element in the N-dimensional source signal. The vector σ is called a support vector, which represents the places of non-zero elements in the N-dimensional source signal. The element σ_i in σ takes either 0 or 1 to indicate whether the i-th element in the source signal is zero or non-zero. The symbol ∘ denotes the Hadamard product.
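As a concrete illustration, the two-fold objective of Eq. (3) can be evaluated directly. The following is a minimal numpy sketch; the function name, the toy problem sizes and values are our own, not from the paper:

```python
import numpy as np

def l0_two_fold_cost(y, A, r, sigma, lam):
    """Cost of Eq. (3): 0.5*||y - A(sigma o r)||^2 + lam*||sigma||_0.

    sigma is a binary support vector; `o` is the Hadamard product.
    Illustrative sketch only.
    """
    residual = y - A @ (sigma * r)
    return 0.5 * residual @ residual + lam * np.count_nonzero(sigma)

# Toy instance: N = 6 source dimensions, M = 4 observations.
rng = np.random.default_rng(0)
N, M, lam = 6, 4, 0.1
A = rng.normal(size=(M, N)) / np.sqrt(M)
x = np.array([1.5, 0.0, -0.7, 0.0, 0.0, 2.0])   # sparse source signal
y = A @ x

sigma_true = (x != 0).astype(float)
# With the true support and true values the residual vanishes,
# so only the L0 penalty (3 non-zero elements) remains.
assert np.isclose(l0_two_fold_cost(y, A, x, sigma_true, lam), lam * 3)
```

Any wrong support either pays a larger penalty or leaves a non-zero residual, which is exactly the combinatorial trade-off the CIM is asked to resolve below.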
From the elementwise representation of Eq. (3), the Hamiltonian (or cost function) of L0 regularization-based CS is given as

H = \sum_{i>j}^{N} \sum_{\mu=1}^{M} A_{\mu i} A_{\mu j} r_i r_j \sigma_i \sigma_j - \sum_{i=1}^{N} \sum_{\mu=1}^{M} y_\mu A_{\mu i} r_i \sigma_i + \lambda \sum_{i=1}^{N} | \sigma_i |,   (4)

where A_{\mu i} is an element of the M-by-N observation matrix A, and y_\mu is an element of the M-dimensional observation signal.

Under the assumption that the number of 1s in the support vector σ is equal to or less than M, the set of simultaneous linear equations obtained from Eq. (3) with respect to r gives the solution for the non-zero elements in the source signal if the support vector σ is given. On the other hand, minimization of H with respect to σ is identical to minimizing an Ising Hamiltonian if r is given. Because the mutual interaction J_{ij} = -\sum_{\mu=1}^{M} A_{\mu i} A_{\mu j} r_i r_j induces frustration among the spins σ_i, the Hamiltonian might have numerous local minima. Thus, L0 regularization-based CS cannot be formulated as a convex optimization [22–24].

To overcome the difficulty of estimating the support vector σ, we focus on the coherent Ising machine (CIM), which is a scalable and fast Ising solver consisting of a network of degenerate optical parametric oscillators (OPOs) [25–30]. It has been reported [31] that a measurement-feedback (MFB) CIM [27, 28] experimentally outperformed a quantum annealer [32] on two problem sets: one is the fully connected Sherrington-Kirkpatrick model [33] and the other is dense-graph MAX-CUT. In contrast to the exponential computation time proportional to \exp(O(N)) for the quantum annealer, the CIM has an exponential computation time proportional to \exp(O(\sqrt{N})), where N is the problem size [31].

In this paper, we propose a quantum-classical hybrid system composed of a CIM and classical digital processors (CDPs) (Fig. 1).
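The Ising structure of Eq. (4) is easy to make concrete: for fixed r, the coupling matrix J_{ij} = -\sum_\mu A_{\mu i} A_{\mu j} r_i r_j and the Hamiltonian can be built from the Gram matrix A^T A. A hedged numpy sketch (toy sizes and names are our own), cross-checked against an explicit double loop:

```python
import numpy as np

def ising_hamiltonian(y, A, r, sigma, lam):
    """Eq. (4): H = sum_{i>j} sum_mu A_mu_i A_mu_j r_i r_j s_i s_j
                    - sum_i sum_mu y_mu A_mu_i r_i s_i + lam * sum_i |s_i|.
    Vectorized evaluation; an illustrative sketch."""
    G = A.T @ A                       # Gram matrix: (A^T A)_ij = sum_mu A_mu_i A_mu_j
    v = r * sigma
    quad = 0.5 * (v @ G @ v - v * np.diag(G) @ v)   # keeps only i > j pairs
    zeeman = (A.T @ y) @ v
    return quad - zeeman + lam * np.abs(sigma).sum()

rng = np.random.default_rng(1)
M, N = 5, 4
A = rng.normal(size=(M, N)); r = rng.normal(size=N); y = rng.normal(size=M)

# Mutual Ising coupling between spins sigma_i and sigma_j with r fixed:
J = -(A.T @ A) * np.outer(r, r)       # J_ij = -sum_mu A_mu_i A_mu_j r_i r_j

# Cross-check the vectorized H against an explicit double loop.
sigma = np.array([1.0, 0.0, 1.0, 1.0])
H_loop = (sum(A[:, i] @ A[:, j] * r[i] * r[j] * sigma[i] * sigma[j]
              for i in range(N) for j in range(i))
          - sum(y @ A[:, i] * r[i] * sigma[i] for i in range(N))
          + 0.1 * np.abs(sigma).sum())
assert np.isclose(ising_hamiltonian(y, A, r, sigma, lam=0.1), H_loop)
```

The dense, sign-mixed J produced this way is what makes the σ subproblem frustrated and non-convex.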
This system alternately performs two minimization processes: (i) the CIM optimizes σ to minimize H with given r, and (ii) the CDP optimizes r to minimize H with given σ. We derive a truncated Wigner stochastic differential equation (W-SDE) from the master equation for the density operator of a network consisting of OPOs. Then we develop a statistical mechanics method based on self-consistent signal-to-noise analysis (SCSNA) [34–36] and derive macroscopic equations for the whole system [37–39]. We show that the performance of the proposed system approaches the theoretical limit of L0 minimization-based CS [21] and possibly exceeds that of LASSO. Next, we conduct numerical experiments showing that in practical situations this system indeed exceeds LASSO's estimation accuracy.

II. RESULTS

A. Roles of CIM and CDP
The CIM-CDP hybrid system (Fig. 1) executes the L0 regularization-based CS described in Eqs. (3) and (4). This system achieves the optimization by alternately performing the following two minimization processes. The CIM optimizes σ to minimize H with given r, and then forwards σ to the CDP. The CDP optimizes r to minimize H with given σ, and then forwards r to the CIM. Algorithm 1 is an outline of the two alternating minimization processes. In this algorithm, to make the basin of attraction wider, we heuristically introduce a linear threshold reduction whereby the threshold η is linearly lowered from η_init to η_end as the alternating minimization proceeds.

The CIM estimates the support vector σ, i.e. the places of the non-zero elements in the source signal. To optimize σ to minimize H with given r, we use a measurement-feedback circuit to control the intensity modulator (IM) and phase modulator (PM), which produce the optical injection field to the target (i-th) OPO pulse:

f_i^{\mathrm{sig}} = K ( F_\chi(h_i) - \eta ), \quad i = 1, \cdots, N,   (5)

F_\chi(h) = h \ (\chi = +), \qquad F_\chi(h) = |h| \ (\chi = \pm).

Here, K is the gain of the feedback circuit and η is the threshold; η is related to the regularization parameter λ in Eqs. (3) and (4) by η = \sqrt{\lambda}. h_i is the local field explained below. We use two different functions F_+(h) and F_\pm(h) for the local field in accordance with the source signal. F_+(h) is the identity function: it is used for a non-negative source signal. F_\pm(h) is the absolute value function: it is used for a signed source signal.

The local field for estimating the support vector on the CIM is set as

h_i = - \sum_{j=1 (\neq i)}^{N} \sum_{\mu=1}^{M} A_{\mu i} A_{\mu j} r_j H(X_j) + \sum_{\mu=1}^{M} A_{\mu i} y_\mu,   (6)

where X_j is the in-phase amplitude (generalized coordinate) of the j-th OPO pulse measured by a homodyne detector. In the local field, H(X) is the Heaviside step function, taking 0 for X ≤ 0 and 1 for X > 0, and r_j is a solution given by the CDP. During the support estimation on the CIM, all r_j are fixed.
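The feedback computation of Eqs. (5) and (6) can be simulated classically. A minimal sketch (function name, toy sizes and parameter values are our own assumptions, not from the paper):

```python
import numpy as np

def injection_field(A, y, r, X, K, eta, signed=False):
    """Optical injection field of Eq. (5), f_i = K*(F_chi(h_i) - eta),
    with the local field h_i of Eq. (6). X holds the measured in-phase
    amplitudes; r stays fixed during support estimation. Illustrative sketch."""
    support = np.heaviside(X, 0.0)            # H(X_j): 1 for X_j > 0, else 0
    v = r * support
    G = A.T @ A
    # h_i = -sum_{j != i} (A^T A)_ij r_j H(X_j) + (A^T y)_i
    h = -(G @ v - np.diag(G) * v) + A.T @ y   # subtract the j = i term
    F = np.abs(h) if signed else h            # F_± (signed) or F_+ (non-negative)
    return K * (F - eta)

rng = np.random.default_rng(4)
M, N = 6, 4
A = rng.normal(size=(M, N))
y = rng.normal(size=M)
r = rng.normal(size=N)
X = rng.normal(size=N)
f = injection_field(A, y, r, X, K=0.5, eta=0.2)
assert f.shape == (N,)
```

With K = 1 and η = 0 (identity case) the function simply returns the local field itself, which makes the j ≠ i exclusion easy to verify against a direct loop.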
Thus, the first term is the mutual interaction term, and the second term corresponds to the Zeeman term.

The CDP obtains a solution of the linear simultaneous equations from the minimization condition of H with respect to r. Without loss of generality, the elementwise representation of the simultaneous equations with respect to the unknown values of r_i can be rewritten as

r_i \sum_{\mu=1}^{M} (A_{\mu i})^2 = H(X_i) h_i, \quad i = 1, \cdots, N,   (7)

h_i = - \sum_{j=1 (\neq i)}^{N} \sum_{\mu=1}^{M} A_{\mu i} A_{\mu j} H(X_j) r_j + \sum_{\mu=1}^{M} A_{\mu i} y_\mu.   (8)

To eliminate indefiniteness, r_i is set to zero when H(X_i) is zero. Here, h_i in Eq. (8) is the same as the local field (Eq. (6)) for the support estimation on the CIM. X_j is the solution given by the CIM. During the signal estimation on the CDP, all H(X_j) are fixed. The solution of the simultaneous equations (Eq. (7)) is

r = \left( \mathrm{diag}[A^T A] + S A^T A S - \mathrm{diag}[S A^T A S] \right)^{-1} S A^T y,

S = \mathrm{diag}( H(X_1), H(X_2), \cdots, H(X_N) ).

B. Macroscopic equation for quantum-classical hybrid system
We will solve the W-SDE (18) derived in Subsection IV C and the simultaneous equations (7) with statistical mechanics under the preconditions described in Subsection IV B. As described in Subsections II A and IV C, the W-SDE (18) and the simultaneous equations (7) share the same local field (Eqs. (6) and (8)), which can be rewritten by substituting the observation model (15) as

h_i = - \frac{1}{M} \sum_{j=1 (\neq i)}^{N} \sum_{\mu=1}^{M} A_{\mu i} A_{\mu j} r_j H(c_j) + \frac{1}{M} \sum_{j=1}^{N} \sum_{\mu=1}^{M} A_{\mu i} A_{\mu j} \xi_j x_j + \frac{1}{\sqrt{M}} \sum_{\mu=1}^{M} A_{\mu i} n_\mu.   (9)

Here [x_1, \cdots, x_N]^T is the N-dimensional source signal and [\xi_1, \cdots, \xi_N]^T is the support vector. [n_1, \cdots, n_M]^T is M-dimensional observation noise satisfying \langle n_\mu \rangle = 0 and \langle n_\mu n_\nu \rangle = \beta^2 \delta_{\mu\nu}, where \beta^2 is the variance of the observation noise.

Thus, the CIM and CDP can be unified into a single mean-field system in the steady state. Since the W-SDE for the i-th OPO only depends on the self-state and the local field h_i, we can introduce a formal transfer function X from h_i to H(c_i) [35, 36], H(c_i) = X(h_i). Substituting the formal transfer function X into Eq. (7), and because \langle (A_{\mu i})^2 \rangle = 1, the formal transfer function G from h_i to r_i is given by r_i = X(h_i) h_i = G(h_i). Therefore, the local field can be defined in a self-consistent manner through the formal transfer function G as follows:

h_i = - \frac{1}{M} \sum_{j=1 (\neq i)}^{N} \sum_{\mu=1}^{M} A_{\mu i} A_{\mu j} G(h_j) + \frac{1}{M} \sum_{j=1}^{N} \sum_{\mu=1}^{M} A_{\mu i} A_{\mu j} \xi_j x_j + \frac{1}{\sqrt{M}} \sum_{\mu=1}^{M} A_{\mu i} n_\mu.   (10)

Following the recipe of the SCSNA (see Subsection IV E), the local field h_i is separated into a pure local field \tilde{h}_i independent of the self-state H(c_i) r_i and an effective self-coupling term \Gamma H(c_i) r_i (called the Onsager reaction term (ORT)) in the thermodynamic limit [35, 36]:

h_i = \tilde{h}_i + \Gamma H(c_i) r_i.   (11)

\tilde{h}_i and \Gamma are determined in a self-consistent manner.
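The closed matrix form of the CDP step (the solution of the simultaneous equations (7) given in Subsection II A) can be checked numerically: off the support it returns zero, and on the support it coincides with ordinary least squares restricted to the supported columns of A. A sketch under our own toy dimensions and a random Gaussian A (an illustrative assumption):

```python
import numpy as np

# CDP step: r = (diag[A^T A] + S A^T A S - diag[S A^T A S])^{-1} S A^T y,
# where S = diag(H(X_1), ..., H(X_N)) encodes the support found by the CIM.
rng = np.random.default_rng(2)
M, N = 8, 5
A = rng.normal(size=(M, N))
y = rng.normal(size=M)
support = np.array([1.0, 0.0, 1.0, 1.0, 0.0])   # H(X_j), fixed during this step

S = np.diag(support)
G = A.T @ A
B = np.diag(np.diag(G)) + S @ G @ S - np.diag(np.diag(S @ G @ S))
r = np.linalg.solve(B, S @ A.T @ y)

# Off-support entries are zero; on-support entries coincide with ordinary
# least squares restricted to the supported columns of A.
idx = support.astype(bool)
r_ls, *_ = np.linalg.lstsq(A[:, idx], y, rcond=None)
assert np.allclose(r[~idx], 0.0)
assert np.allclose(r[idx], r_ls)
```

The diag[...] bookkeeping makes the unsupported rows decouple (forcing r_i = 0 there), which is exactly the "eliminate indefiniteness" rule stated after Eq. (7).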
The transfer function X redefined on \tilde{h}_i can be safely replaced with its average value \langle H(c_i) \rangle (see Subsection IV D) by using the self-averaging property of such a mean-field system. Finally, the following macroscopic equations are obtained using the self-consistent local field:

R = \frac{1}{a} \int_{-\infty}^{+\infty} Dz \left\langle x \xi h_p \int_{0}^{+\infty} dc \int_{-\infty}^{+\infty} ds\, f(c, s | h_p) \right\rangle_{x,\xi},

Q = \frac{1}{a} \int_{-\infty}^{+\infty} Dz \left\langle h_p^2 \int_{0}^{+\infty} dc \int_{-\infty}^{+\infty} ds\, f(c, s | h_p) \right\rangle_{x,\xi},

U \sqrt{\beta^2 + a \alpha ( Q + \langle x^2 \rangle_x - 2R )} = \frac{1}{a} \int_{-\infty}^{+\infty} Dz\, z \left\langle h_p \int_{0}^{+\infty} dc \int_{-\infty}^{+\infty} ds\, f(c, s | h_p) \right\rangle_{x,\xi}.   (12)

Here, R, Q and U are macroscopic parameters called the overlap, the mean square magnetization and the susceptibility, respectively. \langle \cdot \rangle_{x,\xi} denotes the average with respect to x and ξ, and

f(c, s | h_y) \propto \exp\left[ A_s \frac{ c \tilde{K} ( F_\chi(h_y) - \eta ) - V(c, s) }{ G_c(z, x\xi) + G_s(z, x\xi) + 0.5 } \right], \quad (y = m, p),

h_p = x \xi + \sqrt{\beta^2 + a \alpha ( Q + \langle x^2 \rangle_x - 2R )}\, z, \quad (c > 0),

h_m = \frac{1}{1 + a \alpha U} h_p, \quad (c \leq 0),

V(c, s) = \frac{1}{2}(1 - p) c^2 + \frac{1}{2}(1 + p) s^2 + \frac{1}{2} c^2 s^2 + \frac{1}{4} c^4 + \frac{1}{4} s^4.

G_c(z, x\xi) and G_s(z, x\xi) can be determined self-consistently from the following equations:

G_c(z, x\xi) = \int_{-\infty}^{0} dc \int_{-\infty}^{+\infty} ds\, c^2 f(c, s | h_m) + \int_{0}^{+\infty} dc \int_{-\infty}^{+\infty} ds\, c^2 f(c, s | h_p),

G_s(z, x\xi) = \int_{-\infty}^{0} dc \int_{-\infty}^{+\infty} ds\, s^2 f(c, s | h_m) + \int_{0}^{+\infty} dc \int_{-\infty}^{+\infty} ds\, s^2 f(c, s | h_p).

The saturation parameter A_s (defined in Subsection IV C) diverges in the limit of infinite amplitude of the injected pump field, \epsilon \to +\infty. In the limit A_s \to +\infty, we obtain the following simplified macroscopic equations:

R = \frac{1}{a} \int_{-\infty}^{+\infty} Dz \left\langle x \xi h_p \tilde{X}(h_p, h_m) \right\rangle_{x,\xi},

Q = \frac{1}{a} \int_{-\infty}^{+\infty} Dz \left\langle h_p^2 \tilde{X}(h_p, h_m) \right\rangle_{x,\xi},

U \sqrt{\beta^2 + a \alpha ( Q + \langle x^2 \rangle_x - 2R )} = \frac{1}{a} \int_{-\infty}^{+\infty} Dz\, z \left\langle h_p \tilde{X}(h_p, h_m) \right\rangle_{x,\xi}.   (13)

Here, \tilde{X}(h_p, h_m) is an effective output function obtained from the Maxwell rule [40], which is given by

\tilde{X}(h_p, h_m) = H( F_\chi(h_p) + F_\chi(h_m) - \eta ).
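The Gaussian averages ∫Dz appearing in Eqs. (12) and (13) can be evaluated numerically with Gauss-Hermite quadrature. The following sketch (all parameter values are our own, purely illustrative) evaluates one such average of the effective output X̃ for the non-negative case, F_+(h) = h:

```python
import numpy as np

# Gauss-Hermite rule adapted to integrals of the form \int Dz g(z),
# with Dz = exp(-z^2/2)/sqrt(2*pi) dz: rescale nodes by sqrt(2),
# weights by 1/sqrt(pi).
nodes, weights = np.polynomial.hermite.hermgauss(80)
z = np.sqrt(2.0) * nodes
w = weights / np.sqrt(np.pi)

# Sanity checks: \int Dz 1 = 1 and \int Dz z^2 = 1.
assert np.isclose(w.sum(), 1.0)
assert np.isclose(w @ z**2, 1.0)

# Example: average of X~ = H(F(h_p) + F(h_m) - eta) with F the identity,
# at illustrative values of the threshold, noise level, signal x*xi and
# the combination a*alpha*U entering h_m.
eta, noise_std, xxi, aalphaU = 0.1, 0.3, 1.0, 0.5
h_p = xxi + noise_std * z
h_m = h_p / (1.0 + aalphaU)
X_eff = np.heaviside(h_p + h_m - eta, 0.5)
avg = w @ X_eff            # \int Dz X~(h_p, h_m) at fixed x*xi
assert 0.0 < avg <= 1.0
```

Solving Eq. (13) then amounts to iterating such quadratures over R, Q and U to a fixed point; the sketch above only shows the inner integral.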
C. Accuracy of macroscopic equations and comparison with LASSO when β = 0

First, to confirm the accuracy of the macroscopic equations, we compared solutions to the macroscopic equations with solutions given by Algorithm 1. Figures 2a and 2b show the root-mean-square errors (RMSEs) (defined in Subsection IV A) of the solutions to the macroscopic equation with A_s = 250 (Eq. (12)) and those in the limit A_s → ∞ (Eq. (13)) for various values of the threshold η and compression rate α (red and green solid lines). The figures also indicate the RMSEs of solutions obtained using Algorithm 1 with A_s = 250 and A_s = 10^7 (circles with error bars). Note that A_s = 10^7 is on the same order as A_s in real experimental CIMs. The results in Fig. 2 are for the case when there is no observation noise (i.e. β = 0) and the source signals are from a half-Gaussian (+) or Gaussian (±) distribution. To confirm that Algorithm 1 finds solutions corresponding to the near-zero-RMSE states obtained by the macroscopic equations, r was initialized as the true signal value, i.e. x ∘ ξ, in the alternating minimization process described above. However, even in this situation, the c-amplitude was always initialized as c = 0 in the initial stage of the support estimation in the alternating minimization process.

In the case of the half-Gaussian (+), two macroscopic states with non-zero RMSE (red solid lines) and near-zero RMSE (green solid lines) coexist, as in a CIM-implemented CDMA multiuser detector [39, 41]. On the other hand, in the case of the Gaussian (±), we only found a single macroscopic state with near-zero RMSE (red solid lines). Compared with the simulation results in Fig. 2b, the theoretical results obtained with the macroscopic equation (13) were in good agreement with the results of Algorithm 1 with A_s = 10^7. In both the half-Gaussian case (+) and Gaussian case (±), the near-zero-RMSE states of the macroscopic equation (13) (red solid lines in Fig.
2b) matched those of Algorithm 1 with A_s = 10^7 (circles with error bars in Fig. 2b), and the phase transition points given by the macroscopic equation (13) coincided with those of Algorithm 1. Under these conditions, as η was lowered to 0.01, the RMSE decreased monotonically and the phase transition point a_c from the near-zero-RMSE state grew monotonically. On the other hand, the theoretical results obtained from the macroscopic equation (12) with A_s = 250 (red solid lines on the left of Fig. 2a) were in good agreement with the results of Algorithm 1 with A_s = 250 (circles with error bars on the left of Fig. 2a) in the half-Gaussian case (+), whereas the phase transition points given by the macroscopic equation (12) became lower than those of Algorithm 1 as η became smaller in the Gaussian case (±), as shown on the right of Fig. 2a.

Furthermore, to compare the abilities of CIM L0-regularization-based CS and LASSO, we computed the RMSE profiles of LASSO using the macroscopic equation (37) with the same threshold value as CIM L0-regularization-based CS; these profiles are superimposed upon Fig. 2 (blue solid lines). The RMSEs of CIM L0-regularization-based CS in the limit A_s → ∞ (red solid lines in Fig. 2b) were lower than those of LASSO (blue solid lines) at the same compression rate α and sparseness a, and the first-order phase transition points of CIM L0-regularization-based CS were higher than those of LASSO. On the other hand, the RMSEs of CIM L0-regularization-based CS with A_s = 250 (red solid lines and circles with error bars in Fig. 2a) were lower than those of LASSO (blue solid lines) when η = 0.05, but the RMSEs of CIM L0-regularization-based CS became higher than those of LASSO when η = 0.01. Similar results were obtained for source signals from the Gamma (+) and bilateral Gamma (±) distributions (see Supplementary Fig. 1).

D. Phase diagrams of CIM L0-regularization-based CS and LASSO when β = 0

We drew phase diagrams of CIM L0-regularization-based CS for various values of the threshold η when there was no observation noise (i.e. β = 0). Figure 3a and Supplementary Fig. 2 show first-order phase transition lines from the near-zero-RMSE state (red lines) in the half-Gaussian case (+) and Gaussian case (±). The phase transition lines in Supplementary Fig. 2 are for A_s = 250, while the ones in Fig. 3a are for the limit A_s → ∞.

As demonstrated in Fig. 3a, in the limit A_s → ∞, the phase transition lines from the near-zero-RMSE state in CIM L0-regularization-based CS become asymptotic to the black solid line a = α as η decreases, while the RMSEs of the near-zero-RMSE state in CIM L0-regularization-based CS decrease to zero (the red lines in Fig. 2b). The black solid line a = α in Fig. 3 is a critical line indicating the boundary of whether or not L0-minimization-based CS can perfectly reconstruct a source signal with no error, for both the non-negative case and the signed case [21]. Thus, the near-zero-RMSE solutions of CIM L0-regularization-based CS are asymptotic to the perfect reconstruction solution of L0-minimization-based CS as η decreases.

To compare the properties of CIM L0-regularization-based CS with those of LASSO, Fig. 3b shows the phase diagrams of LASSO; the blue lines are the first-order phase transition lines from the near-zero-RMSE state for various η. As η decreases, the phase transition lines from the near-zero-RMSE state in LASSO for the half-Gaussian (+) and Gaussian (±) become asymptotic to the two black dotted lines, while the RMSEs of the near-zero-RMSE state in LASSO decrease to zero (the blue lines in Fig. 2). The black dotted lines in Fig.
3 are critical lines indicating the boundary of whether or not L1-minimization-based CS can perfectly reconstruct a source signal with no error, for the non-negative case and the signed case, respectively [19]. Thus, the near-zero-RMSE solutions of LASSO are asymptotic to the perfect reconstruction solution of L1-minimization-based CS as η decreases.

CIM L0-regularization-based CS and LASSO have these asymptotic properties even in the case of source signals from the Gamma (+) and bilateral Gamma (±) distributions (see Supplementary Fig. 3b). Note that we have theoretically proved that this asymptotic property of CIM L0-regularization-based CS is invariant to differences in the probability distributions of the source signal by applying a perturbation expansion to the macroscopic equation (13) in the limit η → +0 (see Subsection IV F). Thus, we have confirmed this theoretical result numerically.

On the other hand, when A_s = 250, the first-order phase transition lines of CIM L0-regularization-based CS are not asymptotic to the black solid line a = α, as shown in Supplementary Figs. 2 and 3a. Around η = 0.1, the phase transition line is closest to a = α. The black dotted dashed lines in Fig. 3a show the lower bounds of the first-order phase transition lines of CIM L0-regularization-based CS in the limit A_s → ∞. The lower bound lines are above the critical line (black dotted line) of L1-minimization-based CS when the compression rate α is low. The lower boundary property in Fig. 3a is satisfied even in the case of source signals from the Gamma (+) and bilateral Gamma (±) distributions (see Supplementary Fig. 3b). On the other hand, there are no such lower bounds when A_s = 250 (Supplementary Figs. 2 and 3a).

E. Basin of attraction when β = 0

To check the practicality of CIM L0-regularization-based CS, we verified the basin of attraction of Algorithm 1. To make the basin wider, we heuristically introduced a linear threshold attenuation wherein the threshold η was linearly lowered from η_init to η_end as the minimization process was alternated (see Algorithm 1). First, we carried out numerical experiments to verify the size of the basin of attraction for various values of the initial threshold η_init for fixed η_end = 0.01 in the case of no observation noise (i.e. β = 0). As shown in Fig. 4a, the basin of attraction tended to be widened by selecting an initial threshold η_init higher than η_end. As the compression rate α decreased, this tendency became more marked, especially in the Gaussian case (±).

Next, we sought to confirm how well Algorithm 1 converged on the near-zero-RMSE state given by the macroscopic equation (13) when starting from an initial state r = 0 for various η_init (Fig. 4b). As demonstrated in Fig. 4b, when the sparseness a was lower than the lower bound of the first-order phase transition points (the black dotted dashed line in Fig. 3a), Algorithm 1 with a sufficiently high η_init converged on the near-zero-RMSE state. Compared with the RMSE profiles of LASSO in Fig. 4b, Algorithm 1 exceeded LASSO's estimation accuracy under almost all of the conditions in which LASSO has a small error.

The properties shown in Fig. 4 are satisfied even when the source signals are from the Gamma (+) and bilateral Gamma (±) distributions (see Supplementary Fig. 4).

F. Performance of CIM L0-regularization-based CS and LASSO when β ≠ 0

Moreover, to check the practicality of CIM L0-regularization-based CS, we verified the accuracy and convergence of CIM L0-regularization-based CS in the presence of observation noise (i.e. β ≠ 0). We searched for the optimal threshold values that would give the minimum RMSEs of CIM L0-regularization-based CS and those of LASSO (Figs. 5a and 6a) and computed the difference between their minimum RMSEs (Figs. 5b and 6b) under the optimal threshold for each method when β = 0.01, 0.05, and 0.1. The minimum RMSE was obtained by conducting a grid search over η on the set of solutions to the macroscopic equations (13) and (37) for each (a, α). These figures show cases of the half-Gaussian (+) and Gaussian (±) source signals. As indicated in Figs. 5a and 6a, as β decreases, the phase transition lines from the near-zero-RMSE state in CIM L0-regularization-based CS under the optimal threshold approach the critical line (black solid line) of L0-minimization-based CS, and the RMSEs of CIM L0-regularization-based CS under the optimal threshold decrease. As shown in Figs. 5b and 6b, the RMSEs of LASSO are higher than those of CIM L0-regularization-based CS under almost all of the conditions in which LASSO has an error less than 0.2, and thus CIM L0-regularization-based CS exceeds LASSO's estimation accuracy under the optimal threshold for each method.

Next, for the case of observation noise, we determined whether the output of Algorithm 1 with A_s = 10^7 converged on solutions to the macroscopic equation (13) when starting from the initial state r = 0 and η_init = 0.6. As shown in Figs. 5c and 6c, near or at the phase transition points, Algorithm 1 converged to the solutions of the macroscopic equation (13).

The properties shown in Figs. 5 and 6 were similar for source signals from the Gamma (+) and bilateral Gamma (±) distributions (see Supplementary Figs. 5 and 6).

G. Performance of CIM L0 regularization-based CS on realistic data
Finally, we evaluated the performance of CIM L0 regularization-based CS and other methods on realistic data. We used magnetic resonance imaging (MRI) data obtained from the fastMRI datasets [42]. A Haar-wavelet transform (HWT) was applied to the data, and 79% of the HWT coefficients were set to zero to create a signal spanned by Haar basis functions with a sparseness of 0.21 (left panel of Fig. 7a). The k-space data shown in the middle panel of Fig. 7a were obtained by calculating the discrete Fourier transform (DFT) of the signal in the left panel of Fig. 7a, and 40% of the k-space data were undersampled at random (red points in the middle panel of Fig. 7a) to create an observation signal with a compression rate of 0.4. The right panel of Fig. 7a shows an image with incoherent artifacts obtained by zero-filling Fourier reconstruction from the randomly undersampled k-space data.

To achieve higher reconstruction accuracy from the undersampled signal, we formulated the following optimization problem with L0 and L2 norms, implementable on the CIM:

\hat{x} = \arg\min_{x \in \mathbb{R}^N} \left( \frac{1}{2} \| y - S F x \|_2^2 + \frac{1}{2} \gamma \| \Delta x \|_2^2 + \lambda \| \Psi x \|_0 \right),   (14)

where x is a source signal, y is a k-space undersampling signal, F is a DFT matrix, S is an undersampling matrix, Ψ is a HWT matrix, Δ is a second-derivative matrix, and γ and λ are regularization parameters. Under the variable transformation r = Ψx, the local field vector and the mutual interaction matrix for CIM L0 regularization-based CS can be set as

h = - J ( r \circ H(X) ) + \Psi F^T S^T y,

J = \Psi F^T S^T S F \Psi^T + \gamma \Psi \Delta^T \Delta \Psi^T.

For comparison, we also used the corresponding cost function of LASSO, \frac{1}{2} \| y - S F x \|_2^2 + \frac{1}{2} \gamma \| \Delta x \|_2^2 + \lambda \| \Psi x \|_1, and that of L1 minimization-based CS, minimizing \| \Psi x \|_1 + \gamma \| \Delta x \|_2^2 s.t. y = S F x.

Figure 7b shows images (and RMSEs) reconstructed by CIM L0 regularization-based CS (left panel of Fig. 7b), LASSO [17] (middle panel of Fig. 7b) and L1 minimization-based CS implemented in CVX [43, 44] (right panel of Fig. 7b). As indicated in the regions surrounded by red circles in these panels, CIM L0 regularization-based CS gave the most accurate reconstruction.

We evaluated the RMSEs of the three methods as a function of the threshold η. As shown in Fig. 7c, the blue line with error bars is the RMSE of CIM L0 regularization-based CS obtained from ten trials, the red line is the RMSE of LASSO, and the circle is the RMSE of L1 minimization-based CS, which is identical to LASSO in the limit η → 0, as demonstrated in Fig. 3c. There is an optimal value of η that minimizes the RMSEs of both CIM L0 regularization-based CS and LASSO because of a trade-off between detecting small non-zero elements and eliminating incoherent artifacts by thresholding. The RMSE of CIM L0 regularization-based CS was lower than those of the other methods in a wide range of η.

III. DISCUSSION

A. Summary and Conclusion
We proposed a quantum-classical hybrid system that uses a CIM and a CDP alternately to minimize H with respect to σ and r. To evaluate the performance of CIM L0-regularization-based CS, we derived the W-SDE from the master equation for the density operator of a system consisting of N OPOs and the measurement-feedback circuit. After that, we obtained the macroscopic equations for CIM L0-regularization-based CS from the W-SDE (18) and the simultaneous equations (7).

As shown in Figs. 2, 5c and 6c and Supplementary Figs. 1, 5c and 6c, we confirmed that the macroscopic equations derived here describe steady solutions of Algorithm 1 very well, regardless of whether observation noise exists in the observed signal y. In particular, the theoretical results in the limit A_s → ∞ are in good agreement with the results of Algorithm 1 with A_s = 10^7. Because A_s = 10^7 is on the same order as A_s in experimental CIMs [27, 28], we expect that the macroscopic equation (13) can be used to evaluate real experimental CIMs.

In the case of no observation noise, we theoretically showed that the performance of the quantum-classical hybrid system in principle approaches the theoretical limit of L0 minimization-based CS [21] at high pump rates (see Subsection IV F, Fig. 3a and Supplementary Fig. 3b). From a mathematical perspective, the theoretical limit line a = α expresses the condition that the matrix composed of the column vectors of the observation matrix corresponding to the non-zero elements of the source signal has full rank. Thus, it is mathematically impossible for any system to go beyond this line. As described above, because the theoretical results in the limit A_s → ∞ are in good agreement with those of Algorithm 1 with A_s = 10^7, we expect that the theoretical performance limit of real experimental CIMs is close to this ideal limit.

In the case of observation noise, we theoretically showed that the RMSEs of CIM L0-regularization-based CS are lower than those of LASSO for almost all conditions in which LASSO has an error less than 0.2, and thus CIM L0-regularization-based CS exceeds LASSO's estimation accuracy under the optimal threshold for each method (see Figs. 5a, 5b, 6a and 6b and Supplementary Figs. 5a, 5b, 6a and 6b).

However, there exists a problem regarding the basin of attraction. As numerically demonstrated in Fig. 4 and Supplementary Fig. 4, when there is no observation noise, Algorithm 1 cannot reach the theoretical performance limit if it starts from the practical initial condition r = 0. However, even in such a situation, Algorithm 1 exceeds LASSO's estimation accuracy for almost all conditions in which LASSO has a small error (Fig. 4 and Supplementary Fig. 4). On the other hand, when there is observation noise, under the practical initial condition r = 0, Algorithm 1 gets very close to or achieves the theoretical performance limits given by the macroscopic equation (see Figs. 5c and 6c and Supplementary Figs. 5c and 6c).

Finally, we confirmed using realistic data that CIM L0 regularization-based CS gave the most accurate reconstruction compared with LASSO and L1 minimization-based CS (Fig. 7). Therefore, we can conclude that the performance of the quantum-classical hybrid system in principle approaches the theoretical limit of L0 minimization-based CS at high pump rates, exceeds that of LASSO, and moreover in practical situations exceeds LASSO's estimation accuracy.

A detailed interpretation and discussion of these results are given below.

B. Effectiveness of CIM in support estimation
In Algorithm 1, the c-amplitude is always initialized as c = 0 in the initial stage of the support estimation, even when r is initialized to the true signal value, i.e. x ∘ ξ. In this situation, the solutions of Algorithm 1 match the near-zero-RMSE states of the macroscopic equations very well, as demonstrated in Fig. 2 and Supplementary Fig. 1, and hence the simulated CIM can reconstruct the support vector up to the theoretical limit. Note that the macroscopic equation in the limit A_s → ∞ (Eq. (13)) is equivalent to a macroscopic equation obtained from an Ising spin system at zero temperature [38, 39]. Therefore, the simulated CIM can reach the ground state minimizing H with respect to σ when r is fixed.

C. Correctness of assumptions
To derive the macroscopic equation (12), we derived an approximate value of ⟨H(c_i)⟩ for each OPO pulse by replacing the state variables in the second-order coefficient of the power of the quantum noise with the average values of the state variables (see Eq. (19)). As shown in Figs. 2b, 5c and 6c, the macroscopic equation derived under this approximation has good accuracy at the values of A_s used in actual CIM equipment. However, as shown in Fig. 2a, some solutions of the macroscopic equation did not match the numerical solutions of Algorithm 1 for smaller values of A_s. Thus, this approximation is valid if the mutual injection field is much larger than the noise in the steady state where the c-amplitude has grown.

D. Basin of attraction and its dependency on threshold
To widen the basin of attraction of Algorithm 1, we heuristically introduced a linear threshold attenuation in which the threshold η decreases linearly as the alternating minimization proceeds. We confirmed that the basin of attraction widens as a result of lowering η from a higher initial threshold η_init to a lower terminal threshold η_end (see Fig. 4 and Supplementary Fig. 4).

According to the definition of the injection field for each OPO pulse in Eq. (5), the threshold η acts as an external field that biases the OPO pulses toward the down state. By initially applying a large negative external field, almost all of the OPO pulses take the π-phase state, and thus almost all of the {H(X_j)}_{j=1,...,N} are zero in the initial stage of the alternating minimization. In this stage, the system can easily reach the ground state under the strong negative bias because the phase space, which consists of a small number of up-state OPO pulses, is simple. Then, through the alternating minimization process, the system tracks gradual changes in the ground state caused by incremental increases in the number of up-state OPO pulses as the negative external field is gradually swept out. Finally, the system reaches the ground state at the terminal threshold η_end. This is the qualitative interpretation of the mechanism by which linearly lowering the threshold widens the basin of attraction of Algorithm 1.

However, as demonstrated in Fig. 4b, when there is no observation noise, the system failed to converge to the near-zero-RMSE solutions beyond the lower bound line of the first-order phase transition points. We suspect that there might be many quasi-steady states beyond the lower bound line, as in the spin-glass phase [45], and thus the system might become trapped in one of them.

On the other hand, when there is observation noise, as demonstrated in Figs.
5c and 6c, the system converged to the near-zero-RMSE solutions even near the phase transition line when it started from the practical initial condition r = 0. It has been suggested that the symmetries of the system allow for the creation of quasi-steady states [46]. We expect that observation noise could break the symmetries responsible for these quasi-steady states.

E. Plan to improve CIM L0-regularization-based CS
To achieve the theoretical performance limit when there is no observation noise, we need to construct a full quantum system in which both the support estimation and the signal estimation are implemented on the CIM. We expect that, owing to the minimum gain principle [25, 31], the full quantum system could overcome the quasi-steady-state problem discussed above.
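As a concrete illustration of the linear threshold attenuation discussed above, the schedule can be sketched in a few lines. This is a sketch only: the round count T = 50 matches the loop bound in Algorithm 1, but the exact decay form is our reading of the text, and the values of η_init and η_end are illustrative.

```python
def threshold_schedule(t, T=50, eta_init=0.6, eta_end=0.01):
    """Linearly attenuate the threshold eta from eta_init toward eta_end,
    clipping at the terminal value, as the alternating minimization proceeds."""
    return max(eta_init * (1.0 - t / T), eta_end)

# The negative external bias is strong at first and is gradually swept out.
print([round(threshold_schedule(t), 3) for t in (0, 10, 25, 50)])
```

Starting from a large η, almost all pulses are forced into the π-phase state, and the bias is then relaxed toward η_end over the course of the alternating minimization.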
V. METHODS

A. Root-mean-square error
The numerical experiments used the root-mean-square error (RMSE) as the measure of estimation accuracy. The RMSE of CIM L0-regularization-based CS is

RMSE = √( (1/N) Σ_{i=1}^{N} ( r_i H(c_i) − x_i ξ_i )² ) = √( aQ − 2aR + a⟨x²⟩_x ),

where R and Q are the overlap and mean-square magnetization defined above, a is the sparseness, and a⟨x²⟩_x is the second moment of the source signal. The RMSE is zero if CIM L0-regularization-based CS perfectly reconstructs the source signal.

B. Precondition for applying statistical mechanics
In order to use statistical mechanics to evaluate the performance of CIM L0-regularization-based CS, we introduce the following preconditions.

We define the following observation model,

y_μ = (1/√M) Σ_{i=1}^{N} A_{μi} ξ_i x_i + n_μ, μ = 1, ..., M, (15)

where [y_1, ..., y_M]^T is the M-dimensional observation signal, [x_1, ..., x_N]^T is the N-dimensional source signal and [ξ_1, ..., ξ_N]^T is the support vector, i.e., the places of the non-zero elements in the N-dimensional source signal, in which ξ_i takes either +1 or 0. [n_1, ..., n_M]^T is M-dimensional observation noise satisfying

⟨n_μ⟩ = 0, ⟨n_μ n_ν⟩ = β² δ_{μν}.

As described above, β² is the variance of the observation noise. [A_{μi}]_{μ=1,...,M, i=1,...,N} is the M-by-N observation matrix. For mathematical tractability, the observation matrix is scaled by 1/√M. All elements of the observation matrix are generated from an independent and identically distributed (i.i.d.) Gaussian satisfying

⟨A_{μi}⟩ = 0, ⟨A_{μi} A_{νj}⟩ = δ_{ij} δ_{μν}.
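As a concrete illustration, one draw of the observation model of Eq. (15), together with the RMSE measure of the previous subsection, can be sketched in pure Python. This is a toy sketch: function and parameter names are ours, and the source signal here is Gaussian with σ = 1 (one of the distributions used in the paper's simulations).

```python
import math, random

def generate_instance(N=1000, alpha=0.6, a=0.2, beta=0.0, seed=0):
    """One draw of the model y_mu = (1/sqrt(M)) sum_i A_{mu,i} xi_i x_i + n_mu."""
    rng = random.Random(seed)
    M = int(alpha * N)                                       # compression rate alpha = M/N
    xi = [1 if rng.random() < a else 0 for _ in range(N)]    # Bernoulli(a) support
    x = [rng.gauss(0.0, 1.0) for _ in range(N)]              # Gaussian sources, sigma = 1
    A = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
    y = [sum(A[mu][i] * xi[i] * x[i] for i in range(N)) / math.sqrt(M)
         + rng.gauss(0.0, beta) for mu in range(M)]
    return A, x, xi, y

def rmse(r, c, x, xi):
    """RMSE between the estimate r_i * H(c_i) and the source x_i * xi_i,
    with H the Heaviside step applied to the in-phase amplitude c_i."""
    H = lambda v: 1.0 if v > 0 else 0.0
    return math.sqrt(sum((r[i] * H(c[i]) - x[i] * xi[i]) ** 2
                         for i in range(len(x))) / len(x))

A, x, xi, y = generate_instance(N=200, alpha=0.5, a=0.1)
print(len(y))                       # M = 100 observations
print(rmse(x, [1.0] * 200, x, xi))  # nonzero: correct amplitudes, all-ones support
```

With a perfect estimate (r_i = x_i ξ_i on a correctly signed c) the RMSE is exactly zero, matching the definition above.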
As described above, we introduce the compression rate α, defined as α = M/N. In this paper, we deal with the case in which M diverges to infinity in the thermodynamic limit (N → ∞) while the compression rate α is kept a non-zero constant.

[ξ_1, ..., ξ_N]^T is generated from an i.i.d. Bernoulli distribution satisfying

P(ξ_i = 1) = a, P(ξ_i = 0) = 1 − a, i = 1, ..., N.

As described above, we introduce the sparseness a, defined as the probability of non-zero elements in the source signal.

In the numerical simulations described in this paper, [x_1, ..., x_N]^T was generated from the following i.i.d. probability distributions: Gaussian g(x) = e^{−x²/(2σ²)}/√(2πσ²), half-Gaussian g(x) = 2H(x)e^{−x²/(2σ²)}/√(2πσ²), Gamma g(x) = x^{k−1}e^{−x/θ}/(Γ(k)θ^k) and bilateral Gamma g(x) = |x|^{k−1}e^{−|x|/θ}/(2Γ(k)θ^k) (see Fig. 8). To verify that our results do not depend on the type of probability distribution of the source signals, we used two different probability distributions in each of the non-negative and signed cases. The Gaussian and bilateral Gamma distributions were used to generate the signed source signals, while the half-Gaussian and Gamma distributions were used to generate the non-negative source signals. Here σ = 1 for the Gaussian and half-Gaussian distributions, while k = 2 and θ was set so that the second moment ⟨x²⟩_x of the Gaussian (bilateral Gamma) was the same as that of the half-Gaussian (Gamma).

C. Derivation of W-SDE for CIM
As shown in Fig. 1, the pump pulses are injected into the main ring cavity through a second harmonic generation (SHG) crystal. A periodically poled lithium niobate (PPLN) waveguide is a highly efficient nonlinear medium for optical parametric oscillation. Suppose that the amplitude of the pump field injected into the main cavity is ε and the parametric coupling constant of the PPLN waveguide between the signal field and the pump field is κ. Then, the pumping Hamiltonian is Ĥ = iℏε(â_p† − â_p) and the parametric interaction Hamiltonian is Ĥ = iℏ(κ/2)(â_s†² â_p − â_p† â_s²). Here â_p and â_s are the annihilation operators for the intra-cavity pump and signal fields. If the round-trip time of the ring cavity is correctly adjusted to N times the pump pulse interval, N independent and identical OPO pulses can be simultaneously generated inside the cavity. The photon annihilation and creation operators for the j-th OPO signal pulse are denoted by â_j and â_j†. The intra-cavity pump field and signal field have loss rates γ_p and γ_s, respectively. If γ_p ≫ γ_s, the pump field can be eliminated by the slaving principle: the following master equation of the density operator for a solitary j-th OPO signal pulse is obtained by adiabatic elimination of the pump mode [47, 48],

∂ρ̂_DOPO/∂t = (S/2) Σ_{j=1}^{N} [â_j†² − â_j², ρ̂_DOPO] + γ_s Σ_{j=1}^{N} ( 2â_j ρ̂_DOPO â_j† − â_j†â_j ρ̂_DOPO − ρ̂_DOPO â_j†â_j ) + (B/2) Σ_{j=1}^{N} ( 2â_j² ρ̂_DOPO â_j†² − â_j†²â_j² ρ̂_DOPO − ρ̂_DOPO â_j†²â_j² ), (16)

where S = εκ/γ_p and B = κ²/(2γ_p) are the linear parametric gain coefficient and the two-photon absorption (or back-conversion) rate, respectively. [x̂, ŷ] denotes the bosonic commutator. Next, let us examine the measurement-feedback circuit shown in Fig. 1.
The circuit is connected to the main cavity by extraction and injection couplers with reflection coefficients R_ex = j_ex Δt and R_in = j_in Δt, where j_ex and j_in are coarse-grained out-coupling and in-coupling constants and Δt is the cavity round-trip time. Under the condition B/γ_s ≪ 1, we can truncate the Fokker-Planck equation for the Wigner W(α) representation of the density operator ρ̂ in the master equation, and we arrive at the following truncated Wigner stochastic differential equation (W-SDE) by applying Ito's rule [50, 51],

dα_i/dt = −(γ_s + j)α_i + Sα_i* − B|α_i|²α_i + j_in f_i^sig + √( (γ_s + j)/2 + B|α_i|² ) υ_i, i = 1, ..., N, (17)

where j = j_ex + j_in, α_i is the complex Wigner amplitude, and υ_i is a c-number noise amplitude satisfying ⟨υ_i(t)⟩ = 0 and ⟨υ_i*(t)υ_j(t')⟩ = 2δ_{ij}δ(t − t').

Then, by introducing a saturation parameter A_s = √(2γ_p(γ_s + j))/κ and applying the scale transformation α_i/A_s = c_i + is_i, t(γ_s + j) = t', p = S/(γ_s + j) and j_in/(A_s(γ_s + j)) = K̃, we obtain

dc_i/dt' = ( −1 + p − c_i² − s_i² )c_i + K̃( F_χ(h_i) − η ) + (1/A_s)√( c_i² + s_i² + 1/2 ) W_{i,1},
ds_i/dt' = ( −1 − p − c_i² − s_i² )s_i + (1/A_s)√( c_i² + s_i² + 1/2 ) W_{i,2}, i = 1, ..., N, (18)

where c_i and s_i are the in-phase and quadrature-phase normalized amplitudes of the i-th OPO pulse. The second term on the R.H.S. of the upper equation of Eq. (18) is the optical injection field, which has only an in-phase component. p is the normalized pump rate; p = 1 corresponds to the oscillation threshold of a solitary OPO without mutual coupling. If p is above the oscillation threshold (p > 1), a solitary OPO pulse takes either the 0-phase or the π-phase state. The 0-phase of an OPO pulse is assigned to the Ising up-state, while the π-phase is assigned to the down-state. The last terms of both the upper and lower equations of Eq. (18) express the vacuum fluctuations injected from external reservoirs and the pump fluctuations coupled into the OPO system via gain saturation.
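A minimal Euler-Maruyama integration of the normalized W-SDE (Eq. (18)) for a single, uncoupled pulse illustrates the threshold behavior. This is a sketch under our simplifications: the injection term K̃(F_χ(h_i) − η) is replaced by a constant `inj`, the pump rate is held fixed rather than ramped as in Algorithm 1, and all names are ours.

```python
import math, random

def simulate_opo_pulse(p=1.2, As2=10.0, inj=0.0, T=200.0, dt=0.01, seed=1):
    """Euler-Maruyama integration of the normalized W-SDE (18) for one pulse.
    p: normalized pump rate; As2: saturation parameter A_s^2;
    inj: constant stand-in for the in-phase injection field."""
    rng = random.Random(seed)
    As = math.sqrt(As2)
    c, s = 0.0, 0.0
    sq = math.sqrt(dt)
    for _ in range(int(T / dt)):
        amp = math.sqrt(c * c + s * s + 0.5) / As     # quantum-noise amplitude
        c += ((-1.0 + p - c * c - s * s) * c + inj) * dt + amp * sq * rng.gauss(0, 1)
        s += (-1.0 - p - c * c - s * s) * s * dt + amp * sq * rng.gauss(0, 1)
    return c, s

# Above threshold (p > 1) the in-phase amplitude grows to roughly +/- sqrt(p - 1),
# while the quadrature-phase amplitude stays near zero.
c, s = simulate_opo_pulse()
print(abs(c), abs(s))
```

The sign of the steady-state in-phase amplitude c encodes the Ising spin (0-phase up, π-phase down); the 1/A_s prefactor controls the strength of the quantum noise relative to the mean field.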
W_{i,1} and W_{i,2} are independent real Gaussian noise processes satisfying ⟨W_{i,k}(t)⟩ = 0 and ⟨W_{i,k}(t)W_{j,l}(t')⟩ = δ_{ij}δ_{kl}δ(t − t'). The saturation parameter A_s determines the nonlinear increase (abrupt jump) of the photon number at the OPO threshold. More recently, a more general quantum model of the measurement-feedback CIM without the Gaussian approximation has been derived for both a discrete-time model [52] and a continuous-time model [53].

D. Mean-field behavior of OPO pulses
Here, we derive an approximate value of ⟨H(c_i)⟩ for each OPO pulse [37]. It is difficult to solve Eq. (18) analytically. To obtain a mathematically tractable form, we replace the state variables in the second-order coefficient of the Kramers-Moyal expansion [51] (representing the power of the quantum noise) with the average values of these state variables [37]:

dc_i/dt' = ( −1 + p − c_i² − s_i² )c_i + K̃( F_χ(h_i) − η ) + (1/A_s)√( ⟨c_i²⟩ + ⟨s_i²⟩ + 1/2 ) W_{i,1},
ds_i/dt' = ( −1 − p − c_i² − s_i² )s_i + (1/A_s)√( ⟨c_i²⟩ + ⟨s_i²⟩ + 1/2 ) W_{i,2}, i = 1, ..., N. (19)

As described in Eq. (11), the local field h_i can be separated into a pure local field h̃_i independent of the self-state H(c_i)r_i and the Onsager reaction term (ORT) Γ H(c_i)r_i [35, 36]. From Eq. (7) and ⟨A_{μi}²⟩ = 1, r_i is given by

r_i = 0 (c_i ≤ 0), r_i = h̃_i/(1 − Γ) (c_i > 0). (20)

In the next subsection, these terms are determined in a self-consistent manner. Under these conditions, we can derive the following equations to determine the approximate value of ⟨H(c_i)⟩ for each OPO pulse [37]:

⟨H(c_i)⟩ = ∫_{−∞}^{+∞} dc ∫_{−∞}^{+∞} ds H(c) f(c, s | h̃_i), (21)

f(c, s | h̃_i) ∝ exp[ 2A_s² ( cK̃( F_χ( h̃_i + H(c)h̃_iΓ/(1 − Γ) ) − η ) − V(c, s) ) / ( G_{ci} + G_{si} + 0.5 ) ],

V(c, s) = (1/2)(1 − p)c² + (1/2)(1 + p)s² + (1/2)c²s² + (1/4)c⁴ + (1/4)s⁴,

where V(c, s) is the potential appearing in the CIM-ferromagnetic and CIM-finite-loading Hopfield models [37]. G_{ci} and G_{si} are parameters for calculating ⟨H(c_i)⟩, which satisfy

G_{ci} = ∫_{−∞}^{+∞} dc ∫_{−∞}^{+∞} ds c² f(c, s | h̃_i), G_{si} = ∫_{−∞}^{+∞} dc ∫_{−∞}^{+∞} ds s² f(c, s | h̃_i). (22)

G_{ci} and G_{si} are equal to ⟨c_i²⟩ and ⟨s_i²⟩, and, given h̃_i and Γ, they can be determined self-consistently from the above equations.

E. Derivation of macroscopic equation for quantum-classical hybrid system
By separating the local field h_i into the pure local field h̃_i and the ORT ΓH(c_i)r_i (Eq. (11)) with the self-consistent signal-to-noise analysis (SCSNA) [34-36, 38, 39], we can reduce the N-body system composed of N mutually coupled OPO pulses to an effective one-body system. After that, we can derive the macroscopic equation of the quantum-classical hybrid system for L0-regularization-based CS.

Let us start by introducing the following parameters,

g_μ = (1/N) Σ_{j=1}^{N} A_{μj}( G(h_j) − ξ_j x_j ) − √(α/N) n_μ. (23)

Under the given conditions, the correlation between the elements of the observation matrix A_{μj} and G(h_j) − ξ_j x_j becomes O(1/√N) for any μ if the system succeeds in reconstruction. Thus, we assume that g_μ = O(1/√N) (μ = 1, ..., M) is satisfied.

Substituting Eq. (23) into Eq. (10) gives

h_i = −(1/α) Σ_{μ=1}^{M} A_{μi} g_μ + r_i H(c_i), (24)

where the first term is the cross-talk noise part and the second term is the self-coupling part. The next step of the SCSNA is to evaluate the cross-talk noise part. We derive the dominant term of the correlation between the self-state r_iH(c_i) and A_{μi}, and split the cross-talk noise part into a Gaussian-noise part independent of the self-state r_iH(c_i) and a self-coupling term.

Now let us consider a variation in G(h) due to a small perturbation dh in the local field h: dG = (∂G(h)/∂h)dh. Then, g_μ = O(1/√N) of Eq. (23) can be expanded as

g_μ = (1/N) Σ_{j=1}^{N} A_{μj}( G_j^{(μ)} − ξ_j x_j ) − √(α/N) n_μ − (a/α) g_μ U^{(μ)}, (25)

where

G_i^{(μ)} = G(h_i^{(μ)}), U^{(μ)} = (1/(aN)) Σ_{j=1}^{N} ∂G(h_j^{(μ)})/∂h_j^{(μ)}, h_i^{(μ)} = −(1/α) Σ_{ν=1(≠μ)}^{M} A_{νi} g_ν + r_i H(c_i). (26)

U^{(μ)} is a macroscopic parameter called the susceptibility. G_i^{(μ)} is independent of A_{μi}, because h_i^{(μ)} does not contain A_{μi}. From Eq. (25), we obtain

g_μ = [α/(α + aU^{(μ)})] ( (1/N) Σ_{j=1}^{N} A_{μj}( G_j^{(μ)} − ξ_j x_j ) − √(α/N) n_μ ). (27)

Substituting Eq. (27) into the cross-talk noise of Eq.
(24), we can derive the dominant term of the correlation between the self-state r_iH(c_i) and A_{μi}, and rearrange h_i as follows:

h_i = αx_iξ_i/(α + aU) + z_i/(α + aU) + Γ r_i H(c_i), (28)

z_i = −(1/N) Σ_{j=1(≠i)}^{N} Σ_{μ=1}^{M} A_{μi}A_{μj}( G̃_j − ξ_j x_j ) + √(α/N) Σ_{μ=1}^{M} A_{μi} n_μ, Γ = aU/(α + aU). (29)

The first term of Eq. (28) is the signal part, the second term is the Gaussian noise independent of the self-state r_iH(c_i), and the third term is the self-coupling part. A comparison of Eq. (11) with Eq. (28) indicates that the ORT is the third term and that the pure local field h̃_i consists of the first and second terms:

h̃_i = αx_iξ_i/(α + aU) + z_i/(α + aU). (30)

G̃_i and U in Eq. (29) satisfy

G̃_i = G̃(h̃_i), U = (1/(aN)) Σ_{j=1}^{N} ( ∂G̃(h̃_j)/∂h̃_j )( ∂h̃_j/∂h_j ). (31)

G̃ is a function of the pure local field h̃_i, into which the ORT is renormalized. U is the susceptibility, expressed as the average sensitivity of G̃_i to the local field h_i instead of to the pure local field h̃_i, because of the definition of U^{(μ)}. In the thermodynamic limit (N → +∞), the fluctuations in G_i^{(μ)} and U^{(μ)} depending on μ become negligible, and thus G_i^{(μ)} and U^{(μ)} can safely be replaced with G̃_i and U.

From Eq. (20), r_i is given by

r_i = 0 (c_i ≤ 0), r_i = ( 1 + (a/α)U ) h̃_i (c_i > 0). (32)

Substituting Eq. (32) into Eq. (28), h_i can be rewritten as

h_i = h̃_i (c_i ≤ 0), h_i = h̃_i + (a/α)U h̃_i = x_iξ_i + z_i/α (c_i > 0). (33)

Thus, from Eq. (33), ∂h̃_j/∂h_j can be obtained as

∂h̃_j/∂h_j = 1 (c_j ≤ 0), ∂h̃_j/∂h_j = α/(α + aU) (c_j > 0).

The above manipulations reduce the N-body system to an effective one-body system depending only on the pure local field h̃. Next, we self-consistently derive the macroscopic equation. We evaluate the average and variance of the Gaussian noise z_i in Eq. (29). Because G̃_i and U are not correlated with A_{μi}, the average and variance are

⟨z_i⟩ = 0, ⟨z_i²⟩ = aα( Q + ⟨x²⟩_x − 2R ) + α²β².
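The Gaussian-noise statistics of Eq. (29) can be checked numerically. The sketch below replaces the reconstruction G̃_j by a fixed surrogate 0.7 ξ_j x_j (our choice, just to make Q and R computable in closed form) and compares the empirical variance of z_i with aα(Q + ⟨x²⟩_x − 2R) + α²β²; all function and variable names are ours.

```python
import math, random

def crosstalk_variance_check(N=600, alpha=0.5, a=0.2, beta=0.2, seed=7):
    """Monte-Carlo check of Eq. (29): Var[z_i] vs a*alpha*(Q + <x^2> - 2R) + alpha^2*beta^2."""
    rng = random.Random(seed)
    M = int(alpha * N)
    xi = [1 if rng.random() < a else 0 for _ in range(N)]
    x = [rng.gauss(0.0, 1.0) for _ in range(N)]
    n = [rng.gauss(0.0, beta) for _ in range(M)]
    A = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
    G = [0.7 * xi[j] * x[j] for j in range(N)]            # surrogate for G~_j
    d = [G[j] - xi[j] * x[j] for j in range(N)]
    # Precompute T_mu = sum_j A_{mu,j} d_j so that the j != i sum is cheap.
    T = [sum(A[mu][j] * d[j] for j in range(N)) for mu in range(M)]
    sa = math.sqrt(alpha / N)
    z = []
    for i in range(N):
        acc = 0.0
        for mu in range(M):
            Ami = A[mu][i]
            acc += -(Ami * (T[mu] - Ami * d[i])) / N + sa * Ami * n[mu]
        z.append(acc)
    var_emp = sum(v * v for v in z) / N
    aN = a * N
    Q = sum(g * g for g in G) / aN
    R = sum(x[j] * xi[j] * G[j] for j in range(N)) / aN
    x2 = sum(xi[j] * x[j] ** 2 for j in range(N)) / aN    # empirical <x^2>_x
    var_th = a * alpha * (Q + x2 - 2 * R) + alpha ** 2 * beta ** 2
    return var_emp, var_th

ve, vt = crosstalk_variance_check()
print(ve, vt)  # the two variances agree within sampling error
```

Because the surrogate G̃_j is independent of A_{μi}, the cross-talk term is asymptotically Gaussian and the two values converge as N grows.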
R and Q are macroscopic parameters called the overlap and the mean-square magnetization, respectively:

R = (1/(aN)) Σ_{j=1}^{N} x_j ξ_j G̃_j, (34)

Q = (1/(aN)) Σ_{j=1}^{N} G̃_j². (35)

In the thermodynamic limit (N → +∞), the averaging operations over the site index j in Eqs. (31), (34) and (35) can be replaced with the average in Eq. (21) and the averages over the Gaussian noise z and over the source signal ξ ∘ x. Thus, we obtain the macroscopic equations (12) and (13).

F. Perturbation expansion for the macroscopic equation in the limits A_s² → ∞ and η → +0

By introducing a new macroscopic parameter W defined by W = Q − 2R, when there is no observation noise, i.e., β = 0, we can rewrite the macroscopic equation in the limit A_s² → ∞ as

W = (1/α)|W + ⟨x²⟩_x| ∫_{−∞}^{+∞} Dz z² ⟨ X̃(h_p, h_m) ⟩_{x,ξ} − (1/a) ∫_{−∞}^{+∞} Dz ⟨ x²ξ X̃(h_p, h_m) ⟩_{x,ξ}, (36)

X̃(h_p, h_m) = H( F_χ(h_p) − η/( 1 + (a/α)U ) ).

Here, we put ζ = 2η/( 1 + (a/α)U ). In the limit η → +0, i.e., ζ →
0, the macroscopic equation (36) has a solution W = −⟨x²⟩_x corresponding to perfect reconstruction. We assume that the above macroscopic equation has the following solution when ζ ≪ 1:

W = −⟨x²⟩_x + ζw.

Substituting this into the macroscopic equation (36) and expanding around ζ = 0, we obtain the following relation, which is independent of the probability distribution g(x) of x provided that g(0) and g'(0) are finite:

w = (a/α)|w|.

This equation shows that the solution w = 0, i.e., the perfect-reconstruction solution, is stable when a < α, neutral when a = α, and unstable when a > α. Thus, when there is no observation noise, in the infinite limit of the amplitude of the injected pump field (i.e., A_s² → ∞) and in the infinitesimal limit of η, the phase transition line of the perfect-reconstruction solution becomes a = α, independent of g(x).

G. Macroscopic equation of LASSO
Under the preconditions described above, the update rule of LASSO is given by

Y_i := T_{χ,η}(h_i), i = 1, ..., N,

h_i = −(1/M) Σ_{j=1(≠i)}^{N} Σ_{μ=1}^{M} A_{μi}A_{μj} Y_j + (1/M) Σ_{j=1}^{N} Σ_{μ=1}^{M} A_{μi}A_{μj} ξ_j x_j + (1/√M) Σ_{μ=1}^{M} A_{μi} n_μ,

where T_{χ,η}(h) is a soft-thresholding function with threshold η, defined as

T_{+,η}(h) = h − η (h ≥ η), T_{+,η}(h) = 0 (h < η),

T_{±,η}(h) = h − η (h ≥ η), T_{±,η}(h) = 0 (−η < h < η), T_{±,η}(h) = h + η (h ≤ −η).

We use the two functions T_{+,η}(h) and T_{±,η}(h) depending on the source signal: T_{+,η}(h) is for non-negative source signals, and T_{±,η}(h) is for signed source signals. Following the same SCSNA recipe, we obtain the following macroscopic equations,

R = (1/a) ∫_{−∞}^{+∞} Dz ⟨ xξ T̃_{χ,η}(h̃) ⟩_{x,ξ},

Q = (1/a) ∫_{−∞}^{+∞} Dz ⟨ T̃_{χ,η}(h̃)² ⟩_{x,ξ},

U √( β² + (a/α)( Q + ⟨x²⟩_x − 2R ) ) = (1/a) ∫_{−∞}^{+∞} Dz z ⟨ T̃_{χ,η}(h̃) ⟩_{x,ξ}, (37)

where the pure local field of LASSO becomes

h̃ = αxξ/(α + aU) + z/(α + aU).

T̃_{χ,η}(h̃) is an effective output function, given by

T̃_{+,η}(h̃) = ( 1 + (a/α)U )(h̃ − η) (h̃ ≥ η), T̃_{+,η}(h̃) = 0 (h̃ < η),

T̃_{±,η}(h̃) = ( 1 + (a/α)U )(h̃ − η) (h̃ ≥ η), T̃_{±,η}(h̃) = 0 (−η < h̃ < η), T̃_{±,η}(h̃) = ( 1 + (a/α)U )(h̃ + η) (h̃ ≤ −η).

T̃_{+,η}(h̃) is for non-negative source signals, and T̃_{±,η}(h̃) is for signed source signals.

[1] R. Tibshirani, Journal of the Royal Statistical Society Series B-Methodological, 267 (1996).
[2] J. F. Claerbout and F. Muir, Geophysics, 826 (1973).
[3] H. L. Taylor, S. C. Banks, and J. F. McCoy, Geophysics, 39 (1979).
[4] N. R. Chapman and I. Barrodale, Geophysical Journal of the Royal Astronomical Society, 93 (1983).
[5] T. Iinuma, R. Hino, N. Uchida, W. Nakamura, M. Kido, Y. Osada, and S. Miura, Nature Communications (2016).
[6] M. Lustig, D. Donoho, and J. M. Pauly, Magnetic Resonance in Medicine, 1182 (2007).
[7] M. Doneva and A. Mertins, MRI: Physics, Image Reconstruction, and Analysis, 51 (2016).
[8] W. Lu, I. C. Atkinson, and N.
Vaswani, MRI: Physics, Image Reconstruction, and Analysis, 27 (2016).
[9] T. Yamamoto, K. Fujimoto, T. Okada, Y. Fushimi, A. F. Stalder, Y. Natsuaki, M. Schmidt, and K. Togashi, Investigative Radiology, 372 (2016).
[10] M. Honma, K. Akiyama, M. Uemura, and S. Ikeda, Publications of the Astronomical Society of Japan (2014).
[11] R. Ramprasad, R. Batra, G. Pilania, A. Mannodi-Kanakkithodi, and C. Kim, npj Computational Materials (2017).
[12] G. Nakada, Y. Igarashi, H. Imai, and Y. Oaki, Advanced Theory and Simulations (2019).
[13] W. J. J. Fu, Journal of Computational and Graphical Statistics, 397 (1998).
[14] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, Annals of Statistics, 407 (2004).
[15] J. Friedman, T. Hastie, H. Hofling, and R. Tibshirani, Annals of Applied Statistics, 302 (2007).
[16] J. M. Bioucas-Dias and M. A. T. Figueiredo, IEEE Transactions on Image Processing, 2992 (2007).
[17] A. Beck and M. Teboulle, SIAM Journal on Imaging Sciences, 183 (2009), ISSN 1936-4954.
[18] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Foundations and Trends in Machine Learning, 1 (2011).
[19] D. L. Donoho, A. Maleki, and A. Montanari, Proc. Natl. Acad. Sci. U.S.A., 18914 (2009).
[20] C. Louizos, M. Welling, and D. P. Kingma, in International Conference on Learning Representations (2018).
[21] Y. Nakanishi-Ohno, T. Obuchi, M. Okada, and Y. Kabashima, Journal of Statistical Mechanics: Theory and Experiment (2016).
[22] S. S. B. Chen, D. L. Donoho, and M. A. Saunders, SIAM Journal on Scientific Computing, 33 (1998).
[23] R. Chartrand, IEEE Signal Processing Letters, 707 (2007).
[24] J. A. Tropp and A. C. Gilbert, IEEE Transactions on Information Theory, 4655 (2007).
[25] A. Marandi, Z. Wang, K. Takata, R. L. Byer, and Y. Yamamoto, Nature Photonics, 937 (2014).
[26] Y. Yamamoto, K. Aihara, T. Leleu, K. Kawarabayashi, S. Kako, M. Fejer, K. Inoue, and H. Takesue, npj Quantum Information, 49 (2017), ISSN 2056-6387.
[27] T. Inagaki, Y. Haribara, K. Igarashi, T. Sonobe, S. Tamate, T. Honjo, A. Marandi, P. L. McMahon, T. Umeki, K. Enbutsu, et al., Science, 603 (2016).
[28] P. L. McMahon, A. Marandi, Y. Haribara, R. Hamerly, C. Langrock, S. Tamate, T. Inagaki, H. Takesue, S. Utsunomiya, K. Aihara, et al., Science, 614 (2016), ISSN 0036-8075, URL https://science.sciencemag.org/content/354/6312/614.
[29] T. Leleu, Y. Yamamoto, P. L. McMahon, and K. Aihara, Physical Review Letters (2019).
[30] S. Kako, T. Leleu, Y. Inui, F. Khoyratee, S. Reifenstein, and Y. Yamamoto, Advanced Quantum Technologies, 2000045 (2020), URL https://onlinelibrary.wiley.com/doi/abs/10.1002/qute.202000045.
[31] R. Hamerly, T. Inagaki, P. L. McMahon, D. Venturelli, A. Marandi, T. Onodera, E.
Ng, C. Langrock, K. Inaba, T. Honjo, et al., Science Advances (2019).
[32] M. W. Johnson, M. H. Amin, S. Gildert, T. Lanting, F. Hamze, N. Dickson, R. Harris, A. J. Berkley, J. Johansson, P. Bunyk, et al., Nature, 194 (2011).
[33] D. Sherrington and S. Kirkpatrick, Physical Review Letters, 1792 (1975).
[34] M. Shiino and T. Fukai, Journal of Physics A: Mathematical and General, L375 (1992).
[35] T. Aonishi, K. Kurata, and M. Okada, Physical Review Letters, 2800 (1999).
[36] T. Aonishi, K. Kurata, and M. Okada, Physical Review E (2002).
[37] T. Aonishi, K. Mimura, S. Utsunomiya, M. Okada, and Y. Yamamoto, Journal of the Physical Society of Japan, 104002 (2017).
[38] T. Aonishi, M. Okada, K. Mimura, and Y. Yamamoto, Journal of Applied Physics, 152129 (2018), URL https://aip.scitation.org/doi/abs/10.1063/1.5041997.
[39] T. Aonishi, K. Mimura, M. Okada, and Y. Yamamoto, Journal of Applied Physics, 233102 (2018), URL https://aip.scitation.org/doi/abs/10.1063/1.5041998.
[40] H. Nishimori, Statistical Physics of Spin Glasses and Information Processing: An Introduction, International Series of Monographs on Physics (Oxford University Press, Oxford; New York, 2001).
[41] M. Yoshida, T. Uezu, T. Tanaka, and M. Okada, Journal of the Physical Society of Japan, 054003 (2007).
[42] J. Zbontar, F. Knoll, A. Sriram, M. J. Muckley, M. Bruno, A. Defazio, M. Parente, K. J. Geras, J. Katsnelson, H. Chandarana, et al., CoRR abs/1811.08839 (2018), URL http://arxiv.org/abs/1811.08839.
[43] M. Grant and S. Boyd, in Recent Advances in Learning and Control, edited by V. Blondel, S. Boyd, and H. Kimura (Springer-Verlag Limited, 2008), Lecture Notes in Control and Information Sciences, pp. 95-110, http://stanford.edu/~boyd/graph_dcp.html.
[44] M. Grant and S. Boyd,
CVX: Matlab software for disciplined convex programming, version 2.1, http://cvxr.com/cvx (2014).
[45] F. Tanaka and S. F. Edwards, Journal of Physics F: Metal Physics, 2769 (1980).
[46] A. Crisanti and H. Sompolinsky, Physical Review A, 4865 (1988).
[47] P. Kinsler and P. D. Drummond, Physical Review A, 6194 (1991).
[48] D. Maruo, S. Utsunomiya, and Y. Yamamoto, Physica Scripta (2016).
[49] H. M. Wiseman and G. J. Milburn, Physical Review Letters, 548 (1993).
[50] Y. Inui and Y. Yamamoto, Noise correlation and success probability in coherent Ising machines (2020), arXiv:2009.10328.
[51] H. Risken, The Fokker-Planck Equation: Methods of Solution and Applications, Springer Series in Synergetics (Springer, Berlin, Heidelberg, 1989), 2nd ed., ISBN 9783642615443, URL http://dx.doi.org/10.1007/978-3-642-61544-3.
[52] A. Yamamura, K. Aihara, and Y. Yamamoto, Phys. Rev. A, 053834 (2017), URL https://link.aps.org/doi/10.1103/PhysRevA.96.053834.
[53] T. Shoji, K. Aihara, and Y. Yamamoto, Phys. Rev. A, 053833 (2017), URL https://link.aps.org/doi/10.1103/PhysRevA.96.053833.

Acknowledgements
This work is supported by the Japan Science and Technology Agency through its ImPACTprogram, NTT Research Inc. and the National Science Foundation of the United States ofAmerica.
Author contributions
The project was conceived by T.A., K.M., M.O. and Y.Y. The hybrid system was devisedby T.A., K.M., M.O. and Y.Y. The manuscript was written by T.A., K.M. and Y.Y. Forthe analytical results, T.A. and Y.Y. derived the truncated Wigner stochastic differentialequation, and T.A., K.M. and M.O. derived the macroscopic equations. For the numericalresults, T.A. and K.M. performed the simulator implementations.
Competing interests
The authors declare no competing interests.
Additional information
Supplementary information is available for this paper at https://doi.org/XXXXXX/YYYYYYY.

Correspondence and requests for materials should be addressed to T.A.

Algorithm 1
Alternating minimization of CIM-L0-regularization-based CS
Require: M-by-N observation matrix A, M-dimensional observation signal y
Ensure: N-dimensional support vector σ, N-dimensional signal vector r
Initialize r and η = η_init
for t = 1 to 50 do
  Minimize H with respect to σ by the CIM: σ = CIM_support_estimation(r, η):
    set c = 0 and numerically integrate the W-SDE while increasing the normalized pump rate from 0 to 1, when A_s² = 10, or for two hundred times the photon's lifetime when A_s² = 250.
  Minimize H with respect to r by the CDP: S = diag(σ),
    r = ( diag[AᵀA] + SAᵀAS − diag[SAᵀAS] )⁻¹ SAᵀy
  Decrement η: η = max( η_init(1 − t/50), η_end )
end for
return σ and r
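The closed-form signal-estimation step executed on the CDP can be written down directly. The sketch below is an illustrative pure-Python version with a tiny dense solver (names are ours); with a binary support vector σ it reduces to least squares restricted to the estimated support, with off-support entries forced to zero.

```python
def cdp_signal_estimation(A, y, sigma):
    """CDP step of Algorithm 1:
    r = (diag[A^T A] + S A^T A S - diag[S A^T A S])^{-1} S A^T y,  S = diag(sigma)."""
    M, N = len(A), len(A[0])
    AtA = [[sum(A[m][i] * A[m][j] for m in range(M)) for j in range(N)] for i in range(N)]
    Aty = [sum(A[m][i] * y[m] for m in range(M)) for i in range(N)]
    # Build B = diag[AtA] + S AtA S - diag[S AtA S] and b = S Aty.
    B = [[sigma[i] * AtA[i][j] * sigma[j] for j in range(N)] for i in range(N)]
    for i in range(N):
        B[i][i] += AtA[i][i] * (1 - sigma[i] * sigma[i])
    b = [sigma[i] * Aty[i] for i in range(N)]
    # Solve B r = b by Gaussian elimination with partial pivoting.
    aug = [B[i][:] + [b[i]] for i in range(N)]
    for col in range(N):
        piv = max(range(col, N), key=lambda k: abs(aug[k][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for row in range(col + 1, N):
            f = aug[row][col] / aug[col][col]
            for k in range(col, N + 1):
                aug[row][k] -= f * aug[col][k]
    r = [0.0] * N
    for i in range(N - 1, -1, -1):
        r[i] = (aug[i][N] - sum(aug[i][k] * r[k] for k in range(i + 1, N))) / aug[i][i]
    return r

# With the true support and noiseless y = A (xi ∘ x), the step recovers x on the support.
A = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0], [0.5, -0.5, 1.0]]
x = [2.0, -1.0, 0.0]
sigma = [1, 1, 0]
y = [sum(A[m][i] * x[i] for i in range(3)) for m in range(4)]
r = cdp_signal_estimation(A, y, sigma)
print([round(v, 6) for v in r])  # → [2.0, -1.0, 0.0]
```

The off-support rows of the system matrix are purely diagonal with zero right-hand side, which is why those components of r come out exactly zero.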
FIG. 1: A quantum-classical hybrid system for L0-regularization-based CS consisting of a coherent Ising machine (CIM) for support estimation (a) and a classical digital processor (CDP) for signal estimation (b). This system realizes the alternating minimization described in Algorithm 1. Pump pulses are injected into an optical parametric oscillator (OPO) formed in a fiber ring cavity through a second harmonic generation (SHG) crystal. A periodically poled lithium niobate (PPLN) waveguide device induces phase-sensitive degenerate optical parametric amplification of the signal pulses, and each of the OPO pulses takes either the 0-phase state (corresponding to the up-spin) or the π-phase state (corresponding to the down-spin) above the oscillation threshold. Part of each pulse is picked off from the main cavity by the output coupler and measured by optical homodyne detectors. A field-programmable gate array (FPGA) calculates the feedback signal, which is then provided to the intensity modulator (IM) and phase modulator (PM) to produce the injection field described in Eq. (5) for each of the OPO pulses through the input coupler. H(X_i) is a binarized value, either 0 or 1, of the in-phase amplitude of the i-th OPO pulse, which is the support estimate to be transferred to the CDP. The CDP solves the linear simultaneous equations (Eq. (7)), and the solution r_i is transferred to the CIM.

FIG. 2: Comparison of solutions of the macroscopic equations with solutions of Algorithm 1: cases of no observation noise (i.e., β = 0) and half-Gaussian (+) and Gaussian (±) source signals. The RMSEs of the solutions are plotted as a function of the sparseness a for various thresholds η and compression rates α. a Comparison of solutions of the macroscopic equation (12) and those of Algorithm 1 with A_s² = 250.
b Comparison of solutions of the macroscopic equation (13) and those of Algorithm 1 with A_s² = 10. In a and b, the red lines and green lines respectively indicate the RMSEs of the near-zero-RMSE state and the non-zero-RMSE state in CIM L0-regularization-based CS, which were obtained with the macroscopic equation (12) with A_s² = 250 and the macroscopic equation (13). The blue lines are the RMSEs of the near-zero-RMSE state in LASSO, which were obtained with the macroscopic equation (37) with the same threshold value of η indicated above the graphs. The circles and error bars represent mean values and standard deviations of ten trial solutions numerically obtained by Algorithm 1. To confirm the existence of solutions of Algorithm 1 corresponding to the near-zero-RMSE states indicated by the macroscopic equation, r was initialized to the true signal value, i.e., x ∘ ξ, and η was kept constant by setting η_init = η_end to the value of η indicated above the graphs. For all graphs, K̃ = 0.25 and N = 2000.
FIG. 3: Phase diagrams of CIM L0-regularization-based CS in the limit A_s² → ∞ and of LASSO for various η: cases of no observation noise (i.e., β = 0) and half-Gaussian (+) and Gaussian (±) source signals. a Phase diagrams of CIM L0-regularization-based CS. b Phase diagrams of LASSO. In a, the red lines show the first-order phase transition point a_c from the near-zero-RMSE state as a function of α, and the black dotted-dashed line in each plot indicates the lower bound of the first-order transition points from the near-zero-RMSE state in CIM L0-regularization-based CS. In b, the blue lines show the first-order phase transition point a_c of LASSO as a function of α. The black solid line in each plot is a critical line indicating the boundary of whether L0-minimization-based CS can perfectly reconstruct the source signal with no error for the non-negative and signed cases [21], while the black dotted line is a critical line indicating the boundary of whether L1-minimization-based CS can perfectly reconstruct the source signal with no error for the non-negative and signed cases [19].

FIG. 4: Basin of attraction of CIM L0-regularization-based CS depending on the initial threshold η_init: cases of no observation noise (i.e., β = 0) and half-Gaussian (+) and Gaussian (±) source signals. a Final states of Algorithm 1 when starting from various initial states for various η_init. The pairs of points connected by a line indicate the initial and final states of each trial when η_end = 0.01 and A_s² = 10. b Final states of Algorithm 1 when starting from the initial state r = 0 for various η_init.
The circles and error bars represent the mean values and standard deviations of twenty trial solutions numerically obtained by Algorithm 1 with η_end = 0.01 and A_s = 10. The red lines show the solutions of the macroscopic equation (13) with near-zero RMSE when η = 0.01 and A_s → ∞, while the blue lines indicate the RMSEs of LASSO when η = 0.01. The black lines present the lower bounds of the first-order phase transition points of CIM L0-regularization-based CS. ˜K = 0.25 and N = 4000.

FIG. 5: RMSEs under the optimal threshold when there is observation noise: case of half-Gaussian (+) source signals. The standard deviation of the observation noise was set to β = 0.01 and 0.1. a Comparison of RMSEs of CIM L0-regularization-based CS and those of LASSO under the optimal threshold for each method. The color scale indicates the minimum RMSE under the optimal threshold at each point (a, α), obtained by a grid search over a range of η for the set of solutions to the macroscopic equations (13) and (37). b Difference in minimum RMSE between LASSO and CIM L0-regularization-based CS under the optimal threshold for each method. The color scale indicates the minimum RMSE of CIM L0-regularization-based CS subtracted from that of LASSO at each point (a, α). c Comparison of solutions of the macroscopic equation (13) and those of Algorithm 1 with A_s = 10. The red solid lines show the near-zero-RMSE solutions to the macroscopic equation (13). The circles and error bars represent the mean values and standard deviations of ten trial solutions numerically obtained by Algorithm 1 when starting from the initial state r = 0. The value of η indicated on the right side of the graphs in c is the optimal threshold at α = 0.5, which was set as the value of η_end. For all the graphs in c, η_init = 0.6, ˜K = 0.25 and N = 4000.

FIG. 6: RMSEs under the optimal threshold when there is observation noise: case of Gaussian (±) source signals. The standard deviation of the observation noise was set to β = 0.01 and 0.1. The methods and conditions for obtaining these graphs are the same as in Fig. 5 except for the probability distribution of the source signals. a Comparison of RMSEs of CIM L0-regularization-based CS and those of LASSO under the optimal threshold for each method. The color scale indicates the minimum RMSE under the optimal threshold at each point (a, α). b Difference in minimum RMSE between LASSO and CIM L0-regularization-based CS under the optimal threshold for each method. The color scale indicates the minimum RMSE of CIM L0-regularization-based CS subtracted from that of LASSO at each point (a, α). c Comparison of solutions of the macroscopic equation (13) and those of Algorithm 1 with A_s = 10 when starting from the initial state r = 0.

FIG. 7: Performance of CIM L0-regularization-based CS and other methods on realistic data. a Left: Original image consisting of 64 × 64 pixels, which is spanned by Haar basis functions. The sparseness of the original image is 0.21. Middle: k-space data (64 × 64 pixels) obtained by discrete Fourier transform of the original image. 40% of the k-space data were undersampled at the random red points; thus, the compression rate of the observation signal is 0.4. Right: Zero-filling Fourier reconstruction from the undersampled k-space data. b Reconstructed images with the lowest errors and their RMSEs. Left: CIM L0-regularization-based CS, η_init = η_end = 0.…, η = 0.… . c RMSEs as a function of the threshold η. Blue line with error bars: CIM L0-regularization-based CS (ten trials). Red line: LASSO. Circle: L1-minimization-based CS. For all methods, γ = 0.… .
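The zero-filling Fourier reconstruction used as the baseline in Fig. 7 can be sketched in a few lines of numpy: take the 2D DFT of the image, keep a random 40% of the k-space points, set the rest to zero, and apply the inverse DFT. This is an illustrative sketch only; the test image, RNG seed, and sparseness pattern here are stand-ins, not the Haar-basis image from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 64 x 64 sparse test image (the paper uses an image sparse in
# the Haar basis; a random sparse image stands in for illustration).
N = 64
image = np.zeros((N, N))
support = rng.random((N, N)) < 0.21          # sparseness ~0.21, as in Fig. 7
image[support] = rng.standard_normal(support.sum())

# Full k-space data via the 2D discrete Fourier transform.
kspace = np.fft.fft2(image)

# Undersample: keep 40% of k-space points at random (compression rate 0.4).
mask = rng.random((N, N)) < 0.4

# Zero-filling reconstruction: unobserved k-space entries are set to zero
# before the inverse DFT; the real part is the reconstructed image.
recon = np.fft.ifft2(kspace * mask).real

rmse = np.sqrt(np.mean((recon - image) ** 2))
print(f"zero-filling RMSE: {rmse:.3f}")
```

Because the unobserved frequencies are simply zeroed, the result is a linear (aliased) estimate; the CS methods compared in Fig. 7 improve on it by exploiting sparsity.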
FIG. 8: Four kinds of probability density functions used for generating the source signals in the numerical experiments. The half-Gaussian (+) and Gamma (+) densities are defined over a non-negative random variable; the Gaussian (±) and bilateral Gamma (±) densities are defined over a signed random variable. The second moments of the half-Gaussian (+) and Gaussian (±) are set to ⟨x²⟩_x = 1. The shape and scale parameters of the Gamma (+) and bilateral Gamma (±) are set to k = 2 and θ = 0.4, so their second moments are ⟨x²⟩_x = 0.96. The figures in the main text show results for source signals generated from the half-Gaussian (+) and Gaussian (±), while the supplementary figures show results for source signals from the Gamma (+) and bilateral Gamma (±).
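The four source distributions of Fig. 8 and their stated second moments are easy to verify with a short Monte Carlo sketch. The sample size and RNG seed below are illustrative choices, not from the paper; for the Gamma case, ⟨x²⟩ = kθ² + (kθ)² = 0.32 + 0.64 = 0.96.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000  # sample size for the Monte Carlo check (illustrative choice)

# Half-Gaussian (+): |z| with z ~ N(0, 1), so the second moment <x^2> = 1.
half_gauss = np.abs(rng.standard_normal(n))

# Gaussian (±): z ~ N(0, 1), second moment <x^2> = 1.
gauss = rng.standard_normal(n)

# Gamma (+): shape k = 2, scale theta = 0.4.
# Second moment = k*theta^2 + (k*theta)^2 = 0.32 + 0.64 = 0.96.
gamma = rng.gamma(shape=2.0, scale=0.4, size=n)

# Bilateral Gamma (±): a Gamma (+) draw with a uniformly random sign,
# which leaves the second moment unchanged at 0.96.
bilateral = gamma * rng.choice([-1.0, 1.0], size=n)

for name, x in [("half-Gaussian(+)", half_gauss), ("Gaussian(+-)", gauss),
                ("Gamma(+)", gamma), ("bilateral Gamma(+-)", bilateral)]:
    print(f"{name}: <x^2> ~ {np.mean(x**2):.3f}")
```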