Computing One-bit Compressive Sensing via Double-Sparsity Constrained Optimization
Shenglong Zhou ([email protected]), School of Mathematics, University of Southampton, UK
Ziyan Luo ([email protected]) and Naihua Xiu ([email protected]), Department of Applied Mathematics, Beijing Jiaotong University, China
Abstract
One-bit compressive sensing is popular in signal processing and communications due to its low storage costs and low hardware complexity. However, it has been a challenging task all along since only one-bit (the sign) information is available to recover the signal. In this paper, we appropriately formulate the one-bit compressed sensing as a double-sparsity constrained optimization problem. The first-order optimality conditions for this nonconvex and discontinuous problem are established via the newly introduced τ-stationarity, based on which a gradient projection subspace pursuit (GPSP) approach with global convergence and a fast convergence rate is proposed. Numerical experiments against other leading solvers illustrate the high efficiency of our proposed algorithm in terms of computation time as well as the quality of the signal recovery.
Keywords:
One-bit compressive sensing, double-sparsity constrained optimization, optimality conditions, gradient projection subspace pursuit, global convergence
Mathematical Subject Classification: · · ·

Introduction

Compressive sensing (CS) has seen evolutionary advances in theory and algorithms in the past few decades since it was introduced in the ground-breaking papers [6, 5, 10]. It aims to reconstruct a sparse signal x from an underdetermined linear system Φx = b, where Φ ∈ R^{m×n} is the measurement matrix and b ∈ R^m is the measurement observation. To reduce storage costs and hardware complexity, Boufounos and Baraniuk [3] benefited from the one-bit quantization case of CS, where only the sign information of the measurements is preserved, that is,

  c = sgn(Φx).

Here, sgn(t) returns one if t is positive and negative one otherwise, and thus c_i ∈ {1, −1}, i ∈ [m] := {1, 2, · · · , m}, is the one-bit measurement. This gives rise to the one-bit CS. It was then extensively applied in communications [26, 30, 35], wireless sensor networks [22, 11, 33, 7], cognitive radio [19, 12], and elsewhere [14, 9, 21].

The task of one-bit CS is to construct the sparse signal from the one-bit measurements. The ideal optimization model is the following ℓ_0-norm minimization,

  min_{x ∈ R^n} ‖x‖_0,  s.t. c = sgn(Φx),   (1.1)

where ‖x‖_0 is the ℓ_0-norm of x, counting the number of its non-zero entries. An impressive body of work has developed numerical algorithms for solving the above problem, but most of it placed the interest in its approximations due to the NP-hardness. The earliest one can be traced back to [3], where (1.1) was relaxed by

  min_{x ∈ R^n} ‖x‖_1,  s.t. Ax ≥ 0, ‖x‖ = 1.   (1.2)

Here, A := Diag(c)Φ, Diag(c) represents the diagonal matrix with diagonal entries from c, ‖·‖_1 is the ℓ_1-norm, and ‖·‖ is the Euclidean norm. A popular approach to dealing with the inequality constraints in (1.2) is penalization, leading to the following optimization,

  min_{x ∈ R^n} ‖x‖_1 + λφ(x),  s.t. ‖x‖ = 1,   (1.3)

where λ > 0 and φ : R^n → R is a loss function. In [3], the authors adopted the one-sided ℓ_2 function φ(x) := ‖(−Ax)_+‖², with y_+ := (max{y_1, 0}, · · · , max{y_m, 0})^⊤, and employed a renormalized fixed point iteration algorithm. Since the targeted problem is a nonconvex optimization, no convergence result was provided. The same problem was also addressed by a restricted step shrinkage algorithm [18], where the generated sequence was proved to converge to a stationary point of the penalty problem if some slightly strong assumptions on the sequence were satisfied.

Following the work in [3], Boufounos modified CoSaMP [27], one of the most popular greedy methods in CS, to derive the matching sign pursuit method [2]. It turned out to address the sparsity constrained model,

  min_{x ∈ R^n} φ(x),  s.t. ‖x‖_0 = s, ‖x‖ = 1,   (1.4)

where s ≪ n is a given sparsity level and φ is the one-sided ℓ_2 function. Based on the framework of the famous iterative hard thresholding algorithm, the modified version BIHT was then developed in [17] to solve the problem (1.4). Apart from the one-sided ℓ_2 function, BIHT was also able to process the one-sided ℓ_1 function, namely, φ(x) = ‖(−Ax)_+‖_1.
It was claimed that, with high probability, the distance between a signal reconstructed by BIHT and the original one can be bounded by a prefixed accuracy if the former quantizes to the same quantization point as the latter. As a consequence, the method enjoys a local convergence property. The latest work on the problem (1.4) includes the robust binary iterative hard thresholding [13], the soft consistency reconstructions [4], the binary iterative re-weighted method [31] and the pinball loss iterative hard thresholding [16].

In [15], the authors took advantage of ℓ_0-regularized least squares to deal with the one-bit CS regardless of the sign information of Φx, namely,

  min_{x ∈ R^n} ‖x‖_0 + λ‖c − Φx‖².   (1.5)

As stated there, with high probability, the distance between the solution to the model (up to a constant) and a sparse solution can be bounded by a prefixed accuracy if the sample size m is greater than a threshold. A primal dual active set algorithm was then proposed to solve the above model and proved to converge within one step under two assumptions: the columns of the matrix Φ indexed on the non-zero components of the sparse solution form a full-rank sub-matrix, and the initial point is sufficiently close to the sparse solution. Therefore, the generated sequence again has a local convergence property.

When the number of sign flips k (≪ m) is provided, the authors in [34] integrated a sparse variable w into the problem (1.4). The non-zero components in w represent the measurements that have sign flips. The resulting optimization problem is

  min_{x ∈ R^n, w ∈ R^m} ‖(−Diag(c)(Φx + w))_+‖_p^p,   (1.6)
  s.t. ‖x‖_0 ≤ s, ‖x‖ = 1, ‖w‖_0 ≤ k,

where p = 1 or 2. To tackle the above problem, an alternating minimization method (adaptive outliers pursuit, AOP) was cast: solving for one variable while fixing the other. However, AOP has been observed to rely heavily on the choice of k, and its convergence result remains to be seen. Other work relating to (1.6) includes the noise-adaptive renormalized fixed point iteration approach [24] and the noise-adaptive restricted step shrinkage [25].

When the number of the sign flips is unavailable, a compensation pursues a solution with as few sign flips as possible, which can be fulfilled by the following one-sided ℓ_0 function minimization [8],

  min_{x ∈ R^n} ‖(ε1 − Ax)_+‖_0 + η‖x‖²,  s.t. ‖x‖_0 ≤ s,   (1.7)

where η and ε are given positive parameters, 1 denotes the vector of all ones, and ε is used to majorize the objective function. The first term in the objective function arises from maximizing a posterior estimation from the perspective of statistics. It returns the number of positive components of (ε1 − Ax) and can be regarded as the number of the sign flips if ε is quite small. Instead of solving the one-sided ℓ_0 model directly, a fixed-point algorithm [8] was created for its approximation,

  min_{x ∈ R^n, w ∈ R^m} ‖(ε1 − w)_+‖_0 + μ‖w − Ax‖² + η‖x‖²,  s.t. ‖x‖_0 ≤ s,   (1.8)

where μ > 0. It has been shown that the generated sequence converges to a local minimizer of the approximation problem if the maximum singular value of the matrix A is bounded by some chosen parameters. However, the relationship between the solution obtained by the method and the original problem (1.7) has not been well explored.

Some other numerical algorithms developed to solve the one-bit CS include the convex relaxation [28], the Bayesian approach [20], the sparse consistent coding algorithms [29] and the method based on a Schur-concave function [32].

Motivated by the previous work, we formulate the one-bit CS as the following double-sparsity constrained optimization:

  min_{x ∈ R^n, y ∈ R^m} ‖Ax + y − ε1‖² + η‖x‖²,   (1.9)
  s.t. ‖x‖_0 ≤ s, ‖y_+‖_0 ≤ k,

where η > 0 and ε > 0 are given parameters, and s ≪ n and k ≪ m are two integers representing the prior information on the upper bounds of the signal sparsity and the number of sign flips, respectively. When penalizing the sign flip constraint in our model, it turns into (1.8) with the auxiliary variable y = ε1 − w. It is worth mentioning that the selection of k is very flexible in our approach, as will be shown in the numerical experiments, which reveals that our approach puts no heavy burden on the pre-knowledge of the true number of sign flips.
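To make the model concrete, the following minimal NumPy sketch evaluates the objective of (1.9) and checks the double-sparsity constraints for a candidate pair (x, y); all sizes and parameter values below are illustrative placeholders, not the paper's settings.

```python
import numpy as np

def f_obj(A, x, y, eps, eta):
    # Objective of (1.9): ||Ax + y - eps*1||^2 + eta*||x||^2.
    r = A @ x + y - eps * np.ones(A.shape[0])
    return r @ r + eta * (x @ x)

def is_feasible(x, y, s, k):
    # Double-sparsity constraints: ||x||_0 <= s and ||y_+||_0 <= k.
    return np.count_nonzero(x) <= s and np.count_nonzero(y > 0) <= k

# Illustrative data: c = sgn(Phi x) and A = Diag(c) Phi, as in the text.
rng = np.random.default_rng(0)
m, n, s, k, eps, eta = 8, 20, 3, 2, 0.01, 1e-4
Phi = rng.standard_normal((m, n))
x = np.zeros(n); x[:s] = 1.0; x /= np.linalg.norm(x)
c = np.where(Phi @ x > 0, 1.0, -1.0)      # sgn: +1 if positive, -1 otherwise
A = np.diag(c) @ Phi
y = np.zeros(m)
print(f_obj(A, x, y, eps, eta), is_feasible(x, y, s, k))
```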
Our contributions in this paper are threefold:

(1) The new optimization model. The double-sparsity constrained optimization problem (1.9) is formulated to handle the one-bit CS. It is well known that these two discrete and nonconvex constraints lead to NP-hardness in general. Nevertheless, a necessary and sufficient optimality condition as stated in (3.1) for a local minimizer is established, see Lemma 3.1. Moreover, the necessary and sufficient optimality condition for a global minimizer is also studied in terms of the newly introduced τ-stationary point, see Theorem 3.1.

(2) The efficient
GPSP algorithm.
As the established optimality conditions indicate that a τ-stationary point is instructive for pursuing an optimal solution to (1.9), we design a gradient projection method with an interpolated subspace pursuit scheme (GPSP). The proposed method is proved to be globally convergent to a unique local minimizer without any assumptions, see Theorem 4.1. Furthermore, the produced sequence either enjoys a Q-linear convergence rate or is identical to the limiting point after finitely many iterations. Particularly,
GPSP will stop within finitely many steps once the limiting point reaches the upper bounds of the corresponding double-sparsity, see Theorem 4.2.

(3)
High numerical performance.
GPSP is demonstrated to be relatively robust to the parameters k, ε, η in (1.9) in the numerical experiments, which indicates that we do not need an exact upper bound k on the sign flips. Most importantly, GPSP outperforms some leading solvers on synthetic data, in both time efficiency and recovery accuracy.
The remainder of the paper is organized as follows. In Section 2, some necessary mathematical background is provided, including the notation and the projection onto the feasible set of the problem (1.9). Section 3 is devoted to the optimality conditions of the problem, associated with the τ-stationary points, followed by their relationship to the global minimizers. In Section 4, the gradient projection subspace pursuit (GPSP) method is designed, and the global convergence and the Q-linear convergence rate are established. Numerical experiments are given in Section 5, including the tuning of the involved parameters (s, k, ε, η) and comparisons with other state-of-the-art solvers. Concluding remarks are made in Section 6.
Preliminaries
We first define some notation employed throughout this paper. To distinguish it from sgn(t), the sign function is written as sign(t), which returns 0 if t = 0 and sgn(t) otherwise. Given a subset T ⊆ [n] := {1, 2, · · · , n}, its cardinality and complementary set are |T| and T̄ := [n] \ T. For a vector x ∈ R^n, the support set supp(x) represents the indices of the non-zero elements of x, and the neighbourhood with a radius δ > 0 is N(x, δ) := {w ∈ R^n : ‖w − x‖ < δ}. Let ‖x‖_[i] be the i-th largest (in absolute value) element of x. In addition, x_T stands for the sub-vector containing the elements of x indexed on T. Similarly, for a matrix A ∈ R^{m×n}, A_{ΓT} is the sub-matrix containing rows indexed on Γ and columns indexed on T; particularly, A_{:T} = A_{[m]T}. Moreover, we merge two vectors x and y by z := (x; y) := (x^⊤ y^⊤)^⊤. For a positive definite matrix H, the H-weighted norm is written as ‖z‖²_H = ⟨z, Hz⟩, where ⟨z, z'⟩ := Σ_i z_i z'_i is the inner product of two vectors. Given a scalar a ∈ R, ⌈a⌉ returns the smallest integer that is no less than a. For simplicity, denote

  S := {x ∈ R^n : ‖x‖_0 ≤ s},  K := {y ∈ R^m : ‖y_+‖_0 ≤ k}.   (2.1)

The feasible region of (1.9) is then denoted by F := S × K, with its interior

  int F := {(x, y) ∈ R^n × R^m : ‖x‖_0 < s, ‖y_+‖_0 < k}.

For a nonempty and closed set Ω ⊆ R^n, the projection Π_Ω(x) of x ∈ R^n onto Ω is given by Π_Ω(x) = argmin{‖x − w‖ : w ∈ Ω}. By introducing

  Σ(x; s) := { T ⊆ [n] : |T| = s, |x_i| ≥ |x_j|, ∀ i ∈ T, ∀ j ∉ T },   (2.2)

one can easily verify that

  Π_S(x) = { (x_T; 0) : T ∈ Σ(x; s) }.   (2.3)

To derive the projection of a point y ∈ R^m onto K, denote

  Γ_+ := {i ∈ [m] : y_i > 0},  Γ_0 := {i ∈ [m] : y_i = 0},  Γ_− := {i ∈ [m] : y_i < 0}.   (2.4)

Strictly speaking, Γ_+, Γ_0 and Γ_− depend on y; we drop their dependence if no extra explanations are provided, for the sake of notational convenience. Based on the above notation, for a point y ∈ R^m and an integer k ∈ [m], we define a set by

  Θ(y; k) := { Γ = (Γ_k ∪ Γ_−) ⊆ [m] : Γ_k ⊆ Γ_+, |Γ_k| = min{k, |Γ_+|}, y_i ≥ y_j ≥ 0, ∀ i ∈ Γ_k, ∀ j ∉ Γ },   (2.5)

where Γ_+, Γ_0 and Γ_− are given by (2.4). One can observe that any Γ ∈ Θ(y; k) consists of the indices of all negative elements and the first min{k, |Γ_+|} largest positive elements of y. This notation allows us to derive the projection Π_K(y) by

  Π_K(y) = { (y_Γ; 0) : Γ ∈ Θ(y; k) }.   (2.6)

A simple example is presented for illustration. Given y = (3, 2, 2, 0, −1)^⊤, we have

  Θ(y; 3) = { {1, 2, 3, 5} },  Π_K(y) = {y},
  Θ(y; 2) = { {1, 2, 5}, {1, 3, 5} },  Π_K(y) = { (3, 2, 0, 0, −1)^⊤, (3, 0, 2, 0, −1)^⊤ }.
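The two projections admit simple implementations by sorting. The following minimal NumPy sketch (the function names are ours) returns one element of each projection set; ties, as in Θ(y; 2) above, are broken arbitrarily.

```python
import numpy as np

def proj_S(x, s):
    # Keep the s largest-in-magnitude entries of x, zero out the rest.
    z = np.zeros_like(x)
    T = np.argsort(-np.abs(x))[:s]
    z[T] = x[T]
    return z

def proj_K(y, k):
    # Keep all negative entries and the min{k, |Gamma_+|} largest
    # positive entries of y; zero out the remaining positive entries.
    z = np.where(y < 0, y, 0.0)
    pos = np.flatnonzero(y > 0)
    keep = pos[np.argsort(-y[pos])[:k]]
    z[keep] = y[keep]
    return z

y = np.array([3.0, 2.0, 2.0, 0.0, -1.0])
print(proj_K(y, 3))   # [ 3.  2.  2.  0. -1.], i.e. y itself
print(proj_K(y, 2))   # one of (3,2,0,0,-1) or (3,0,2,0,-1)
```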
To end this section, we present some properties of the objective function in (1.9), which can be written in the following form:

  f(x, y) := ‖Ax + y − ε1‖² + η‖x‖² = ‖z‖²_H − 2⟨b, z⟩ + mε² =: f(z),   (2.7)

where 1 is the vector of all ones, and the matrix H and the vector b are given by

  H := (1/2)∇²f(z) = [ A^⊤A + ηI  A^⊤ ;  A  I ],   b = ε (A^⊤1; 1).

It is easy to verify that H is symmetric positive definite and hence has all eigenvalues positive. Denote its smallest and largest eigenvalues by λ_min and λ_max, respectively. The quadratic objective function f is then strongly convex and strongly smooth since, for any z and z' in R^{n+m},

  f(z) − f(z') − ⟨∇f(z'), z − z'⟩ = ‖z − z'‖²_H ∈ [ λ_min‖z − z'‖², λ_max‖z − z'‖² ].   (2.8)
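A minimal sketch that assembles H and b from (2.7) and computes λ_min and λ_max numerically; the data below are random placeholders, and the function name is ours.

```python
import numpy as np

def build_H_b(A, eps, eta):
    # H = [[A^T A + eta*I, A^T], [A, I]],  b = eps * (A^T 1; 1), as in (2.7).
    m, n = A.shape
    H = np.block([[A.T @ A + eta * np.eye(n), A.T],
                  [A, np.eye(m)]])
    b = eps * np.concatenate([A.T @ np.ones(m), np.ones(m)])
    return H, b

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 10))
H, b = build_H_b(A, eps=0.01, eta=1e-4)
eigs = np.linalg.eigvalsh(H)
lam_min, lam_max = eigs[0], eigs[-1]
print(lam_min > 0, lam_min, lam_max)   # H is positive definite for eta > 0
```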
Optimality Conditions

The first-order necessary and sufficient optimality conditions for (1.9) are established in this section.
Lemma 3.1
A local minimizer z* := (x*; y*) ∈ F of (1.9) must satisfy

  ∇_x f(z*) = 0,  if ‖x*‖_0 < s,
  (∇_x f(z*))_{T*} = 0,  if ‖x*‖_0 = s,
  ∇_y f(z*) = 0,  if ‖y*_+‖_0 < k,
  (∇_y f(z*))_{Γ*} = 0, (∇_y f(z*))_{Γ̄*} ≤ 0,  if ‖y*_+‖_0 = k,   (3.1)

where T* := supp(x*), Γ* := supp(y*). Conversely, let a point z* ∈ F satisfy (3.1). Then it is the unique global minimizer if z* ∈ int F and the unique local minimizer otherwise. Furthermore, there is a δ* > 0 such that, for any z ∈ F ∩ N(z*, δ*), we have the following quadratic growth property:

  f(z) − f(z*) ≥ ‖z − z*‖²_H.   (3.2)

Proof
Let z* be a local minimizer of (1.9). Then there is a δ_0 > 0 such that, for any z ∈ F ∩ N(z*, δ_0),

  0 ≤ f(z) − f(z*) ≤ ⟨∇f(z*), z − z*⟩ + λ_max‖z − z*‖² =: g(z),   (3.3)

where the second inequality is from (2.8). To verify that z* satisfies (3.1), we consider the four involved cases.

• ‖y*_+‖_0 < k. If there is an i with (∇_y f(z*))_i ≠ 0, then for any t ∈ R define

  x_t := x*,  y_t := y* + t(∇_y f(z*))_i · e_i,  z_t := (x_t; y_t),   (3.4)

where e_i ∈ R^m is the i-th column of the identity matrix. It is easy to see that z_t ∈ F for small |t| and g(z_t) = (t + λ_max t²)[(∇_y f(z*))_i]². For any t ∈ (max{−δ_0/|(∇_y f(z*))_i|, −1/λ_max}, 0), we have z_t ∈ F ∩ N(z*, δ_0) and g(z_t) < 0, contradicting (3.3). Thus ∇_y f(z*) = 0.

• ‖y*_+‖_0 = k. If there is an i ∈ Γ* such that (∇_y f(z*))_i ≠ 0, then let x_t, y_t and z_t be as in (3.4); note that perturbing a coordinate in Γ* keeps z_t feasible for small |t|. The same reasoning as for the case ‖y*_+‖_0 < k proves (∇_y f(z*))_i = 0. This displays (∇_y f(z*))_{Γ*} = 0. To show the desired inequality (∇_y f(z*))_{Γ̄*} ≤ 0, take any i ∈ Γ̄*. For any t ≤ 0, let

  x_t := x*,  y_t := y* + t · e_i,  z_t := (x_t; y_t).   (3.5)

It follows from y*_i = 0 and t ≤ 0 that (y_t)_+ = y*_+ and ‖z_t − z*‖ = −t. Thus, z_t ∈ F ∩ N(z*, δ_0) for any t ∈ (−δ_0, 0], and hence

  g(z_t) = t(∇_y f(z*))_i + λ_max t² ≥ 0,

which implies (∇_y f(z*))_i + λ_max t ≤ 0 for t < 0. Letting t → 0 gives (∇_y f(z*))_i ≤ 0. This delivers (∇_y f(z*))_{Γ̄*} ≤ 0.

• ‖x*‖_0 < s. The same reasoning as for the case ‖y*_+‖_0 < k leads to ∇_x f(z*) = 0.

• ‖x*‖_0 = s. The same reasoning as for the case ‖y*_+‖_0 < k yields (∇_x f(z*))_{T*} = 0.

Conversely, let z* satisfy (3.1). We consider the following four cases.

• ‖y*_+‖_0 < k. This leads to ⟨∇_y f(z*), y − y*⟩ = 0 since ∇_y f(z*) = 0 by (3.1).

• ‖y*_+‖_0 = k. Consider a local region N(z*, δ_1) with δ_1 := min{y*_i : y*_i > 0}. Thus, for any z ∈ F ∩ N(z*, δ_1), we have

  y_j > 0 if y*_j > 0, and y_i ≤ 0, ∀ i ∈ Γ̄*.   (3.6)

In fact, if there existed an i ∈ Γ̄* satisfying y_i > 0, then ‖y_+‖_0 ≥ ‖y*_+‖_0 + 1 = k + 1, which contradicts y ∈ K. Direct calculations yield

  ⟨∇_y f(z*), y − y*⟩ = ⟨0, (y − y*)_{Γ*}⟩ + ⟨(∇_y f(z*))_{Γ̄*}, y_{Γ̄*}⟩ ≥ 0,

where the equality uses (3.1) and the inequality uses (3.1) and (3.6).

• ‖x*‖_0 < s. This yields ⟨∇_x f(z*), x − x*⟩ = 0 due to ∇_x f(z*) = 0 by (3.1).

• ‖x*‖_0 = s. Consider a local region N(z*, δ_2) with δ_2 := min{|x*_i| : x*_i ≠ 0}. For any z := (x; y) ∈ F ∩ N(z*, δ_2), x_j ≠ 0 if x*_j ≠ 0, which indicates T* ⊆ supp(x). This together with x ∈ S, i.e., ‖x‖_0 ≤ s = |T*|, suffices to

  supp(x) = T*.   (3.7)

Now one can verify, by (3.1) and (3.7), that

  ⟨∇_x f(z*), x − x*⟩ = ⟨0, (x − x*)_{T*}⟩ + ⟨(∇_x f(z*))_{T̄*}, 0⟩ = 0.

If z* ∈ int F, then ∇f(z*) = 0. Combining this with (2.8), we obtain

  f(z) − f(z*) = ‖z − z*‖²_H + ⟨∇f(z*), z − z*⟩ > 0, ∀ z ≠ z*.

Thus, z* is the unique global optimal solution to the problem (1.9). If z* ∉ int F, then for any z ∈ F ∩ N(z*, min{δ_1, δ_2}), it follows from (2.8) that

  f(z) − f(z*) − ‖z − z*‖²_H = ⟨∇_x f(z*), x − x*⟩ + ⟨∇_y f(z*), y − y*⟩ ≥ 0.

Therefore, z* is the unique global minimizer of min{f(z) : z ∈ F ∩ N(z*, min{δ_1, δ_2})}, which means it is the unique local minimizer of the problem (1.9).

The above lemma shows the optimality conditions for a point being a local minimizer. We further establish the conditions for a global minimizer. To do that, we introduce the τ-stationary point. A point z* := (x*; y*) is called a τ-stationary point of (1.9) with some τ > 0 if

  z* ∈ Π_F(z* − τ∇f(z*)).   (3.8)
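Definition (3.8) can be checked numerically. A minimal sketch, assuming proj_S, proj_K and build_H_b from the earlier sketches; the function names and the tolerance are ours.

```python
import numpy as np

def grad_f(H, b, z):
    # f(z) = ||z||_H^2 - 2<b, z> + m*eps^2, so grad f(z) = 2(Hz - b).
    return 2.0 * (H @ z - b)

def is_tau_stationary(H, b, x, y, s, k, tau, tol=1e-10):
    # Check z in Pi_F(z - tau*grad f(z)) via one selection from each
    # projection set; this is conclusive when the projection is unique.
    n = x.size
    z = np.concatenate([x, y])
    g = grad_f(H, b, z)
    xp = proj_S(x - tau * g[:n], s)
    yp = proj_K(y - tau * g[n:], k)
    return np.linalg.norm(np.concatenate([xp, yp]) - z) <= tol
```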
An equivalent characterization is presented in the following lemma.

Lemma 3.2 A point z* is a τ-stationary point of the problem (1.9) with some τ > 0 if and only if it satisfies

  ‖x*‖_0 ≤ s,  τ(∇_x f(z*))_i = 0 if i ∈ T*, and τ(∇_x f(z*))_i ∈ [−‖x*‖_[s], ‖x*‖_[s]] if i ∉ T*,
  ‖y*_+‖_0 ≤ k,  τ(∇_y f(z*))_i = 0 if i ∈ Γ*, and τ(∇_y f(z*))_i ∈ [−‖y*_+‖_[k], 0] if i ∉ Γ*.   (3.9)

Proof A τ-stationary point satisfies (3.8), which is equivalent to

  x* ∈ Π_S(x* − τ∇_x f(z*)),  y* ∈ Π_K(y* − τ∇_y f(z*)).   (3.10)

Therefore, we show the equivalence between (3.9) and (3.10). For the x* part, this is guaranteed by [1, Lemma 2.2]. For the y* part, the projection (2.6) enables us to show (3.9) ⇒ (3.10), so we only prove (3.10) ⇒ (3.9). Let λ* := ∇_y f(z*). It follows from (2.6) that

  y* ∈ Π_K(y* − τλ*) = { ((y* − τλ*)_Γ; 0) : Γ ∈ Θ(y* − τλ*; k) }.

This derives ‖y*_+‖_0 ≤ k, and for any Γ ∈ Θ(y* − τλ*; k),

  y*_{Γ̄} = 0,  λ*_Γ = 0,  y* − τλ* = (y*_Γ; −τλ*_{Γ̄}),   (3.11)

which together with the definition of Θ(y* − τλ*; k) in (2.5) gives rise to

  Γ = Γ_k ∪ Γ_− = supp(y*),  Γ̄ = (Γ_+ \ Γ_k) ∪ Γ_0,

where Γ_+, Γ_0 and Γ_− are defined as in (2.4) with y replaced by y* − τλ*. On the index set Γ̄, all elements satisfy y*_i − τλ*_i = −τλ*_i ≥ 0, namely, λ*_i ≤ 0, i ∈ Γ̄.

For ‖y*_+‖_0 < k, suppose there is an i ∈ Γ̄ such that λ*_i < 0. Then y* − τλ* has at least ‖y*_+‖_0 + 1 ≤ k positive entries, and thus

  ‖y*_+‖_0 = ‖(Π_K(y* − τλ*))_+‖_0 ≥ ‖y*_+‖_0 + 1.

This is a contradiction. So λ*_{Γ̄} = 0, leading to λ* = 0 by (3.11), which satisfies (3.9).

For ‖y*_+‖_0 = k, (3.9) is satisfied for any j ∈ supp(y*) = Γ due to λ*_Γ = 0. For j ∉ supp(y*), namely, j ∈ Γ̄, the definition of Γ_k in (2.5) yields

  0 ≤ y*_j − τλ*_j ≤ y*_i − τλ*_i, ∀ i ∈ Γ_k,

which together with Γ_k ⊆ Γ and (3.11) results in

  0 ≤ −τλ*_j ≤ y*_i, ∀ i ∈ Γ_k.

Hence, −‖y*_+‖_[k] = −min_{i∈Γ_k} y*_i ≤ τλ*_j ≤ 0 for all j ∈ Γ̄ (j ∉ supp(y*)), showing (3.9).

The following theorem reveals the relationship between τ-stationary points and global minimizers of the problem (1.9).
Theorem 3.1 For (1.9) and a point z* ∈ F, the following statements hold.

a) For z* ∈ int F, the point z* is a global minimizer if and only if it is a τ-stationary point with some τ > 0.

b) For z* ∉ int F, a global minimizer z* is a τ-stationary point with 0 < τ ≤ 1/(2λ_max); conversely, a τ-stationary point with τ ≥ 1/(2λ_min) is also a global minimizer.

Proof a) If z* ∈ int F is a global minimizer, it follows readily from Lemma 3.1 that ∇f(z*) = 0. Thus, by definition, z* is a τ-stationary point for any τ > 0. Conversely, if z* ∈ int F is a τ-stationary point for some τ > 0, then ‖x*‖_[s] = ‖y*_+‖_[k] = 0, which further implies ∇_x f(z*) = 0 and ∇_y f(z*) = 0 by (3.9). Applying Lemma 3.1 again, one can conclude that z* is a global minimizer.

b) Let z* be a global minimizer. If it is not a τ-stationary point with 0 < τ ≤ 1/(2λ_max), then there is

  z (≠ z*) ∈ Π_F(z* − τ∇f(z*)).   (3.12)

Thus, ‖z − (z* − τ∇f(z*))‖² < ‖z* − (z* − τ∇f(z*))‖², which suffices to

  2τ⟨∇f(z*), z − z*⟩ < −‖z − z*‖².

Together with (2.8) and 0 < τ ≤ 1/(2λ_max), this derives

  f(z) − f(z*) ≤ ⟨∇f(z*), z − z*⟩ + λ_max‖z − z*‖² < (λ_max − 1/(2τ))‖z − z*‖² ≤ 0.

It contradicts the global optimality of z*. Therefore, z* is a τ-stationary point with 0 < τ ≤ 1/(2λ_max).

Conversely, let z* be a τ-stationary point with τ ≥ 1/(2λ_min). The condition (3.8) and the definition of Π_F indicate

  ‖z* − (z* − τ∇f(z*))‖² ≤ ‖z − (z* − τ∇f(z*))‖²

for any z ∈ F, delivering 2τ⟨∇f(z*), z − z*⟩ ≥ −‖z − z*‖². This and (2.8) yield

  f(z) − f(z*) ≥ ⟨∇f(z*), z − z*⟩ + λ_min‖z − z*‖² ≥ (λ_min − 1/(2τ))‖z − z*‖² ≥ 0.

Since τ ≥ 1/(2λ_min), the above relation shows the global optimality of z* to (1.9).

Gradient Projection Subspace Pursuit

A gradient projection method with a subspace pursuit strategy is proposed to handle the problem (1.9) by seeking a τ-stationary point. For notational simplicity, hereafter, for a parameter τ ∈ (0, 1], denote

  z^ℓ(τ) = (x^ℓ(τ); y^ℓ(τ)) ∈ Π_F(z^ℓ − τ∇f(z^ℓ))   (4.1)

for the ℓ-th iterate z^ℓ := (x^ℓ; y^ℓ). Analogous to the Γ-related indices defined for y in (2.4), we also define

  Γ^ℓ_+ := {i ∈ [m] : y^ℓ_i > 0},  Γ̃^ℓ_+ := {i ∈ [m] : (y^ℓ(τ_ℓ))_i > 0},
  Γ^ℓ_0 := {i ∈ [m] : y^ℓ_i = 0},  Γ^ℓ_− := {i ∈ [m] : y^ℓ_i < 0},

and the support sets of x^ℓ and x^ℓ(τ_ℓ) as follows:

  T^ℓ := supp(x^ℓ),  T̃^ℓ := supp(x^ℓ(τ_ℓ)).   (4.2)

Given z^ℓ = (x^ℓ; y^ℓ) ∈ F, define the following subspace:

  Ω(z^ℓ) := { z = (x; y) : x_{T^ℓ} ∈ R^{|T^ℓ|}, x_{T̄^ℓ} = 0, y_{Γ^ℓ_+} ∈ R^{|Γ^ℓ_+|}, y_{Γ^ℓ_0} = 0, y_{Γ^ℓ_−} ≤ 0 }.   (4.3)

It is easy to see that Ω(z^ℓ) ⊆ F. Based on this notation, we summarize the framework of the proposed method in Algorithm 1.
Algorithm 1: GPSP: Gradient projection subspace pursuit

  Initialize z^0 ∈ F, tol_0 = +∞, β ∈ (0, 1) and ρ, ϵ > 0. Set ℓ := 0.
  while tol_ℓ > ϵ do
    Gradient descent: Find the smallest integer σ = 0, 1, · · · such that
      f(z^ℓ(β^σ)) ≤ f(z^ℓ) − ρ‖z^ℓ(β^σ) − z^ℓ‖².   (4.4)
    Set τ_ℓ = β^σ, u^ℓ := z^ℓ(τ_ℓ) and z^{ℓ+1} = u^ℓ.
    Subspace pursuit: if T^ℓ = T̃^ℓ and Γ^ℓ_+ = Γ̃^ℓ_+ then
      v^ℓ = argmin{ f(z) : z ∈ Ω(z^ℓ) }.   (4.5)
      If f(v^ℓ) ≤ f(u^ℓ) − ρ‖v^ℓ − u^ℓ‖², then set z^{ℓ+1} = v^ℓ.
    end
    Compute tol_ℓ := ‖u^ℓ − z^ℓ‖ and set ℓ := ℓ + 1.
  end
  Output the solution x* = x^ℓ/‖x^ℓ‖.

Observing that the initial point z^0 ∈ F and Ω(z^ℓ) ⊆ F, we can see that all iterates are feasible. Particularly, if the gap tol_ℓ = ‖z^ℓ − u^ℓ‖ vanishes, then

  z^ℓ = u^ℓ ∈ Π_F(z^ℓ − τ_ℓ∇f(z^ℓ)),

which indicates that z^ℓ is a τ-stationary point with τ ≤ τ_ℓ. Additionally, once the conditions T^ℓ = T̃^ℓ and Γ^ℓ_+ = Γ̃^ℓ_+ are satisfied, we have u^ℓ ∈ Ω(z^ℓ). The fact that v^ℓ is the unique minimizer of f over Ω(z^ℓ) implies that

  ⟨∇f(v^ℓ), u^ℓ − v^ℓ⟩ ≥ 0.   (4.6)

In virtue of (2.8), we have

  f(v^ℓ) ≤ f(u^ℓ) − λ_min‖u^ℓ − v^ℓ‖².   (4.7)

Suppose 0 < ρ ≤ λ_min. Then the candidate v^ℓ will be taken, namely, z^{ℓ+1} = v^ℓ.
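A minimal NumPy sketch of the gradient-descent step with the Armijo-type rule (4.4); it assumes proj_S and proj_K from the earlier sketch, and the default values of beta and rho are illustrative, not the paper's settings.

```python
import numpy as np

def gradient_step(H, b, f, x, y, s, k, beta=0.5, rho=1e-4):
    # Find the smallest sigma such that, with tau = beta^sigma,
    #   f(z(tau)) <= f(z) - rho * ||z(tau) - z||^2,
    # where z(tau) is one projection of z - tau*grad f(z) onto F.
    n = x.size
    z = np.concatenate([x, y])
    g = 2.0 * (H @ z - b)          # grad f(z) for f(z) = ||z||_H^2 - 2<b,z> + const
    tau = 1.0                       # tau = beta^sigma, sigma = 0, 1, 2, ...
    while True:                     # terminates by Lemma 4.1 below
        xt = proj_S(x - tau * g[:n], s)
        yt = proj_K(y - tau * g[n:], k)
        zt = np.concatenate([xt, yt])
        if f(zt) <= f(z) - rho * np.sum((zt - z) ** 2):
            return xt, yt, tau      # accepted: u = z(tau_l)
        tau *= beta
```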
Remark 4.1 The main computation in Algorithm 1 occurs when updating u^ℓ and v^ℓ at each iteration.

i) To update u^ℓ, we need to select one point from Π_F(z^ℓ − τ∇f(z^ℓ)). Namely, three quantities are computed: z̄^ℓ := (x̄^ℓ; ȳ^ℓ) := z^ℓ − τ∇f(z^ℓ), Π_S(x̄^ℓ) and Π_K(ȳ^ℓ). For the former, the computational complexity is about O(mn). To select one point from Π_S(x̄^ℓ), we only pick the first s largest (in absolute value) elements of x̄^ℓ. This allows us to use the MATLAB built-in function maxk, whose computational complexity is O(n + s log s). Similarly, for Π_K(ȳ^ℓ), the computational complexity is O(m + k log k). Thus, updating u^ℓ takes a computational complexity of order O(σmn), where σ is the smallest integer satisfying (4.4).

ii) To update v^ℓ, one needs to solve a quadratic program,

  v^ℓ = argmin_{(x;y)} ‖Ax + y − ε1‖² + η‖x‖²   (4.8)
  s.t. x_{T̄^ℓ} = 0, y_{Γ^ℓ_0} = 0, y_{Γ^ℓ_−} ≤ 0,

for the fixed T^ℓ, Γ^ℓ_0 and Γ^ℓ_−. Any solver for quadratic programming can be used to solve (4.8) to pursue a solution of good quality. To further reduce the computational cost, we drop the constraint y_{Γ^ℓ_−} ≤ 0 from (4.8) and simply solve the system of equations:

  x_{T̄^ℓ} = 0,  y_{Γ^ℓ_0} = 0,
  [ A^⊤_{:T^ℓ}A_{:T^ℓ} + ηI   A^⊤_{Γ̄^ℓ_0 T^ℓ} ;  A_{Γ̄^ℓ_0 T^ℓ}   I ] (x_{T^ℓ}; y_{Γ̄^ℓ_0}) = (A^⊤_{:T^ℓ}ε1; ε1).

The solution (x̃^ℓ; ỹ^ℓ) to the above equations can be derived by

  x̃^ℓ_{T^ℓ} = [ A^⊤_{Γ^ℓ_0 T^ℓ} A_{Γ^ℓ_0 T^ℓ} + ηI ]^{−1} (A^⊤_{Γ^ℓ_0 T^ℓ} ε1),  x̃^ℓ_{T̄^ℓ} = 0,
  ỹ^ℓ_{Γ^ℓ_0} = 0,  ỹ^ℓ_{Γ̄^ℓ_0} = ε1 − A_{Γ̄^ℓ_0 T^ℓ} x̃^ℓ_{T^ℓ}.

If ỹ^ℓ_{Γ^ℓ_−} ≤ 0, namely, (x̃^ℓ; ỹ^ℓ) is the solution to (4.8), then we set v^ℓ = (x̃^ℓ; ỹ^ℓ). Otherwise, this point is not taken into consideration, and we set z^{ℓ+1} = u^ℓ. The computational complexity of addressing the above equations is about O(ms² + s³). Overall, the total computational complexity of each iteration is O(σmn + ms² + s³).
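A sketch of this simplified subspace update: the sign constraint is dropped, the reduced normal equations are solved in closed form, and the candidate is kept only if the dropped constraint holds a posteriori. The function name is ours, and the index sets are built from the current iterate.

```python
import numpy as np

def subspace_step(A, eps, eta, x, y):
    # Solve (4.8) without the sign constraint via the closed form:
    #   x_T = (A_{G0,T}^T A_{G0,T} + eta*I)^{-1} A_{G0,T}^T (eps*1),
    #   y on Gbar = eps*1 - A_{Gbar,T} x_T;  x off T and y on G0 are zero.
    m, n = A.shape
    T = np.flatnonzero(x)              # support of x^l
    G0 = np.flatnonzero(y == 0)        # zero set of y^l
    Gbar = np.flatnonzero(y != 0)      # free y-coordinates
    AT = A[:, T]
    M = AT[G0].T @ AT[G0] + eta * np.eye(T.size)
    xT = np.linalg.solve(M, AT[G0].T @ (eps * np.ones(G0.size)))
    xv = np.zeros(n); xv[T] = xT
    yv = np.zeros(m); yv[Gbar] = eps - AT[Gbar] @ xT
    # Accept only if the dropped constraint y on Gamma_- stays nonpositive.
    return (xv, yv) if np.all(yv[y < 0] <= 0) else None
```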
The first result shows that the Armijo-type step size {τ_ℓ} is well defined.

Lemma 4.1 For any 0 < τ ≤ 1/(2ρ + 2λ_max), it holds that

  f(z^ℓ(τ)) ≤ f(z^ℓ) − ρ‖z^ℓ(τ) − z^ℓ‖²,   (4.9)

and thus inf_{ℓ≥0}{τ_ℓ} ≥ τ̲ > 0, where

  τ̲ := min{ 1, β/(2ρ + 2λ_max) }.   (4.10)

Proof
It follows from z^ℓ(τ) ∈ Π_F(z^ℓ − τ∇f(z^ℓ)) that

  ‖z^ℓ(τ) − (z^ℓ − τ∇f(z^ℓ))‖² ≤ ‖z^ℓ − (z^ℓ − τ∇f(z^ℓ))‖²,

which results in

  2τ⟨∇f(z^ℓ), z^ℓ(τ) − z^ℓ⟩ ≤ −‖z^ℓ(τ) − z^ℓ‖².   (4.11)

Combining this with (2.8) leads to

  f(z^ℓ(τ)) ≤ f(z^ℓ) + ⟨∇f(z^ℓ), z^ℓ(τ) − z^ℓ⟩ + λ_max‖z^ℓ(τ) − z^ℓ‖²
            ≤ f(z^ℓ) − (1/(2τ) − λ_max)‖z^ℓ(τ) − z^ℓ‖²
            ≤ f(z^ℓ) − ρ‖z^ℓ(τ) − z^ℓ‖²,

where the last inequality is from 0 < τ ≤ 1/(2ρ + 2λ_max). Invoking the Armijo-type step size rule, one has τ_ℓ ≥ β/(2ρ + 2λ_max), which together with τ_ℓ ≤ 1 confirms (4.10).
Lemma 4.2 Let {z^ℓ} be the sequence generated by GPSP and τ̲ be given by (4.10). Then the following results hold.

a) The sequence {z^ℓ} is bounded and lim_{ℓ→∞} ‖z^{ℓ+1} − z^ℓ‖ = lim_{ℓ→∞} ‖u^ℓ − z^ℓ‖ = 0.

b) Any accumulation point of the sequence {z^ℓ} is a τ-stationary point of the problem (1.9) with 0 < τ ≤ τ̲.

Proof a) By Lemma 4.1 and u^ℓ = z^ℓ(τ_ℓ), we have

  f(u^ℓ) ≤ f(z^ℓ) − ρ‖u^ℓ − z^ℓ‖².   (4.12)

By the framework of Algorithm 1, if z^{ℓ+1} = u^ℓ, then the above condition implies

  f(z^{ℓ+1}) ≤ f(z^ℓ) − ρ‖u^ℓ − z^ℓ‖² = f(z^ℓ) − ρ‖z^{ℓ+1} − z^ℓ‖².   (4.13)

If z^{ℓ+1} = v^ℓ, then we obtain

  f(z^{ℓ+1}) = f(v^ℓ) ≤ f(u^ℓ) − ρ‖z^{ℓ+1} − u^ℓ‖²
             ≤ f(z^ℓ) − ρ‖u^ℓ − z^ℓ‖² − ρ‖z^{ℓ+1} − u^ℓ‖²   (4.14)
             ≤ f(z^ℓ) − (ρ/2)‖z^{ℓ+1} − z^ℓ‖²,

where the second and last inequalities use (4.12) and the fact ‖a + b‖² ≤ 2‖a‖² + 2‖b‖² for all vectors a and b. Both cases lead to

  f(z^{ℓ+1}) ≤ f(z^ℓ) − (ρ/2)‖z^{ℓ+1} − z^ℓ‖²,  f(z^{ℓ+1}) ≤ f(z^ℓ) − ρ‖u^ℓ − z^ℓ‖².   (4.15)

Therefore, {f(z^ℓ)} is a non-increasing sequence, and thus

  max{ ‖Ax^ℓ + y^ℓ − ε1‖², η‖x^ℓ‖² } ≤ f(z^ℓ) ≤ f(z^0),

which indicates the boundedness of {x^ℓ} and {y^ℓ}, and hence that of {z^ℓ}. The non-increasing property in (4.15) and f ≥ 0 yield

  Σ_{ℓ≥0} max{ (ρ/2)‖z^{ℓ+1} − z^ℓ‖², ρ‖u^ℓ − z^ℓ‖² } ≤ Σ_{ℓ≥0} [ f(z^ℓ) − f(z^{ℓ+1}) ] = f(z^0) − lim_{ℓ→∞} f(z^{ℓ+1}) ≤ f(z^0).

This suffices to lim_{ℓ→∞} ‖z^{ℓ+1} − z^ℓ‖ = lim_{ℓ→∞} ‖u^ℓ − z^ℓ‖ = 0.

b) Let z* be any accumulation point of {z^ℓ}. Then there exists a subset J of {0, 1, 2, · · · } such that lim_{ℓ(∈J)→∞} z^ℓ = z*. This further implies lim_{ℓ(∈J)→∞} u^ℓ = z* by applying a). In addition, as stated in Lemma 4.1, we have {τ_ℓ} ⊆ [τ̲, 1]. Then there exist a subset L of J and a scalar τ* ∈ [τ̲, 1] such that {τ_ℓ : ℓ ∈ L} → τ*. To summarize, we have

  lim_{ℓ(∈L)→∞} z^ℓ = lim_{ℓ(∈L)→∞} u^ℓ = z*,  lim_{ℓ(∈L)→∞} τ_ℓ = τ* ∈ [τ̲, 1].   (4.16)

Let z̄^ℓ := z^ℓ − τ_ℓ∇f(z^ℓ). The framework of Algorithm 1 implies

  u^ℓ ∈ Π_F(z̄^ℓ),  lim_{ℓ(∈L)→∞} z̄^ℓ = z* − τ*∇f(z*) =: z̄*.   (4.17)

The first condition means u^ℓ ∈ F for any ℓ ≥ 1. Note that F is closed and z* is an accumulation point of {u^ℓ} by (4.16). Therefore, z* ∈ F, which results in

  min_{z∈F} ‖z − z̄*‖ ≤ ‖z* − z̄*‖.   (4.18)

If the strict inequality held in the above condition, then there would be an ε_0 > 0 such that

  ‖z* − z̄*‖ − ε_0 = min_{z∈F} ‖z − z̄*‖ ≥ min_{z∈F} ( ‖z − z̄^ℓ‖ − ‖z̄^ℓ − z̄*‖ ) = ‖u^ℓ − z̄^ℓ‖ − ‖z̄^ℓ − z̄*‖,

where the last equality is from (4.17). Taking the limit of both sides along ℓ(∈L) → ∞ yields ‖z* − z̄*‖ − ε_0 ≥ ‖z* − z̄*‖ by (4.16) and (4.17), a contradiction with ε_0 > 0. Therefore, equality must hold in (4.18), showing that

  z* ∈ Π_F(z̄*) = Π_F(z* − τ*∇f(z*)).

The above relation means that the conditions in (3.9) hold for τ = τ*; then these conditions must hold for any 0 < τ ≤ τ̲ due to τ̲ ≤ τ* from (4.16), namely,

  z* ∈ Π_F(z* − τ∇f(z*)),

displaying that z* is a τ-stationary point of (1.9), as desired.

The above lemma allows us to conclude that the whole sequence converges.
Theorem 4.1 Let {z^ℓ} be the sequence generated by GPSP. Then the whole sequence converges to z*, which is necessarily the unique global minimizer of (1.9) if z* ∈ int F and the unique local minimizer otherwise.

Proof As shown in Lemma 4.2, one can find a subsequence of {z^ℓ} that converges to a τ-stationary point z* of (1.9) with 0 < τ ≤ τ̲. Recall that a τ-stationary point z* satisfying (3.9) also meets (3.1), which by Lemma 3.1 indicates that z* is the unique global minimizer if z* ∈ int F and the unique local minimizer otherwise. In other words, z* is an isolated local minimizer of (1.9). Finally, it follows from z* being isolated, [23, Lemma 4.10] and lim_{ℓ→∞} ‖z^{ℓ+1} − z^ℓ‖ = 0 (by Lemma 4.2) that the whole sequence converges to z*.

The following theorem establishes the convergence rate. One can see that GPSP either enjoys a Q-linear convergence rate or terminates at the limit of the sequence after a certain point.
Theorem 4.2 Let {z^ℓ} be the sequence generated by GPSP and z* be its limit.

i) For sufficiently large ℓ, it holds that

  ‖u^ℓ − z*‖² ≤ [ (1 − 2τ̲λ_min)/(1 + 2τ̲λ_min) ] ‖z^ℓ − z*‖².   (4.19)

ii) The sequence either has infinitely many sufficiently large ℓ satisfying

  ‖z^{ℓ+1} − z*‖² ≤ [ (1 − 2τ̲λ_min)/(1 + 2τ̲λ_min) ] ‖z^ℓ − z*‖²,   (4.20)

or remains identical to z* after a certain point, say ℓ̂ ≥ 0, namely,

  z^ℓ = v^{ℓ̂} = z*, ∀ ℓ > ℓ̂.   (4.21)

iii) Let ρ be chosen as 0 < ρ ≤ λ_min. If the limit z* satisfies ‖x*‖_0 = s and ‖y*_+‖_0 = k, then GPSP will terminate at the limit z* within finitely many steps.

Proof i) Theorem 4.1 states that the whole sequence {z^ℓ} converges to z*. So does {u^ℓ} by Lemma 4.2. Then it is easy to show that, for sufficiently large ℓ,

  supp(z*) ⊆ supp(u^ℓ).

In addition, it follows from u^ℓ := z^ℓ(τ_ℓ) = (x^ℓ(τ_ℓ); y^ℓ(τ_ℓ)) ∈ Π_F(z^ℓ − τ_ℓ∇f(z^ℓ)) that

  x^ℓ(τ_ℓ) ∈ Π_S(x^ℓ − τ_ℓ∇_x f(z^ℓ)),  y^ℓ(τ_ℓ) ∈ Π_K(y^ℓ − τ_ℓ∇_y f(z^ℓ)),

which by (2.3) and (2.6) results in

  x^ℓ(τ_ℓ) = ((x^ℓ − τ_ℓ∇_x f(z^ℓ))_T; 0),  y^ℓ(τ_ℓ) = ((y^ℓ − τ_ℓ∇_y f(z^ℓ))_Γ; 0),   (4.22)

where T ∈ Σ(x^ℓ − τ_ℓ∇_x f(z^ℓ); s) and Γ ∈ Θ(y^ℓ − τ_ℓ∇_y f(z^ℓ); k). Therefore,

  supp(z*) ⊆ supp(u^ℓ) = supp((x^ℓ(τ_ℓ); y^ℓ(τ_ℓ))) ⊆ T ∪ (n + Γ) =: W   (4.23)

for sufficiently large ℓ, where n + Γ := {n + i : i ∈ Γ}, which leads to

  z*_{W̄} = u^ℓ_{W̄} = 0,  u^ℓ_W = (z^ℓ − τ_ℓ∇f(z^ℓ))_W,   (4.24)

where the last equality is from (4.22). These conditions contribute to

  ⟨z* − u^ℓ, u^ℓ − (z^ℓ − τ_ℓ∇f(z^ℓ))⟩
  = ⟨z*_W − u^ℓ_W, u^ℓ_W − (z^ℓ − τ_ℓ∇f(z^ℓ))_W⟩ + ⟨z*_{W̄} − u^ℓ_{W̄}, u^ℓ_{W̄} − (z^ℓ − τ_ℓ∇f(z^ℓ))_{W̄}⟩ = 0.   (4.25)

Using the above fact, we obtain

  ‖z* − (z^ℓ − τ_ℓ∇f(z^ℓ))‖² = ‖z* − u^ℓ + u^ℓ − (z^ℓ − τ_ℓ∇f(z^ℓ))‖² = ‖z* − u^ℓ‖² + ‖u^ℓ − (z^ℓ − τ_ℓ∇f(z^ℓ))‖²,

which after simple manipulation results in

  ⟨∇f(z^ℓ), u^ℓ − z^ℓ⟩ + (1/(2τ_ℓ))‖u^ℓ − z^ℓ‖² = ⟨∇f(z^ℓ), z* − z^ℓ⟩ + (1/(2τ_ℓ))[ ‖z* − z^ℓ‖² − ‖z* − u^ℓ‖² ].   (4.26)

By Lemma 4.1, the Armijo-type step size rule indicates τ̲ ≤ τ_ℓ ≤ 1/(2ρ + 2λ_max) < 1/(2λ_max). Therefore, it follows from (2.8) that

  f(u^ℓ) ≤ f(z^ℓ) + ⟨∇f(z^ℓ), u^ℓ − z^ℓ⟩ + λ_max‖u^ℓ − z^ℓ‖²
         ≤ f(z^ℓ) + ⟨∇f(z^ℓ), u^ℓ − z^ℓ⟩ + (1/(2τ_ℓ))‖u^ℓ − z^ℓ‖²
         = f(z^ℓ) + ⟨∇f(z^ℓ), z* − z^ℓ⟩ + (1/(2τ_ℓ))[ ‖z* − z^ℓ‖² − ‖z* − u^ℓ‖² ]
         ≤ f(z*) − λ_min‖z* − z^ℓ‖² + (1/(2τ_ℓ))[ ‖z* − z^ℓ‖² − ‖z* − u^ℓ‖² ],   (4.27)

where the equality uses (4.26) and the last inequality uses (2.8). Recall that a τ-stationary point z* satisfying (3.9) also meets (3.1). By Lemma 3.1, there exists a δ* > 0 such that (3.2) holds for any z ∈ F ∩ N(z*, δ*). Thus, the fact u^ℓ → z* indicates that, for sufficiently large ℓ, we have u^ℓ ∈ F ∩ N(z*, δ*) and hence

  f(u^ℓ) ≥ f(z*) + ‖z* − u^ℓ‖²_H ≥ f(z*) + λ_min‖z* − u^ℓ‖².

Combining this with (4.27) yields

  2τ_ℓλ_min‖z* − u^ℓ‖² ≤ −2τ_ℓλ_min‖z* − z^ℓ‖² + ‖z* − z^ℓ‖² − ‖z* − u^ℓ‖².

Note that τ_ℓ ≥ τ̲. The desired assertion then follows immediately from

  (1 + 2τ̲λ_min)‖z* − u^ℓ‖² ≤ (1 − 2τ̲λ_min)‖z* − z^ℓ‖².

ii) If there are infinitely many ℓ such that z^{ℓ+1} = u^ℓ, then (4.20) for sufficiently large ℓ can be derived from (4.19) immediately. Otherwise, there is an ℓ̃ ≥ 0 such that z^{ℓ+1} = v^ℓ for any ℓ ≥ ℓ̃. The updating rule (4.5) of v^ℓ indicates

  T^{ℓ+1} ⊆ T^ℓ,  Γ^{ℓ+1}_+ ⊆ Γ^ℓ_+,  Γ^{ℓ+1}_0 ⊇ Γ^ℓ_0,  Γ^{ℓ+1}_− ⊆ Γ^ℓ_−.

Note that these sets have finitely many elements. Therefore, the sequences {T^ℓ}, {Γ^ℓ_+}, {Γ^ℓ_0}, {Γ^ℓ_−} converge. In other words, there is an ℓ̂ ≥ ℓ̃ such that, for any ℓ ≥ ℓ̂, it holds that

  T^{ℓ+1} = T^ℓ,  Γ^{ℓ+1}_+ = Γ^ℓ_+,  Γ^{ℓ+1}_0 = Γ^ℓ_0,  Γ^{ℓ+1}_− = Γ^ℓ_−,   (4.28)

which implies Ω(z^{ℓ+1}) = Ω(z^ℓ). Then it follows from (4.5) that

  z^{ℓ+2} = v^{ℓ+1} = argmin{ f(z) : z ∈ Ω(z^{ℓ+1}) } = argmin{ f(z) : z ∈ Ω(z^ℓ) } = v^ℓ = z^{ℓ+1}.

Overall, for any ℓ > ℓ̂, we have z^ℓ = v^{ℓ̂}. Recall from Theorem 4.1 that the whole sequence {z^ℓ} converges to z*, which suffices to z* = lim_{ℓ→∞} z^ℓ = v^{ℓ̂}.

iii) Note that both z^ℓ(∈ F) → z* and u^ℓ(∈ F) → z*. We must have T^ℓ = T̃^ℓ and Γ^ℓ_+ = Γ̃^ℓ_+ for sufficiently large ℓ if ‖x*‖_0 = s and ‖y*_+‖_0 = k. Therefore, the framework of Algorithm 1 allows us to assert that z^{ℓ+1} = v^ℓ for all sufficiently large ℓ, due to

  f(v^ℓ) ≤ f(u^ℓ) − λ_min‖u^ℓ − v^ℓ‖² ≤ f(u^ℓ) − ρ‖u^ℓ − v^ℓ‖²,

where the first inequality is (4.7) and the second is from 0 < ρ ≤ λ_min. Then reasoning similar to that used to prove ii) claims the conclusion immediately.
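Putting the pieces together, a minimal driver for Algorithm 1 might look as follows. It assumes build_H_b, gradient_step and subspace_step from the earlier sketches, and its defaults (tolerance, iteration cap, beta, rho) are illustrative rather than the paper's settings. Monitoring the gap tol_ℓ = ‖u^ℓ − z^ℓ‖ along the run is a simple way to observe the linear decay predicted by Theorem 4.2.

```python
import numpy as np

def gpsp(A, eps, eta, s, k, tol=1e-6, max_iter=1000, beta=0.5, rho=1e-4):
    # Sketch of Algorithm 1 (GPSP) assembling the earlier pieces.
    m, n = A.shape
    H, b = build_H_b(A, eps, eta)
    f = lambda z: z @ (H @ z) - 2.0 * (b @ z)   # f up to the constant m*eps^2
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(max_iter):
        xu, yu, tau = gradient_step(H, b, f, x, y, s, k, beta, rho)
        gap = np.linalg.norm(np.concatenate([xu - x, yu - y]))
        xn, yn = xu, yu
        # Subspace pursuit is tried only when T^l = T~^l and G_+^l = G~_+^l.
        same_T = set(np.flatnonzero(x)) == set(np.flatnonzero(xu))
        same_G = set(np.flatnonzero(y > 0)) == set(np.flatnonzero(yu > 0))
        if same_T and same_G:
            v = subspace_step(A, eps, eta, x, y)
            if v is not None:
                zu, zv = np.concatenate([xu, yu]), np.concatenate(v)
                if f(zv) <= f(zu) - rho * np.sum((zv - zu) ** 2):
                    xn, yn = v
        x, y = xn, yn
        if gap <= tol:                 # tol_l = ||u^l - z^l|| drives the stop rule
            break
    return x / max(np.linalg.norm(x), 1e-12)   # output x* = x^l / ||x^l||
```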
Numerical Experiments

In this section, we conduct extensive numerical experiments to showcase the performance of the proposed GPSP, using MATLAB (R2019a) on a laptop with 32GB of memory and an Intel(R) Core(TM) i9-9880H 2.3GHz CPU.

Testing examples
Examples with the data generated from the Gaussian distributions are taken into account.
Example 5.1 (Independent covariance [34, 8])
Entries of
Φ := [φ_1, · · · , φ_m]^⊤ ∈ R^{m×n} and the nonzero entries of the ground-truth s*-sparse vector x* ∈ R^n (i.e., ‖x*‖_0 ≤ s*) are generated from independent and identically distributed (i.i.d.) samples of the standard Gaussian distribution N(0, 1). To avoid tiny non-zero entries of x*, we let x*_i = x*_i + sign(x*_i) for each non-zero x*_i, followed by normalizing x* to be a unit vector. Let c* = sgn(Φx*) and c̃ = sgn(Φx* + ε), where the entries of the noise ε are i.i.d. samples of a zero-mean Gaussian distribution. Finally, we randomly select ⌈rm⌉ entries of c̃ and flip their signs; the flipped vector is denoted by c, where r is the flipping ratio.
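A minimal NumPy generator for this example; the helper name is ours, and the noise standard deviation below is an illustrative placeholder since the exact value did not survive extraction.

```python
import numpy as np

def gen_example_51(n, m, s_star, r, seed=0):
    rng = np.random.default_rng(seed)
    # Independent-covariance data: Phi and the nonzeros of x* are i.i.d. N(0,1).
    Phi = rng.standard_normal((m, n))
    x = np.zeros(n)
    idx = rng.choice(n, s_star, replace=False)
    x[idx] = rng.standard_normal(s_star)
    x[idx] += np.sign(x[idx])              # push nonzeros away from zero
    x /= np.linalg.norm(x)                 # normalize to a unit vector
    sgn = lambda t: np.where(t > 0, 1.0, -1.0)
    c_star = sgn(Phi @ x)
    c = sgn(Phi @ x + 0.1 * rng.standard_normal(m))   # illustrative noise level
    flips = rng.choice(m, int(np.ceil(r * m)), replace=False)
    c[flips] = -c[flips]                   # flip ceil(r*m) signs
    return Phi, x, c_star, c
```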
Example 5.2 (Correlated covariance [15]) Rows of Φ are generated from i.i.d. samples of N(0, Σ) with Σ_ij = v^{|i−j|}, i, j ∈ [n], where v ∈ (0, 1). Then x*, c* and c are generated in the same way as in Example 5.1.

To demonstrate the performance of a method, apart from the CPU time, we also report the signal-to-noise ratio (SNR), the Hamming error (HE) and the Hamming distance (HD). They are defined by

  SNR := 10 log_10 ‖x − x*‖^{−2},  HD := (1/m)‖sgn(Φx) − c‖_0,  HE := (1/m)‖sgn(Φx) − c*‖_0,

where x is the solution generated by the method. Apparently, a larger SNR (or a smaller HE or HD) means a better recovery.
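These three metrics can be computed directly; a minimal sketch (the helper name is ours):

```python
import numpy as np

def metrics(Phi, x, x_star, c, c_star):
    # SNR = 10*log10(||x - x*||^{-2}); HD and HE count sign mismatches,
    # against the flipped signs c and the true signs c*, respectively.
    sgn = lambda t: np.where(t > 0, 1.0, -1.0)
    m = Phi.shape[0]
    snr = 10.0 * np.log10(1.0 / np.sum((x - x_star) ** 2))
    hd = np.count_nonzero(sgn(Phi @ x) != c) / m
    he = np.count_nonzero(sgn(Phi @ x) != c_star) / m
    return snr, hd, he
```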
We set a small accuracy parameter ϵ in Algorithm 1 and terminate GPSP once ‖z^ℓ − u^ℓ‖ ≤ ϵ or the iteration count exceeds a prescribed maximum; β ∈ (0, 1), ρ > 0 and z^0 = 0 are fixed throughout. The parameters ε, η, s and k in (1.9) are tuned as follows.

(I) Selection of η. For Example 5.1, we fix n = 500, ε = 0.01, s = 5 and k to a small fraction ⌈·m⌉ of m, but vary m ∈ {0.25, 0.50, 0.75, 1.00}n and η over a grid of powers of ten. Average results over 500 trials are reported in Figure 1. It can be evidently seen that the results are stable when η ≤ 10, while they get worse as η grows beyond 10. Similar trends are also observed for GPSP solving Example 5.2, so a wide range of values of η up to 10 can be used. In fact, GPSP performs similarly even when η = 0. For simplicity, we fix η at a small power of ten in the sequel.

[Figure 1: Effect of η for Example 5.1; the three panels report SNR, HD and HE versus η for m = 0.25n, 0.50n, 0.75n and 1.00n.]
(II) Selection of ε. We now fix n = 500, s = 5, and η and k as above, but alter m ∈ {0.25, 0.50, 0.75, 1.00}n and ε over a wide range of values up to 10 for GPSP solving Example 5.1. As shown in Figure 2, the average results over 500 trials do not fluctuate significantly with the changing of ε, which indicates that GPSP is quite robust to the choice of ε over this range; similar observations are made for GPSP solving Example 5.2. For simplicity, we fix ε = 0.01 in the subsequent numerical experiments.

[Figure 2: Effect of ε for Example 5.1; the three panels report SNR, HD and HE versus ε for m = 0.25n, 0.50n, 0.75n and 1.00n.]

(III) Selection of s. The sparsity level s clearly has a heavy influence on the recovery quality. As shown in Figure 3, the ground-truth signal x* has s* = 5 non-zero components with their indices denoted by T* = supp(x*). Apparently, GPSP obtains the most accurate signal if we set s = s*, because it then almost exactly recovers those non-zero components. For s = s* − 2, the recovered support misses part of T*. For s = s* + 2 or s = s* + 4, GPSP generates a solution x whose support set supp(x) covers T* together with extra incorrect indices. However, compared with the magnitudes |x_i|, i ∈ T*, those redundant non-zero components |x_i|, i ∈ supp(x) \ T*, are pretty small. If we remove those small parts and normalize the signal to have a unit length, then the new signal is much closer to x*. For simplicity, we set s = s* in the sequel.
[Figure 3: Effect of s for Example 5.1 with n = 500 and s* = 5; each panel plots the ground-truth and recovered signals, with SNR = 7.469 for s = s* − 2, SNR = 21.51 for s = s*, SNR = 15.26 for s = s* + 2, and SNR = 12.96 for s = s* + 4.]

[Figure 4: Effect of k for Example 5.1; the panels report SNR, HD and CPU time versus k for flipping ratios r ∈ {0.025, 0.050, 0.075, 0.100}.]

(IV) Selection of k. Note that k is the upper bound on the number of sign flips of Φx and is usually unknown beforehand. However, the model (1.9) does not require an exact k. One could either fix it at a small integer (e.g., a small fraction ⌈·m⌉ of m) or start with a slightly bigger value and reduce it iteratively. We tested GPSP for solving Example 5.1 and Example 5.2 under both schemes, and the corresponding numerical performance does not differ much. For instance, as indicated in Figure 4, where n = 500, m = 250, s = 5, and η and ε are fixed as above, we select a range of values of k and then fix each for GPSP. Evidently, for each flipping ratio r, the results SNR and HD do not vary significantly as k alters. On the other hand, the CPU time increases as k rises, which indicates that a smaller value of k is preferable. Hence, in our numerical experiments, we pick k as a small fraction ⌈·m⌉ of m.

Four state-of-the-art solvers are selected for comparisons, including
PDASC [15],
BIHT [17],
AOP [34] and
PBAOP [16]. Like our method, the last three need the true sparsity level s* to be specified. In addition, the last two solvers also require the number of sign flips k. To make the comparisons fairer, we adopt a choice from [34] and set k = ‖sgn(Φx) − c‖_0, where x is the solution generated by BIHT. The other parameters of each method are chosen as their defaults. Finally, all methods are initialized with x^0 = 0, and their final solutions are normalized to have unit length.

We now apply the five methods to solving the two examples under different scenarios. For each scenario, we report average results over 500 instances for the smaller values of n (and over fewer instances for the larger values of n in Table 1). The involved factors are (n, m, s*, r, v), where v only makes sense for Example 5.2. In the following numerical comparisons, we shall see the effect of these factors on each solver by altering one factor while fixing the others.

(a) Effect of s*. We first examine the recovery performance of each method for solving Example 5.2. In Figure 5, the blue circles represent the ground-truth signals and the red stars stand for the recovered signals obtained by the five methods. When s* = 3, all methods succeed in recovering the true signal, while for the other two cases, s* = 4 and s* = 5, the signals obtained by GPSP have better quality than those produced by the other methods.

To proceed further, we employ the five methods to solve Example 5.1 and increase s* from 2 to 10 while fixing n = 500 and the other factors (m, r, v). As demonstrated in Figure 6, GPSP achieves the highest SNR and the smallest HD and HE for each s*. The SNR lines display declining trends, which means the signal gets harder to recover when it has more non-zero components, namely, when the sparsity level s* gets bigger. Apparently, AOP and PBAOP deliver similar results. Somehow, the comparison is not entirely fair for PDASC since it does not require the true sparsity level.
[Figure 5: Effect of s* for Example 5.2 with n = 500; each panel plots the ground-truth and recovered signals for (a) s* = 3, (b) s* = 4 and (c) s* = 5, with SNR values of 19.5231, 0.93889 and 5.5138 for BIHT; 15.6465, 5.8091 and 6.4328 for AOP; 15.4614, 5.8091 and 6.9888 for PBAOP; 21.0222, 8.5556 and 8.8385 for PDASC; and 25.6915, 19.7302 and 15.7633 for GPSP.]

[Figure 6: Effect of s* for Example 5.1; the three panels report SNR, HD and HE versus s* for BIHT, AOP, PBAOP, PDASC and GPSP.]

(b) Effect of m. To see the effect of the sample size m on each method, we vary the ratio m/n over a grid and fix n = 500 together with (s*, r, v). As demonstrated in Figure 7, GPSP outperforms the others for solving Example 5.2 since it delivers a much higher SNR and lower HD and HE. It is evident that all methods behave better as the sample size m rises; this is because the signal gets easier to recover when more samples are available.

[Figure 7: Effect of m for Example 5.2; the three panels report SNR, HD and HE versus m/n for BIHT, AOP, PBAOP, PDASC and GPSP.]

(c) Effect of r. To see the effect of the flipping ratio r on each method, we alter it over {0, 0.05, 0.1} but fix n = 500 together with (m, s*, v). As shown in Figure 8, the larger r is, the worse the performance of each method, because more correct signs are flipped. This can be testified by SNR (resp. HD and HE), whose median obtained by each method declines (resp. rises) when r ascends. Once again, GPSP behaves the best because it delivers the highest median of SNR and the lowest medians of HD and HE in each box. Similar results can be observed for Example 5.1 and are omitted here.

[Figure 8: Effect of r for Example 5.2; box plots of SNR, HD and HE for r ∈ {0, 0.05, 0.1}, where A1-A5 stand for BIHT, AOP, PBAOP, PDASC and GPSP, respectively.]

(d) Effect of v. Note that in Example 5.2, the larger v is, the more correlated each pair of samples (i.e., rows of Φ), which might make it more difficult to recover the signal. To see this, we alter v over a grid in (0, 1) but fix n = 500 together with (m, s*, r). As shown in Figure 9, the larger v is, the more difficult the recovery. It can be seen that GPSP is quite robust over a wide middle range of v, where its SNR, HD and HE stay steady. No matter how v changes, GPSP always performs the best among the five methods.

[Figure 9: Effect of v for Example 5.2; the three panels report SNR, HD and HE versus v for BIHT, AOP, PBAOP, PDASC and GPSP.]

(e) Effect of n. To see the computational speed of each method, we consider some bigger values of n from {5000, 10000, 15000, 20000}, with m and s* scaling proportionally to n and (r, v) fixed. As reported in Table 1, GPSP achieves the best time efficiency, and the highest recovery accuracy in terms of the highest SNR and the lowest HD and HE, against the other methods.

Table 1: Effect of the bigger values of n; CPU time (in seconds).

           Example 5.1                              Example 5.2
  n       BIHT    AOP     PBAOP   PDASC   GPSP     BIHT    AOP     PBAOP   PDASC   GPSP
  5000    0.554   3.668   1.463   0.367   0.292    0.545   3.520   1.452   0.354   0.307
  10000   4.730   14.352  6.931   1.485   1.030    4.635   14.30   6.944   1.538   0.810
  15000   10.55   32.30   15.93   3.313   1.647    10.35   31.54   15.62   3.144   1.652
  20000   20.02   56.68   28.35   5.200   2.592    19.42   55.76   27.99   5.273   2.633
Concluding Remarks

In this paper, we have proposed a nonconvex optimization problem (1.9) to process the one-bit CS, in which the double-sparsity constrains the sparsity of the signal and the number of sign flips. To conquer the hardness resulting from the nonconvex and discrete constraints, we have established necessary and sufficient optimality conditions via the so-called τ-stationarity. These optimality conditions have facilitated the design of a gradient projection subspace pursuit method, GPSP, which has been shown to admit global convergence and highly efficient numerical performance in both computation time and the recovery accuracy of signals.
References

[1] A. Beck and Y. C. Eldar. Sparsity constrained nonlinear optimization: Optimality conditions and algorithms. SIAM Journal on Optimization, 23(3):1480–1509, 2013.
[2] P. T. Boufounos. Greedy sparse signal reconstruction from sign measurements. In 2009 Conference Record of the Forty-Third Asilomar Conference on Signals, Systems and Computers, pages 1305–1309. IEEE, 2009.
[3] P. T. Boufounos and R. G. Baraniuk. 1-bit compressive sensing. In 2008 42nd Annual Conference on Information Sciences and Systems (CISS), pages 16–21. IEEE, 2008.
[4] X. Cai, Z. Zhang, H. Zhang, and C. Li. Soft consistency reconstruction: a robust 1-bit compressive sensing algorithm. Pages 4530–4535. IEEE, 2014.
[5] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, 2006.
[6] E. J. Candès and T. Tao. Decoding by linear programming. IEEE Transactions on Information Theory, 51(12):4203–4215, 2005.
[7] C.-H. Chen and J.-Y. Wu. Amplitude-aided 1-bit compressive sensing over noisy wireless sensor networks. IEEE Wireless Communications Letters, 4(5):473–476, 2015.
[8] D.-Q. Dai, L. Shen, Y. Xu, and N. Zhang. Noisy 1-bit compressive sensing: models and algorithms. Applied and Computational Harmonic Analysis, 40(1):1–32, 2016.
[9] X. Dong and Y. Zhang. A MAP approach for 1-bit compressive sensing in synthetic aperture radar imaging. IEEE Geoscience and Remote Sensing Letters, 12(6):1237–1241, 2015.
[10] D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.
[11] C. Feng, S. Valaee, and Z. Tan. Multiple target localization using compressive sensing. In GLOBECOM 2009 - 2009 IEEE Global Telecommunications Conference, pages 1–6. IEEE, 2009.
[12] N. Fu, L. Yang, and J. Zhang. Sub-Nyquist 1 bit sampling system for sparse multiband signals. Pages 736–740. IEEE, 2014.
[13] X. Fu, F.-M. Han, and H. Zou. Robust 1-bit compressive sensing against sign flips. Pages 3121–3125. IEEE, 2014.
[14] J. Haboba, M. Mangia, R. Rovatti, and G. Setti. An architecture for 1-bit localized compressive sensing with applications to EEG. Pages 137–140. IEEE, 2011.
[15] J. Huang, Y. Jiao, X. Lu, and L. Zhu. Robust decoding from 1-bit compressive sampling with ordinary and regularized least squares. SIAM Journal on Scientific Computing, 40(4):A2062–A2086, 2018.
[16] X. Huang, L. Shi, M. Yan, and J. A. Suykens. Pinball loss minimization for one-bit compressive sensing: Convex models and algorithms. Neurocomputing, 314:275–283, 2018.
[17] L. Jacques, J. N. Laska, P. T. Boufounos, and R. G. Baraniuk. Robust 1-bit compressive sensing via binary stable embeddings of sparse vectors. IEEE Transactions on Information Theory, 59(4):2082–2102, 2013.
[18] J. N. Laska, Z. Wen, W. Yin, and R. G. Baraniuk. Trust, but verify: Fast and accurate signal recovery from 1-bit compressive measurements. IEEE Transactions on Signal Processing, 59(11):5289–5301, 2011.
[19] D. Lee, T. Sasaki, T. Yamada, K. Akabane, Y. Yamaguchi, and K. Uehara. Spectrum sensing for networked system using 1-bit compressed sensing with partial random circulant measurement matrices. Pages 1–5. IEEE, 2012.
[20] F. Li, J. Fang, H. Li, and L. Huang. Robust one-bit Bayesian compressed sensing with sign-flip errors. IEEE Signal Processing Letters, 22(7):857–861, 2014.
[21] Z. Li, W. Xu, X. Zhang, and J. Lin. A survey on one-bit compressed sensing: Theory and applications. Frontiers of Computer Science, 12(2):217–230, 2018.
[22] J. Meng, H. Li, and Z. Han. Sparse event detection in wireless sensor networks using compressive sensing. Pages 181–185. IEEE, 2009.
[23] J. J. Moré and D. C. Sorensen. Computing a trust region step. SIAM Journal on Scientific and Statistical Computing, 4(3):553–572, 1983.
[24] A. Movahed, A. Panahi, and G. Durisi. A robust RFPI-based 1-bit compressive sensing reconstruction algorithm. Pages 567–571. IEEE, 2012.
[25] A. Movahed, A. Panahi, and M. C. Reed. Recovering signals with variable sparsity levels from the noisy 1-bit compressive measurements. Pages 6454–6458. IEEE, 2014.
[26] A. Movahed and M. C. Reed. Iterative detection for compressive sensing: Turbo CS. Pages 4518–4523. IEEE, 2014.
[27] D. Needell and J. A. Tropp. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis, 26(3):301–321, 2009.
[28] Y. Plan and R. Vershynin. Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach. IEEE Transactions on Information Theory, 59(1):482–494, 2012.
[29] L. Rencker, F. Bach, W. Wang, and M. D. Plumbley. Sparse recovery and dictionary learning from nonlinear compressive measurements. IEEE Transactions on Signal Processing, 67(21):5659–5670, 2019.
[30] W. Tang, W. Xu, X. Zhang, and J. Lin. A low-cost channel feedback scheme in mmWave massive MIMO system. Pages 89–93. IEEE, 2017.
[31] H. Wang, X. Huang, Y. Liu, S. Van Huffel, and Q. Wan. Binary reweighted l1-norm minimization for one-bit compressed sensing. In Proceedings of the 8th International Joint Conference on Biomedical Engineering Systems and Technologies, 2015.
[32] P. Xiao, B. Liao, and J. Li. One-bit compressive sensing via Schur-concave function minimization. IEEE Transactions on Signal Processing, 67(16):4139–4151, 2019.
[33] J. Xiong and Q. Tang. 1-bit compressive data gathering for wireless sensor networks. Journal of Sensors, 2014, 2014.
[34] M. Yan, Y. Yang, and S. Osher. Robust 1-bit compressive sensing using adaptive outlier pursuit. IEEE Transactions on Signal Processing, 60(7):3868–3875, 2012.
[35] Z. Zhou, X. Chen, D. Guo, and M. L. Honig. Sparse channel estimation for massive MIMO with 1-bit feedback per dimension. In 2017 IEEE Wireless Communications and Networking Conference (WCNC).