Computing One-bit Compressive Sensing via Double-Sparsity Constrained Optimization
Shenglong Zhou ([email protected]), School of Mathematics, University of Southampton, UK
Ziyan Luo ([email protected]) and Naihua Xiu ([email protected]), Department of Applied Mathematics, Beijing Jiaotong University, China
Abstract
One-bit compressive sensing is popular in signal processing and communications due to its low storage costs and low hardware complexity. However, it has been a challenging task all along since only one-bit (the sign) information is available to recover the signal. In this paper, we appropriately formulate the one-bit compressed sensing as a double-sparsity constrained optimization problem. The first-order optimality conditions for this nonconvex and discontinuous problem are established via the newly introduced τ-stationarity, based on which a gradient projection subspace pursuit (GPSP) approach with global convergence and a fast convergence rate is proposed. Numerical experiments against other leading solvers illustrate the high efficiency of our proposed algorithm in terms of computation time as well as the quality of the signal recovery.
Keywords:
One-bit compressive sensing, double-sparsity constrained optimization, optimality conditions, gradient projection subspace pursuit, global convergence
Mathematical Subject Classification: · · ·

Introduction

Compressive sensing (CS) has seen evolutionary advances in theory and algorithms in the past few decades since it was introduced in the ground-breaking papers [6, 5, 10]. It aims to reconstruct a sparse signal x from an underdetermined linear system Φx = b, where Φ ∈ R^{m×n} is the measurement matrix and b ∈ R^m is the measurement observation. To reduce storage costs and hardware complexity, Boufounos and Baraniuk [3] benefited from the one-bit quantization case of CS, where only the sign information of the measurements is preserved, that is,

  c = sgn(Φx).

Here, sgn(t) returns one if t is positive and negative one otherwise, and thus c_i ∈ {1, −1}, i ∈ [m] := {1, 2, · · · , m}, is the one-bit measurement. This gives rise to the one-bit CS. It was then extensively applied in communications [26, 30, 35], wireless sensor networks [22, 11, 33, 7], cognitive radio [19, 12], and elsewhere [14, 9, 21].

The task of one-bit CS is to construct the sparse signal from the one-bit measurements. The ideal optimization model is the following ℓ_0-norm minimization,

  min_{x ∈ R^n} ‖x‖_0,  s.t. c = sgn(Φx),   (1.1)

where ‖x‖_0 is the ℓ_0-norm of x, counting the number of its non-zero entries. An impressive body of work has developed numerical algorithms for solving the above problem, but most of it placed the interest in its approximations due to the NP-hardness. The earliest one can be traced back to [3], where (1.1) was relaxed by

  min_{x ∈ R^n} ‖x‖_1,  s.t. Ax ≥ 0, ‖x‖ = 1.   (1.2)

Here, A := Diag(c)Φ, Diag(c) represents the diagonal matrix with diagonal entries from c, ‖·‖_1 is the ℓ_1-norm, and ‖·‖ is the Euclidean norm. A popular approach to dealing with the inequality constraints in (1.2) is penalization, leading to the following optimization,

  min_{x ∈ R^n} ‖x‖_1 + λφ(x),  s.t. ‖x‖ = 1,   (1.3)

where λ > 0 and φ : R^n → R is a loss function. In [3], the authors adopted the one-sided ℓ_2 function φ(x) := ‖(−Ax)_+‖², with y_+ := (max{y_1, 0}, · · · , max{y_m, 0})^⊤, and employed a renormalized fixed point iteration algorithm. Since the targeted problem is a nonconvex optimization, no convergence result was provided. The same problem was also addressed by a restricted step shrinkage algorithm [18], where the generated sequence was proved to converge to a stationary point of the penalty problem if some slightly strong assumptions on the sequence were satisfied.

Following the work in [3], Boufounos modified CoSaMP [27], one of the most popular greedy methods in CS, to derive the matching sign pursuit method [2]. It turned out to address the sparsity constrained model,

  min_{x ∈ R^n} φ(x),  s.t. ‖x‖_0 = s, ‖x‖ = 1,   (1.4)

where s ≪ n is a given sparsity level and φ is the one-sided ℓ_2 function. Based on the framework of the famous iterative hard thresholding algorithm, the modified version BIHT was then developed in [17] to solve the problem (1.4). Apart from the one-sided ℓ_2 function, BIHT was also able to process the one-sided ℓ_1 function, namely, φ(x) = ‖(−Ax)_+‖_1.
It was claimed that, with high probability, the distance between a signal reconstructed by BIHT and the original one can be bounded by a prefixed accuracy if the former quantizes to the same quantization point as the latter. As a consequence, the method enjoys a local convergence property. The latest work on the problem (1.4) includes the robust binary iterative hard thresholding [13], the soft consistency reconstructions [4], the binary iterative re-weighted method [31] and the pinball loss iterative hard thresholding [16].

In [15], the authors took advantage of ℓ_0-regularized least squares to deal with the one-bit CS regardless of the sign information of Φx, namely,

  min_{x ∈ R^n} ‖x‖_0 + λ‖c − Φx‖².   (1.5)

As stated there, with high probability, the distance between the solution to the model (up to a constant) and a sparse solution can be bounded by a prefixed accuracy if the sample size m is greater than a threshold. A primal dual active set algorithm was then proposed to solve the above model and proved to converge within one step under two assumptions: the columns of the matrix Φ indexed on the non-zero components of the sparse solution form a full-rank sub-matrix, and the initial point is sufficiently close to the sparse solution. Therefore, the generated sequence again has a local convergence property.

When the number of sign flips k (≪ m) is provided, the authors in [34] integrated a sparse variable w into the problem (1.4). The non-zero components in w represent the measurements that have sign flips. The resulting optimization problem is

  min_{x ∈ R^n, w ∈ R^m} ‖(−Diag(c)(Φx + w))_+‖_p^p,   (1.6)
  s.t. ‖x‖_0 ≤ s, ‖x‖ = 1, ‖w‖_0 ≤ k,

where p = 1 or 2. To tackle the above problem, an alternating minimization method (adaptive outliers pursuit, AOP) was cast: solving for one variable while fixing the other. However, AOP has been observed to rely heavily on the choice of k, and its convergence result remains to be seen. Other work relating to (1.6) includes the noise-adaptive renormalized fixed point iteration approach [24] and the noise-adaptive restricted step shrinkage [25].

When the number of the sign flips is unavailable, a compensation pursues a solution with as few sign flips as possible, which can be fulfilled by the following one-sided ℓ_0 function minimization [8],

  min_{x ∈ R^n} ‖(ε1 − Ax)_+‖_0 + η‖x‖²,  s.t. ‖x‖_0 ≤ s,   (1.7)

where η and ε are given positive parameters, 1 denotes the vector of all ones, and ε is used to majorize the objective function. The first term in the objective function arises from maximizing a posterior estimation from the perspective of statistics. It returns the number of positive components of (ε1 − Ax) and can be regarded as the number of the sign flips if ε is quite small. Instead of solving the one-sided ℓ_0 model directly, a fixed-point algorithm [8] was created for its approximation,

  min_{x ∈ R^n, w ∈ R^m} ‖(ε1 − w)_+‖_0 + μ‖w − Ax‖² + η‖x‖²,  s.t. ‖x‖_0 ≤ s,   (1.8)

where μ > 0. It has been shown that the generated sequence converges to a local minimizer of the approximation problem if the maximum singular value of the matrix A is bounded by some chosen parameters. However, the relationship between the solution obtained by the method and the original problem (1.7) has not been well explored.

Some other numerical algorithms developed to solve the one-bit CS include the convex relaxation [28], the Bayesian approach [20], the sparse consistent coding algorithms [29] and the method based on a Schur-concave function [32].

Motivated by the previous work, we formulate the one-bit CS as the following double-sparsity constrained optimization:

  min_{x ∈ R^n, y ∈ R^m} ‖Ax + y − ε1‖² + η‖x‖²,   (1.9)
  s.t. ‖x‖_0 ≤ s, ‖y_+‖_0 ≤ k,

where η > 0 and ε > 0 are given parameters, and s ≪ n and k ≪ m are two integers representing the prior information on the upper bounds of the signal sparsity and the number of sign flips, respectively. When penalizing the sign flip constraint in our model, it turns into (1.8) with the auxiliary variable y = ε1 − w. It is worth mentioning that the selection of k is very flexible in our approach, as will be shown in the numerical experiments, which reveals that our approach puts no heavy burden on the pre-knowledge of the true number of sign flips.
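To make the model concrete, the following minimal NumPy sketch evaluates the objective of (1.9) and checks the double-sparsity constraints for a candidate pair (x, y); all sizes and parameter values below are illustrative placeholders, not the paper's settings.

```python
import numpy as np

def f_obj(A, x, y, eps, eta):
    # Objective of (1.9): ||Ax + y - eps*1||^2 + eta*||x||^2.
    r = A @ x + y - eps * np.ones(A.shape[0])
    return r @ r + eta * (x @ x)

def is_feasible(x, y, s, k):
    # Double-sparsity constraints: ||x||_0 <= s and ||y_+||_0 <= k.
    return np.count_nonzero(x) <= s and np.count_nonzero(y > 0) <= k

# Illustrative data: c = sgn(Phi x) and A = Diag(c) Phi, as in the text.
rng = np.random.default_rng(0)
m, n, s, k, eps, eta = 8, 20, 3, 2, 0.01, 1e-4
Phi = rng.standard_normal((m, n))
x = np.zeros(n); x[:s] = 1.0; x /= np.linalg.norm(x)
c = np.where(Phi @ x > 0, 1.0, -1.0)      # sgn: +1 if positive, -1 otherwise
A = np.diag(c) @ Phi
y = np.zeros(m)
print(f_obj(A, x, y, eps, eta), is_feasible(x, y, s, k))
```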
Our contributions in this paper are threefold:

(1) The new optimization model. The double-sparsity constrained optimization problem (1.9) is formulated to handle the one-bit CS. It is well known that these two discrete and nonconvex constraints lead to NP-hardness in general. Nevertheless, a necessary and sufficient optimality condition as stated in (3.1) for a local minimizer is established, see Lemma 3.1. Moreover, the necessary and sufficient optimality condition for a global minimizer is also studied in terms of the newly introduced τ-stationary point, see Theorem 3.1.

(2) The efficient
GPSP algorithm.
As the established optimality conditions indicate that a τ-stationary point is instructive for pursuing an optimal solution to (1.9), we design a gradient projection method with an interpolated subspace pursuit scheme (GPSP). The proposed method is proved to be globally convergent to a unique local minimizer without any assumptions, see Theorem 4.1. Furthermore, the produced sequence either enjoys a Q-linear convergence rate or is identical to the limiting point after finitely many iterations. Particularly,
GPSP will stop within finitely many steps once the limiting point reaches the upper bounds of the corresponding double-sparsity, see Theorem 4.2.

(3)
High numerical performance.
GPSP is demonstrated to be relatively robust to the parameters k, ε, η in (1.9) in the numerical experiments, which indicates that we do not need an exact upper bound k on the sign flips. Most importantly, GPSP outperforms some leading solvers on synthetic data, in both time efficiency and recovery accuracy.
The remainder of the paper is organized as follows. In Section 2, some necessary mathematical background is provided, including the notation and the projection onto the feasible set of the problem (1.9). Section 3 is devoted to the optimality conditions of the problem, associated with the τ-stationary points, followed by their relationship to the global minimizers. In Section 4, the gradient projection subspace pursuit (GPSP) method is designed, and the global convergence and the Q-linear convergence rate are established. Numerical experiments are given in Section 5, including the tuning of the involved parameters (s, k, ε, η) and comparisons with other state-of-the-art solvers. Concluding remarks are made in Section 6.
Preliminaries
We first define some notation employed throughout this paper. To distinguish it from sgn(t), the sign function is written as sign(t), which returns 0 if t = 0 and sgn(t) otherwise. Given a subset T ⊆ [n] := {1, 2, · · · , n}, its cardinality and complementary set are |T| and T̄ := [n] \ T. For a vector x ∈ R^n, the support set supp(x) represents the indices of the non-zero elements of x, and the neighbourhood with a radius δ > 0 is N(x, δ) := {w ∈ R^n : ‖w − x‖ < δ}. Let ‖x‖_[i] be the i-th largest (in absolute value) element of x. In addition, x_T stands for the sub-vector containing the elements of x indexed on T. Similarly, for a matrix A ∈ R^{m×n}, A_{ΓT} is the sub-matrix containing rows indexed on Γ and columns indexed on T; particularly, A_{:T} = A_{[m]T}. Moreover, we merge two vectors x and y by z := (x; y) := (x^⊤ y^⊤)^⊤. For a positive definite matrix H, the H-weighted norm is written as ‖z‖²_H = ⟨z, Hz⟩, where ⟨z, z'⟩ := Σ_i z_i z'_i is the inner product of two vectors. Given a scalar a ∈ R, ⌈a⌉ returns the smallest integer that is no less than a. For simplicity, denote

  S := {x ∈ R^n : ‖x‖_0 ≤ s},  K := {y ∈ R^m : ‖y_+‖_0 ≤ k}.   (2.1)

The feasible region of (1.9) is then denoted by F := S × K, with its interior

  int F := {(x, y) ∈ R^n × R^m : ‖x‖_0 < s, ‖y_+‖_0 < k}.

For a nonempty and closed set Ω ⊆ R^n, the projection Π_Ω(x) of x ∈ R^n onto Ω is given by Π_Ω(x) = argmin{‖x − w‖ : w ∈ Ω}. By introducing

  Σ(x; s) := { T ⊆ [n] : |T| = s, |x_i| ≥ |x_j|, ∀ i ∈ T, ∀ j ∉ T },   (2.2)

one can easily verify that

  Π_S(x) = { (x_T; 0) : T ∈ Σ(x; s) }.   (2.3)

To derive the projection of a point y ∈ R^m onto K, denote

  Γ_+ := {i ∈ [m] : y_i > 0},  Γ_0 := {i ∈ [m] : y_i = 0},  Γ_− := {i ∈ [m] : y_i < 0}.   (2.4)

Strictly speaking, Γ_+, Γ_0 and Γ_− depend on y; we drop their dependence if no extra explanations are provided, for the sake of notational convenience. Based on the above notation, for a point y ∈ R^m and an integer k ∈ [m], we define a set by

  Θ(y; k) := { Γ = (Γ_k ∪ Γ_−) ⊆ [m] : Γ_k ⊆ Γ_+, |Γ_k| = min{k, |Γ_+|}, y_i ≥ y_j ≥ 0, ∀ i ∈ Γ_k, ∀ j ∉ Γ },   (2.5)

where Γ_+, Γ_0 and Γ_− are given by (2.4). One can observe that any Γ ∈ Θ(y; k) consists of the indices of all negative elements and the first min{k, |Γ_+|} largest positive elements of y. This notation allows us to derive the projection Π_K(y) by

  Π_K(y) = { (y_Γ; 0) : Γ ∈ Θ(y; k) }.   (2.6)

A simple example is presented for illustration. Given y = (3, 2, 2, 0, −1)^⊤, we have

  Θ(y; 3) = { {1, 2, 3, 5} },  Π_K(y) = {y},
  Θ(y; 2) = { {1, 2, 5}, {1, 3, 5} },  Π_K(y) = { (3, 2, 0, 0, −1)^⊤, (3, 0, 2, 0, −1)^⊤ }.
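The two projections admit simple implementations by sorting. The following minimal NumPy sketch (the function names are ours) returns one element of each projection set; ties, as in Θ(y; 2) above, are broken arbitrarily.

```python
import numpy as np

def proj_S(x, s):
    # Keep the s largest-in-magnitude entries of x, zero out the rest.
    z = np.zeros_like(x)
    T = np.argsort(-np.abs(x))[:s]
    z[T] = x[T]
    return z

def proj_K(y, k):
    # Keep all negative entries and the min{k, |Gamma_+|} largest
    # positive entries of y; zero out the remaining positive entries.
    z = np.where(y < 0, y, 0.0)
    pos = np.flatnonzero(y > 0)
    keep = pos[np.argsort(-y[pos])[:k]]
    z[keep] = y[keep]
    return z

y = np.array([3.0, 2.0, 2.0, 0.0, -1.0])
print(proj_K(y, 3))   # [ 3.  2.  2.  0. -1.], i.e. y itself
print(proj_K(y, 2))   # one of (3,2,0,0,-1) or (3,0,2,0,-1)
```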
To end this section, we present some properties of the objective function in (1.9), which can be written in the following form:

  f(x, y) := ‖Ax + y − ε1‖² + η‖x‖² = ‖z‖²_H − 2⟨b, z⟩ + mε² =: f(z),   (2.7)

where 1 is the vector of all ones, and the matrix H and the vector b are given by

  H := (1/2)∇²f(z) = [ A^⊤A + ηI  A^⊤ ;  A  I ],   b = ε (A^⊤1; 1).

It is easy to verify that H is symmetric positive definite and hence has all eigenvalues positive. Denote its smallest and largest eigenvalues by λ_min and λ_max, respectively. The quadratic objective function f is then strongly convex and strongly smooth since, for any z and z' in R^{n+m},

  f(z) − f(z') − ⟨∇f(z'), z − z'⟩ = ‖z − z'‖²_H ∈ [ λ_min‖z − z'‖², λ_max‖z − z'‖² ].   (2.8)
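A minimal sketch that assembles H and b from (2.7) and computes λ_min and λ_max numerically; the data below are random placeholders, and the function name is ours.

```python
import numpy as np

def build_H_b(A, eps, eta):
    # H = [[A^T A + eta*I, A^T], [A, I]],  b = eps * (A^T 1; 1), as in (2.7).
    m, n = A.shape
    H = np.block([[A.T @ A + eta * np.eye(n), A.T],
                  [A, np.eye(m)]])
    b = eps * np.concatenate([A.T @ np.ones(m), np.ones(m)])
    return H, b

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 10))
H, b = build_H_b(A, eps=0.01, eta=1e-4)
eigs = np.linalg.eigvalsh(H)
lam_min, lam_max = eigs[0], eigs[-1]
print(lam_min > 0, lam_min, lam_max)   # H is positive definite for eta > 0
```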
Optimality Conditions

The first-order necessary and sufficient optimality conditions for (1.9) are established in this section.
Lemma 3.1
A local minimizer z* := (x*; y*) ∈ F of (1.9) must satisfy

  ∇_x f(z*) = 0,  if ‖x*‖_0 < s,
  (∇_x f(z*))_{T*} = 0,  if ‖x*‖_0 = s,
  ∇_y f(z*) = 0,  if ‖y*_+‖_0 < k,
  (∇_y f(z*))_{Γ*} = 0, (∇_y f(z*))_{Γ̄*} ≤ 0,  if ‖y*_+‖_0 = k,   (3.1)

where T* := supp(x*), Γ* := supp(y*). Conversely, let a point z* ∈ F satisfy (3.1). Then it is the unique global minimizer if z* ∈ int F and the unique local minimizer otherwise. Furthermore, there is a δ* > 0 such that, for any z ∈ F ∩ N(z*, δ*), we have the following quadratic growth property:

  f(z) − f(z*) ≥ ‖z − z*‖²_H.   (3.2)

Proof
Let z* be a local minimizer of (1.9). Then there is a δ_0 > 0 such that, for any z ∈ F ∩ N(z*, δ_0),

  0 ≤ f(z) − f(z*) ≤ ⟨∇f(z*), z − z*⟩ + λ_max‖z − z*‖² =: g(z),   (3.3)

where the second inequality is from (2.8). To verify that z* satisfies (3.1), we consider the four involved cases.

• ‖y*_+‖_0 < k. If there is an i with (∇_y f(z*))_i ≠ 0, then for any t ∈ R define

  x_t := x*,  y_t := y* + t(∇_y f(z*))_i · e_i,  z_t := (x_t; y_t),   (3.4)

where e_i ∈ R^m is the i-th column of the identity matrix. It is easy to see that z_t ∈ F for small |t| and g(z_t) = (t + λ_max t²)[(∇_y f(z*))_i]². For any t ∈ (max{−δ_0/|(∇_y f(z*))_i|, −1/λ_max}, 0), we have z_t ∈ F ∩ N(z*, δ_0) and g(z_t) < 0, contradicting (3.3). Thus ∇_y f(z*) = 0.

• ‖y*_+‖_0 = k. If there is an i ∈ Γ* such that (∇_y f(z*))_i ≠ 0, then let x_t, y_t and z_t be as in (3.4); note that perturbing a coordinate in Γ* keeps z_t feasible for small |t|. The same reasoning as for the case ‖y*_+‖_0 < k proves (∇_y f(z*))_i = 0. This displays (∇_y f(z*))_{Γ*} = 0. To show the desired inequality (∇_y f(z*))_{Γ̄*} ≤ 0, take any i ∈ Γ̄*. For any t ≤ 0, let

  x_t := x*,  y_t := y* + t · e_i,  z_t := (x_t; y_t).   (3.5)

It follows from y*_i = 0 and t ≤ 0 that (y_t)_+ = y*_+ and ‖z_t − z*‖ = −t. Thus, z_t ∈ F ∩ N(z*, δ_0) for any t ∈ (−δ_0, 0], and hence

  g(z_t) = t(∇_y f(z*))_i + λ_max t² ≥ 0,

which implies (∇_y f(z*))_i + λ_max t ≤ 0 for t < 0. Letting t → 0 gives (∇_y f(z*))_i ≤ 0. This delivers (∇_y f(z*))_{Γ̄*} ≤ 0.

• ‖x*‖_0 < s. The same reasoning as for the case ‖y*_+‖_0 < k leads to ∇_x f(z*) = 0.

• ‖x*‖_0 = s. The same reasoning as for the case ‖y*_+‖_0 < k yields (∇_x f(z*))_{T*} = 0.

Conversely, let z* satisfy (3.1). We consider the following four cases.

• ‖y*_+‖_0 < k. This leads to ⟨∇_y f(z*), y − y*⟩ = 0 since ∇_y f(z*) = 0 by (3.1).

• ‖y*_+‖_0 = k. Consider a local region N(z*, δ_1) with δ_1 := min{y*_i : y*_i > 0}. Thus, for any z ∈ F ∩ N(z*, δ_1), we have

  y_j > 0 if y*_j > 0, and y_i ≤ 0, ∀ i ∈ Γ̄*.   (3.6)

In fact, if there existed an i ∈ Γ̄* satisfying y_i > 0, then ‖y_+‖_0 ≥ ‖y*_+‖_0 + 1 = k + 1, which contradicts y ∈ K. Direct calculations yield

  ⟨∇_y f(z*), y − y*⟩ = ⟨0, (y − y*)_{Γ*}⟩ + ⟨(∇_y f(z*))_{Γ̄*}, y_{Γ̄*}⟩ ≥ 0,

where the equality uses (3.1) and the inequality uses (3.1) and (3.6).

• ‖x*‖_0 < s. This yields ⟨∇_x f(z*), x − x*⟩ = 0 due to ∇_x f(z*) = 0 by (3.1).

• ‖x*‖_0 = s. Consider a local region N(z*, δ_2) with δ_2 := min{|x*_i| : x*_i ≠ 0}. For any z := (x; y) ∈ F ∩ N(z*, δ_2), x_j ≠ 0 if x*_j ≠ 0, which indicates T* ⊆ supp(x). This together with x ∈ S, i.e., ‖x‖_0 ≤ s = |T*|, suffices to

  supp(x) = T*.   (3.7)

Now one can verify, by (3.1) and (3.7), that

  ⟨∇_x f(z*), x − x*⟩ = ⟨0, (x − x*)_{T*}⟩ + ⟨(∇_x f(z*))_{T̄*}, 0⟩ = 0.

If z* ∈ int F, then ∇f(z*) = 0. Combining this with (2.8), we obtain

  f(z) − f(z*) = ‖z − z*‖²_H + ⟨∇f(z*), z − z*⟩ > 0, ∀ z ≠ z*.

Thus, z* is the unique global optimal solution to the problem (1.9). If z* ∉ int F, then for any z ∈ F ∩ N(z*, min{δ_1, δ_2}), it follows from (2.8) that

  f(z) − f(z*) − ‖z − z*‖²_H = ⟨∇_x f(z*), x − x*⟩ + ⟨∇_y f(z*), y − y*⟩ ≥ 0.

Therefore, z* is the unique global minimizer of min{f(z) : z ∈ F ∩ N(z*, min{δ_1, δ_2})}, which means it is the unique local minimizer of the problem (1.9).

The above lemma shows the optimality conditions for a point being a local minimizer. We further establish the conditions for a global minimizer. To do that, we introduce the τ-stationary point. A point z* := (x*; y*) is called a τ-stationary point of (1.9) with some τ > 0 if

  z* ∈ Π_F(z* − τ∇f(z*)).   (3.8)
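Definition (3.8) can be checked numerically. A minimal sketch, assuming proj_S, proj_K and build_H_b from the earlier sketches; the function names and the tolerance are ours.

```python
import numpy as np

def grad_f(H, b, z):
    # f(z) = ||z||_H^2 - 2<b, z> + m*eps^2, so grad f(z) = 2(Hz - b).
    return 2.0 * (H @ z - b)

def is_tau_stationary(H, b, x, y, s, k, tau, tol=1e-10):
    # Check z in Pi_F(z - tau*grad f(z)) via one selection from each
    # projection set; this is conclusive when the projection is unique.
    n = x.size
    z = np.concatenate([x, y])
    g = grad_f(H, b, z)
    xp = proj_S(x - tau * g[:n], s)
    yp = proj_K(y - tau * g[n:], k)
    return np.linalg.norm(np.concatenate([xp, yp]) - z) <= tol
```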
An equivalent characterization is presented in the following lemma.

Lemma 3.2 A point z* is a τ-stationary point of the problem (1.9) with some τ > 0 if and only if it satisfies

  ‖x*‖_0 ≤ s,  τ(∇_x f(z*))_i = 0 if i ∈ T*, and τ(∇_x f(z*))_i ∈ [−‖x*‖_[s], ‖x*‖_[s]] if i ∉ T*,
  ‖y*_+‖_0 ≤ k,  τ(∇_y f(z*))_i = 0 if i ∈ Γ*, and τ(∇_y f(z*))_i ∈ [−‖y*_+‖_[k], 0] if i ∉ Γ*.   (3.9)

Proof A τ-stationary point satisfies (3.8), which is equivalent to

  x* ∈ Π_S(x* − τ∇_x f(z*)),  y* ∈ Π_K(y* − τ∇_y f(z*)).   (3.10)

Therefore, we show the equivalence between (3.9) and (3.10). For the x* part, this is guaranteed by [1, Lemma 2.2]. For the y* part, the projection (2.6) enables us to show (3.9) ⇒ (3.10), so we only prove (3.10) ⇒ (3.9). Let λ* := ∇_y f(z*). It follows from (2.6) that

  y* ∈ Π_K(y* − τλ*) = { ((y* − τλ*)_Γ; 0) : Γ ∈ Θ(y* − τλ*; k) }.

This derives ‖y*_+‖_0 ≤ k, and for any Γ ∈ Θ(y* − τλ*; k),

  y*_{Γ̄} = 0,  λ*_Γ = 0,  y* − τλ* = (y*_Γ; −τλ*_{Γ̄}),   (3.11)

which together with the definition of Θ(y* − τλ*; k) in (2.5) gives rise to

  Γ = Γ_k ∪ Γ_− = supp(y*),  Γ̄ = (Γ_+ \ Γ_k) ∪ Γ_0,

where Γ_+, Γ_0 and Γ_− are defined as in (2.4) with y replaced by y* − τλ*. On the index set Γ̄, all elements satisfy y*_i − τλ*_i = −τλ*_i ≥ 0, namely, λ*_i ≤ 0, i ∈ Γ̄.

For ‖y*_+‖_0 < k, suppose there is an i ∈ Γ̄ such that λ*_i < 0. Then y* − τλ* has at least ‖y*_+‖_0 + 1 ≤ k positive entries, and thus

  ‖y*_+‖_0 = ‖(Π_K(y* − τλ*))_+‖_0 ≥ ‖y*_+‖_0 + 1.

This is a contradiction. So λ*_{Γ̄} = 0, leading to λ* = 0 by (3.11), which satisfies (3.9).

For ‖y*_+‖_0 = k, (3.9) is satisfied for any j ∈ supp(y*) = Γ due to λ*_Γ = 0. For j ∉ supp(y*), namely, j ∈ Γ̄, the definition of Γ_k in (2.5) yields

  0 ≤ y*_j − τλ*_j ≤ y*_i − τλ*_i, ∀ i ∈ Γ_k,

which together with Γ_k ⊆ Γ and (3.11) results in

  0 ≤ −τλ*_j ≤ y*_i, ∀ i ∈ Γ_k.

Hence, −‖y*_+‖_[k] = −min_{i∈Γ_k} y*_i ≤ τλ*_j ≤ 0 for all j ∈ Γ̄ (j ∉ supp(y*)), showing (3.9).

The following theorem reveals the relationship between τ-stationary points and global minimizers of the problem (1.9).
Theorem 3.1 For (1.9) and a point z* ∈ F, the following statements hold.

a) For z* ∈ int F, the point z* is a global minimizer if and only if it is a τ-stationary point with some τ > 0.

b) For z* ∉ int F, a global minimizer z* is a τ-stationary point with 0 < τ ≤ 1/(2λ_max); conversely, a τ-stationary point with τ ≥ 1/(2λ_min) is also a global minimizer.

Proof a) If z* ∈ int F is a global minimizer, it follows readily from Lemma 3.1 that ∇f(z*) = 0. Thus, by definition, z* is a τ-stationary point for any τ > 0. Conversely, if z* ∈ int F is a τ-stationary point for some τ > 0, then ‖x*‖_[s] = ‖y*_+‖_[k] = 0, which further implies ∇_x f(z*) = 0 and ∇_y f(z*) = 0 by (3.9). Applying Lemma 3.1 again, one can conclude that z* is a global minimizer.

b) Let z* be a global minimizer. If it is not a τ-stationary point with 0 < τ ≤ 1/(2λ_max), then there is

  z (≠ z*) ∈ Π_F(z* − τ∇f(z*)).   (3.12)

Thus, ‖z − (z* − τ∇f(z*))‖² < ‖z* − (z* − τ∇f(z*))‖², which suffices to

  2τ⟨∇f(z*), z − z*⟩ < −‖z − z*‖².

Together with (2.8) and 0 < τ ≤ 1/(2λ_max), this derives

  f(z) − f(z*) ≤ ⟨∇f(z*), z − z*⟩ + λ_max‖z − z*‖² < (λ_max − 1/(2τ))‖z − z*‖² ≤ 0.

It contradicts the global optimality of z*. Therefore, z* is a τ-stationary point with 0 < τ ≤ 1/(2λ_max).

Conversely, let z* be a τ-stationary point with τ ≥ 1/(2λ_min). The condition (3.8) and the definition of Π_F indicate

  ‖z* − (z* − τ∇f(z*))‖² ≤ ‖z − (z* − τ∇f(z*))‖²

for any z ∈ F, delivering 2τ⟨∇f(z*), z − z*⟩ ≥ −‖z − z*‖². This and (2.8) yield

  f(z) − f(z*) ≥ ⟨∇f(z*), z − z*⟩ + λ_min‖z − z*‖² ≥ (λ_min − 1/(2τ))‖z − z*‖² ≥ 0.

Since τ ≥ 1/(2λ_min), the above relation shows the global optimality of z* to (1.9).

Gradient Projection Subspace Pursuit

A gradient projection method with a subspace pursuit strategy is proposed to handle the problem (1.9) by seeking a τ-stationary point. For notational simplicity, hereafter, for a parameter τ ∈ (0, 1], denote

  z^ℓ(τ) = (x^ℓ(τ); y^ℓ(τ)) ∈ Π_F(z^ℓ − τ∇f(z^ℓ))   (4.1)

for the ℓ-th iterate z^ℓ := (x^ℓ; y^ℓ). Analogous to the Γ-related indices defined for y in (2.4), we also define

  Γ^ℓ_+ := {i ∈ [m] : y^ℓ_i > 0},  Γ̃^ℓ_+ := {i ∈ [m] : (y^ℓ(τ_ℓ))_i > 0},
  Γ^ℓ_0 := {i ∈ [m] : y^ℓ_i = 0},  Γ^ℓ_− := {i ∈ [m] : y^ℓ_i < 0},

and the support sets of x^ℓ and x^ℓ(τ_ℓ) as follows:

  T^ℓ := supp(x^ℓ),  T̃^ℓ := supp(x^ℓ(τ_ℓ)).   (4.2)

Given z^ℓ = (x^ℓ; y^ℓ) ∈ F, define the following subspace:

  Ω(z^ℓ) := { z = (x; y) : x_{T^ℓ} ∈ R^{|T^ℓ|}, x_{T̄^ℓ} = 0, y_{Γ^ℓ_+} ∈ R^{|Γ^ℓ_+|}, y_{Γ^ℓ_0} = 0, y_{Γ^ℓ_−} ≤ 0 }.   (4.3)

It is easy to see that Ω(z^ℓ) ⊆ F. Based on this notation, we summarize the framework of the proposed method in Algorithm 1.
Algorithm 1: GPSP: Gradient projection subspace pursuit

  Initialize z^0 ∈ F, tol_0 = +∞, β ∈ (0, 1) and ρ, ϵ > 0. Set ℓ := 0.
  while tol_ℓ > ϵ do
    Gradient descent: Find the smallest integer σ = 0, 1, · · · such that
      f(z^ℓ(β^σ)) ≤ f(z^ℓ) − ρ‖z^ℓ(β^σ) − z^ℓ‖².   (4.4)
    Set τ_ℓ = β^σ, u^ℓ := z^ℓ(τ_ℓ) and z^{ℓ+1} = u^ℓ.
    Subspace pursuit: if T^ℓ = T̃^ℓ and Γ^ℓ_+ = Γ̃^ℓ_+ then
      v^ℓ = argmin{ f(z) : z ∈ Ω(z^ℓ) }.   (4.5)
      If f(v^ℓ) ≤ f(u^ℓ) − ρ‖v^ℓ − u^ℓ‖², then set z^{ℓ+1} = v^ℓ.
    end
    Compute tol_ℓ := ‖u^ℓ − z^ℓ‖ and set ℓ := ℓ + 1.
  end
  Output the solution x* = x^ℓ/‖x^ℓ‖.

Observing that the initial point z^0 ∈ F and Ω(z^ℓ) ⊆ F, we can see that all iterates are feasible. Particularly, if the gap tol_ℓ = ‖z^ℓ − u^ℓ‖ vanishes, then

  z^ℓ = u^ℓ ∈ Π_F(z^ℓ − τ_ℓ∇f(z^ℓ)),

which indicates that z^ℓ is a τ-stationary point with τ ≤ τ_ℓ. Additionally, once the conditions T^ℓ = T̃^ℓ and Γ^ℓ_+ = Γ̃^ℓ_+ are satisfied, we have u^ℓ ∈ Ω(z^ℓ). The fact that v^ℓ is the unique minimizer of f over Ω(z^ℓ) implies that

  ⟨∇f(v^ℓ), u^ℓ − v^ℓ⟩ ≥ 0.   (4.6)

In virtue of (2.8), we have

  f(v^ℓ) ≤ f(u^ℓ) − λ_min‖u^ℓ − v^ℓ‖².   (4.7)

Suppose 0 < ρ ≤ λ_min. Then the candidate v^ℓ will be taken, namely, z^{ℓ+1} = v^ℓ.
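A minimal NumPy sketch of the gradient-descent step with the Armijo-type rule (4.4); it assumes proj_S and proj_K from the earlier sketch, and the default values of beta and rho are illustrative, not the paper's settings.

```python
import numpy as np

def gradient_step(H, b, f, x, y, s, k, beta=0.5, rho=1e-4):
    # Find the smallest sigma such that, with tau = beta^sigma,
    #   f(z(tau)) <= f(z) - rho * ||z(tau) - z||^2,
    # where z(tau) is one projection of z - tau*grad f(z) onto F.
    n = x.size
    z = np.concatenate([x, y])
    g = 2.0 * (H @ z - b)          # grad f(z) for f(z) = ||z||_H^2 - 2<b,z> + const
    tau = 1.0                       # tau = beta^sigma, sigma = 0, 1, 2, ...
    while True:                     # terminates by Lemma 4.1 below
        xt = proj_S(x - tau * g[:n], s)
        yt = proj_K(y - tau * g[n:], k)
        zt = np.concatenate([xt, yt])
        if f(zt) <= f(z) - rho * np.sum((zt - z) ** 2):
            return xt, yt, tau      # accepted: u = z(tau_l)
        tau *= beta
```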
Remark 4.1 The main computation in Algorithm 1 occurs when updating u^ℓ and v^ℓ at each iteration.

i) To update u^ℓ, we need to select one point from Π_F(z^ℓ − τ∇f(z^ℓ)). Namely, three quantities are computed: z̄^ℓ := (x̄^ℓ; ȳ^ℓ) := z^ℓ − τ∇f(z^ℓ), Π_S(x̄^ℓ) and Π_K(ȳ^ℓ). For the former, the computational complexity is about O(mn). To select one point from Π_S(x̄^ℓ), we only pick the first s largest (in absolute value) elements of x̄^ℓ. This allows us to use the MATLAB built-in function maxk, whose computational complexity is O(n + s log s). Similarly, for Π_K(ȳ^ℓ), the computational complexity is O(m + k log k). Thus, updating u^ℓ takes a computational complexity of order O(σmn), where σ is the smallest integer satisfying (4.4).

ii) To update v^ℓ, one needs to solve a quadratic program,

  v^ℓ = argmin_{(x;y)} ‖Ax + y − ε1‖² + η‖x‖²   (4.8)
  s.t. x_{T̄^ℓ} = 0, y_{Γ^ℓ_0} = 0, y_{Γ^ℓ_−} ≤ 0,

for the fixed T^ℓ, Γ^ℓ_0 and Γ^ℓ_−. Any solver for quadratic programming can be used to solve (4.8) to pursue a solution of good quality. To further reduce the computational cost, we drop the constraint y_{Γ^ℓ_−} ≤ 0 from (4.8) and simply solve the system of equations:

  x_{T̄^ℓ} = 0,  y_{Γ^ℓ_0} = 0,
  [ A^⊤_{:T^ℓ}A_{:T^ℓ} + ηI   A^⊤_{Γ̄^ℓ_0 T^ℓ} ;  A_{Γ̄^ℓ_0 T^ℓ}   I ] (x_{T^ℓ}; y_{Γ̄^ℓ_0}) = (A^⊤_{:T^ℓ}ε1; ε1).

The solution (x̃^ℓ; ỹ^ℓ) to the above equations can be derived by

  x̃^ℓ_{T^ℓ} = [ A^⊤_{Γ^ℓ_0 T^ℓ} A_{Γ^ℓ_0 T^ℓ} + ηI ]^{−1} (A^⊤_{Γ^ℓ_0 T^ℓ} ε1),  x̃^ℓ_{T̄^ℓ} = 0,
  ỹ^ℓ_{Γ^ℓ_0} = 0,  ỹ^ℓ_{Γ̄^ℓ_0} = ε1 − A_{Γ̄^ℓ_0 T^ℓ} x̃^ℓ_{T^ℓ}.

If ỹ^ℓ_{Γ^ℓ_−} ≤ 0, namely, (x̃^ℓ; ỹ^ℓ) is the solution to (4.8), then we set v^ℓ = (x̃^ℓ; ỹ^ℓ). Otherwise, this point is not taken into consideration, and we set z^{ℓ+1} = u^ℓ. The computational complexity of addressing the above equations is about O(ms² + s³). Overall, the total computational complexity of each iteration is O(σmn + ms² + s³).
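A sketch of this simplified subspace update: the sign constraint is dropped, the reduced normal equations are solved in closed form, and the candidate is kept only if the dropped constraint holds a posteriori. The function name is ours, and the index sets are built from the current iterate.

```python
import numpy as np

def subspace_step(A, eps, eta, x, y):
    # Solve (4.8) without the sign constraint via the closed form:
    #   x_T = (A_{G0,T}^T A_{G0,T} + eta*I)^{-1} A_{G0,T}^T (eps*1),
    #   y on Gbar = eps*1 - A_{Gbar,T} x_T;  x off T and y on G0 are zero.
    m, n = A.shape
    T = np.flatnonzero(x)              # support of x^l
    G0 = np.flatnonzero(y == 0)        # zero set of y^l
    Gbar = np.flatnonzero(y != 0)      # free y-coordinates
    AT = A[:, T]
    M = AT[G0].T @ AT[G0] + eta * np.eye(T.size)
    xT = np.linalg.solve(M, AT[G0].T @ (eps * np.ones(G0.size)))
    xv = np.zeros(n); xv[T] = xT
    yv = np.zeros(m); yv[Gbar] = eps - AT[Gbar] @ xT
    # Accept only if the dropped constraint y on Gamma_- stays nonpositive.
    return (xv, yv) if np.all(yv[y < 0] <= 0) else None
```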
The first result shows that the Armijo-type step size {τ_ℓ} is well defined.

Lemma 4.1 For any 0 < τ ≤ 1/(2ρ + 2λ_max), it holds that

  f(z^ℓ(τ)) ≤ f(z^ℓ) − ρ‖z^ℓ(τ) − z^ℓ‖²,   (4.9)

and thus inf_{ℓ≥0}{τ_ℓ} ≥ τ̲ > 0, where

  τ̲ := min{ 1, β/(2ρ + 2λ_max) }.   (4.10)

Proof
It follows from z^ℓ(τ) ∈ Π_F(z^ℓ − τ∇f(z^ℓ)) that

  ‖z^ℓ(τ) − (z^ℓ − τ∇f(z^ℓ))‖² ≤ ‖z^ℓ − (z^ℓ − τ∇f(z^ℓ))‖²,

which results in

  2τ⟨∇f(z^ℓ), z^ℓ(τ) − z^ℓ⟩ ≤ −‖z^ℓ(τ) − z^ℓ‖².   (4.11)

Combining this with (2.8) leads to

  f(z^ℓ(τ)) ≤ f(z^ℓ) + ⟨∇f(z^ℓ), z^ℓ(τ) − z^ℓ⟩ + λ_max‖z^ℓ(τ) − z^ℓ‖²
            ≤ f(z^ℓ) − (1/(2τ) − λ_max)‖z^ℓ(τ) − z^ℓ‖²
            ≤ f(z^ℓ) − ρ‖z^ℓ(τ) − z^ℓ‖²,

where the last inequality is from 0 < τ ≤ 1/(2ρ + 2λ_max). Invoking the Armijo-type step size rule, one has τ_ℓ ≥ β/(2ρ + 2λ_max), which together with τ_ℓ ≤ 1 confirms (4.10).
Lemma 4.2 Let {z^ℓ} be the sequence generated by GPSP and τ̲ be given by (4.10). Then the following results hold.

a) The sequence {z^ℓ} is bounded and lim_{ℓ→∞} ‖z^{ℓ+1} − z^ℓ‖ = lim_{ℓ→∞} ‖u^ℓ − z^ℓ‖ = 0.

b) Any accumulation point of the sequence {z^ℓ} is a τ-stationary point of the problem (1.9) with 0 < τ ≤ τ̲.

Proof a) By Lemma 4.1 and u^ℓ = z^ℓ(τ_ℓ), we have

  f(u^ℓ) ≤ f(z^ℓ) − ρ‖u^ℓ − z^ℓ‖².   (4.12)

By the framework of Algorithm 1, if z^{ℓ+1} = u^ℓ, then the above condition implies

  f(z^{ℓ+1}) ≤ f(z^ℓ) − ρ‖u^ℓ − z^ℓ‖² = f(z^ℓ) − ρ‖z^{ℓ+1} − z^ℓ‖².   (4.13)

If z^{ℓ+1} = v^ℓ, then we obtain

  f(z^{ℓ+1}) = f(v^ℓ) ≤ f(u^ℓ) − ρ‖z^{ℓ+1} − u^ℓ‖²
             ≤ f(z^ℓ) − ρ‖u^ℓ − z^ℓ‖² − ρ‖z^{ℓ+1} − u^ℓ‖²   (4.14)
             ≤ f(z^ℓ) − (ρ/2)‖z^{ℓ+1} − z^ℓ‖²,

where the second and last inequalities use (4.12) and the fact ‖a + b‖² ≤ 2‖a‖² + 2‖b‖² for all vectors a and b. Both cases lead to

  f(z^{ℓ+1}) ≤ f(z^ℓ) − (ρ/2)‖z^{ℓ+1} − z^ℓ‖²,  f(z^{ℓ+1}) ≤ f(z^ℓ) − ρ‖u^ℓ − z^ℓ‖².   (4.15)

Therefore, {f(z^ℓ)} is a non-increasing sequence, and thus

  max{ ‖Ax^ℓ + y^ℓ − ε1‖², η‖x^ℓ‖² } ≤ f(z^ℓ) ≤ f(z^0),

which indicates the boundedness of {x^ℓ} and {y^ℓ}, and hence that of {z^ℓ}. The non-increasing property in (4.15) and f ≥ 0 yield

  Σ_{ℓ≥0} max{ (ρ/2)‖z^{ℓ+1} − z^ℓ‖², ρ‖u^ℓ − z^ℓ‖² } ≤ Σ_{ℓ≥0} [ f(z^ℓ) − f(z^{ℓ+1}) ] = f(z^0) − lim_{ℓ→∞} f(z^{ℓ+1}) ≤ f(z^0).

This suffices to lim_{ℓ→∞} ‖z^{ℓ+1} − z^ℓ‖ = lim_{ℓ→∞} ‖u^ℓ − z^ℓ‖ = 0.

b) Let z* be any accumulation point of {z^ℓ}. Then there exists a subset J of {0, 1, 2, · · · } such that lim_{ℓ(∈J)→∞} z^ℓ = z*. This further implies lim_{ℓ(∈J)→∞} u^ℓ = z* by applying a). In addition, as stated in Lemma 4.1, we have {τ_ℓ} ⊆ [τ̲, 1]. Then there exist a subset L of J and a scalar τ* ∈ [τ̲, 1] such that {τ_ℓ : ℓ ∈ L} → τ*. To summarize, we have

  lim_{ℓ(∈L)→∞} z^ℓ = lim_{ℓ(∈L)→∞} u^ℓ = z*,  lim_{ℓ(∈L)→∞} τ_ℓ = τ* ∈ [τ̲, 1].   (4.16)

Let z̄^ℓ := z^ℓ − τ_ℓ∇f(z^ℓ). The framework of Algorithm 1 implies

  u^ℓ ∈ Π_F(z̄^ℓ),  lim_{ℓ(∈L)→∞} z̄^ℓ = z* − τ*∇f(z*) =: z̄*.   (4.17)

The first condition means u^ℓ ∈ F for any ℓ ≥ 1. Note that F is closed and z* is an accumulation point of {u^ℓ} by (4.16). Therefore, z* ∈ F, which results in

  min_{z∈F} ‖z − z̄*‖ ≤ ‖z* − z̄*‖.   (4.18)

If the strict inequality held in the above condition, then there would be an ε_0 > 0 such that

  ‖z* − z̄*‖ − ε_0 = min_{z∈F} ‖z − z̄*‖ ≥ min_{z∈F} ( ‖z − z̄^ℓ‖ − ‖z̄^ℓ − z̄*‖ ) = ‖u^ℓ − z̄^ℓ‖ − ‖z̄^ℓ − z̄*‖,

where the last equality is from (4.17). Taking the limit of both sides along ℓ(∈L) → ∞ yields ‖z* − z̄*‖ − ε_0 ≥ ‖z* − z̄*‖ by (4.16) and (4.17), a contradiction with ε_0 > 0. Therefore, equality must hold in (4.18), showing that

  z* ∈ Π_F(z̄*) = Π_F(z* − τ*∇f(z*)).

The above relation means that the conditions in (3.9) hold for τ = τ*; then these conditions must hold for any 0 < τ ≤ τ̲ due to τ̲ ≤ τ* from (4.16), namely,

  z* ∈ Π_F(z* − τ∇f(z*)),

displaying that z* is a τ-stationary point of (1.9), as desired.

The above lemma allows us to conclude that the whole sequence converges.
Theorem 4.1 Let {z^ℓ} be the sequence generated by GPSP. Then the whole sequence converges to z*, which is necessarily the unique global minimizer of (1.9) if z* ∈ int F and the unique local minimizer otherwise.

Proof As shown in Lemma 4.2, one can find a subsequence of {z^ℓ} that converges to a τ-stationary point z* of (1.9) with 0 < τ ≤ τ̲. Recall that a τ-stationary point z* satisfying (3.9) also meets (3.1), which by Lemma 3.1 indicates that z* is the unique global minimizer if z* ∈ int F and the unique local minimizer otherwise. In other words, z* is an isolated local minimizer of (1.9). Finally, it follows from z* being isolated, [23, Lemma 4.10] and lim_{ℓ→∞} ‖z^{ℓ+1} − z^ℓ‖ = 0 (by Lemma 4.2) that the whole sequence converges to z*.

The following theorem establishes the convergence rate. One can see that GPSP either enjoys a Q-linear convergence rate or terminates at the limit of the sequence after a certain point.
Theorem 4.2 Let {z^ℓ} be the sequence generated by GPSP and z* be its limit.

i) For sufficiently large ℓ, it holds that

  ‖u^ℓ − z*‖² ≤ [ (1 − 2τ̲λ_min)/(1 + 2τ̲λ_min) ] ‖z^ℓ − z*‖².   (4.19)

ii) The sequence either has infinitely many sufficiently large ℓ satisfying

  ‖z^{ℓ+1} − z*‖² ≤ [ (1 − 2τ̲λ_min)/(1 + 2τ̲λ_min) ] ‖z^ℓ − z*‖²,   (4.20)

or remains identical to z* after a certain point, say ℓ̂ ≥ 0, namely,

  z^ℓ = v^{ℓ̂} = z*, ∀ ℓ > ℓ̂.   (4.21)

iii) Let ρ be chosen as 0 < ρ ≤ λ_min. If the limit z* satisfies ‖x*‖_0 = s and ‖y*_+‖_0 = k, then GPSP will terminate at the limit z* within finitely many steps.

Proof i) Theorem 4.1 states that the whole sequence {z^ℓ} converges to z*. So does {u^ℓ} by Lemma 4.2. Then it is easy to show that, for sufficiently large ℓ,

  supp(z*) ⊆ supp(u^ℓ).

In addition, it follows from u^ℓ := z^ℓ(τ_ℓ) = (x^ℓ(τ_ℓ); y^ℓ(τ_ℓ)) ∈ Π_F(z^ℓ − τ_ℓ∇f(z^ℓ)) that

  x^ℓ(τ_ℓ) ∈ Π_S(x^ℓ − τ_ℓ∇_x f(z^ℓ)),  y^ℓ(τ_ℓ) ∈ Π_K(y^ℓ − τ_ℓ∇_y f(z^ℓ)),

which by (2.3) and (2.6) results in

  x^ℓ(τ_ℓ) = ((x^ℓ − τ_ℓ∇_x f(z^ℓ))_T; 0),  y^ℓ(τ_ℓ) = ((y^ℓ − τ_ℓ∇_y f(z^ℓ))_Γ; 0),   (4.22)

where T ∈ Σ(x^ℓ − τ_ℓ∇_x f(z^ℓ); s) and Γ ∈ Θ(y^ℓ − τ_ℓ∇_y f(z^ℓ); k). Therefore,

  supp(z*) ⊆ supp(u^ℓ) = supp((x^ℓ(τ_ℓ); y^ℓ(τ_ℓ))) ⊆ T ∪ (n + Γ) =: W   (4.23)

for sufficiently large ℓ, where n + Γ := {n + i : i ∈ Γ}, which leads to

  z*_{W̄} = u^ℓ_{W̄} = 0,  u^ℓ_W = (z^ℓ − τ_ℓ∇f(z^ℓ))_W,   (4.24)

where the last equality is from (4.22). These conditions contribute to

  ⟨z* − u^ℓ, u^ℓ − (z^ℓ − τ_ℓ∇f(z^ℓ))⟩
  = ⟨z*_W − u^ℓ_W, u^ℓ_W − (z^ℓ − τ_ℓ∇f(z^ℓ))_W⟩ + ⟨z*_{W̄} − u^ℓ_{W̄}, u^ℓ_{W̄} − (z^ℓ − τ_ℓ∇f(z^ℓ))_{W̄}⟩ = 0.   (4.25)

Using the above fact, we obtain

  ‖z* − (z^ℓ − τ_ℓ∇f(z^ℓ))‖² = ‖z* − u^ℓ + u^ℓ − (z^ℓ − τ_ℓ∇f(z^ℓ))‖² = ‖z* − u^ℓ‖² + ‖u^ℓ − (z^ℓ − τ_ℓ∇f(z^ℓ))‖²,

which after simple manipulation results in

  ⟨∇f(z^ℓ), u^ℓ − z^ℓ⟩ + (1/(2τ_ℓ))‖u^ℓ − z^ℓ‖² = ⟨∇f(z^ℓ), z* − z^ℓ⟩ + (1/(2τ_ℓ))[ ‖z* − z^ℓ‖² − ‖z* − u^ℓ‖² ].   (4.26)

By Lemma 4.1, the Armijo-type step size rule indicates τ̲ ≤ τ_ℓ ≤ 1/(2ρ + 2λ_max) < 1/(2λ_max). Therefore, it follows from (2.8) that

  f(u^ℓ) ≤ f(z^ℓ) + ⟨∇f(z^ℓ), u^ℓ − z^ℓ⟩ + λ_max‖u^ℓ − z^ℓ‖²
         ≤ f(z^ℓ) + ⟨∇f(z^ℓ), u^ℓ − z^ℓ⟩ + (1/(2τ_ℓ))‖u^ℓ − z^ℓ‖²
         = f(z^ℓ) + ⟨∇f(z^ℓ), z* − z^ℓ⟩ + (1/(2τ_ℓ))[ ‖z* − z^ℓ‖² − ‖z* − u^ℓ‖² ]
         ≤ f(z*) − λ_min‖z* − z^ℓ‖² + (1/(2τ_ℓ))[ ‖z* − z^ℓ‖² − ‖z* − u^ℓ‖² ],   (4.27)

where the equality uses (4.26) and the last inequality uses (2.8). Recall that a τ-stationary point z* satisfying (3.9) also meets (3.1). By Lemma 3.1, there exists a δ* > 0 such that (3.2) holds for any z ∈ F ∩ N(z*, δ*). Thus, the fact u^ℓ → z* indicates that, for sufficiently large ℓ, we have u^ℓ ∈ F ∩ N(z*, δ*) and hence

  f(u^ℓ) ≥ f(z*) + ‖z* − u^ℓ‖²_H ≥ f(z*) + λ_min‖z* − u^ℓ‖².

Combining this with (4.27) yields

  2τ_ℓλ_min‖z* − u^ℓ‖² ≤ −2τ_ℓλ_min‖z* − z^ℓ‖² + ‖z* − z^ℓ‖² − ‖z* − u^ℓ‖².

Note that τ_ℓ ≥ τ̲. The desired assertion then follows immediately from

  (1 + 2τ̲λ_min)‖z* − u^ℓ‖² ≤ (1 − 2τ̲λ_min)‖z* − z^ℓ‖².

ii) If there are infinitely many ℓ such that z^{ℓ+1} = u^ℓ, then (4.20) for sufficiently large ℓ can be derived from (4.19) immediately. Otherwise, there is an ℓ̃ ≥ 0 such that z^{ℓ+1} = v^ℓ for any ℓ ≥ ℓ̃. The updating rule (4.5) of v^ℓ indicates

  T^{ℓ+1} ⊆ T^ℓ,  Γ^{ℓ+1}_+ ⊆ Γ^ℓ_+,  Γ^{ℓ+1}_0 ⊇ Γ^ℓ_0,  Γ^{ℓ+1}_− ⊆ Γ^ℓ_−.

Note that these sets have finitely many elements. Therefore, the sequences {T^ℓ}, {Γ^ℓ_+}, {Γ^ℓ_0}, {Γ^ℓ_−} converge. In other words, there is an ℓ̂ ≥ ℓ̃ such that, for any ℓ ≥ ℓ̂, it holds that

  T^{ℓ+1} = T^ℓ,  Γ^{ℓ+1}_+ = Γ^ℓ_+,  Γ^{ℓ+1}_0 = Γ^ℓ_0,  Γ^{ℓ+1}_− = Γ^ℓ_−,   (4.28)

which implies Ω(z^{ℓ+1}) = Ω(z^ℓ). Then it follows from (4.5) that

  z^{ℓ+2} = v^{ℓ+1} = argmin{ f(z) : z ∈ Ω(z^{ℓ+1}) } = argmin{ f(z) : z ∈ Ω(z^ℓ) } = v^ℓ = z^{ℓ+1}.

Overall, for any ℓ > ℓ̂, we have z^ℓ = v^{ℓ̂}. Recall from Theorem 4.1 that the whole sequence {z^ℓ} converges to z*, which suffices to z* = lim_{ℓ→∞} z^ℓ = v^{ℓ̂}.

iii) Note that both z^ℓ(∈ F) → z* and u^ℓ(∈ F) → z*. We must have T^ℓ = T̃^ℓ and Γ^ℓ_+ = Γ̃^ℓ_+ for sufficiently large ℓ if ‖x*‖_0 = s and ‖y*_+‖_0 = k. Therefore, the framework of Algorithm 1 allows us to assert that z^{ℓ+1} = v^ℓ for all sufficiently large ℓ, due to

  f(v^ℓ) ≤ f(u^ℓ) − λ_min‖u^ℓ − v^ℓ‖² ≤ f(u^ℓ) − ρ‖u^ℓ − v^ℓ‖²,

where the first inequality is (4.7) and the second is from 0 < ρ ≤ λ_min. Then reasoning similar to that used to prove ii) claims the conclusion immediately.
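Putting the pieces together, a minimal driver for Algorithm 1 might look as follows. It assumes build_H_b, gradient_step and subspace_step from the earlier sketches, and its defaults (tolerance, iteration cap, beta, rho) are illustrative rather than the paper's settings. Monitoring the gap tol_ℓ = ‖u^ℓ − z^ℓ‖ along the run is a simple way to observe the linear decay predicted by Theorem 4.2.

```python
import numpy as np

def gpsp(A, eps, eta, s, k, tol=1e-6, max_iter=1000, beta=0.5, rho=1e-4):
    # Sketch of Algorithm 1 (GPSP) assembling the earlier pieces.
    m, n = A.shape
    H, b = build_H_b(A, eps, eta)
    f = lambda z: z @ (H @ z) - 2.0 * (b @ z)   # f up to the constant m*eps^2
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(max_iter):
        xu, yu, tau = gradient_step(H, b, f, x, y, s, k, beta, rho)
        gap = np.linalg.norm(np.concatenate([xu - x, yu - y]))
        xn, yn = xu, yu
        # Subspace pursuit is tried only when T^l = T~^l and G_+^l = G~_+^l.
        same_T = set(np.flatnonzero(x)) == set(np.flatnonzero(xu))
        same_G = set(np.flatnonzero(y > 0)) == set(np.flatnonzero(yu > 0))
        if same_T and same_G:
            v = subspace_step(A, eps, eta, x, y)
            if v is not None:
                zu, zv = np.concatenate([xu, yu]), np.concatenate(v)
                if f(zv) <= f(zu) - rho * np.sum((zv - zu) ** 2):
                    xn, yn = v
        x, y = xn, yn
        if gap <= tol:                 # tol_l = ||u^l - z^l|| drives the stop rule
            break
    return x / max(np.linalg.norm(x), 1e-12)   # output x* = x^l / ||x^l||
```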
Numerical Experiments

In this section, we conduct extensive numerical experiments to showcase the performance of the proposed GPSP, using MATLAB (R2019a) on a laptop with 32GB of memory and an Intel(R) Core(TM) i9-9880H 2.3GHz CPU.

Testing examples
Examples with the data generated from the Gaussian distributions are taken into account.
Example 5.1 (Independent covariance [34, 8])
Entries of
Φ := [φ_1, · · · , φ_m]^⊤ ∈ R^{m×n} and the nonzero entries of the ground-truth s*-sparse vector x* ∈ R^n (i.e., ‖x*‖_0 ≤ s*) are generated from independent and identically distributed (i.i.d.) samples of the standard Gaussian distribution N(0, 1). To avoid tiny non-zero entries of x*, we let x*_i = x*_i + sign(x*_i) for each non-zero x*_i, followed by normalizing x* to be a unit vector. Let c* = sgn(Φx*) and c̃ = sgn(Φx* + ε), where the entries of the noise ε are i.i.d. samples of a zero-mean Gaussian distribution. Finally, we randomly select ⌈rm⌉ entries of c̃ and flip their signs; the flipped vector is denoted by c, where r is the flipping ratio.
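A minimal NumPy generator for this example; the helper name is ours, and the noise standard deviation below is an illustrative placeholder since the exact value did not survive extraction.

```python
import numpy as np

def gen_example_51(n, m, s_star, r, seed=0):
    rng = np.random.default_rng(seed)
    # Independent-covariance data: Phi and the nonzeros of x* are i.i.d. N(0,1).
    Phi = rng.standard_normal((m, n))
    x = np.zeros(n)
    idx = rng.choice(n, s_star, replace=False)
    x[idx] = rng.standard_normal(s_star)
    x[idx] += np.sign(x[idx])              # push nonzeros away from zero
    x /= np.linalg.norm(x)                 # normalize to a unit vector
    sgn = lambda t: np.where(t > 0, 1.0, -1.0)
    c_star = sgn(Phi @ x)
    c = sgn(Phi @ x + 0.1 * rng.standard_normal(m))   # illustrative noise level
    flips = rng.choice(m, int(np.ceil(r * m)), replace=False)
    c[flips] = -c[flips]                   # flip ceil(r*m) signs
    return Phi, x, c_star, c
```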
Example 5.2 (Correlated covariance [15]) Rows of Φ are generated from i.i.d. samples of N(0, Σ) with Σ_ij = v^{|i−j|}, i, j ∈ [n], where v ∈ (0, 1). Then x*, c* and c are generated in the same way as in Example 5.1.

To demonstrate the performance of a method, apart from the CPU time, we also report the signal-to-noise ratio (SNR), the Hamming error (HE) and the Hamming distance (HD). They are defined by

  SNR := 10 log_10 ‖x − x*‖^{−2},  HD := (1/m)‖sgn(Φx) − c‖_0,  HE := (1/m)‖sgn(Φx) − c*‖_0,

where x is the solution generated by the method. Apparently, a larger SNR (or a smaller HE or HD) means a better recovery.
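These three metrics can be computed directly; a minimal sketch (the helper name is ours):

```python
import numpy as np

def metrics(Phi, x, x_star, c, c_star):
    # SNR = 10*log10(||x - x*||^{-2}); HD and HE count sign mismatches,
    # against the flipped signs c and the true signs c*, respectively.
    sgn = lambda t: np.where(t > 0, 1.0, -1.0)
    m = Phi.shape[0]
    snr = 10.0 * np.log10(1.0 / np.sum((x - x_star) ** 2))
    hd = np.count_nonzero(sgn(Phi @ x) != c) / m
    he = np.count_nonzero(sgn(Phi @ x) != c_star) / m
    return snr, hd, he
```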
We set a small accuracy parameter ϵ in Algorithm 1 and terminate GPSP once ‖z^ℓ − u^ℓ‖ ≤ ϵ or the iteration count exceeds a prescribed maximum; β ∈ (0, 1), ρ > 0 and z^0 = 0 are fixed throughout. The parameters ε, η, s and k in (1.9) are tuned as follows.

(I) Selection of η. For Example 5.1, we fix n = 500, ε = 0.01, s = 5 and k to a small fraction ⌈·m⌉ of m, but vary m ∈ {0.25, 0.50, 0.75, 1.00}n and η over a grid of powers of ten. Average results over 500 trials are reported in Figure 1. It can be evidently seen that the results are stable when η ≤ 10, while they get worse as η grows beyond 10. Similar trends are also observed for GPSP solving Example 5.2, so a wide range of values of η up to 10 can be used. In fact, GPSP performs similarly even when η = 0. For simplicity, we fix η at a small power of ten in the sequel.

[Figure 1: Effect of η for Example 5.1; the three panels report SNR, HD and HE versus η for m = 0.25n, 0.50n, 0.75n and 1.00n.]
(II) Selection of ε. We now fix n = 500, s = 5, and η and k as above, but alter m ∈ {0.25, 0.50, 0.75, 1.00}n and ε over a wide range of values up to 10 for GPSP solving Example 5.1. As shown in Figure 2, the average results over 500 trials do not fluctuate significantly with the changing of ε, which indicates that GPSP is quite robust to the choice of ε over this range; similar observations are made for GPSP solving Example 5.2. For simplicity, we fix ε = 0.01 in the subsequent numerical experiments.

[Figure 2: Effect of ε for Example 5.1; the three panels report SNR, HD and HE versus ε for m = 0.25n, 0.50n, 0.75n and 1.00n.]

(III) Selection of s. The sparsity level s clearly has a heavy influence on the recovery quality. As shown in Figure 3, the ground-truth signal x* has s* = 5 non-zero components with their indices denoted by T* = supp(x*). Apparently, GPSP obtains the most accurate signal if we set s = s*, because it then almost exactly recovers those non-zero components. For s = s* − 2, the recovered support misses part of T*. For s = s* + 2 or s = s* + 4, GPSP generates a solution x whose support set supp(x) covers T* together with extra incorrect indices. However, compared with the magnitudes |x_i|, i ∈ T*, those redundant non-zero components |x_i|, i ∈ supp(x) \ T*, are pretty small. If we remove those small parts and normalize the signal to have a unit length, then the new signal is much closer to x*. For simplicity, we set s = s* in the sequel.
[Figure 3: Effect of s for Example 5.1 with n = 500 and s* = 5; each panel plots the ground-truth and recovered signals, with SNR = 7.469 for s = s* − 2, SNR = 21.51 for s = s*, SNR = 15.26 for s = s* + 2, and SNR = 12.96 for s = s* + 4.]

[Figure 4: Effect of k for Example 5.1; the panels report SNR, HD and CPU time versus k for flipping ratios r ∈ {0.025, 0.050, 0.075, 0.100}.]

(IV) Selection of k. Note that k is the upper bound on the number of sign flips of Φx and is usually unknown beforehand. However, the model (1.9) does not require an exact k. One could either fix it at a small integer (e.g., a small fraction ⌈·m⌉ of m) or start with a slightly bigger value and reduce it iteratively. We tested GPSP for solving Example 5.1 and Example 5.2 under both schemes, and the corresponding numerical performance does not differ much. For instance, as indicated in Figure 4, where n = 500, m = 250, s = 5, and η and ε are fixed as above, we select a range of values of k and then fix each for GPSP. Evidently, for each flipping ratio r, the results SNR and HD do not vary significantly as k alters. On the other hand, the CPU time increases as k rises, which indicates that a smaller value of k is preferable. Hence, in our numerical experiments, we pick k as a small fraction ⌈·m⌉ of m.

Four state-of-the-art solvers are selected for comparisons, including
PDASC [15],
BIHT [17],
AOP [34] and
PBAOP [16]. Like our method, the last three need the true sparsity level s* to be specified. In addition, the last two solvers also require the number of sign flips k. To make the comparisons fairer, we adopt a choice from [34] and set k = ‖sgn(Φx) − c‖_0, where x is the solution generated by BIHT. The other parameters of each method are chosen as their defaults. Finally, all methods are initialized with x^0 = 0, and their final solutions are normalized to have unit length.

We now apply the five methods to solving the two examples under different scenarios. For each scenario, we report average results over 500 instances for the smaller values of n (and over fewer instances for the larger values of n in Table 1). The involved factors are (n, m, s*, r, v), where v only makes sense for Example 5.2. In the following numerical comparisons, we shall see the effect of these factors on each solver by altering one factor while fixing the others.

(a) Effect of s*. We first examine the recovery performance of each method for solving Example 5.2. In Figure 5, the blue circles represent the ground-truth signals and the red stars stand for the recovered signals obtained by the five methods. When s* = 3, all methods succeed in recovering the true signal, while for the other two cases, s* = 4 and s* = 5, the signals obtained by GPSP have better quality than those produced by the other methods.

To proceed further, we employ the five methods to solve Example 5.1 and increase s* from 2 to 10 while fixing n = 500 and the other factors (m, r, v). As demonstrated in Figure 6, GPSP achieves the highest SNR and the smallest HD and HE for each s*. The SNR lines display declining trends, which means the signal gets harder to recover when it has more non-zero components, namely, when the sparsity level s* gets bigger. Apparently, AOP and PBAOP deliver similar results. Somehow, the comparison is not entirely fair for PDASC since it does not require the true sparsity level.
[Figure 5: Effect of s* for Example 5.2 with n = 500; each panel plots the ground-truth and recovered signals for (a) s* = 3, (b) s* = 4 and (c) s* = 5, with SNR values of 19.5231, 0.93889 and 5.5138 for BIHT; 15.6465, 5.8091 and 6.4328 for AOP; 15.4614, 5.8091 and 6.9888 for PBAOP; 21.0222, 8.5556 and 8.8385 for PDASC; and 25.6915, 19.7302 and 15.7633 for GPSP.]

[Figure 6: Effect of s* for Example 5.1; the three panels report SNR, HD and HE versus s* for BIHT, AOP, PBAOP, PDASC and GPSP.]

(b) Effect of m. To see the effect of the sample size m on each method, we vary the ratio m/n over a grid and fix n = 500 together with (s*, r, v). As demonstrated in Figure 7, GPSP outperforms the others for solving Example 5.2 since it delivers a much higher SNR and lower HD and HE. It is evident that all methods behave better as the sample size m rises; this is because the signal gets easier to recover when more samples are available.

[Figure 7: Effect of m for Example 5.2; the three panels report SNR, HD and HE versus m/n for BIHT, AOP, PBAOP, PDASC and GPSP.]

(c) Effect of r. To see the effect of the flipping ratio r on each method, we alter it over {0, 0.05, 0.1} but fix n = 500 together with (m, s*, v). As shown in Figure 8, the larger r is, the worse the performance of each method, because more correct signs are flipped. This can be testified by SNR (resp. HD and HE), whose median obtained by each method declines (resp. rises) when r ascends. Once again, GPSP behaves the best because it delivers the highest median of SNR and the lowest medians of HD and HE in each box. Similar results can be observed for Example 5.1 and are omitted here.

[Figure 8: Effect of r for Example 5.2; box plots of SNR, HD and HE for r ∈ {0, 0.05, 0.1}, where A1-A5 stand for BIHT, AOP, PBAOP, PDASC and GPSP, respectively.]

(d) Effect of v. Note that in Example 5.2, the larger v is, the more correlated each pair of samples (i.e., rows of Φ), which might make it more difficult to recover the signal. To see this, we alter v over a grid in (0, 1) but fix n = 500 together with (m, s*, r). As shown in Figure 9, the larger v is, the more difficult the recovery. It can be seen that GPSP is quite robust over a wide middle range of v, where its SNR, HD and HE stay steady. No matter how v changes, GPSP always performs the best among the five methods.

[Figure 9: Effect of v for Example 5.2; the three panels report SNR, HD and HE versus v for BIHT, AOP, PBAOP, PDASC and GPSP.]

(e) Effect of n. To see the computational speed of each method, we consider some bigger values of n from {5000, 10000, 15000, 20000}, with m and s* scaling proportionally to n and (r, v) fixed. As reported in Table 1, GPSP achieves the best time efficiency, and the highest recovery accuracy in terms of the highest SNR and the lowest HD and HE, against the other methods.

Table 1: Effect of the bigger values of n; CPU time (in seconds).

           Example 5.1                              Example 5.2
  n       BIHT    AOP     PBAOP   PDASC   GPSP     BIHT    AOP     PBAOP   PDASC   GPSP
  5000    0.554   3.668   1.463   0.367   0.292    0.545   3.520   1.452   0.354   0.307
  10000   4.730   14.352  6.931   1.485   1.030    4.635   14.30   6.944   1.538   0.810
  15000   10.55   32.30   15.93   3.313   1.647    10.35   31.54   15.62   3.144   1.652
  20000   20.02   56.68   28.35   5.200   2.592    19.42   55.76   27.99   5.273   2.633
Concluding Remarks

In this paper, we have proposed a nonconvex optimization problem (1.9) to process the one-bit CS, in which the double-sparsity constrains the sparsity of the signal and the number of sign flips. To conquer the hardness resulting from the nonconvex and discrete constraints, we have established necessary and sufficient optimality conditions via the so-called τ-stationarity. These optimality conditions have facilitated the design of a gradient projection subspace pursuit method, GPSP, which has been shown to admit global convergence and highly efficient numerical performance in both computation time and the recovery accuracy of signals.
References

[1] A. Beck and Y. C. Eldar. Sparsity constrained nonlinear optimization: Optimality conditions and algorithms. SIAM Journal on Optimization, 23(3):1480–1509, 2013.
[2] P. T. Boufounos. Greedy sparse signal reconstruction from sign measurements. In 2009 Conference Record of the Forty-Third Asilomar Conference on Signals, Systems and Computers, pages 1305–1309. IEEE, 2009.
[3] P. T. Boufounos and R. G. Baraniuk. 1-bit compressive sensing. In 2008 42nd Annual Conference on Information Sciences and Systems (CISS), pages 16–21. IEEE, 2008.
[4] X. Cai, Z. Zhang, H. Zhang, and C. Li. Soft consistency reconstruction: a robust 1-bit compressive sensing algorithm. Pages 4530–4535. IEEE, 2014.
[5] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, 2006.
[6] E. J. Candès and T. Tao. Decoding by linear programming. IEEE Transactions on Information Theory, 51(12):4203–4215, 2005.
[7] C.-H. Chen and J.-Y. Wu. Amplitude-aided 1-bit compressive sensing over noisy wireless sensor networks. IEEE Wireless Communications Letters, 4(5):473–476, 2015.
[8] D.-Q. Dai, L. Shen, Y. Xu, and N. Zhang. Noisy 1-bit compressive sensing: models and algorithms. Applied and Computational Harmonic Analysis, 40(1):1–32, 2016.
[9] X. Dong and Y. Zhang. A MAP approach for 1-bit compressive sensing in synthetic aperture radar imaging. IEEE Geoscience and Remote Sensing Letters, 12(6):1237–1241, 2015.
[10] D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.
[11] C. Feng, S. Valaee, and Z. Tan. Multiple target localization using compressive sensing. In GLOBECOM 2009 - 2009 IEEE Global Telecommunications Conference, pages 1–6. IEEE, 2009.
[12] N. Fu, L. Yang, and J. Zhang. Sub-Nyquist 1 bit sampling system for sparse multiband signals. Pages 736–740. IEEE, 2014.
[13] X. Fu, F.-M. Han, and H. Zou. Robust 1-bit compressive sensing against sign flips. Pages 3121–3125. IEEE, 2014.
[14] J. Haboba, M. Mangia, R. Rovatti, and G. Setti. An architecture for 1-bit localized compressive sensing with applications to EEG. Pages 137–140. IEEE, 2011.
[15] J. Huang, Y. Jiao, X. Lu, and L. Zhu. Robust decoding from 1-bit compressive sampling with ordinary and regularized least squares. SIAM Journal on Scientific Computing, 40(4):A2062–A2086, 2018.
[16] X. Huang, L. Shi, M. Yan, and J. A. Suykens. Pinball loss minimization for one-bit compressive sensing: Convex models and algorithms. Neurocomputing, 314:275–283, 2018.
[17] L. Jacques, J. N. Laska, P. T. Boufounos, and R. G. Baraniuk. Robust 1-bit compressive sensing via binary stable embeddings of sparse vectors. IEEE Transactions on Information Theory, 59(4):2082–2102, 2013.
[18] J. N. Laska, Z. Wen, W. Yin, and R. G. Baraniuk. Trust, but verify: Fast and accurate signal recovery from 1-bit compressive measurements. IEEE Transactions on Signal Processing, 59(11):5289–5301, 2011.
[19] D. Lee, T. Sasaki, T. Yamada, K. Akabane, Y. Yamaguchi, and K. Uehara. Spectrum sensing for networked system using 1-bit compressed sensing with partial random circulant measurement matrices. Pages 1–5. IEEE, 2012.
[20] F. Li, J. Fang, H. Li, and L. Huang. Robust one-bit Bayesian compressed sensing with sign-flip errors. IEEE Signal Processing Letters, 22(7):857–861, 2014.
[21] Z. Li, W. Xu, X. Zhang, and J. Lin. A survey on one-bit compressed sensing: Theory and applications. Frontiers of Computer Science, 12(2):217–230, 2018.
[22] J. Meng, H. Li, and Z. Han. Sparse event detection in wireless sensor networks using compressive sensing. Pages 181–185. IEEE, 2009.
[23] J. J. Moré and D. C. Sorensen. Computing a trust region step. SIAM Journal on Scientific and Statistical Computing, 4(3):553–572, 1983.
[24] A. Movahed, A. Panahi, and G. Durisi. A robust RFPI-based 1-bit compressive sensing reconstruction algorithm. Pages 567–571. IEEE, 2012.
[25] A. Movahed, A. Panahi, and M. C. Reed. Recovering signals with variable sparsity levels from the noisy 1-bit compressive measurements. Pages 6454–6458. IEEE, 2014.
[26] A. Movahed and M. C. Reed. Iterative detection for compressive sensing: Turbo CS. Pages 4518–4523. IEEE, 2014.
[27] D. Needell and J. A. Tropp. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis, 26(3):301–321, 2009.
[28] Y. Plan and R. Vershynin. Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach. IEEE Transactions on Information Theory, 59(1):482–494, 2012.
[29] L. Rencker, F. Bach, W. Wang, and M. D. Plumbley. Sparse recovery and dictionary learning from nonlinear compressive measurements. IEEE Transactions on Signal Processing, 67(21):5659–5670, 2019.
[30] W. Tang, W. Xu, X. Zhang, and J. Lin. A low-cost channel feedback scheme in mmWave massive MIMO system. Pages 89–93. IEEE, 2017.
[31] H. Wang, X. Huang, Y. Liu, S. Van Huffel, and Q. Wan. Binary reweighted l1-norm minimization for one-bit compressed sensing. In Proceedings of the 8th International Joint Conference on Biomedical Engineering Systems and Technologies, 2015.
[32] P. Xiao, B. Liao, and J. Li. One-bit compressive sensing via Schur-concave function minimization. IEEE Transactions on Signal Processing, 67(16):4139–4151, 2019.
[33] J. Xiong and Q. Tang. 1-bit compressive data gathering for wireless sensor networks. Journal of Sensors, 2014, 2014.
[34] M. Yan, Y. Yang, and S. Osher. Robust 1-bit compressive sensing using adaptive outlier pursuit. IEEE Transactions on Signal Processing, 60(7):3868–3875, 2012.
[35] Z. Zhou, X. Chen, D. Guo, and M. L. Honig. Sparse channel estimation for massive MIMO with 1-bit feedback per dimension. In 2017 IEEE Wireless Communications and Networking Conference (WCNC).