Separable Joint Blind Deconvolution and Demixing
Dana Weitzner and Raja Giryes
Abstract—Blind deconvolution and demixing is the problem of reconstructing convolved signals and kernels from the sum of their convolutions. This problem arises in many applications, such as blind MIMO. This work presents a separable approach to blind deconvolution and demixing via convex optimization. Unlike previous works, our formulation allows separation into smaller optimization problems, which significantly improves complexity. We develop recovery guarantees, which comply with those of the original non-separable problem, and demonstrate the method's performance under several normalization constraints.
Index Terms—Blind deconvolution, demixing, low-rank.
I. INTRODUCTION
Consider the task of restoring signals from a mixture of their bilinear measurements, involving unknown environment parameters. This problem is referred to as blind deconvolution and demixing, where signals are convolved with unknown kernels. It appears in various domains, e.g., audio and image processing [1]–[3] and wireless communications [4], in which it is expected to play a central role in IoT [5]. In the problem of joint deconvolution and demixing [6], the goal is to reconstruct the signals $x_s$ and kernels $w_s$ from

$y = \sum_{s \in [S]} x_s \circledast w_s$,   (1)

extending blind deconvolution to a sum of convolutions. Like the classic blind deconvolution problem [7], this problem is ambiguous without further constraints on the signals and kernels (more on the ambiguities of one-dimensional blind deconvolution can be found in [8], [9]). Common assumptions include peakiness (e.g., [10], [11]), sparsity (e.g., [12], [13]), and subspace priors (e.g., [14]).

Our problem (1) was solved using convex nuclear norm minimization, exploiting the rank-1 structure of the lifted problem [6], [15]–[17], assuming the subspace prior suggested by [14]. Probabilistic linear guarantees for the relationship between the number of measurements in $y$ and the number of signal-kernel pairs, $S$, were derived for this method [17]. The convex nuclear norm minimization approach allows the derivation of theoretical guarantees with minimal assumptions, though combined with the common lifting procedure it might result in high computational complexity. Non-convex approaches were also explored in the context of blind deconvolution [18] and demixing [19]–[24], with significantly lower computation time. Though the theoretical result in [18] is in line with those achieved in the convex approach, the extensions to demixing via the non-convex methods still come with quadratic guarantees. A thorough review of non-convex algorithms in the broader context of general matrix completion problems can be found in [25].

A related variant of (1) considers a scenario where $S$ sources transmit signals to $R$ receivers. Each path is modeled by a different, unknown convolution kernel, yielding

$y_r = \sum_{s \in [S]} x_s \circledast w_{rs}, \quad r \in [R]$,   (2)

where $y_r$ is the measurement of the $r$th out of $R$ receivers. This is equivalent to the blind MIMO model presented in [26]. Their method also uses nuclear norm minimization employing the rank-1 structure of the lifted problem, assuming that the signals reside in a known subspace. This leads to an optimization problem on a matrix consisting of rank-1 blocks, which do not share variables. However, the solution cannot be obtained separately for each block and requires solving the full optimization problem of dimension $RK \times SN$, where $K, N$ are the dimensions of the kernels and signals subspaces.
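To fix ideas, the following minimal NumPy sketch (ours, not the authors' code; the sizes $L, S, R$ are arbitrary illustrative choices) generates the measurement model (2), i.e., each receiver observes a sum of cyclic convolutions:

```python
import numpy as np

rng = np.random.default_rng(0)
L, S, R = 64, 3, 4  # illustrative sizes (our choice, not from the paper)

def cconv(x, w):
    # Cyclic (circular) convolution via the DFT: F(x ⊛ w) = F(x) · F(w).
    return np.fft.ifft(np.fft.fft(x) * np.fft.fft(w))

x = rng.standard_normal((S, L)) + 1j * rng.standard_normal((S, L))        # signals
w = rng.standard_normal((R, S, L)) + 1j * rng.standard_normal((R, S, L))  # kernels per path

# Model (2): each receiver observes the sum of its S convolved paths.
y = np.stack([sum(cconv(x[s], w[r, s]) for s in range(S)) for r in range(R)])
print(y.shape)  # (R, L)
```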
Contribution.
This work assumes that all signals lie in the same subspace (as do all kernels). This allows solving $R$ separate rank-$S$ problems instead of a single, large problem with rank-1 blocks, which reduces the computational complexity significantly. We show linear performance in the reconstruction of each rank-$S$ matrix and develop theoretical guarantees that match those of the rank-1 case [17]. The advantage of our solution is its improved computational complexity; instead of a single large problem, we solve a few small optimization problems with a better ratio of optimization variables to degrees of freedom. See Section III. Given the recovered rank-$S$ matrix, the standard way to retrieve the originating vectors from it is by SVD [14]. However, in the rank-$S$ case, there is an ambiguity of the spanning basis, i.e., the singular vectors are generally not the originating vectors. To overcome this, we suggest an algorithm that uses the fact that the signals are shared between all receivers, and can reconstruct all signals and kernels under some normalization assumptions, to be described in Section III-B.
Notations. Unless stated otherwise, we use the following norm notations throughout our derivations. $\|\cdot\|$ without a subscript is the operator norm of the appropriate object: $\|\mathcal{A}\| = \|\mathcal{A}\|_{F\to 2}$ for sampling operators; $\|A\| = \|A\|_{2\to 2}$ for matrices. We use $[S]$ to denote the set of integers $1, \dots, S$. $x_s$ / $y_{rs}$ denotes the $s$th column of a matrix $X$ / $Y_r$. To denote an inequality up to a constant depending on $\omega$ we use $\lesssim_\omega$.

II. THE PROBLEM SETUP
Consider that $R$ convolution sums are measured, sharing the signals while the kernels differ per source-receiver path. Let the convolution kernel matrix for each sensor be $W_r \in \mathbb{C}^{L\times S}$, and let $X \in \mathbb{C}^{L\times S}$ be $S$ signals of length $L$. Assume that the signals and kernels reside in low ($N$, $K$) dimensional subspaces spanned by the columns of the known matrices $B$, $C$, such that

$X = CM, \quad M \in \mathbb{C}^{N\times S}$,   (3)
$W_r = BH_r, \quad H_r \in \mathbb{C}^{K\times S}$,   (4)

where $B \in \mathbb{C}^{L\times K}$ is assumed to have orthogonal columns, while the entries of $C \in \mathbb{C}^{L\times N}$ are independent and follow a standard circular-symmetric normal distribution. The standard subspace prior [14] can be achieved in our framework with an appropriate choice of coding matrices. Therefore, our assumption is reasonable and poses no limitation relative to the common approach. Denote the "column-wise" cyclic convolution of $X$ and $W_r$ as $X \circledast W_r = \tilde{Y}_r \in \mathbb{C}^{L\times S}$. Then the measurement of each sensor reads as

$y_r = \sum_s x_s \circledast w_{rs} + e = \sum_s \tilde{y}_{rs} + e, \quad s \in [S]$,   (5)

where $e$ is additive noise. Let $F$ be the $L$-dimensional DFT matrix, $\mathrm{diag}(f_l)$ the diagonal matrix consisting of the $l$th column of $F$, $f_l$, and $A_l = \sqrt{L}\,B^* F^* \mathrm{diag}(f_l)\bar{F}\bar{C}$. Then for a given signal-kernel pair, the $l$th measurement component in the Fourier domain is given by [27]

$\hat{\tilde{y}}_{rsl} \triangleq (F(x_s \circledast w_{rs}))_l = \langle A_l, m_s h_{rs}^*\rangle$.   (6)

Thus, the $l$th Fourier entry of the signal-kernel pairs sum is

$\hat{y}_{rl} \triangleq (F(y_r))_l = \sum_s \hat{\tilde{y}}_{rsl} + \hat{e}_l = \sum_s \langle A_l, m_s h_{rs}^*\rangle + \hat{e}_l = \langle A_l, \sum_s m_s h_{rs}^*\rangle + \hat{e}_l = \langle A_l, M H_r^*\rangle + \hat{e}_l$.   (7)

The complete linear measurement operator $\mathcal{A}: \mathbb{C}^{N\times K} \to \mathbb{C}^L$ in the Fourier domain is therefore defined by

$\mathcal{A}(\cdot) = [\langle A_1, \cdot\rangle, \dots, \langle A_L, \cdot\rangle]^T$,   (8)

which leads to writing the measured vector at receiver $r$ as

$\hat{y}_r = \mathcal{A}(M H_r^*) + \hat{e} = \mathcal{A}(Z_r) + \hat{e}$,   (9)

where $Z_r \triangleq M H_r^*$. The problem of demixing convolved signals and kernels is hence the reconstruction of the signal subspace coefficient vectors $M$ and the convolution kernel subspace coefficient vectors $H_r$ from the Fourier transform of the measurement vector $y_r$.
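The lifting in (6)–(7) can be checked numerically. The sketch below (ours; unnormalized DFT, small arbitrary dimensions) verifies that each Fourier sample of a cyclic convolution is a linear functional of the lifted rank-1 matrix $m h^T$ (we use real coefficients and plain transposes for simplicity):

```python
import numpy as np
rng = np.random.default_rng(1)
L, K, N = 32, 5, 6  # small illustrative dimensions (our choice)

F = np.fft.fft(np.eye(L))                          # unnormalized L-point DFT matrix
B = np.linalg.qr(rng.standard_normal((L, K)))[0]   # orthonormal columns, as assumed for B
C = rng.standard_normal((L, N))                    # Gaussian coding matrix
m, h = rng.standard_normal(N), rng.standard_normal(K)
x, w = C @ m, B @ h

# Convolution theorem: the DFT of x ⊛ w is the entrywise product of the DFTs.
y_hat = np.fft.fft(x) * np.fft.fft(w)

# Lifting: every Fourier sample is a linear functional of the rank-1 matrix m h^T,
# y_hat[l] = <A_l, m h^T> with A_l built from row l of FC and FB (up to the paper's scaling).
FC, FB = F @ C, F @ B
Z = np.outer(m, h)                                  # the lifted variable Z = m h^T
lifted = np.array([FC[l] @ Z @ FB[l] for l in range(L)])
assert np.allclose(y_hat, lifted)
print("each Fourier sample is linear in the lifted rank-1 matrix")
```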
III. SEPARABLE OPTIMIZATION FOR SVD-BASED JOINT DECONVOLUTION AND DEMIXING
Casting the problem as a matrix recovery problem, as in (9), allows the use of rank minimization algorithms, since $Z_r$ is known to be a rank-$S$ matrix. Yet, unlike the rank-1 case, the recovery of the matrix is insufficient for the reconstruction of the actual signals and kernels. As we further explain hereafter, this is due to a wider ambiguity in the factorization of the matrix, which is not resolved by the SVD, the standard tool for vector recovery in the rank-1 case [14]. Our framework considers a model in which $S$ sources transmit signals to $R$ receivers, while each channel is represented by a different convolution kernel. Thus, each receiver measures the mixture of $S$ convolutions, $y_r$. Our method has two stages:
1. Matrix recovery: reconstruct $Z_r$ from $y_r$ via nuclear norm minimization at each receiver, separately.
2. Vector recovery: estimate $M, H_r$ from $Z_r = M H_r^*$. This step uses the estimated $Z_r$ from all receivers and requires solving a quadratic equation system of $S^2$ variables, regardless of $L$, $K$, and $N$, which may generally be much larger.

A. Matrix recovery
Assume that all signals and kernels have the same coding matrices. Thus, for each receiver, we may recover $Z_r \in \mathbb{C}^{N\times K}$, which has rank $S$, by solving

$\min_Z \|Z\|_* \quad \text{s.t.} \quad \|\mathcal{A}(Z) - \hat{y}_r\|_2 \le \tau$.   (10)

Computationally, this is equivalent to the standard convex approach to blind deconvolution, with no demixing [14]. For comparison, [17] deals only with the case of one receiver ($R = 1$), so the total number of degrees of freedom (DoF) is $S(K+N)$. In the noiseless case, they solve an optimization problem for $S$ rank-1 matrices, i.e., with $SKN$ variables:

$\min_{Z_s} \sum_s \|Z_s\|_* \quad \text{s.t.} \quad \left\|\sum_s \mathcal{A}_s(Z_s) - \hat{y}\right\|_2 \le \tau$.   (11)

Compared to this approach, we have more measurements per signal ($R \ge S$); this trades off the number of measurements against the ability to solve smaller problems. Ahmed [26] presented a convex approach to blind MIMO, which is more similar to our case in the sense that he too has multiple receivers. Yet, he solves the problem directly (i.e., with rank-one blocks), so he may have $R < S$. When $R = 1$, this is exactly (11). When $R \ge S$, this is equivalent to our problem. It has $SN + RSK$ degrees of freedom ($S$ signals of length $N$ and $S$ kernels of length $K$ in each of the $R$ receivers, without the normalization implications). He solves a similar problem to the former case, only for a sum of $S$ rank-1 matrices of size $RK \times N$, meaning with $SRKN$ variables. As they also note, although the different matrices in the sum share no common variables, the problem cannot be separated. In our setup we use the shared information to be able to separate the problems, but then the information is no longer shared and there is no sample complexity gain. The "full" approach of the same scheme [26] is on the other side of the trade-off: they show better sample complexity for more receivers, but solve a bigger problem. Note, though, that they show it only empirically, where the dependency decreases quite quickly with the increase in $R$. They also conjecture a linear sample complexity bound (which is what we formally prove). It would be interesting to devise an approach in the middle of the trade-off, which uses our shared subspace assumption in a way that has provably better sample complexity. We leave this to future work.
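To make stage 1 concrete, here is a minimal convex-programming sketch (ours, not the authors' Matlab/Burer–Monteiro implementation), using CVXPY on a real-valued stand-in for the complex Fourier-domain operator of (8); the sizes and the generic Gaussian bilinear measurements are illustrative assumptions:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
N, K, S, L = 6, 5, 2, 100   # illustrative sizes (ours); L well above S(K+N)

# Ground-truth lifted matrix Z_r = M H_r^T, rank S (real-valued stand-in).
M = rng.standard_normal((N, S))
H_r = rng.standard_normal((K, S))
Z_true = M @ H_r.T

# Generic bilinear measurements y_l = a_l^T Z b_l, standing in for the
# paper's Fourier-domain operator A of (8).
A_left = rng.standard_normal((L, N))
A_right = rng.standard_normal((L, K))
y = np.array([A_left[l] @ Z_true @ A_right[l] for l in range(L)])

# Per-receiver convex program (10): nuclear norm subject to data fidelity.
Z = cp.Variable((N, K))
meas = cp.hstack([A_left[l] @ Z @ A_right[l] for l in range(L)])
prob = cp.Problem(cp.Minimize(cp.normNuc(Z)), [cp.norm(meas - y, 2) <= 1e-3])
prob.solve()
print("relative error:", np.linalg.norm(Z.value - Z_true) / np.linalg.norm(Z_true))
```

Each such problem has only $KN$ variables, which is the source of the complexity advantage summarized in Table I below.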
Table I summarizes the complexity of each method. For each of the methods mentioned above, we detail the desired DoF, the actual number of optimization variables, the number of sub-problems, the extra operations ("ext. op."), the total, and the ratio between the optimization variables and the desired DoF. The bottom line includes the range of the receivers regime handled by each method. Notice that we have fewer optimization variables, as we keep our matrices small. Moreover, we also have a better (lower) ratio between the optimization variables and the degrees of freedom.

TABLE I: Complexity comparison of the different methods.

Method    | [17]               | [26]                 | Ours
DoF       | $S(K+N)$           | $S(RK+N)$            | $S(RK+N)$
Opt. vars | $SKN$              | $SRKN$               | $KN$ ($\times R$)
Ext. op.  | --                 | --                   | $S^2$
Total     | $SKN$              | $SRKN$               | $R\cdot KN + S^2$
Ratio     | $\frac{KN}{K+N}$   | $\frac{RKN}{RK+N}$   | $\frac{RKN + S^2}{S(RK+N)}$
Receivers | $R = 1$            | $R \ge 1$            | $R \ge S$

B. Vector recovery: find basis transformation
Once $Z_r$ is restored, we use its SVD to reconstruct the originating vectors (the signals and kernels coefficients). Thus $Z_r = U\Lambda V^* \triangleq \tilde{M}_r\tilde{H}_r^*$, where $\tilde{m}_{ri} = \sqrt{\Lambda_{ii}}\,u_i$ (similarly for $\tilde{h}_{ri}$). In the rank-1 case, this is trivial: $Z_r = mh_r^* = \tilde{m}_r\sigma\sigma^{-1}\tilde{h}_r^*$ for any $\sigma \ne 0$. Thus, the estimated vectors are $m = \tilde{m}_r$, $h_r = \tilde{h}_r$ up to scale and sign. These ambiguities of the rank-1 case become an ambiguity of the spanning basis in higher ranks, which can be expressed by

$M H_r^* = \tilde{M}_r T^r (T^r)^{-1}\tilde{H}_r^*$,   (12)

where $T^r \in \mathbb{C}^{S\times S}$ is the basis transformation from the columns of $\tilde{M}_r$ to the columns of the original $M$. To restore the original vector pairs, we measure $R$ convolution sums, assuming that $M$ is constant in all of them while only $H_r$ changes. This corresponds to the blind MIMO scheme in [26] and allows us to pose enough constraints to determine $T^r$, $\forall r \in [R]$, as follows. To exploit the fact that $M$ is shared, we want to express the relation in (12) with the same transformation matrix $T^{r_0}$, for all $r \in [R]$. Thus, we choose an arbitrary $r_0 \in [R]$. The relation between the transformation matrices at different receivers is given by

$T^r = (T^{rr_0})^{-1}T^{r_0}$,   (13)

where $T^{rr_0} \triangleq \tilde{M}_{r_0}^+\tilde{M}_r$ and $^+$ is the pseudo-inverse. Note that we can always invert $\tilde{M}_{r_0}^*\tilde{M}_{r_0}$ as $\tilde{M}_{r_0}$ has orthogonal columns (due to SVD). To recover the original coefficients, it is sufficient to find the $S^2$ entries of $T^{r_0}$. Note that it satisfies

$M = \tilde{M}_r T^r = \tilde{M}_r(T^{rr_0})^{-1}T^{r_0}$,   (14)
$H_r = \tilde{H}_r(T^r)^{-*} = \underbrace{\tilde{H}_r(T^{rr_0})^*}_{\triangleq G_r}(T^{r_0})^{-*}$.   (15)

This equation system allows adding constraints on the same variable basis transformation matrix, $T^{r_0}$, by increasing $R$. Note that we do not know $M$ and $H_r$. Thus, we need to add some constraints on them to be able to recover $T^{r_0}$.
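To make the bookkeeping in (13)–(15) concrete, the following NumPy fragment (ours; real-valued, with plain transposes standing in for conjugate transposes, and with hypothetical variable names) builds the per-receiver SVD factors, forms $T^{rr_0}$ and $G_r$, and checks that all receivers are consistent with a single $S\times S$ unknown $T^{r_0}$:

```python
import numpy as np
rng = np.random.default_rng(3)
N, K, S, R = 6, 5, 3, 4
M = rng.standard_normal((N, S))
Hs = [rng.standard_normal((K, S)) for _ in range(R)]

def svd_factors(Z, S):
    # Balanced SVD factorization Z = Mt @ Ht.T, as after the matrix-recovery stage.
    U, s, Vt = np.linalg.svd(Z)
    return U[:, :S] * np.sqrt(s[:S]), Vt[:S].T * np.sqrt(s[:S])

factors = [svd_factors(M @ H.T, S) for H in Hs]
Mt0, Ht0 = factors[0]                   # receiver r0 = 0
T_r0 = np.linalg.pinv(Mt0) @ M          # unknown in practice; used here only to verify
for r in range(R):
    Mt, Ht = factors[r]
    T_rr0 = np.linalg.pinv(Mt0) @ Mt    # computable from the recovered matrices, (13)
    G_r = Ht @ T_rr0.T                  # G_r of (15)
    assert np.allclose(Mt @ np.linalg.inv(T_rr0) @ T_r0, M)        # (14)
    assert np.allclose(G_r @ np.linalg.inv(T_r0).T, Hs[r])          # (15)
print("all receivers consistent with a single T^{r0}")
```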
One-sided constraints. Assuming normalized kernel coefficients (a standard assumption due to the scale ambiguity of the rank-1 case) and considering (15) leads to the equations

$\|h_{rs}\|_2^2 = 1 = \left((T^{r_0})^{-*}_s\right)^* G_r^* G_r (T^{r_0})^{-*}_s, \quad r \in [R]$,   (16)

where $T^{r_0}_s$ is the $s$th column of $T^{r_0}$. Having the original coefficient vectors equally normalized means that the equation system is column-separable w.r.t. $T^{r_0}$. Thus, we get $R$ equations for the $S$ variables of $T^{r_0}_s$ for any $s \in [S]$. Posing $\ell_2$ norm constraints on the original signals or kernels coefficients leads to a system of quadratic equations, which has an exponential number of solutions even in the fully determined case; a fully determined system with $S$ equations and $S$ variables leads to $2^S$ solutions. When $R = S$, the system has at least $2^{S-1}$ solutions (we are agnostic to the global phase ambiguity). While this might suffice when $S = 2$, it is insufficient for more signal-kernel pairs. Empirically, we observed that solving this equation system by optimization for the entire matrix $T^{r_0}$, i.e., simultaneously solving for all columns of $T^{r_0}$, produces only the correct results even for $R = S + 1$, despite the exponential number of valid solutions. This was resolved using the Matlab non-linear equation solver fsolve with the trust-region dogleg algorithm.
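A rough SciPy analogue of this stage (ours, not the paper's Matlab code; it solves directly for $P = (T^{r_0})^{-*}$, with real transposes, and uses least_squares since the system (16) is overdetermined for $R > S$) is sketched below. As in the paper, random restarts may be needed:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(4)
N, K, S = 8, 7, 2
R = S + 1                                   # minimal receiver count for this variant
M = rng.standard_normal((N, S))
Hs = [rng.standard_normal((K, S)) for _ in range(R)]
Hs = [H / np.linalg.norm(H, axis=0) for H in Hs]    # normalized kernel columns

def svd_factors(Z):
    U, s, Vt = np.linalg.svd(Z)
    return U[:, :S] * np.sqrt(s[:S]), Vt[:S].T * np.sqrt(s[:S])

factors = [svd_factors(M @ H.T) for H in Hs]
Mt0 = factors[0][0]
Gs = [Ht @ (np.linalg.pinv(Mt0) @ Mt).T for Mt, Ht in factors]   # G_r from (15)

# Solve (16) for P = (T^{r0})^{-T}: ||G_r p_s||^2 = 1 for all r, s.
def residuals(p):
    P = p.reshape(S, S)
    return np.concatenate([np.sum((G @ P) ** 2, axis=0) - 1.0 for G in Gs])

sol = least_squares(residuals, rng.standard_normal(S * S))
P = sol.x.reshape(S, S)
H0_hat = Gs[0] @ P                          # estimated kernels at receiver r0
print("residual:", np.linalg.norm(sol.fun))
# A good solve gives a permutation-like |cosine| matrix (column order and sign
# are inherent demixing ambiguities).
print(np.round(np.abs(H0_hat.T @ Hs[0]), 3))
```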
Two-sided constraints. Posing the normalization constraints on both $M$ and $H_r$, and using them with (14) and (15), leads to the equation system

$\|m_s\|_2^2 = 1 = (T^{r_0}_s)^*\tilde{M}_{r_0}^*\tilde{M}_{r_0}T^{r_0}_s$,   (17)
$\|h_{rs}\|_2^2 = 1 = \left((T^{r_0})^{-*}_s\right)^* G_r^* G_r (T^{r_0})^{-*}_s$.   (18)

This adds an additional constraint. However, although each subproblem ((17), (18)) is column-separable, the total system is not, as the variables in each of the separate sets are the adjugate matrices of one another (up to the determinant factor). In fact, we are looking for the correct set of $S$ solutions, out of $\binom{2^{S-1}}{S}$ possible column sets that can solve (17), which can be jointly inverted to the correct set of solutions of (18). This setup appears to impose "hidden" constraints. We conjecture that they resolve the ambiguity of the quadratic equations for $R = S$, as we empirically show in Section V.

C. Matrix recovery guarantees
We turn now to provide theoretical guarantees for the recovery of $Z_r$. Jung et al. [17] were the first to present a linear guarantee for the uniqueness of recovery when solving (11). Ahmed [26], who presented a convex approach to blind MIMO, conjectured that also in his case $L$ is linear w.r.t. $\max(R, S)$, without proof. Such guarantees [6], [14], [17] rely on the fact that the retrieved matrix is rank-1 and, thus, do not apply in our case. The following provides a guarantee for the reconstruction of higher-rank matrices (by solving (10)), with the same assumptions on $B, C$ and a similar linear result.
Theorem 1.
Let $\omega \ge 1$ and let $y \in \mathbb{C}^L$ be given by (5), with $\|e\|_2 \le \tau$. Assume that

$L \gtrsim \omega S\left(K\mu^2\log^2(K\mu^2) + N\mu_H^2\right)\log^2 L$,   (19)

then with probability of at least $1 - \mathcal{O}(L^{-\omega})$ the minimizer $\hat{X}$ of (10) satisfies

$\left\|\hat{X} - X\right\|_F \lesssim_\omega \tau\sqrt{S\max\left\{1, \frac{SK\mu^2}{NL}\right\}}\log(L)$.   (20)

Compared to [17], we have the same lower bound on $L$ but effectively more measurements per signal (all $y$-vectors). This is part of the tradeoff that enables us to have smaller computational problems, as discussed in Section III-A.

IV. PROOF OF THE MAIN THEOREM
The structure of our proof is similar to the ones in [6], [14], [17]. We prove sufficient conditions for recovery, assuming the existence of an inexact dual certificate: we show that our measurement operator fulfills a local isometry property (see (23)) on the relevant spaces (those defined in Definitions 2 and 4), and then construct the dual certificate using the Golfing Scheme.
A. Preliminary Definitions
We start with preliminary definitions. The sgn function is defined in the functional sense, i.e.,

$\mathrm{sgn}(A) = U\,\mathrm{diag}(\mathrm{sgn}(\sigma_1), \dots, \mathrm{sgn}(\sigma_r))\,V^*$,   (21)

where $A = U\Sigma V^*$ is the SVD of $A$ and $\{\sigma_i\}$ are the singular values of $A$. This differs from the definition in [17] and is important for proving the results for the reconstruction of matrices with rank exceeding one. We denote by $\hat{H}\Lambda\hat{M}^*$ the SVD decomposition of $X = HM^*$, and define

$\mathcal{M} = \left\{Z \mid Z \in \mathbb{C}^{K\times N}\right\}$,   (22)

to be the space of matrices of the appropriate size.
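As a quick illustration, a minimal NumPy version of this matrix sign (our sketch, not from the paper): applying sgn to the singular values keeps only the nonzero ones, so that $\langle\mathrm{sgn}(A), A\rangle_F = \|A\|_*$ and $\|\mathrm{sgn}(A)\| = 1$:

```python
import numpy as np

def msgn(A, tol=1e-12):
    # Matrix sign in the functional sense of (21): sgn of the singular values,
    # keeping only the nonzero ones (sgn(0) = 0).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > tol))
    return U[:, :r] @ Vt[:r]

rng = np.random.default_rng(5)
A = rng.standard_normal((7, 3)) @ rng.standard_normal((3, 5))    # rank-3 matrix
assert np.isclose(np.trace(msgn(A).T @ A), np.linalg.norm(A, 'nuc'))
print("operator norm of sgn(A):", np.linalg.norm(msgn(A), 2))    # = 1
```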
Solution space. We now turn to define the solution space. (Note that some other works refer to it as the tangent space.)
Definition 2 (Solution space). Let $\hat{H}\Lambda\hat{M}^*$ be the SVD decomposition of $X = HM^*$. Given

$T_M = \left\{V\hat{M}^* \mid V \in \mathbb{C}^{K\times S}\right\}, \quad T_H = \left\{\hat{H}U^* \mid U \in \mathbb{C}^{N\times S}\right\}$,

the solution space is defined as $T = T_M + T_H$.

Local Isometry Property (LIP).
An operator $\mathcal{A}$ satisfies the LIP with a constant $\delta$ if for all $X \in T$

$(1-\delta)\|X\|_F^2 \le \|\mathcal{A}(X)\|_2^2 \le (1+\delta)\|X\|_F^2$.   (23)
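A small numeric sanity check of (23) (ours; real-valued, arbitrary sizes): for the bilinear operator built from an orthonormal $B$ and a Gaussian $C$, $\mathbb{E}\|\mathcal{A}(X)\|_2^2 = \|X\|_F^2$, and on random elements of $T$ the ratio concentrates around 1 as $L$ grows:

```python
import numpy as np
rng = np.random.default_rng(6)
L, K, N, S = 2048, 5, 6, 2

B = np.linalg.qr(rng.standard_normal((L, K)))[0]     # orthonormal columns
C = rng.standard_normal((L, N))                      # Gaussian rows c_l

Hhat = np.linalg.qr(rng.standard_normal((K, S)))[0]  # orthonormal SVD factors of X
Mhat = np.linalg.qr(rng.standard_normal((N, S)))[0]

def A_op(X):
    # A(X)_l = b_l^T X c_l, the bilinear measurements in the proofs' K x N convention.
    return np.einsum('lk,kn,ln->l', B, X, C)

for _ in range(3):
    # A random element of T = T_M + T_H.
    X = rng.standard_normal((K, S)) @ Mhat.T + Hhat @ rng.standard_normal((N, S)).T
    ratio = np.linalg.norm(A_op(X)) ** 2 / np.linalg.norm(X, 'fro') ** 2
    print(round(ratio, 3))   # close to 1 for large L, i.e., (23) with a small delta
```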
Partition of measurements and incoherence. In our proofs, we also use extended spaces that are slightly larger than the solution space. These spaces are induced by the partitioning of the measurements, required for the Golfing Scheme [28]. Define the kernel subspace matrix coherence parameter

$\mu^2 = \frac{L}{K}\max_{l\in[L]}\|b_l\|_2^2$,   (24)

where $b_l$ is the $l$th column of $B^T$. Notice that $1 \le \mu^2 \le \frac{L}{K}$. Using the Golfing Scheme [28] requires a division of the $L$ measurements into $P$ non-overlapping sets. We denote the indexing of each set by $\Gamma_p$ where $p \in [P]$. Thus, $\cup_p\Gamma_p = [L]$. Each set is associated with its linear measurement operator

$\mathcal{A}_p(Z) = \{\langle A_l, Z\rangle\}_{l\in\Gamma_p}$.   (25)

For convenience of writing, we define

$T_p = \frac{L}{Q}\sum_{l\in\Gamma_p} b_lb_l^*$,   (26)
$S_p = T_p^{-1} = \left(\frac{L}{Q}\sum_{l\in\Gamma_p} b_lb_l^*\right)^{-1}$,   (27)

where $Q \triangleq \frac{L}{P}$. To guarantee the convergence of the Golfing Scheme with high probability, the partition must be chosen such that $T_p \approx I_K$ for all $p \in [P]$. Thus, we require that

$\max_{p\in[P]}\|T_p - I\| \le \nu$,   (28)

for a small enough $\nu$.
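The following NumPy fragment (our illustration; contiguous blocks as a naive partition choice, sizes arbitrary) computes the coherence (24) and checks condition (28) for a given $B$:

```python
import numpy as np
rng = np.random.default_rng(7)
L, K, P = 1024, 6, 8
Q = L // P
B = np.linalg.qr(rng.standard_normal((L, K)))[0]

mu2 = (L / K) * np.max(np.sum(B ** 2, axis=1))      # coherence mu^2 of (24)
print("mu^2 =", round(mu2, 2), " (1 <= mu^2 <= L/K =", L / K, ")")

# Naive partition of [L] into P contiguous blocks; check T_p ≈ I_K as in (28).
nus = []
for p in range(P):
    rows = B[p * Q:(p + 1) * Q]                     # b_l for l in Gamma_p
    T_p = (L / Q) * rows.T @ rows                   # as in (26)
    nus.append(np.linalg.norm(T_p - np.eye(K), 2))
print("max_p ||T_p - I|| =", round(max(nus), 3))
```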
Moreover, the partition needs to be $\omega$-admissible in the following sense.

Definition 3. Let $\omega \ge 1$ and let $\{\Gamma_p\}_{p\in[P]}$ be a partition of $[L]$. The set $\{\Gamma_p\}_{p\in[P]}$ is said to be $\omega$-admissible if the following conditions are satisfied:
1) $\frac{Q}{2} \le |\Gamma_p| \le 2Q$ for all $p \in [P]$, where $Q = \frac{L}{P}$;
2) (28) is fulfilled with $\nu = \frac{1}{32}$;
3) $\log_2(8\tilde{\gamma}\sqrt{S}) \le P \le 2\log_2(8\tilde{\gamma}\sqrt{S})$, where $\tilde{\gamma} = 2\sqrt{\omega\max\left\{1, \frac{SK\mu^2}{NL}\right\}\log(L+SKN)}$.

The existence of such a partition is guaranteed by Lemma 3 in [17]. For a fixed $\omega$-admissible partition we can define

$\mu_H^2 = L\max\left\{\max_{l\in[L],\,s\in[S]}|b_l^*\hat{h}_s|^2,\ \max_{l\in[L],\,s\in[S],\,p\in[P]}|b_l^*S_p\hat{h}_s|^2\right\}$,   (29)

where $1 \le \mu_H^2 \le 4K\mu^2 \lesssim L$. Now we can define the extended solution space for each $p$.

Definition 4 (Extended solution space). Fix $p \in [P]$. The extended solution space is defined as $T_p = T + T_{S_pH}$, where

$T_{S_pH} = \left\{S_p\hat{H}U^* \mid U \in \mathbb{C}^{N\times S}\right\}$.   (30)
Orthogonal projection operators. We can define the orthogonal projection operator onto the solution space, $\mathcal{P}_T$, by

$\mathcal{P}_T(Z) = P_{\hat{H}}Z + ZP_{\hat{M}} - P_{\hat{H}}ZP_{\hat{M}}$   (31)

and the orthogonal projection operator onto the complementary space $T^\perp$ by

$\mathcal{P}_{T^\perp}(Z) = (I - \mathcal{P}_T)Z = (I_K - P_{\hat{H}})Z(I_N - P_{\hat{M}})$,   (32)

where

$P_{\hat{H}} = \hat{H}\hat{H}^*, \quad P_{\hat{M}} = \hat{M}\hat{M}^*$,   (33)
$T^\perp = \mathrm{span}\left\{vu^* \mid v \perp \{\hat{h}_s\}_{s\in[S]},\ u \perp \{\hat{m}_s\}_{s\in[S]}\right\}$.   (34)
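A compact NumPy rendering of (31)–(33) (our sketch), which also verifies that $\mathcal{P}_T$ and $\mathcal{P}_{T^\perp}$ are complementary orthogonal projections:

```python
import numpy as np
rng = np.random.default_rng(8)
K, N, S = 5, 6, 2
Hhat = np.linalg.qr(rng.standard_normal((K, S)))[0]   # orthonormal factors of X
Mhat = np.linalg.qr(rng.standard_normal((N, S)))[0]
PH, PM = Hhat @ Hhat.T, Mhat @ Mhat.T                 # (33)

P_T = lambda Z: PH @ Z + Z @ PM - PH @ Z @ PM                 # (31)
P_Tperp = lambda Z: (np.eye(K) - PH) @ Z @ (np.eye(N) - PM)   # (32)

Z = rng.standard_normal((K, N))
assert np.allclose(P_T(Z) + P_Tperp(Z), Z)            # complementary
assert np.allclose(P_T(P_T(Z)), P_T(Z))               # idempotent
assert np.isclose(np.sum(P_T(Z) * P_Tperp(Z)), 0.0)   # orthogonal in <.,.>_F
print("P_T is an orthogonal projection onto the solution space")
```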
B. Sufficient conditions for recovery

We first find sufficient conditions for recovery in the presence of noise.
Lemma 5.
Suppose that $\mathcal{A}$ satisfies the $\delta$-local isometry property on $T$ and set $\gamma = \|\mathcal{A}\|$. Furthermore, suppose that there is $Y = \mathcal{A}^*z$ for some $z \in \mathbb{C}^L$ such that

$\|\mathcal{P}_T(Y) - \mathrm{sgn}(X)\|_F \le \alpha$   (35)
$\|\mathcal{P}_{T^\perp}(Y)\| \le \beta$   (36)

where $\alpha, \beta \ge 0$ are constants such that $1 - \beta - \frac{\alpha\gamma}{\sqrt{1-\delta}} \ge \frac{1}{4}$, $\alpha \le 1$ and $\sqrt{1-\delta} \ge \frac{1}{2}$. If $\hat{X}$ is a minimizer of

$\min_X \|X\|_* \quad \text{s.t.} \quad \|\mathcal{A}(X) - \hat{y}\|_2 \le \tau$,   (37)

then

$\left\|\hat{X} - X\right\|_F \lesssim \tau(1+\gamma)(1+\|z\|_2)$.   (38)
Set $V = \hat{X} - X$. We want to bound $\|V\|_F \le \|\mathcal{P}_T(V)\|_F + \|\mathcal{P}_{T^\perp}(V)\|_F$. Since $\hat{X}$ is the minimizer of (37), we have

$\|\mathcal{A}(V)\|_2 \le \left\|\mathcal{A}(\hat{X}) - \hat{y}\right\|_2 + \|\hat{y} - \mathcal{A}(X)\|_2 \le 2\tau$.   (39)

Combined with the local isometry property (23), $\gamma$ being the operator norm of $\mathcal{A}$, and the triangle inequality, we get

$\|\mathcal{P}_T(V)\|_F \le \frac{1}{\sqrt{1-\delta}}\|\mathcal{A}(\mathcal{P}_T(V))\|_2 \le \frac{1}{\sqrt{1-\delta}}\left(\|\mathcal{A}(\mathcal{P}_{T^\perp}(V))\|_2 + \|\mathcal{A}(V)\|_2\right) \le \frac{1}{\sqrt{1-\delta}}\left(\gamma\|\mathcal{P}_{T^\perp}(V)\|_F + 2\tau\right)$.   (40)

To upper bound $\|\mathcal{P}_{T^\perp}(V)\|_*$, we choose $Z_0 \in T^\perp$ such that $\|Z_0\| \le 1-\beta$ and $\langle Z_0, V\rangle_F = (1-\beta)\|\mathcal{P}_{T^\perp}(V)\|_*$. This is possible due to the duality of the norms $\|\cdot\|_{2\to 2}$ and $\|\cdot\|_*$. Note that $\|\mathrm{sgn}(X) + \mathcal{P}_{T^\perp}(Y) + Z_0\| \le 1$, due to $\mathrm{sgn}(X) \perp \mathcal{P}_{T^\perp}(Y) + Z_0$, the mentioned bound on $\|Z_0\|$, (36) and $\|\mathrm{sgn}(X)\| \le 1$. Using this duality again, we get

$\|X+V\|_* = \sup_{Z\in\mathbb{C}^{K\times N},\,\|Z\|\le 1}|\langle Z, X+V\rangle_F|$
$\ \ge \mathrm{Re}\left(\langle\mathrm{sgn}(X) + \mathcal{P}_{T^\perp}(Y) + Z_0,\ X+V\rangle_F\right)$
$\ = \|X\|_* + \mathrm{Re}\left(\langle\mathcal{P}_{T^\perp}(Y) + Z_0,\ X\rangle_F\right) + \mathrm{Re}\left(\langle\mathrm{sgn}(X) + \mathcal{P}_{T^\perp}(Y),\ V\rangle_F\right) + \mathrm{Re}\left(\langle Z_0, V\rangle_F\right)$
$\ = \|X\|_* + \mathrm{Re}\left(\langle\mathrm{sgn}(X) + \mathcal{P}_{T^\perp}(Y),\ V\rangle_F\right) + (1-\beta)\|\mathcal{P}_{T^\perp}(V)\|_*$
$\ = \|X\|_* + (1-\beta)\|\mathcal{P}_{T^\perp}(V)\|_* + \mathrm{Re}\left(\langle\mathrm{sgn}(X) - \mathcal{P}_T(Y),\ V\rangle_F + \langle Y, V\rangle_F\right)$,   (41)

where the first equality is due to $\|X\|_* = \langle\mathrm{sgn}(X), X\rangle_F$ (see (21)), the third equality follows from $\mathrm{Re}(\langle\mathcal{P}_{T^\perp}(Y)+Z_0, X\rangle_F) = 0$ and the choice of $Z_0$, and the last step is due to $Y = \mathcal{P}_T(Y) + \mathcal{P}_{T^\perp}(Y)$. Notice that our definition of the sign function in (21), which differs from the one in [17], is essential for this step.

We now examine the term $\mathrm{Re}(\langle\mathrm{sgn}(X) - \mathcal{P}_T(Y), V\rangle_F)$ in the last line of (41). By Cauchy–Schwarz, the upper bound for $\|\mathcal{P}_T(V)\|_F$ in (40) and the assumption in (35), we get

$\mathrm{Re}\left(\langle\mathrm{sgn}(X) - \mathcal{P}_T(Y), V\rangle_F\right) \ge -\|\mathrm{sgn}(X) - \mathcal{P}_T(Y)\|_F\|\mathcal{P}_T(V)\|_F \ge -\frac{\alpha}{\sqrt{1-\delta}}\left(\gamma\|\mathcal{P}_{T^\perp}(V)\|_F + 2\tau\right)$   (42)

where in the first inequality we have also used the fact that $\mathrm{sgn}(X) - \mathcal{P}_T(Y) \in T$. To bound the term $\mathrm{Re}(\langle Y, V\rangle_F)$ in the last line of (41), note that by Cauchy–Schwarz and (39),

$\mathrm{Re}(\langle Y, V\rangle_F) = \mathrm{Re}(\langle\mathcal{A}^*(z), V\rangle_F) = \mathrm{Re}(\langle z, \mathcal{A}(V)\rangle) \ge -2\|z\|_2\tau$.   (43)

Putting (42) and (43) back into (41), we get

$\|\hat{X}\|_* = \|X+V\|_* \ge \|X\|_* + \left(1-\beta-\frac{\alpha\gamma}{\sqrt{1-\delta}}\right)\|\mathcal{P}_{T^\perp}(V)\|_* - 2\tau\left(\|z\|_2 + \frac{\alpha}{\sqrt{1-\delta}}\right)$.   (44)

Since $\hat{X}$ is the minimizer of the nuclear norm, we have $\|\hat{X}\|_* \le \|X\|_*$ and therefore

$\|\mathcal{P}_{T^\perp}(V)\|_* \le \frac{2\tau\left(\|z\|_2 + \frac{\alpha}{\sqrt{1-\delta}}\right)}{1-\beta-\frac{\alpha\gamma}{\sqrt{1-\delta}}}$.   (45)

Considering our assumptions on the constants, we get $\|\mathcal{P}_{T^\perp}(V)\|_F \lesssim \tau(\|z\|_2 + 1)$. ∎
Finally, we can bound

$\|V\|_F \le \|\mathcal{P}_T(V)\|_F + \|\mathcal{P}_{T^\perp}(V)\|_F \lesssim (1+\gamma)\|\mathcal{P}_{T^\perp}(V)\|_F + \tau \lesssim \tau(1+\gamma)(\|z\|_2 + 1)$.   (46)

The error bound also requires us to bound $\gamma$, the operator norm of the measurement operator $\mathcal{A}$. This is done in the following lemma, whose proof is in App. B-A.

Lemma 6 (Operator norm bound). Let $\omega \ge 1$. Then with probability of at least $1 - L^{-\omega}$,

$\|\mathcal{A}\| \le \max\left\{1, \sqrt{\frac{NK}{L}}\mu\right\}\sqrt{2\omega\log L + \log(L+SKN)}$.   (47)

C. Local Isometry Property
We now show that the measurement operators $\mathcal{A}$, $\mathcal{A}_p$ ((8), (25)) act as approximate isometries on $T$, $T_p$ (Definitions 2 and 4).

Theorem 7.
Fix $\omega \ge 1$. Suppose that

$Q \ge C\omega\delta^{-2}S\left(K\mu^2\log(L)\log^2(K\mu^2) + N\mu_H^2\right)$,   (48)

then with probability $1-\mathcal{O}(L^{-\omega})$ the operator $\mathcal{A}$ satisfies (23) (LIP), and for all $p \in [P]$, every $Y \in T_p = T + T_{S_pH}$ fulfills

$(1-\delta)\left\|T_p^{1/2}Y\right\|_F^2 \le \frac{L}{Q}\|\mathcal{A}_p(Y)\|_2^2 \le (1+\delta)\left\|T_p^{1/2}Y\right\|_F^2$,   (49)

where $T_p^{1/2}$ denotes the unique positive, self-adjoint matrix whose square is equal to $T_p$.

To prove this we need to define the following norms.
Definition 8.
For any vector $z \in \mathbb{C}^K$ and matrix $Z \in \mathbb{C}^{K\times N}$:

$\|z\|_B = \sqrt{L}\max_{l\in[L]}|z^*b_l|, \quad \|Z\|_B = \sqrt{L}\max_{l\in[L]}\|Z^*b_l\|_2$.   (50)

Our strategy is to use the following proposition, proven in App. A and based on Th. 16 regarding suprema of chaos processes. This involves the $\gamma_2$ functional, a geometric quantity introduced by Talagrand [29] and defined here in Def. 15, and the distance $d_B(\mathcal{Z}) = \sup_{Z\in\mathcal{Z}}\|Z\|_B$ (similarly for the Frobenius norm). This is further discussed in App. A.

Proposition 9.
Let $\mathcal{Z} \subset \mathcal{M}$ be a symmetric set and

$E = \frac{\gamma_2(\mathcal{Z}, \|\cdot\|_B)}{\sqrt{Q}}\left(\frac{\gamma_2(\mathcal{Z}, \|\cdot\|_B)}{\sqrt{Q}} + d_F(\mathcal{Z})\right)$,
$V = \frac{d_B(\mathcal{Z})}{\sqrt{Q}}\left(\frac{\gamma_2(\mathcal{Z}, \|\cdot\|_B)}{\sqrt{Q}} + d_F(\mathcal{Z})\right)$,
$U = \frac{1}{Q}d_B^2(\mathcal{Z})$.

Then for $t \ge 0$ and all $p \in [P]$,

$\mathbb{P}\left(\sup_{Z\in\mathcal{Z}}\left|\frac{L}{Q}\|\mathcal{A}_p(Z)\|_2^2 - \left\|T_p^{1/2}Z\right\|_F^2\right| \ge c_1E + t\right) \le 2\exp\left(-c_2\min\left(\frac{t^2}{V^2}, \frac{t}{U}\right)\right)$   (51)

$\mathbb{P}\left(\sup_{Z\in\mathcal{Z}}\left|\|\mathcal{A}(Z)\|_2^2 - \|Z\|_F^2\right| \ge c_1E + t\right) \le 2\exp\left(-c_2\min\left(\frac{t^2}{V^2}, \frac{t}{U}\right)\right)$   (52)

given that $\{\Gamma_p\}_{p\in[P]}$ is an $\omega$-admissible partition of $[L]$.

We now apply the proposition to the appropriate sets for proving Th. 7. We define the solution subspace balls

$B_M = \left\{X \in T_M \mid \|X\|_F \le 1\right\}$,   (53)
$B_H = \left\{X \in T_H \mid \|X\|_F \le 1\right\}$,
$B_{S_pH} = \left\{X \in T_{S_pH} \mid \|X\|_F \le 1\right\}$.

Note that these are sets of rank-$S$ matrices. The LIP in Th. 7 follows by applying Proposition 9 to the set (in place of $\mathcal{Z}$)

$\mathcal{W} = B_M + B_H$   (54)

and in a similar way we get (49) by applying it to the set

$\mathcal{W}_p = \mathcal{W} + B_{S_pH}$.   (55)

Thus, we only need to estimate the $\gamma_2$-functional, $d_B(\mathcal{Z})$ and $d_F(\mathcal{Z})$ on these sets, which is provided by the following lemma, with its proof in App. B-C.

Lemma 10.
Suppose that $\mathcal{X} = \mathcal{W}$ or $\mathcal{X} = \mathcal{W}_p$ for some $p \in [P]$. Then

$d_F(\mathcal{X}) \le 3$,   (56)
$d_B(\mathcal{X}) \le 3\sqrt{K}\mu$,   (57)
$\gamma_2(\mathcal{X}, \|\cdot\|_B) \lesssim \sqrt{S\left(K\mu^2\log(L)\log^2(K\mu^2) + N\mu_H^2\right)}$.   (58)

Now we can prove Th. 7.

Proof of Th. 7.
Fix $p \in [P]$. Using Lemma 10 and choosing the constant $C_\omega$ in (48) large enough, we get $E \le \frac{\delta}{2c_1}$, $V \le \frac{\delta}{\sqrt{c_2\omega\log L}}$ and $U \le \frac{\delta}{c_2\omega\log L}$, where $\mathcal{X} \subset \mathcal{W}_p$. The inequality (51) in Proposition 9 with $t = \frac{\delta}{2}$ shows that (49) in Th. 7 holds with probability $1-\mathcal{O}(L^{-\omega})$ (the same holds for (52) and (23), with $\mathcal{X} \subset \mathcal{W}$). Replacing $\omega$ by $\omega+1$ and using a union bound argument shows that (49) and (23) are satisfied for all $p \in [P]$ with a probability of at least $1 - (P+1)\mathcal{O}(L^{-\omega-1}) = 1 - \frac{P+1}{L}\mathcal{O}(L^{-\omega}) = 1-\mathcal{O}(L^{-\omega})$, which finishes the proof. ∎

D. Constructing the Dual Certificate
We now turn to prove that the assumptions in Lemma 5 hold. As in previous works, we construct the dual certificate via the Golfing Scheme, presented in [28]. Thus, we build the dual certificate with the iterative process

$Y_0 = 0, \quad Y_p = Y_{p-1} + \frac{L}{Q}(\mathcal{A}_p)^*\mathcal{A}_pS_p\left(\mathrm{sgn}(X) - \mathcal{P}_T(Y_{p-1})\right)$,

where the final certificate $Y$ is given by

$Y = Y_P = \sum_{p=1}^P\frac{L}{Q}(\mathcal{A}_p)^*\mathcal{A}_pS_pW_{p-1}$,   (59)

and

$W_p \triangleq \mathrm{sgn}(X) - \mathcal{P}_T(Y_p)$.   (60)

First, we show that $Y \in \mathrm{Range}(\mathcal{A}^*)$. Recall that $\mathcal{A}_p$ is defined in (25) by taking only the measurements indexed by $l \in \Gamma_p$, while having zeros in the other entries. Thus,

$(\mathcal{A}_p)^*\mathcal{A}_pS_pW_{p-1} = \mathcal{A}^*\mathcal{A}_pS_pW_{p-1}$   (61)

and we can write $Y$ as

$Y = \mathcal{A}^*\sum_{p=1}^P\frac{L}{Q}\mathcal{A}_pS_pW_{p-1}$,   (62)

which is clearly in $\mathrm{Range}(\mathcal{A}^*)$. It thus remains to show that assumptions (35), (36) in Lemma 5 hold.
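For intuition, the following self-contained NumPy sketch (ours; real-valued, with a generic orthonormal $B$ and Gaussian $C$ standing in for the structured Fourier-domain quantities, and arbitrary small sizes) runs the golfing iteration (59)–(60) and prints $\|W_p\|_F$, which contracts geometrically once $Q$ is large enough, as Lemma 12 formalizes below:

```python
import numpy as np
rng = np.random.default_rng(9)
K, N, S, P, Q = 5, 6, 2, 6, 400
L = P * Q

B = np.linalg.qr(rng.standard_normal((L, K)))[0]     # sum_l b_l b_l^T = I_K
C = rng.standard_normal((L, N))

Hhat = np.linalg.qr(rng.standard_normal((K, S)))[0]
Mhat = np.linalg.qr(rng.standard_normal((N, S)))[0]
sgnX = Hhat @ Mhat.T                                 # sgn(X) as in (21)
PH, PM = Hhat @ Hhat.T, Mhat @ Mhat.T
P_T = lambda Z: PH @ Z + Z @ PM - PH @ Z @ PM        # projection (31)

Y, W = np.zeros((K, N)), sgnX.copy()
for p in range(P):
    idx = slice(p * Q, (p + 1) * Q)
    Bp, Cp = B[idx], C[idx]
    T_p = (L / Q) * Bp.T @ Bp                        # (26)
    SpW = np.linalg.solve(T_p, W)                    # S_p W_{p-1}, S_p = T_p^{-1}
    meas = np.einsum('lk,kn,ln->l', Bp, SpW, Cp)     # A_p(S_p W_{p-1})
    Y += (L / Q) * np.einsum('l,lk,ln->kn', meas, Bp, Cp)  # += (L/Q) A_p^* A_p S_p W
    W = sgnX - P_T(Y)                                # W_p of (60)
    print(f"p={p + 1}  ||W_p||_F = {np.linalg.norm(W):.2e}")
```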
1) Solution Space Bound:
We now show that condition (35) holds. Despite the different definitions and rank, the proof in [17] is adequate also in our case, with very slight modifications. We start by stating a private case of a technical lemma and continue to prove this section's main claim.
Lemma 11 (a private case of Lemma 30 in [17]). Let $\nu \le \frac{1}{32}$. Then for all $p \in [P]$,

$\left\|I - T_p^{1/2}\right\| \le \frac{1}{32}$,   (63)
$\|(I - S_p)X\|_F \le \frac{1}{16}\|X\|_F$,   (64)
$\|S_pX\|_F \le 2\|X\|_F$.   (65)

This allows us to prove the following lemma.

Lemma 12.
Suppose that $\mathcal{A}_p$ satisfies the $\delta$-local isometry property on $T_p$ with $\delta = \frac{1}{32}$ for all $p \in [P]$. Then, for all $p \in [P]$,

$\|W_p\|_F \le 2^{-p}\sqrt{S}$   (66)

and, in particular, if $P \ge \log_2(8\gamma\sqrt{S})$,

$\|\mathrm{sgn}(X) - \mathcal{P}_T(Y)\|_F \le \frac{1}{8\gamma}$.   (67)

Proof.
By (28) and the triangle inequality we have $(1-\nu)\|X\|_F \le \left\|T_p^{1/2}X\right\|_F \le (1+\nu)\|X\|_F$. Combined with the $\delta$-local isometry property on $T_p$ in (49),

$(1-\nu)^2(1-\delta)\|X\|_F^2 \le \frac{L}{Q}\|\mathcal{A}_p(X)\|_2^2 \le (1+\nu)^2(1+\delta)\|X\|_F^2$

for all $X \in T_p$. With $\delta = \nu = \frac{1}{32}$, this implies

$\left|\frac{L}{Q}\|\mathcal{A}_p(X)\|_2^2 - \|X\|_F^2\right| \le \frac{1}{8}\|X\|_F^2$

for all $X \in T_p$, which is equivalent to

$\left\|\mathcal{P}_{T_p} - \frac{L}{Q}\mathcal{P}_{T_p}(\mathcal{A}_p)^*\mathcal{A}_p\mathcal{P}_{T_p}\right\| \le \frac{1}{8}$.   (68)

Notice that by its definition in (60), we have that

$W_p = W_{p-1} - \frac{L}{Q}\mathcal{P}_T(\mathcal{A}_p)^*\mathcal{A}_pS_pW_{p-1}$   (69)

and also that $\|W_{p-1} - \mathcal{P}_T(X)\|_F \le \|W_{p-1} - \mathcal{P}_{T_p}(X)\|_F$ for all $X \in \mathcal{M}$, since $W_{p-1} \in T$ and $T \subset T_p$. This implies that

$\|W_p\|_F \le \left\|W_{p-1} - \frac{L}{Q}\mathcal{P}_{T_p}(\mathcal{A}_p)^*\mathcal{A}_pS_pW_{p-1}\right\|_F = \left\|W_{p-1} - \frac{L}{Q}\mathcal{P}_{T_p}(\mathcal{A}_p)^*\mathcal{A}_p\mathcal{P}_{T_p}S_pW_{p-1}\right\|_F$,   (70)

where the equality is due to $S_pW_{p-1} \in T_p$ and $W_{p-1} \in T$. Combining this with (64), (65) and (68) leads to

$\|W_p\|_F \le \left\|\left(\mathcal{P}_{T_p} - \frac{L}{Q}\mathcal{P}_{T_p}(\mathcal{A}_p)^*\mathcal{A}_p\right)S_pW_{p-1}\right\|_F + \|(I - S_p)W_{p-1}\|_F \le \frac{1}{8}\|S_pW_{p-1}\|_F + \frac{1}{16}\|W_{p-1}\|_F \le \frac{1}{2}\|W_{p-1}\|_F$.   (71)

Thus, for all $p \in [P]$, $\|W_p\|_F \le (1/2)^p\|W_0\|_F = (1/2)^p\sqrt{S}$, which proves (66). As $P \ge \log_2(8\gamma\sqrt{S})$ (Def. 3), we get (67). ∎
2) Outer Space Bound:
We now turn to show that condition (36) in Lemma 5 holds. Thus, we bound the operator norm:

$\|\mathcal{P}_{T^\perp}(Y_P)\| \le \sum_{p=1}^P\left\|\mathcal{P}_{T^\perp}\left(\frac{L}{Q}(\mathcal{A}_p)^*\mathcal{A}_pS_pW_{p-1} - W_{p-1}\right)\right\| \le \sum_{p=1}^P\left\|\frac{L}{Q}(\mathcal{A}_p)^*\mathcal{A}_pS_pW_{p-1} - W_{p-1}\right\|$,

where we use $W_{p-1} \in T$ and the fact that the operator norm of a projection is bounded by 1. Thus, in order to show that condition (36) holds, it remains to prove that

$\left\|\frac{L}{Q}(\mathcal{A}_p)^*\mathcal{A}_pS_pW_{p-1} - W_{p-1}\right\| \le 2^{-(p+1)}$,   (72)

for $p \in [P]$. By defining

$\mu_p = \sqrt{L}\max_{l\in\Gamma_p}\left\|W_p^*S_{p+1}b_l\right\|_2$,   (73)

we state the outer space bound lemma (proven in App. B-B).

Lemma 13.
Let $\omega \ge 1$. Assume that $\mu_p \le 2^{-p}\mu_H$ and $\|W_p\|_F \le 2^{-p}\sqrt{S}$. If

$Q \gtrsim_\omega S\left(K\mu^2 + N\mu_H^2\right)\log L$,   (74)

then with probability $1-\mathcal{O}(L^{-\omega})$,

$\left\|\frac{L}{Q}(\mathcal{A}_p)^*\mathcal{A}_pS_pW_{p-1} - W_{p-1}\right\| \le 2^{-(p+1)}, \quad \forall p \in [P]$.   (75)
3) An Upper Bound for the Dual Certificate:
The upper bound in (38) scales with $\|z\|_2$, where $z$ equals

$z = \frac{L}{Q}\sum_{p=1}^P\mathcal{A}_pS_pW_{p-1}$,   (76)

such that $Y = \mathcal{A}^*z$ as in (62). We thus need to upper bound $\|z\|_2$ to obtain the total error bound.

Lemma 14.
Let $z \in \mathbb{C}^L$ be given by (76) and assume $\|W_p\|_F \le 2^{-p}\sqrt{S}$. Suppose that $\mathcal{A}_p$ satisfies (49) with $\delta \le \frac{1}{32}$ on $T_p$ for all $p \in [P]$. Then $\|z\|_2 \lesssim \sqrt{PS}$.

Proof. By its definition (76), and since the operators $\mathcal{A}_p$ have disjoint supports,

$\|z\|_2^2 = \frac{L^2}{Q^2}\sum_{p=1}^P\|\mathcal{A}_pS_pW_{p-1}\|_2^2 \lesssim \frac{L}{Q}\sum_{p=1}^P\|W_{p-1}\|_F^2 \lesssim PS$,

where we have used (49), (65) and $P = L/Q$ (Def. 3). ∎

Now we can finally prove the main theorem.
E. Proof of Th. 1
Combining the conditions on $Q$ given in Th. 7 and Lemmas 13 and 19, we have that

$Q \gtrsim S\left(K\mu^2\log^2(K\mu^2) + N\mu_H^2\right)\log(L)$.   (77)

Let $\Gamma_p$ be an admissible partition of the measurements and let $\omega > 1$. Then by Def. 3 we have

$P \le 2\log_2(8\tilde{\gamma}\sqrt{S}) \lesssim \log L$,   (78)

where $\tilde{\gamma} \le 2\max\left\{1, \sqrt{\frac{SK\mu^2}{NL}}\right\}\sqrt{\omega\log(L+SKN)}$. As $L = PQ$, we have that if

$L \gtrsim \omega S\left(K\mu^2\log^2(K\mu^2) + N\mu_H^2\right)\log^2(L)$   (79)

then we can assume that Th. 7 and Lemmas 13 and 19 hold. Thus, we can assume that conditions (23) and (49) hold with probability $1-\mathcal{O}(L^{-\omega})$ and constant $\delta = \frac{1}{32}$. By applying Lemma 5 with $\alpha = \frac{1}{8\gamma}$, $\beta = \frac{1}{2}$ and $\delta = \frac{1}{32}$, it is enough to construct a dual certificate $Y \in \mathrm{Range}(\mathcal{A}^*)$ which satisfies conditions (35) and (36). These conditions are met by the Golfing scheme in Lemmas 12 and 13 for a fixed $p \in [P]$. The assumptions of Lemma 12 are given by (49); thus, condition (35) applies. The assumptions of Lemma 13 are met by Lemma 19, which holds by (77), and so condition (36) holds. Thus, $Y$ defined in (62) satisfies conditions (35) and (36). Using a union bound we conclude that with probability $1-\mathcal{O}(L^{-\omega})$ the approximate dual certificate satisfies the conditions of Lemma 5, and thus if $\hat{X}$ is the minimizer of (37) then we can bound the estimation error by (38). This error is bounded by Lemmas 6 and 14, resulting in

$\left\|\hat{X} - X\right\|_F \lesssim \tau(1+\gamma)(1+\|z\|_2) \lesssim_\omega \tau\sqrt{S\max\left\{1, \frac{SK\mu^2}{NL}\right\}}\log(L)$.   (80)

Fig. 1: Phase transition: linear empirical dependence of L in S, in accordance with our theoretical results.

Fig. 2: Phase transition limits for a fixed L, varying the subspace dimensions N and K. The cutoffs match the theory.

V. EXPERIMENTS
In all the experiments, $C$, $M$, and $H_r$ are drawn from a random Gaussian distribution. $B$ consists of the $K$ first standard basis vectors, representing the blind MIMO scenario in [26] and in accordance with the results presented in [6]. The matrix reconstruction phase is done using the Matlab solver minFunc, using the heuristic solver developed by Burer and Monteiro [30] (similar to [14]). For the vector recovery, we used the Matlab non-linear equation solver fsolve with the trust-region dogleg algorithm. All the results are measured end to end, i.e., with an average error below a fixed small threshold for each vector. We repeat each experiment and report the success fraction out of 10 runs.

First, we show the phase transition of the empirical reconstruction probability, changing the number of measurements at each receiver, $L$, for a different number of signals. We fix $N = 30$, $K = 25$, and report the fraction of successful reconstructions (under the same per-vector error criterion) out of 10 experiments. The results for the two-sided constraints scenario, where $R = S$, are shown in Fig. 1, demonstrating a linear dependency in accordance with the theoretical guarantees. These results are very similar to the phase transition empirical results presented in [6], but require less computational resources to achieve.

Next, we repeat a similar experiment, only this time fixing $L = 2048$ and changing $N$ and $K$. We consider the two-sided constraints case, with $R = S$. Fig. 2 shows phase transition lines given the same success criterion as before, for a different number of sources. The area below each line indicates successful recovery and shows the maximal $N, K$ for a given amount of measurements per receiver. For $S = 2$, $L$ scales as a moderate multiple of $(N + K)$; for the larger $S = 7$, the multiple is larger. The "cutoff" $N, K$ appears to be nonlinear w.r.t. $S$ and differs more for smaller values of $S$. This result is in accordance with our linear theoretical guarantees.

Fig. 3: Phase transition for one-sided constraints. More constraints improve the basis transformation optimization.

Fig. 3 shows the phase transition results for the one-sided constraints. We use $N = 6$, $K = 5$, and an increasing range of $L$ values. Fig. 3a presents the results for $R = S + 1$, the minimal $R$ that allows correct recovery. The partial success areas (gray cells) in this chart are mostly failures to converge to some solution of the quadratic system (16) (the vector recovery stage), as opposed to wrong ambiguous solutions. A different algorithm might yield further improved performance. Figs. 3b ($R = S + 2$) and 3c ($R = 2S$) show that adding constraints improves the convergence and precision. The better performance of the two-sided setup (which has fewer constraints compared to the one-sided case with $R > S + 1$) might indicate that its structure does add up to more than the sum of its parts, and imposes some "hidden" constraints.
VI. CONCLUSION
This work presents a separable approach to blind deconvolution and demixing via convex optimization. We measure the same signals at different receivers, with different convolution kernels. Assuming all signals and kernels reside in the same subspaces allows formulating the problem as the recovery of low-rank matrices. Using the assumption that the signals are normalized allows us to recover them blindly. Our formulation allows lower complexity than [26] because we keep our matrices small: we solve a few small problems instead of a single large one. The stage we add to resolve the spanning basis ambiguity of the rank-$S$ case is of complexity depending only on $S$, regardless of the number of receivers and the other dimensions of the problem, i.e., $N, K, L$, which are usually much larger. Although we do not support the case of $R < S$, for the full case we have a much better complexity. We believe that our formulation can be combined with non-convex schemes [19]–[24] to further improve their complexity. We leave this to future work. We derive sample complexity conditions for the matrix recovery problem given in (10). This is the first work to solve this problem for rank-$S$ matrices and to supply an adequate proof, which stands in line with the previous near-optimal results of the rank-1 case [17]. Future work should analyze our conjecture resolving the spanning basis ambiguity. We expect the bounds to remain linear, in line with our empirical results.

APPENDIX A
SUPREMA OF CHAOS PROCESSES AND COVERING NUMBERS
This section describes the necessary preliminaries used throughout the proof. We refer to [17, Sec. IV-B,C] for further reading and references. First, we define the $\gamma_2$ functional, a geometric quantity introduced by Talagrand [29].

Definition 15.
Let $(X, \|\cdot\|)$ be a Banach space and suppose that $S \subset X$. A sequence $(S_n)_{n\ge 0}$ of subsets of $S$ is admissible if $|S_0| = 1$ and $|S_n| \le 2^{2^n}$ for $n \ge 1$. Then we set

$\gamma_2(S, \|\cdot\|) = \inf_{(S_n)_{n\ge 0}}\sup_{s\in S}\sum_{n=0}^\infty 2^{n/2}\inf_{s_n\in S_n}\|s - s_n\|$,   (81)

where the infimum is over all admissible sequences $(S_n)_{n\ge 0}$. Furthermore, we define the distances

$d_F(\mathcal{X}) = \sup_{X\in\mathcal{X}}\|X\|_F, \quad d_{2\to 2}(\mathcal{X}) = \sup_{X\in\mathcal{X}}\|X\|_{2\to 2}$,   (82)

where $\mathcal{X}$ is any set of matrices. Together with the notion of the $\gamma_2$ functional, we can state the following theorem, which is used in the proof of Th. 7.

Theorem 16 (Suprema of Chaos Processes; Th. 13 in [17]). Let $\mathcal{X}$ be a symmetric set of matrices and let $\xi$ be a random vector whose entries $\xi_i \sim \mathcal{CN}(0,1)$ are independent. Set

$E = \gamma_2(\mathcal{X}, \|\cdot\|_{2\to 2})\left(\gamma_2(\mathcal{X}, \|\cdot\|_{2\to 2}) + d_F(\mathcal{X})\right)$,
$V = d_{2\to 2}(\mathcal{X})\left(\gamma_2(\mathcal{X}, \|\cdot\|_{2\to 2}) + d_F(\mathcal{X})\right)$,
$U = d_{2\to 2}^2(\mathcal{X})$.

Then for $t \ge 0$,

$\mathbb{P}\left(\sup_{X\in\mathcal{X}}\left|\|X\xi\|_2^2 - \mathbb{E}\|X\xi\|_2^2\right| \ge c_1E + t\right) \le 2\exp\left(-c_2\min\left(\frac{t^2}{V^2}, \frac{t}{U}\right)\right)$   (83)

where the constants $c_1, c_2$ are universal.

The $\gamma_2$ functional can be bounded using Dudley's inequality, involving covering numbers. Recall that the covering number $N(S, \|\cdot\|, \epsilon)$ is the minimum number of $\|\cdot\|$-balls with radius $\epsilon$ needed to cover the set $S$.

Theorem 17 (Dudley's Inequality, see [29] Prop. 2.2.10, [31]). Given a set $S$ in a Banach space $(X, \|\cdot\|)$, we have that

$\gamma_2(S, \|\cdot\|) \lesssim \int_0^{d_{\|\cdot\|}(S)}\sqrt{\log N(S, \|\cdot\|, \epsilon)}\,d\epsilon$,

where $d_{\|\cdot\|}(S) = \sup_{x\in S}\|x\|$.

We now prove Proposition 9, which is a modification of Th. 16 and is used to prove Th. 7.
Proof of Proposition 9.
First we prove (51). Fix $p \in [P]$. For $Z \in \mathcal{Z}$, let $H_Z \in \mathbb{C}^{L\times|\Gamma_p|N}$ be a "block diagonal" matrix, where the block indexed by $l \in \Gamma_p$ is the row vector $\sqrt{\frac{L}{Q}}\,b_l^*Z \in \mathbb{C}^{1\times N}$. Notice that

$\|H_Z\|_F^2 = \frac{L}{Q}\sum_{l\in\Gamma_p}\|Z^*b_l\|_2^2 = \mathrm{Tr}(ZZ^*T_p) = \left\|T_p^{1/2}Z\right\|_F^2$,
$\|H_Z\|_{2\to 2} = \sqrt{\frac{L}{Q}}\max_{l\in\Gamma_p}\|b_l^*Z\|_2 \le \frac{1}{\sqrt{Q}}\|Z\|_B$.

Let $\xi^{(p)}$ be the concatenation of all $c_l$, where $l \in \Gamma_p$ and $c_l$ is the $l$th column of $C^T$. Then

$\frac{L}{Q}\|\mathcal{A}_p(Z)\|_2^2 = \frac{L}{Q}\sum_{l\in\Gamma_p}|\mathcal{A}_p(Z)(l)|^2 = \frac{L}{Q}\sum_{l\in\Gamma_p}|b_l^*Zc_l|^2 = \left\|H_Z\xi^{(p)}\right\|_2^2$.   (84)

Notice that

$\mathbb{E}\left[\left\|H_Z\xi^{(p)}\right\|_2^2\right] = \left\|T_p^{1/2}Z\right\|_F^2 = \|H_Z\|_F^2$,   (85)

and thus

$\sup_{Z\in\mathcal{Z}}\left|\left\|H_Z\xi^{(p)}\right\|_2^2 - \mathbb{E}\left[\left\|H_Z\xi^{(p)}\right\|_2^2\right]\right| = \sup_{Z\in\mathcal{Z}}\left|\frac{L}{Q}\|\mathcal{A}_p(Z)\|_2^2 - \left\|T_p^{1/2}Z\right\|_F^2\right|$.   (86)

To get (51) we just need to apply Th. 16. The proof for (52) is similar, where the "diagonal" elements in $H_Z$ are nonzero for $l \in [L]$, and $T_p$ is replaced with $\sum_{l\in[L]}b_lb_l^* = I$. ∎

APPENDIX B
PROOFS OF DIFFERENT LEMMAS
A. Proof of Lemma 6

Proof of Lemma 6.
Recall that $\mathcal{A}(Z) = [\langle A_1, Z\rangle, \dots, \langle A_L, Z\rangle]^T$, where $A_l = \sqrt{L}\,B^*F^*\mathrm{diag}(f_l)\bar{F}\bar{C} = \hat{b}_l\hat{c}_l^*$. In order to bound $\gamma$, we estimate the expected operator norms of $\mathcal{A}^*\mathcal{A}$ and $\mathcal{A}\mathcal{A}^*$. Starting with the former,

$\mathbb{E}[\mathcal{A}^*\mathcal{A}(Z)] = \sum_{l\in[L]}\mathbb{E}[\mathcal{A}(Z)(l)\,\hat{b}_l\hat{c}_l^*] = \sum_{l\in[L]}\mathbb{E}[\hat{b}_l\hat{b}_l^*Z\hat{c}_l\hat{c}_l^*] = \sum_{l\in[L]}\hat{b}_l\hat{b}_l^*Z = Z$,   (87)

meaning that $\mathbb{E}[\mathcal{A}^*\mathcal{A}] = I$. Moving to the latter,

$\mathbb{E}[\mathcal{A}\mathcal{A}^*y(l)] = \mathbb{E}[\hat{b}_l^*\mathcal{A}^*(y)\hat{c}_l] = \sum_{l'\in[L]}\mathbb{E}[\hat{b}_l^*\hat{b}_{l'}y(l')\hat{c}_{l'}^*\hat{c}_l] = y(l)\sum_{l'\in[L]}\mathbb{E}[\hat{b}_l^*\hat{b}_{l'}\hat{c}_{l'}^*\hat{c}_l] = y(l)\,N\|\hat{b}_l\|_2^2$.   (88)

Thus, $\mathbb{E}[\mathcal{A}\mathcal{A}^*] = \mathrm{diag}(N\|\hat{b}_1\|_2^2, \dots, N\|\hat{b}_L\|_2^2)$. Combined with the definition of $\mu^2$ in (24), we get $\|\mathbb{E}[\mathcal{A}\mathcal{A}^*]\| \le \frac{NK\mu^2}{L}$. Thus,

$\sigma^2 = \max\{\|\mathbb{E}[\mathcal{A}^*\mathcal{A}]\|, \|\mathbb{E}[\mathcal{A}\mathcal{A}^*]\|\} \le \max\left\{1, \frac{NK\mu^2}{L}\right\}$.   (89)

Applying Corollary 10 in [17] with $t = \omega\log(L)$,

$\|\mathcal{A}\| \le \max\left\{1, \sqrt{\frac{NK}{L}}\mu\right\}\sqrt{2\omega\log L + \log(L+SKN)}$   (90)

with probability exceeding $1 - L^{-\omega}$. ∎

B. Proof of Lemma 13
We will use Th. 9 in [17] to bound (72).
Lemma 18 (Matrix Bernstein Inequality, Th. 9 in [17]). Let $\alpha \in [1, \infty)$ and let $X_1, \dots, X_n \in \mathbb{C}^{d_1\times d_2}$ be independent random matrices that satisfy $\mathbb{E}[X_i] = 0$ for all $i \in [n]$. Set $R_{\psi_\alpha} = \max_{i\in[n]}\|\|X_i\|\|_{\psi_\alpha}$ and

$\sigma^2 = \max\left\{\left\|\sum_{i=1}^n\mathbb{E}[X_iX_i^*]\right\|, \left\|\sum_{i=1}^n\mathbb{E}[X_i^*X_i]\right\|\right\}$.

Set $Z = \sum_{i=1}^nX_i$. Then with probability at least $1 - e^{-t}$,

$\|Z\| \lesssim \max\left\{\sigma\sqrt{t + \log(d_1+d_2)},\ R_{\psi_\alpha}\left(\log\left(\frac{\sqrt{n}R_{\psi_\alpha}}{\sigma}\right)\right)^{1/\alpha}(t + \log(d_1+d_2))\right\}$.   (91)

Proof of Lemma 13.
First, notice that for all $l \in \Gamma_p$,

$(\mathcal{A}_pS_pW_{p-1})(l) = b_l^*S_pW_{p-1}c_l$,   (92)
$(\mathcal{A}_p)^*\mathcal{A}_pS_pW_{p-1} = \sum_{l\in\Gamma_p}b_lb_l^*S_pW_{p-1}c_lc_l^*$.

Since by its definition $S_p = (T_p)^{-1}$ (see (27)), we have that

$W_{p-1} = T_pS_pW_{p-1} = \frac{L}{Q}\sum_{l\in\Gamma_p}b_lb_l^*S_pW_{p-1}$.   (93)

For simplicity of notation, define

$w_l = W_{p-1}^*S_pb_l$.   (94)

Thus, we can write

$\frac{L}{Q}(\mathcal{A}_p)^*\mathcal{A}_pS_pW_{p-1} - W_{p-1} = \frac{L}{Q}\sum_{l\in\Gamma_p}b_lw_l^*c_lc_l^* - \frac{L}{Q}\sum_{l\in\Gamma_p}b_lw_l^* = \frac{L}{Q}\sum_{l\in\Gamma_p}b_lw_l^*(c_lc_l^* - I) = \sum_{l\in\Gamma_p}Z_l$,   (95)

where

$Z_l \triangleq \frac{L}{Q}b_lw_l^*(c_lc_l^* - I)$.   (96)

We now assess the relevant components to use Lemma 18. We start with the expectation values of $Z_lZ_l^*$ and $Z_l^*Z_l$:

$\mathbb{E}[Z_lZ_l^*] = \mathbb{E}\left[\frac{L^2}{Q^2}b_lw_l^*(c_lc_l^* - I)^2w_lb_l^*\right] = \frac{L^2}{Q^2}b_lw_l^*\mathbb{E}[(c_lc_l^* - I)^2]w_lb_l^* = \frac{L^2}{Q^2}N\|w_l\|_2^2\,b_lb_l^*$,   (97)

since $\mathbb{E}[(c_lc_l^* - I)^2] = NI$ by Lemma 11 in [14].

$\mathbb{E}[Z_l^*Z_l] = \mathbb{E}\left[\frac{L^2}{Q^2}(c_lc_l^* - I)w_lb_l^*b_lw_l^*(c_lc_l^* - I)\right] = \frac{L^2}{Q^2}\|b_l\|_2^2\,\mathbb{E}[(c_lc_l^* - I)w_lw_l^*(c_lc_l^* - I)] = \frac{L^2}{Q^2}\|b_l\|_2^2\|w_l\|_2^2\,I$,   (98)

since $\mathbb{E}[(c_lc_l^* - I)w_lw_l^*(c_lc_l^* - I)] = \|w_l\|_2^2\,I$ by Lemma 12 in [14]. Furthermore, we have that

$\left\|\sum_{l\in\Gamma_p}\mathbb{E}[Z_lZ_l^*]\right\| \le \frac{L^2N}{Q^2}\max_{l\in\Gamma_p}(\|w_l\|_2^2)\left\|\sum_{l\in\Gamma_p}b_lb_l^*\right\| \le \frac{N}{Q}\mu_{p-1}^2\|T_p\| \lesssim \frac{4^{-p}N\mu_H^2}{Q}$,   (99)

due to the lemma's assumptions and (28).

$\left\|\sum_{l\in\Gamma_p}\mathbb{E}[Z_l^*Z_l]\right\| \le \frac{L^2}{Q^2}\max_{l\in\Gamma_p}(\|b_l\|_2^2)\sum_{l\in\Gamma_p}\|w_l\|_2^2 \le \frac{LK\mu^2}{Q^2}\sum_{l\in\Gamma_p}\mathrm{Tr}(W_{p-1}^*S_pb_lb_l^*S_pW_{p-1}) = \frac{K\mu^2}{Q}\left\|S_p^{1/2}W_{p-1}\right\|_F^2 \lesssim \frac{4^{-p}SK\mu^2}{Q}$.   (100)

Thus, we have

$\sigma^2 = \max\left\{\left\|\sum_{l\in\Gamma_p}\mathbb{E}[Z_lZ_l^*]\right\|, \left\|\sum_{l\in\Gamma_p}\mathbb{E}[Z_l^*Z_l]\right\|\right\} \lesssim \frac{4^{-p}}{Q}\max(SK\mu^2, N\mu_H^2) \le \frac{4^{-p}}{Q}(SK\mu^2 + N\mu_H^2)$.   (101)

Now we estimate $R_{\psi_1} = \max_l\|\|Z_l\|\|_{\psi_1}$, where $\|\cdot\|_{\psi_\alpha}$ is the Orlicz norm (Def. 7 in [17]):

$\|\|Z_l\|\|_{\psi_1} = \frac{L}{Q}\|\|b_lw_l^*(c_lc_l^* - I)\|\|_{\psi_1} \le \frac{L}{Q}\|b_l\|_2\,\|\|w_l^*(c_lc_l^* - I)\|_2\|_{\psi_1} \lesssim \frac{L\sqrt{N}}{Q}\|b_l\|_2\|w_l\|_2 \lesssim \frac{L\sqrt{N}}{Q}\sqrt{\frac{K\mu^2}{L}}\frac{\mu_{p-1}}{\sqrt{L}} \lesssim \frac{2^{-p}\sqrt{NK\mu^2}\,\mu_H}{Q} \lesssim \frac{2^{-p}(K\mu^2 + N\mu_H^2)}{Q}$,   (102)

where the second inequality is due to Lemma 39 in [17], and the last steps follow from this lemma's assumptions. We continue to assess the quantity $|\Gamma_p|\frac{R_{\psi_1}^2}{\sigma^2}$. We have that

$|\Gamma_p|\frac{R_{\psi_1}^2}{\sigma^2} \lesssim Q\frac{4^{-p}NK\mu^2\mu_H^2/Q^2}{4^{-p}\max(SK\mu^2, N\mu_H^2)/Q} \lesssim \frac{K\mu^2N\mu_H^2}{N\mu_H^2} = K\mu^2 \le L$.   (103)
Finally, we can set these quantities in Lemma 18 with $t = (\omega+1)\log L$, $\alpha = 1$, and get, with probability $1-\mathcal{O}(L^{-\omega-1})$,

$\left\|\sum_{l\in\Gamma_p}Z_l\right\| \lesssim_\omega 2^{-p}\max\left\{\sqrt{\frac{(SK\mu^2 + N\mu_H^2)\log L}{Q}},\ \frac{(K\mu^2 + N\mu_H^2)\log L}{Q}\right\}$,   (104)

for a fixed $p \in [P]$. Taking the union bound over all $p \in [P]$, we get $\left\|\frac{L}{Q}(\mathcal{A}_p)^*\mathcal{A}_pS_pW_{p-1} - W_{p-1}\right\| \le 2^{-(p+1)}$ with probability at least $1 - P\,\mathcal{O}(L^{-\omega-1}) = 1 - \mathcal{O}(L^{-\omega})$, which finishes the proof. It thus remains to prove that $\mu_p \le \frac{\mu_{p-1}}{2}$.

Lemma 19.
Let $\omega \ge 1$. If

$Q \gtrsim_\omega S\max(K\mu^2, N\mu_H^2)\log L$   (105)

then with probability $1-\mathcal{O}(L^{-\omega})$ it holds that $\mu_p \le \frac{\mu_{p-1}}{2}$ for all $p \in [P-1]$.

Proof. Recall that by the definition of $\mu_p$ in (73), we need to show that for all $l \in \Gamma_p$ and for all $p \in [P-1]$,

$\sqrt{L}\left\|W_p^*S_{p+1}b_l\right\|_2 \le \frac{\mu_{p-1}}{2}$.   (106)

By (69), we have that $W_p = W_{p-1} - \frac{L}{Q}\mathcal{P}_T(\mathcal{A}_p)^*\mathcal{A}_pS_pW_{p-1}$. Thus, we can use (31), (33) together with (95) to write

$W_p = \frac{L}{Q}\sum_{j\in\Gamma_p}\left(\hat{H}\hat{H}^*b_jw_j^*(I - c_jc_j^*) + (I - \hat{H}\hat{H}^*)b_jw_j^*(I - c_jc_j^*)\hat{M}\hat{M}^*\right)$,   (107)

where $w_j = W_{p-1}^*S_pb_j$ as in (94) and $W_{p-1} \in T$. Using the triangle inequality, we have that

$\left\|W_p^*S_{p+1}b_l\right\|_2 \le \left\|\frac{L}{Q}\sum_{j\in\Gamma_p}(I - c_jc_j^*)w_jb_j^*\hat{H}\hat{H}^*S_{p+1}b_l\right\|_2 + \left\|\frac{L}{Q}\sum_{j\in\Gamma_p}\hat{M}\hat{M}^*(I - c_jc_j^*)w_jb_j^*(I - \hat{H}\hat{H}^*)S_{p+1}b_l\right\|_2 \triangleq \left\|\sum_{j\in\Gamma_p}u_j\right\|_2 + \left\|\sum_{j\in\Gamma_p}v_j\right\|_2$.   (108)

We now use Lemma 18 again to bound the two summands. For the first part, we have

$\left\|\sum_{j\in\Gamma_p}\mathbb{E}[u_ju_j^*]\right\| = \frac{L^2}{Q^2}\left\|\sum_{j\in\Gamma_p}w_jb_j^*\hat{H}\hat{H}^*S_{p+1}b_l\right\|_2^2 \le \frac{L^2}{Q^2}\sum_{j\in\Gamma_p,\,s\in[S]}\left\|w_jb_j^*\hat{h}_s\hat{h}_s^*S_{p+1}b_l\right\|_2^2 = \frac{L^2}{Q^2}\sum_{s\in[S]}|\hat{h}_s^*S_{p+1}b_l|^2\sum_{j\in\Gamma_p}|b_j^*\hat{h}_s|^2\|w_j\|_2^2 \le \frac{\mu_H^2\mu_{p-1}^2}{QL}\sum_{s\in[S]}\left\|T_p^{1/2}\hat{h}_s\right\|_2^2 \lesssim \frac{S\mu_H^2\mu_{p-1}^2}{QL}$,   (109)

where we have used Lemma 12 in [14], (63), and the definitions in (29), (73), (26) and (33). Similarly,

$\left\|\sum_{j\in\Gamma_p}\mathbb{E}[u_j^*u_j]\right\| \le \frac{L^2N}{Q^2}\sum_{j\in\Gamma_p,\,s\in[S]}\left\|w_jb_j^*\hat{h}_s\hat{h}_s^*S_{p+1}b_l\right\|_2^2 \lesssim \frac{SN\mu_H^2\mu_{p-1}^2}{QL}$,   (110)

again using Lemma 11 in [14], leading to $\sigma^2 \lesssim \frac{SN\mu_H^2\mu_{p-1}^2}{QL}$. Furthermore,

$R_{\psi_1} = \max_{j\in\Gamma_p}\left\|\left\|\frac{L}{Q}(I - c_jc_j^*)w_jb_j^*\hat{H}\hat{H}^*S_{p+1}b_l\right\|_2\right\|_{\psi_1} = \frac{L}{Q}\max_{j\in\Gamma_p}\left\|\left\|\sum_{s\in[S]}(I - c_jc_j^*)w_jb_j^*\hat{h}_s\hat{h}_s^*S_{p+1}b_l\right\|_2\right\|_{\psi_1} \le \frac{L}{Q}\frac{\mu_H^2}{L}S\max_{j\in\Gamma_p}\left\|\|(I - c_jc_j^*)w_j\|_2\right\|_{\psi_1} \lesssim \frac{L}{Q}\frac{\mu_H^2}{L}S\sqrt{N}\max_{j\in\Gamma_p}\|w_j\|_2 \le \frac{S\sqrt{N}\mu_H^2\mu_{p-1}}{Q\sqrt{L}}$.   (111)
Now, we can assess

$|\Gamma_p|\frac{R_{\psi_1}^2}{\sigma^2} \lesssim Q\frac{S^2N\mu_H^4\mu_{p-1}^2}{Q^2L}\cdot\frac{QL}{SN\mu_H^2\mu_{p-1}^2} = S\mu_H^2 \lesssim SL$.   (112)

Finally, we can write, with $t = (\omega+1)\log L$ and $\alpha = 1$,

$\left\|\frac{L}{Q}\sum_{j\in\Gamma_p}(I - c_jc_j^*)w_jb_j^*\hat{H}\hat{H}^*S_{p+1}b_l\right\|_2 \lesssim \frac{\mu_{p-1}}{\sqrt{L}}\max\left\{\sqrt{\frac{SN\mu_H^2\log L}{Q}},\ \frac{S\sqrt{N}\mu_H^2\log L}{Q}\right\}$   (113)

with a probability of at least $1-\mathcal{O}(L^{-\omega-1})$ for a fixed $p \in [P]$. Taking the union bound over all $p \in [P]$, we get that if $Q \gtrsim_\omega SN\mu_H^2\log L$, then the first summand in (108) is bounded as required by (106) with probability of at least $1 - P\,\mathcal{O}(L^{-\omega-1}) = 1 - \mathcal{O}(L^{-\omega})$.

We now repeat the same process for the second summand in (108). We use the $Z_j$ notation defined in (96).

$\left\|\sum_{j\in\Gamma_p}\mathbb{E}[v_jv_j^*]\right\| \le \sum_{j\in\Gamma_p}\mathbb{E}\left\|\hat{M}\hat{M}^*Z_j^*(I - \hat{H}\hat{H}^*)S_{p+1}b_l\cdot b_l^*S_{p+1}(I - \hat{H}\hat{H}^*)Z_j\hat{M}\hat{M}^*\right\|$
$\le \sum_{j\in\Gamma_p}\mathbb{E}\left\|Z_j^*(I - \hat{H}\hat{H}^*)S_{p+1}b_lb_l^*S_{p+1}(I - \hat{H}\hat{H}^*)Z_j\right\|$
$= \frac{L^2}{Q^2}\sum_{j\in\Gamma_p}\mathbb{E}\left\|(I - c_jc_j^*)w_jb_j^*(I - \hat{H}\hat{H}^*)S_{p+1}b_l\cdot b_l^*S_{p+1}(I - \hat{H}\hat{H}^*)b_jw_j^*(I - c_jc_j^*)\right\|$
$= \frac{L^2}{Q^2}\sum_{j\in\Gamma_p}\left\|w_jb_j^*(I - \hat{H}\hat{H}^*)S_{p+1}b_l\right\|_2^2 \le \frac{L^2}{Q^2}\sum_{j\in\Gamma_p}\|w_j\|_2^2\,|b_j^*(I - \hat{H}\hat{H}^*)S_{p+1}b_l|^2$   (114)
$\le \frac{L^2}{Q^2}\frac{\mu_{p-1}^2}{L}\frac{Q}{L}\left\|T_p^{1/2}(I - \hat{H}\hat{H}^*)S_{p+1}b_l\right\|_2^2 \le \frac{L^2}{Q^2}\frac{\mu_{p-1}^2}{L}\frac{Q}{L}\left\|T_p^{1/2}\right\|^2\left\|I - \hat{H}\hat{H}^*\right\|^2\|S_{p+1}\|^2\|b_l\|_2^2$
$\lesssim \frac{L^2}{Q^2}\frac{\mu_{p-1}^2}{L}\frac{Q}{L}\|b_l\|_2^2 \le \frac{L^2}{Q^2}\frac{\mu_{p-1}^2}{L}\frac{Q}{L}\frac{K\mu^2}{L} = \frac{K\mu^2\mu_{p-1}^2}{QL}$,

where we have used Lemma 12 in [14] again, together with (63) and the definitions in (29), (73), (26) and (33). Similarly,

$\left\|\sum_{j\in\Gamma_p}\mathbb{E}[v_j^*v_j]\right\| \le \frac{L^2}{Q^2}\sum_{j\in\Gamma_p}\mathbb{E}\left\|b_l^*S_{p+1}(I - \hat{H}\hat{H}^*)b_jw_j^*(I - c_jc_j^*)\hat{M}\hat{M}^*(I - c_jc_j^*)w_jb_j^*(I - \hat{H}\hat{H}^*)S_{p+1}b_l\right\| = \frac{L^2S}{Q^2}\sum_{j\in\Gamma_p}\left\|w_jb_j^*(I - \hat{H}\hat{H}^*)S_{p+1}b_l\right\|_2^2 \lesssim \frac{SK\mu^2\mu_{p-1}^2}{QL}$,   (115)

again using Lemma 12 in [14]. Thus, $\sigma^2 \lesssim \frac{SK\mu^2\mu_{p-1}^2}{QL}$. Furthermore,

$R_{\psi_1} = \max_{j\in\Gamma_p}\left\|\left\|\frac{L}{Q}\hat{M}\hat{M}^*(I - c_jc_j^*)w_jb_j^*(I - \hat{H}\hat{H}^*)S_{p+1}b_l\right\|_2\right\|_{\psi_1} = \max_{j\in\Gamma_p}\left\|\left\|\frac{L}{Q}\hat{M}^*(I - c_jc_j^*)w_jb_j^*(I - \hat{H}\hat{H}^*)S_{p+1}b_l\right\|_2\right\|_{\psi_1} \le \frac{L}{Q}\max_{j\in\Gamma_p}|b_j^*(I - \hat{H}\hat{H}^*)S_{p+1}b_l|\sum_{s\in[S]}\left\|\hat{m}_s^*(I - c_jc_j^*)w_j\right\|_{\psi_1} \lesssim \frac{L}{Q}\frac{K\mu^2}{L}\sum_{s\in[S]}\|\hat{m}_s\|_2\max_{j\in\Gamma_p}\|w_j\|_2 \le \frac{SK\mu^2\mu_{p-1}}{Q\sqrt{L}}$,   (116)

where we have used Lemma 39 in [17]. Now, we can assess

$|\Gamma_p|\frac{R_{\psi_1}^2}{\sigma^2} \lesssim Q\frac{S^2K^2\mu^4\mu_{p-1}^2}{Q^2L}\cdot\frac{QL}{SK\mu^2\mu_{p-1}^2} = SK\mu^2 \le SL$.
Finally, we can write with $t = (\omega+1)\log L$, $\alpha = 1$,
$$\Big\|\frac{L}{Q}\sum_{j\in\Gamma_p}\hat M\hat M^*(I-c_jc_j^*) w_j b_j^*(I-\hat H\hat H^*) S_{p+1} b_l\Big\| \lesssim \frac{\mu_{p-1}}{\sqrt{L}}\max\left\{\sqrt{\frac{SK\mu^2\log L}{Q}},\ \frac{SK\mu^2\log L}{Q}\right\}, \quad (118)$$
with a probability of at least $1-O(L^{-\omega-1})$ for a fixed $p\in[P]$. Taking the union bound for all $p\in[P]$, we get that if $Q \gtrsim SK\mu^2\log L$, then the second summand in (108) is bounded as required by (106) with probability of at least $1 - P\cdot O(L^{-\omega-1}) = 1-O(L^{-\omega})$, which finishes the proof.
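As a numerical aside, the halving guarantee of this lemma compounds across the golfing partitions: after $p$ steps, $\mu_p \le 2^{-p}\mu_0$. The sketch below is ours and purely illustrative; the reweighting operator $S_{p+1}$ is omitted and Gaussian rows stand in for the actual $b_l$. It shows how a weighted-coherence quantity of the form (73) can be evaluated, together with the geometric decay implied by the lemma:

import numpy as np

rng = np.random.default_rng(0)

def mu_coherence(W, B):
    # sqrt(L) * max_l ||W^* b_l||, a quantity of the form (73); the
    # partition reweighting S_{p+1} is dropped in this toy illustration.
    L = B.shape[0]
    return np.sqrt(L) * np.max(np.linalg.norm(B @ W.conj(), axis=1))

L, K, r = 1024, 16, 2
B = rng.normal(size=(L, K)) / np.sqrt(L)   # Gaussian stand-in for the b_l
W = rng.normal(size=(K, r))
W /= np.linalg.norm(W)
mu0 = mu_coherence(W, B)
for p in range(1, 5):
    print(p, mu0 / 2**p)   # upper bound on mu_p implied by the lemma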
C. Proof of Lemma 10

Proof of Lemma 10. To prove this lemma, we use the following two technical lemmas. The proof of the first appears in Appendix B-D and that of the second in [17]. First, let us denote the unit ball with respect to $\|\cdot\|$ by $B(0,1)$ (throughout the paper).
Lemma 20. Let $B_M$ be defined by (53). Then
$$N(B_M, \|\cdot\|_B, \epsilon) \le N\Big(B(0,1)\subset\mathbb{R}^S, \|\cdot\|, \tfrac{\epsilon}{2\sqrt{K}\mu}\Big)\cdot N^S\Big(B(0,1)\subset\mathbb{C}^K, \|\cdot\|_B, \tfrac{\epsilon}{2}\Big). \quad (119)$$

Lemma 21 (a private case of Lemma 27 in [17]).
$$\log N\Big(B(0,1)\subset\mathbb{C}^K, \|\cdot\|_B, \epsilon\Big) \lesssim \frac{K\mu^2}{\epsilon^2}\log L. \quad (120)$$

We now turn to prove Lemma 10. The first inequality (56) follows from the fact that
$$d_F(\mathcal{X}) = \sup_{X\in\mathcal{X}}\|X\|_F \le \sup_{X\in\mathcal{W}_p}\|X\|_F \le 3, \quad (121)$$
where the last inequality holds since $\mathcal{W}_p = B_M + B_H + B_{S_pH}$ and all the elements in these sets are normalized. To prove the second inequality (57), we use the definitions of $d_B(\mathcal{X})$ and $\|\cdot\|_B$ to get
$$d_B(\mathcal{X}) = \sup_{X\in\mathcal{X}}\|X\|_B = \sup_{X\in\mathcal{X}}\sqrt{L}\max_{l\in[L]}\|X^* b_l\|_F \le \sup_{X\in\mathcal{X}}\sqrt{L}\,\|X\|_F\max_{l\in[L]}\|b_l\| \le \sup_{X\in\mathcal{X}}\|X\|_F\,\sqrt{K}\mu \le 3\sqrt{K}\mu, \quad (122)$$
where the first inequality is due to the Frobenius norm properties, the second is due to the definition of $\mu$ in (24) and the last is due to (56).

For (58), we can use Lemma 12 in [17] to obtain
$$\gamma_2(\mathcal{W}_p, \|\cdot\|_B) \lesssim \gamma_2(B_H, \|\cdot\|_B) + \gamma_2(B_M, \|\cdot\|_B) + \gamma_2(B_{S_pH}, \|\cdot\|_B), \quad (123)$$
where $\gamma_2(\mathcal{W}, \|\cdot\|_B)$ is bounded analogously. First we bound $\gamma_2(B_H, \|\cdot\|_B)$. Let $\hat H U^*, \hat H V^* \in B_H$. Then
$$\big\|\hat H U^* - \hat H V^*\big\|_B^2 = \big\|\hat H(U^*-V^*)\big\|_B^2 = L\max_{l\in[L]}\big\|(U-V)\hat H^* b_l\big\|^2 \le L\max_{l\in[L]}\sum_{s\in[S]}\big\|\hat h_s^* b_l (u_s-v_s)\big\|^2 = L\max_{l\in[L]}\sum_{s\in[S]}|\hat h_s^* b_l|^2\,\|u_s-v_s\|^2 \le \mu_H^2\sum_{s\in[S]}\|u_s-v_s\|^2 = \mu_H^2\,\|U-V\|_F^2, \quad (124)$$
where the inequality is due to (29) and the last equality holds because the Frobenius norm is unitarily invariant. Using (124) followed by Dudley's inequality (Th. 17) implies
$$\gamma_2(B_H, \|\cdot\|_B) \le \mu_H\,\gamma_2(B_H, \|\cdot\|_F) \lesssim \mu_H\int_0^1\sqrt{\log N(B_H, \|\cdot\|_F, \epsilon)}\,d\epsilon \lesssim \mu_H\sqrt{SN}, \quad (125)$$
where the last inequality holds since $(B_H, \|\cdot\|_F)$ is isometric to $(B(0,1)\subset\mathbb{R}^{SN}, \|\cdot\|)$ and a standard volumetric estimate. Similarly, let $\tilde U = S_p\hat H U^*, \tilde V = S_p\hat H V^* \in B_{S_pH}$. Then
$$\big\|\tilde U - \tilde V\big\|_B^2 = \big\|S_p\hat H(U^*-V^*)\big\|_B^2 = L\max_{l\in[L]}\big\|(U-V)\hat H^* S_p^* b_l\big\|^2 \le L\max_{l\in[L]}\sum_{s\in[S]}|\hat h_s^* S_p b_l|^2\,\|u_s-v_s\|^2 \le \mu_H^2\sum_{s\in[S]}\|u_s-v_s\|^2 = \mu_H^2\,\|U-V\|_F^2 = \mu_H^2\,\big\|\hat H(U^*-V^*)\big\|_F^2 = \mu_H^2\,\big\|T_p S_p\hat H(U^*-V^*)\big\|_F^2, \quad (126)$$
where the last equality holds since $T_p S_p = I$ and the rest is as in (124).
The final norm can be split into
$$\mu_H\big\|T_p S_p\hat H(U^*-V^*)\big\|_F \le \mu_H\|T_p\|\,\big\|S_p\hat H(U^*-V^*)\big\|_F \le \mu_H(1+\nu)\,\big\|\tilde U-\tilde V\big\|_F \lesssim \mu_H\big\|\tilde U-\tilde V\big\|_F, \quad (127)$$
where the second inequality is due to $\|T_p\| \le 1+\nu$. Using Dudley's inequality as in (125), we get
$$\gamma_2(B_{S_pH}, \|\cdot\|_B) \lesssim \mu_H\sqrt{SN}. \quad (128)$$
To bound $\gamma_2(B_M, \|\cdot\|_B)$, notice that $d_B(B_M) \le \sqrt{K}\mu$, so by Dudley's inequality we get
$$\gamma_2(B_M, \|\cdot\|_B) \lesssim \int_0^{\sqrt{K}\mu}\sqrt{\log N(B_M, \|\cdot\|_B, \epsilon)}\,d\epsilon. \quad (129)$$
For the right-hand side, we use Lemma 20 and have
$$\gamma_2(B_M, \|\cdot\|_B) \lesssim \int_0^{\sqrt{K}\mu}\sqrt{\log N\Big(B(0,1)\subset\mathbb{R}^S, \|\cdot\|, \tfrac{\epsilon}{2\sqrt{K}\mu}\Big)}\,d\epsilon + \int_0^{\sqrt{K}\mu}\sqrt{S\log N\Big(B(0,1)\subset\mathbb{C}^K, \|\cdot\|_B, \tfrac{\epsilon}{2}\Big)}\,d\epsilon. \quad (130)$$
Thus, the first integral is bounded by
$$\int_0^{\sqrt{K}\mu}\sqrt{\log N\Big(B(0,1)\subset\mathbb{R}^S, \|\cdot\|, \tfrac{\epsilon}{2\sqrt{K}\mu}\Big)}\,d\epsilon \le \sqrt{S}\int_0^{\sqrt{K}\mu}\sqrt{\log\Big(1+\tfrac{4\sqrt{K}\mu}{\epsilon}\Big)}\,d\epsilon \lesssim \sqrt{SK}\mu, \quad (131)$$
where we have used a standard volumetric estimate and a change of variables. For the second integral in (130), we provide a private case of the derivation in [17] for completeness. We split the second integral in (130) into two integration intervals: $[0,1]$ and $[1,\sqrt{K}\mu]$. For $\epsilon\in(0,1]$, we use
$$B(0,1) \subset \sqrt{K}\mu\, B_{\|\cdot\|_B}(0,1) \triangleq \{x\in\mathbb{C}^K \mid \|x\|_B \le \sqrt{K}\mu\}. \quad (132)$$
This implies that
$$N(B(0,1)\subset\mathbb{C}^K, \|\cdot\|_B, \epsilon) \le N\Big(B_{\|\cdot\|_B}(0,1)\subset\mathbb{C}^K, \|\cdot\|_B, \tfrac{\epsilon}{\sqrt{K}\mu}\Big) \le \Big(1+\tfrac{2\sqrt{K}\mu}{\epsilon}\Big)^{2K}, \quad (133)$$
where the last inequality is a standard bound for the covering number. For the interval $[0,1]$ we get the following bound:
$$\int_0^1\sqrt{S\log N\Big(B(0,1)\subset\mathbb{C}^K, \|\cdot\|_B, \epsilon\Big)}\,d\epsilon \le \sqrt{2KS}\int_0^1\sqrt{\log\Big(1+\tfrac{2\sqrt{K}\mu}{\epsilon}\Big)}\,d\epsilon \le \sqrt{2KS\log(e(1+2\sqrt{K}\mu))}, \quad (134)$$
where the first inequality is due to (133) and the second one is due to Lemma C.9 in [32]. Now we deal with the case where $\epsilon\in(1,\sqrt{K}\mu)$. Using Lemma 21, we get
$$\int_1^{\sqrt{K}\mu}\sqrt{S\log N\Big(B(0,1)\subset\mathbb{C}^K, \|\cdot\|_B, \epsilon\Big)}\,d\epsilon \lesssim \int_1^{\sqrt{K}\mu}\frac{\sqrt{SK\log L}\,\mu}{\epsilon}\,d\epsilon \lesssim \sqrt{SK\log L}\,\mu\log(K\mu). \quad (135)$$
Combining (134) with (135) provides us with
$$\int_0^{\sqrt{K}\mu}\sqrt{S\log N\Big(B(0,1)\subset\mathbb{C}^K, \|\cdot\|_B, \epsilon\Big)}\,d\epsilon \lesssim \sqrt{SK\log L}\,\mu\log(K\mu), \quad (136)$$
where we use the fact that (135) is the dominant interval. Plugging (131) and (136) into (130), and considering again the dominant part, leads to
$$\gamma_2(B_M, \|\cdot\|_B) \lesssim \sqrt{SK\log L}\,\mu\log(K\mu). \quad (137)$$
The result stated in (58) is given by the summation of the three bounds in (125), (128) and (137).
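All three $\gamma_2$ bounds above follow the same recipe: a covering-number model plugged into Dudley's entropy integral. The following sketch is ours and only illustrative; it uses the standard volumetric model $N(\epsilon) \le (1+2/\epsilon)^d$ behind (125) and (131), evaluates the integral numerically, and confirms the $\sqrt{d}$ scaling:

import numpy as np
from scipy.integrate import quad

def dudley_integral(log_N, diameter):
    # int_0^diameter sqrt(log N(eps)) d(eps), as in (129); illustrative only.
    value, _ = quad(lambda e: np.sqrt(max(log_N(e), 0.0)), 0.0, diameter)
    return value

# Volumetric model for a unit ball in R^d: N(eps) <= (1 + 2/eps)^d, which
# gives int_0^1 sqrt(log N(eps)) d(eps) ~ sqrt(d), as used in (125), (131).
for d in [8, 32, 128]:
    val = dudley_integral(lambda e, d=d: d * np.log1p(2.0 / e), 1.0)
    print(d, round(val, 2), round(val / np.sqrt(d), 2))  # ratio roughly constant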
D. Proof of Lemma 20

Proof of Lemma 20. For all $s\in[S]$, let $N_s$ be an $\frac{\epsilon}{2}$-cover of $B(0,1)\subset\mathbb{C}^K$ with respect to the $\|\cdot\|_B$-norm, and let $O$ be an $\frac{\epsilon}{2\sqrt{K}\mu}$-cover of $B(0,1)\subset\mathbb{R}^S$ with respect to the $\|\cdot\|$-norm. We now show that any $X = U\hat M^* \in B_M$ can be approximated by $Y = \sum_{s\in[S]}\sigma_s v_s\hat m_s^*$, where $\sigma\in O$ and $v_s\in N_s$. Notice that the number of such $Y$'s is bounded by the right-hand side of the inequality in (119). Thus, it remains to show that such a construction is possible. Since $\sigma\in O$, we may pick it to satisfy
$$\sqrt{\sum_{s\in[S]}(\|u_s\|-\sigma_s)^2} \le \frac{\epsilon}{2\sqrt{K}\mu}. \quad (138)$$
Notice that $(\|u_1\|,\ldots,\|u_S\|)\in B(0,1)$, since $X\in B_M$ and $\hat M$ is orthonormal. In a similar way, since $v_s\in N_s$, we select it such that
$$\Big\|\frac{u_s}{\|u_s\|} - v_s\Big\|_B \le \frac{\epsilon}{2} \quad (139)$$
for all $s\in[S]$. Thus, for $\hat Y = \sum_{s\in[S]}\|u_s\|\, v_s\hat m_s^*$, we have
$$\big\|X-\hat Y\big\|_B^2 = \Big\|\sum_{s\in[S]}(u_s - \|u_s\| v_s)\hat m_s^*\Big\|_B^2 = L\max_{l\in[L]}\Big\|\sum_{s\in[S]}\hat m_s(u_s-\|u_s\|v_s)^* b_l\Big\|^2 = L\max_{l\in[L]}\sum_{s,k\in[S]} b_l^*(u_s-\|u_s\|v_s)\,\hat m_s^*\hat m_k\,(u_k-\|u_k\|v_k)^* b_l = L\max_{l\in[L]}\sum_{s\in[S]} b_l^*(u_s-\|u_s\|v_s)(u_s-\|u_s\|v_s)^* b_l, \quad (140)$$
where the second equality is due to the definition of $\|\cdot\|_B$ in (50) combined with the non-negativity of the norm, and the last step follows from the orthonormality of $\hat M$.
Next, we continue to bound
$$L\max_{l\in[L]}\sum_{s\in[S]} b_l^*(u_s-\|u_s\|v_s)(u_s-\|u_s\|v_s)^* b_l \le \sum_{s\in[S]} L\max_{l\in[L]}\big(b_l^*(u_s-\|u_s\|v_s)(u_s-\|u_s\|v_s)^* b_l\big) = \sum_{s\in[S]}\big\|u_s-\|u_s\|v_s\big\|_B^2 = \sum_{s\in[S]}\Big\|\|u_s\|\Big(\frac{u_s}{\|u_s\|}-v_s\Big)\Big\|_B^2 \le \frac{\epsilon^2}{4}\sum_{s\in[S]}\|u_s\|^2 = \frac{\epsilon^2}{4}\|U\|_F^2 \le \frac{\epsilon^2}{4}, \quad (141)$$
where the first equality is again due to the definition of $\|\cdot\|_B$ in (50) combined with the non-negativity of the norm, the second inequality is due to (139), and the last step holds since $X = U\hat M^*\in B_M$ is normalized and $\hat M$ is orthonormal. To conclude this step, by (140) and (141) we have
$$\big\|X-\hat Y\big\|_B \le \frac{\epsilon}{2}. \quad (142)$$
To complete the proof, we similarly have
$$\big\|\hat Y-Y\big\|_B^2 = \Big\|\sum_{s\in[S]}(\|u_s\|-\sigma_s)v_s\hat m_s^*\Big\|_B^2 \le \sum_{s\in[S]}\big\|(\|u_s\|-\sigma_s)v_s\big\|_B^2 = \sum_{s\in[S]}(\|u_s\|-\sigma_s)^2\,\|v_s\|_B^2 \le K\mu^2\sum_{s\in[S]}(\|u_s\|-\sigma_s)^2 \le \frac{\epsilon^2}{4}, \quad (143)$$
where again we used the orthonormality of $\hat M$, the non-negativity of the norm and the fact that
$$\|v_s\|_B = \sqrt{L}\max_{l\in[L]}|v_s^* b_l| \le \sqrt{L}\,\|v_s\|\max_{l\in[L]}\|b_l\| \le \sqrt{K}\mu, \quad (144)$$
which holds since $v_s\in N_s$. Finally, by combining (142) and (143), we get $\|X-Y\|_B \le \epsilon$.
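The construction in this proof is effectively a product quantizer: the norm profile $(\|u_1\|,\ldots,\|u_S\|)$ and the $S$ directions are rounded on separate grids whose resolutions are matched through the relation $\|\cdot\|_B \le \sqrt{K}\mu\,\|\cdot\|$ of (144). The Monte-Carlo sketch below is ours; Euclidean grids stand in for the abstract covers $O$ and $N_s$, and all sizes are toy values. It mirrors (138)-(143) and checks the resulting $\|X-Y\|_B \le \epsilon$ bound empirically:

import numpy as np

rng = np.random.default_rng(1)
S, K, L, mu, eps = 3, 4, 256, 1.0, 0.2

# Rows b_l^* scaled so that max_l ||b_l|| = sqrt(K * mu^2 / L), as in (24).
Brows = rng.normal(size=(L, K)) + 1j * rng.normal(size=(L, K))
Brows *= np.sqrt(K * mu**2 / L) / np.linalg.norm(Brows, axis=1, keepdims=True)

def b_norm(Ucols):
    # ||U M^*||_B = sqrt(L) * max_l ||U^* b_l|| for an orthonormal hat{M}.
    return np.sqrt(L) * np.max(np.linalg.norm(Brows @ Ucols, axis=1))

U = rng.normal(size=(K, S)) + 1j * rng.normal(size=(K, S))
U /= np.linalg.norm(U)                    # columns u_s; X = U M^*, ||U||_F = 1

norms = np.linalg.norm(U, axis=0)         # (||u_1||, ..., ||u_S||)
dirs = U / norms                          # unit directions u_s / ||u_s||

step_sigma = eps / (np.sqrt(S * K) * mu)  # l2 rounding error <= eps/(2 sqrt(K) mu), cf. (138)
sigma = np.round(norms / step_sigma) * step_sigma
step_v = eps / (np.sqrt(2) * K * mu)      # via (144), B-norm error <= eps/2, cf. (139)
V = (np.round(dirs.real / step_v) + 1j * np.round(dirs.imag / step_v)) * step_v

err = b_norm(U - V * sigma)               # ||X - Y||_B for Y = sum_s sigma_s v_s m_s^*
print(f"||X - Y||_B = {err:.3f} <= eps = {eps}: {err <= eps}")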
ACKNOWLEDGMENT

We thank the anonymous reviewers for their useful comments that helped to improve the paper. This research was supported by ERC-StG grant no. 757497 (SPADE).

REFERENCES
[1] J. Liu, J. Xin, Y. Qi, and F.-G. Zheng, "A time domain algorithm for blind separation of convolutive sound mixtures and L1 constrained minimization of cross correlations," Communications in Mathematical Sciences, vol. 7, 2009.
[2] N. Shamir, Z. Zalevsky, L. Yaroslavsky, and B. Javidi, "Blind source separation of images based on general cross correlation of linear operators," Journal of Electronic Imaging, vol. 20, 2011.
[3] S. Shwartz, Y. Y. Schechner, and M. Zibulevsky, "Efficient separation of convolutive image mixtures," in Independent Component Analysis and Blind Signal Separation, J. Rosca, D. Erdogmus, J. C. Príncipe, and S. Haykin, Eds. Berlin, Heidelberg: Springer, 2006, pp. 246–253.
[4] X. Wang and H. V. Poor, "Blind equalization and multiuser detection in dispersive CDMA channels," IEEE Transactions on Communications, vol. 46, no. 1, pp. 91–103, Jan. 1998.
[5] G. Wunder, H. Boche, T. Strohmer, and P. Jung, "Sparse signal processing concepts for efficient 5G system design," IEEE Access, vol. 3, 2014.
[6] S. Ling and T. Strohmer, "Blind deconvolution meets blind demixing: Algorithms and performance bounds," IEEE Transactions on Information Theory, vol. 63, no. 7, pp. 4497–4520, Jul. 2017.
[7] S. Haykin, Blind Deconvolution, ser. Prentice-Hall Information and System Sciences Series. PTR Prentice Hall, 1994. [Online]. Available: https://books.google.sh/books?id=KO1SAAAAMAAJ
[8] S. Choudhary and U. Mitra, "Fundamental limits of blind deconvolution part I: Ambiguity kernel," CoRR, vol. abs/1411.3810, 2014. [Online]. Available: http://arxiv.org/abs/1411.3810
[9] P. Walk, P. Jung, G. Pfander, and B. Hassibi, "Ambiguities on convolutions with applications to phase retrieval," 2016, pp. 1228–1234.
[10] K. Lee, Y. Wu, and Y. Bresler, "Near-optimal compressed sensing of a class of sparse low-rank matrices via sparse power factorization," IEEE Transactions on Information Theory, vol. 64, no. 3, pp. 1666–1698, 2018.
[11] K. Lee, Y. Li, M. Junge, and Y. Bresler, "Blind recovery of sparse signals from subsampled convolution," IEEE Transactions on Information Theory, vol. 63, no. 2, pp. 802–821, 2017.
[12] S. Oymak, A. Jalali, M. Fazel, Y. C. Eldar, and B. Hassibi, "Simultaneously structured models with application to sparse and low-rank matrices," IEEE Transactions on Information Theory, vol. 61, no. 5, pp. 2886–2908, 2015.
[13] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, "Understanding and evaluating blind deconvolution algorithms," 2009, pp. 1964–1971.
[14] A. Ahmed, B. Recht, and J. K. Romberg, "Blind deconvolution using convex programming," IEEE Transactions on Information Theory, vol. 60, pp. 1711–1732, 2014.
[15] E. J. Candès, T. Strohmer, and V. Voroninski, "PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming," Communications on Pure and Applied Mathematics, vol. 66, no. 8, pp. 1241–1274, 2013.
[16] E. J. Candès and B. Recht, "Exact matrix completion via convex optimization," Foundations of Computational Mathematics, vol. 9, no. 6, p. 717, 2009. [Online]. Available: https://doi.org/10.1007/s10208-009-9045-5
[17] P. Jung, F. Krahmer, and D. Stöger, "Blind demixing and deconvolution at near-optimal rate," IEEE Transactions on Information Theory, vol. 64, no. 2, pp. 704–727, Feb. 2018.
[18] X. Li, S. Ling, T. Strohmer, and K. Wei, "Rapid, robust, and reliable blind deconvolution via nonconvex optimization," Applied and Computational Harmonic Analysis, vol. 47, no. 3, pp. 893–934, 2019.
[19] S. Ling and T. Strohmer, "Regularized gradient descent: A nonconvex recipe for fast joint blind deconvolution and demixing," Information and Inference: A Journal of the IMA, vol. 8, no. 1, pp. 1–49, 2018. [Online]. Available: https://doi.org/10.1093/imaiai/iax022
[20] J. Dong and Y. Shi, "Nonconvex demixing from bilinear measurements," IEEE Transactions on Signal Processing, vol. 66, no. 19, pp. 5152–5166, 2018.
[21] T. Strohmer and K. Wei, "Painless breakups—efficient demixing of low rank matrices," Journal of Fourier Analysis and Applications, vol. 25, no. 1, pp. 1–31, 2019.
[22] J. Dong, K. Yang, and Y. Shi, "Blind demixing for low-latency communication," IEEE Transactions on Wireless Communications, vol. 18, no. 2, pp. 897–911, 2019.
[23] J. Dong and Y. Shi, "Blind demixing via Wirtinger flow with random initialization," in Proceedings of Machine Learning Research, K. Chaudhuri and M. Sugiyama, Eds., vol. 89. PMLR, Apr. 2019, pp. 362–370.
[24] J. Dong, Y. Shi, and Z. Ding, "Blind over-the-air computation and data fusion via provable Wirtinger flow," IEEE Transactions on Signal Processing, vol. 68, pp. 1136–1151, 2020.
[25] Y. Chi, Y. M. Lu, and Y. Chen, "Nonconvex optimization meets low-rank matrix factorization: An overview," IEEE Transactions on Signal Processing, vol. 67, no. 20, pp. 5239–5269, 2019.
[26] A. Ahmed, "A convex approach to blind MIMO communications," IEEE Wireless Communications Letters, vol. 7, no. 5, pp. 812–815, Oct. 2018.
[27] K. Lee, Y. Li, M. Junge, and Y. Bresler, "Stability in blind deconvolution of sparse signals and reconstruction by alternating minimization," 2015, pp. 158–162.
[28] D. Gross, "Recovering low-rank matrices from few coefficients in any basis," IEEE Transactions on Information Theory, vol. 57, no. 3, pp. 1548–1566, 2011.
[29] M. Talagrand, Upper and Lower Bounds for Stochastic Processes: Modern Methods and Classical Problems, ser. Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge / A Series of Modern Surveys in Mathematics. Springer Berlin Heidelberg, 2016. [Online]. Available: https://books.google.co.il/books?id=JjQGvgAACAAJ
[30] S. Burer and R. Monteiro, "A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization," Mathematical Programming, Series B, vol. 95, pp. 329–357, 2003.
[31] R. Dudley, "The sizes of compact subsets of Hilbert space and continuity of Gaussian processes," Journal of Functional Analysis, vol. 1, no. 3, pp. 290–330, 1967.