Homomorphic Sensing
Manolis C. Tsakiris, Liangzu Peng
Abstract —A recent line of research termed unlabeled sensing and shuffled linear regression has been exploring, under great generality, the recovery of signals from subsampled and permuted measurements, a challenging problem in diverse fields of data science and machine learning. In this paper we introduce an abstraction of this problem which we call homomorphic sensing. Given a linear subspace and a finite set of linear transformations, we develop an algebraic theory which establishes conditions guaranteeing that points in the subspace are uniquely determined from their homomorphic image under some transformation in the set. As a special case, we recover known conditions for unlabeled sensing, as well as new results and extensions. On the algorithmic level we exhibit two dynamic-programming-based algorithms, which to the best of our knowledge are the first working solutions for the unlabeled sensing problem for small dimensions. One of them, additionally based on branch-and-bound, when applied to image registration under affine transformations, performs on par with or outperforms state-of-the-art methods on benchmark datasets.
Index Terms —Homomorphic Sensing, Unlabeled Sensing, Shuffled Linear Regression, Abstract Linear Algebra, Algebraic Geometry, Branch and Bound, Dynamic Programming, Linear Assignment Problem, Image Registration, Affine Transformation
1 INTRODUCTION
In a recent line of research termed unlabeled sensing, it has been established that uniquely recovering a signal from shuffled and subsampled measurements is possible as long as the number of measurements is at least twice the intrinsic dimension of the signal [1]. The special case where the signal is fully observed but subject to a permutation of its values is known as shuffled linear regression [2], [3], [4], [5]. In its simplest form, it consists of solving a linear system of equations with the entries of the right-hand-side vector permuted [6], [7]. The unlabeled sensing and shuffled linear regression problems and their variations naturally arise in many applications in data science and engineering, such as 1) record linkage for data integration [8], [9], a particularly important problem in medical data analysis where publicly available health records are anonymized, 2) image registration [10], multi-target tracking [11] and pose/correspondence estimation [12], [13], 3) header-free communications in Internet-of-Things networks [14], [4], [15] and user de-anonymization [16], [17], 4) acoustic wave field reconstruction [18], 5) system identification under asynchronous input/output samples [19], and many more; e.g., see [1], [3].
Theory.
Suppose that y = Π*Ax* + ε ∈ R^m is a noisy and shuffled version of some signal Ax*, where x* ∈ R^n is some unknown regression vector, Π* is some unknown permutation, and ε is noise. What can be said about the estimation of x* and Π* given y, A and the distribution of ε? This shuffled linear regression problem has been a classic subject of research in the area of record linkage, where predominant methods study maximum likelihood estimators under the working hypothesis that an accurate estimate for the probabilities of transpositions between samples is available [20], [8]. However, this is a strong hypothesis that does not extend to many applications beyond record linkage. Very recently, important theoretical advances have been made towards understanding this problem in greater generality. Specifically, [21], [22] and [2] have demonstrated that, in the absence of any further assumptions, the maximum likelihood estimator x̂_ML given by

(Π̂_ML, x̂_ML) = argmin_{Π,x} ‖y − ΠAx‖_2,   (1)

where Π ranges over all permutations, is biased. On the other hand, if the SNR is large enough, [23], [3] have asserted that Π̂_ML = Π* with high probability. If Π* is sparse enough, i.e., only a small percentage of entries of Ax* have been shuffled (this is the support of Π*), [21] have shown that under weaker SNR conditions the supports of Π̂_ML and Π* coincide.

• The authors are with the School of Information Science and Technology, ShanghaiTech University, Shanghai, China.
• Correspondence to M. C. Tsakiris, [email protected]
Moreover, they provide well-behaved error bounds for ‖x̂_ML − x*‖_2 as well as for ‖x̂_RR − x*‖_2, where x̂_RR is the solution to the convex ℓ_1 robust regression problem

min_{x,e} ‖y − Ax − √m e‖_2^2 + mλ‖e‖_1,  λ > 0,   (2)

in which the support of the sparse error e is meant to capture the support of the sparse permutation Π*. Another interesting line of work, related more to algebraic geometry than to statistics, is that of [4], which for the noiseless case (ε = 0) has proposed the use of symmetric polynomials towards extracting permutation-invariant constraints that x* ∈ R^n must satisfy. Such a self-moment estimator had already been briefly investigated by [22] from a statistical point of view, where the authors noted that in the presence of noise it is unclear whether the resulting system of equations has any solutions. Perhaps surprisingly, working with n non-homogeneous polynomials of degrees 1, 2, ..., n in n variables, the work of [7] has established that regardless of the value of the noise ε and under the sole requirement that A is generic¹, the polynomial system
always has a solution, and in fact at most n! of them, thus proving the existence of a purely algebraic estimator for x*. Much less is known for the more challenging and realistic case of unlabeled sensing, where now y ∈ R^k consists of a shuffled noisy subset of the entries of Ax* ∈ R^m, i.e., there is no longer a 1-1 correspondence between y and Ax*. The main theoretical finding to date comes from the seminal work of [1], according to which, in the absence of noise, x* is uniquely recoverable from y and A as long as 1) k ≥ 2n and 2) A is generic. Inspired by a certain duality between compressed and unlabeled sensing, a recovery condition for noisy data has further been given by [24] in terms of a restricted isometry property. However, this approach is valid only for the special case of y obtained by subsampling Ax* while maintaining the relative order of the samples.

1. A rigorous definition of generic will be given in the next section.

Algorithms.
Towards computing a solution to the shuffled linear regression problem, which can be solved by brute force in O(m!), the algorithms presented by [2], [14] are conceptually important but are applicable only to noiseless data or have a prohibitively high complexity in m. When the ratio of shuffled data is small, one may apply the ℓ_1 robust regression method of (2) [21]. Other approaches use alternating minimization or multi-start gradient descent to solve (1) [22], [5], an NP-hard problem for n > 1 [3]. Due to the high non-convexity, such methods are very sensitive to initialization. This is remedied by the algebraically-initialized expectation-maximization method of [7], which uses the solution to the polynomial system of equations mentioned above to obtain a high-quality initialization. This approach is robust to small levels of noise, efficient for small n, and is able to handle fully shuffled data; its main drawback is its exponential complexity in n. In the unlabeled sensing case, which may be thought of as shuffled linear regression with outliers, the above methods in principle break down. Instead, we are aware of only two relevant algorithms, which nevertheless are suitable only under strong structural assumptions on the data. The O(nm^{n+1}) method of [25] applies a brute-force solution, which explicitly relies on the data being noiseless and whose theoretical guarantees require a particular exponentially spaced structure on A. On the other hand, [24] attempt to solve

min_{S,x} ‖y − SAx‖_2,   (3)

via alternating minimization, with S in (3) being a selection matrix². Their main algorithmic insight is to solve for S given x via dynamic programming. However, their algorithm works only for order-preserving selection matrices S, a rather strong limitation, and seems to fail otherwise. It is thus fair to conclude that, to the best of our knowledge, there does not seem to exist a satisfactory algorithm for unlabeled sensing, even for small n. Theory.
In this work we adopt an abstract view of the shuffled linear regression and unlabeled sensing problems, which naturally leads us to a more general formulation that we refer to as homomorphic sensing. In homomorphic sensing one is given a finite set T of linear transformations R^m → R^m (to be called endomorphisms) and a linear subspace V ⊂ R^m of dimension n,
2. An order-preserving selection matrix is a row-submatrix of the identity matrix. A selection matrix is a row-permutation of an order-preserving selection matrix. This is equivalent to a permutation matrix composed with a coordinate projection.

and asks under what conditions the image τ(v) of some unknown v ∈ V under some unknown τ ∈ T is enough to uniquely determine what v is. This is equivalent to asking under what conditions the relation τ_1(v_1) = τ_2(v_2) implies v_1 = v_2 whenever τ_1, τ_2 ∈ T and v_1, v_2 ∈ V. E.g., in shuffled linear regression these endomorphisms are permutations, while in unlabeled sensing they are compositions of permutations with coordinate projections, and the unlabeled sensing theorem of [1] asserts that a sufficient condition for unique recovery is that 1) the coordinate projections preserve at least 2n coordinates and 2) V is generic. The first theoretical contribution of this paper is a general homomorphic sensing result (Theorem 1) applicable to arbitrary endomorphisms, and thus of potential interest in a broad spectrum of applications. For generic V, the key condition asks that the dimension n of V not exceed the codimension of a certain algebraic variety associated with pairs of endomorphisms from T. This in turn can be used to obtain, within a principled framework, the unlabeled sensing result of [1]. Our second theoretical contribution is a recovery result (Theorem 2) for generic points in V (as opposed to all points), which for the unlabeled sensing case says that the coordinate projections need to preserve at least n + 1 coordinates (as opposed to 2n). Algorithms.
Inspired by [24], [25] and [26], we make three algorithmic contributions. First, we introduce a branch-and-bound algorithm for the unlabeled sensing problem that globally minimizes (3). Instead of branching over the space of selection matrices, which is known to be intractable [27], our algorithm only branches over the space of x ∈ R^n, relying on a locally optimal computation of the selection matrix via dynamic programming. Second, it is this dynamic programming feature that also allows us to modify the purely theoretical algorithm of [25] into a robust and efficient method for small dimensions n. These two algorithms constitute, to the best of our knowledge, the first working solutions for the unlabeled sensing problem. Third, when customized for image registration under an affine transformation, our branch-and-bound algorithm is on par with or outperforms state-of-the-art methods [10], [28], [29] on benchmark datasets.

2 HOMOMORPHIC SENSING: ALGEBRAIC THEORY
The main results of this section are Theorems 1 and 2. To avoid obscuring the main ideas by algebraic arguments exceeding the scope of a computer science paper, we only sketch some proofs and refer the reader to [30] for the details. All results in this section refer to the noiseless case; an analysis for corrupted data is left to future research.
For an integer k, [k] = {1, ..., k}. For a non-negative real number α, ⌊α⌋ is the greatest integer k such that k ≤ α.

R versus C. We work over the complex numbers C. This does not contradict the fact that in this paper we are primarily interested in R^m; rather, it facilitates the analysis. E.g., a matrix T ∈ R^{m×m} may be diagonalizable over C but not over R. This is the case with permutations, whose eigenvalues are associated with the complex roots of unity. Hence our philosophy is “check the conditions over C, then draw a conclusion over R”; see Remark 1.

We adopt the terminology of abstract linear algebra [31], since the ideas we discuss in this paper are best delivered in a coordinate-free way. The reader who insists on thinking in terms of matrices may safely replace linear transformations, kernels and images by matrices, nullspaces and rangespaces, respectively. We work in C^m. For a subspace V we denote by dim(V) its dimension. For subspaces V, W we say that “V, W do not intersect” if V ∩ W = 0. An endomorphism is a linear transformation τ : C^m → C^m; an automorphism is an invertible endomorphism. We denote by i the identity map i(w) = w, ∀w ∈ C^m. If τ is an endomorphism, its kernel ker(τ) is the set of all v ∈ C^m such that τ(v) = 0, and its image im(τ) is the set of all τ(w) for w ∈ C^m. By rank(τ) we mean dim(im(τ)). The preimage of τ(v) is the set of all w ∈ C^m such that τ(v) = τ(w), i.e., the set of all v + ξ for ξ ∈ ker(τ). If V is a linear subspace, by τ(V) we mean the set of all vectors τ(v) for v ∈ V. We denote by E_{τ,λ} the eigenspace of τ associated to the eigenvalue λ, i.e., the set of all v ∈ C^m such that τ(v) = λv. For τ_1, τ_2 endomorphisms, the generalized eigenspace of the pair (τ_1, τ_2) of eigenvalue λ is the set of all w ∈ C^m for which τ_1(w) = λτ_2(w). By a projection ρ of C^m we mean an idempotent (ρ^2 = ρ) endomorphism.
By a coordinate projection we mean a projection that sets to zero certain coordinates of C^m while preserving the rest.

By an algebraic variety (or variety) of C^m we mean the zero locus of a set of polynomials in m variables. The study of such varieties is facilitated by the use of the Zariski topology, in which every variety is a closed set. In particular, there is a well-developed theory in which topological and algebraic notions of dimension coincide [32]. This allows us to assign dimensions to sets such as the intersection of a variety with the complement of another, called a quasi-variety. A linear subspace S is an algebraic variety and its linear-algebra dimension coincides with its algebraic-geometric dimension. The union A = ∪_{i∈[ℓ]} S_i of ℓ linear subspaces is also an algebraic variety and dim(A) = max_{i∈[ℓ]} dim(S_i). For a variety Y defined by homogeneous polynomials, dim(Y) can be characterized as the smallest number of hyperplanes through the origin that one needs to intersect Y with in order to obtain the origin. For Y a variety of C^m, we set codim(Y) = m − dim(Y). The set of all n-dimensional linear subspaces of C^m is itself an algebraic variety, called the Grassmannian and denoted by Gr(n, m). By a generic subspace V of dimension n we mean a subspace belonging to some non-empty open subset U ⊂ Gr(n, m) in the Zariski topology of Gr(n, m). Such a U is dense in Gr(n, m), and if one endows Gr(n, m) with a continuous probability measure, then U has measure 1 [33]. Hence the reader may safely think of a generic subspace as a random subspace. When we say “for a generic V property P is true”, we mean that the set of all V for which property P is true contains a non-empty Zariski open subset of Gr(n, m). Hence for a randomly drawn V property P will be true with probability 1. We will make repeated use of the following fact: Lemma 1.
Let Y be a variety defined by homogeneous polynomials and V a generic linear subspace. Then

dim(Y ∩ V) = max{dim(V) − codim(Y), 0}.   (4)

A property of the Zariski topology that we need is that the intersection of finitely many non-empty open sets of an irreducible space (such as Gr(n, m)) is open and non-empty. Let
V ⊂ C^m be a linear subspace of dimension n and let T be a finite set of endomorphisms of C^m. Let v be some point (vector) in V. Suppose that we know the image τ(v) ∈ C^m of v for some unspecified τ ∈ T. Can we uniquely recover v from knowledge of V, T and τ(v)? Example 1.
In shuffled linear regression, T consists of the m! permutations on the m coordinates of C^m. In unlabeled sensing, T consists of the set of all possible combinations of permutations composed with coordinate projections. In both cases V is the range-space of some matrix A ∈ C^{m×n} and v = Ax for some x ∈ C^n. The meaning of A and x may vary depending on the application. E.g., in signal processing/control systems x may be the impulse response of some linear filter, while in image registration x may represent the parameters of some affine transformation. The above question motivates the following definition.
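Before formalizing this, here is a small brute-force sanity check (our own snippet, not from the paper) for the shuffled linear regression case of Example 1, with m = 4 and a random real subspace of dimension n = 2 = ⌊m/2⌋, the regime where, as shown below, unique recovery holds:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 2
B = rng.standard_normal((m, n))      # basis of a random ("generic") subspace V
v = B @ rng.standard_normal(n)       # the unknown point of V
perms = [np.eye(m)[list(p)] for p in itertools.permutations(range(m))]
y = perms[7] @ v                     # observed image tau(v), with tau unspecified

# Recovery: try every candidate tau' in T and keep the exact fits.
candidates = []
for P in perms:
    xi = np.linalg.lstsq(P @ B, y, rcond=None)[0]
    if np.linalg.norm(y - P @ B @ xi) < 1e-9:
        candidates.append(B @ xi)

# Several tau' may fit exactly, but every exact fit recovers the same point v.
print(all(np.allclose(c, v) for c in candidates))  # True
```

Several permutations can explain y exactly (e.g., those fixing v up to symmetries of the subspace), but all exact fits return the same point of V, which is precisely the content of unique recoverability.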
Definition 1.
Let
V ⊂ C^m be a subspace and T a finite set of endomorphisms of C^m. We say that “v ∈ V is uniquely recoverable in V under T” if whenever τ_1(v) = τ_2(v′) for some τ_1, τ_2 ∈ T and v′ ∈ V, then v = v′. If this holds for every v ∈ V, we have “unique recovery in V under T”. Remark 1.
Let T be a set of endomorphisms of R^m. These can also be viewed as endomorphisms of C^m (Theorem 2.29 of [31]). Let V be a subspace of R^m with basis v_1, ..., v_n and V_C = Span_C(v_1, ..., v_n) the subspace of C^m generated by the basis of V. Then unique recovery in V_C under T implies unique recovery in V under T. To build some intuition about the notion of unique recovery in V under T, consider first the case where T = {i, τ} with i the identity map and τ some automorphism. As a first step, we characterize the purely combinatorial condition of Definition 1, given in terms of points, by the geometric condition of the next lemma, given in terms of subspaces. Lemma 2.
Let τ be any automorphism of C^m and let V be any linear subspace. Then we have unique recovery in V under {i, τ} if and only if V ∩ τ(V) ⊂ E_{τ,1}.

Proof. (Necessity) Suppose that
V ∩ τ(V) ⊄ E_{τ,1}. Then there exists some v_1 ∈ V ∩ τ(V) not inside E_{τ,1}. Note that this also implies that v_1 ∉ E_{τ^{-1},1}. Since v_1 ∈ τ(V), there exists some v_2 ∈ V such that v_1 = τ(v_2). Since v_1 ∉ E_{τ^{-1},1}, we must have that v_2 = τ^{-1}(v_1) ≠ v_1. Hence v_1 = τ(v_2) with v_2 ≠ v_1. (Sufficiency) Suppose that v_1 = τ(v_2) for some v_1, v_2 ∈ V, whence v_1 ∈ V ∩ τ(V). By hypothesis v_1 ∈ E_{τ,1}. Hence τ(v_1) = v_1. Hence τ(v_1) = τ(v_2). Since τ is invertible, v_1 = v_2.

Notice that the condition V ∩ τ(V) ⊂ E_{τ,1} of Lemma 2 prevents V ∩ τ(V) from intersecting E_{τ,λ} for any λ ≠ 1. Hence, a necessary condition for unique recovery is that V ∩ τ(V) does not intersect E_{τ,λ}, λ ≠ 1. Notice that V ∩ τ(V) ∩ E_{τ,λ} = 0 if and only if V ∩ E_{τ,λ} = 0. This places a restriction on the dimension of V, for if dim(V) > codim(E_{τ,λ}) then V and E_{τ,λ} will necessarily intersect. Hence, a necessary condition for unique recovery in V under {i, τ} is

dim(V) ≤ min_{λ≠1} codim(E_{τ,λ}).   (5)

Since the algebraic-geometric dimension of a finite union of linear subspaces is the maximum among the subspace dimensions, condition (5) can equivalently be written as

dim(V) ≤ codim(∪_{λ≠1} E_{τ,λ}).   (6)

Then for a generic V satisfying condition (6), Lemma 1 guarantees that V ∩ (∪_{λ≠1} E_{τ,λ}) = 0. It is not hard to show that when τ is diagonalizable and n = dim(V) is small enough compared to m, condition (6) is also sufficient for unique recovery in V under {i, τ}: Lemma 3.
Let τ be a diagonalizable automorphism of C^m and V a generic subspace with dim(V) ≤ ⌊m/2⌋. We have unique recovery in V under {i, τ} if and only if (6) is true.

Proof. (⇒) By Lemma 2,
V ∩ τ(V) ⊂ E_{τ,1}. Hence V does not intersect E_{τ,λ} for every λ ≠ 1. This implies (6). (⇐) Suppose first that dim(E_{τ,1}) ≤ m − n. The subset X of Gr(n, m) on which V ∩ τ(V) = 0 contains the open subset U_τ on which dim(V + τ(V)) = 2n. Then U_τ is non-empty. To see this, note that there exist 2n linearly independent eigenvectors w_1, ..., w_{2n} of τ such that no more than n of them correspond to the same eigenvalue. Ordering the w_i so that eigenvectors corresponding to the same eigenvalue are placed consecutively, we then define v_i = w_i + w_{i+n}, ∀i ∈ [n]. Then V = Span(v_1, ..., v_n) ∈ U_τ. Next, suppose that dim(E_{τ,1}) > m − n. Since n ≤ ⌊m/2⌋, we have dim(E_{τ,1}) ≥ n. Suppose that v_1 = τ(v_2) for v_1, v_2 ∈ V. Let B be a basis of C^m on which τ is represented by a diagonal matrix T ∈ C^{m×m}, and let V ∈ C^{m×n} and Vξ_1, Vξ_2 ∈ C^m be the corresponding representations of a basis of V and of v_1, v_2 respectively, with ξ_1, ξ_2 ∈ C^n. Then the equation v_1 = τ(v_2) is equivalent to Vξ_1 = TVξ_2. Since dim(E_{τ,1}) ≥ n, we may assume without loss of generality that the first n diagonal elements of T are equal to 1. Letting V_1 ∈ C^{n×n} be the top n × n submatrix of V, this implies that V_1ξ_1 = V_1ξ_2. Then V_1 is invertible on a non-empty open subset U′_τ of Gr(n, m), on which v_1 = v_2. Thus V ∩ τ(V) ⊂ E_{τ,1}, ∀V ∈ U′_τ. In conclusion, there is an open set U = U_τ or U = U′_τ, such that for any V ∈ U we have V ∩ τ(V) ⊂ E_{τ,1}, and so we are done by Lemma 2.

The extension to multiple automorphisms follows from Lemma 3 and the fact that the intersection of finitely many non-empty open sets of Gr(n, m) is non-empty and open: Proposition 1.
Let T be a finite set of automorphisms of C^m such that for any τ_1, τ_2 ∈ T we have that τ_2^{-1}τ_1 is diagonalizable. Let V be a generic subspace with dim(V) ≤ ⌊m/2⌋. We have unique recovery in V under T if and only if (6) is true for every τ = τ_2^{-1}τ_1 with τ_1, τ_2 ∈ T.

Even though the invertibility and diagonalizability requirement of Proposition 1 may seem too strong, it is satisfied by our canonical example where T is the set of m! permutations on the m coordinates of C^m. In fact, more is true:
3. Lemma 3 (with a different proof) is the main insight of the parallel and independent work of [18]. That work studies the same problem as the present paper, but focuses only on diagonalizable automorphisms. On the other hand, it has the advantage that it considers countably many automorphisms, while here we only consider finitely many.
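Before stating the result, here is a small numeric preview (our own snippet, not from the paper): for m = 6, the permutation consisting of three disjoint transpositions attains the eigenspace bound established next.

```python
import numpy as np

m = 6
# Worst case for the eigenspace bound: pi = (0 1)(2 3)(4 5), a product of
# floor(m/2) = 3 disjoint transpositions, i.e., cycles of length 2.
P = np.eye(m)[[1, 0, 3, 2, 5, 4]]
eigvals = np.linalg.eigvals(P)

# Each transposition contributes one eigenvalue -1, so dim E_{pi,-1} = 3,
# attaining the bound dim E_{pi,lambda} <= floor(m/2) for lambda != 1.
mult = int(np.sum(np.abs(eigvals + 1) < 1e-9))
print(mult)  # 3
```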
Proposition 2.
Let T be the permutations on the m coordinates of C^m. Then dim(E_{π,λ}) ≤ m − ⌊m/2⌋, ∀π ∈ T, ∀λ ≠ 1. Hence, for a generic subspace V with dim(V) ≤ ⌊m/2⌋ we have unique recovery in V under T.

Proof. This follows from basic structural facts about permutations. Let π ∈ T be a permutation. Then π is the product of c ≥ 1 disjoint cycles, say π = π_1 ⋯ π_c. Suppose that cycle π_i cycles m_i coordinates, i.e., it has length m_i. Since the cycles are disjoint, we have m = Σ_{i=1}^c m_i. Now, each cycle is diagonalizable with m_i eigenvalues equal to the m_i complex roots of unity, i.e., the roots of the equation x^{m_i} = 1. Since the cycles are disjoint, the dimensions of the eigenspaces of π are counted additively across cycles. Hence for λ ≠ 1 the dimension of E_{π,λ} is at most equal to the number of cycles of length at least 2. But the number of such cycles is at most ⌊m/2⌋. Hence dim(E_{π,λ}) ≤ ⌊m/2⌋. But ⌊m/2⌋ ≤ m − ⌊m/2⌋, i.e., dim(E_{π,λ}) ≤ m − ⌊m/2⌋. The rest of the statement is a corollary of Proposition 1.

The arguments that led to Proposition 1 relied heavily on the invertibility of the endomorphisms in T. This is because in that case unique recovery in V under {τ_1, τ_2} is equivalent to unique recovery in V under {i, τ_2^{-1}τ_1}, where i is the identity map. It was this feature that helped us understand the homomorphic sensing property of Definition 1 in terms of V intersecting its image τ_2^{-1}τ_1(V). In turn, the key objects controlling this intersection turned out to be the eigenspaces of τ = τ_2^{-1}τ_1 corresponding to eigenvalues different than 1, as per Lemma 3, whose proof however made explicit use of the diagonalizability of τ. As a consequence, generalizing Proposition 1 to arbitrary endomorphisms, for which τ_1 might not even be invertible, let alone τ_2^{-1}τ_1 diagonalizable, is not straightforward. Example 2.
In unlabeled sensing, a permutation composed with a coordinate projection is in general neither invertible nor diagonalizable. E.g., consider a cycle π of length 3, a coordinate projection ρ onto the first two coordinates, and their composition ρπ:

π = [[0, 0, 1], [1, 0, 0], [0, 1, 0]],  ρ = [[1, 0, 0], [0, 1, 0], [0, 0, 0]],  ρπ = [[0, 0, 1], [1, 0, 0], [0, 0, 0]].

First, rank(ρπ) = 2, so that ρπ is not invertible. Secondly, ρπ is nilpotent, i.e., (ρπ)^3 = 0, and so the only eigenvalue of ρπ is zero. This means that ρπ is similar to a 3 × 3 Jordan block of eigenvalue 0, i.e., ρπ is far from diagonalizable.

We proceed by developing two devices. The first one is a generalization of Lemma 3 and overcomes the challenge of the potential non-diagonalizability of the endomorphisms.
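The claims of Example 2 are straightforward to verify numerically; in the following snippet (ours), the cycle length is taken to be 3, consistent with the rank and nilpotency claims of the example:

```python
import numpy as np

# pi: the 3-cycle e1 -> e2 -> e3 -> e1; rho: projection onto the first two coordinates.
Pi = np.array([[0., 0., 1.],
               [1., 0., 0.],
               [0., 1., 0.]])
Rho = np.diag([1., 1., 0.])
M = Rho @ Pi  # the composition rho∘pi

print(np.linalg.matrix_rank(M))                       # 2: not invertible
print(np.allclose(np.linalg.matrix_power(M, 3), 0))   # True: nilpotent
```

Since M^2 ≠ 0 but M^3 = 0, the matrix is indeed similar to a single 3 × 3 Jordan block of eigenvalue 0.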
Lemma 4.
Let V be a generic subspace with dim(V) ≤ ⌊m/2⌋, and τ any endomorphism of C^m for which (6) is true. Then we have unique recovery in V under {i, τ}.

Proof. (Sketch) The arguments are similar in spirit to those in the proof of Lemma 3 but technically more involved. Let n = dim(V). The difficult part is when dim(E_{τ,1}) ≤ m − n, where we prove the existence of a non-empty open set U of Gr(n, m), such that for every V ∈ U we have dim(V + τ(V)) = 2n, which is the maximal dimension that the subspace V + τ(V) can have. In analogy with the diagonalizable case, this can be done by working with the Jordan canonical form of τ and constructing a V = Span(v_1, ..., v_n) ∈ U for which the v_i are suitably paired (generalized) eigenvectors of τ.

Our second device overcomes the challenge of the potential lack of invertibility. We need some notation. Let τ_1, τ_2 be endomorphisms of C^m and let ρ be a projection onto im(τ_2). Define the variety Y^ρ_{τ_1,τ_2} as the set of w ∈ C^m for which ρτ_1(w) and τ_2(w) are linearly dependent, i.e.,

Y^ρ_{τ_1,τ_2} = {w : dim(Span(ρτ_1(w), τ_2(w))) ≤ 1}.   (7)

This is indeed a variety, because if τ_1, τ_2, ρ are represented by matrices T_1, T_2, P, then Y^ρ_{τ_1,τ_2} is defined by the vanishing of all 2 × 2 minors of the matrix [PT_1w  T_2w], which are quadratic polynomials in w. If w ∈ Y^ρ_{τ_1,τ_2}, then there exists some λ_w ∈ C such that either τ_2(w) = λ_w ρτ_1(w) or ρτ_1(w) = λ_w τ_2(w). Hence Y^ρ_{τ_1,τ_2} is the union of all generalized eigenspaces of the endomorphism pairs (ρτ_1, τ_2) and (τ_2, ρτ_1). Note that ker(ρτ_1 − τ_2) is the generalized eigenspace corresponding to eigenvalue 1, while ker(ρτ_1), ker(τ_2) are the generalized eigenspaces of (ρτ_1, τ_2), (τ_2, ρτ_1) respectively, of eigenvalue 0. In analogy with the automorphism case of Lemma 2, where the eigenspace of eigenvalue 1 was irrelevant for unique recovery, it turns out that in general the same is true for the generalized eigenspaces of eigenvalues 0 and 1.
Removing their union

Z^ρ_{τ_1,τ_2} = ker(τ_2) ∪ ker(ρτ_1) ∪ ker(ρτ_1 − τ_2),   (8)

from Y^ρ_{τ_1,τ_2} yields the quasi-variety

U^ρ_{τ_1,τ_2} = Y^ρ_{τ_1,τ_2} \ Z^ρ_{τ_1,τ_2}.   (9)

U^ρ_{τ_1,τ_2} plays a role analogous to that of ∪_{λ≠1} E_{τ,λ} when τ is an automorphism. More precisely, in analogy with (6), the next theorem shows that the condition that controls homomorphic sensing in general is

dim(V) ≤ codim(U^ρ_{τ_1,τ_2}).   (10)

Theorem 1.
Let T be a finite set of endomorphisms of C^m such that for every τ ∈ T we have rank(τ) ≥ 2n, for some n ≤ ⌊m/2⌋. Then for a generic subspace V of dimension n, we have unique recovery in V under T as long as for every τ_1, τ_2 ∈ T there is a projection ρ onto im(τ_2) such that for U^ρ_{τ_1,τ_2} defined in (9) condition (10) is true.

Proof. (Sketch) The key idea for the case T = {τ_1, τ_2} is to view V as a generic n-dimensional subspace of a generic k-dimensional subspace H, where k = rank(τ_2). Then τ_2|_H is an isomorphism from H onto im(τ_2), and so unique recovery in V under {τ_1, τ_2} follows from unique recovery in V under {i|_H, τ_H}, with τ_H = (τ_2|_H)^{-1}ρτ_1|_H an endomorphism of H. By Lemma 4 we are done if dim(E_{τ_H,λ}) ≤ k − n, ∀λ ≠ 1. Let τ_H(w) = λw; then τ_2(τ_2|_H)^{-1}ρτ_1(w) = λτ_2(w). Now, τ_2(τ_2|_H)^{-1}ρ = ρ, thus ρτ_1(w) = λτ_2(w). Hence, E_{τ_H,λ} ⊂ (U^ρ_{τ_1,τ_2} ∩ H), and the rest follows from dimension considerations.

In unlabeled sensing, the endomorphisms in T have the form ρπ, where π is a permutation and ρ is a coordinate projection. Then as per Theorem 1, if dim(V) ≤ codim(U^{ρ_2}_{ρ_1π_1,ρ_2π_2}) one has unique recovery in V under {ρ_1π_1, ρ_2π_2}. Furthermore, via a combinatorial algebraic-geometric argument we obtain a convenient lower bound on codim(U^{ρ_2}_{ρ_1π_1,ρ_2π_2}): Proposition 3.
Let π_1, π_2 be permutations on the m coordinates of C^m and ρ_1, ρ_2 coordinate projections. For U^{ρ_2}_{ρ_1π_1,ρ_2π_2} defined in (9) we have

⌊rank(ρ_2)/2⌋ ≤ codim(U^{ρ_2}_{ρ_1π_1,ρ_2π_2}).   (11)

As a consequence of Theorem 1 and Proposition 3, one has unique recovery in the unlabeled sensing case as long as the dimension of V does not exceed half the number of the coordinates preserved by each coordinate projection. This is precisely the result of [1], which they obtained by attacking the problem directly via ingenious yet complicated combinatorial arguments. Even though our proof is not necessarily less complicated, it has the advantage of using a framework that generalizes relatively easily. For example, one may consider entry-wise sign corruptions on top of coordinate projections and permutations. In such a case, it is not hard to show that for the same condition as for unlabeled sensing one has unique recovery up to a sign. In general, even though analytically computing codim(U^ρ_{τ_1,τ_2}) may be challenging, performing this computation in an algebraic geometry software environment such as Macaulay2 is in principle straightforward.
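The unlabeled sensing consequence just described can also be probed numerically. The following brute-force check is our own construction, not from the paper: it verifies unique recovery for a random real subspace of dimension n = 2 in dimension m = 5 (by Remark 1, checking over R suffices for this illustration), with a coordinate projection preserving exactly 2n = 4 coordinates.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 2
B = rng.standard_normal((m, n))      # basis of a random ("generic") V, dim(V) = n

rho = np.diag([1., 1., 1., 1., 0.])  # coordinate projection preserving 2n = 4 coords
Ts = [rho @ np.eye(m)[list(p)] for p in itertools.permutations(range(m))]

def null_space(M, tol=1e-10):
    _, s, Vt = np.linalg.svd(M)
    return Vt[int((s > tol).sum()):].T  # columns span ker(M)

# tau1(v1) = tau2(v2) iff (xi1; xi2) lies in ker([T1 B, -T2 B]); unique
# recovery demands that every such pair of points satisfies v1 = v2.
ok = True
for T1, T2 in itertools.combinations_with_replacement(Ts, 2):
    for z in null_space(np.hstack([T1 @ B, -T2 @ B])).T:
        if np.linalg.norm(B @ (z[:n] - z[n:])) > 1e-8:
            ok = False
print(ok)  # True
```

The null spaces are frequently nontrivial (collisions τ_1(v) = τ_2(v) do occur), but every collision involves a single point of V, exactly as the theory predicts when the projections preserve at least 2n coordinates.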
Often the requirement that every v ∈ V be uniquely recoverable is unnecessarily strict. Instead, it may be of interest to ask whether unique recovery holds true for a generic v ∈ V. In such a situation a less demanding technical analysis gives unique recovery under much weaker conditions: Theorem 2.
Let T be a finite set of endomorphisms of C^m. Then a generic point v inside a generic subspace V of dimension n is uniquely recoverable in V under T as long as 1) rank(τ) ≥ n + 1 for every τ ∈ T, and 2) no two endomorphisms in T are a scalar multiple of each other.

Proof. (Sketch) Let V ∈ C^{m×n} be a basis of V. If τ_1(v) = τ_2(v′), then τ_1(v) ∈ τ_2(V) and so rank([T_2V  T_1Vξ]) ≤ n for ξ ∈ C^n with v = Vξ. The proof then proceeds by exhibiting, for τ_1 ≠ τ_2, a V and a ξ for which rank([T_2V  T_1Vξ]) = n + 1. This implies that for generic V and v ∈ V, τ_1(v) = τ_2(v′) for v′ ∈ V only when τ_1 = τ_2. In that case v − v′ ∈ ker(τ_1), and Lemma 1 implies that v − v′ = 0.

A consequence of Theorem 2 is the unique recovery of a generic vector in the unlabeled sensing case as soon as the coordinate projections preserve at least n + 1 entries: Corollary 1.
Let T be the set of endomorphisms of C^m such that every τ ∈ T has the form τ = ρπ, where π is a permutation and ρ a coordinate projection. Then for a generic subspace V of dimension n and a generic v ∈ V, we have unique recovery of v in V under T as long as rank(ρ) ≥ n + 1 for every ρπ ∈ T.

3 ALGORITHMS AND APPLICATION
In this section we propose a globally optimal method for the unlabeled sensing problem, by minimizing (3) via a dynamic-programming-based branch-and-bound scheme. Both ingredients are standard, and we just describe how to combine them; see [34], [26] for transparent discussions of branch-and-bound in related contexts. Set f(x, S) = ‖y − SAx‖_2, with x ∈ R^n and S a selection matrix. As branching over the space of selection/permutation matrices S is known to be inefficient [27], the crucial aspect of our approach is to branch only over the space of x, while relying on a local computation of the optimal S, say S_x, given x. Here is where dynamic programming comes into play: [24] showed that if there exists an order-preserving S_x such that f(x, S_x) = min_S ‖y − SAx‖_2, then S_x can be computed via dynamic programming at a complexity O(mk). At first sight this does not generalize to arbitrary y, A, x, as none of the minimizers over S is expected to be order-preserving. However, if we sort y and Ax in descending order to obtain, say, y↓ and (Ax)↓, then 1) there is an order-preserving selection matrix S′_x such that ‖y↓ − S′_x(Ax)↓‖_2 = min_S ‖y↓ − S(Ax)↓‖_2, and 2) S_x can be easily obtained from S′_x. In conclusion, given x we can compute S_x in O(mk); this is in sharp contrast to other linear assignment algorithms such as the Hungarian algorithm, most of which have complexity O(m^3) [36]. Finally, our strategy becomes that of computing an upper bound of f in a hypercube with center x_0 via alternating minimization between x and S, initialized at x_0. Computing a tight lower bound ℓ of f for a given hypercube is challenging and our choice here is a crude one: ℓ = ‖y − S_{x_0}Ax_0‖_2 − σ(A)ε, where ε is half the hypercube diagonal and σ(A) is the largest singular value of A.
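The order-preserving assignment step can be sketched in a few lines. The following is our own minimal implementation of the dynamic program (function name and toy data are hypothetical), which assumes, as in the discussion above, that y is to be matched to an increasing subsequence of a = Ax:

```python
import numpy as np

def best_order_preserving_fit(y, a):
    """Minimize ||y - S a||^2 over order-preserving selections S, via dynamic
    programming in O(m k); also returns the selected indices of a."""
    k, m = len(y), len(a)
    INF = float("inf")
    # cost[j, i]: best cost of matching y[:j] to an increasing index set in a[:i]
    cost = np.full((k + 1, m + 1), INF)
    cost[0, :] = 0.0
    choice = np.zeros((k + 1, m + 1), dtype=bool)  # True if a[i-1] matched to y[j-1]
    for j in range(1, k + 1):
        for i in range(j, m + 1):
            skip = cost[j, i - 1]
            take = cost[j - 1, i - 1] + (y[j - 1] - a[i - 1]) ** 2
            if take < skip:
                cost[j, i], choice[j, i] = take, True
            else:
                cost[j, i] = skip
    # backtrack the selected indices
    sel, j, i = [], k, m
    while j > 0:
        if choice[j, i]:
            sel.append(i - 1)
            j -= 1
        i -= 1
    return cost[k, m], sel[::-1]

# toy check: y is an order-compatible subsample of a, so the optimal cost is 0
a = np.array([5.0, 1.0, 4.0, 2.0, 8.0])
y = np.array([1.0, 2.0, 8.0])
c, sel = best_order_preserving_fit(y, a)
print(c, sel)  # 0.0 [1, 3, 4]
```

In the scheme described above, this routine would be applied to the sorted sequences y↓ and (Ax)↓ at every candidate x.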
We refer to this as Algorithm-A. It turns out that the dynamic programming trick can also be put to use in a brute-force fashion: we fix a subvector ȳ of y of length n, and for each A_i out of the m!/(m − n)! many n × n matrices that can be made by concatenating n distinct rows of A in any order, we let x_i = A_i†ȳ. We then use dynamic programming to select the x_i with the lowest assignment error min_S ‖y − SAx_i‖. This is an algorithm of complexity O(km^(n+1)); we call it Algorithm-B. We compare the proposed 1)
Algorithm-A and 2) Algorithm-B on synthetic data A, x, ε with n = 3, m = 100, and a fixed noise level σ. For shuffled linear regression (k = m) we compare with 3) Slawski19, which solves (2), 4)
Tsakiris18, which is the algebraic-geometric method of [7], and 5)
Abid18, which performs alternating minimization on (1) via least-squares and sorting [5]. For unlabeled sensing (k ≤ m) we compare with 6) Haghighatshoar18 [24]. As seen from Fig. 1, the proposed methods perform uniformly better, and often by a large margin, than the other methods, when tested for their robustness against the percentage of shuffled data, the outlier ratio, and the noise level. In particular, we see that for the unlabeled sensing problem of Figs. 1b-1d,
Algorithm-A and
Algorithm-B are the only working solutions. Encouraging as these results may be, we do note that an important weakness of these methods is their scalability: for
Algorithm-B this is more of an inherent issue due to its brute-force nature, while for
Algorithm-A it is due to its naive lower-bounding scheme: the consequence of it being far from tight manifests itself at higher dimensions or large outlier
4. That such an assignment problem can be solved via dynamic programming was already known by [35].

5. The soft-EM algorithm of [5] consistently fails in our experiment, thus we do not include it in the figure.

(a) shuffled linear regression (b) unlabeled sensing (c) unlabeled sensing (d) unlabeled sensing
Fig. 1: Relative error vs. % of shuffled data, outlier ratio (1 − k/m), and noise level (σ); n = 3, m = 100, k = 80. Proposed algorithms in red.

ratios (k ≪ m), in which case the method becomes too slow: it runs on the order of seconds for k = m = 100 and for k = 80, and on the order of minutes for k = 50. In contrast, Algorithm-B is immune to k and runs in a few seconds.

Registering point sets
P, Q between two images is a classical problem in computer vision. Assuming that
P, Q are related by an affine transformation T and that each point in P (model set) has a counterpart in Q (scene set) [10], jointly searching for the affine transformation and the registration can be done by minimizing the function F(T, S) = ‖PT − SQ‖_F, where Q ∈ R^(m×2), P ∈ R^(k×3), T ∈ R^(3×2), with homogeneous coordinates used for P, and S is a selection matrix. This is a matrix version of the unlabeled sensing objective function (3). Our contribution here is to adjust Algorithm-A to branch over a 6-dimensional space to compute T, i.e., n = 6. This does not contradict the remark of the previous section regarding scalability: the key here is that each point correspondence imposes two constraints on T (as opposed to one constraint in the general case), so that, loosely speaking, the effective outlier ratio is 1 − 2k/m (as opposed to 1 − k/m). As we will soon see, this has a crucial effect on performance. Finally, as dynamic programming is not applicable to obtain S given T, we employ a standard linear assignment algorithm of complexity O(m^3) [38]. We refer to this algorithm as Algorithm-C. We compare 1)
Algorithm-C with state-of-the-art image registration techniques, i.e., 2)
CPD [29], 3)
GMMREG [28], and 4)
APM [10], using a subset of the benchmark datasets used by [10]. Since
APM is the most competitive among these last three, we let it run to convergence with a fixed tolerance parameter and set its running time as a time budget for our method; CPD and GMMREG are local methods and they run very fast.

6. In this experiment Algorithm-A stops splitting a hypercube once a certain depth has been reached for that hypercube. Run on an Intel(R) i7-8650U, 1.9 GHz, 16 GB machine.

(a) 2D rotation (b) affine (c) affine (d) real images (e) affine (f) affine

Fig. 2: Image registration using the synthetic benchmark dataset The Chinese Character [37] (2a-2c, 2e-2f) and the collection of real images used in [10] (2d). Proposed method in red.

When the affine transformation is a rotation (Fig. 2a),
CPD and GMMREG only work for small angles, while they fail for general affine transformations (Figs. 2b-2c). On the other hand,
Algorithm-C performs comparably to APM, with the following twist in Fig. 2c: when the outlier ratio is small,
APM converges very quickly, resulting in an inadequate time budget for our method. Conversely, when the outlier ratio is large,
APM's accuracy becomes inadequate, while it is slow enough for our method to perform even better than for fewer outliers. These running times for
APM are shown in Fig. 2e, where the same experiment is run for noiseless data:
APM still uses the same tolerance, while to make a point we set the tolerance of our method to zero and let it terminate. As seen in Figs. 2e-2f, our method terminates significantly faster than APM for large outlier ratios, suggesting that its branch-and-bound structure may have an advantage over that of
APM.

ACKNOWLEDGEMENT
The first author is grateful to Dr. Aldo Conca for first noting and sharing the phenomenon of Lemmas 2 and 3. We also thank Dr. Laurent Kneip for suggesting the case study of image registration under general affine transformations.

REFERENCES

[1] J. Unnikrishnan, S. Haghighatshoar, and M. Vetterli, "Unlabeled sensing with random linear measurements," IEEE Transactions on Information Theory, vol. 64, no. 5, pp. 3237–3253, 2018.
[2] D. J. Hsu, K. Shi, and X. Sun, "Linear regression without correspondence," in Advances in Neural Information Processing Systems (NIPS), 2017, pp. 1531–1540.
[3] A. Pananjady, M. J. Wainwright, and T. A. Courtade, "Linear regression with shuffled data: Statistical and computational limits of permutation recovery," IEEE Transactions on Information Theory, vol. 64, no. 5, pp. 3286–3300, 2018.
[4] X. Song, H. Choi, and Y. Shi, "Permuted linear model for header-free communication via symmetric polynomials," in International Symposium on Information Theory (ISIT), 2018.
[5] A. Abid and J. Zou, "Stochastic EM for shuffled linear regression," arXiv:1804.00681v1 [stat.ML], Tech. Rep., 2018.
[6] J. Unnikrishnan, S. Haghighatshoar, and M. Vetterli, "Unlabeled sensing: solving a linear system with unordered measurements," in , 2015.
[7] M. C. Tsakiris, L. Peng, A. Conca, L. Kneip, Y. Shi, and H. Choi, "An algebraic-geometric approach to shuffled linear regression," arXiv:1810.05440v1 [cs.LG], Tech. Rep., 2018.
[8] P. Lahiri and M. D. Larsen, "Regression analysis with linked data," Journal of the American Statistical Association, vol. 100, no. 469, pp. 222–230, 2005.
[9] X. Shi, X. Li, and T. Cai, "Spherical regression under mismatch corruption with application to automated knowledge translation," arXiv:1810.05679v1 [stat.ME], Tech. Rep., 2018.
[10] W. Lian, L. Zhang, and M.-H. Yang, "An efficient globally optimal algorithm for asymmetric point matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 7, pp. 1281–1293, 2017.
[11] A. B. Poore and S. Gadaleta, "Some assignment problems arising from multiple target tracking," Mathematical and Computer Modelling, vol. 43, no. 9, pp. 1074–1091, 2006.
[12] P. David, D. DeMenthon, R. Duraiswami, and H. Samet, "SoftPOSIT: Simultaneous pose and correspondence determination," International Journal of Computer Vision, vol. 59, no. 3, pp. 259–284, 2004.
[13] M. Marques, M. Stosic, and J. Costeira, "Subspace matching: Unique solution to point matching with geometric constraints," in International Conference on Computer Vision (ICCV), 2009, pp. 1288–1294.
[14] A. Pananjady, M. J. Wainwright, and T. A. Courtade, "Denoising linear models with permuted data," in IEEE International Symposium on Information Theory (ISIT), 2017, pp. 446–450.
[15] L. Peng, X. Song, M. Tsakiris, H. Choi, L. Kneip, and Y. Shi, "Algebraically-initialized expectation maximization for header-free communication," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 5182–5186.
[16] A. Narayanan and V. Shmatikov, "Robust de-anonymization of large sparse datasets," in IEEE Symposium on Security and Privacy, 2008, pp. 111–125.
[17] L. Keller, M. J. Siavoshani, C. Fragouli, K. Argyraki, and S. Diggavi, "Identity aware sensor networks," in IEEE INFOCOM, 2009, pp. 2177–2185.
[18] I. Dokmanić, "Permutations unlabeled beyond sampling unknown," IEEE Signal Processing Letters, vol. 26, no. 6, pp. 823–827, 2019.
[19] A. Balakrishnan, "On the problem of time jitter in sampling," IRE Transactions on Information Theory, vol. 8, no. 3, pp. 226–236, 1962.
[20] I. P. Fellegi and A. B. Sunter, "A theory for record linkage," Journal of the American Statistical Association, vol. 64, no. 328, pp. 1183–1210, 1969.
[21] M. Slawski and E. Ben-David, "Linear regression with sparsely permuted data," Electronic Journal of Statistics, vol. 13, no. 1, pp. 1–36, 2019.
[22] A. Abid, A. Poon, and J. Zou, "Linear regression with shuffled labels," arXiv:1705.01342v2 [stat.ML], Tech. Rep., 2017.
[23] A. Pananjady, M. J. Wainwright, and T. A. Courtade, "Linear regression with shuffled data: Statistical and computational limits of permutation recovery," in , 2016, pp. 417–424.
[24] S. Haghighatshoar and G. Caire, "Signal recovery from unlabeled samples," IEEE Transactions on Signal Processing, vol. 66, no. 5, pp. 1242–1257, 2018.
[25] G. Elhami, A. Scholefield, B. B. Haro, and M. Vetterli, "Unlabeled sensing: Reconstruction algorithm and theoretical guarantees," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 4566–4570.
[26] J. Yang, H. Li, D. Campbell, and Y. Jia, "Go-ICP: A globally optimal solution to 3D ICP point-set registration," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 11, pp. 2241–2254, 2016.
[27] H. Li and R. Hartley, "The 3D-3D registration problem revisited," in International Conference on Computer Vision (ICCV), 2007, pp. 1–8.
[28] B. Jian and B. C. Vemuri, "Robust point set registration using Gaussian mixture models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1633–1645, 2011.
[29] A. Myronenko and X. Song, "Point set registration: coherent point drift," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 12, pp. 2262–2275, 2010.
[30] M. C. Tsakiris, "Eigenspace conditions for homomorphic sensing," arXiv:1812.07966v3 [math.CO], Tech. Rep., 2019.
[31] S. Roman, Advanced Linear Algebra. Springer, 2008.
[32] R. Hartshorne, Algebraic Geometry. Springer, 1977.
[33] P. Bürgisser and F. Cucker, "The geometry of numerical algorithms," Springer Science & Business Media, vol. 349, 2013.
[34] V. Emiya, A. Bonnefoy, L. Daudet, and R. Gribonval, "Compressed sensing with unknown sensor permutation," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 1040–1044.
[35] A. Aggarwal, A. Bar-Noy, S. Khuller, D. Kravets, and B. Schieber, "Efficient minimum cost matching using quadrangle inequality," in Annual Symposium on Foundations of Computer Science, 1992, pp. 583–592.
[36] R. E. Burkard, M. Dell'Amico, and S. Martello, Assignment Problems. Springer, 2009.
[37] H. Chui and A. Rangarajan, "A new point matching algorithm for non-rigid registration," Comput. Vis. Image Underst., vol. 89, no. 2-3, pp. 114–141, 2003.
[38] R. Jonker and A. Volgenant, "A shortest augmenting path algorithm for dense and sparse linear assignment problems,"