[PDF] Localizing Unsynchronized Sensors with Unknown Sources

Abstract

We propose a method for sensor array self-localization using a set of sources at unknown locations. The sources produce signals whose times of arrival are registered at the sensors. We look at the general case where neither the emission times of the sources nor the reference time frames of the receivers are known. Unlike previous work, our method directly recovers the array geometry, instead of first estimating the timing information. The key component is a new loss function which is insensitive to the unknown timings. We cast the problem as a minimization of a non-convex functional of the Euclidean distance matrix of microphones and sources subject to certain non-convex constraints. After convexification, we obtain a semidefinite relaxation which gives an approximate solution; subsequent refinement on the proposed loss via the Levenberg-Marquardt scheme gives the final locations. Our method achieves state-of-the-art performance in terms of reconstruction accuracy, speed, and the ability to work with a small number of sources and receivers. It can also handle missing measurements and exploit prior geometric and temporal knowledge, for example if either the receiver offsets or the emission times are known, or if the array contains compact subarrays with known geometry.

Full PDF

11 Localizing Unsynchronized Sensors with UnknownSources

Dalia El Badawy, Viktor Larsson, Marc Pollefeys, and Ivan Dokmani´c

Abstract —We propose a method for sensor array self-localization using a set of sources at unknown locations. Thesources produce signals whose times of arrival are registeredat the sensors. We look at the general case where neither theemission times of the sources nor the reference time framesof the receivers are known. Unlike previous work, our methoddirectly recovers the array geometry, instead of ﬁrst estimatingthe timing information. The key component is a new lossfunction which is insensitive to the unknown timings. We castthe problem as a minimization of a non-convex functional of theEuclidean distance matrix of microphones and sources subject tocertain non-convex constraints. After convexiﬁcation, we obtaina semideﬁnite relaxation which gives an approximate solution;subsequent reﬁnement on the proposed loss via the Levenberg-Marquardt scheme gives the ﬁnal locations. Our method achievesstate-of-the-art performance in terms of reconstruction accuracy,speed, and the ability to work with a small number of sources andreceivers. It can also handle missing measurements and exploitprior geometric and temporal knowledge, for example if eitherthe receiver offsets or the emission times are known, or if thearray contains compact subarrays with known geometry.

I. I

NTRODUCTION

In MIMO radar [1], ultrasound imaging [2], acoustic imag-ing [3], underwater acoustics [4], and room acoustics [5], [6],a collection of sources emit signals that are then capturedby the receivers. In these applications, we often need anaccurate knowledge of the geometry of the source and receiverpositions to proceed with common array processing algorithmssuch as beamforming and source localization. Knowing thesensor array geometry similarly enables the reconstructionof physical ﬁelds in environmental monitoring using ad hocsensor networks [7], [8]. Further, knowing device locationsenables a host of location-based services in the context of theInternet of Things [9]. The recurring quintessential problem isthus to efﬁciently estimate the geometry of a set of nodes.We study estimating the geometry of the nodes from thetimes of arrival (TOAs) of the source signals. The setup isillustrated in Figure 1 where receivers register TOAs of sourceevents. Some events can be (near-)collocated with the nodes,such is the case with transceivers. If the sources are anchorswith known positions, locating the nodes becomes an exercisein geometry: intersecting spheres or hyperboloids depending

In line with the philosophy of reproducible research, all code and datato reproduce the results of this paper are available at http://github.com/swing-research/xtdoa.Work done while D. El Badawy was a student at EPFL, Switzerland.V. Larsson is with the Department of Computer Science, ETH Zurich.M. Pollefeys is with the Department of Computer Science, ETH Zurich andMicrosoft Mixed Reality and AI Lab Zurich.I. Dokmani´c is with the Department of Mathematics and Computer Science,University of Basel. on whether or not the devices are synchronized. However,the most general and practically appealing setup is whenthe sources are at arbitrary, unknown locations. This enablesone to use signals of opportunity such as speech, transientsounds, or radio signals [10]. But it also means that receiversare no longer synchronized with the sources. To complicatethings further, receivers themselves do not have to be mutuallysynchronized. While in many audio applications, microphonesare connected to a common interface, we are also surroundedby ad hoc networks of smartphones and voice-based assistants.Thus, not only do we measure times of arrival as opposed toabsolute times of ﬂight, but those registered times of arrivalfurther depend on the unknown time reference frame of eachreceiver. In this paper, we introduce an algorithm for jointlylocalizing a set of receivers and a set of sources from measuredTOAs in the general case when all nodes are not synchronized:sources go off at unknown times and the reference time frameof each receiver can be different and unknown.

A. Related Work

Among the earliest work on self-localization are the papersof Rockah and Schultheiss [11], [12] and Weiss and Fried-lander [13] from the 1980s. Plinge et al. [14] provide an in-depth overview of microphone array localization techniques.A systematization of existing approaches to self-localizationis also given by Wang et al. [15]. In the following, we reviewsome of the major points, related to localization in differentsetups.In some cases, the pairwise distance between all the nodescan be estimated. This happens for example when the nodescan both send and receive [16] or by measuring the diffusenoise coherence [17]. Localization then amounts to multidi-mensional scaling (MDS) [18], [19], [20], [21].A more common situation in audio applications is that thenodes can only receive or only send. The “sending” nodes neednot be real devices; they can be any acoustic events or signalsof opportunity. We can distinguish the case when the sourcesand the receivers are synchronized so we can estimate thesource–receiver distances, or the various cases when sources,receivers, or both sources and receivers are not synchronized. a) Known Source Emission Times and Receiver Offsets:

If the emission times happen to be known and the nodes andevents all have a common time reference (that is, they are syn-chronized) , then the TOAs correspond to times of ﬂight anddirectly give the distances between the nodes and the events.Given these distances, the joint localization problem reduces Additionally, the internal delays of the receivers are known. a r X i v : . [ ee ss . SP ] F e b

12 3 321

Fig. 1. Setup. Source (people or loudspeakers) events are captured by microphones. The times of arrival t mk at each microphone depend on the sourceemission times τ k as well as the microphone’s own time of reference. Some entries might be missing. to multidimensional unfolding [22], [23]. Some methods arebased on direct optimization of the maximum likelihood (ML)criterion [13], but these often fail due to the non-convexityof the objective and the existence of bad local minima [15].Other methods are based on Euclidean distance matrices andsemideﬁnite programming (SDP) [24].Crocco et al. [23] proceed by constructing a certain low-rank matrix of differences of squared TOAs from which thepositions can be recovered by solving a low-dimensional non-convex optimization problem. The low-rank matrix of differ-ences of squared TOAs [23] is related to the (translated) sourcepositions S and receiver positions R as H ≈ ¯ R T ¯ S , wherethe latter is the matrix of inner products between centered receiver points and centered source points. Since we have that H = ¯ R T QQ − ¯ S for any invertible matrix Q , ﬁnding thecorrect relative geometry amounts to identifying the right Q and the difference vector between the center of R and thecenter of S . In our approach, we avoid this step by workingwith the full point set X = [ R ; S ] and the correspondingGram matrix, instead of splitting X into R and S . The Grammatrix and its constraints then automatically glue together thesources and the receivers (see Section III-A). b) When One Set of Times is Known: A more realisticscenario is when the source emission times are unknown anddifferent but the microphones are synchronized. The reversecase is also relevant: when the receivers are not synchronizedbut the source emission times are known. This is the case ifwe use a distributed sensor array.The two unknowns (geometry and one set of times) canbe estimated by directly optimizing an objective, for exampleusing majorization [25]. Again, this often fails due to the non-convexity of the problem and cannot work with near-minimalconﬁgurations.An alternative approach is to estimate the unknowns sequen-tially, ﬁrst recovering the unknown times, and then using themto recover the geometry [26], [27]. Early work of Pollefeys andNister [28] exploits the low rank of a certain matrix of squaredTOA differences. Their work is a near-ﬁeld generalization ofthe work of Thrun [29], which was also adapted to workwith missing measurements [30], [31]. Heusdens and Gaubitchpropose a more robust scheme based on structured total-least- squares [32] to reconstruct the times. c) Unknown Emission Times and Receiver Offsets: Theclosest to our work is the method of Wang et al. [15] who treatthe most general case with unknown source emission times andreceiver offsets. Same as the methods that address the caseof one unknown set of offsets [32], Wang et al. proceed intwo stages. The timing information is estimated via structuredleast-squares (Gauss–Newton), noting that with the correcttiming estimates a certain matrix computed from the measuredT(D)OA measurements must become low rank. This timingestimation procedure may require numerous random restarts.Once the timings are estimated, they proceed as Crocco et al.[23] to recover the geometry. d) Convex Relaxations:

In the context of sensor networklocalization (with ﬁxed anchor nodes), Biswas et al. [33],[34] proposed to use SDP relaxations to handle the non-convexity introduced by the square root in the distancemeasurements. Similar relaxations have since also been usedfor source node localization from TDOA measurements (seee.g. Yang et al. [35], Vagheﬁ and Beuhrer [36]) and mixedTDOA/FDOA measurements (Wang et al. [37]). Yang et al.[35] also proposed a SOCP relaxation; however, it requiresthat the target node lies within the convex hull of the anchors.Jiang et al. [38] propose to use Truncated Nuclear NormRegularization (TNNR) to solve the TDOA self-calibrationproblem. Their optimization scheme consists of solving asequence of convex surrogate problems based on the (convex)nuclear norm. We propose SDP relaxations for the full self-calibration problem, where both senders are receivers areunknown and not synchronized. e) Minimal Estimation Problems:

A series of workconsiders minimal problems in TOA, where the goal is toestimate the unknowns from as few measurements as possible.Following up on their work on minimal problems for TOAmeasurements [39], Kuang et al. [40] propose a stratiﬁedapproach for estimating the unknown time offsets from TDOAmeasurements. Once the offsets are recovered, the solversfrom their previous work [39] can be applied to recover thesender and receiver positions. Zhayida et al. [41] (and later Heusdens and Gaubitch in fact address the case when the microphonedelays are unknown and the source is periodic; this is equivalent to onlyhaving unknown microphone delays; see Section IV-C.

Farmani et al. [42]) considered the minimal solutions to thespecial case of dual microphone setups. Burgess et al. [43]proposed solutions for settings where either the sender or thereceivers lie in a lower dimensional space. Batstone et al. [44]consider the case of constant offset TDOA self-calibration(i.e. transmissions have a known period but unknown offset);this is also a stratiﬁed approach which ﬁrst solves for theunknown time offset [40]. We show that our approach canalso succeed in near-minimal conﬁgurations.

B. Contributions

Our work is motivated by the fact that the two-stageprocedure is suboptimal. The (valid) reason to adopt sequentialestimation are poor local minima in the loss function whichinvolves both the times and the positions. From a statisticalpoint of view, however, joint estimation is preferred. Further,timing estimation can fail or require many random restarts;the same holds for the non-convex optimization to recover therelative conﬁgurations from ¯ R T ¯ S . The two-stage proceduremakes it challenging to exploit prior geometric informationsuch us known distances or distance bounds.In light of related work and the above discussions, wesummarize our contributions as follows: • What we are usually interested in are the sensor po-sitions. We thus formulate a new self-localization losswhich is invariant to offsets and delays. The proposedloss uses non-squared distances as opposed to the usualsquared; we prove that minimizing this objective yieldsthe maximum likelihood estimate of the positions underGaussian noise. It enables us to recover the geometrywithout worrying about the times and without randomrestarts. • We formulate a non-convex semideﬁnite optimizationproblem in terms of this new loss. We work with thefull point set X = [ R ; S ] and the corresponding Grammatrix. The Gram matrix and its constraints intrinsically glue together the sources and the receivers and thusobviate the need for the non-convex optimization stepof [23] (see Section III-A); • Although our method does not require synchronization, itcan leverage synchronization among the receivers or thesources if it is fully or partially present [32], [45], [46]. Itcan further leverage geometric side information such aswhen the receivers contain subarrays with known relativegeometry or some sources and receivers are collocated.Finally, it can handle missing measurements.The proposed method achieves state-of-the art results, notonly in terms of localization accuracy, runtime, or exploitingside information, but also in the ability to solve near-minimalproblems with few nodes. Finally, we note that as suggestedby Ono [25], our method can also be interpreted as a virtualsynchronization method.II. P

ROBLEM S TATEMENT

We wish to localize a set of M receivers with unknownpositions (cid:126)r , . . . , (cid:126)r M ∈ R d using a set of K sources with un-known positions (cid:126)s , . . . , (cid:126)s K ∈ R d . We assume that the sources emit pulses whose times of arrival (TOA) can be measuredat the receivers. Since we do not assume synchronizationbetween the sources and the receivers, the absolute emissiontime τ k of the k th source is unknown. Since the sources are notnecessarily synchronized among themselves, the differences τ k − τ k (cid:48) are also unknown. Similarly, since we do not assumesynchronization among receivers (nor knowing their internaldelays), the temporal frames of reference of each receiver areshifted by an unknown σ m with respect to some referenceclock. The times of arrival of the k th source signal at the m threceiver thus become t mk = v − (cid:107) (cid:126)r m − (cid:126)s k (cid:107) + σ m + τ k , (1)where for simplicity we let both σ m and τ k have arbitrarysign. We assume the transmission speed v to be known andw.l.o.g. let v = 1 for the remainder of the paper.Note that if we instead use the time-difference-of-arrival(TDOA), then with the ﬁrst sensor as the reference, the TDOAis ˜ t mk = t mk − t k = (cid:107) (cid:126)r m − (cid:126)s k (cid:107) + ( σ m − σ ) − (cid:107) (cid:126)r − (cid:126)s k (cid:107) = (cid:107) (cid:126)r m − (cid:126)s k (cid:107) + ˜ σ m + ˜ τ k . And so we can still think of the TDOA ˜ t mk as a TOA butwith modiﬁed emission times and offsets. A. Minimal Number of Sources and Receivers

Common sensor localization scenarios are either in thehorizontal plane ( d = 2) or in 3D space ( d = 3 ). The numberof the degrees of freedom in the positions of sources andreceivers is then d ( M + K ) . However, since rigid motionsof the entire setup cannot be distinguished from TOA data,we can choose the associated d ( d + 1) / degrees of freedomfreely ( d translational and (cid:0) d (cid:1) rotational). Each source andreceiver come with an additional unknown time, which givesanother M + K degrees of freedom. However, we can choose aglobal time offset arbitrarily, which is another gauge invarianceand subtracts one degree of freedom. The total number of thedegrees of freedom is thus DOF ) = ( d + 1)( M + K − d ) − As measurements we get

M K

TOAs. In order to get adimension-zero solution set (a solution set that is a ﬁniteor countable set of points, but not necessarily unique), thisnumber should at least match the number of the degrees offreedom. Solving for the minimal number of sources as afunction of the number of receivers, we obtain the followingrelation K ≥ ( d + 1) M − d ( d + 1) / − M − ( d + 1) . For d = 3 this gives K ≥ (4 M − / ( M − implyingthat the smallest number of receivers that can be localized is M = 5 for which at least K = 13 sources are required to Note that these measurements can be obtained without physically emittingpulses. In audio, for example, one often emits long chirps which is followedby pulse compression. get a dimension-zero solution. Some other minimal cases are: ( M = 7 , K = 7) , ( M = 13 , K = 5) .As we demonstrate with real and numerical experiments inSection VI, our algorithms can operate in the vicinity of thisminimal regime, where previous methods either fail or performpoorly.III. A L OSS I NSENSITIVE TO O FFSETS AND D ELAYS

Instead of proceeding sequentially by ﬁrst estimating theunknown times as in the case of the state-of-the-art methods[32], [15], we note that the primary task is usually positionestimation rather than timing estimation. Besides, when thepositions are known, estimating the timings { σ m } Mm =1 and { τ k } Kk =1 boils down to a simple algebraic exercise (see SectionIII-D). We thus devise a data ﬁdelity metric which is insensi-tive to the unknown timings, yet is only minimized with thecorrect positions.We begin by writing (1) in matrix form as T = ∆ + (cid:126)σ TK + M (cid:126)τ T , (2)where ∆ = ( δ mk ) and δ mk = (cid:107) (cid:126)r m − (cid:126)s k (cid:107) , (cid:126)σ = [ σ , . . . , σ M ] T and (cid:126)τ = [ τ , . . . , τ K ] T . Now we make the key observation:multiplying (2) on both sides by a matrix which has an all-ones vector in the nullspace will annihilate the two terms thatdepend on the unknown times.Given an integer L , let J L be a geometric centering matrixof size L , J L := I L − L L TL . (3)It is easy to verify that J M M = (cid:126) M and TK J K = (cid:126) TK sothat the matrix P = J M T J K = J M ∆J K (4)does not depend on the unknown times. In fact, we can saymore. Proposition 1.

We have J M T J K = J M T J K if and onlyif there exist (cid:126)σ (cid:48) and (cid:126)τ (cid:48) such that T = T + (cid:126)σ (cid:48) TK + M (cid:126)τ (cid:48) T .In particular if T = ∆ .Remark: If ∆ happened to be a Euclidean (squared) distancematrix (EDM) of a point set X , for example D = ( d ij ) , d ij = (cid:107) (cid:126)x i − (cid:126)x j (cid:107) , then − J N DJ N would equal the Gram matrixof the (centered) X . No similar statement can be made heresimply because ∆ holds non-squared distances, and becauseit only corresponds to one off-diagonal block of D . Proof.

The nullspace of the linear operator J : R M × K → R M × K A J (cid:55)→ J M AJ K is spanned by N ( J ) = span { M (cid:126)e (cid:62) , . . . , M (cid:126)e TK , (cid:126)e TK , . . . (cid:126)e M TK } . Note that not all of those

M K matrices are linearly inde-pendent, but the argument does not hinge on independence. We often indicate matrix and vector sizes in subscripts. While making thenotation a bit clunkier, it helps keep track of dimensions.

Every matrix in the argument of span correspond to addinga constant offset to either a source or a receiver. As aconsequence, any nullspace vector can be written as a linearcombination of source / receiver offsets which proves theproposition.What Proposition 1 says is that J M ∆J K is sufﬁcient todetermine the positions in the sense that the only locationssuch that this matches measurements are the true ones (if theoriginal problem is solvable).We can then propose the following timing-invariant objec-tive for localizationmin R , S (cid:107) J M ( ∆ ( R , S ) − T ) J K (cid:107) F , (5)where (cid:107) · (cid:107) F denotes the Frobenius norm. This objective isjustiﬁed by the following direct corollary of Proposition 1. Corollary 1.

If the unsynchronized localization problem has aunique solution (up to a rigid transformation), then vanishingloss in (5) implies correct localization.

Further, the loss (5) is in fact equivalent to the ML loss given the offsets (cid:126)σ and (cid:126)τ . Namely, assuming i.i.d. Gaussiannoise on the TOA measurements, we have f ML ( ∆ , (cid:126)σ, (cid:126)τ ) = (cid:107) ∆ + (cid:126)σ TK + M (cid:126)τ T − T (cid:107) F , and the loss in (5) is simply f ( ∆ ) = min (cid:126)σ,(cid:126)τ f ML ( ∆ , (cid:126)σ, (cid:126)τ ) . To see this, note that ∇ (cid:126)σ f ML = 2( ∆ + (cid:126)σ TK + M (cid:126)τ T − T ) K = 0 (6) = ⇒ (cid:126)σ (cid:63) = − TK K ( ∆ + M (cid:126)τ T − T ) K . (7)Inserting into the original cost, we get f ML ( ∆ , (cid:126)σ (cid:63) , (cid:126)τ ) = (cid:107) ( ∆ + M (cid:126)τ T − T ) ( I K − K K TK ) (cid:124) (cid:123)(cid:122) (cid:125) J K (cid:107) F . Solving for (cid:126)τ (cid:63) similarly yields f ML ( ∆ , (cid:126)σ (cid:63) , (cid:126)τ (cid:63) ) = f ( ∆ ) . A. Characterization in Terms of the Full Gram Matrix

As described in Section I-A, a number of state-of-the-artmethods based on low-rank matrix decompositions recover thematrix H = ¯ R T ¯ S , with ¯ R and ¯ S being some translated(centered) versions of R and S [23], [32], [15]. Matrixfactorizations such as singular value decomposition can onlydetermine ¯ R and ¯ S up to a multiplication by an arbitraryinvertible matrix Q , since for any such Q we have that H = (cid:98) R T (cid:98) S , with (cid:98) R = Q T R , (cid:98) S = Q − S . Finding the right Q (which can be assumed to be upper-triangular which ﬁxesthe rotational gauge freedom) and the (one) translation vectoris then achieved via direct non-convex optimization, thoughover a lower dimensional space than the original problem.We sidestep this problem by writing the measurements interms of the full point set X := [ R , S ] ∈ R d × N , where N = M + K , and always working with the entire X . Wedenote the corresponding full Gram matrix by G = X T X ∈ R N × N . The off-diagonal blocks of the Gram matrix G encodethe relative geometry between R and S —by recovering G ,we know that relative geometry.The matrix of squared distances (the EDM) between thecolumn vectors in X can be written as a linear function ofthe Gram matrix. Letting D = ( (cid:107) (cid:126)x n − (cid:126)x n (cid:48) (cid:107) ) Nn,n (cid:48) =1 , one has D = D ( G ) := diag ( G ) TN − G + N diag ( G ) , (8)with diag ( G ) = [ g , . . . , g NN ] T . This means that the matrixof distances ∆ between sources and receivers can be writtenas an entrywise square root of a linear function of G , since ∆ • = L ( G ) def = S row D ( G ) S col , (9)where S row := [ I M , M × K ] and S col := [ K × M , I K ] T arethe appropriate row and column selection matrices, and ( · ) • denotes an entrywise square. In view of Proposition 1, thisleads us to the ﬁrst reformulation of the original problem: min G (cid:13)(cid:13)(cid:13) J M ( • (cid:112) L ( G ) − T ) J K (cid:13)(cid:13)(cid:13) F (10a)subject to G (cid:23) , (10b) G N = (cid:126) , (10c) rank( G ) = d. (10d)The constraint (10c) resolves the translation ambiguity; it isequivalent to the point set being centered around the origin N (cid:80) n (cid:126)x n = 0 .Note that now the receivers and sources are being trans-lated together, unlike in earlier work where they were split.Further, the estimated locations are directly read out from afactorization of G , without the need for stitching. B. Semideﬁnite Relaxation

The formulation (10a) remains nonconvex. Thus, directapproaches such as randomly-initialized ﬁrst-order methods(e.g., projected gradient descent) are likely to get stuck in alocal minimum. We thus propose to instead solve a convexrelaxation of the problem.There are two sources of nonconvexity: the entrywise squareroot in the objective, and the rank constraint on the Gram ma-trix. We begin by replacing the entrywise square root • (cid:112) L ( G ) by a new matrix variable B , and adding the constraint that theentrywise square of B must equal L ( G ) .Concretely, we write the objective as min G , B (cid:107) J M ( B − T ) J K (cid:107) F , and ask that B • = L ( G ) to obtain an equivalent formulation.We now proceed by reformulating the added constraint asa rank constraint on some semideﬁnite matrix (or matrices).Since the entrywise square does not mix the entries of B ,we can add such constraints on a per-entry basis. This iscomputationally more efﬁcient (and empirically works better)than working with the entire vectorization of B at once. We can equivalently write the constraint b mk = L ( G ) mn as L mk := (cid:20) L ( G ) mk b mk b mk (cid:21) (cid:23) (cid:126) (11) b mk ≥ (12) rank( L mk ) = 1 . (13)All the non-convexity is now lumped into the rank con-straints, rank( G ) = d and rank( L mk ) = 1 for ≤ m, k ≤ M, K . The ﬁnal step is to relax all of the rank constraints toget min G , B (cid:107) J M ( B − T ) J K (cid:107) F (14a)subject to G (cid:23) , (14b) G N = (cid:126) , (14c) L mk (cid:23) (cid:126) , for all ≤ m, n ≤ M, K, (14d) B ≥ . (14e)We can ﬁnally reconstruct the point set from the re-covered G using singular value decomposition. We have G = U Λ V (cid:62) , where Λ = diag ( σ , . . . , σ N ) withthe singular values σ i assumed to be sorted in decreas-ing order. We reconstruct the point set using ˆ X =[ diag ( σ , . . . , σ d ) ,(cid:126) d × ( N − d ) ] V (cid:62) . If the rank of the estimated G is truly d , as it ideally should be since it describes a d -dimensional point set, then the trailing singular values wouldbe zero anyway. C. Reﬁnement by Levenberg-Marquardt

In general, solving the above semideﬁnite program givespoint conﬁgurations which are good coarse approximationsto the true geometry, although they are not good enough formost applications. The reason can be tracked down to thefact that after relaxing the rank constraints, the estimatedmatrices have higher rank than desired. Still, the positivesemideﬁnite constraints constrain geometry sufﬁciently to getdecent estimates. One strategy would be to employ variousrank minimization strategies. Informally, we tried this and itworks decently but it is very slow.Instead, we propose to reﬁne the output of the SDR usingthe Levenberg-Marquardt (LM) algorithm. It is fast, accurate,and it works better. We also note that simple gradient descentis much slower.To derive the LM updates, we need to compute the Jacobianof the loss in order to linearize it. The loss J ( R , S ) = (cid:107) J M ( ∆ ( R , S ) − T ) J K (cid:107) F can be written as J ( R , S ) = (cid:107) f ( R , S ) (cid:107) , where f ( R , S ) = A (cid:126)δ ( R , S ) − (cid:126)t , A = J TK ⊗ J M , (cid:126)t = A vec ( T ) and (cid:126)δ = vec ( ∆ ) . Since ∆ : R d × ( M + K ) → R M × K , theJacobian is a tensor in R M × K × d × ( M + K ) . We can compute [ D ∆ ] m,k, : ,(cid:96) =  (cid:126)r m − (cid:126)s k (cid:107) (cid:126)r m − (cid:126)s k (cid:107) (cid:96) = m, − (cid:126)r m − (cid:126)s k (cid:107) (cid:126)r m − (cid:126)s k (cid:107) (cid:96) = M + k, otherwise . with the understanding that the ﬁrst M slices in the last indexof D correspond to the receivers and the last K to the sources.To make things easier to compute with, we rewrite T as afunction of a single vector (cid:126)x = (cid:126)x ( R , S ) = (cid:20) vec ( R ) vec ( S ) (cid:21) , and rearrange the output into a vector so that (cid:126)δ ( (cid:126)x ) = vec ( ∆ ( R , S )) . The Jacobian D (cid:126)δ is obtained from D T bycollapsing the ﬁrst two dimensions and the last two dimensionsso that D (cid:126)δ ∈ R MK × d ( M + K ) . Finally the Jacobian of f iscomputed as D f = A D (cid:126)δ, and we have an afﬁne approximation of f as (with some abuseof notation) f ( (cid:126)x ; (cid:126)x ) = f ( (cid:126)x ) + D f ( (cid:126)x )( (cid:126)x − (cid:126)x ) . This leads to the LM update, (cid:126)x ( k +1) = arg min (cid:126)z (cid:13)(cid:13)(cid:13) f ( (cid:126)z ; (cid:126)x ( k ) ) (cid:13)(cid:13)(cid:13) + λ (cid:13)(cid:13)(cid:13) (cid:126)z − (cid:126)x ( k ) (cid:13)(cid:13)(cid:13) which is solved by (cid:126)x ( k +1) = (cid:126)x ( k ) − ( D f ( (cid:126)x k ) T D f ( (cid:126)x k ) + λ I ) − D f ( (cid:126)x k ) T f ( (cid:126)x ( k ) ) . The full localization algorithm is summarized in Algorithm1.

Algorithm 1

Unsynchronized Sensor Localization.

Require:

TOA T ∈ R M × K Ensure:

Positions of receivers R ∈ R d × M and sources S ∈ R d × K Initialize R and S using SDR (14).Update R and S using LM. D. Estimating Offsets and Delays

Once an estimate (cid:98) ∆ of ∆ is computed, we can also computethe unknown offsets (cid:126)τ and delays (cid:126)σ . We start by noting from(2) that T − ∆ = (cid:126)σ TK + M (cid:126)τ T , where the right hand side is linear in ( (cid:126)σ, (cid:126)τ ) . We exploit it byvectorizing both sides: (cid:126)e := vec ( T − ∆ ) = K ⊗ (cid:126)σ + τ ⊗ M = V (cid:126)σ + W (cid:126)τ = (cid:2) V W (cid:3) (cid:20) (cid:126)σ(cid:126)τ (cid:21) , with ⊗ denoting the Kronecker product, V := K ⊗ I M , W := I K ⊗ M . From here, estimating the times vectoramounts to solving an overdetermined system of linear equa-tions. Note that the system matrix has a nullspace of dimensionone (spanned by α ) due to the global offset ambiguity. Toeliminate the offset ambiguity we arbitrarily set σ = 0 , andlet V (cid:48) be V with the ﬁrst column removed. The least-squaresestimate of the unknown timings is then given as (cid:34)(cid:98) (cid:126)σ (cid:48) (cid:98) (cid:126)τ (cid:35) = (cid:2) V (cid:48) W (cid:3) † (cid:126)e, (15)with (cid:98) (cid:126)σ (cid:48) = [ (cid:98) σ , . . . , (cid:98) σ M ] T and ( · ) † denoting the Moore-Penrose pseudoinverse.IV. V ARIANTS

In many cases, we have some prior information about thedistances or offset times. The proposed method makes itstraightforward to leverage this prior knowledge. In the follow-ing, we explain how by describing several typical scenarios.

A. Localization of Subarrays

Suppose that some distances are known. It is often the casethat the set of receivers consists of several compact subarrayswith known geometry (voice-based assistants, smart phones).These compact subarrays are then distributed in the space ofinterest at unknown positions and with random orientations,together with discrete microphones.Since the EDM is a linear function of the Gram matrix,this prior knowledge corresponds to simple linear constraintson G . We thus augment the original program with a set of known inter-sensor distances for microphone pairs M = (cid:8) ( m , m ) : m < m , distance d m m between (cid:126)r m and (cid:126)r m known (cid:9) , Then we add the following set of constraints to our optimiza-tion: D ( G ) m m = d m m , for all ( m , m ) ∈ M , m (cid:54) = m . (16)If the set of receivers is partitioned as R = [ R , R , . . . , R J ] , with the distances within the j th subarray R j known, theseconstraints correspond to knowing the diagonal subblocks ofthe upper-left block of D .We can again proceed with an LM reﬁnement step. In thiscase, we consider the following objective min θ (cid:107) f ( θ ) (cid:107) (17a)subject to g ( θ ) = 0 , (17b)(17c)where g : R d ( M + K ) (cid:55)→ R |M| is the known-distances residualdeﬁned as g m m ( X ) = edm ( X , X ) m m − d m m with edm ( X , X ) m m = (cid:107) (cid:126)x m − (cid:126)x m (cid:107) being the entriesof the EDM of the point set X . We solve (17) using theAugmented Lagrangian algorithm [47]. The correspondingupdate is given in Appendix A.In some cases the distances may not be known exactly, butwe might have access to good bounds. We once more leveragethe linearity of D in the Gramian by adding the followinglinear constraints to the program, d m m ≤ D ( G ) m m ≤ ¯ d m m . (18)where ¯ d m m and d m ,m are upper and lower bounds on thedistance d m ,m .Finally, we note that priors might be available not onlyfor inter-receiver distances but also for inter-source distancesand the distances between the sources and the receivers (weoften have a coarse idea about how far the source events arefrom the microphones). These can be added to the constraintscompletely analogously. B. When One Set of Times is Known

When one set of timings is known (either the source emis-sion times or the receiver offsets), it would still be possibleto use the general formulation with the derived loss (10a)designed for the case when both sets of times are unknown (2).Though that would require more measurements than necessarygiven such prior knowledge as we show next.Both cases (unknown emission times or unknown offsets)are captured by the simpliﬁed measurement model T = ∆ + M (cid:126)τ T , (19)up to a transposition as necessary (exchanging the role ofsources and receivers). From here, it follows that the inﬂuenceof the timings can be eliminated simply by one left multipli-cation by J M so that P (cid:48) = J M T = J M ∆ does not depend on (cid:126)τ . We can thus replace the loss (10a) by (cid:107) J M ( B − T ) (cid:107) F . (20)To see how this reduces the minimal required numberof measurements (sources or sensors), recall the degrees-of-freedom count from Section II-A. Concretely, DOF ) = d ( M + K − d +12 ) + M gives K ≥ ( d + 1)(2 M − d )2( M − d ) . For d = 3 , this becomes K ≥ M − M − , so that the minimalnumber of receivers is now M = 4 and the minimal casesfor the dimension-zero solution are M = 4 , K = 10 , M =5 , K = 7 , M = 6 , K = 6 , and M = 9 , K = 5 . Another wayto see this is by noting that the operator A (cid:55)→ J M A has asmaller nullspace than A (cid:55)→ J M AJ K . C. Sources with Known (Constant) Offset

As a last example of prior information that is easily handled,we mention the practically relevant case where the source isfor example a smartphone emitting periodic pulses. More gen-erally, the source is emitting signals at known (not necessarilyregular) intervals. The phone is not synchronized with thereceivers, so the emitting times are strictly speaking unknown,but only up to one unknown offset—the start time of theemissions. This case has been studied in [32], [44].We show how to incorporate this using our proposedmethod. We have that τ k = τ o + δ k , with δ k is the known offset of the k th emission with respectto the unknown time τ o . When the emissions are periodic, wehave δ k = k · δ , but for simplicity we keep the notation moregeneral.Note that in general, the vectors (cid:126)σ and (cid:126)τ can never berecovered uniquely, unless one of the times is known. Thereason for this is that the global time reference can be decidedarbitrarily (we used this fact when counting the degrees offreedom in Section II-A).It then follows that the case of sources with known offsets ismathematically equivalent to the case when only the receiversoffsets are unknown. Concretely, ﬁrst write (cid:126)τ = τ o + (cid:126)δ, where (cid:126)δ is a known vectors. Then note that (cid:126)σ TK + M (cid:126)τ T = ( (cid:126)σ − τ o ) TK + (cid:126)δ (cid:62) . Since (cid:126)δ (cid:62) is known it can be absorbed in the measurements;the one unknown offset τ o can then be absorbed in receiveroffsets. D. Missing Measurements

One upshot of our formulation is that it allows localizationeven when some sensor–receiver measurements are unavail-able. We denote the set of missing entries by M = (cid:8) ( m, k ) : distance between m th receiver and k th source is missing (cid:9) . We then replace the objective (14a) with min G , B , { α mk } (cid:13)(cid:13)(cid:13)(cid:13) J M ( B − T + (cid:88) ( m,k ) ∈M α mk (cid:126)e m (cid:126)e (cid:62) k ) J K (cid:13)(cid:13)(cid:13)(cid:13) F , (21)where (cid:126)e n denotes the canonical basis, and the coefﬁcients α mk account for the missing entries.The LM reﬁnement then proceeds with θ =  vec ( R ) vec ( S ) vec ( α )  ∈ R d ( M + K )+ |M| . The Jacobian now includes the derivative withrespect to α as well [ D α f ] mk,ij = (cid:40) m, k ) ∈ M , i = m, j = k, otherwise . (22) V. N

UMERICAL S IMULATIONS

In this section, we evaluate our approach on synthetic data.We implemented our algorithms in Matlab and used the CVXpackage for specifying convex programs [48], [49] with theSeDuMi solver. For reverberation simulations, we use thePyroomacoustics [50] package in Python. In the following,we describe the setup and present a series of results fordifferent scenarios showcasing the accuracy and ﬂexibility ofour method.

A. Setup

We consider an equal number of sensors and sources M = K in the range of 7 to 12. For each pair, we generated200 conﬁgurations of points randomly distributed in a volumeof dimensions 10 m ×

10 m × [ − , s. The TOAs wereﬁnally corrupted by zero-mean random Gaussian noise withstandard deviation σ ∈ { − , − , − , − , , } s orequivalently σ ∈ { . , . , . , . , } cm. The parametersare summarized in Table I. TABLE IP

ARAMETERS FOR THE SETUP AND ALGORITHMS .Dimension d = 3 Volume 10 m ×

10 m × B. Evaluation

The reconstructed points are ﬁrst optimally aligned usingProcrustes analysis [51]. The localization error between thetrue [ R , S ] and reconstructed [ ˆ R , ˆ S ] points is then E rs = M (cid:80) m =1 (cid:13)(cid:13)(cid:13) (cid:126)r m − ˆ (cid:126)r m (cid:13)(cid:13)(cid:13) + K (cid:80) k =1 (cid:13)(cid:13)(cid:13) (cid:126)s k − ˆ (cid:126)s k (cid:13)(cid:13)(cid:13) M + K . (23)We present the results using box plots depicting the median,ﬁrst quartile, and third quartile. The whiskers correspond to1.5 times the inter-quartile range. Values over or below thisare shown as outliers.To facilitate comparison, we will also present the medianand the 95% conﬁdence interval. In all ﬁgures, we clip theerrors to a minimum of − which we assume is sufﬁcientfor most applications. C. Results

As a baseline algorithm, we choose to compare to the two-stage approach of Wang et al. [15]. We consider two aspects inour comparison: localization error and runtime. Note that weonly report the results of Wang et al.’s approach without thefurther third-stage reﬁnement using Ono et al’s [25] algorithm.We had observed that the reﬁnement takes a long time due to the required number of iterations and did not improve theresults in the case M = K = 7 .The localization results comparing our approach (SDR+LM)to the two-stage baseline are shown in Figure 2. As can beseen, our timing-invariant approach outperforms the two-stagemethod, especially in the near-minimal conﬁgurations. As forthe runtime, we show in Figure 3 a comparison of the averageruntime over 20 runs as we increase the number of sensors andreceivers for a ﬁxed noise standard deviation σ = 10 − . Theruntime of our approach is consistently less than 5 seconds,whereas the runtime for the two-stage approach increasessigniﬁcantly with the number of sensors. Thus, our timing-invariant approach avoids both having timing-estimation errorsthat propagate to the position estimation as well as the needfor multiple random initializations that increase the runtime.We now evaluate the case when the microphones are syn-chronized but the source emission times are still unknown.The results are shown in Figure 4 along with the results ofthe fully unsynchronized case. As can be seen, the localizationperformance with partial synchronization is better, especiallyfor ( M = 7 , K = 7) and ( M = 8 , K = 8) .Next, we test the case when the inter-microphone distancesare known, but the whole setup is still fully unsynchronized.The localization errors are shown in Figure 4. As would beexpected, having side information signiﬁcantly improves thelocalization performance.For the last experiment, we attempt localization when someTOA data is missing. We test a range of missing entriesfrom 4% to 20% of the total. Figure 5 shows the results for M = K = 12 where we also compare the performance tothe complete data case. We can see that we are still ableto localize when few entries are missing. However, as wehave found some intrinsically bad conﬁgurations that cannotbe accurately localized with complete data, it is expected thatthe problem would be exacerbated when some TOA entriesare further missing.In summary, our approach (SDR+LM) outperforms the two-stage baseline (Wang et al [15]) in terms of localizationerror and runtime. Additional side information such as partialsynchronization or knowledge of some distances can be easilyaccommodated in our formulation, and as expected improvethe localization results compared to the fully unsynchronizedcase. D. Reverberation Results

We now evaluate the localization performance of our ap-proach in the presence of reverberation.We consider a room of dimensions × × . The M = K = 12 sensors and sources are randomly placed in the room.The sources are random Gaussian noise and do not overlapsuch that the segmentation at each microphone is known. TheSNR is set to 15 dB.We vary the reverberation time by adjusting the wallabsorption coefﬁcients and maximum order for the image-source method. For each reverberation time, we simulate 20realizations. While the desired reverberation times are 0 s,0.1 s, 0.2 s, 0.3 s, 0.4 s, and 0.5 s, the actual reverberation Fig. 2. Localization results for an unsynchronized setup comparing our approach (SDR+LM) to the two-stage baseline (Wang et al [15]). Results are shownat different noise levels and for M = K in the range of 7 to 12.Fig. 3. Average runtime of our approach (SDR+LM) compared to the two-stage baseline (Wang et al [15]). Results are for M = K in the range of 7to 12. The noise standard deviation is ﬁxed σ = 10 − . The runtime of ourapproach is consistently less than 5 seconds. times obtained in simulation are slightly different but close.The parameters are summarized in Table II.The TDOA are estimated using the method of Yamaoka etal [52]. The error between the true t mk and estimated ˆ t mk TDOA is calculated as E t = M (cid:80) m =1 K (cid:80) k =1 | t mk − ˆ t mk | M K . (24)We then use the estimated TDOA as input to our localizationalgorithm as well as the two-stage baseline of Wang et al [15].The results are shown in Figure 6. In Figure 6a, we showthe median TDOA estimation error. As expected, the errorsare larger as the reverberation time increases. Subsequently,

TABLE IIP

ARAMETERS FOR THE REVERBERATION SIMULATION .Room dimensions 10 m ×

10 m × this affects the localization errors as shown in Figure 6b.Our timing-invariant approach still outperforms the two-stagebaseline. VI. E XPERIMENTS

In this section, we evaluate our approach on real data [44]recorded in an ofﬁce with most of the furniture removed.A loudspeaker played a chirp from 65 positions. We haveaccess to the ground truth positions for 12 microphonesmeasured using a laser, the × TOA matrix, and a maskindicating the positions of outlier TOA entries. While the TOAwas extracted from the recordings knowing the chirp, ouralgorithms also work for TDOA matrices which could havebeen extracted without any assumption on the source as shownin Section V-D.

A. Setup

Figure 7 shows the TOA matrix, outlier positions, and thetrue microphone positions. A total of 23 loudspeakers provideclean data without outliers. Thus, for the ﬁrst experiments, Fig. 4. Median localization errors for an unsynchronized setup in different scenarios. Results are shown at different noise levels and for M = K in the rangeof 7 to 12. In the case without any synchronization, our approach (SDR+LM) outperforms the two-stage baseline (Wang et al [15]). Side information suchas partial synchronization or knowledge of some distances improves the results.Fig. 5. (Missing Data.) Median localization errors for M = K = 12 in the case of missing TOA entries at different levels of noise. Between 4% and 20%of entries are missing. The whole setup is unsynchronized. The results are plotted against those without any missing data. (a)(b) Fig. 6. (Reverberation.) Results as the reverberation time increases. (a)Median TDOA estimation error. (b) Median localization error. Our timing-invariant approach outperforms the two-stage baseline. we will use the × subset of TOAs. Then, for the full × TOAs, we will use our missing data approach tohandle the outliers. We report results over 200 runs wherewe randomly choose a subset of the loudspeakers to localizethe 12 microphones.Since we only have the ground truth for the microphonepositions, we calculate the localization error between the true R and reconstructed ˆ R microphone positions as E r = M (cid:80) m =1 (cid:13)(cid:13)(cid:13) (cid:126)r m − ˆ (cid:126)r m (cid:13)(cid:13)(cid:13) M . (25)

B. Results

We ﬁrst test our approach on subsets of the × TOAmatrix that is outlier-free. We also compare to the Wang etal. [15] two-stage baseline. In Figure 8(a), we show the errorsfor localizing all 12 microphones using a varying number ofloudspeakers from 6 (the minimal case for M = 12 ) to 11.We also show the average errors in Figure 8(b). Once again,our approach signiﬁcantly outperforms the two-stage baseline.Also similar to what we observed in the numerical simulations,while using more loudspeakers improves the average error, foreach K , there are a number of large errors corresponding tointrinsically difﬁcult conﬁgurations that have attractive non-global minima. However, the minimum error is consistent forall K and is less than 5 cm. (a)(b) Fig. 7. (Real Data.) Data from a real experiment carried out in an ofﬁceenvironment. (a) True positions of 12 microphones. (b) Mask showing outliersin the TOA data.Fig. 8. (Real Data.) Average localization errors for localizing 12 microphonesusing 6 to 11 sources. We compare our approach (SDR+LM) to a two-stageapproach (Wang). The input TOA matrix is a subset of the × outlier-freeTOA matrix. Fig. 9. (Real Data.) Average localization errors for localizing 12 microphonesusing 6 to 21 sources. We compare using subsets of the available × TOAmatrix (Full Data) to using subsets of the × outlier-free TOA matrix(Outlier-Free). For the former, we use the missing data approach where wetreat outliers as missing entries. Next, we use subsets of the entire × TOA that containsoutliers but assume knowing where the outliers are. We canthus use the missing data approach described in Section IV-D.Figure 9 shows the errors for localizing the 12 microphonesusing 6 to 21 sources. On one hand, the average error is largercompared to the outlier-free case. On the other hand, the bestcase scenarios that result in minimum errors are comparableto the outlier-free case. The smallest localization error is 4 cmand corresponds to K = 16 loudspeakers.VII. C ONCLUSION

We formulated a timing-invariant objective that allows usto localize sensors and sources in an unsynchronized setting.Based on this objective, we proposed an SDP relaxation usingthe full Gram matrix of the point set and then a subsequentreﬁnement step using the LM algorithm. We have thus elimi-nated the need for having to ﬁrst estimate the unknown timingsas well as avoided the multiple initializations required whensolving non-convex problems. We also proposed an approachto handle missing data and showed how to seamlessly in-corporate additional information such as knowledge of somedistances or partial synchronization.Using numerical simulations, we demonstrated the feasibil-ity of the approach in different scenarios and in near-minimalconﬁgurations. We compared our approach to a two-stagestate-of-the-art method [15]. Not only did our approach out-perform the two-stage algorithm in terms of localization error,but also in terms of runtime. We also tested our algorithms onreal times of arrival measured in an ofﬁce environment. Wewere able to localize the 12 microphones to within 0.04 maverage error.Future work could focus on dealing with outliers whicharise in the presence of multipath, for example. The methodcould be similar to our missing data approach except thatthe positions of the outliers are not known and need to bedetermined. A

PPENDIX AL OCALIZATION OF S UBARRAYS

The LM reﬁnement proceeds by iteratively minimizing theaugmented Lagrangian over θ (cid:107) f ( θ ) (cid:107) + g ( θ ) (cid:62) (cid:126)z + µ (cid:107) g ( θ ) (cid:107) , (26)where (cid:126)z is the Lagrange multiplier and µ > . Minimizing(26) is equivalent to minimizing [47] (cid:107) f ( θ ) (cid:107) + µ (cid:107) g ( θ ) + (cid:126)z/ (2 µ ) (cid:107) . (27)Since (27) is nonlinear in θ , the LM algorithm is once againused in which the update is θ ( i +1) = θ ( i ) − (cid:16) D f ( θ ( i ) ) (cid:62) D f ( θ ( i ) ) + µ D g ( θ ( i ) ) (cid:62) D g ( θ ( i ) ) + λ ( i ) I (cid:17) − (cid:16) D f ( θ i ) (cid:62) f ( θ ( i ) ) + µ D g ( θ ( i ) ) (cid:62) ( g + (cid:126)z/ (2 µ ) (cid:17) where D g ∈ R |M|× d ( M + K ) is the Jacobian deﬁned as D g ij,p =  (cid:126)x i − (cid:126)x j p = i, − (cid:126)x i + (cid:126)x j p = j, otherwise . (28)R EFERENCES[1] G. San Antonio, D. R. Fuhrmann, and F. C. Robey, “MIMO radar ambi-guity functions,”

IEEE Journal of Selected Topics in Signal Processing ,vol. 1, no. 1, pp. 167–177, 2007.[2] R. Parhizkar, A. Karbasi, S. Oh, and M. Vetterli, “Calibration usingmatrix completion with application to ultrasound tomography,”

IEEEtransactions on signal processing , vol. 61, no. 20, pp. 4923–4933, 2013.[3] D. El Badawy, “Good Vibrations and Unknown Excitations,” Ph.D.dissertation, ´Ecole polytechnique f´ed´erale de Lausanne, 2020.[4] B. G. Ferguson, “Improved time-delay estimates of underwater acousticsignals using beamforming and preﬁltering techniques,”

IEEE Journalof Oceanic Engineering , vol. 14, no. 3, pp. 238–244, 1989.[5] R. Scheibler, D. Di Carlos, A. Deleforge, and I. Dokmani´c, “Separake:Source separation with a little help from echoes,” in . IEEE, 2018, pp. 6897–6901.[6] H. Pan, R. Scheibler, E. Bezzam, I. Dokmani´c, and M. Vetterli, “FRIDA:FRI-based DOA estimation for arbitrary array layouts,” in . IEEE, 2017, pp. 3186–3190.[7] M. Martinez-Camara, I. Dokmani´c, J. Ranieri, R. Scheibler, M. Vetterli,and A. Stohl, “The fukushima inverse problem,” in . IEEE,2013, pp. 4330–4334.[8] S. Simoni, S. Padoan, D. F. Nadeau, M. Diebold, A. Porporato, G. Bar-renetxea, F. Ingelrest, M. Vetterli, and M. Parlange, “Hydrologic re-sponse of an alpine watershed: Application of a meteorological wirelesssensor network to understand streamﬂow generation,”

Water ResourcesResearch , vol. 47, no. 10, 2011.[9] M. Z. Win, F. Meyer, Z. Liu, W. Dai, S. Bartoletti, and A. Conti,“Efﬁcient multisensor localization for the internet of things: Exploring anew class of scalable localization algorithms,”

IEEE Signal ProcessingMagazine , vol. 35, no. 5, pp. 153–167, 2018.[10] H. Simkovits, A. J. Weiss, and A. Amar, “Navigation by inertial deviceand signals of opportunity,”

Signal Processing , vol. 131, pp. 280–287,2017.[11] Y. Rockah and P. Schultheiss, “Array shape calibration using sourcesin unknown locations–Part I: Far-ﬁeld sources,”

IEEE Transactions onAcoustics, Speech, and Signal Processing , vol. 35, no. 3, pp. 286–299,1987.[12] ——, “Array shape calibration using sources in unknown locations–PartII: Near-ﬁeld sources and estimator implementation,”

IEEE Transactionson Acoustics, Speech, and Signal Processing , vol. 35, no. 6, pp. 724–735, 1987. [13] A. J. Weiss and B. Friedlander, “Array shape calibration using sources inunknown locations-a maximum likelihood approach,” IEEE Transactionson acoustics, speech, and signal processing , vol. 37, no. 12, pp. 1958–1966, 1989.[14] A. Plinge, F. Jacob, R. Haeb-Umbach, and G. A. Fink, “Acoustic micro-phone geometry calibration: An overview and experimental evaluation ofstate-of-the-art algorithms,”

IEEE Signal Processing Magazine , vol. 33,no. 4, pp. 14–29, 2016.[15] L. Wang, T.-K. Hon, J. D. Reiss, and A. Cavallaro, “Self-localization ofad-hoc arrays using time difference of arrivals,”

IEEE Transactions onSignal Processing , vol. 64, no. 4, pp. 1018–1033, 2015.[16] C. Peng, G. Shen, Y. Zhang, Y. Li, and K. Tan, “Beepbeep: a highaccuracy acoustic ranging system using cots mobile devices,” in

Pro-ceedings of the 5th international conference on Embedded networkedsensor systems . ACM, 2007, pp. 1–14.[17] C. Vanwynsberghe, “R´eseaux `a grand nombre de microphones: applica-bilit´e et mise en œuvre,” Ph.D. dissertation, Paris 6, 2016.[18] W. S. Torgerson, “Multidimensional scaling: I. Theory and method,”

Psychometrika , vol. 17, no. 4, pp. 401–419, 1952.[19] J. B. Kruskal and M. Wish,

Multidimensional scaling . Sage, 1978,no. 11.[20] I. Dokmani´c, R. Parhizkar, J. Ranieri, and M. Vetterli, “Euclideandistance matrices: essential theory, algorithms, and applications,”

IEEESignal Processing Magazine , vol. 32, no. 6, pp. 12–30, 2015.[21] C. Vanwynsberghe, P. Challande, J. Marchal, R. Marchiano, andF. Ollivier, “A robust and passive method for geometric calibrationof large arrays,”

The Journal of the Acoustical Society of America ,vol. 139, no. 3, pp. 1252–1263, 2016. [Online]. Available: https://doi.org/10.1121/1.4944566[22] P. H. Sch¨onemann, “On metric multidimensional unfolding,”

Psychome-trika , vol. 35, no. 3, pp. 349–366, 1970.[23] M. Crocco, A. Bue, and V. Murino, “A bilinear approach to the posi-tion self-calibration of multiple sensors,”

IEEE Trans. Signal Process. ,vol. 60, no. 2, pp. 660–673, 2012.[24] I. Dokmani´c, J. Ranieri, and M. Vetterli, “Relax and unfold: Microphonelocalization with euclidean distance matrices,” in , 2015, pp. 265–269.[25] N. Ono, H. Kohno, N. Ito, and S. Sagayama, “Blind alignment ofasynchronously recorded signals for distributed microphone array,” in . IEEE, 2009, pp. 161–164.[26] J. Zhang, R. C. Hendriks, and R. Heusdens, “Structured total leastsquares based internal delay estimation for distributed microphone auto-localization,” in , 2016, pp. 1–5.[27] N. D. Gaubitch, W. B. Kleijn, and R. Heusdens, “Auto-localization inad-hoc microphone arrays,” in

IEEE ICASSP . Vancouver: IEEE, 2013,pp. 106–110.[28] M. Pollefeys and D. Nister, “Direct computation of sound and mi-crophone locations from time-difference-of-arrival data,” in .IEEE, 2008, pp. 2445–2448.[29] S. Thrun, “Afﬁne structure from sound,” in

Advances in Neural Infor-mation Processing Systems , 2006, pp. 1353–1360.[30] M. Krekovi´c, G. Baechler, I. Dokmani´c, and M. Vetterli, “Structure fromsound with incomplete data,” in . IEEE, 2018, pp.3539–3543.[31] M. Krekovi´c, “Point, Plane, Set, Match! Winning the Echo GrandSLAM,” Ph.D. dissertation, ´Ecole polytechnique f´ed´erale de Lausanne,2020.[32] R. Heusdens and N. Gaubitch, “Time-delay estimation for TOA-basedlocalization of multiple sensors,” in

IEEE ICASSP . IEEE, 2014, pp.609–613.[33] P. Biswas, T.-C. Lian, T.-C. Wang, and Y. Ye, “Semideﬁnite pro-gramming based algorithms for sensor network localization,”

ACMTransactions on Sensor Networks (TOSN) , vol. 2, no. 2, pp. 188–220,2006.[34] P. Biswas, T.-C. Liang, K.-C. Toh, Y. Ye, and T.-C. Wang, “Semideﬁniteprogramming approaches for sensor network localization with noisydistance measurements,”

IEEE transactions on automation science andengineering , vol. 3, no. 4, pp. 360–371, 2006.[35] K. Yang, G. Wang, and Z.-Q. Luo, “Efﬁcient convex relaxation methodsfor robust target localization by a sensor network using time differencesof arrivals,”

IEEE transactions on signal processing , vol. 57, no. 7, pp.2775–2784, 2009. [36] R. M. Vagheﬁ and R. M. Buehrer, “Asynchronous time-of-arrival-basedsource localization,” in

International Conference on Acoustics, Speechand Signal Processing (ICASSP) . IEEE, 2013, pp. 4086–4090.[37] G. Wang, Y. Li, and N. Ansari, “A semideﬁnite relaxation methodfor source localization using TDOA and FDOA measurements,”

IEEETransactions on Vehicular Technology , vol. 62, no. 2, pp. 853–862, 2012.[38] F. Jiang, Y. Kuang et al. , “Time delay estimation for TDOA self-calibration using truncated nuclear norm regularization,” in

InternationalConference on Acoustics, Speech and Signal Processing (ICASSP) .IEEE, 2013, pp. 3885–3889.[39] Y. Kuang, S. Burgess, A. Torstensson, and K. ˚Astr¨om, “A completecharacterization and solution to the microphone position self-calibrationproblem,” in

International Conference on Acoustics, Speech and SignalProcessing (ICASSP) . IEEE, 2013, pp. 3875–3879.[40] Y. Kuang and K. ˚Astr¨om, “Stratiﬁed sensor network self-calibrationfrom TDOA measurements,” in

European Signal Processing Conference(EUSIPCO) . IEEE, 2013, pp. 1–5.[41] S. Zhayida, S. Burgess, Y. Kuang, and K. ˚Astr¨om, “TOA-based self-calibration of dual-microphone array,”

Journal of Selected Topics inSignal Processing , vol. 9, no. 5, pp. 791–801, 2015.[42] M. Farmani, R. Heusdens, M. S. Pedersen, and J. Jensen, “Tdoa-based self-calibration of dual-microphone arrays,” in

European SignalProcessing Conference (EUSIPCO) . IEEE, 2016, pp. 617–621.[43] S. Burgess, Y. Kuang, and K. ˚Astr¨om, “TOA sensor network self-calibration for receiver and transmitter spaces with difference in dimen-sion,”

Signal Processing , vol. 107, pp. 33–42, 2015.[44] K. Batstone, G. Flood, T. Beleyur, V. Larsson, H. R. Goerlitz, M. Os-karsson, and K. ˚Astr¨om, “Robust self-calibration of constant offset time-difference-of-arrival,” in

International Conference on Acoustics, Speechand Signal Processing (ICASSP) . IEEE, 2019, pp. 4410–4414.[45] I. Dokmani´c, L. Daudet, and M. Vetterli, “How to localize ten micro-phones in one ﬁngersnap,” in

EUSIPCO , 2014.[46] ——, “From acoustic room reconstruction to SLAM,” in

IEEE ICASSP .IEEE, 2016, pp. 6345–6349.[47] S. Boyd and L. Vandenberghe,

Introduction to Applied Linear Algebra:Vectors, Matrices, and Least Squares . Cambridge University Press,2018.[48] M. Grant and S. Boyd, “CVX: Matlab software for disciplined convexprogramming, version 2.1,” http://cvxr.com/cvx, Mar. 2014.[49] ——, “Graph implementations for nonsmooth convex programs,” in

Recent Advances in Learning and Control , ser. Lecture Notes in Controland Information Sciences, V. Blondel, S. Boyd, and H. Kimura, Eds.Springer-Verlag Limited, 2008, pp. 95–110, http://stanford.edu/ ∼ boyd/graph dcp.html.[50] R. Scheibler, E. Bezzam, and I. Dokmani´c, “Pyroomacoustics: A pythonpackage for audio room simulations and array processing algorithms,”in IEEE International Conference on Acoustics, Speech and SignalProcessing (ICASSP) , 2018.[51] P. H. Sch¨onemann, “A generalized solution of the orthogonal procrustesproblem,”

Psychometrika , vol. 31, no. 1, pp. 1–10, 1966. [Online].Available: https://doi.org/10.1007/BF02289451[52] K. Yamaoka, R. Scheibler, N. Ono, and Y. Wakabayashi, “Sub-sampletime delay estimation via auxiliary-function-based iterative updates,” in2019 IEEE Workshop on Applications of Signal Processing to Audio andAcoustics (WASPAA)