
SUBMITTED TO A JOURNAL

CSI-Based Multi-Antenna and Multi-Point Indoor Positioning Using Probability Fusion

Emre Gönültaş, Eric Lei, Jack Langerman, Howard Huang, and Christoph Studer

Abstract: Channel state information (CSI)-based fingerprinting via neural networks (NNs) is a promising approach to enable accurate indoor and outdoor positioning of user equipments (UEs), even under challenging propagation conditions. In this paper, we propose a CSI-based positioning pipeline for wireless LAN MIMO-OFDM systems operating indoors, which relies on NNs that extract a probability map indicating the likelihood of a UE being at a given grid point. We propose methods to fuse these probability maps at a centralized processor, which enables improved positioning accuracy if CSI is acquired at different access points (APs) and extracted from different transmit antennas. To improve positioning accuracy, we propose the design of CSI features that are robust to hardware and system impairments arising in real-world MIMO-OFDM transceivers. We provide experimental results with real-world indoor measurements under line-of-sight (LoS) and non-LoS propagation conditions, and for multi-antenna and multi-AP measurements. Our results demonstrate that probability fusion significantly improves positioning accuracy without requiring exact synchronization between APs and that centimeter-level median distance error is achievable.

Index Terms: Channel-state information (CSI), indoor localization, multi-point fingerprinting, probability fusion.

I. INTRODUCTION

Positioning of mobile user equipment (UE) devices is essential for a broad range of applications, including navigation, virtual reality, asset tracking, advertising, industrial automation, and many more [1]–[4]. While global navigation satellite system (GNSS) technologies enable ubiquitous positioning in outdoor environments with a view of the sky, there is no similarly ubiquitous solution for indoors. In addition, some indoor positioning applications, such as drone or robot navigation, require centimeter-level precision to execute their missions, which is not achievable using conventional GNSS. One class of solutions for indoor positioning uses infrastructure cameras for visible or infrared light, viewing an object of interest that is equipped with either an active transmitter or a passive reflector [5]–[8]. While such systems can achieve sub-centimeter accuracy, they are expensive, require unobstructed views, and may not work in rooms with bright sunlight.

Footnote: E. Gönültaş and E. Lei are with the School of Electrical and Computer Engineering, Cornell University, Ithaca, NY (e-mail: [email protected]; [email protected]). J. Langerman and H. Huang are with Nokia Bell-Labs, Murray Hill, NJ (e-mail: [email protected]; [email protected]). C. Studer was with the School of Electrical and Computer Engineering, Cornell University, Ithaca, NY, and at Cornell Tech, New York, NY, and is now with the Department of Information Technology and Electrical Engineering at ETH Zurich, Zurich, Switzerland (e-mail: [email protected]). The work of EG and CS was supported in part by Xilinx Inc. and by the US National Science Foundation (NSF) under grants CCF-1652065, CNS-1717559, and ECCS-1824379. The Quadro P6000 GPU used for this research was donated by the NVIDIA Corporation.

A. The Challenges of Indoor Positioning using RF signals

An alternative to camera-based positioning methods is to use radio-frequency (RF) signals [3], [9]–[14]. Often, one can leverage measurements from existing RF signals used for wireless communications, enabling localization services with no additional equipment. However, for some RF localization techniques, there could be a significant cost in calibrating the infrastructure access points (APs), depending on the required level of accuracy. For example, achieving meter-level accuracy with RF time-difference-of-arrival measurements requires synchronization of the APs with nanosecond accuracy and knowledge of their locations with sub-meter accuracy.

A type of RF localization technique known as fingerprinting has the advantage of not requiring AP calibration. Instead, these techniques rely on the offline creation of an empirical database that records RF measurements, such as the received signal strength indicator (RSSI) from multiple APs, as a function of the UE location. An algorithm then estimates the location of a UE given its RF measurements and the fingerprinting database. The database could be created at relatively low cost, for example, using an RF receiver mounted on a robot that periodically moves through a space for cleaning.

As an alternative to RSSI measurements, fingerprinting could instead be based on estimates of the channel state information (CSI), which is always required for data demodulation. For wideband MIMO-OFDM systems, complex-valued CSI can be estimated for each active subcarrier and each transmit-receive antenna pair, resulting in significantly richer measurement sets compared to RSSI. CSI-based fingerprinting has been shown to enable accurate localization in both indoor [11], [15]–[17] and outdoor [18]–[21] applications. Recently, a range of CSI-based fingerprinting methods that use machine learning, rather than sophisticated geometrical models, has been proposed [15], [16], [19], [20], [22]–[24]. For all of these methods, carefully designed features from CSI turn out to be critical, as real-world channels exhibit small-scale fading and wireless transceivers suffer from a number of hardware impairments. In addition, most of these methods rely on supervised learning in order to train a neural network (NN) that maps CSI to UE position, which requires a dedicated measurement campaign to acquire CSI and associated ground-truth position. If relative location information is sufficient, self-supervised methods known as channel charting [25]–[27] avoid expensive measurement campaigns. Furthermore, if CSI measurements from multiple APs are available, it was shown in [28], [29] that the accuracy of channel charting methods can be improved significantly.

B. Contributions

In this paper, we propose a CSI-based positioning pipeline for multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) wireless systems, which leverages the availability of CSI acquired at multiple APs and from multiple transmit and receive antennas. Our method builds upon neural networks that map CSI features to what we call probability maps, which indicate the likelihood of a UE being at a predefined grid point in space. To improve positioning accuracy in scenarios with multiple APs or multiple transmit antennas, we propose a range of methods that fuse multiple probability maps in a centralized processor. To improve robustness of our positioning pipeline to system and hardware impairments typically arising in IEEE 802.11ac MIMO-OFDM-based systems, we adapt methods proposed in [19] for cellular applications. We provide a range of experimental results with real-world indoor channel measurements under line-of-sight (LoS) and non-LoS conditions and for multi-antenna and multi-AP measurements. Our results reveal that probability fusion improves positioning performance, enables centimeter-level accuracy, and avoids the need for exact synchronization between APs while requiring only a small amount of positioning information to be transferred to a centralized processor.

C. Relevant Prior Art

CSI-based positioning has been studied extensively in the past; see, e.g., [11], [14]–[17], [20], [23], [24], [30]–[33] and the references therein. Recent work has focused mainly on neural network (NN)-based approaches [15], [16], [20], [23], [24], [34], [35], which (i) do not require geometric modeling [23], (ii) avoid storage of potentially very large CSI fingerprint databases [36], and (iii) have the potential to generalize to areas excluded in the training set [37]. In contrast to such NN-based approaches, we focus on multi-antenna and multi-point positioning in which a NN generates what we call probability maps, a probabilistic description of UE location measured at multiple APs and from multiple transmit antennas. These probability maps can then be fused at a centralized processor to improve positioning accuracy. References [23], [24] propose NNs that estimate the likelihood of a UE being in a certain grid point using one-hot encoded vectors trained from a fixed number of reference positions. In contrast, we propose a refined strategy that enables NN training from an arbitrary set of locations that are not necessarily on a grid. References [38]–[40] also rely on a probabilistic description of UE position, but perform positioning using geometrical models rather than NNs.

NN-based positioning from CSI measurements requires carefully designed features, which are robust to small-scale fading as well as common system and hardware impairments. Beamspace representations have been used to extract the incident angles in [20], [25], [26], [41]. The conversion of subcarrier CSI into the delay domain to extract relative time-of-flight information has been used in [20], [41]–[43]. Cross-correlation in the spatial domain and autocorrelation in the delay domain have been used in [25], [26] and [19], [27], [44], respectively. Such methods improve resilience to small-scale fading and common hardware impairments, including time synchronization errors as well as residual carrier frequency and sampling rate offsets. To improve robustness of our positioning pipeline to small-scale fading as well as system and hardware impairments that typically arise in IEEE 802.11ac MIMO-OFDM systems, we propose a set of CSI features that combine all of these methods.

Virtually all existing positioning methods rely on single-transmit-antenna CSI measurements at a single access point (AP) or basestation (BS), possibly with multiple receive antennas. In contrast to these methods, we generate probability maps for individual transmit antennas, which can then be fused to improve accuracy while reducing the complexity and storage of the neural network. Multi-point localization strategies, where CSI from multiple APs or cellular BSs is combined, have been proposed in [28], [29] for channel charting. While such approaches only enable relative localization, they also require the exchange of CSI features to a centralized processor that performs channel charting. In contrast to these approaches, we propose new methods that fuse probability maps generated at multiple APs (and from multiple transmit antennas), which not only improves (absolute) positioning accuracy but also reduces the amount of information that must be transferred to a centralized processor.

Probability fusion of multiple sensor data such as camera images [45] is a widely studied subject; see, e.g., [46]–[48]. Existing outdoor positioning systems, such as GNSS, triangulation-based methods in cellular systems [49], or indoor positioning systems such as WorldViz [50] or VICON [51], already fuse at least three different data sources to produce a robust position estimate. The methods proposed in this paper combine probabilistic sensor fusion with CSI-based positioning. Concretely, we fuse multiple probability maps that indicate the likelihood of a UE being at a given grid point using theoretically principled conflation methods put forward in [52]. Such probability conflation methods reduce the amount of information that must be transmitted to a centralized processor while (often significantly) improving positioning accuracy under challenging indoor propagation conditions.

D. Notation

Lowercase boldface letters, uppercase boldface letters, and uppercase calligraphic letters denote column vectors, matrices, and sets, respectively. For a matrix $\mathbf{A}$, we denote its transpose by $\mathbf{A}^T$, its Hermitian transpose by $\mathbf{A}^H$, the entry in its $i$th row and $j$th column by $A_{i,j}$, and its $i$th column by $\mathbf{a}_i$. For a vector $\mathbf{a}$, the $k$th entry is denoted by $a_k$, the $\ell_2$ and $\ell_1$ norms are $\|\mathbf{a}\|_2 = \sqrt{\sum_k |a_k|^2}$ and $\|\mathbf{a}\|_1 = \sum_k |a_k|$, respectively, and the real and imaginary parts are denoted by $\Re(\mathbf{a})$ and $\Im(\mathbf{a})$, respectively.

E. Outline

The remainder of this paper is organized as follows. Section II describes the operation principle of the proposed positioning pipeline and introduces the MIMO-OFDM channel model. Section III details NN-based positioning along with CSI-feature construction. Section IV proposes different probability fusion methods for multi-antenna and multi-point data. Section V shows results for real-world indoor channel measurements. Section VI concludes the paper.

Fig. 1. Overview of the proposed CSI-based multi-antenna and multi-point positioning system. A user equipment (UE) $u$ at position $\mathbf{x}^{(u)}$ transmits pilots to one or multiple access points (APs), which extract channel state information (CSI). Each AP then uses CSI to generate a probability map indicating the UE's location, which is fused to compute an estimate $\hat{\mathbf{x}}^{(u)}$ of the UE's position.

II. OPERATION PRINCIPLE AND SYSTEM MODEL

Our objective is to estimate the position of a UE from CSI measurements acquired at one or multiple APs. We first outline the operation principle of the positioning pipeline and then describe the multi-point MIMO-OFDM system model.

A. Operation Principle

Figure 1 illustrates the basic concept of the proposed multi-antenna multi-point CSI-based positioning pipeline. The $u$th UE at location $\mathbf{x}^{(u)}$ transmits pilot sequences from one or multiple transmit antennas, which are then used to estimate the wireless channel at one or multiple APs indexed by $b = 1, \dots, B$, where $B$ is the number of APs. The APs can have one or many receive antennas. The acquired CSI is then used to generate a probability map $\mathbf{p}^{(u)}[b]$, which contains information on the location of the $u$th UE as seen from the $b$th AP. The probability map is generated from measured CSI using neural networks, which have been trained from a dataset containing CSI-location pairs in a dedicated training phase. Details on the positioning pipeline are provided in Section III. Ground-truth location information for neural network training is acquired via a state-of-the-art multi-camera positioning system. Details on the measurement setup are provided in Section V-A. The probability information from all APs is then fused in order to produce an estimate $\hat{\mathbf{x}}^{(u)}$ of the UE location.

Remark 1. A key limitation of supervised positioning methods is the requirement of a dedicated training phase to acquire CSI and associated ground-truth position. Self- or semi-supervised methods that build on channel charting [18], [25], [53] can be used to reduce or completely avoid the acquisition of ground-truth information at the expense of positioning accuracy. An extension of the proposed positioning pipeline to such self- or semi-supervised methods is part of ongoing work.

B. Multi-Antenna Multi-Point OFDM System Model

We focus on an IEEE 802.11ac-based wireless communication system [54], which is also what we used in Section V to generate our indoor positioning results. Concretely, we consider a multi-antenna UE with $M_T$ transmit antennas that is in range of $B$ multi-antenna APs (or basestations) with $M_R$ antennas each. We consider an OFDM system with $W$ subcarriers and cyclic prefix length $C$. The set $\Omega_{\text{used}}$ contains the indices of subcarriers associated with tones that have been trained (corresponding to data and pilot subcarriers); the set $\Omega_{\text{zero}}$ contains the indices associated with unused subcarriers. Consequently, we have $\Omega_{\text{used}} \cup \Omega_{\text{zero}} = \{1, \dots, W\}$. We assume that the $u$th UE is at position $\mathbf{x}^{(u)} \in \mathbb{R}^D$, where $D$ is typically two or three (representing the spatial location in two or three dimensions), and is transmitting a pilot symbol (e.g., during the preamble), which is used at each AP to compute an estimate of the wireless channel in the frequency domain. The estimated $M_R \times M_T$ MIMO channel matrix at subcarrier $w = 1, \dots, W$ and at AP $b = 1, \dots, B$ is denoted by $\mathbf{H}^{(u)}_w[b] \in \mathbb{C}^{M_R \times M_T}$. We call the collection of these channel matrices

$$\mathbf{H}^{(u)}_w[b], \quad w = 1, \dots, W, \quad b = 1, \dots, B, \tag{1}$$

the CSI associated with UE $u$; we abbreviate (1) by $\{\mathbf{H}^{(u)}_w[b]\}$.

Remark 2. In what follows, we assume that the delay spread of the channel plus the maximum timing offset does not exceed the cyclic prefix length. We furthermore assume $C \le |\Omega_{\text{used}}|$ for reasons discussed in Section III-A3. Besides that, we allow the extracted channel estimates to be affected by real-world system and hardware impairments, such as timing offset, residual carrier frequency and sampling rate offsets, and phase noise.

Remark 3. We require neither that the $B$ APs acquire the CSI from the $u$th UE at the same time instant nor perfect synchronization among APs. The only assumption is that the CSI measured at each AP is for the same UE, which is transmitting from approximately the same location $\mathbf{x}^{(u)}$; this includes the case in which the UE is transmitting to multiple APs in a round-robin fashion or scenarios in which one or multiple APs are acquiring CSI without decoding the UE's data.

III. CSI-BASED FINGERPRINTING VIA NEURAL NETWORKS

We now detail the proposed multi-antenna multi-point CSI-based positioning pipeline illustrated in Figure 2. We start by describing the CSI-feature extraction stage followed by discussing the NN that generates probability maps. Means to fuse probability maps to obtain accurate location estimates from multi-antenna and multi-point data are detailed in Section IV.

A. CSI-Feature Extraction

In order to enable CSI-based positioning, it is critical to construct robust CSI features that (i) are unique for a given UE location, (ii) are robust to small-scale fading effects [18], [21], [25], [26], [53], and (iii) are resilient to real-world system and hardware impairments [19], [27]. Clearly, for a given UE location $\mathbf{x}^{(u)}$, the CSI features extracted from the estimated CSI $\{\mathbf{H}^{(u)}_w[b],\ w = 1, \dots, W,\ b = 1, \dots, B\}$ should be unique; otherwise, the UE location is not uniquely determined. Furthermore, small-scale fading, e.g., caused by moving objects in the surrounding area of the UE and APs, should not affect the CSI features, as such effects are generally difficult to model. Finally, system and hardware impairments, such as varying transmit or receive power as well as timing, carrier frequency, and sampling rate offsets, should not affect the CSI features, as they may vary over time and from UE to UE. We now propose CSI features (see Figure 2) that address the desired properties (ii) and (iii), as CSI-feature uniqueness in property (i) is mostly determined by the physical channel and difficult to control in practice. We apologize in advance for the rather involved notation.

Footnote: Many of the proposed techniques can be adapted easily to other MIMO-OFDM-based wireless communication systems.

Fig. 2. CSI-based positioning pipeline. (a) Positioning with one AP first extracts CSI $\mathbf{H}^{(u)}_w$ of the $u$th UE, followed by a feature extraction stage $f(\cdot)$ that produces $\mathbf{f}^{(u)}$. A neural network $g_\theta(\cdot)$ then computes a probability map $\mathbf{p}^{(u)}$ that is used to generate an estimate $\hat{\mathbf{x}}^{(u)}$ of the UE's location. (b) Positioning with two APs fuses the two extracted probability maps $\mathbf{p}^{(u)}[1]$ and $\mathbf{p}^{(u)}[2]$, computed from two different neural networks at AP 1 and AP 2, respectively, to generate an estimate $\hat{\mathbf{x}}^{(u)}$ of the UE location.

1) CSI Normalization:

In practical systems, the power amplifier gain at the UE and the low-noise amplifier gain at the AP can be set independently by the UE and the AP, respectively. While the amplifier gain settings are typically kept constant during transmission of one OFDM frame, the AP does not, in general, know the UE transmit gain settings. Furthermore, the path-loss characteristics in indoor applications depend on the environment. Hence, the receive power is not a reliable indicator of the distance between the UE and the AP, and should be ignored. As a result, we normalize the CSI as follows. Let us define the vector $\mathbf{h}^{(u)}_{m,w}[b] = \big[\mathbf{H}^{(u)}_w[b]\big]_m$ as the $m$th column of $\mathbf{H}^{(u)}_w[b]$, which is the CSI vector corresponding to the $m$th transmit antenna of the $u$th UE at subcarrier $w$. We then normalize these CSI vectors so that

$$\bar{\mathbf{h}}^{(u)}_{m,w}[b] = \frac{1}{\sqrt{\rho}}\, \mathbf{h}^{(u)}_{m,w}[b], \qquad \rho = \sum_{w \in \Omega_{\text{used}}} \big\|\mathbf{h}^{(u)}_{m,w}[b]\big\|_2^2, \tag{2}$$

which ensures that they have unit norm over all used subcarriers and receive antennas. This normalization step ensures that transmit and receive gains, as well as path-loss effects, are ignored [18], [19], [28], [29], [44], [53]. We note that a more sophisticated CSI normalization strategy has been proposed in [25], which requires knowledge of the path-loss exponent.
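The normalization in (2) can be sketched in a few lines. The following is a minimal illustration using only the Python standard library, not the authors' implementation; the nested-list layout of the CSI, the function names `normalize_csi` and `energy`, and the toy numbers are assumptions made for this example.

```python
import math

def energy(h_used):
    """Total energy over all used subcarriers and receive antennas (rho in (2))."""
    return sum(abs(g) ** 2 for h_w in h_used for g in h_w)

def normalize_csi(h_used):
    """Scale the CSI of one transmit antenna to unit norm, following (2).

    h_used: list over the used subcarriers; each entry is a list of complex
            channel gains, one per receive antenna (hypothetical layout).
    """
    rho = energy(h_used)
    if rho == 0.0:
        raise ValueError("all-zero CSI cannot be normalized")
    scale = 1.0 / math.sqrt(rho)
    return [[scale * g for g in h_w] for h_w in h_used]

# Toy CSI: 3 used subcarriers, 2 receive antennas (made-up numbers).
h = [[1 + 2j, 0.5 - 1j], [-0.3 + 0.1j, 2j], [1.5 + 0j, -1 - 1j]]
h_bar = normalize_csi(h)

# The same channel observed with a different, unknown positive gain.
h_gain = [[3.7 * g for g in h_w] for h_w in h]
h_gain_bar = normalize_csi(h_gain)
```

In this sketch, `h_bar` and `h_gain_bar` agree up to floating-point rounding, which is the sense in which amplifier gains and path loss are ignored. A common phase rotation is not removed by this step; that is addressed by the autocorrelation features described later in this section.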

2) Beamspace Transform:

In practice, it has been shown that converting CSI into more "compact" representations, e.g., domains in which the CSI is sparse, can yield improved positioning performance [18], [19], [28], [29], [44], [53], [55]. In our application, we also transform the normalized CSI from (2) into the beamspace domain [56]. To this end, we assume that the antennas at the AP form a uniform linear array (ULA). Then, by taking the discrete Fourier transform (DFT) across the AP array, we obtain the beamspace vectors

$$\hat{\mathbf{h}}^{(u)}_{m,w}[b] = \mathbf{D}\, \bar{\mathbf{h}}^{(u)}_{m,w}[b], \tag{3}$$

where $\mathbf{D}$ is the $M_R \times M_R$ dimensional DFT matrix normalized so that $\mathbf{D}^H \mathbf{D} = \mathbf{I}_{M_R}$. Note that in the beamspace domain, each entry of the vector $\hat{\mathbf{h}}^{(u)}_{m,w}[b]$ is associated with a specific beam (or incident angle); for directional channels and large antenna arrays, the beamspace representation of line-of-sight (LoS) channels is typically sparse as the signals arrive only from a small subset of incident angles.

Remark 4. We have observed that for the small antenna arrays that are typical in IEEE 802.11ac, the beamspace transform provides only marginal improvements. For massive MIMO systems, however, the beamspace transform has been shown in [18], [19], [25] to significantly improve performance.

3) Delay-Domain Transform:

In addition to transforming the normalized CSI into beamspace, we furthermore propose to transform the frequency (or subcarrier) domain into the delay domain, as OFDM-based systems typically have only a limited delay spread (e.g., no larger than the cyclic prefix) and only a few taps in the impulse responses are significant. To this end, we define a $W$-dimensional frequency-domain vector

$$\tilde{\mathbf{h}}^{(u)}_{n,m}[b] = \Big[ \big[\hat{\mathbf{h}}^{(u)}_{m,1}[b]\big]_n,\ \big[\hat{\mathbf{h}}^{(u)}_{m,2}[b]\big]_n,\ \dots,\ \big[\hat{\mathbf{h}}^{(u)}_{m,W}[b]\big]_n \Big]^T \tag{4}$$

for a given beam $n$ at AP $b$ and a given UE transmit antenna $m$. In words, the vector $\tilde{\mathbf{h}}^{(u)}_{n,m}[b]$ contains the channel estimates over all $W$ subcarriers. Ideally, by taking the inverse DFT of the frequency-domain vector $\tilde{\mathbf{h}}^{(u)}_{n,m}[b]$, one would obtain the delay-domain description of the frequency-selective channel from the $m$th UE antenna to the $n$th beam at AP $b$. Unfortunately, only the subcarriers indexed by $\Omega_{\text{used}}$ are available in practice, whereas the entries pertaining to the subcarriers indexed by $\Omega_{\text{zero}}$ are generally unknown (as they were not trained). Fortunately, since we assumed that the cyclic prefix length $C$ is not larger than the number of used subcarriers $|\Omega_{\text{used}}|$, one can estimate the delay-domain coefficients within the cyclic prefix length. Let $\Gamma_{\text{cp}}$ be the set of indices associated with the channel taps in the delay domain so that $C = |\Gamma_{\text{cp}}|$. Then, one can estimate the delay-domain coefficients by computing

$$\mathbf{t}^{(u)}_{n,m}[b] = \big(\mathbf{D}_{\Omega_{\text{used}}, \Gamma_{\text{cp}}}\big)^{\dagger} \big[\tilde{\mathbf{h}}^{(u)}_{n,m}[b]\big]_{\Omega_{\text{used}}}. \tag{5}$$

Here, the delay-domain vector $\mathbf{t}^{(u)}_{n,m}[b] \in \mathbb{C}^{|\Gamma_{\text{cp}}|}$ contains the $C$ dominant taps of the wireless channel between the $m$th transmit antenna of UE $u$ and the $n$th beam at AP $b$, the matrix $\mathbf{D}_{\Omega_{\text{used}}, \Gamma_{\text{cp}}}$ contains the rows indexed by $\Omega_{\text{used}}$ and the columns indexed by $\Gamma_{\text{cp}}$ of the $W \times W$ DFT matrix, $(\cdot)^{\dagger}$ denotes the left pseudoinverse, and $\big[\tilde{\mathbf{h}}^{(u)}_{n,m}[b]\big]_{\Omega_{\text{used}}}$ is the subset of the frequency-domain vector in (4) corresponding to the used subcarriers indexed by $\Omega_{\text{used}}$. Note that the least-squares estimator in (5) is frequently used to denoise channel vectors in OFDM systems [58].

Footnote: Beamspace transforms for uniform rectangular arrays exist and simply correspond to two-dimensional DFTs [57].

Remark 5. We have observed that taking the inverse DFT over the entire frequency-domain vector (4) still works well in practice and requires lower complexity than the approach in (5), as one can use an inverse fast Fourier transform (FFT).

4) Autocorrelation:

The previous feature-extraction steps served the purposes of (i) ignoring gain settings in amplifiers and (ii) sparsifying the CSI in the beamspace and delay domains. We now propose a method that renders the CSI features robust to time-synchronization errors, residual carrier frequency offset (CFO) and sampling rate offset (SRO), and global phase modulations. The method proposed here is inspired by the approach proposed recently in [19] for CSI-based positioning in cellular massive MIMO systems.

Time synchronization errors and residual carrier frequency offset can be modeled in the discrete-time domain as $t[k] = y[k - \delta]\, e^{j\varphi k}$, where $t[k]$ is the time-domain signal at sample index $k$, $y[k - \delta]$ is the true received signal with unknown delay $\delta$ caused by synchronization (or frame-start detection) errors, and $\varphi$ determines the amount of residual CFO and SRO. When computing the "instantaneous autocorrelation" of the signal $t[k]$, we have

$$R_t[\tau] = \sum_k t[k]\, t^*[k + \tau - 1] = \sum_{k'} y[k']\, y^*[k' + \tau - 1], \tag{6}$$

for $\tau = 1, 2, \dots$, which no longer depends on the time synchronization error $\delta$ and the residual CFO. We note that the instantaneous autocorrelation in (6) also removes any constant phase offset. For example, if we use the model $t[k] = y[k - \delta]\, e^{j\varphi k} e^{j\omega}$ for a fixed phase offset $\omega \in [0, 2\pi)$, then the autocorrelation is unaffected by that phase offset; such a phase offset could, for example, arise from small-scale fading.

To improve robustness of our CSI features, we will follow this approach and compute the instantaneous autocorrelation not only in the delay domain but also in the beamspace domain. Let $\big[\mathbf{t}^{(u)}_{n,m}[b]\big]_k$ be the $k$th delay-domain sample measured at the $n$th beam of AP $b$ transmitted from the $m$th antenna of UE $u$. Then, we compute

$$R^{(u)}_t[m, \tau, \kappa, b] = \sum_{n=1}^{M_R} \sum_{k=1}^{C} \big[\mathbf{t}^{(u)}_{n,m}[b]\big]_k \big[\mathbf{t}^{(u)}_{n+\kappa-1,m}[b]\big]^*_{k+\tau-1}, \tag{7}$$

where $\tau = 1, 2, \dots, C$ and $\kappa = 1, 2, \dots, M_R$. For a given AP $b$ and UE $u$, we vectorize the three-dimensional tensor in (7) so that the vector $\mathbf{r}^{(u)}[b] \in \mathbb{C}^{M_T C M_R}$ contains all the entries of $R^{(u)}_t[m, \tau, \kappa, b]$ for $m = 1, \dots, M_T$, $\tau = 1, \dots, C$, and $\kappa = 1, \dots, M_R$. Finally, to enable the use of off-the-shelf deep-learning toolboxes, we convert the complex-valued vector $\mathbf{r}^{(u)}[b] \in \mathbb{C}^{M_T C M_R}$ into a $2 M_T C M_R$-dimensional real-valued CSI-feature vector as follows:

$$\mathbf{f}^{(u)}[b] = \Big[ \Re\{\mathbf{r}^{(u)}[b]\}^T,\ \Im\{\mathbf{r}^{(u)}[b]\}^T \Big]^T. \tag{8}$$

In what follows, we will also use CSI features extracted separately per AP $b$ and per transmit antenna $m = 1, \dots, M_T$, which are obtained by vectorizing $R^{(u)}_t[m, \tau, \kappa, b]$ in only $\tau = 1, \dots, C$ and $\kappa = 1, \dots, M_R$ and by stacking the real and imaginary parts in the vector $\mathbf{f}^{(u)}_m[b]$.

Footnote: Practical receivers perform CFO and SRO estimation and compensation; in practice, however, residual CFO and SRO errors remain [59], [60].

Footnote: We are not taking any expectation over the product $t[k]\, t^*[k + \tau - 1]$, which is in contrast to the method proposed in [19].

Fig. 3. Neural network (NN) structure at AP $b$. The NN $g_{\theta_b}$ with weights and biases contained in the vector $\theta_b$ takes in a CSI-feature vector $\mathbf{f}^{(u)}[b]$ and generates a probability map $\mathbf{p}^{(u)}[b]$ that describes the position of UE $u$. [Figure labels: Dropout, BN, ReLU, ReLU, BN, ReLU, ReLU, ReLU, Softmax; layer sizes 968, 512, 512, 512, 512.]

Remark 6. We emphasize that taking the autocorrelation over the beamspace domain renders the proposed CSI features invariant to a constant shift in incident angle. While this may appear counterproductive, i.e., incident angles are a robust large-scale fading component, we have observed significant improvements for indoor positioning with IEEE 802.11ac-based systems. The reason is that the autocorrelation still captures differences between incident angles, which naturally occur with multipath propagation. For pure LoS channels, it is not recommended to compute an autocorrelation in the beamspace domain.

B. Neural Network Structure

As illustrated in Figure 2, we propose to use one or multiple neural networks (NNs) $g_{\theta_b}$ at each AP $b$, with weights and biases from all layers contained in the vector $\theta_b$, that take in CSI-feature vectors $\mathbf{f}^{(u)}[b]$ and generate what we call a probability map

$$\mathbf{p}^{(u)}[b] = g_{\theta_b}\big(\mathbf{f}^{(u)}[b]\big) \tag{9}$$

that describes the location of UE $u$; see Section III-C for details on probability maps. This probability map is then fused with probability maps from other APs to extract an estimated position $\hat{\mathbf{x}}^{(u)}$ of UE $u$. The neural network structure is illustrated in Figure 3. We consider a relatively simple six-layer neural network with input CSI-feature vector $\mathbf{f}^{(u)}[b]$ and output probability-map vector $\mathbf{p}^{(u)}[b]$. All layers but the last use ReLU activations; the last layer uses a softmax activation for reasons detailed below. The first and second layers use batch normalization (BN), and the first one additionally uses dropout. The number of activations for each layer is shown in Figure 3; the dimensions of the input CSI-feature vector $\mathbf{f}^{(u)}[b]$ and the output probability-map vector $\mathbf{p}^{(u)}[b]$ depend on the feature type and the resolution of the probability map, respectively.

C. Probability Maps

Instead of using a NN that directly produces an estimate $\hat{\mathbf{x}}^{(u)}$ of the true location $\mathbf{x}^{(u)}$ of UE $u$, which is the de facto standard approach [15], [16], [19], [20], [22], [23], we propose to use a probabilistic description of UE location. This has the advantage that the NN output contains valuable information that can be used during downstream processing, e.g., (i) to extract reliability estimates on the estimated UE location and (ii) to fuse multiple probability maps to improve positioning accuracy in multi-antenna and/or multi-point scenarios. In addition, a probabilistic description of location can also resolve "conflicts," which may arise if two spatial locations generate similar CSI. Finally, the concept of probability maps can also improve NN training for CSI-based positioning.
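To make points (ii) and the "conflict" argument concrete, the sketch below fuses two probability maps defined on the same grid with a normalized elementwise product. This is only one possible fusion rule, assumed here for illustration; the fusion methods actually proposed in this paper are the subject of Section IV. The function name `fuse_product`, the `eps` safeguard, and the two toy maps are hypothetical.

```python
def fuse_product(maps, eps=1e-12):
    """Fuse PMFs defined on the same K grid points by a normalized product.

    eps keeps the product from collapsing to all zeros when the maps
    have disjoint support (an assumption of this sketch).
    """
    K = len(maps[0])
    fused = [1.0] * K
    for p in maps:
        if len(p) != K:
            raise ValueError("all maps must be defined on the same grid")
        fused = [f * (q + eps) for f, q in zip(fused, p)]
    total = sum(fused)
    return [f / total for f in fused]

# Four grid points. AP 1 cannot tell points 0 and 3 apart (similar CSI);
# AP 2 cannot tell points 2 and 3 apart.
p_ap1 = [0.45, 0.05, 0.05, 0.45]
p_ap2 = [0.05, 0.05, 0.45, 0.45]

p_fused = fuse_product([p_ap1, p_ap2])
best = max(range(len(p_fused)), key=p_fused.__getitem__)
```

Each individual map is ambiguous between two grid points, whereas the fused map concentrates most of its mass on the single point that both APs consider plausible. Note that only the K-dimensional maps, and not the raw CSI, would need to reach the centralized processor in such a scheme.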

1) Basics of Probability Maps:

We overlay a set of $K$ grid points $\mathbf{g}_k \in \mathbb{R}^D$, $k = 1, \ldots, K$, over the space that will be used to perform positioning. Here, $D$ is typically two or three and represents the number of spatial dimensions used to perform positioning, and the convex hull over all grid points

$$\mathcal{H} = \Big\{ \sum_{k=1}^{K} \alpha_k \mathbf{g}_k \;\Big|\; (\alpha_k \in \mathbb{R}_+, \forall k) \wedge \sum_{k=1}^{K} \alpha_k = 1 \Big\} \tag{10}$$

must include the target area in which localization will be performed. The probability map $\mathbf{p}^{(u)}[b] \in [0,1]^K$ represents the probability of UE $u$ being located exactly at each grid point, i.e., we have that $p_k^{(u)}[b] \in [0,1]$ for $k = 1, \ldots, K$ and $\sum_{k=1}^{K} p_k^{(u)}[b] = 1$. Note that the last softmax layer in the proposed NN structure shown in Figure 3 ensures that the generated outputs correspond to probability mass functions (PMFs). Furthermore, if the probabilities contained in $\mathbf{p}^{(u)}[b]$ indeed model the UE's position, we can compute the expected location of UE $u$ as follows:

$$\hat{\mathbf{x}}^{(u)}[b] = \sum_{k=1}^{K} \mathbf{g}_k \, p_k^{(u)}[b]. \tag{11}$$

By defining the $D \times K$ grid point matrix $\mathbf{G} = [\mathbf{g}_1, \ldots, \mathbf{g}_K]$, the expected location is simply $\hat{\mathbf{x}}^{(u)}[b] = \mathbf{G}\mathbf{p}^{(u)}[b]$. Hence, one could easily augment the NN shown in Figure 3 to directly generate an estimate of the UE location $\hat{\mathbf{x}}^{(u)}[b]$ by adding an additional output layer that is linear (with weights corresponding to $\mathbf{G}$ and no bias terms) and untrainable.

Remark 7.

The selection of grid points can either form an equispaced rectangular grid for $D = 2$ (or an equispaced cubic grid for $D = 3$) that includes the target area or can be chosen arbitrarily. An example of a rectangular grid is shown in Figure 4. Irregular grid points may be useful to only cover locations that are populated or to place grid points at higher density in areas where higher positioning accuracy is required.

2) Training NNs with Probability Maps:

To ensure that the probability maps $\mathbf{p}^{(u)}[b]$ generated by the proposed NN accurately model the location of UE $u$ being at grid point $\mathbf{g}_k$ with probability $p_k^{(u)}[b]$, the network must be trained accordingly. Assume that we have a training set with CSI-feature vectors $\{\mathbf{f}^{(u)}[b]\}_{u=1}^{U'}$ obtained from $U'$ distinct locations $\{\mathbf{x}^{(u)}\}_{u=1}^{U'}$. To train the NN shown in Figure 3, we need to compute reference probability maps $\{\mathbf{p}^{(u)}[b]\}_{u=1}^{U'}$ associated with the ground-truth positions $\{\mathbf{x}^{(u)}\}_{u=1}^{U'}$. Unfortunately, given a position $\mathbf{x}^{(u)}$ there are, in general, infinitely many probability maps $\mathbf{p}^{(u)}[b]$ for which $\mathbf{x}^{(u)} = \mathbf{G}\mathbf{p}^{(u)}[b]$ holds and $\mathbf{p}^{(u)}[b]$ is a PMF. To address this issue, we propose to select the probability map for which the error variance is minimized; this choice only activates probabilities associated with grid points that are near the ground-truth location.

To compute such minimum-variance probability maps, we reiterate that (11) is nothing but the expected location of UE $u$. The $D \times D$ covariance matrix is then given by

$$\mathbf{C}^{(u)}[b] = \sum_{k=1}^{K} p_k^{(u)}[b] \, \big(\mathbf{g}_k - \hat{\mathbf{x}}^{(u)}[b]\big)\big(\mathbf{g}_k - \hat{\mathbf{x}}^{(u)}[b]\big)^T, \tag{12}$$

and the combined variance is $\sigma^2[b] = \mathrm{tr}(\mathbf{C}^{(u)}[b])$. By defining the vector $\mathbf{v}^{(u)}[b] \in \mathbb{R}^K$ with entries $v_k^{(u)}[b] = \|\mathbf{g}_k - \hat{\mathbf{x}}^{(u)}[b]\|^2$, $k = 1, \ldots, K$, we have that $\sigma^2[b] = \langle \mathbf{v}^{(u)}[b], \mathbf{p}^{(u)}[b] \rangle$.
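The quantities in (11) and (12) are straightforward to evaluate numerically. The following NumPy sketch is only an illustration under stated assumptions: the 2 x 2 grid, the example PMF, and the helper names `grid_matrix` and `map_statistics` are hypothetical and not part of the proposed pipeline. It computes the expected location and the covariance matrix, and checks that the trace of the covariance equals the inner product of the squared-distance vector with the probability map.

```python
import numpy as np

def grid_matrix(xs, ys):
    """Return the D x K grid point matrix G (D = 2) of a rectangular grid."""
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    return np.stack([gx.ravel(), gy.ravel()], axis=0)

def map_statistics(G, p):
    """Expected location (11), covariance (12), and combined variance of a map."""
    x_hat = G @ p                      # (11): expected location
    diff = G - x_hat[:, None]          # columns are g_k - x_hat
    C = (diff * p) @ diff.T            # (12): sum_k p_k (g_k - x_hat)(g_k - x_hat)^T
    v = np.sum(diff ** 2, axis=0)      # v_k = ||g_k - x_hat||^2
    return x_hat, C, float(v @ p)      # <v, p> is the combined variance

# Hypothetical 2 x 2 grid with points (0,0), (0,1), (1,0), (1,1) and an example PMF.
G = grid_matrix(np.array([0.0, 1.0]), np.array([0.0, 1.0]))
p = np.array([0.1, 0.2, 0.3, 0.4])
x_hat, C, sigma2 = map_statistics(G, p)
print(x_hat, np.trace(C), sigma2)
```

In this toy example, the expected location is (0.7, 0.6) and both variance expressions evaluate to 0.45.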
Hence, we can solve the following convex optimization problem to learn a minimum-variance probability map $\mathbf{p}^{(u)}[b]$ from the ground-truth location $\mathbf{x}^{(u)}$:

$$\begin{array}{ll} \underset{\mathbf{p} \in \mathbb{R}^K}{\text{minimize}} & \mathbf{p}^T \mathbf{v}^{(u)}[b] \\ \text{subject to} & \|\mathbf{G}\mathbf{p} - \mathbf{x}^{(u)}\| \le \varepsilon, \\ & \sum_{k=1}^{K} p_k = 1, \quad p_k \in [0,1], \ \forall k. \end{array} \tag{13}$$

Here, $\varepsilon > 0$ can be used to trade off accuracy vs. variance. Problems of this form can easily be solved using off-the-shelf convex solvers, such as CVX [61], or customized solvers that build on Douglas-Rachford splitting [62]. Note that if a ground-truth position is outside the convex hull $\mathcal{H}$ spanned by the grid points as defined in (10), then the optimization problem may no longer be feasible; for positions within the convex hull, the optimization problem in (13) is always feasible.

Remark 8.

For grid points on an equispaced $D = 2$ grid, the problem in (13) can be simplified by identifying the nearest four grid points (two in the x-direction; two in the y-direction) to the target location $\mathbf{x}^{(u)}$ and assigning the nonzero probabilities to these four grid points while minimizing the variance.

After learning the minimum-variance probability maps $\{\mathbf{p}^{(u)}\}_{u=1}^{U'}$ for each ground-truth position $\mathbf{x}^{(u)}$, we can learn the NN parameters $\boldsymbol{\theta}_b$. To this end, we use the extracted CSI features $\{\mathbf{f}^{(u)}[b]\}_{u=1}^{U'}$ and the probability maps $\{\mathbf{p}^{(u)}\}_{u=1}^{U'}$ associated with the ground-truth positions $\{\mathbf{x}^{(u)}\}_{u=1}^{U'}$, and we train the NN using a symmetric cross-entropy loss function.

Remark 9.

We have observed that learning the NN parameters using a cross-entropy loss resulted in superior positioning accuracy compared with training the same network augmented with an additional untrainable grid point layer $\mathbf{G}$ and a mean-square error loss between the estimated and ground-truth positions.

Footnote: Uniqueness depends on the choice of the trade-off parameter $\varepsilon$.

IV. PROBABILITY FUSION

As illustrated in Figure 2(b), we are interested in fusing multiple probability maps obtained from different APs and/or transmit antennas in order to improve positioning accuracy. We now propose three different methods for probability fusion: probability conflation, Gaussian conflation, and NN-based probability fusion. A comparison of these three probability fusion approaches is shown in Section V.

A. Probability Conflation

Assume that we have multiple neural networks that generate different probability maps for a given UE $u$, e.g., obtained from different APs $\mathbf{p}^{(u)}[b]$, $b = 1, \ldots, B$, or from different transmit antennas $\mathbf{p}_m^{(u)}[b]$, $m = 1, \ldots, M_T$, or a combination of both. To simplify notation, assume that $B'$ neural networks generate a collection of $B'$ probability maps denoted by $\{\mathbf{p}^{(u)}[b]\}_{b=1}^{B'}$, irrespective of whether these are obtained from different APs or transmit antennas. Our goal is now to fuse this collection of probability maps into a single probability map $\bar{\mathbf{p}}^{(u)}$, which can then be used to generate an improved estimate of the $u$th UE's position by computing the expected position $\hat{\mathbf{x}}^{(u)} = \mathbf{G}\bar{\mathbf{p}}^{(u)}$.

The idea of combining multiple PMFs into a single PMF that more accurately describes the observed quantity has been widely studied in the literature; see, e.g., [52] and the references therein. Intuitively, combining PMFs for UE positioning should automatically give more weight to probability maps with smaller variance. Ideally, the fused probability map should provide a more accurate estimate of the UE's position than solely using the most reliable probability map. To achieve this goal, we propose to use probability conflation as put forward in [52, Def. 2.7]. The approach is straightforward: simply compute the (unnormalized) sub-PMF via an entry-wise (Hadamard) product as

$$\mu_k^{(u)} = \prod_{b=1}^{B'} p_k^{(u)}[b], \quad k = 1, \ldots, K, \tag{14}$$

followed by normalizing the fused sub-PMF vector $\boldsymbol{\mu}^{(u)}$ to the fused PMF according to

$$\bar{\mathbf{p}}^{(u)} = \frac{\boldsymbol{\mu}^{(u)}}{\|\boldsymbol{\mu}^{(u)}\|_1}. \tag{15}$$

As demonstrated in [52, Sec. 4], probability conflation cannot improve the amount of information contained in all probability maps $\{\mathbf{p}^{(u)}[b]\}_{b=1}^{B'}$, but the resulting conflated PMF can be shown to be optimal (among other properties) in terms of minimizing the loss of Shannon information.

Remark 10.

In practice, probability conflation requires one to pass all probability maps $\{\mathbf{p}^{(u)}[b]\}_{b=1}^{B'}$ with a total number of $K \times B'$ real numbers to a centralized processor, which performs the computations in (14) and (15).

Footnote: According to [52, Sec. 4], the Shannon information of an event $A$ is defined as $S(A) = -\log_2(P[A])$, where $P[A]$ is the probability of $A$.

B. Gaussian Conflation

While probability conflation requires the transfer of $K \times B'$ real numbers, we can use an alternative conflation approach that reduces the amount of information transfer. This probability fusion approach is inspired by the method used to train probability maps in Section III-C2, where we compute the mean and variance of a probability map. Let us assume that the mean UE position $\hat{\mathbf{x}}^{(u)}[b]$ can be modeled as follows:

$$\hat{\mathbf{x}}^{(u)}[b] = \mathbf{x}^{(u)} + \mathbf{e}^{(u)}[b]. \tag{16}$$

Here, $\mathbf{x}^{(u)}$ is the true position and the error vector $\mathbf{e}^{(u)}[b] \in \mathbb{R}^D$ is assumed to be zero-mean. Given a probability map $\mathbf{p}^{(u)}[b]$, the mean position can be computed as in (11). By furthermore assuming that the entries in the error vector $\mathbf{e}^{(u)}[b]$ are pairwise uncorrelated (meaning that the positioning errors in each spatial dimension are uncorrelated), its covariance matrix $\mathbf{K}^{(u)}[b] = \mathrm{diag}(\mathbf{C}^{(u)}[b])$ corresponds to the main diagonal of the covariance matrix $\mathbf{C}^{(u)}[b]$ defined in (12). With the model in (16), we have $B'$ "noisy" observations of the true location $\mathbf{x}^{(u)}$. By assuming that the entries in the error vector $\mathbf{e}^{(u)}[b]$ are Gaussian and that the error vectors are pairwise independent across observations $b = 1, \ldots, B'$, we can now perform Gaussian conflation as analyzed in [52, Thm. 6.1]. The optimal combination of mean positions $\hat{\mathbf{x}}^{(u)}[b]$, $b = 1, \ldots, B'$, in terms of minimizing the post-fusion error covariance (or mean-square error) is given by

$$\hat{x}_d^{(u)} = \frac{\sum_{b=1}^{B'} \big[\mathbf{K}^{(u)}[b]\big]^{-1}_{d,d} \, \hat{x}_d^{(u)}[b]}{\sum_{b=1}^{B'} \big[\mathbf{K}^{(u)}[b]\big]^{-1}_{d,d}}, \quad d = 1, \ldots, D. \tag{17}$$

Intuitively, Gaussian fusion de-weights position estimates with higher variance. The diagonal entries of the error covariance matrix $\mathbf{K}^{(u)}$ of the fused estimate $\hat{x}_d^{(u)}$ from (17) satisfy

$$\big[\mathbf{K}^{(u)}\big]^{-1}_{d,d} = \sum_{b=1}^{B'} \big[\mathbf{K}^{(u)}[b]\big]^{-1}_{d,d}, \quad d = 1, \ldots, D. \tag{18}$$

Remark 11.

In practice, Gaussian conflation requires only $B'$ mean-variance pairs for each of the $D$ dimensions, which requires a transfer of $2 \times D \times B'$ real numbers to a centralized processor, which computes an improved location estimate as in (17).

C. Neural-Network-Based Probability Fusion

Besides the two conflation methods discussed above, there exist other probability fusion methods. A straightforward approach is to compute a simple unweighted average as

$$\hat{\mathbf{x}}^{(u)} = \frac{1}{B'} \sum_{b=1}^{B'} \hat{\mathbf{x}}^{(u)}[b] = \frac{1}{B'} \sum_{b=1}^{B'} \mathbf{G}\mathbf{p}^{(u)}[b], \tag{19}$$

which is a special case of Gaussian conflation in (17) assuming that all error variances are equal. While this simple averaging approach can serve as a baseline probability fusion method, it can be improved by including this idea into a neural network and optimizing the linear combination weights.

We first train the $B'$ neural networks $g_{\boldsymbol{\theta}_b}$, $b = 1, \ldots, B'$. We then stack the $B'$ neural networks along with their individual probability map outputs $\mathbf{p}^{(u)}[b]$, $b = 1, \ldots, B'$, and add a bias-free linear layer whose weight matrix is initialized with

$$\bar{\mathbf{G}} = \frac{1}{B'} \big[\underbrace{\mathbf{G}, \ldots, \mathbf{G}}_{B' \text{ times}}\big] \tag{20}$$

to its output. This final linear layer, combined with the stacked probability map vector

$$\bar{\mathbf{p}}^{(u)} = \big[\mathbf{p}^{(u)}[1]^T, \ldots, \mathbf{p}^{(u)}[B']^T\big]^T, \tag{21}$$

computes $\hat{\mathbf{x}}^{(u)} = \bar{\mathbf{G}}\bar{\mathbf{p}}^{(u)}$ as in (19). For the same training set consisting of ground-truth locations $\{\mathbf{x}^{(u)}\}_{u=1}^{U'}$ and associated CSI-feature vectors $\{\mathbf{f}^{(u)}[b]\}_{u=1}^{U'}$, we then continue weight learning of the final layer $\bar{\mathbf{G}}$ by minimizing the mean distance error (MDE) loss defined as

$$L_{\text{MDE}} = \frac{1}{U'} \sum_{u=1}^{U'} \big\|\mathbf{x}^{(u)} - \hat{\mathbf{x}}^{(u)}\big\|. \tag{22}$$

Since we continue learning $\bar{\mathbf{G}}$ after initializing it with the matrix in (20), we expect this method to perform no worse than keeping $\bar{\mathbf{G}}$ fixed and performing averaging as in (19).

Remark 12.

We note that one could also retrain the weights and biases contained in $\boldsymbol{\theta}_b$, $b = 1, \ldots, B'$, of the $B'$ networks when learning the weights in the matrix $\bar{\mathbf{G}}$. We have, however, not observed accuracy improvements by doing so.

V. RESULTS

We now evaluate the performance of the proposed positioning pipeline for a range of indoor CSI measurements. We first describe the system setup, measurement scenarios, and performance metrics. We then show accuracy results for a range of multi-antenna and multi-point probability fusion methods.

A. Measurement Setup

Our measurement setup consists of five components: (i) A portable UE with wireless LAN (WLAN) connectivity; we use a Raspberry Pi Model 4 that is equipped with a two-antenna IEEE 802.11ac transceiver. (ii) A robot equipped with an embedded processor to move the UE; we use an iRobot Roomba Create 2 controlled by the Raspberry Pi. (iii) A WLAN AP that enables the extraction of CSI; the AP is equipped with a four-antenna IEEE 802.11ac transceiver operating at 5 GHz with 80 MHz bandwidth and provides an API to access the raw per-subcarrier CSI for each receive and transmit antenna. (iv) A precise positioning system for ground-truth position extraction; we use two different systems depending on the scenario: a WorldViz PPT-N active point tracking system [50] with sub-millimeter positioning accuracy enabled by triangulation with four infrared (IR) camera readings of active IR transmitters, and a Vicon Vero passive point tracking system [51] with sub-millimeter positioning accuracy enabled by triangulation with twelve motion capture camera readings of passive reflective markers. (v) A host computer that collects CSI measurements, runs the precision positioning systems, and controls the robot. Figure 4 illustrates the measurement setup.

The data collection procedure is as follows: We control the robot's position to follow a predefined path in piecewise linear movements over two dimensions $\mathbf{x} = [x_1, x_2]^T$. The Raspberry Pi continuously transmits high-quality images from an on-board camera to the host PC via the two transmit antennas ($M_T = 2$). At the same time, the AP is receiving data and CSI from the four antennas ($M_R = 4$), and the precision positioning system is extracting ground-truth position information; all the data is stored at the host computer. Since we measure CSI and ground-truth positions from two separate sources, we first synchronize the operating-system clocks of the Raspberry Pi and the precision positioning system running at the host PC via the network time protocol (NTP). We then match the measurement times of the CSI and the precision positioning system. For the multi-point measurement, we perform two separate measurement campaigns, i.e., we first measure CSI at AP1 and ground-truth position for the entire area, and later we measure CSI at AP2 and ground-truth position while following approximately the same track with the robot. We then match the CSI from both APs so that there is less than … cm ground-truth position difference between both measurements.

Fig. 4. Measurement setup. We use a robot that carries a WLAN transmitter; one or multiple APs then record CSI measurements. An accurate position estimate of the robot $\mathbf{x}^{(u)}$ is extracted using a multi-camera precision localization system and recorded for NN training and testing.

Fig. 5. Floor plans used to perform indoor positioning experiments. (a) Lab floor plan with an area of … m × … m. (b) Living room (bottom; area … m × … m) and kitchen (top; area … m × … m) floor plans. The access point (AP) locations are marked by circles.

Remark 13.

Since our multi-point measurements are acquired at two different time instants, more than tens of minutes apart, our positioning results in Section V-D imply that accurate positioning from multi-point measurements does not need exact time synchronization or simultaneous recording of CSI.
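The position-based matching of two separately recorded campaigns can be sketched as follows. This is only an illustration of the idea, not the procedure used for our datasets: the function name `match_by_position`, the example coordinates, and the threshold `max_dist` are hypothetical, and the sketch does not enforce a one-to-one pairing.

```python
import numpy as np

def match_by_position(pos_ap1, pos_ap2, max_dist):
    """Pair each AP1 sample with its nearest AP2 sample (in ground-truth
    position) and keep only pairs that are closer than max_dist."""
    # Pairwise distances between ground-truth positions, shape (N1, N2).
    d = np.linalg.norm(pos_ap1[:, None, :] - pos_ap2[None, :, :], axis=2)
    nearest = np.argmin(d, axis=1)
    keep = d[np.arange(len(pos_ap1)), nearest] < max_dist
    return np.flatnonzero(keep), nearest[keep]

# Hypothetical ground-truth positions (in meters) from the two campaigns.
pos_ap1 = np.array([[0.00, 0.00], [1.00, 0.00], [2.00, 0.00]])
pos_ap2 = np.array([[1.004, 0.0], [0.003, 0.0], [5.0, 5.0]])
idx1, idx2 = match_by_position(pos_ap1, pos_ap2, max_dist=0.01)
print(idx1, idx2)
```

The returned index arrays select the CSI measurements of AP1 and AP2 that belong to (approximately) the same position; the third AP1 sample has no sufficiently close partner and is dropped.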

B. Measurement Scenarios

In order to evaluate the proposed multi-antenna and multi-point probability fusion positioning pipeline, we consider four different scenarios measured in three different locations: lab space, living room, and kitchen. Figure 5 shows the floor plans of these locations as well as the AP positions.

a) Single-Point LoS Lab Scenario:

We collected CSI and ground-truth position in the lab space shown in Figure 5(a) under line-of-sight (LoS) conditions. For the training set, the robot passes through the lab space by following a grid-like path three different times; Figure 6(a) shows the robot's path. For the test set, the robot follows a "VIP"-shaped path (shown in blue).

b) Single-Point Non-LoS Living Room Scenario:

We collected CSI and ground-truth position in the living room shown at the bottom of Figure 5(b) under non-LoS conditions; non-LoS conditions are achieved by placing AP1 behind a TV. For the training set, the robot was moving randomly through a predefined area in the living room; Figure 6(b) shows the robot's path. For the test set, the robot follows a "VIP"-shaped path (shown in blue).

c) Single-Point Non-LoS Kitchen Scenario:

We collected CSI and ground-truth position in the kitchen shown at the top of Figure 5(b) under non-LoS conditions as we used AP1. For the training set, the robot was moving randomly through a predefined area in the kitchen; Figure 6(c) shows the robot's path. For the test set, we use … % randomly selected measurements from the robot's path (shown in blue).

d) Multi-Point LoS Living Room Scenario:

We collected CSI and ground-truth position in the living room area shown at the bottom of Figure 5(b) under LoS conditions, where we used both AP1 and AP2. For the training set, we have recorded CSI at two different times as detailed at the end of Section V-A, where the robot was following a grid-like path; Figure 6(d) shows the robot's path. For the test set, the robot follows a rectangular-like path (shown in blue).

Remark 14.

We will make our CSI and ground-truth position datasets as well as our TensorFlow code available on GitHub after (possible) acceptance of the paper.

C. Performance Metrics and Positioning Methods

For all experiments, we use the CSI features detailed in Section III-A and the NN topology discussed in Section III-B. For the probability maps, we use a $22 \times 22$ regular rectangular grid, which results in probability maps of dimension $K = 484$. In order to evaluate the positioning performance of the compared methods, we first obtain position estimates for the test set and then compute three different metrics: (i) the mean distance error (MDE), i.e., the expression in (22) evaluated over the test set, (ii) the median distance error, and (iii) the 95th percentile error. The median and 95th percentile distance errors were obtained by sorting the distances $\|\mathbf{x}^{(u)} - \hat{\mathbf{x}}^{(u)}\|$, $u = 1, \ldots, U'$, of the test set in ascending order and extracting the 50% and 95% distances, respectively.

For the multi-antenna and multi-point scenarios, we use the probability fusion methods discussed in Section IV. Concretely, we compare the performance of CSI-based positioning for the following methods. For single-point and single-antenna experiments, we train a single neural network (NN) for one transmit antenna and one AP; we label these methods as "1 NN, AP $b$ TX $m$," where "1 NN" implies that we use a single NN, $b = 1, \ldots, B$ is the AP number, and $m = 1, \ldots, M_T$ is the transmit (TX) antenna number. For multi-antenna and/or multi-point experiments, we use, as a baseline, a single NN in which we stack all CSI features $\mathbf{f}_m^{(u)}[b]$ for $b = 1, \ldots, B$ and $m = 1, \ldots, M_T$ in a single long CSI-feature vector; we label this method as "1 NN, stacked features." In what follows, we will compare this method to probability-fusion-based approaches. For fusion-based methods, we train two or four NNs depending on the scenario. For two transmit antennas and one AP (i.e., multi-antenna and single-point) or for two APs and one transmit antenna (i.e., single-antenna and multi-point), we train two NNs and perform probability fusion from two probability maps.
We consider the following methods: "2 NN, averaging," where we implement unweighted averaging as in (19); "2 NN, prob. conflation," where we implement probability conflation as detailed in Section IV-A; "2 NN, Gaussian conflation," where we implement Gaussian conflation as detailed in Section IV-B; and "2 NN, NN fusion," where we implement the NN-based fusion as detailed in Section IV-C. For two transmit antennas and two APs (i.e., multi-antenna and multi-point), we train four NNs and perform probability fusion from four probability maps; the associated methods are labeled as "4 NN, averaging," "4 NN, prob. conflation," "4 NN, Gaussian conflation," and "4 NN, NN fusion."

D. Positioning Results

Figure 7 shows bar plots evaluated on the test sets for multi-antenna scenarios. Figure 8 shows bar plots evaluated for a multi-point scenario. Figure 9 shows bar plots for a multi-antenna multi-point scenario. For all three figures, the left bar plot shows the mean distance error, the middle bar plot shows the median distance error, and the right bar plot shows the 95th percentile distance error.
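The three metrics reported in these bar plots can be computed from the test-set positions as sketched below. The function name `distance_error_metrics` and the indexing convention for the 50% and 95% distances (the smallest sorted distance that covers the requested fraction of samples) are assumptions made for illustration; other percentile conventions, e.g., with interpolation, give slightly different values.

```python
import numpy as np

def distance_error_metrics(x_true, x_est):
    """Mean, median, and 95th percentile distance error for N x D arrays."""
    d = np.sort(np.linalg.norm(x_true - x_est, axis=1))  # ascending distances
    n = len(d)

    def pick(percent):
        # Ceiling of percent * n / 100 in integer arithmetic, as 0-based index.
        return d[-(-percent * n // 100) - 1]

    return d.mean(), pick(50), pick(95)

# Toy test set: 20 estimates with errors of 1 cm, 2 cm, ..., 20 cm.
x_true = np.zeros((20, 2))
x_est = np.column_stack([0.01 * np.arange(1, 21), np.zeros(20)])
mde, median_err, p95_err = distance_error_metrics(x_true, x_est)
print(mde, median_err, p95_err)
```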

1) Multi-Antenna Results:

Figures 7(a), 7(b), and 7(c) show positioning results corresponding to the multi-antenna scenarios a), b), and c) detailed in Section V-B. We observe that when using CSI features from different transmit antennas but from a single AP, the accuracy can vary significantly across antennas. For the probability fusion methods, we see that simple averaging and NN-based fusion perform equally well. The best-performing methods are the single NN with stacked features (labeled "1 NN, stacked features"), probability conflation (labeled "2 NN, prob. conflation"), and Gaussian conflation (labeled "2 NN, Gaussian conflation").
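For reference, the probability conflation rule of Section IV-A, i.e., the entry-wise product in (14) followed by the normalization in (15), can be sketched in a few lines of NumPy. The function name `conflate` and the two example maps are illustrative assumptions; normalization is done by the sum of the entries so that the result is again a PMF.

```python
import numpy as np

def conflate(prob_maps):
    """Fuse B' probability maps (rows of a B' x K array) via (14) and (15)."""
    mu = np.prod(prob_maps, axis=0)   # (14): entry-wise product over the maps
    total = mu.sum()
    if total == 0.0:
        raise ValueError("maps have disjoint support; conflation is undefined")
    return mu / total                 # (15): renormalize to a PMF

# Two hypothetical probability maps over K = 3 grid points.
p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.5, 0.4, 0.1])
p_bar = conflate(np.stack([p1, p2]))
print(p_bar)
```

Grid points that both maps consider likely are reinforced, whereas grid points that either map considers unlikely are suppressed. For a large number of maps, one might prefer to accumulate logarithms of the probabilities to avoid numerical underflow.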

2) Multi-Point Results:

Figure 8 shows positioning results corresponding to the multi-point scenario d) detailed in Section V-B, where we only use one transmit antenna. We observe that the accuracy of using AP1 is superior to that of AP2. Furthermore, we see that when fusing the probability maps from AP1 and AP2, probability conflation and Gaussian conflation perform as well as the single NN with stacked features. However, the amount of data to be transferred to a centralized processor for the Gaussian conflation approach is significantly lower than that of the single NN with stacked features and of probability conflation.

Fig. 6. Ground-truth positions collected for different scenarios: (a) Single-point LoS lab; (b) single-point non-LoS living room; (c) single-point non-LoS kitchen; (d) multi-point LoS living room. The gradient-colored curves represent locations used for training; the blue curves correspond to the test set.
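The Gaussian conflation rule in (17) amounts to an inverse-variance weighted average per spatial dimension. The NumPy sketch below is an illustration only: the function name `gaussian_conflation` and the example values are assumptions, all variances are assumed to be strictly positive, and the fused variance is returned as the reciprocal of the summed inverse variances, which is the standard result for independent Gaussian observations.

```python
import numpy as np

def gaussian_conflation(means, variances):
    """Fuse B' per-dimension means and variances (both B' x D arrays)."""
    w = 1.0 / variances                                   # inverse variances
    fused_mean = (w * means).sum(axis=0) / w.sum(axis=0)  # weighted mean, cf. (17)
    fused_var = 1.0 / w.sum(axis=0)                       # variance after fusion
    return fused_mean, fused_var

# Two hypothetical position estimates (rows) in D = 2 dimensions.
means = np.array([[1.0, 2.0], [3.0, 2.0]])
variances = np.array([[1.0, 4.0], [3.0, 4.0]])
fused_mean, fused_var = gaussian_conflation(means, variances)
print(fused_mean, fused_var)
```

In the first dimension, the estimate with the smaller variance dominates (fused mean 1.5 instead of the unweighted average 2.0), and the fused variance is smaller than either input variance.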

3) Multi-Antenna Multi-Point Results:

Figure 9 shows positioning results corresponding to the multi-point scenario d) detailed in Section V-B, where we use both transmit antennas. We see that the accuracy of the first transmit antenna (TX1) of AP1 is superior to that of the other antenna-AP combinations. Furthermore, we see that Gaussian conflation outperforms all other fusion approaches for the considered performance metrics.
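Among the fusion approaches compared here, NN-based fusion starts from the averaging rule in (19): the stacked weight matrix in (20) applied to the stacked probability map vector in (21) reproduces the unweighted average before any further training. The following NumPy check illustrates this for four maps; the random grid, the random PMFs, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, B = 2, 6, 4                        # dimensions, grid points, number of maps
G = rng.uniform(-1.0, 1.0, size=(D, K))  # hypothetical grid point matrix
P = rng.dirichlet(np.ones(K), size=B)    # B x K array, each row is a PMF

x_avg = np.mean(P @ G.T, axis=0)         # (19): average of the B estimates G p[b]

G_bar = np.hstack([G] * B) / B           # (20): initial weights of the linear layer
p_stacked = P.reshape(-1)                # (21): stacked probability map vector
x_stacked = G_bar @ p_stacked            # output of the bias-free linear layer

print(np.allclose(x_avg, x_stacked))
```

Training the final layer then only changes the entries of the stacked weight matrix, starting from this averaging solution.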

4) Visualization of Probability Fusion:

Figure 10 illustrates the efficacy of probability fusion, where we show the ground-truth positions in Figure 10(a) on the test set for scenario d) in Section V-B and the estimated positions obtained with the proposed methods. Figure 10(b) shows that the estimated locations at AP2 from the second transmit antenna (TX2) result in quite a few outliers. When using a single NN with stacked features and multiple NNs with Gaussian conflation in Figure 10(c) and Figure 10(d), respectively, one can see that the position accuracy significantly improves. Clearly, treating the information contained in the probability maps as Gaussians and fusing these mean-variance pairs results in accurate indoor positioning while only requiring a minimum amount of data transmission to a centralized processor that performs fusion.

Remark 15.

We emphasize that the 1 NN method with a single stacked feature vector results in an excessively large input dimension in the first layer, which substantially increases computational complexity and storage. Furthermore, the stacked feature approach requires centralized processing of all CSI features, which requires a large amount of data to be transferred to a centralized processor. In contrast, Gaussian conflation requires the transfer of only mean-variance pairs to the centralized processor, while resulting in similar or often superior accuracy.
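To make the difference in data transfer concrete, the following short sketch counts the real numbers sent to the centralized processor per UE, using the counts stated in Remarks 10 and 11: $K \times B'$ for probability conflation and $2 \times D \times B'$ (one mean-variance pair per dimension and map) for Gaussian conflation. The function name `transfer_counts` is hypothetical; the values $K = 484$, $D = 2$, and four maps correspond to the multi-antenna multi-point setting considered above.

```python
def transfer_counts(K, D, num_maps):
    """Real numbers sent to the centralized processor per UE."""
    conflation = K * num_maps      # full probability maps
    gaussian = 2 * D * num_maps    # one mean and one variance per dimension
    return conflation, gaussian

conflation, gaussian = transfer_counts(K=484, D=2, num_maps=4)
print(conflation, gaussian)  # 1936 16
```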

VI. CONCLUSIONS

We have proposed CSI-based indoor positioning methods that are able to fuse one or multiple probability maps which describe the UEs' positions. We have used an NN-based positioning pipeline that takes in features that are designed for MIMO-OFDM-based systems and robust to typical hardware impairments, and that generates probability maps. We have proposed three different fusion methods for the computed probability maps, which reduce the amount of data to be transferred to a centralized processor that estimates the UE position. To demonstrate the effectiveness of the proposed positioning methods, we have evaluated our methods on four real-world indoor positioning datasets, which include multi-transmit-antenna and multi-AP scenarios. Our comparison reveals three facts: (i) Indoor position accuracy of a few centimeters is possible from IEEE 802.11ac measurements, (ii) simple probability fusion techniques can significantly improve positioning accuracy while reducing the amount of data to be transported to a centralized processor, and (iii) multi-point probability fusion does not require accurate synchronization between the APs. We believe that the proposed probability fusion approach paves the way for other positioning systems or scenarios in which multiple features from different sensors are available.

There are many avenues for future work. First and foremost is the exploration of channel-charting-based methods that reduce the need for dedicated CSI and ground-truth position measurement campaigns. Second is the development of accurate positioning pipelines that fuse multiple sensor modalities (besides CSI), which is part of ongoing work.

VII. ACKNOWLEDGMENTS

The authors would like to thank O. Castañeda and B. Rappaport for discussions on CSI-based positioning using neural networks. We also thank Prof. K. Petersen for allowing us to use the Vicon positioning system [51] and the lab space at Cornell University shown in Figure 5(a).

REFERENCES

[1] S. Han, Z. Gong, W. Meng, C. Li, and X. Gu, "Future alternative positioning, navigation, and timing techniques: A survey," IEEE Wireless Commun., vol. 23, no. 6, pp. 154–160, Oct. 2016.
[2] N. Fallah, I. Apostolopoulos, K. Bekris, and E. Folmer, "Indoor human navigation systems: A survey," Interacting with Computers, vol. 25, no. 1, pp. 21–33, Jan. 2013.
[3] F. Wen, H. Wymeersch, B. Peng, W. P. Tay, H. C. So, and D. Yang, "A survey on 5G massive MIMO localization," Digital Signal Process., vol. 94, pp. 21–28, Nov. 2019.
[4] R. F. Brena, J. P. García-Vázquez, C. E. Galván-Tejada, D. Muñoz-Rodriguez, C. Vargas-Rosales, and J. Fangmeyer, "Evolution of indoor positioning technologies: A survey," J. Sensors, vol. 2017, Mar. 2017.
[5] J. Armstrong, Y. A. Sekercioglu, and A. Neild, "Visible light positioning: A roadmap for international standardization," IEEE Commun. Mag., vol. 51, no. 12, pp. 68–73, Dec. 2013.
[6] Y.-S. Kuo, P. Pannuto, K.-J. Hsiao, and P. Dutta, "Luxapose: Indoor positioning with mobile phones and visible light," in Proc. 20th Annual Int. Conf. Mobile Comput. Networking, Sep. 2014, pp. 447–458. [Online]. Available: https://doi.org/10.1145/2639108.2639109
[7] H. Koyuncu and S. H. Yang, "A survey of indoor positioning and object locating systems," Intl. J. Comput. Science Network Security, vol. 10, no. 5, pp. 121–128, May 2010.

Fig. 7. Bar plots showing mean distance error (left), median distance error (middle), and the 95th percentile distance error (right) evaluated on the test set for three multi-antenna scenarios: (a) single-point LoS lab scenario, (b) single-point non-LoS living room scenario, and (c) single-point non-LoS kitchen scenario.

Fig. 8. Bar plots showing mean distance error (left), median distance error (middle), and the 95th percentile distance error (right) evaluated on the test set for the single-antenna multi-point LoS living room scenario detailed in Section V-B.

Fig. 9. Bar plots showing mean distance error (left), median distance error (middle), and the 95th percentile distance error (right) evaluated on the test set for the multi-antenna multi-point LoS living room scenario detailed in Section V-B.

Fig. 10. Ground-truth locations (a) and estimated locations (b), (c), and (d) for the multi-point LoS living room scenario detailed in Section V-B. (b) shows the performance of using a single NN from AP2 and TX1 (MDE = 11 cm); (c) shows the performance of 1 NN with stacked features from 2 APs and 2 TX antennas (MDE = 4.… cm); and (d) shows the performance of 4 NNs with Gaussian conflation (MDE = 3.… cm).

[8] C. Lee, Y. Chang, G. Park, J. Ryu, S.-G. Jeong, S. Park, J. W. Park, H. C. Lee, K. shik Hong, and M. H. Lee, "Indoor positioning system based on incident angles of infrared emitters," in Proc. Ann. Conf. of IEEE Ind. Electron. Soc., vol. 3, Nov. 2004, pp. 2218–2222.
[9] H. Liu, H. Darabi, P. Banerjee, and J. Liu, "Survey of wireless indoor positioning techniques and systems," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 37, no. 6, pp. 1067–1080, Nov. 2007.
[10] W. Liu, Q. Cheng, Z. Deng, H. Chen, X. Fu, X. Zheng, S. Zheng, C. Chen, and S. Wang, "Survey on CSI-based indoor positioning systems and recent advances," in Proc. Intl. Conf. Indoor Positioning and Indoor Navigation (IPIN), Sep. 2019, pp. 1–8.
[11] K. Wu, J. Xiao, Y. Yi, D. Chen, X. Luo, and L. M. Ni, "CSI-based indoor localization," IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 7, pp. 1300–1309, Jul. 2013.
[12] F. Gustafsson and F. Gunnarsson, "Mobile positioning using wireless networks: Possibilities and fundamental limitations based on available wireless network measurements," IEEE Signal Process. Mag., vol. 22, no. 4, pp. 41–53, Jul. 2005.
[13] Z. Sahinoglu, S. Gezici, and I. Guvenc, "Ultra-wideband positioning systems," Cambridge, New York, 2008.
[14] C. Zhang, Y. Ueng, C. Studer, and A. Burg, "Artificial intelligence for 5G and beyond 5G: Implementations, algorithms, and optimizations," IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 10, no. 2, pp. 149–163, Jun. 2020.
[15] M. Arnold, J. Hoydis, and S. ten Brink, "Novel massive MIMO channel sounding data applied to deep learning-based indoor positioning," in Intl. ITG Conf. Systems, Commun., Coding, Feb. 2019, pp. 1–6.
[16] X. Wang, L. Gao, S. Mao, and S. Pandey, "CSI-based fingerprinting for indoor localization: A deep learning approach," IEEE Trans. Veh. Technol., vol. 66, no. 1, pp. 763–776, Jan. 2017.
[17] Z. Yang, Z. Zhou, and Y. Liu, "From RSSI to CSI: Indoor localization via channel response," ACM Comput. Surveys, vol. 46, no. 2, pp. 1–32, Nov. 2013.
[18] E. Lei, O. Castañeda, O. Tirkkonen, T. Goldstein, and C. Studer, "Siamese neural networks for wireless positioning and channel charting," in , Sep. 2019, pp. 200–207.
[19] P. Ferrand, A. Decurninge, and M. Guillaud, "DNN-based localization from channel estimates: Feature design and experimental results," ArXiv preprint: 2004.00363, Apr. 2020.
[20] J. Vieira, E. Leitinger, M. Sarajlic, X. Li, and F. Tufvesson, "Deep convolutional neural networks for massive MIMO fingerprint-based positioning," in Proc. IEEE Intl. Symp. Personal, Indoor, Mobile Radio Commun., Oct. 2017, pp. 1–6.
[21] V. Savic and E. G. Larsson, "Fingerprinting-based positioning in distributed massive MIMO systems," in Proc. IEEE 82nd Vehic. Tech. Conf., Sep. 2015, pp. 1–5.
[22] A. Zappone, M. Di Renzo, and M. Debbah, "Wireless networks design in the era of deep learning: Model-based, AI-based, or both?" IEEE Trans. Commun., vol. 67, no. 10, pp. 7331–7376, Jun. 2019.
[23] X. Wang, L. Gao, S. Mao, and S. Pandey, "DeepFi: Deep learning for indoor fingerprinting using channel state information," in Proc. IEEE Wireless Commun. Netw. Conf., Mar. 2015, pp. 1666–1671.
[24] H. Chen, Y. Zhang, W. Li, X. Tao, and P. Zhang, "ConFi: Convolutional neural networks based indoor Wi-Fi localization using channel state information," IEEE Access, vol. 5, pp. 18 066–18 074, Sep. 2017.
[25] C. Studer, S. Medjkouh, E. Gönültaş, T. Goldstein, and O. Tirkkonen, "Channel charting: Locating users within the radio environment using channel state information," IEEE Access, vol. 6, pp. 47 682–47 698, Aug. 2018.
[26] S. Medjkouh, E. Gönültaş, T. Goldstein, O. Tirkkonen, and C. Studer, "Unsupervised charting of wireless channels," in

Proc. IEEE GlobalCommun. Conf. (GLOBECOM) , Dec. 2018, pp. 1–7.[27] P. Ferrand, A. Decurninge, L. G. Ordoñez, and M. Guillaud, “Triplet-based wireless channel charting,”

ArXiv preprint: 2005.12242 , May 2020.[28] J. Deng, S. Medjkouh, N. Malm, O. Tirkkonen, and C. Studer, “Multipointchannel charting for wireless networks,” in

Proc. IEEE Conf. Rec.Asilomar Conf. Signals, Sys., and Comp. , Feb. 2018, pp. 286–290.[29] C. Geng, H. Huang, and J. Langerman, “Multipoint channel chartingwith multiple-input multiple-output convolutional autoencoder,” in

Proc.IEEE/ION Position, Location Navigation Symp. (PLANS) , Apr. 2020, pp.1022–1028.[30] S. He and S. . G. Chan, “Wi-Fi ﬁngerprint-based indoor positioning:Recent advances and comparisons,”

IEEE Commun. Surveys Tuts. , vol. 18,no. 1, pp. 466–490, Aug. 2016.[31] Y. Ma, G. Zhou, and S. Wang, “WiFi sensing with channel stateinformation: A survey,”

ACM Comput. Surv. , vol. 52, no. 3, Jun. 2019.[32] W. Liu, Q. Cheng, Z. Deng, H. Chen, X. Fu, X. Zheng, S. Zheng,C. Chen, and S. Wang, “Survey on CSI-based indoor positioning systemsand recent advances,” in

Proc. Int. Conf. on Indoor Positioning andIndoor Navigation (IPIN) , Sep. 2019, pp. 1–8.[33] Y. Chapre, A. Ignjatovic, A. Seneviratne, and S. Jha, “CSI-MIMO:Indoor Wi-Fi ﬁngerprinting system,” in

Proc. 39th Ann. IEEE Conf.Local Computer Networks , Sep. 2014, pp. 202–209.[34] X. Wang, L. Gao, and S. Mao, “BiLoc: Bi-modal deep learning forindoor localization with commodity 5GHz WiFi,”

IEEE Access , vol. 5,pp. 4209–4220, Mar. 2017.[35] B. Berruet, O. Baala, A. Caminada, and V. Guillet, “DelFin: A deeplearning based CSI ﬁngerprinting indoor localization in IoT context,” in

Proc. Int. Conf. on Indoor Positioning and Indoor Navigation (IPIN) ,Sep. 2018, pp. 1–8.[36] L. Tang, R. Ghods, and C. Studer, “Reducing the complexity ofﬁngerprinting-based positioning using locality-sensitive hashing,” in

Proc.Asilomar Conf. Signals, Syst., Comput. , Nov. 2019, pp. 1086–1090.[37] M. Widmaier, M. Arnold, S. Dorner, S. Cammerer, and S. ten Brink,“Towards practical indoor positioning based on massive MIMO systems,”in

Proc. IEEE Veh. Technol. Conf. , Sep. 2019, pp. 1–6.[38] S. Fang, T. Lin, and K. Lee, “A novel algorithm for multipathﬁngerprinting in indoor WLAN environments,”

IEEE Trans. WirelessCommun. , vol. 7, no. 9, pp. 3579–3588, Sep. 2008.[39] M. Youssef and A. Agrawala, “The Horus WLAN location determinationsystem,” in

Proc. 3rd Intl. Conf. Mobile sys., Applications, Services , Jun.2005, pp. 205–218.[40] J. Xiao, K. Wu, Y. Yi, and L. M. Ni, “FIFS: Fine-grained indoorﬁngerprinting system,” in

Proc. 21st Intl. Conf. Computer Commun.Networks (ICCCN) , Aug. 2012.[41] X. Sun, C. Wu, X. Gao, and G. Y. Li, “Fingerprint-based localization formassive MIMO-OFDM system with deep convolutional neural networks,”

IEEE Trans. Veh. Technol. , vol. 68, no. 11, pp. 10 846–10 857, Sep. 2019.[42] D. Vasisht, S. Kumar, and D. Katabi, “Sub-nanosecond time of ﬂight oncommercial Wi-Fi cards,” in

Proc. ACM Conf. on Special Interest GroupData Commun. , vol. 45, no. 4. Association for Computing Machinery,Aug. 2015, pp. 121–122. . GÖNÜLTA¸S

ET AL. [43] Y. Liu, W. Xiong, Z. Zhu, and S. Li, “CSI based high accuracy devicefree passive localization system,” in Proc. IEEE Veh. Technol. Conf. ,Aug. 2018, pp. 1–5.[44] P. Agostini, Z. Utkovski, and S. Sta´nczak, “Channel charting: AnEuclidean distance matrix completion perspective,” in

Proc. IEEE Intl.Conf. Acoustics, Speech and Signal Proces. , May 2020, pp. 5010–5014.[45] S. Acharya and M. Kam, “Evidence combination for hard and soft sensordata fusion,” in

Proc. Intl. Conf. Inf. Fusion , Jul. 2011, pp. 1–8.[46] R. Krzysztofowicz and D. Long, “Fusion of detection probabilities andcomparison of multisensor systems,”

IEEE Trans. Syst., Man, Cybern. ,vol. 20, no. 3, pp. 665–677, May-June 1990.[47] M. Aeberhard, S. Paul, N. Kaempchen, and T. Bertram, “Object existenceprobability fusion using dempster-shafer theory in a high-level sensordata fusion architecture,” in

Proc. IEEE Intelligent Veh. Symp. , Jun. 2011,pp. 770–775.[48] G. Wen, Z. Hou, H. Li, D. Li, L. Jiang, and E. Xun, “Ensemble ofdeep neural networks with probability-based fusion for facial expressionrecognition,”

Cognitive Computation , vol. 9, no. 5, pp. 597–610, 2017.[49] N. Garcia, H. Wymeersch, E. G. Larsson, A. M. Haimovich, andM. Coulon, “Direct localization for massive MIMO,”

IEEE Trans. SignalProcess.

Trans. Amer. Math.Soc. , vol. 363, no. 6, pp. 3351–3372, Jun. 2011.[53] P. Huang, O. Castañeda, E. Gönülta¸s, S. Medjkouh, O. Tirkkonen, T. Gold-stein, and C. Studer, “Improving channel charting with representation-constrained autoencoders,” in n Proc. IEEE Int. Workshop Signal Process.Advances Wireless Commun. (SPAWC) , Aug. 2019, pp. 1–5. [54] IEEE 802.11ac-2013, “Draft standard for information technology —telecommunications and information exchange between systems — localand metropolitan area networks — speciﬁc requirements — part 11:Wireless LAN medium access control (MAC) and physical layer (PHY)speciﬁcations,” IEEE, Tech. Rep., 2013.[55] D. Alibi, U. Javed, Fei Wen, Di He, Peilin Liu, Yi Zhang, and LinggeJiang, “2D DOA estimation method based on channel state informationfor uniform circular array,” in

Proc. Intl. Conf. Ubiquitous Positioning,Indoor Navigation Location Based Services , Nov. 2016, pp. 68–72.[56] J. Brady, N. Behdad, and A. M. Sayeed, “Beamspace MIMO formillimeter-wave communications: System architecture, modeling, analy-sis, and measurements,”

IEEE Trans. Antennas Propag. , vol. 61, no. 7,pp. 3814–3827, Mar. 2013.[57] M. D. Zoltowski, M. Haardt, and C. P. Mathews, “Closed-form 3D angleestimation with rectangular arrays via DFT beamspace ESPRIT,” in

Proc.Asilomar Conf. Signals, Syst., Comput. , vol. 1, Oct. 1994, pp. 682–687.[58] S. Haene, A. Burg, P. Luethi, N. Felber, and W. Fichtner, “FFT processorfor OFDM channel estimation,” in

Proc. IEEE Int. Symp. Circuits Syst. ,May 2007, pp. 1417–1420.[59] T. Schenk,

RF imperfections in high-rate wireless systems: Impact anddigital compensation . Springer Science & Business Media, 2008.[60] C. Studer, M. Wenk, and A. Burg, “MIMO transmission with residualtransmit-RF impairments,” in

Proc. Int. ITG Workshop Smart Antennas(WSA) , Feb. 2010, pp. 189–196.[61] M. Grant and S. Boyd, “CVX: Matlab software for disciplined convexprogramming, version 2.1,” 2014.[62] J. Douglas and H. H. Rachford, “On the numerical solution of heatconduction problems in two and three space variables,”