Application of backpropagation neural networks to both stages of fingerprinting based WIPS
IEEE UPINLBS 2016, Nov. 3–4, 2016, Shanghai, P.R. China
Caifa Zhou, Andreas Wieser
ETH Zurich, IGP, Stefano-Franscini-Platz 5, 8093 Zurich, Switzerland
Email: {[email protected]; [email protected]}

Abstract—We propose a scheme to employ backpropagation neural networks (BPNNs) for both stages of fingerprinting-based indoor positioning using WLAN/WiFi signal strengths (FWIPS): radio map construction during the offline stage, and localization during the online stage. Given a training radio map (TRM), i.e., a set of coordinate vectors and associated WLAN/WiFi signal strengths of the available access points, a BPNN can be trained to output the expected signal strengths for any input position within the region of interest (BPNN-RM). This can be used to provide a continuous representation of the radio map and to filter, densify or decimate a discrete radio map. Correspondingly, the TRM can also be used to train another BPNN to output the expected position within the region of interest for any input vector of recorded signal strengths and thus carry out localization (BPNN-LA). Key aspects of the design of such artificial neural networks for a specific application are the selection of design parameters like the number of hidden layers and nodes within the network, and the training procedure. Summarizing extensive numerical simulations, based on real measurements in a testbed, we analyze the impact of these design choices on the performance of the BPNN and compare the results in particular to those obtained using the k nearest neighbors (kNN) and weighted k nearest neighbors approaches to FWIPS. The results indicate that BPNN-RM can significantly reduce the workload of radio map generation by allowing the signal strengths to be sampled at far fewer positions during the offline phase while still obtaining equal or even slightly better accuracy during the online stage than directly applying the sampled radio map to (weighted) kNN. In the scenario analyzed within the paper the workload can be reduced by almost 90%.
We also show that a BPNN-LA with only 1 hidden layer outperforms networks with more hidden layers and yields positioning accuracy comparable to or even slightly better than kNN, but with less computational burden during the online stage.
I. INTRODUCTION
As key requirements of context awareness and pervasive computing, indoor location based services (ILBSs) as well as the systems to provide indoor positioning have attracted much attention from both academia and industry over the last two decades [1]. Their predicted market value is up to 2.5 bn dollars by 2020 [2]. Various indoor positioning systems (IPSs) based on different signals, for instance WLAN/WiFi [3], Bluetooth, radio frequency identification (RFID) [4], light [5], magnetic field [6], ultra-wide band (UWB) [7] and ultrasound/acoustic sound [8], [9], have been investigated as alternatives to global navigation satellite systems (GNSSs), which are unavailable or too inaccurate in the indoor environment [10]. The application of WLAN/WiFi signals has attracted continuous attention due to the widespread deployment of WLANs and the availability of WiFi enabled mobile devices. From this perspective, WLAN/WiFi based IPSs (WIPSs) are cost-effective because they often require neither additional infrastructure nor specific hardware for the purpose of positioning. Fingerprinting based localization is a very promising positioning approach for IPSs because it also works if there is no line-of-sight (LoS) signal propagation between the access points (APs) and the receivers. Methods based on trilateration and triangulation depend on the availability of LoS signals and are negatively affected by non-LoS signal propagation, which is common within buildings. In this paper, the authors thus focus on fingerprinting based WIPS (FWIPS). Generally, an FWIPS involves two stages: a site survey offline stage and a user positioning online stage. In the offline stage, the site survey is conducted to create the radio map (RM), which represents the expected WLAN/WiFi signal strength for all locations within the region of interest (RoI). Often the survey consists of sampling the received signal strength (RSS) from all visible APs at given reference points (RPs) with known locations within the RoI.
The collection of all RPs and the corresponding RSS vectors is stored in a fingerprinting database. The raw data in the fingerprinting database are then converted into the radio map which is used for online positioning. In this paper the original RM is just a set of RP coordinates and the respective measured RSS values without any mapping or filtering. During the online stage, a user measures the RSS vector and matches it to the RM using some defined similarity metric in the signal space (e.g., Euclidean distance) under the general assumption that the location of the user is embedded in the readings of RSS. In a simple approach, the points whose RSS within the RM are the most "similar" ones to the user's RSS values are utilized to estimate the user location (e.g., using the k nearest neighbor (kNN) algorithm). The main bottleneck which constrains the widespread commercial application of FWIPS is the heavy workload to build the RM for a large area (e.g., an entire airport or a big mall) and to keep the RM up-to-date [10]. Apart from the separate and time-consuming manual collection of RSS values at known positions, two other methods for obtaining the RM are available: unsupervised fingerprinting and partial fingerprinting [11]. For unsupervised fingerprinting the RM is created by employing an indoor propagation model of radio waves to predict the RSS values within the RoI. This requires accurate information about the structure (e.g., floors, walls, windows and doors) of the building, the materials (e.g., concrete, wood, metal and glass) used for the respective structures, as well as the position and configuration of the APs (e.g., power, gain of the antenna and protocols).
This approach thus requires a labor intensive site survey or detailed building plans and assumptions, and it yields bad performance in case of invalid assumptions or changes of any of the parameters. Partial fingerprinting utilizes crowdsourcing to improve the efficiency of RM construction and update [10], [12], [13]. Depending on the degree of user participation there are three types: explicit crowdsourcing-based RSS collection, implicit crowdsourcing-based RSS collection, and partially-labeled fingerprinting. The first two require users to report their locations by marking them manually on a digital map whenever a vector of sampled RSS values is stored or uploaded for RM generation. The crowdsourced RSS readings are used directly with the first method, while they are filtered or combined with other resources according to the second method. The third method involves less active participation of the users, who only need to agree that RSS values and location information are shared by their mobile device but do not need to manually identify their location on a map. For example, [13] proposed an approach reporting the RSS values of APs along with the sampling position estimated automatically from the data of the built-in inertial sensors of the respective mobile device. These crowdsourcing based approaches are less labor intensive (for the provider of the positioning service) but their performance is limited by uncertainties introduced through (i) the application of different devices, (ii) manual position indication by the users, or (iii) location estimation with the built-in inertial sensors. Another popular approach is constructing the radio map in a fast way by densifying or adapting a sparse radio map comprising only few originally measured RPs and their associated RSS values. In [14] and [15] the authors investigated infrastructure based approaches which require deploying specific hardware for RSS monitoring.
The RM is then constructed or updated to match the RSS values observed at stationary monitoring points. The requirement of extra installations for RSS monitoring is in contrast to the potential advantage of FWIPS that the existing WLAN/WiFi infrastructure can be used without need for additional deployment. Non-infrastructure based methods may be used to infer the updated radio map via transfer learning algorithms (e.g., compressive sensing, ℓ1-minimization, manifold alignment) using the sparse radio map or crowdsourced RSS readings and the assumption that nearby positions have more similar RSS readings than those far away [16], [17]. Both approaches have been investigated in the literature. The focus, however, was only on building the RM during the offline stage; they were rarely applied to the online stage for location estimation at the same time [10]. In this paper, the authors propose a scheme to employ backpropagation neural networks (BPNNs) to learn the mapping relationship between RP coordinates and RSS vectors for both stages of FWIPS. BPNNs, widely used in machine learning, have so far been applied to indoor location estimation in optical, RFID, WLAN/WiFi and dead reckoning based IPSs [18]–[22]. In this paper, BPNN is not only applied to indoor localization (BPNN based localization, BPNN-LA), but also to fast radio map construction starting from a sparse training radio map (TRM) (BPNN based radio map construction, BPNN-RM). Employing BPNN for both stages of FWIPS, especially for RM construction with low workload, has hardly been investigated in the literature so far. We investigate herein the performance of the proposed scenario compared to two popular fingerprinting localization algorithms (FLAs), kNN and weighted kNN (WkNN).
Additionally, we analyze the impact of various choices of BPNN design parameters via numerical simulations and derive proposals regarding these choices.

The structure of the remaining paper is as follows: the principles of an FWIPS are described in Section II. In Section III definitions of a BPNN as well as the proposed scenario of employing BPNN in FWIPS are illustrated. An experimental analysis of the performance using the proposed scheme is presented in Section IV.

II. PRINCIPLES OF FWIPS

In this section, we give more details on the definitions of an FWIPS, including the deployment of RPs, RSS collection and performance evaluation. A typical FWIPS consists of two stages, offline and online, as shown in Fig. 1. During the offline stage, the data required to construct the RM are collected within the RoI covered by WLAN/WiFi signals. The radio map is then employed together with RSS measurements recorded by the user device to estimate the user's location via FLAs within the online stage.
A. Offline Stage
If the RoI is covered by a sufficient number of APs distributed spatially such that several of them are available when the user device occupies any position within the RoI, no modifications are necessary. Should there be too few APs for positioning, additional APs have to be installed as radio sources for the WIPS. In this paper we assume that the signals of N APs can be received within the RoI. For the sake of simplicity we assume herein that the RoI is rectangular and the APs are regularly distributed across the RoI, as visualized in the schematic map given in Fig. 2.
Fig. 1. Overview of an FWIPS

M known locations in the area are selected as RPs. We collect their coordinates in the D × M matrix R = [r(1), r(2), ..., r(M)], where the i-th column r(i) ∈ R^D is the column vector of coordinates of the i-th RP (e.g., in a 2D scenario r(i) = [x(i); y(i)]). In this paper, we use the grid size G = Δx × Δy (see Fig. 2) of the rectangular arrangement of RPs as a measure of the amount of reference data to be provided during the offline stage and thus as a measure of workload and cost. The smaller the grid size, the higher the workload to construct the RM, but the better the anticipated positioning accuracy.

Fig. 2. Schematic arrangement of points involved in the deployment, validation and use of an FWIPS

At all RPs the RSS are sampled and associated with the respective APs using the data extracted from the beacon frames. The results are stored in the matrix S = [s(1), s(2), ..., s(M)], where each of the M columns contains the recorded RSS values of the N APs, i.e., the fingerprint s(i) = [RSS(i,1); RSS(i,2); ...; RSS(i,N)] ∈ R^N, and each column is associated with one RP. If the coordinates of the RPs are not known and these points are not (yet) marked visibly in the physical space, the coordinates need to be determined along with the recording of the RSS measurements. This can be achieved by employing a suitable positioning technology (e.g., a multi-sensor system involving inertial sensors, or a total station). The coordinates are then again assumed as known. The sampling results and the known RP coordinates can then be combined to represent the original radio map (ORM) with the defined grid size.

B. Online Stage
During the online stage, the N-dimensional RSS vector s(t) ∈ R^N is measured at an unknown location l(t) ∈ R^D by a user who requests the positioning service. The aim is to calculate l(t) from s(t) and the RM using an FLA. More details on kNN and WkNN, two selected FLAs for performance analysis, are presented in the following subsections. Herein we will carry out a performance analysis by actually measuring s(t) at known or independently measured locations such that the error of the positions l(t) derived from s(t) can be assessed. We assume that such measurements are actually carried out at T locations. We collect the measured RSS in a matrix S(t) and the corresponding positions in the matrix R(t). These two matrices represent the validation dataset (VDS).

1) kNN: kNN is a method widely used in the field of machine learning for classification and clustering. With a selected number k of nearest neighbors, kNN works in two steps:

• Step 1: find the k nearest neighbors in the RSS space by computing the Euclidean distance between s(t) and the RSS vectors within the RM. Mathematically, the subset S_kNN ⊂ S of nearest neighbors is determined by the condition:

S_kNN ⊂ S : card(S_kNN) = k,
‖s(i) − s(t)‖ ≤ ‖s(l) − s(t)‖  ∀ s(i) ∈ S_kNN, s(l) ∈ S \ S_kNN   (1)

where card(·) indicates the number of elements of the set. The corresponding k RP locations are collected in the matrix R_kNN = [r_kNN(1), ..., r_kNN(k)].

• Step 2: estimate the user location l̂(t) as the average of these locations:

l̂(t) := (1/k) Σ_{i=1}^{k} r_kNN(i)   (2)

To evaluate the performance of positioning, the error radius e, shown in Fig. 2, is defined as the Euclidean distance between the estimated location and the ground truth location of the user:

e(t) := ‖l(t) − l̂(t)‖,  t = 1, 2, ..., T   (3)

For a statistical analysis we will later also use the mean and standard deviation of the error radii, i.e., ē and σ_e derived from all T testing points in the VDS:

ē := Σ_{t=1}^{T} e(t) / T
σ_e := (Σ_{t=1}^{T} (e(t) − ē)² / (T − 1))^{1/2}   (4)
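As an illustration, the two steps of (1)–(2) can be sketched in a few lines of Python with NumPy (a minimal sketch, not the paper's implementation; fingerprints are stored as rows rather than as columns of S and R, and the inverse-distance weighting of the WkNN variant described in the next subsection is included as an option):

```python
import numpy as np

def knn_localize(s_t, S, R, k=3, weighted=False):
    """Estimate a user position from an RSS reading via (weighted) kNN.

    S   : (M, N) radio map, one RSS fingerprint per row.
    R   : (M, D) coordinates of the M reference points.
    s_t : (N,) RSS vector measured by the user.
    """
    d = np.linalg.norm(S - s_t, axis=1)       # Euclidean distance in signal space
    idx = np.argsort(d)[:k]                   # indices of the k nearest fingerprints
    if weighted:
        w = 1.0 / np.maximum(d[idx], 1e-9)    # inverse-distance weights (WkNN)
        return (w[:, None] * R[idx]).sum(axis=0) / w.sum()
    return R[idx].mean(axis=0)                # plain kNN: arithmetic mean, eq. (2)
```

For example, with k = 1 the estimate is simply the RP whose stored fingerprint is closest to the measured RSS vector.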
2) WkNN: WkNN differs from kNN only with respect to (2). Instead of the arithmetic mean, WkNN uses a weighted mean with the respective inverse of the Euclidean distance in the signal space as weight:

l̂(t) := (Σ_{i=1}^{k} w(i) r_kNN(i)) / (Σ_{i=1}^{k} w(i))   (5)

where

w(i) := 1 / ‖s_kNN(i) − s(t)‖   (6)

To determine an appropriate number k we employ the method typically used in the field of machine learning as given in [23]. Correspondingly, the upper bound of k is ⌊√M⌋ (where ⌊·⌋ returns the maximum integer less than or equal to its argument). The concrete choice of k will be discussed later in Section IV-B.

III. BPNN AND THEIR APPLICATION TO FWIPS

An artificial neural network (ANN) mimics the learning process of the neurons of human beings. Technically it transfers input data to output data via interconnected neurons. The key aspects of ANN design and operation are (i) the structure in terms of the nodes, layers and activation functions, and (ii) the learning algorithm. We first present these two concepts herein. Then we discuss the particular training of the ANN causing it to be a BPNN. Finally, we present a general scenario for applying BPNNs to WIPS including RSS sampling, BPNN-LA and BPNN-RM.

Fig. 3. Schematic view of a single node and the layered structure of a BPNN: (a) a node of an ANN; (b) the basic structure of an ANN
A. Design elements of an ANN

1) Nodes of the ANN:
A node is the elementary unit of an ANN. The node works as shown in Fig. 3(a). It takes a column vector x (from the input data or the preceding layer) as input, multiplies it with the vector of weights ω, and adds a scalar bias b. The result of this operation is used as the argument of a so-called activation function f. The evaluation of f, i.e., y = f(ω^T x + b), is the output of the node and represents, together with the outputs of the other nodes of the same layer, the input of the subsequent layer or the output of the ANN. The properties of each node are determined by the activation function, the weights and the bias.
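A single node can be written directly (a minimal Python/NumPy sketch; the sigmoid is one of the activation functions used later in the paper):

```python
import numpy as np

def sigmoid(a):
    """Sigmoid activation function with output range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def node_output(x, omega, b, f=sigmoid):
    """Output y = f(omega^T x + b) of a single ANN node."""
    return f(omega @ x + b)
```

For instance, a node with zero input and zero bias outputs sigmoid(0) = 0.5.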
2) Layers of the ANN:
The nodes are arranged into three types of layers: input layer, hidden layers and output layer. The input layer, which has Γ_in nodes, transforms the general input x ∈ R^d into the space R^{Γ_in}. This dimension transformation depends on the specific application and the design of the ANN. There is no dimension transformation in the input layer in this paper. With a given activation function (e.g., sigmoid or linear) of the input layer, the input domain of f is often limited, so we apply a normalizer to transform any range of the input vector components to the domain of f. With the activation functions chosen herein this domain is the interval [0, 1], such that the normalizer maps each component of x into that interval via an affine transformation consisting of a scaling S_in ∈ R^{d×d} and a translation h_in ∈ R^d. The elements of the diagonal matrix S_in and of h_in are determined by the range of the input data. The input x_in to the first hidden layer of the ANN is thus calculated by:

x_in = S_in · x + h_in   (7)

All nodes of a specific layer share the same activation function but have different weights and biases. Denoting the weights and biases of the input layer as Ω_in = [ω_1, ..., ω_{Γ_in}] and b_in = [b_1, ..., b_{Γ_in}], respectively, the output f_in(Ω_in^T x_in + b_in) ∈ R^{Γ_in} of the input layer is the input to each node of the first hidden layer. The required number of hidden layers depends on the application, especially on the non-linearity of the relation between input and output [24]. Generally, there are Λ hidden layers and a different number of nodes in each hidden layer. From the training and convergence perspective, Λ should not be too big, especially in the application to FWIPS [24]. Usually there is just one hidden layer [25]. We will later analyze the performance with up to 3 hidden layers and up to 30 nodes in each of them. As for the output layer, the number of nodes equals the dimension of the output.
Except for the input layer, there are Λ + 1 layers in total. Here we denote the activation function, the weights and the biases of the m-th layer (m = 1, ..., Λ, Λ+1) as f_m, Ω_m = [ω_{m,1}, ..., ω_{m,Γ_m}] and b_m = [b_{m,1}, ..., b_{m,Γ_m}], respectively. In the cases of positioning and of radio map construction, Γ_{Λ+1} equals the dimension of the coordinates and the number of available APs, respectively, i.e., D and N in this paper. The basic structure of the ANN is presented in Fig. 3(b). The design parameters that influence the performance of the ANN are the type of activation function, the number of hidden layers (Λ), the numbers of nodes in the hidden layers (Γ), the weights (Ω) and the biases (B) of the nodes. In this paper, we use the sigmoid function and a linear function as the activation functions for the hidden layers and the output layer, respectively. Formally, the output y_out shown in Fig. 3(b) is:

y_out = f_{Λ+1}(Ω_{Λ+1}^T f_Λ(Ω_Λ^T f_{Λ−1}(Ω_{Λ−1}^T ··· f_1(Ω_1^T f_in(Ω_in^T x_in + b_in) + b_1) + ··· + b_{Λ−1}) + b_Λ) + b_{Λ+1})   (8)
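Equations (7) and (8) amount to the following forward computation (a sketch under the paper's choices, sigmoid hidden layers and a linear output layer; the weight matrices are stored so that W.T @ y matches the Ω^T notation, and the normalizer helpers are illustrative names):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def make_normalizer(X):
    """Affine map of eq. (7) sending each input component into [0, 1].

    X : (n_samples, d) training inputs; assumes each component varies in X.
    Returns scale s and shift h such that x_norm = s * x + h componentwise
    (s plays the role of the diagonal of S_in, h of h_in).
    """
    lo, hi = X.min(axis=0), X.max(axis=0)
    s = 1.0 / (hi - lo)
    return s, -lo * s

def forward(x, weights, biases, s, h):
    """Forward pass of eq. (8): sigmoid hidden layers, linear output layer."""
    y = s * x + h                                # input normalization, eq. (7)
    for W, b in zip(weights[:-1], biases[:-1]):
        y = sigmoid(W.T @ y + b)                 # hidden layers
    return weights[-1].T @ y + biases[-1]        # linear output layer
```

The loop structure makes explicit that adding a hidden layer only appends one more (W, b) pair.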
3) Training of the ANN:
The purpose of the training is to determine the weights and biases such that the error δy = y_target − y_out is minimized on the training dataset {x_training, y_target}, while the activation functions, the number of hidden layers and the numbers of nodes within each layer are fixed. Backward error propagation [26] is an established approach to carry out this optimization efficiently. For the implementation of training the ANN, a given radio map is divided arbitrarily into three datasets: training, validation and testing dataset. The training dataset is used to update the weights and biases, the validation set is employed to check the mean square error (MSE) of the output with the updated weights and biases, and the testing dataset is used for quality control after completion of the training. The training process stops when certain conditions are fulfilled. In the ANN implementation used herein three conditions are checked as shown in Table I, and training stops if any of them is fulfilled: (i) the MSE calculated from the validation dataset is no more than the maximum admissible error; (ii) the number of training epochs reaches the maximum admissible number of epochs; (iii) the MSE calculated from the validation dataset increases continuously over more than the maximum admissible number of epochs with failed validation. The weights and biases as of the stopping epoch are selected.

(We use the term 'epoch' instead of 'iteration' to indicate the training steps because each step typically includes a batch of training points, and each training point requires one iteration.)
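The three stopping conditions can be sketched as a training loop (illustrative only; step_fn and val_mse_fn are hypothetical callables standing for one training epoch and the validation-MSE evaluation, and the default limit values are placeholders, the actual limits used in the paper are those of Table I):

```python
def train(step_fn, val_mse_fn, max_epochs=1000, max_err=0.25, max_fail=6):
    """Run training epochs until one of the three stopping conditions fires.

    step_fn()    : performs one training epoch (updates weights and biases).
    val_mse_fn() : returns the current MSE on the validation dataset.
    Returns the stopping epoch and the last validation MSE.
    """
    best, fails = float("inf"), 0
    for epoch in range(max_epochs):            # condition (ii): epoch limit
        step_fn()
        mse = val_mse_fn()
        if mse <= max_err:                     # condition (i): error small enough
            return epoch, mse
        if mse >= best:
            fails += 1
            if fails > max_fail:               # condition (iii): validation fails
                return epoch, mse
        else:
            best, fails = mse, 0
    return max_epochs, val_mse_fn()
```

In practice the weights and biases would also be snapshotted at the best validation epoch, as the text describes.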
TABLE I. TRAINING CONDITIONS

                                      BPNN-LA    BPNN-RM
Max. admissible epochs                1000       1000
Max. admissible error                 0.25 m     … dB
Max. epochs with failed validation    …          …

B. Chain rule for gradient descent optimization of BPNN
To compute the optimal weights and biases of the nodes, gradient descent is applied to minimize the squared training error F̂(y) = δy^T δy. This optimization is carried out iteratively. The weights and biases of epoch t + 1 depend on those of the previous epoch and on the gradient descent:

Ω_m^{i,j}(t+1) = Ω_m^{i,j}(t) − η · g_m(Ω_m^{i,j}),  Ω_m ∈ R^{Γ_{m−1} × Γ_m}
b_m^{j}(t+1) = b_m^{j}(t) − η · g_m(b_m^{j}),  b_m ∈ R^{Γ_m}   (9)

where j is the index of the node, i the index of the weight per node, η the learning rate, Γ_0 := Γ_in, and g_m(·) are the gradients of F̂(y) in the Ω-space or b-space of the m-th layer, respectively. Assuming that the inputs to the activation functions of the m-th layer and their outputs are y_in^m and y_out^m respectively, we have:

y_in^m = Ω_m^T y_out^{m−1} + b_m
y_out^m = f_m(y_in^m)   (10)

Therefore, according to the chain rule the gradients w.r.t. Ω_m^{i,j} and b_m^{k} are:

g_m(Ω_m^{i,j}) = (∂F̂(y)/∂y_in^m) · (∂y_in^m/∂Ω_m^{i,j})
g_m(b_m^{k}) = (∂F̂(y)/∂y_in^m) · (∂y_in^m/∂b_m^{k})   (11)

The first term on the right side of (11) is redefined as s_m := ∂F̂(y)/∂y_in^m. The weights and biases of the m-th layer are then updated according to:

Ω_m(t+1) = Ω_m(t) − η · (s_m · (y_out^{m−1})^T)^T
b_m(t+1) = b_m(t) − η · s_m   (12)

According to the chain rule, s_m can be calculated from the corresponding vector s_{m+1} of the subsequent layer:

s_m = (∂y_in^{m+1}/∂(y_in^m)^T) · (∂F̂(y)/∂(y_in^{m+1})^T) = ż(y_in^m)(Ω_{m+1})^T s_{m+1}   (13)

where ż(y_in^m) is a diagonal matrix of the derivatives of the activation functions with respect to y_in^m:

ż(y_in^m) = diag(ḟ_m(y_in^m(1)), ..., ḟ_m(y_in^m(Γ_m)))   (14)

In this way, gradient descent learning works by backpropagation: s_{Λ+1} → s_Λ → ··· → s_1.
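The update equations (9)–(14) can be condensed into one training epoch (a NumPy sketch for a network with sigmoid hidden layers and a linear output layer, as chosen in this paper; Ws[m] has shape (Γ_{m−1}, Γ_m) as in (9), and the function names are illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def dsigmoid(a):
    s = sigmoid(a)
    return s * (1.0 - s)

def backprop_epoch(x, y_target, Ws, bs, eta=0.1):
    """One gradient-descent epoch: forward pass caching y_in^m (10),
    backward recursion for the sensitivities s_m (13), updates of (12)."""
    # forward pass
    y_outs, y_ins = [x], []
    for m, (W, b) in enumerate(zip(Ws, bs)):
        y_in = W.T @ y_outs[-1] + b
        y_ins.append(y_in)
        last = (m == len(Ws) - 1)
        y_outs.append(y_in if last else sigmoid(y_in))  # linear output layer
    delta = y_target - y_outs[-1]
    # backward pass: for the linear output layer the derivative matrix is I
    s = -2.0 * delta
    for m in reversed(range(len(Ws))):
        # sensitivity of the previous layer, eq. (13), before updating Ws[m]
        s_prev = dsigmoid(y_ins[m - 1]) * (Ws[m] @ s) if m > 0 else None
        Ws[m] -= eta * np.outer(y_outs[m], s)   # eq. (12): (s_m (y_out^{m-1})^T)^T
        bs[m] -= eta * s                        # eq. (12)
        s = s_prev
    return y_outs[-1]
```

Note that s_prev is computed with the weights of the current epoch before they are overwritten, matching the per-epoch updates of (9).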
To sum up, each epoch of BPNN training works via forward and backward propagation according to:

y_out^0 := x
y_out^{m+1} = f_{m+1}(Ω_{m+1}^T y_out^m + b_{m+1}),  m = 0, 1, ..., Λ
y_out = y_out^{Λ+1}
s_{Λ+1} = −2 ż(y_in^{Λ+1}) δy
s_n = ż(y_in^n)(Ω_{n+1})^T s_{n+1},  n = Λ, Λ−1, ..., 1   (15)

C. BPNN based radio map construction & localization
On the basis of BPNN we propose an algorithm for radio map construction and indoor localization. The systematic view of the proposed approach is presented in Fig. 4.
1) ORM generation module:
At given RPs a surveyor uses the sampling device (e.g. a mobile phone) to collect the RSS from all available APs within the RoI. In this process, the grid size is relatively large to keep the workload low. The coordinates of the RPs and the corresponding RSS vectors are stored in a table which represents the ORM. In order to mitigate the measurement noise, the measurements can be filtered before storing the discrete representation of the spatially continuous signal strength fields as ORM. In the later experiments we will only reduce the impact of noise by averaging multiple RSS measurements taken at each RP within a short time interval.
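The noise mitigation by averaging repeated readings at each RP is straightforward (a minimal sketch; it assumes every sample reports the same N APs in the same order):

```python
import numpy as np

def average_fingerprint(samples):
    """Reduce measurement noise at one RP by averaging repeated readings.

    samples : (n_readings, N) RSS vectors recorded within a short interval.
    Returns the (N,) averaged fingerprint to be stored in the ORM.
    """
    return np.asarray(samples, dtype=float).mean(axis=0)
```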
2) BPNN-RM training & generalization module:
This module consists of two parts: training of the BPNN and RM generation using the trained BPNN. The module is invoked if (i) a denser discrete representation of the RM is required than the one available as ORM, or (ii) a continuous representation of the RM is required such that the (expected) signal strength of any AP and, if need be, also the corresponding spatial derivatives can be calculated for any location within the RoI. We subsequently denote such an RM, derived using the BPNN, as reconstructed radio map (RRM). The coordinates of the M RPs stored as part of the ORM are normalized and then used as the input data for the estimation of the optimum weights and biases according to the algorithm described in the previous section. The signal strengths corresponding to the above inputs in the ORM are the training targets for this BPNN. The BPNN-RM is trained according to the process presented in Section III-A3. The RRM generation is the process of generalizing the trained BPNN. Given a specific desired grid size of the RRM, a set of corresponding coordinates within the RoI is generated and normalized. The normalized coordinates are the input to the trained BPNN-RM. Combining the output vectors with the above input coordinates yields the RRM.

Fig. 4. Systematic view of the proposed algorithms
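The generalization step amounts to querying the trained network on a regular grid of coordinates (a sketch; predict_rss is a placeholder name standing for the trained BPNN-RM together with its input normalizer):

```python
import numpy as np

def generate_rrm(predict_rss, x_range, y_range, grid_size):
    """Densify a radio map: evaluate a trained BPNN-RM on a regular grid.

    predict_rss : callable mapping a (2,) coordinate to an (N,) RSS vector.
    x_range, y_range : (min, max) extents of the RoI in metres.
    Returns (coords, rss): the reconstructed radio map (RRM).
    """
    xs = np.arange(x_range[0], x_range[1] + 1e-9, grid_size)
    ys = np.arange(y_range[0], y_range[1] + 1e-9, grid_size)
    coords = np.array([[x, y] for x in xs for y in ys])   # regular RRM grid
    rss = np.array([predict_rss(c) for c in coords])      # expected RSS per point
    return coords, rss
```

The same routine also covers decimation: choosing a grid size larger than that of the ORM yields a sparser RRM.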
3) BPNN-LA training & generalization module:
This module includes two parts: training and generalization (i.e. applying the trained BPNN to localization). During the training, the normalizer transforms the RSS to [0, 1] since the activation functions of the hidden layers are sigmoid in this paper. The normalized RSS vectors and the corresponding RP locations are the training input and training target, respectively. The constraints shown in Table I are used as criteria during the training. With a given training dataset, the training of BPNN-LA follows the procedures in Section III-A3. The trained BPNN (Λ, Γ, Ω, B) is saved for the generalization within the online stage.

IV. EXPERIMENTAL PERFORMANCE ANALYSIS
A. Testbed
In this section, we test our proposed approach using real measurements from a floor of a building at Harbin Institute of Technology, depicted in Fig. 5. There are 8 APs in the experimental area, which are stably attached to the wall at a height of 2 m above the floor. RSS were recorded at points arranged in a regular grid of 0.25 m × 0.25 m, yielding an ORM with a grid size of 0.25 m. It is subsequently denoted ORM_0.25. Some of these points were later used as RPs for positioning or RM generation, others for testing only. For the former purpose a training radio map (TRM) with larger grid size was then obtained by down-sampling from ORM_0.25. The sampling and preprocessing of the RSS values are described in [27].

B. BPNN based indoor localization
In this section an experimental analysis of the quality of localization using BPNN and of the related design parameters is presented. All the following simulations are carried out using MATLAB R2015a on Euler, a high performance computing cluster of ETH. First, an example is given to show how to determine the locally optimal number of layers and nodes with a given TRM (grid size). Then a table lists the locally optimal parameters w.r.t. the mean error radius for various TRM grid sizes as well as for different numbers of hidden layers. A detailed analysis of parameter selection, computational complexity and cumulative positioning error is presented afterwards. Furthermore, since the weights and biases of the BPNN are initialized randomly, the simulations are carried out 100 times with the same design parameters in order to also take the influence of the random initialization into account. For this purpose we collect the means and standard deviations resulting from the 100 simulations in the vectors ē = [ē(1), ē(2), ..., ē(100)] and σ_e = [σ_e(1), σ_e(2), ..., σ_e(100)]. We define the uncertainty due to random initialization as the standard deviation of ē and σ_e:

σ_ē := (Σ_{i=1}^{100} (ē(i) − mean(ē))² / (100 − 1))^{1/2}
σ_σe := (Σ_{i=1}^{100} (σ_e(i) − mean(σ_e))² / (100 − 1))^{1/2}   (16)

where mean(·) denotes the mean value. As for the number k of nearest neighbors for kNN and WkNN, according to the rule cited in Section II-B we select it according to the number of RPs in the TRM. In this paper, there are 139 RPs within the ORM. Therefore, the maximal value of k is 11 for the grid size of 0.25 m and 3 in the case of a grid size of 9 m.

(The dataset was created while the first author was with the Communication Research Center, Harbin Institute of Technology, Harbin, P.R. China, as a master student.)
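The uncertainty measure of (16) is simply the sample standard deviation over the repeated trainings (a minimal sketch):

```python
import numpy as np

def run_uncertainty(mer_runs):
    """Uncertainty due to random initialization, eq. (16): the sample
    standard deviation of the per-run mean error radii.

    mer_runs : per-run mean error radii, e.g. 100 values of e-bar.
    """
    mer_runs = np.asarray(mer_runs, dtype=float)
    return mer_runs.std(ddof=1)   # divides by (n - 1), as in (16)
```

The same function applied to the per-run standard deviations yields the second quantity in (16).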
1) Locally optimal parameters for TRM_0.25: To present how we determine the locally optimal parameters Λ and Γ we take TRM_0.25 (i.e., ORM_0.25) as an example. With a given TRM, a BPNN with a specific number of hidden layers and neurons is employed to learn the mapping between the RSS vectors and the corresponding RP locations. The trained BPNN is then generalized to the VDS for performance evaluation. The parameters which achieve the minimal mean error radius (MER) are treated as the locally optimal ones according to Section III-C.
WAP8W A P W A P W A P W A P A pp r o . (cid:20)(cid:25) m A pp r o . m R P s : T P s : Fig. 5. Indication of Testbed (a) Mean error of BPNN-LA with 1 hidden layer (HL1)(b) Ratio of MER of BPNN-LA with 2 hiddenlayers (HL2)Fig. 6. Error of BPNN-LA with TRM . In Fig. 6(a) we show the ratio of MER between BPNN-LAand k NN as well as W k NN for different numbers of neurons.Since k NN and W k NN do not have neurons, the variation ofratio in the figure is exclusively due to the variation of qualityof the BPNN solution. A ratio < k NN or W k NN one. Asshown in Fig. 6(a), the MER decreases fast as the number ofneurons increases from 1 to 11 for a BPNN with 1 hidden layer.It hardly changes any more if the number of neurons is furtherincreased. The locally optimal number of neurons in this caseof 1 hidden layer (HL1) i.e., the one yielding the minimal MERis 21 in this example. However, taking the uncertainty of the MER into account, the plot shows that the MER is stable for Γ ≥ . Also, for Γ ≥ , the MER of BPNN-LA is slightlysmaller than the one obtained using k NN and W k NN. Thestandard deviation of the error radius is almost independent ofthe number of neurons if Γ ≥ .With the same process, we can obtain the locally optimalparameters for multiple hidden layers (MHLs). In Fig. 6(b) wepresent the ratio of MER between BPNN-LA with 2 hiddenlayers (HL2) and 1 hidden layer layer (HL1) depending on thenumber of neurons of the layers (white color indicates that thevalue is larger than 1.5). It is shown that the MER is largerfor HL2 than for HL1 for most combinations of numbers ofneurons. Furthermore, we see that the MER of HL2 is almostindependent of Γ , the number of neurons in the second hiddenlayer. Similar results we also found for even higher numbersof hidden layers and for other grid sizes. This indicates that aBPNN with 1 hidden layer is preferable for the application ofBPNN to the online stage of FWIPS. Fig. 7. 
Comparison of MER with locally optimal number of nodes of BPNN-LA for different TRM grid sizes.

Fig. 8. Cumulative probability of positioning error for BPNN-LA.
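To make the selection procedure above concrete, the sketch below trains a one-hidden-layer network by plain backpropagation on a synthetic testbed and sweeps the number of hidden neurons, keeping the count with the minimal MER on a held-out validation set. Everything here (the log-distance RSS model, the four hypothetical AP positions, the learning rate, and the epoch count) is an illustrative assumption, not the configuration used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_bpnn(S, L, n_hidden, epochs=4000, lr=0.02):
    """Train a 1-hidden-layer network mapping RSS vectors S to positions L
    with plain batch backpropagation (gradient descent on the squared error)."""
    n_in, n_out = S.shape[1], L.shape[1]
    W1 = rng.normal(0.0, 0.5, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, n_out)); b2 = np.zeros(n_out)
    for _ in range(epochs):
        H = np.tanh(S @ W1 + b1)            # hidden activations
        P = H @ W2 + b2                     # predicted positions
        E = (P - L) / len(S)                # averaged output error
        dW2, db2 = H.T @ E, E.sum(0)
        dH = (E @ W2.T) * (1.0 - H**2)      # error backpropagated through tanh
        dW1, db1 = S.T @ dH, dH.sum(0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return lambda X: np.tanh(X @ W1 + b1) @ W2 + b2

def mer(pred, truth):
    """Mean error radius: average Euclidean distance to the true positions."""
    return float(np.linalg.norm(pred - truth, axis=1).mean())

# Synthetic testbed: RSS from 4 hypothetical APs follows a log-distance decay.
aps = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
def rss(loc):
    d = np.linalg.norm(loc[:, None, :] - aps[None, :, :], axis=2)
    return -40.0 - 20.0 * np.log10(d + 1.0) + rng.normal(0.0, 0.5, d.shape)

trm_loc = np.array([[x, y] for x in np.arange(0.0, 10.1, 1.0)
                           for y in np.arange(0.0, 10.1, 1.0)])  # RPs, 1 m grid
vds_loc = rng.uniform(0.0, 10.0, (100, 2))                       # validation points
trm_rss, vds_rss = rss(trm_loc), rss(vds_loc)
mu, sd = trm_rss.mean(0), trm_rss.std(0)                         # input normalization

# Sweep the number of hidden neurons; keep the count with minimal validation MER.
results = {g: mer(train_bpnn((trm_rss - mu) / sd, trm_loc, g)((vds_rss - mu) / sd),
                  vds_loc)
           for g in [1, 2, 5, 10, 20]}
best = min(results, key=results.get)
print({g: round(m, 2) for g, m in results.items()}, "-> locally optimal:", best)
```

The same loop extends directly to two or three hidden layers by sweeping tuples of neuron counts instead of a single Γ.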
2) Locally optimal parameters of BPNN-LA for all TRMs:
We report the locally optimal parameters of BPNN-LA with several different TRMs and 3 different numbers of hidden layers in Table II. For this analysis, we extracted 13 different TRMs from
ORM_{0.25}, with grid size varying from 0.25 m to 9 m.

TABLE II. Locally optimal parameters of BPNN-LA for several TRMs (GS: grid size; ER: error radius)

GS (m) | ER of kNN (m) | ER of WkNN (m) | Γ_1 | ER of HL1 (m) | Γ_1, Γ_2 | ER of HL2 (m) | Γ_1, Γ_2, Γ_3 | ER of HL3 (m)
0.25 | 2.58 ± 1.49 | 2.59 ± 1.49 | 21 | 2.53 ± … | … | … | … | …

In the table, we find several patterns: (i) With a given BPNN-LA, for example HL1, the locally optimal number of neurons is proportional to the number of points in the TRM: the larger the number of RPs in the TRM, the bigger the locally optimal number of neurons. This pattern is also shared by HL2 and HL3, especially for the locally optimal number of neurons of the last hidden layer. One explanation for this pattern is that a smaller grid size of the TRM preserves more information about the nonlinearity of the underlying RM, which is a key factor in determining the required number of neurons [24]. (ii) With a specific TRM, HL1 achieves a smaller MER as well as a smaller standard deviation than HL2 and HL3. This pattern is caused by the increasing influence of the random initialization of weights and biases with an increasing number of hidden layers [24].
3) Influence of Λ: Comparing BPNN-LA with 3 different numbers of hidden layers, each using the locally optimal number of neurons, to kNN and WkNN in terms of the mean error radius, as shown in Fig. 7, we can conclude: (i) HL1 outperforms HL2 and HL3 for almost all the analyzed grid sizes of the TRM. This tendency is consistent with the results reported in [25]. (ii) kNN achieves a slightly smaller MER than WkNN in this example. (iii) Comparing HL1 to WkNN, HL1 yields comparable performance for all grid sizes, but in the case of particularly large grid sizes (larger than 6.25 m) HL1 performs slightly better than WkNN.
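For reference, the kNN and WkNN baselines used in these comparisons can be sketched as follows. The Euclidean signal-space distance and inverse-distance weights are common choices assumed here, not details taken from the paper:

```python
import numpy as np

def wknn_locate(rss, rp_rss, rp_loc, k=3, weighted=True, eps=1e-6):
    """Estimate a position from one RSS vector by (weighted) k nearest neighbors
    in signal space: average the coordinates of the k RPs whose stored RSS
    vectors are closest to the measured one."""
    d = np.linalg.norm(rp_rss - rss, axis=1)     # Euclidean distance in RSS space
    idx = np.argsort(d)[:k]                      # k nearest reference points
    w = 1.0 / (d[idx] + eps) if weighted else np.ones(k)  # inverse-distance weights
    return (rp_loc[idx] * w[:, None]).sum(0) / w.sum()

# Tiny example: 4 RPs with 2-AP fingerprints (made-up values).
rp_rss = np.array([[-40.0, -70.0], [-70.0, -40.0], [-50.0, -50.0], [-60.0, -60.0]])
rp_loc = np.array([[0.0, 0.0], [10.0, 10.0], [5.0, 5.0], [7.0, 7.0]])
est = wknn_locate(np.array([-49.0, -51.0]), rp_rss, rp_loc, k=2)
print(est)  # lies between the two RPs closest in signal space
```

With `weighted=False` the same function yields plain kNN, i.e., the unweighted centroid of the k nearest RPs.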
4) Cumulative positioning accuracy of BPNN-LA:
In Fig. 8, we present the positioning accuracy of BPNN-LA with 1 hidden layer and the respective locally optimal number of neurons for all tested TRMs, as well as that of kNN and WkNN, as empirical distribution functions. With TRM_{0.25}, BPNN-LA outperforms the other solutions and is even better than kNN and WkNN with the same TRM. For BPNN-LA HL1, about 68% of the errors are below 2.5 m. Over 99% of the estimated locations are within an error radius of 8 m, which is accurate enough for room-level positioning and ILBSs.
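The empirical distribution functions shown in Fig. 8 reduce to sorting the per-test-point error radii; a minimal sketch with made-up error values:

```python
import numpy as np

def ecdf(errors):
    """Empirical CDF of the error radii: sorted errors and cumulative probability."""
    e = np.sort(np.asarray(errors))
    p = np.arange(1, len(e) + 1) / len(e)
    return e, p

def coverage(errors, radius):
    """Fraction of estimates whose error radius is within the given radius."""
    return np.mean(np.asarray(errors) <= radius)

errors = [0.8, 1.2, 1.9, 2.3, 2.4, 3.1, 3.6, 4.0, 5.2, 7.5]  # made-up radii in m
e, p = ecdf(errors)
print(coverage(errors, 2.5), coverage(errors, 8.0))  # 0.5 and 1.0 for these values
```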
5) Computational complexity of BPNN-LA:
Using the dimension of the RSS vectors (N), the number of RPs (M), the number of hidden layers (Λ) and the numbers of neurons of the layers (Γ = {Γ_in, Γ_1, ..., Γ_Λ, Γ_{Λ+1}}), we can assess the computational complexity of the location estimation per request (i.e., one position required) during the online stage. For kNN the computational complexity is O(kNM). For the generalization of BPNN-LA the computational complexity is O(max{Γ, N}^2 · Λ), under the assumption that the evaluation of the activation functions is negligible. These two computational complexities are comparable. The latter is smaller in the case of a large number of RPs (i.e., M ≫ N). Therefore, BPNN-LA also gains online computational efficiency in this case.

C. BPNN based radio map construction
Now we assume that the BPNN is used to construct a reconstructed radio map RRM of grid size G_R, starting from a given radio map TRM of grid size G_S, with the purpose of using the RRM for subsequent position estimation within an FWIPS. The underlying idea is that the TRM could result directly from sampling the RSS at a certain (moderate) number of RPs and could be converted into a denser radio map RRM (i.e., G_R < G_S) which ideally yields higher accuracy of the estimated positions than the TRM. Higher accuracy could potentially even be obtained if G_R ≥ G_S. When using BPNN-RM for this radio map reconstruction, the accuracy of the positions finally obtained depends on the FLA, the quality of the measurements, on G_R, G_S, the number Λ of hidden layers, and the numbers Γ_1, Γ_2, ..., Γ_Λ of nodes within the hidden layers. In this section we investigate this relationship for kNN and WkNN, analyzing whether BPNN-RM can be used to increase the quality of the position estimation, in particular for the densification case, which would be attractive because it would help to reduce the workload associated with radio map generation. Of course it is possible to also use BPNN instead of kNN or WkNN for location estimation, as discussed in the previous section. However, we will not focus on this implementation herein. We first analyze the situation for a small subset of free parameters and then generalize by calculating and discussing the locally optimal parameters for a variety of cases.
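The densification idea can be sketched as follows. For brevity, a per-AP least-squares fit in the coordinates stands in for the trained BPNN-RM regressor (the actual networks learn a more flexible nonlinear mapping), and the RoI extent and toy RSS field are assumptions:

```python
import numpy as np

def fit_rm_model(trm_loc, trm_rss):
    """Fit one regressor per access point mapping position -> expected RSS.
    Here: least squares on [1, x, y] as a crude stand-in for a trained BPNN-RM."""
    A = np.c_[np.ones(len(trm_loc)), trm_loc]        # design matrix [1, x, y]
    coef, *_ = np.linalg.lstsq(A, trm_rss, rcond=None)
    return lambda loc: np.c_[np.ones(len(loc)), loc] @ coef

def make_rrm(model, extent, g_r):
    """Reconstruct a denser radio map RRM of grid size g_r over a rectangular RoI."""
    xs = np.arange(0.0, extent[0] + 1e-9, g_r)
    ys = np.arange(0.0, extent[1] + 1e-9, g_r)
    rrm_loc = np.array([[x, y] for x in xs for y in ys])
    return rrm_loc, model(rrm_loc)

# Coarse TRM (grid size G_S = 2 m) over a 10 m x 10 m RoI with one AP.
trm_loc = np.array([[x, y] for x in np.arange(0.0, 10.1, 2.0)
                           for y in np.arange(0.0, 10.1, 2.0)])
trm_rss = -40.0 - 1.5 * trm_loc[:, 0] - 0.5 * trm_loc[:, 1]   # toy RSS field
model = fit_rm_model(trm_loc, trm_rss[:, None])

# Densify to G_R = 0.5 m; the resulting RRM then feeds kNN/WkNN as usual.
rrm_loc, rrm_rss = make_rrm(model, (10, 10), 0.5)
print(len(trm_loc), len(rrm_loc))  # 36 RPs -> 441 RPs
```

The RRM produced this way is used exactly like a measured radio map: its (position, RSS) pairs replace the TRM in the online localization stage.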
1) Locally optimal parameters with
TRM_{7.5}: The locally optimal parameters of BPNN-RM are determined using the same approach as for BPNN-LA above:
TABLE III. Locally optimal parameters of BPNN-RM for selected combinations of TRM (i.e., G_S) and RRM (i.e., G_R)

G_S (m) | G_R (m) | ER of kNN (m) | Γ_1 | ER of kNN with RRM, 1 layer (m) | Γ_1, Γ_2 | ER with RRM, 2 layers (m) | Γ_1, Γ_2, Γ_3 | ER with RRM, 3 layers (m)
0.25 | 0.25 | 2.58 ± 1.49 | 18 | 3.19 ± … | … | … | … | …

Fig. 9. Mean error of kNN using BPNN-RM with
TRM_{7.5} (1 hidden layer).

For a given pair, G_S and G_R, of grid sizes, a BPNN is trained using Λ hidden layers and Γ_1, Γ_2, ..., Γ_Λ nodes; then the VDS (testing points l^(t) and the corresponding RSS vectors s^(t)) is used to calculate the positioning error at each l^(t). Repeating this for a variety of parameters, the ones yielding the minimum MER are determined as the locally optimal ones. We first present an example with TRM_{7.5} (i.e., G_S = 7.5 m), with 13 different grid sizes G_R, using BPNN-RM with 1 hidden layer whose number of neurons varies from 1 to 30. The results are visualized in Fig. 9. The two plots in Fig. 9 depict the MER (left) and the standard deviation (right) of the error radius obtained using kNN, in terms of different numbers of neurons as well as different reconstruction grid sizes. They show that the optimal number of neurons to achieve the minimum MER is mostly consistent with the number yielding the minimum standard deviation. According to the MER in the figure, the locally optimal number of neurons for the RRM is 2 in this case of densification from TRM_{7.5} using a BPNN-RM with 1 hidden layer.
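The sweep described above is a nested search over G_R and Γ. Schematically, with a smooth made-up surrogate `evaluate_mer` standing in for the full train-and-validate run (in the real experiment each evaluation trains a BPNN-RM and localizes the VDS points):

```python
import numpy as np

def evaluate_mer(g_r, gamma):
    """Toy stand-in for: train BPNN-RM with `gamma` neurons, reconstruct an RRM
    of grid size `g_r`, localize the VDS with kNN, and return the MER.
    (A made-up smooth surface with a single minimum; real values come from training.)"""
    return 2.0 + 0.3 * (np.log(gamma) - np.log(2)) ** 2 + 0.1 * (g_r - 1.5) ** 2

grid_sizes = [0.25, 0.5, 1.0, 1.5, 2.0, 3.0]
neuron_counts = range(1, 31)

# Exhaustive sweep: keep the (G_R, Gamma) pair with the minimal MER.
mer_table = {(g, n): evaluate_mer(g, n) for g in grid_sizes for n in neuron_counts}
(best_gr, best_gamma), best_mer = min(mer_table.items(), key=lambda kv: kv[1])
print(best_gr, best_gamma, round(best_mer, 3))
```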
2) Locally optimal parameters of BPNN-RM for a variety of grid sizes:
Using extensive numerical simulations, as before, we have determined the locally optimal number of neurons for RRM generation, as judged by the MER after positioning with kNN using the RRM. (The results obtained using WkNN instead of kNN are virtually identical and therefore not shown.) The results are given in Table III for selected pairs of grid sizes G_S and G_R of the radio maps and for the selected numbers of hidden layers. In the case that TRM and RRM have the same grid size (i.e., G_S = G_R), BPNN-RM with 1 hidden layer outperforms BPNN-RM with 2 or 3 hidden layers in terms of the MER for grid sizes from 2 m to 9 m. The location accuracy obtained using the RRM is also comparable to the one obtained using the TRM directly. In a few cases, the results are slightly better with a higher number of hidden layers. However, factoring in the uncertainty of the empirical results, the benefit is not significant. So, we conclude that 1 hidden layer with an optimized number of nodes is sufficient.

Fig. 10. Comparison between the ORM and various TRMs w.r.t. MER using kNN.
3) Comparison of the performance related to TRM and RRM:
With the proposed BPNN-RM we expect to reduce the workload of the radio map construction while maintaining the positioning accuracy. Therefore, we present a comparison of the MER for several different grid sizes of TRM and RRM using kNN in Fig. 10. We can draw several conclusions from the figure: (i) kNN with RRM grid sizes G_R from 0.5 m to 5 m, trained from a TRM with grid size G_S = 1 m, achieves comparable performance to kNN with an ORM of 0.25 m grid size. The maximal grid size G_S of the TRM with which we obtained a comparable MER as with an ORM of 0.25 m grid size is 2 m. This means that only 1/8 of the workload for radio map generation is required when reconstructing the radio map for kNN using BPNN-RM instead of using the ORM directly for kNN. (ii) Comparing the BPNN-RM results to
the TRMs of the corresponding grid sizes, the reduction of the MER is up to 10%, 20% and 40%, respectively. BPNN-RM with WkNN leads to similar conclusions.

Fig. 11. Cumulative probability of kNN and WkNN with RRM: (a) cumulative error probability for kNN with RRM; (b) cumulative error probability for WkNN with RRM.
4) Cumulative error probability of kNN and WkNN using RRM:
In this section, we compare the cumulative error probability of the positions estimated using kNN and WkNN for 4 different TRMs and 6 RRMs which are reconstructed using the trained BPNN-RM from
the corresponding TRMs. As shown in Fig. 11, we find that: (i) From 75% to over 85% of the errors are within 4 m. (ii) The positioning accuracy of both kNN and WkNN using the selected RRMs is higher than using the respective TRM with equal grid size when considering errors larger than 2.5 m. (iii) The RRMs yielding the best performance with kNN and WkNN achieve 92% and 95% of errors smaller than 4.5 m, respectively. This positioning accuracy is better than the one obtained using a much denser TRM directly, i.e., a radio map associated with a much higher workload for construction. (iv) BPNN-RM reduces the workload for creating the radio map by almost 90% when collecting only the data required for a coarse TRM instead of TRM_{0.25}, and still obtaining better results by converting that TRM into an RRM with a grid size of, e.g., 2.25 m. This improvement results from the capability of the BPNN to filter the noise in the measured signal strengths used for RM generation.

V. CONCLUSION
The authors propose a scheme to apply BPNN to both stages of FWIPS: BPNN-LA for localization in the online stage and BPNN-RM for radio map reconstruction in the offline stage. BPNN-LA with 1 hidden layer (HL1) outperforms kNN, WkNN and BPNN with multiple hidden layers in terms of the mean error radius. 90% of the positioning errors are within 4 m using HL1 trained with the 0.25 m grid size radio map. BPNN-LA with multiple hidden layers (2 and 3 hidden layers analyzed herein) yielded a higher mean error radius than HL1. A trained BPNN-LA with one hidden layer is computationally more efficient during the online stage than kNN and WkNN, especially in the case of a large number of reference points in the radio map. We have tested the benefit of BPNN-RM for converting an originally sampled radio map into a reconstructed radio map of possibly different grid size. In particular, the positioning errors after application of both kNN and WkNN have been analyzed. The reduction of the mean error radius attributed to RM reconstruction was found to be up to 40%. As for the reduction of the workload required to build the RM, BPNN-RM reduces it by almost 90%, since it allows using
TRM instead of TRM . while still obtaining equal or even slightlybetter performance.We expect that the results can be generalized to otherfingerprinting based IPSs (e.g., IPSs based on Bluetooth,magnetic field) and WIPSs which are deployed in the heteroge-neous RoI (e.g., the airports and big malls). We will investigatethis further by exploring BPNN-LA/RM deep learning andassessing the performance for more general real world settingswhere RPs are not arranged in a regular grid and the RoIis not dominated by free space such that the RSS-fields aremore complex than in our examples. We expect BPNN to beeven more beneficial in such cases while likely requiring moreneurons in the hidden layers than in the cases presented herein.A CKNOWLEDGMENT
The authors thank Konrad Schindler for granting access to a high performance computing cluster at ETH for the purpose of the extensive numerical simulations used herein. The doctoral research of the first author is financed by the Chinese Scholarship Council (CSC).

REFERENCES

[1] G. D. Abowd, "What next, ubicomp?: Celebrating an intellectual disappearing act," in
Proceedings of the 2012 ACM Conference on Ubiquitous Computing.
Proceedings IEEE INFOCOM 2000, Conference on Computer Communications, Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 2, pp. 775–784, 2000. [Online]. Available: http://research.microsoft.com/en-us/groups/sn-res/infocom2000.pdf
[4] E. Lohan, K. Koski, J. Talvitie, and L. Ukkonen, "WLAN and RFID propagation channels for hybrid indoor positioning," in
Localization and GNSS (ICL-GNSS), 2014 International Conference on, June 2014, pp. 1–6.
[5] H. Liu, H. Darabi, P. Banerjee, and J. Liu, "Survey of wireless indoor positioning techniques and systems," Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, vol. 37, no. 6, pp. 1067–1080, 2007.
[6] B. Li, T. Gallagher, A. G. Dempster, and C. Rizos, "How feasible is the use of magnetic field alone for indoor positioning?" in Indoor Positioning and Indoor Navigation (IPIN), 2012 International Conference on, Nov 2012, pp. 1–9.
[7] T. Gigl, G. J. Janssen, V. Dizdarević, K. Witrisal, and Z. Irahhauten, "Analysis of a UWB indoor positioning system based on received signal strength," in Positioning, Navigation and Communication, 2007. WPNC'07. 4th Workshop on. IEEE, 2007, pp. 97–101.
[8] M. Hazas and A. Hopper, "Broadband ultrasonic location systems for improved indoor positioning," Mobile Computing, IEEE Transactions on, vol. 5, no. 5, pp. 536–547, 2006.
[9] A. Mandal, C. V. Lopes, T. Givargis, A. Haghighat, R. Jurdak, and P. Baldi, "Beep: 3D indoor positioning using audible sound," in Consumer Communications and Networking Conference, 2005. CCNC 2005. Second IEEE. IEEE, 2005, pp. 348–353.
[10] S. He and S. H. G. Chan, "Wi-Fi fingerprint-based indoor positioning: Recent advances and comparisons," IEEE Communications Surveys and Tutorials, vol. 18, no. 1, pp. 466–490, 2016.
[11] K. Majeed, S. Sorour, T. Al-Naffouri, and S. Valaee, "Indoor localization and radio map estimation using unsupervised manifold alignment with geometry perturbation," IEEE Transactions on Mobile Computing, vol. PP, no. 99, pp. 1–1, 2015.
[12] J.-g. Park, B. Charrow, D. Curtis, J. Battat, E. Minkov, J. Hicks, S. Teller, and J. Ledlie, "Growing an organic indoor location system," Proceedings of the 8th International Conference on Mobile Systems, Applications and Services (MobiSys '10), p. 271, 2010. [Online]. Available: http://portal.acm.org/citation.cfm?doid=1814433.1814461
[13] C. Wu, Z. Yang, Y. Liu, and W. Xi, "WILL: Wireless indoor localization without site survey," IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 4, pp. 839–848, 2013.
[14] A. M. Bernardos, J. R. Casar, and P. Tarrío, "Real time calibration for RSS indoor positioning systems," pp. 15–17, 2010.
[15] M. M. Atia, A. Noureldin, and M. J. Korenberg, "Dynamic online-calibrated radio maps for indoor positioning in wireless local area networks," IEEE Transactions on Mobile Computing, vol. 12, no. 9, pp. 1774–1787, 2013.
[16] S. Pan, J. Kwok, Q. Yang, and J. Pan, "Adaptive localization in a dynamic WiFi environment through multi-view learning," National Conference on Artificial Intelligence.
Mobile Computing, IEEE Transactions on, vol. 14, no. 5, pp. 1031–1043, 2015.
[18] B. P. Statistik, "Neural network based indoor positioning technique in optical camera communication system," Katalog BPS, vol. XXXIII, no. 2, pp. 81–87, 2014. [Online]. Available: http://cid.oxfordjournals.org/lookup/doi/10.1093/cid/cir991
[19] B. Wagner, D. Timmermann, G. Ruscher, and T. Kirste, "Device-free user localization utilizing artificial neural networks and passive RFID," 2012.
[20] J. Xu, H. Dai, and W.-h. Ying, "Multi-layer neural network for received signal strength-based indoor localisation," IET Communications, vol. 10, no. 6, pp. 717–723, 2016. [Online]. Available: http://digital-library.theiet.org/content/journals/10.1049/iet-com.2015.0469
[21] M. M. Soltani, A. Motamedi, and A. Hammad, "Enhancing cluster-based RFID tag localization using artificial neural networks and virtual reference tags," International Conference on Indoor Positioning and Indoor Navigation, pp. 1–10, 2013. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6817886
[22] M. Edel and E. Koppe, "An advanced method for pedestrian dead reckoning using BLSTM-RNNs," 2015.
[23] S. R. Kulkarni and G. Harman, "Statistical learning theory: A tutorial," Wiley Interdisciplinary Reviews: Computational Statistics, vol. 3, no. 6, pp. 543–556, 2011. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1002/wics.179/epdf
[24] H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin, "Exploring strategies for training deep neural networks," Journal of Machine Learning Research, vol. 1, pp. 1–40, 2009.
[25] M. Brunato and R. Battiti, "Statistical learning theory for location fingerprinting in wireless LANs," Computer Networks, vol. 47, no. 6, pp. 825–845, 2005.
[26] M. T. Hagan, H. B. Demuth, and M. H. Beale, "Neural Network Design," pp. 1–1012, 1995. [Online]. Available: http://books.google.ru/books?id=bUNJAAAACAAJ
[27] C. Zhou, L. Ma, and X. Tan, "Joint semi-supervised RSS dimensionality reduction and fingerprint based algorithm for indoor localization," in Institute of Navigation (ION GNSS+ 2014), 27th International Technical Meeting of The Satellite Division Conference on, September 2014, pp. 3201–3211.