Graph Based Multi-layer K-means++ (G-MLKM) for Sensory Pattern Analysis in Constrained Spaces
Feng Tao, Rengan Suresh, Johnathan Votion, and Yongcan Cao
Abstract—In this paper, we focus on developing a novel unsupervised machine learning algorithm, named graph based multi-layer k-means++ (G-MLKM), to solve the data-target association problem when targets move in a constrained space and only minimal information about the targets can be obtained by sensors. Instead of employing the traditional data-target association methods that are based on statistical probabilities, G-MLKM solves the problem via data clustering. We first develop the multi-layer k-means++ (MLKM) method for data-target association in a local space, given a simplified constrained space. A p-dual graph is then proposed to represent the general constrained space in which local spaces are interconnected. Based on the p-dual graph and graph theory, we generalize MLKM to G-MLKM by first performing local data-target association and then extracting cross-local data-target associations, which mathematically analyzes the data association at intersections of that space. To exclude potential data-target association errors that disobey physical rules, we also develop error correction mechanisms to further improve the accuracy. Numerous simulation examples are conducted to demonstrate the performance of G-MLKM.
Index Terms—Graph theory, MLKM, clustering, data preprocessing, data-object association.
I. INTRODUCTION
Associating data with the right target in a multi-target environment is an important task in many research areas, such as object tracking [1], surveillance [2], [3], and situational awareness [4]. Image sensors can be used to acquire rich information related to each target, which significantly simplifies the data-target association problem. For example, video cameras in a multi-target tracking mission can provide colors and shapes of targets as extra features in the association process [5]. However, considering the costs, security issues, and special environments (e.g., ocean tracking [6], military spying), a simple, reliable, and low-cost sensor network is often a preferred option [7]. Consequently, the data-target association problem needs to be further studied, especially in cases when the gathered data are cluttered and contain limited information related to the targets.

The existing approaches for data-target association, in general, consist of three procedures [8]: (i) measurement collection: preparation before the data association process, such as object identification in video frames, radar signal processing, or raw sensor data accumulation; (ii) measurement prediction: predict the potential future measurements based on historical data, which yields an area (validation gate) that narrows down the search space; and (iii) optimal measurement selection: select the optimal measurement that matches the historical data according to a criterion (which varies across approaches)
and update the history dataset. With the same procedures but different choices of the optimal measurement criterion, many data-target association techniques have already been developed. Among them, the well-known techniques include the global nearest neighbor standard filter (Global NNSF) [9], the joint probabilistic data association filter (JPDAF) [10]–[13], and multiple hypothesis tracking (MHT) [14]. The Global NNSF approach attempts to find the maximum likelihood estimate related to the possible measurements (non-Bayesian) at each scan (which measures the states of all targets simultaneously). For nearest neighbor correspondences, there is always a finite chance that the association is incorrect [15]. Besides that, Global NNSF assumes a fixed number of targets and cannot adjust the target number during the data association process. A different well-known technique for data association is JPDAF, which computes association probabilities (weights) and updates each track with the weighted average of all validated measurements. Similar to Global NNSF, JPDAF cannot be applied in scenarios with target birth and death [1]. The most successful algorithm based on this data-oriented view is MHT [16], which takes a delayed decision strategy by maintaining and propagating a subset of hypotheses in the hope that future data will disambiguate decisions at present [1]. MHT is capable of associating noisy observations and is resistant to a dynamic number of targets during the association process. The main disadvantage of MHT is its computational complexity, as the number of hypotheses increases exponentially over time. There are other approaches available for data association, for example, Markov chain Monte Carlo data association (MCMCDA) [5], [17].

The authors are with the Department of Electrical and Computer Engineering, University of Texas, San Antonio, TX 78249, USA. Corresponding Author: Yongcan Cao ([email protected]).
MCMCDA takes the data-oriented, combinatorial optimization approach to the data association problem but avoids the enumeration of tracks by applying a sampling method called Markov chain Monte Carlo (MCMC) [17], which still implements statistical probabilities in the procedure of optimal measurement selection.

The main contribution of this paper is the development of an efficient unsupervised machine learning algorithm, called graph based multi-layer k-means++ (G-MLKM). The proposed G-MLKM differs from the existing data-target association methods in three aspects. First, in contrast to the previously developed data association approaches that estimate the potential measurement from history data for each target and select an optimal one from validated measurements based on statistical probabilities, G-MLKM solves the data-target association problem from the viewpoint of data clustering. Second, the previous approaches are mainly developed for sensors that are capable of obtaining information from a multi-dimensional environment, such as radars, sonars, and video cameras, whereas G-MLKM is proposed for sensors that only provide limited information. An interesting study on tracking targets with binary proximity sensors can be seen in [7], whose objective is limited to target counting, while G-MLKM can associate data to targets. Third, G-MLKM can address the case in which targets move in a constrained space, which requires dealing with data separation and merging.

The remainder of this paper is structured as follows. The data association problem in a constrained space and the corresponding tasks are described in Section II. In Section III, the multi-layer k-means++ (MLKM) method is developed for data-target association in a local space, given a simplified constrained space. The graph based multi-layer k-means++ (G-MLKM) algorithm is then developed in Section IV for general constrained spaces. Simulation examples are then provided in Section V.
Section VI provides a brief summary of the work presented in this paper.

II. PROBLEM FORMULATION
In this paper, we consider the problem of data-target association when multiple targets move across a road network. Here, a road network is a set of connected road segments, along which low-cost sensors are spatially distributed. The sensors are used to collect information about targets, in particular, the velocity of targets and the corresponding measured time. We assume that 1) there is no false alarm in the sensor measurements, and 2) a target's velocity does not change rapidly between two adjacent sensors. The collected information about a target is normally disassociated with the target itself, meaning that the target from which the information was captured cannot be directly identified using the information. Hence, data-target association is necessary.

Fig. 1 shows one road network example that consists of 6 road segments. Without loss of generality, let the total number of road segments in a road network be denoted as L. The road segments are denoted as R_1, R_2, ..., R_L, respectively. The length of road segment R_i is denoted as D_i for i = 1, 2, ..., L. To simplify the discussion, we assume the road segments are for one-way traffic, i.e., targets cannot change their moving directions within one road segment. However, when a road segment allows bidirectional traffic, we can separate it into two unidirectional road segments and the approach proposed in this paper directly applies. Let S_i = {S_i1, S_i2, ..., S_iN_i} be a set of N_i sensors placed along the direction of road segment R_i. In other words, for sensor S_ij ∈ S_i, the larger the sub-notation j is, the further the sensor is located from the starting point of road segment R_i.
We denote the corresponding distance between sensor S_ij and the starting point of road segment R_i as d_ij. Hence, the position set for sensors in R_i relative to the starting point can be denoted as P_i = {d_i1, d_i2, ..., d_iN_i}, where 0 ≤ d_i1 < d_i2 < ... < d_iN_i ≤ D_i.

For each sensor S_ij, its measurements are collected and stored in chronological order. The collection is denoted as a column vector X_ij, such that X_ij = [x_ij^1, x_ij^2, ..., x_ij^m_ij]', where i ∈ {1, 2, ..., L}, j ∈ {1, 2, ..., N_i}, the prime symbol represents the transpose operation for a vector, m_ij is the total number of measurements in X_ij, and x_ij^n, n ∈ {1, 2, ..., m_ij}, denotes an individual measurement in X_ij. In particular, x_ij^n = [v_ij^n, t_ij^n] stores the measured velocity v_ij^n when one target passed by sensor S_ij at time t_ij^n. As the elements in X_ij are stored in chronological order, the recorded time for each measurement satisfies t_ij^1 < t_ij^2 < ...

Fig. 1. An example road network. R_i represents the road segment. S_ij represents the jth sensor on the ith road segment.

III. MLKM FOR A SINGLE ROAD SEGMENT

In this section, we consider the special case when L = 1, i.e., the road network consists of only one road segment, R_1. In this special case, there are neither intersections nor loops in the road network. Therefore, the task of identifying data-target associations is simplified to clustering X_1 into m_1 groups (Task 1) only. One example of matrix X_1 is shown in Fig. 2, which plots the measurements of 10 different targets captured by 9 equally spaced sensors on road segment R_1.

A. K-means++ Clustering and Deep Neural Network

K-means [18] and k-means++ [19] are perhaps the most common methods for data clustering.

Fig. 2. Example of X_1 for a single road segment (velocity vs. time for 10 targets).

For a set of data points
in an N-dimensional space, the two algorithms perform clustering by grouping the points that are closest to optimally placed centroids. From the machine learning perspective, k-means learns where to optimally place a pre-defined number of centroids such that the cost function, defined as Φ_Y(C) = Σ_{y∈Y} d(y, C)², is minimized, where d(y, C) = min_{c_i∈C} ||y − c_i|| represents the distance between a measurement y ∈ Y and its closest centroid, and C = {c_1, ..., c_k} represents the set of centroids. The associated cost function is the sum of the squared Euclidean distances from all data points to their closest centroids. The cost function and optimization algorithms are the same for k-means and k-means++; the only difference between them is that k-means++ places the initial guesses for the centroids in places that have data concentration, and consequently improves the running time of Lloyd's algorithm and the quality of the final solution [19].

A much more complex boundary may exist between two data groups. Therefore, we also verify the potential performance of a deep neural network (DNN) algorithm [20] in the data association process, which is known for its capability of recognizing underlying patterns and defining better decision boundaries among data samples. For the purpose of evaluating the supervised DNN capabilities, a slight modification of the problem is considered. Instead of a completely unlabeled dataset X, part of the measurements are pre-labeled, i.e., the data-target relations for part of the measurements are known. Also, we extend each measurement's dimensions to further include vt and similar products of v and t as extra features so that the inner structure of the DNN can be simpler. Table I presents the detailed settings of the DNN framework.

TABLE I
DNN CONFIGURATION PARAMETERS

Framework                 Definition
Cost Function             Softmax
Activation Function       ReLU
Optimizer                 Adam Optimizer
Number of Hidden Layers   2
Number of Neurons         8

B.
K-means++ with Data Preprocessing

While a DNN can potentially provide better performance for the data association problem, it demands labeled datasets for training. In real scenarios, however, the training dataset may not be available. In contrast, k-means++ can cluster data samples without the need for a labeled dataset. This unsupervised property of k-means++ enables a wider application domain. Hence, k-means++ is more practical for the task of clustering X_1 into m_1 groups. Moreover, when the dataset X_1 is small and sparse, k-means++ can perform well on the task of data-target association.

However, when the measurements are distributed along the time axis and the velocity profiles are close, k-means++ tends to place the centroids in positions where data from different targets overlap and hence causes inaccurate data-target pairing. This happens because k-means implements the Euclidean distance to determine to which centroid a data sample (v, t) belongs, i.e.,

arg min_{(v_i*, t_i*) ∈ C} sqrt((v − v_i*)² + (t − t_i*)²),   (7)

where C is the set of centroids. When data samples are distributed along the time axis, the time difference becomes the determining factor for the grouping results.

One natural way to balance the two components (time difference and velocity difference) in (7) is to preprocess X_1 before applying k-means++. The idea of preprocessing is similar to principal component analysis [21], which projects data onto a main axis. The preprocessed data sample is denoted as x̂_j^n = [v_j^n, t̂_j^n], where t̂_j^n is given by

t̂_j^n = t_j^n − (d_j − d*) / v_j^n,   (8)

where j ∈ {1, ..., N_1}, n ∈ {1, ..., m_j}, d_j is the position of sensor S_j with respect to the starting point of road segment R_1, and d* is the reference point for the projection. Fig. 3 shows the preprocessed result for the dataset in Fig. 2. In this example, the reference point d* is selected to be the starting point, and we can see that clusters for each target have formed after data preprocessing.
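The projection in (8) can be sketched as follows. The sensor layout and measurements below are illustrative values, not the paper's dataset; a k-means++ implementation (e.g., scikit-learn's `KMeans(init='k-means++')`) would then cluster the projected (v, t̂) points.

```python
def preprocess(measurements, sensor_pos, d_star=0.0):
    """Project measurements to the reference point d_star, as in Eq. (8).

    measurements: list (one entry per sensor) of lists of (v, t) tuples.
    sensor_pos:   list of sensor positions d_j along the road segment.
    """
    projected = []
    for d_j, per_sensor in zip(sensor_pos, measurements):
        for v, t in per_sensor:
            # t_hat = t - (d_j - d_star) / v  -- constant-velocity back-projection
            projected.append((v, t - (d_j - d_star) / v))
    return projected

# A target moving at 2.0 units/s passes sensors at d = 10 and d = 20
# at times t = 5.0 and t = 10.0; both measurements project to t_hat = 0.0,
# so they fall into the same dense cluster.
pts = preprocess([[(2.0, 5.0)], [(2.0, 10.0)]], [10.0, 20.0])
```
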
Fig. 3. Example of preprocessed X_1 for a single road segment.

C. Multi-layer K-means++

Through the preprocessing procedure, data can be roughly separated for different targets, which provides dense and grouped subsets. The boundaries between two groups, however, may still be too complex for k-means++ to define, especially when X_1 is a large dataset and the grouped subsets are close to each other. Inspired by the DNN's capability of defining classification boundaries via a multi-layer structure and a back-propagation philosophy, we propose a new multi-layer k-means++ (MLKM) method that integrates the DNN's multi-layer structure with the clustering capabilities of k-means++ to overcome the complex boundary challenge.

The proposed MLKM algorithm is performed via 3 layers: (i) data segmentation and clustering: the dataset is sequentially partitioned into smaller groups for the purpose of creating sparse data samples for k-means++; (ii) error detection and correction: check the clustered data by searching for errors through predefined rules and re-cluster the data using nearest neighbor concepts [22] if an error is found (note that k-means++ associates data with the closest optimally placed centroid based on the Euclidean distance between data point and centroid, which is a scalar quantity); (iii) cluster matching: match the clusters of each segment by projecting the cluster centroids of all segments to the cluster centroid of the first segment and again grouping them based on k-means++. A detailed explanation of these three layers is given as follows.
Layer 1 (Data Segmentation & Clustering): Without loss of generality, we assume that there are K sensors per segment. The dataset X_1 ∈ R^{m×N_1} (m and N_1 are the maximum number of measurements and the total sensor number in sensor set S_1, respectively) is sequentially partitioned into E segments, such that

E = { N_1/K,         if N_1 mod K = 0,
    { ⌊N_1/K⌋ + 1,   otherwise.

In other words, when N_1 mod K ≠ 0, the last segment will contain measurements from fewer than K sensors. In the rest of the paper, we assume that N_1 mod K = 0 for simplicity of presentation. When N_1 mod K ≠ 0, we can add some extra artificial sensors with all-zero measurements. Then the data segment can be defined as X_e = ∪_{j=(e−1)K+1}^{eK} X̄_j, where e = 1, 2, ..., E. The k-means++ algorithm is then applied to each X_e by excluding all zero elements. By aggregating the clustering results, we can obtain a set of centroids for X_e, e = 1, 2, ..., E, defined as C_e = {c_e1, c_e2, ..., c_em_1}, and the measurements associated with each centroid c_ek are represented as T_ek, where k ∈ {1, 2, ..., m_1}.
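The segmentation step of Layer 1 can be sketched as follows, assuming the per-sensor columns X̄_j are already available as a list; the per-segment k-means++ call is omitted.

```python
def segment_columns(columns, K):
    """Partition the N1 per-sensor columns into E segments of K sensors,
    where E = N1/K if K divides N1 and floor(N1/K) + 1 otherwise."""
    N1 = len(columns)
    E = N1 // K if N1 % K == 0 else N1 // K + 1
    # Segment e collects columns (e-1)K+1 .. eK (0-based slicing below);
    # the last slice is simply shorter when K does not divide N1.
    return [columns[e * K:(e + 1) * K] for e in range(E)]

# 9 sensor columns, K = 3 sensors per segment -> E = 3 segments.
segs = segment_columns([f"X{j}" for j in range(1, 10)], K=3)
```

Each element of `segs` would then be clustered independently with k-means++.
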
To avoid repeating the same wrong re-association, the global nearest neighbor standard is chosen as the re-association technique instead of k-means++, given the assumption that a target's velocity does not change rapidly between two adjacent sensors.

We propose the following logic rules for error detection:
• |T_ek| > K;
• ∃ n_1 ≠ n_2, x_l^{n_1} ∈ T_ek, x_j^{n_2} ∈ T_ek, such that l = j;
• ∃ l ≥ j, x_l^{n_1} ≠ 0 ∈ T_ek, x_j^{n_2} ∈ T_ek, such that t_l^{n_1} ≤ t_j^{n_2};

where |T_ek| indicates the cardinality of T_ek. The first rule means that more than K measurements appear in T_ek. The second rule means that more than one sensory measurement from the same sensor is associated with one target in T_ek. The third rule means that a target is recorded at a later time by a previous sensor. If one or more rules are satisfied, the corresponding T_ek is considered to be an erroneous data association and will be stored in Y_e*, where Y_e* refers to the wrong data associations in X_e.

The error correction is to re-associate the data in Y_e* so as to break all the logic rules listed above. We propose to use the global nearest neighbor approach. Specifically, elements in Y_e* that belong to measurements of sensor S_ℓ are selected sequentially and evaluated against every measurement in Y_e* that belongs to measurements of sensor S_{ℓ+1} to obtain the best match. The evaluation is accomplished via the following optimization process:

arg min_κ | t_{ℓ+1}^κ − ( t_ℓ + ||d_{ℓ+1} − d_ℓ|| / v_ℓ ) |,   s.t. x_{ℓ+1}^κ ∈ Y_e*.

With this procedure, all T_ek are updated with the corrected clusters and all c_ek are re-calculated based on the updated T_ek. The new corrected set of centroids C_e is updated for all segments and grouped into C = {C_1, C_2, ..., C_E}. The position of the centroid set C_e is defined as

d_e = Σ_{j=(e−1)K+1}^{eK} d_j / K.
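The three Layer-2 logic rules can be sketched as follows. The cluster layout (a list of `(sensor_index, t)` pairs) is an assumed data representation for illustration; the paper specifies the rules, not a storage format.

```python
def is_erroneous(T, K):
    """Return True if cluster T violates any of the three logic rules.

    T: list of (sensor_index, t) measurements in one cluster T_ek.
    K: number of sensors per data segment.
    """
    if len(T) > K:                          # rule 1: more than K measurements
        return True
    sensors = [s for s, _ in T]
    if len(sensors) != len(set(sensors)):   # rule 2: two hits from one sensor
        return True
    for s1, t1 in T:                        # rule 3: later sensor, earlier time
        for s2, t2 in T:
            if s1 > s2 and t1 <= t2:
                return True
    return False
```

Clusters flagged by `is_erroneous` would be collected into Y_e* and re-associated with the global nearest neighbor step.
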
(9)

Layer 3 (Cluster Matching): Through the preceding two layers, data-target association can be accomplished for each data segment X_e independently. However, the target associations are uncorrelated across data segments. In particular, the unsupervised k-means++ only groups data samples that belong to the same target while the clusters of each target are anonymous. Hence, it is still unclear how to associate the clusters among different segments.

In Layer 3, we project C_e, e = 1, ..., E, using the preprocessing technique stated in Section III-B. More precisely, the time component of c_ek ∈ C_e is preprocessed as

t̂_ek = t_ek − (d_e − d_1) / v_ek,   ∀e ∈ {1, ..., E}, ∀k ∈ {1, ..., m_1},

where c_ek = [v_ek, t_ek], and d_e is the position of centroid set C_e defined in (9). Then k-means++ is applied to the preprocessed C to find the clusters that group cluster centroids across data segments. Accordingly, the measurements T_ek associated with each centroid are merged together as T_k, which provides the complete data-target association result for the entire road segment.

Note that the proposed MLKM method may not apply directly to the case when L > 1 (i.e., more than one road segment). Therefore, we propose a more general method, named G-MLKM, to solve the general data-target association problem for a general road network in the next section.

IV. G-MLKM FOR A GENERAL ROAD NETWORK

In this section, we consider the general case when the road network consists of multiple road segments. To solve the data-target association problem, we propose a new graph-based multi-layer k-means++ (G-MLKM) algorithm. In particular, G-MLKM uses graph theory to represent the road network as a graph, and then links data from different road segments at each intersection of the road network by analyzing the graph structure.
The data-target association problem for a general road network is then solved by merging the clustering results at intersections with the MLKM results on each road segment.

We first briefly introduce graph theory and the representation of road networks using graphs as preliminaries. Then the procedures for G-MLKM are explained in detail. In particular, we begin with a new graph representation for the road network. Then the procedures for linking measurements at intersections (Task 2) are described. After that, we unify the results on road segments and intersections, and complete the data merging task (Task 3).

A. Preliminaries

1) Graph Theory: For a system of L connected agents, its network topology can be modeled as a directed graph G = (V, E), where V = {v_1, v_2, ..., v_L} and E ⊆ V × V are, respectively, the set of agents and the set of edges that connect the agents. An edge (v_i, v_j) in set E means that agent v_j can access the state information of agent v_i, but not necessarily vice versa [23]. The adjacency matrix A of the directed graph G is defined by A = [a_ij] ∈ R^{L×L}, where a_ij = 1 if (v_i, v_j) ∈ E and a_ij = 0 otherwise.

2) Graph Representation of Road Networks: There are mainly two strategies to represent road networks using graphs, namely the primal graph and the dual graph [24]. In a primal graph representation, road intersections or end points are represented by agents and road segments are represented by edges [25], while in a dual graph representation, road segments are represented by agents and an edge exists if two roads intersect with each other [26]. Compared with the primal graph, the dual graph focuses more on the topological relationship among road segments. As the data-target associations for each road segment can be solved by the MLKM method, the focus here is to cluster data at each intersection. As a consequence, the dual graph is a better option. However, geometric properties such as road length are neglected by the dual graph.
Hence, some further modification of the dual graph is needed.

B. G-MLKM Algorithm

In this subsection, we provide the detailed procedures for the G-MLKM algorithm, which is composed of the following three steps.

Fig. 4. (a) Dual graph representation for the road network in Fig. 1, with the nodes and arrows representing, respectively, the agents and the directed edges. (b) P-dual graph representation for the road network in Fig. 1, where two sensor nodes represent one road segment and the edge within the two sensor nodes is ignored. In this example, there exist 3 subgraphs, which are denoted as a, b, and c.

1) Modified Graph Representation for Road Networks: Considering the cases when targets may stop in a road segment or the data collection process may terminate before targets pass through a road segment, the total number of measurements collected by sensor S_iN_i (located near the ending point of road segment R_i) may be less than the number collected by sensor S_i1 (located near the starting point of road segment R_i). If the entire road segment is abstracted as one single agent, the inequality of measurements in the road segment may create issues for the subsequent data-target association process. Here, we modify the dual graph by incorporating the primal graph for the representation of the road segment. In other words, we propose to replace each road segment node in the dual graph by two agents with one directed edge connecting them, where the direction of the edge is determined by the traffic direction. In particular, we use the sensor nodes S_i1 and S_iN_i as the two agents. We may neglect the edge between S_i1 and S_iN_i because we focus on data-target associations at intersections, while the data-target associations within the road segment can be accomplished by the MLKM method without the need for knowledge of the graph. Moreover, the connection between S_i1 and S_iN_i is unidirectional when the traffic is unidirectional. We call the new graph the "p-dual graph", i.e., prime-based dual graph.
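The expansion from the L×L dual-graph adjacency matrix A to the 2L×2L p-dual adjacency matrix can be sketched as follows. Placing a_ij in the lower-left corner of each 2×2 block encodes the edge from exit sensor S_iN_i to entry sensor S_j1; this block layout is our reading of the garbled Eq. (10), chosen so that L rows and L columns of the result are all zeros, as the paper states.

```python
def p_dual_adjacency(A):
    """Expand each entry a_ij of the dual-graph adjacency matrix into a
    2x2 block; node 2i is S_i1 (entry sensor), node 2i+1 is S_iNi (exit)."""
    L = len(A)
    A_star = [[0] * (2 * L) for _ in range(2 * L)]
    for i in range(L):
        for j in range(L):
            # Edge S_iNi -> S_j1: row 2i+1 (exit of R_i), column 2j (entry of R_j).
            A_star[2 * i + 1][2 * j] = A[i][j]
    return A_star

# Two one-way road segments with traffic flowing R1 -> R2:
A = [[0, 1], [0, 0]]
As = p_dual_adjacency(A)  # 4x4; the only nonzero entry links S_1N1 to S_21
```
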
An example of how to derive the p-dual graph is shown in Fig. 4, where the original 6 agents in the dual graph are replaced by 12 agents and the edges between S_i1 and S_iN_i are removed in the p-dual graph. For a general road network with L road segments, the node set of the new p-dual graph is given by V* = {S_11, S_1N_1, S_21, ..., S_L1, S_LN_L}, with the corresponding adjacency matrix A* ∈ R^{2L×2L} given by

A* = [a*_ij] ∈ R^{2L×2L},   a*_ij = [ 0     0
                                      a_ij  0 ],   (10)

i.e., each entry a_ij of A is expanded into a 2×2 block.

2) Graph Analysis for Data Pairing at Intersections: From A* defined in (10), we can observe that the adjacency matrix A* has L columns and L rows that are all zeros. Hence, the sparse matrix A* can be further analyzed and decomposed to extract subgraphs related to different intersections. Then the task of linking the trajectories of targets at road intersections can be equivalently solved via pairing measurements of sensors S_i1/S_iN_i from the road segments in the subgraphs, which is further decomposed into the following three procedures.

Algorithm 1 Subgraph Extraction
Input: ∀ b_ij ∈ A*
Output: (O_T^ints, I_T^ints), ints ∈ {a, b, c, ...}
  Idx_row = Idx_col = {1, 2, ..., |A*|}; i = 0
  for ints in {a, b, c, ...} do
    O_T^ints = I_T^ints = ∅
    if |Idx_row| ≥ 1 then
      procedure INCREMENT(i)
        i = i + 1
        if i ∈ Idx_row then return i
        else INCREMENT(i)
      procedure RECURSION(i)
        if Σ_{∀j ∈ Idx_col} b_ij ≥ 1 then
          O_T^ints = O_T^ints ∪ {i}
          procedure EXTRACT(i)
            for j in Idx_col do
              if b_ij ≠ 0 then I_T^ints = I_T^ints ∪ {j}
            Idx_col = Idx_col \ I_T^ints; Idx_row = Idx_row \ {i}
            for j ∈ I_T^ints do
              if Σ_{∀l ∈ Idx_row} b_lj ≥ 1 then
                for l in Idx_row do
                  if b_lj ≠ 0 then O_T^ints = O_T^ints ∪ {l}
            if Idx_row ∩ O_T^ints ≠ ∅ then
              EXTRACT(∃ l ∈ (Idx_row ∩ O_T^ints))
            else return (O_T^ints, I_T^ints)
        else
          Idx_row = Idx_row \ {i}; i = i + 1; RECURSION(i)
    else break

i.
Subgraph Extraction: The first procedure is to extract subgraphs from A*. Let the letters of the alphabet {a, b, c, ...} denote the names of the different intersections. The subgraph extraction procedure begins with intersection name a, followed by b, c, and so on. For any intersection ints, the subgraph extraction is conducted by cross-searching the non-zero entries of the matrix A* in a repeated row and column pattern. The corresponding indices of rows and columns containing non-zero entries, indicating the agents and edges included in that subgraph, are stored in the sets O_T^ints and I_T^ints, respectively. More precisely, O_T^ints denotes the index set of road segments that have outgoing targets related to intersection ints, and I_T^ints denotes the index set of road segments that have ingoing targets related to the same intersection. The index storing processes are defined as O_T^ints = O_T^ints ∪ {i} and I_T^ints = I_T^ints ∪ {j}, where i, j are the corresponding row index and column index, respectively. The iterative search process terminates and returns (O_T^ints, I_T^ints) when there is no more non-zero element in the recorded row and column indices. Algorithm 1 gives the pseudo code for the subgraph extraction procedure. The extracted results are denoted as (O_T^ints, I_T^ints), where ints ∈ {a, b, c, ...}.

Fig. 5. An intersection consisting of three road segments denoted as R_i, R_j, and R_k. The virtual reference for data preprocessing is at the center of the intersection with a radius of r to each road segment end point.

ii. Data Preprocessing at Intersections: Given that the subgraph describing an intersection ints is available from the preceding subgraph extraction procedure, the datasets X_i1/X_iN_i that are subject to the pairing task for the corresponding intersection can be pinpointed.
In particular, (3) and (4) define the datasets for intersection ints as an incoming dataset Q_I^ints and an outgoing dataset Q_O^ints, respectively. As we assume that 1) there are no false alarms in the measurements, and 2) a target's velocity does not change rapidly between two adjacent sensors, data pairing at intersections may be interpreted as data clustering. A potential machine learning technique for data clustering is k-means++. However, the sensors S_i1/S_iN_i from different road segments are not guaranteed to be located near each other at a road intersection, which may contribute to a relatively large time difference between two sensors' measurements for one target. Hence, before applying k-means++, data preprocessing on Q_I^ints and Q_O^ints is necessary.

Based on the preprocessing definition proposed in (8), we here propose a new data preprocessing technique that first selects a virtual reference at the center of the intersection ints and then recomputes t̂_ij^k via projecting each element in I_T^ints and O_T^ints to the virtual reference as

t̂_ij^k = { t_i1^k − (r + d_i1)/v_i1^k,                i ∈ I_T^ints and j = 1,
          { t_iN_i^k + (D_i − d_iN_i + r)/v_iN_i^k,   i ∈ O_T^ints and j = N_i,   (11)

where k ∈ {1, 2, ..., m_ij} and r is the radius of the intersection circle centered at the virtual reference. An example of locating the virtual reference is shown in Fig. 5, where the intersection consists of 3 road segments denoted as R_i, R_j, and R_k.

iii. Data Pairing at Intersections and Error Correction: Denote the preprocessed datasets for Q_I^ints and Q_O^ints as Q̂_I^ints and Q̂_O^ints. Then k-means++ can be applied to the preprocessed intersection datasets {Q̂_I^ints, Q̂_O^ints} for data pairing. Similar to the development of MLKM for the case of one road segment, errors may arise when conducting the data pairing/clustering. Error detection and correction are needed to further improve the accuracy.
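The two cases of the intersection projection in (11) can be sketched as follows; the road lengths, sensor offsets, and times are made-up values chosen so that one target's entering and leaving measurements project to the same time at the virtual reference.

```python
def project_entering(t, v, d, D, r):
    """Road i in O_T: target passes exit sensor S_iNi (offset d on a road of
    length D) at time t and reaches the intersection center r later."""
    return t + (D - d + r) / v   # second case of Eq. (11)

def project_leaving(t, v, d, r):
    """Road i in I_T: target passed the center, then entry sensor S_i1
    (offset d from the road start) at time t."""
    return t - (r + d) / v       # first case of Eq. (11)

# Target at 2.0 units/s: exits road i (D = 100, sensor at d = 98, r = 2)
# at t = 50, then hits the next road's entry sensor (d = 2) at t = 54.
t_in = project_entering(50.0, 2.0, 98.0, 100.0, 2.0)   # -> 52.0
t_out = project_leaving(54.0, 2.0, 2.0, 2.0)           # -> 52.0
```

Since both measurements project to the same reference time, k-means++ on the projected data naturally pairs them.
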
For an intersection ints, the cardinalities of the preprocessed Q̂_I^ints and Q̂_O^ints remain the same as those of Q_I^ints and Q_O^ints. As defined in (5), |Q̂_I^ints| = n_I and |Q̂_O^ints| = n_O, where n_I ≥ n_O. The set of centroids is denoted as C^ints = {c_1^ints, c_2^ints, ..., c_n_I^ints}, and the measurements associated with each centroid c_j^ints, j ∈ {1, 2, ..., n_I}, are given as Y_j^ints. The error correction is similar to Layer 2 of the MLKM method described in Section III-C, and defines three logic rules for error detection:
• |Y_j^ints| > 2;
• |Y_j^ints ∩ Q̂_I^ints| ≠ 1;
• x_iN_i ∈ Y_j^ints, x_l1 ≠ 0 ∈ Y_j^ints ⇒ t_l1 ≤ t_iN_i;

where |Y_j^ints| is the cardinality of Y_j^ints. The first rule means that more than two measurements are associated in Y_j^ints; an error can be determined in this case because each target has at most two measurements at one intersection. The second rule means that either none or more than one sensory measurement is found from the incoming dataset Q̂_I^ints. The third rule means that the outgoing measurement in Y_j^ints is recorded earlier than the incoming measurement. If one or more rules are satisfied, the corresponding Y_j^ints is considered to be an erroneous data association and will be stored in Y^ints, where Y^ints refers to the wrong data associations at the intersection. The error correction is to re-associate the data in Y^ints so as to break all three logic rules listed above. To achieve this goal, we separate Y^ints into two subsets denoted as Y_I^ints and Y_O^ints, given by

Y_I^ints = {x_iN_i | ∀x_iN_i ∈ Y^ints},   Y_O^ints = {x_l1 | ∀x_l1 ∈ Y^ints},

where Y_I^ints and Y_O^ints store all measurements x_iN_i and x_l1 in Y^ints, respectively.
Re-associating the data in Y^{ints} then becomes a linear assignment problem [27] between Y_I^{ints} and Y_O^{ints}. The optimal pairing between Y_I^{ints} and Y_O^{ints} is found when the matching score reaches its minimum via solving the optimization problem
arg min_M ||M Y_I^{ints} − Y_O^{ints}||,
where Y_I^{ints} ∈ R^{m_I × 1} and Y_O^{ints} ∈ R^{m_O × 1} are column vectors converted from the subsets Y_I^{ints} and Y_O^{ints}, respectively, and M ∈ R^{m_O × m_I} is a special binary matrix with each row summing to 1.

After the error correction is accomplished, all Y_j^{ints} are updated to complete Task 2. Furthermore, a permutation matrix G^{ints} ∈ R^{n_I × n_I} can be created to record the pairing relationship between the incoming dataset Q_I^{ints} and the outgoing dataset Q_O^{ints} for each intersection.

3) Group Merging in the Road Network: K-means++ clustering on the preprocessed dataset at each intersection links the trajectories of targets at road intersections (Task 2), while the proposed MLKM method solves the data association for each road segment (Task 1). If the clustering results at all intersections are combined with the MLKM results on all road segments, trajectory awareness for each target in the road network is achieved. This is valid for situations when targets pass the same road segment only once. However, when targets pass the same road segment and intersection multiple times, one target can be assigned to multiple associated data groups on the road segment. To determine the connections among all associated data groups, an extra task (Task 3) for merging data groups in the road network is needed. Given that the datasets at intersections are extracted from the L matrices collected from all road segments, clusters at the intersections can be classified based on the data groups for all road segments.
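The linear assignment step can be sketched with SciPy's implementation of the Hungarian algorithm [27]. The cost choice here, the absolute difference between projected reference times, and all numeric values are illustrative assumptions, not the paper's specification:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Projected reference times of the erroneous clusters (made-up values).
y_in = np.array([1.0, 4.0, 9.0])    # Y_I^{ints}, m_I = 3
y_out = np.array([4.2, 1.1])        # Y_O^{ints}, m_O = 2 (n_I >= n_O)

# Cost matrix: |t_out - t_in| for every outgoing/incoming pair; each row of
# the resulting assignment matrix M selects exactly one incoming measurement.
cost = np.abs(y_out[:, None] - y_in[None, :])

rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
pairs = list(zip(rows.tolist(), cols.tolist()))
# Incoming measurements left unmatched correspond to targets that stopped.
```

Here the incoming time 9.0 remains unmatched, playing the role of a target that never produced an outgoing measurement.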
Therefore, the task of determining the connections among the associated data groups in the road network can be focused on connections of T̄_{iz} defined in (2) for each road segment.

Let the symmetric matrix G_{R_i} ∈ R^{m_i × m_i} denote the connections among the m_i association groups in road segment R_i, given by G_{R_i} = [b_{pq}] ∈ R^{m_i × m_i}, where

    b_{pq} = b_{qp} = 1, if T̄_{ip} and T̄_{iq} belong to the same target,
    b_{pq} = b_{qp} = 0, otherwise.

To determine the entries in G_{R_i}, depth-first search (DFS) [28] is implemented to detect cycles in the adjacency matrix A. If no cycle exists, the non-diagonal entries are set to 0 and hence G_{R_i} is an identity matrix. Otherwise, further analysis on the connections among data groups at each road segment proceeds sequentially in the following three steps:

i. Node Analysis on the Dual Graph: The analysis starts with identifying road segments that have only outgoing flow, i.e., source nodes in the graph. The source nodes can be identified from the adjacency matrix A by checking the sum of each column. In particular, road segment R_i is a source node when the sum of the ith column of A satisfies Σ_{l=1}^{L} a_{li} = 0, where a_{li} is the (l, i)th entry of the adjacency matrix, which represents the edge (R_l, R_i).

ii. Trajectory Flow for Data Groups from Source Nodes: If the road segment R_i is a source node, the m_i data groups in R_i resulting from the MLKM method are considered to be m_i unique targets. The trajectories of these m_i targets are then traced in the road network. In particular, if T̄_{iz} ∩ X_{iN_i} = ∅, the target associated with data group T̄_{iz} does not contain any measurement from sensor S_{iN_i}, which corresponds to the case when the target stops in the road segment or the data collection terminates before the target can reach sensor S_{iN_i}. The trajectory tracking for this target is then completed.
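The source-node test and the DFS cycle check can be sketched as follows; the 4-node adjacency matrix is a made-up example, not the road network of Fig. 1:

```python
import numpy as np

# Illustrative dual-graph adjacency matrix: a[l][i] = 1 means edge (R_l, R_i).
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 1],   # edge back to node 1 creates a cycle
              [0, 0, 0, 0]])

# R_i is a source node when column i of A sums to zero (only outgoing flow).
sources = [i for i in range(A.shape[0]) if A[:, i].sum() == 0]

def has_cycle(adj):
    """DFS with colors: 0 = unvisited, 1 = on the current path, 2 = done."""
    n = len(adj)
    color = [0] * n
    def dfs(u):
        color[u] = 1
        for v in range(n):
            if adj[u][v]:
                if color[v] == 1:              # back edge -> cycle found
                    return True
                if color[v] == 0 and dfs(v):
                    return True
        color[u] = 2
        return False
    return any(color[u] == 0 and dfs(u) for u in range(n))
```

When `has_cycle` returns False, every G_{R_i} is simply an identity matrix; otherwise the three steps above are executed.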
Otherwise, the permutation matrix G^i of the intersection i that contains sensor S_{iN_i} is utilized to pinpoint the trajectory of the same target in the intersection, as well as its data group T̄_{lz} in the subsequent node or sink node R_l toward which it is heading. The trajectory tracking of the same target on the new road segments continues until the target stops or leaves the road network. The same process is used for tracing the flows of the other targets.

iii. Matrix Description of Intermediate Nodes: After the trajectories of all targets from the road segments have been confirmed, the data points for each target on different road segments can be merged. More precisely, an entry (p, q) in G_{R_l} that is assigned as 1 means that data groups T̄_{lp} and T̄_{lq} belong to one target. Consequently, the corresponding matrix G_{R_l} can be determined.

V. SIMULATION

In this section, the performance of the proposed G-MLKM algorithm is evaluated. We first introduce the testing dataset generation process. Then the performance of the MLKM method on one road segment is evaluated and compared with k-means++ and DNN, after which the complete G-MLKM algorithm is evaluated. A detailed example of the G-MLKM output is given to show how the matrices G^{ints} and G_{R_i} are created for data pairing at intersections and group merging.

A. Testing Data Generation

In order to obtain a quantitative performance evaluation of the data association techniques, labeled data is needed to compute the percentage of true associations between targets and their measurements. One convenient way to obtain an accurately labeled dataset for data-target association is to generate it artificially. Let the generated testing dataset from the road network be M_t = {T_1, T_2, ..., T_L}, where T_i has the same data structure as T_i defined in (1).
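Following a target through an intersection via its permutation matrix can be sketched as below; the matrix values and the convention that rows index incoming groups and columns index outgoing groups are illustrative assumptions:

```python
import numpy as np

# Made-up 3x3 permutation matrix pairing incoming and outgoing data groups.
G = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])

def next_group(G, incoming_idx):
    """Index of the outgoing data group paired with a given incoming group."""
    return int(np.argmax(G[incoming_idx]))

# A target entering as incoming group 0 leaves the intersection as group 1;
# repeating this lookup at each intersection traces the full trajectory.
```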
In particular, each element in T_i is a data group that belongs to one target. Moreover, for any T_i collected from road segments that have both incoming and outgoing flows, multiple rows may belong to the same target.

We utilize the road network structure shown in Fig. 1 as a prototype for testing data generation. N_S sensors are assumed to be equally distributed on each road segment, where the length of the road segment is N_S × d. The position set for the sensors is selected as P_i = {d, 2d, ..., N_S d} with respect to the starting point of road segment R_i. The intersections are considered to have the same radius with the value of d/2. Hence, the distance between any two adjacent sensors is d. To further simplify the data generation process, we assume road segment R_1 is the only entrance of the road network during the data collection period, with the number of incoming targets being N_A, and targets have equal probabilities over the valid heading directions at each intersection. The targets are assumed to move with a constant velocity that is discretely perturbed by Gaussian noise, such that v_{ij} = v + N(µ, σ), where v_{ij} is the velocity measurement at sensor S_{ij} and v is the velocity measurement at the previous sensor. The corresponding time measurement is calculated as t_{ij} = t + d/v_{ij}. The initial velocity and time for the N_A targets are uniformly selected from the ranges (v_min, v_max) and (t_min, t_max), respectively (see Table II). The testing dataset generation stops when all targets have moved out of the road network.

With the generated testing datasets, we may evaluate the performance of the data-target association techniques by calculating the data association accuracy, defined as the ratio between the number of correctly classified data (M_cr) and the total number of data (M_t):
numel(M_cr) / numel(M_t) × 100%,
where numel(M) returns the number of elements in M.
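The per-segment measurement model and the accuracy metric can be sketched as follows; the parameter values and function names are illustrative, not the Table II settings:

```python
import random

def simulate_target(v0, t0, n_sensors, d=1.0, mu=0.0, sigma=0.1, seed=0):
    """Return (time, velocity) measurements for one target along a segment."""
    rng = random.Random(seed)
    t, v, samples = t0, v0, []
    for _ in range(n_sensors):
        v = v + rng.gauss(mu, sigma)   # v_ij = v + N(mu, sigma)
        t = t + d / v                  # t_ij = t + d / v_ij
        samples.append((t, v))
    return samples

def accuracy(n_correct, n_total):
    """Data association accuracy: numel(M_cr) / numel(M_t) * 100%."""
    return 100.0 * n_correct / n_total
```

Running `simulate_target` once per entering target, with headings drawn uniformly at each intersection, reproduces the labeled dataset generation described above.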
As multiple testing datasets are generated, the statistical information provided about performance includes the minimum (left, blue bar), average (middle, orange bar), and maximum (right, yellow bar) accuracies.

B. MLKM Performance and Comparisons

Before evaluating the overall accuracy of the proposed G-MLKM algorithm, the MLKM method is evaluated and compared with two other common data clustering machine learning techniques, namely k-means++ and DNN, based on the collected dataset in road segment R_1.

TABLE II
SIMULATION PARAMETERS.
                   Simulation 1    Simulation 2
    N_A            10              50
    N_S            10              20
    (v_min, v_max) U(10, ·)        U(10, ·)
    (t_min, t_max) U(0, ·)         U(0, ·)

1) K-means++: The first set of simulations evaluates the performance of k-means++ based on two criteria: (i) unprocessed vs. preprocessed data, and (ii) different values of N_A and N_S. When the values of N_A and N_S increase, more data points are introduced into the dataset, leading to more overlapping among these data points. Figures 6 and 7 show the performance of k-means++ using the parameters listed in Table II.

Fig. 6. K-means++ accuracy for Simulation 1 parameters on unprocessed (UP) and preprocessed (P) data.

Fig. 7. K-means++ accuracy for Simulation 2 parameters on unprocessed (UP) and preprocessed (P) data.

As can be observed, a higher accuracy is achieved using the preprocessed data than using the unprocessed data. This can be seen by comparing the average, maximum, and minimum accuracies for the preprocessed versus the unprocessed data, as shown in Figure 6. Using the raw data, the measurements associated with a specific target are sparse along the time axis, while the velocity measurements from the same sensor are closely grouped along the velocity axis. These conditions contribute to incorrect clustering of the data.
The preprocessing technique reduces the distance between target-related measurements, therefore reducing the effect of the velocity measurements on the clustering.

A lower accuracy is obtained for large values of N_A and N_S. This can be observed by comparing the average, maximum, and minimum accuracies for different N_A and N_S, as shown in Figures 6 and 7. Similar to the unprocessed case, a large number of sensors/targets increases the density of measurement points. The concentration of measurements increases the probability that k-means/k-means++ clusters the data incorrectly (even with preprocessing).

2) DNN: K-means++ fails to correctly cluster data when measurements overlap. A deep neural network (DNN) is used as an alternative approach because it has been shown to provide good results in uncovering patterns for large-dataset classification. One necessary condition for a DNN is the availability of labeled datasets for training; to meet this requirement, it is assumed that labeled data is available.

The results for the DNN are obtained using N_A = 50 targets and N_S = 50 sensors. Assuming that a portion of the data association has already been identified, the objective is to train a neural network to label the unidentified measurements. The numbers of 'training' sensors that provide labeled information and 'testing' sensors that provide unlabeled information are given in Table III. The accuracy is obtained for various proportions of 'training' sensors to 'testing' sensors. Table III also shows the accuracy obtained for the different dataset configurations.

TABLE III
DNN WITH DIFFERENT TRAINING AND TESTING DATASETS.
    Train Sensors   Test Sensors   Train Accuracy   Test Accuracy
    20              30             98%              68%
    25              25             97.8%            68%
    30              20             99%              72%
    40              10             98.6%            84.4%
    45              5              98.9%            91.6%

It can be observed that the training accuracy remains high while the testing accuracy is low when the testing dataset is relatively large.
However, when the testing dataset is relatively small, the testing performance increases significantly (up to 91.6%). A high training accuracy with a low testing accuracy means that the DNN suffers from overfitting due to the small size of the training dataset. Given this comparison, a DNN is applicable when a large portion of training data is available to train the network for classifying a relatively small amount of measurements.

3) MLKM: K-means++ does not provide good accuracy for a high number of measurements but performs well when clustering small amounts of data. A DNN can cluster large datasets but requires a large training dataset. MLKM combines the multi-layer back-propagation error correction from DNN and the clustering capabilities of k-means++. The DNN-inspired error correction significantly improves the performance of MLKM by preventing the clustering errors in layer 1 from propagating to the cluster association in layer 3.

The results for the MLKM method are obtained using N_A = 50 targets and N_S = 20 sensors. In addition, the time and velocity parameters are set to (t_min, t_max) = U(−·, ·) and (v_min, v_max) = N(50, ·), respectively. Figure 8 shows the performance of the MLKM method with and without error correction, as well as the results of the standard k-means++ method with preprocessing.

Fig. 8. K-means++ for preprocessed data (P:K-means++), MLKM without error correction (MLKM w/o EC), and MLKM with error correction (MLKM w/ EC).

It can be observed that a higher accuracy is achieved using MLKM than using k-means++. Figure 8 shows the average, maximum, and minimum accuracies for both methods. The error correction performed in layer 2 improves the average accuracy of MLKM by approximately 7% (MLKM w/ EC 91.65%; MLKM w/o EC 84.3%).

C.
G-MLKM Overall Performance

The results for the G-MLKM method are obtained using N_A = 20 targets and N_S = 10 sensors. In addition, the time and velocity parameters are set to (t_min, t_max) = U(0, ·) and (v_min, v_max) = U(10, ·), respectively. Figure 9 shows the performance of the G-MLKM algorithm with and without error correction.

Fig. 9. Accuracy obtained for G-MLKM w/ EC and w/o EC.

It can be observed that a higher accuracy is achieved using G-MLKM with error correction than without it. Figure 9 shows the average, maximum, and minimum accuracies for both methods. The second error correction performed in the algorithm improves the average accuracy of G-MLKM by approximately 11% (G-MLKM w/ EC 92.2%; G-MLKM w/o EC 81%).

D. Matrix Output of the G-MLKM Algorithm

The proposed G-MLKM algorithm produces multiple (determined by the structure of the road network) permutation matrices G^{ints} and L symmetric matrices G_{R_i} to represent the data cluster classification results at intersections and road segments, respectively. A detailed example is given to illustrate the use of the proposed G-MLKM matrix output.

Suppose 5 targets (named N_1, N_2, N_3, N_4, and N_5, respectively) go through the road network shown in Fig. 1 during a certain time. The trajectory ground truth is listed in Table IV. In particular, road segment R_1 has three data groups denoted as {1, 2, 3}, R_2 has six data groups denoted as {1, 2, 3, 4, 5, 6}, R_3 has five data groups denoted as {1, 2, 3, 4, 5}, R_4 has three data groups denoted as {1, 2, 3}, R_5 has one data group denoted as {1}, and R_6 has two data groups denoted as {1, 2}. Take target N_1 as an example: it first travels through road segments R_1 and R_2, then heads to a third road segment.
After that, it keeps moving through two further road segments and finally leaves the road network. The connections among the associated data groups in each road segment that are related to target N_1 are represented as {·, ·, ·, ·, ·, ·}, which means that data group 1 in road segment R_1, data groups 1 and 6 in road segment R_2, data group 1 in road segment R_·, data group 3 in road segment R_·, and data group 5 in road segment R_· all belong to the measurements extracted from target N_1.

Since road segment R_2 has two data groups belonging to one target, the ideal matrix G_{R_2} ∈ R^{6×6} is the identity matrix with the additional entries (1, 6) and (6, 1) set to 1, with respect to its data groups {1, 2, 3, 4, 5, 6}. For the other road segments, the corresponding matrix G_{R_i} is an identity matrix sized to its own data groups: G_{R_1} = I_{3×3}, G_{R_3} = I_{5×5}, G_{R_4} = I_{3×3}, G_{R_5} = I_{1×1}, and G_{R_6} = I_{2×2}.

Let the intersection formed by three of the road segments be denoted as a. The incoming dataset Q_I^a can be stored in the sequence {·, ·, ·} and the outgoing dataset Q_O^a in the sequence {·, ·, ·}; the permutation matrix G^a is determined accordingly. Similarly, for the intersection named b, the matrix G^b may be determined with Q_I^b stored in the sequence {·, ·, ·, ·, ·, ·} and the outgoing dataset Q_O^b in the sequence {·, ·, ·, ·, ·, ·}. For the intersection named c, G^c may be determined with Q_I^c stored in the sequence {·, ·, ·, ·, ·, ·} and the outgoing dataset Q_O^c in the sequence {·, ·, ·, ·, ·, ·}.

TABLE IV
GROUND TRUTH FOR TARGETS' TRAJECTORIES. THE REPRESENTATION A_i DENOTES THE ASSOCIATED DATA GROUP A IN ROAD SEGMENT R_i.
    Target   Trajectory                                 Representation
    N_1      R_· → R_· → R_· → R_· → R_· → R_·          {·, ·, ·, ·, ·, ·}
    N_2      R_· → R_· → R_·                            {·, ·, ·}
    N_3      R_· → R_· → R_·                            {·, ·, ·}
    N_4      R_· → R_· → R_· → R_·                      {·, ·, ·, ·}
    N_5      R_· → R_· → R_· → R_·                      {·, ·, ·, ·}

With these matrices determined, the output result from G-MLKM can be clearly presented.

VI.
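The group-merging matrix for R_2 described above can be assembled as in the following sketch; `group_matrix` is a hypothetical helper, and the (1, 6) pairing follows the example's ground truth for target N_1:

```python
import numpy as np

def group_matrix(n_groups, same_target_pairs):
    """Symmetric G_{R_i}: identity plus 1-entries for groups of one target.

    same_target_pairs uses 1-based group indices, as in the paper's notation.
    """
    G = np.eye(n_groups, dtype=int)
    for p, q in same_target_pairs:
        G[p - 1, q - 1] = G[q - 1, p - 1] = 1  # b_pq = b_qp = 1
    return G

# R_2 has six data groups; groups 1 and 6 belong to the same target (N_1).
G_R2 = group_matrix(6, [(1, 6)])
```

With an empty pair list, the helper returns the identity matrix, matching the G_{R_i} of the road segments visited at most once.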
CONCLUSIONS AND FUTURE WORK

This paper has studied data pattern recognition for multiple targets in a constrained space, where the data is the minimal information provided by spatially distributed sensors. In contrast to the existing methods that rely on probabilistic hypothesis estimation, we proposed to utilize machine learning for the data correlation analysis. Two common data clustering algorithms, namely k-means++ and the deep neural network, were first analyzed for data association given a simplified constrained space. Then the MLKM method was proposed by leveraging the structural advantage of DNN and the unsupervised clustering capability of k-means++. After that, graph theory was introduced for the purpose of extending the scope of MLKM to a general constrained space. In particular, we proposed a p-dual graph for data association at intersections and merged the results from local spaces and intersections through the dual graph of the constrained space. Simulation studies were provided to demonstrate the performance of the MLKM method and the proposed G-MLKM. Our future work will focus on relaxing the assumptions in this paper to improve G-MLKM in scenarios with false alarms.

REFERENCES
[1] B.-N. Vo, M. Mallick, Y. Bar-Shalom, S. Coraluppi, R. Osborne, R. Mahler, and B.-T. Vo, "Multitarget tracking," Wiley Encyclopedia of Electrical and Electronics Engineering, 2015.
[2] B. Benfold and I. Reid, "Stable multi-target tracking in real-time surveillance video," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3457–3464, 2011.
[3] I. Haritaoglu, D. Harwood, and L. S. Davis, "W4: real-time surveillance of people and their activities," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 809–830, 2000.
[4] M. R. Endsley, "Toward a theory of situation awareness in dynamic systems," Human Factors, vol. 37, no. 1, pp. 32–64, 1995.
[5] H. Pasula, S. Russell, M. Ostland, and Y.
Ritov, "Tracking many objects with many sensors," in Proceedings of International Joint Conference on Artificial Intelligence, vol. 99, pp. 1160–1171, 1999.
[6] T. Fortmann, Y. Bar-Shalom, and M. Scheffe, "Sonar tracking of multiple targets using joint probabilistic data association," IEEE Journal of Oceanic Engineering, vol. 8, no. 3, pp. 173–184, 1983.
[7] J. Singh, U. Madhow, R. Kumar, S. Suri, and R. Cagley, "Tracking multiple targets using binary proximity sensors," in Proceedings of International Conference on Information Processing in Sensor Networks, pp. 529–538, 2007.
[8] G. Grisetti, "Robotics 2 data association." Available online at http://ais.informatik.uni-freiburg.de/teaching/ws09/robotics2/pdfs/rob2-11-dataassociation.pdf, 2010.
[9] P. Konstantinova, A. Udvarev, and T. Semerdjiev, "A study of a target tracking algorithm using global nearest neighbor approach," in Proceedings of the International Conference on Computer Systems and Technologies, pp. 290–5, 2003.
[10] Y. Bar-Shalom, P. K. Willett, and X. Tian, Tracking and Data Fusion. YBS Publishing, 2011.
[11] H. Ma and B. W.-H. Ng, "Distributive JPDAF for multi-target tracking in wireless sensor networks," in Proceedings of IEEE Region 10 Conference, pp. 1–4, 2006.
[12] H. Kim and J. Chun, "JPDAS multi-target tracking algorithm for cluster bombs tracking," in Progress in Electromagnetics Research Symposium, pp. 2552–2557, 2016.
[13] W. Yuhuan, W. Jinkuan, and W. Bin, "A modified multi-target tracking algorithm based on joint probability data association and Gaussian particle filter," in Proceedings of World Congress on Intelligent Control and Automation, pp. 2500–2504, 2014.
[14] S. S. Blackman, "Multiple hypothesis tracking for multiple target tracking," IEEE Aerospace and Electronic Systems Magazine, vol. 19, no. 1, pp. 309–332, 2004.
[15] I. J. Cox, "A review of statistical data association techniques for motion correspondence," International Journal of Computer Vision, vol. 10, no. 1, pp. 53–66, 1993.
[16] D.
Reid, "An algorithm for tracking multiple targets," IEEE Transactions on Automatic Control, vol. 24, no. 6, pp. 843–854, 1979.
[17] S. Oh, S. Russell, and S. Sastry, "Markov chain Monte Carlo data association for general multiple-target tracking problems," in Proceedings of IEEE Conference on Decision and Control, vol. 1, pp. 735–742, 2004.
[18] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu, "An efficient k-means clustering algorithm: analysis and implementation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 881–892, 2002.
[19] D. Arthur and S. Vassilvitskii, "K-means++: the advantages of careful seeding," in Proceedings of Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035, 2007.
[20] G. E. Hinton, "How neural networks learn from experience," Cognitive Modeling, vol. 267, pp. 181–195, 2002.
[21] M. Einasto, L. Liivamägi, E. Saar, J. Einasto, E. Tempel, E. Tago, and V. Martínez, "SDSS DR7 superclusters–principal component analysis," Astronomy & Astrophysics, vol. 535, p. A36, 2011.
[22] S. Arya, D. M. Mount, N. Netanyahu, R. Silverman, and A. Y. Wu, "An optimal algorithm for approximate nearest neighbor searching in fixed dimensions," in Proceedings of ACM-SIAM Symposium on Discrete Algorithms, pp. 573–582, 1994.
[23] Y. Cao, W. Yu, W. Ren, and G. Chen, "An overview of recent progress in the study of distributed multi-agent coordination," IEEE Transactions on Industrial Informatics, vol. 9, no. 1, pp. 427–438, 2013.
[24] P. Zhao, T. Jia, K. Qin, J. Shan, and C. Jiao, "Statistical analysis on the evolution of OpenStreetMap road networks in Beijing," Physica A: Statistical Mechanics and its Applications, vol. 420, pp. 59–72, 2015.
[25] S. Porta, P. Crucitti, and V. Latora, "The network analysis of urban streets: a primal approach," Environment and Planning B: Planning and Design, vol. 33, no. 5, pp. 705–725, 2006.
[26] S. Porta, P. Crucitti, and V.
Latora, "The network analysis of urban streets: a dual approach," Physica A: Statistical Mechanics and its Applications, vol. 369, no. 2, pp. 853–866, 2006.
[27] H. W. Kuhn, "The Hungarian method for the assignment problem," Naval Research Logistics (NRL), vol. 2, no. 1–2, pp. 83–97, 1955.
[28] R. Tarjan, "Depth-first search and linear graph algorithms," SIAM Journal on Computing, vol. 1, no. 2, pp. 146–160, 1972.