[PDF] Uncertainty modelling and computational aspects of data association

Abstract

A novel solution to the smoothing problem for multi-object dynamical systems is proposed and evaluated. The systems of interest contain an unknown and varying number of dynamical objects that are partially observed under noisy and corrupted observations. An alternative representation of uncertainty is considered in order to account for the lack of information about the different aspects of this type of complex system. The corresponding statistical model can be formulated as a hierarchical model consisting of conditionally-independent hidden Markov models. This particular structure is leveraged to propose an efficient method in the context of Markov chain Monte Carlo (MCMC) by relying on an approximate solution to the corresponding filtering problem, in a similar fashion to particle MCMC. This approach is shown to outperform existing algorithms in a range of scenarios.

Full PDF

UUNCERTAINTY MODELLING AND COMPUTATIONAL ASPECTSOF DATA ASSOCIATION

JEREMIE HOUSSINEAU, JIAJIE ZENG, AND AJAY JASRA

Abstract.

A novel solution to the smoothing problem for multi-object dy-namical systems is proposed and evaluated. The systems of interest containan unknown and varying number of dynamical objects that are partially ob-served under noisy and corrupted observations. An alternative representationof uncertainty is considered in order to account for the lack of informationabout the diﬀerent aspects of this type of complex system. The correspond-ing statistical model can be formulated as a hierarchical model consisting ofconditionally-independent hidden Markov models. This particular structure isleveraged to propose an eﬃcient method in the context of Markov chain MonteCarlo (MCMC) by relying on an approximate solution to the correspondingﬁltering problem, in a similar fashion to particle MCMC. This approach isshown to outperform existing algorithms in a range of scenarios.

Keywords : possibility theory, Markov chain Monte Carlo, simulated annealing,multi-target tracking 1.

Introduction

We consider the problem of performing inference for multi-object dynamicalsystems under partial, corrupted and noisy observations. This class of problems,known as multi-target tracking in the engineering literature [8, 22, 33], arises inmany applications, e.g. bio-imaging [4], robotics [24] and surveillance [3], which canall beneﬁt from principled inference solutions in diﬀerent ways: i) when the numberof objects is too large to be treated by hand, ii) when the phenomena of interesttake place on extended periods of time or, conversely, when an immediate responseis needed, iii) when the data available about each object is scarce and iv) whenit is diﬃcult to tell one object from another. One of the main diﬃculties withthe considered type of system is that the number of objects is not known a prioriand might vary in time due to a birth-death process. Also, objects are observedunder multiple perturbations: i) each object might or might not be detected, ii) ifan object is detected then its state is only partially observed and the observationis subject to noise and iii) observations not related to any object, referred to asfalse alarms, are also received. The main task when inferring the number of objectsin a given system as well as their respective state is to solve the data associationproblem, that is, to estimate whether or not observations at diﬀerent time stepsoriginate from the same object. Each of the above-mentioned perturbations incursa signiﬁcant increase in the size of the set of all possible data associations, making ithighly combinatorial. Due to this combinatorial nature, the task of estimating thecurrent state of all objects based on all previous observations, referred to as multi-object ﬁltering, is a diﬃcult problem. It has been an active research topic for severaldecades and continues to be challenging in spite of the ever-increasing availablecomputation resources [8, 33]. In this article, we aim to tackle the even harderproblem of multi-object smoothing, that is, our objective is to keep evaluatingthe likelihood of data associations at previous times in light of newly received data. a r X i v : . [ s t a t . C O ] S e p J. HOUSSINEAU, J. ZENG, AND A. JASRA

This is an important problem in practice since the elicitation of objects’ trajectoriesand origins is fundamental for the evaluation of the objects’ identities and of theassociated situational awareness. Indeed, knowing the current state of each objectis not suﬃcient in many situations and maintaining an up-to-date estimate of theirpast trajectories is often crucial. For instance, in defence applications, if an objectlabelled as “ally” crosses path with another object labelled as “enemy” then beingable to tell one from the other at a later time can be more critical than having anaccurate estimate of their state at that time.In the context of ﬁltering, one of the most natural ways of improving the tra-jectory estimates over the last few time steps is referred to as ﬁxed-lag smoothing,where a sliding window made of a given number of time steps is updated based onthe latest observations. The advantage with ﬁxed-lag smoothing is that the compu-tational cost can easily be tuned by selecting an adequate lag. However, since ourobjective is to elicit particular events that might have taken place at arbitrary timesteps, we consider instead a “batch” alternative where a user-deﬁned time-windowof interest is ﬁxed.Deﬁning a standard statistical model for representing multiple objects requiressetting a number of probability distributions and parameters to characterise thediﬀerent aspects of the problem, including highly uncertain phenomena such asfalse alarms. Such models also usually ignore the disparity between the diﬀerentobjects of interest in terms of behaviour and detection proﬁle. In this article, weconsider an alternative representation of uncertainty [14, 12], based on possibilitytheory [7], that allows for acknowledging the lack of information about the diﬀer-ent aspects of multi-object dynamical systems with the objective of increasing therobustness to misspeciﬁcation of the derived solutions. The considered representa-tion of uncertainty has links with imprecise probabilities [35] and Dempster-Shafertheory [6, 31].The use of MCMC to solve data association problems has been previously ex-plored in [25] as well as in [34, 20, 19]. The approach considered in these articles isbased on local proposals in the set of data association, with [34, 20, 19] additionallyconsidering the estimation of the object’s trajectories. The objective in this articleis to show that the set of data association can be explored eﬀectively with globalproposals without signiﬁcantly aﬀecting the probability of acceptance of each move.This result is achieved by leveraging the eﬃciency of an approximate multi-objectﬁltering method. The use of MCMC in discrete spaces is discussed more generallyin [36]. MCMC has also been used in conjunction with, or as a replacement of,sequential Monte Carlo in the context of ﬁltering for multi-object systems, see e.g.[21, 30, 23]; however this type of approach is less directly related to the methodproposed in this article.Overall, the contributions of the articles are as follows: i) a full multi-objectmodel is deﬁned in the context of possibility theory, building up on the compo-nents of single- and multi-object models of [29] and [11]; ii) a possibilistic analogueof a scalable solution to multi-object ﬁltering [15] is introduced; iii) the tools ofpossibility theory are used to deﬁne a suitable structure on the set of data asso-ciations; iv) a new eﬃcient MCMC-based solution for the multi-object smoothingproblem is introduced and its performance is demonstrated.We introduce a new statistical model for representing multi-object systems inSection 2. This is followed by the presentation of the proposed method for exploringthe set of data association in Section 3, before considering an extension of thisapproach in Section 4. The performance of the proposed method is then assessedon simulated data in Section 5.

NCERTAINTY MODELLING AND COMPUTATIONAL ASPECTS OF DATA ASSOCIATION 3 Model

We consider a ﬁxed number K of time steps and assume without loss of generalitythat time steps take integer values between 1 and K . At each time step k ∈{ , . . . , K } , a set of observations Z k is received, containing both object-originatedobservations and false alarms. Each observation in the set Z k is an element ofan observation set Z , which is assumed to be a subset of R d Z . In order to modelthat an object might not be detected, we introduce the notation φ for the emptyobservation, that is, an object for which detection has failed is associated with theempty observation φ . We assume, as is standard, that an object cannot generatemore than one observation at each time step. Therefore, denoting ¯ Z k = Z k ∪ { φ } the set of observations at time k augmented with the empty observation for any k ∈ { , . . . , K } , any sequence of observations generated by an object through the K time steps of the scenario can be seen as an element of O K = ¯ Z × · · · × ¯ Z K \ { φ } K where the sequence of observation containing empty observations only is not con-sidered. Elements of O K are also referred to as observation paths or simply as paths . Data association can then be seen as the problem of determining the prob-ability for all the paths in a given subset of O K to be the true paths of objects inthe system under consideration. Another standard assumption about multi-objectsystems is that each observation cannot originate from more than one object; as aconsequence, not all subsets of O K are considered feasible and we focus on the set A of subsets of O K such that for all A ∈ A , any two diﬀerent observations paths o and o (cid:48) in A must verify that either o k = o (cid:48) k = φ or o k (cid:54) = o (cid:48) k for all k ∈ { , . . . , K } ,where o k denotes the k -th element of the sequence o . Less formally, elements of A only contain paths that are diﬀerent where they are not both equal to the emptyobservation. The set A , in spite of being a strict subset of the power set of O K ,has a large cardinality and evaluating the credibility of each of its elements by ex-haustion can be diﬃcult even when the number of observations at each time stepis small. Assuming, for simplicity, that the number of observations at every timestep is constant and equal to m , the number of elements in the power set of O K isequal to 2 m K −

1, which is prohibitively large even for toy problems. It is generallydiﬃcult to devise algorithms that perform inference on a large discrete space suchas A , yet, MCMC methods can help to address part of this challenge since theyonly require being able to evaluate the credibility of a given association A ∈ A proposed via some user-deﬁned transition kernel.In practice, we also need to estimate the interval of existence of each object. Forthis purpose, we introduce a set T which is similar to A except that each path o will be paired with a time of appearance m ∈ { , . . . , K } and the last time ofexistence n ∈ { m, . . . , K } . Formally, for all T ∈ T , any ( o, m, n ) in T must verify o k = φ for any k / ∈ { m, . . . , n } and, for any ( o (cid:48) , m (cid:48) , n (cid:48) ) in T diﬀerent from ( o, m, n ),it must hold that either o k = o (cid:48) k = φ or o k (cid:54) = o (cid:48) k for all k ∈ { , . . . , K } , as for dataassociations. We denote by κ the function extracting paths from tracks, that is κ ( t ) = o for any track t with path o .2.1. Uncertain variable and possibility function.

We consider a representa-tion of uncertainty [14] which can be used as an alternative to subjective proba-bilities in a statistical model. The objective of this representation of uncertaintyis to model information rather than randomness and therefore to address commonissues with statistical modelling for complex systems and with the use of subjectiveprobabilities. In the context of multi-object systems, some these issues are:

J. HOUSSINEAU, J. ZENG, AND A. JASRA

1) the associated models are inherently hierarchical which precludes the use ofimproper priors on the ﬁrst level of this hierarchy; however, there is oftenno prior information on the location of appearing objects which means thatuninformative priors are needed;2) as with many complex systems, there is a large number of parameters whichare not necessarily known in practice and learning these parameters is bothchallenging computationally as well as potentially useless if they are likelyto change drastically from one time step to the other; this is for instancethe case with the probability of detection;As will be shown in the next few sections, the proposed approach allows for ad-dressing these issues while preserving most of the usual intuitive mechanisms inBayesian inference.We model a ﬁxed but unknown quantity as a mapping x from a sample space Ω toa set X , referred to as an uncertain variable . The diﬀerence with a random variableis that Ω is not equipped with a probability distribution and, instead, there is areference element in Ω, denoted ω ∗ , which correspond to the true value x ∗ = x ( ω ∗ )of the considered unknown quantity. The information about the true value of x isrepresented by a non-negative function f x on X verifying sup x ∈ x f x ( x ) = 1, referredto as a possibility function . The scalar f x ( x ) ∈ [0 ,

1] corresponds to the credibilityof the event x = x for any x ∈ X and the credibility of the event x ∈ A for any A ⊆ X is given by sup x ∈ A f x ( x ). In particular, f x is not a density and the integral isreplaced by a supremum, which is consistent with the fact that the event x ∈ X hascredibility 1 by construction. Possibility functions are not characterised by theircorresponding uncertain variables and, instead, we say that the possibility function describes the uncertain variable. If y is another uncertain variable in a set Y andif x and y are jointly described by the possibility function f x , y then y is describedby the marginal possibility function f y ( y ) = sup x ∈ X f x , y ( x, y ) , y ∈ Y , and the possibility function describing x given that y = y is f x ( x | y ) = f x , y ( x, y ) f y ( y ) = f y ( y | x ) f x ( x )sup x (cid:48) ∈ X f y ( y | x (cid:48) ) f x ( x (cid:48) ) , x ∈ X , which is the analogue of Bayes’ theorem for possibility functions [5]. In this context,we will refer to f x and f x ( · | y ) as the prior and posterior possibility functionsrespectively and f y ( y | · ) will be called the likelihood function; similarly, f y ( y ) willbe referred to as the marginal likelihood. If it holds that f x , y ( x, y ) = f x ( x ) f y ( y )for all ( x, y ) ∈ X × Y then x and y are said to be independently described. Thisform of independence only implies that the information about x is not related tothe information we hold about y .The expected value and variance can be deﬁned for possibility functions via thecorresponding law of large numbers and central limit theorem [14] as E ∗ ( x ) = arg max x ∈ X f x ( x ) V ∗ ( x ) = (cid:0) − ∆ f x (cid:0) E ∗ ( x ) (cid:1)(cid:1) − = E ∗ (cid:0) − ∆ log f x ( x ) (cid:1) − , where ∆ f x is the Laplacian of f x , with the variance being inﬁnite when E ∗ ( x ) isnot a singleton and undeﬁned when f x is not twice diﬀerentiable at E ∗ ( x ). Thevariance can be seen as being the inverse of an analogue of the Fisher information.Another useful notion of expected value, which is the direct analogue of the standardexpected value, can be deﬁned for any real-valued function ϕ on X as¯ E ( ϕ ( x )) = sup x ∈ X ϕ ( x ) f x ( x ) . NCERTAINTY MODELLING AND COMPUTATIONAL ASPECTS OF DATA ASSOCIATION 5

The scalar ¯ E ( ϕ ( x )) can be interpreted as the maximum expected value of ϕ ( x ).Many concepts and results holding for probability distributions can be used forpossibility functions. For instance, if Y = X = R and if the likelihood function is anormal possibility function, i.e. f y ( y | x ) = exp (cid:16) − σ ( y − ax ) (cid:17) . = N( y ; ax, σ )for some a ∈ R and some σ >

0, then one can show that the posterior is also anormal possibility function if the prior f x is normal. In other words, the conceptof conjugate priors makes sense. This result can be extended to the multivariatecase and it has been shown in [13] that the posterior expected value and varianceof the Kalman ﬁlter can be recovered with possibility functions.If the objective is to ﬁnd the (subjective) probability p ( B ) of some event x ∈ B for some measurable subset B of X , then the credibility sup x ∈ B f x ( x ) can be seenas an upper bound for this probability and we ﬁnd that(1) 1 − sup x ∈ B c f x ( x ) ≤ p ( B ) ≤ sup x ∈ B f x ( x ) , where B c = X \ B is the complement of B in X . This interpretation implies that thepossibility function , which is equal to 1 everywhere on X , is the least informative.This uninformative possibility function is well deﬁned even when X is unbounded.It is also possible to interface uncertain variables and random variables in order tointroduce more sophisticated representations of uncertainty involving both lack ofinformation and randomness [12]. However, we will argue that all the elements ofthe introduced statistical model can be seen as subjective so that only possibilityfunctions will be used.2.2. Multi-object model.

We ﬁrst introduce the assumptions and notations formodelling the way objects appear, behave and disappear in Section 2.2.1 beforemoving on to the considered sensor modelling in Section 2.2.2. Most of the as-sumptions are standard in the ﬁeld of multi-object estimation.2.2.1.

Object and population dynamics.

We consider the case where there is noinformation about some or all of the components of the state of appearing objects.Typically, there might be no prior information about the position of objects whereasassumptions can be made about the velocity components. Denoting m ∈ { , . . . , K } the time step at which a given object has appeared, the state at this time stepis represented by an uncertain variable x m in a space X ⊆ R d X described by apossibility function f . With probabilistic modelling, improper priors might berequired in order to model the absence of information about appearing objects;however, the hierarchical nature of multi-object estimation implies that improperpriors cannot be used without adding heuristics at the level of data association[16, 27].We consider that there is a non-negligible heterogeneity between the dynamicsof the diﬀerent objects and that the characteristics of the objects’ motion is notnecessarily well known. As a consequence, we model the trajectory of an object asa sequence of uncertain variables { x k } nk = m +1 on X such that, for any k ∈ { m +1 , . . . , n } , x k is described by a possibility function f x k ( · | x m , . . . , x k − ) satisfying f x k ( x k | x m , . . . , x k − ) = g k ( x k | x k − ) , x k ∈ X , for some possibility function g k ( · | x k − ) on X . This is an analogue of the Markovproperty for uncertain variables.We take into account the fact that objects might completely disappear fromthe scene before the last time step, in which case we say that the object has “notsurvived”. This could be seen as a convenient way of dealing with objects that J. HOUSSINEAU, J. ZENG, AND A. JASRA are no longer detectable by the sensor(s). Object survival is not usually a randomevent so that we model it as an uncertain variable. The respective credibilities foran object with state x ∈ X to survive or not survive to the next time step aredenoted α s ( x ) and α ns ( x ). These credibilities must verify max { α s ( x ) , α ns ( x ) } = 1for any x ∈ X . We consider the case where α s = since we want to model thatobjects are unlikely to disappear right after appearing, for which we need to set α ns ( x ) (cid:28) x ∈ X . The subjective probability of survival for an object withstate x ∈ X is therefore restricted to the interval [1 − α ns ( x ) , x m : n ∈ X n − m +1 and of the corresponding last time of existence n ∈ { , . . . , K } for an object that is known to appear at time step m can be characterised by thepossibility function g ( x m : n , n | m ) = f ( x m ) α ns ( x n ) ( n

Most sensors acquire information about the objects of interestby measuring some signal over an array of resolution cells. This is the case forcameras, where these resolution cells are pixels, but also for most radars and sonars[32]. Considering for instance the case of a radar measuring range and azimuth,the internal processing of the radar image yields a set of resolution cells wherethe strength of the signal suggests the presence of an object in the correspondingdirections and at the speciﬁed distances. In addition, objects are often extendedand the signal can originate from diﬀerent edges and/or surfaces depending on their(unknown) orientations. As a consequence, we model the observation process viauncertain variables and consider the following form for the likelihood function: (cid:96) k ( z | x ) = exp (cid:16) −

12 ( z − h k ( x )) (cid:124) R − k ( z − h k ( x )) (cid:17) . = N( z ; h k ( x ) , R k ) , z ∈ Z , where R k is a d Z × d Z symmetric positive-deﬁnite matrix related to the size andshape of the resolution cells (assumed constant in Z ). The diﬀerence between thisnormal possibility function and the corresponding normal probability distributionwould not matter in a standard single-object tracking scenario since normalisingconstants would simplify in Bayes’ theorem; however, in multi-object tracking, theseconstants are important since they appear in the assessment of data associations.The credibility for an object with state x ∈ X to be detected is denoted α d ( x ) and,similarly, the credibility of a detection failure is denoted α nd ( x ). Since it must holdthat max { α d ( x ) , α nd ( x ) } = 1 for any x ∈ X , we will assume that α d = so thatit is unlikely for an object to remain undetected. Given a trajectory x m : n of anobject appearing at time step m and disappearing after time step n , it follows thatthe likelihood function for a path o ∈ O K is (cid:96) ( o | x m : n , m, n ) = n (cid:89) k = m α nd ( x k ) ( o k = φ ) (cid:96) k ( o k | x k ) ( o k (cid:54) = φ ) . NCERTAINTY MODELLING AND COMPUTATIONAL ASPECTS OF DATA ASSOCIATION 7

The credibility for an observation z ∈ Z at time k ∈ { , . . . , K } to be a false alarmis denoted α k, fa ( z ), which will be assumed to be strictly lesser than 1; otherwise,if it were possible for all observation to be false alarms then this would be theposterior expected data association in general. The credibility for a given ﬁnitesubset Z of observations in Z to be false alarms is then f k, fa ( Z ) = (cid:89) z ∈ Z α k, fa ( z ) . As a possibility function on sets, f k, fa must verify that sup Z ⊆ Z f k, fa ( Z ) = 1.2.3. Target possibility function.

We now introduce the posterior possibilityfunction on the set T describing the unknown set of tracks, based on the modeldetailed in Section 2.2. For this purpose, we consider a track t = ( o, m, n ) and startby deﬁning the credibility π ( o, n | m ) of the pair ( o, n ) given the time of appearance m ∈ { , . . . , K } as π ( o, n | m ) = sup (cid:8) (cid:96) ( o | x n : m , m, n ) g ( x n : m , n | m ) : x n : m ∈ X n − m +1 (cid:9) . Other aspects such as false alarms and initial observations must be consideredjointly. We denote f fa the possibility function deﬁned on A as f fa ( A ) = K (cid:89) k =1 f k, fa ( Z k, fa ( A )) , for any A ∈ A , where Z k, fa ( A ) = { z ∈ Z k : ∀ o ∈ A, z (cid:54) = o k } is the set of false alarmsinduced by A at time step k . We also introduce f + as the possibility function on T deﬁned as f + ( T ) = K (cid:89) k =1 f k, + ( M k ( T ))where M k ( T ) = (cid:93) { ( o, m, n ) ∈ T : m = k } is the number of objects appearing attime step k ∈ { , . . . , K } . The functions f fa and f + , deﬁned respectively on A and T , are not possibility functions; instead, they are simply the joint credibility forobservations that are not in a given element of A to be false alarms and for tracksthat are in a given element of T to have appeared at the indicated time steps. Thetarget possibility function, i.e. the posterior possibility function Π on T describingthe unknown set of tracks, is then expressed for any T ∈ T as(2) Π( T ) ∝ f fa ( κ ( T )) f + ( T ) (cid:89) ( o,m,n ) ∈ T π ( o, n | m ) , and is such that max T ∈T Π( T ) = 1. The marginal likelihood for the set of paths A = κ ( T ) is then deﬁned as(3) ˆΠ( A ) = max { Π( T ) : T ∈ T , κ ( T ) = A } . MCMC for data association

Computational aspects of possibility theory.

Approximation methodsfor possibility functions must be devised in order to solve the corresponding infer-ence problems in general. Grid-based methods have the same shortcomings as in theprobabilistic case since it is often diﬃcult to anticipate where the posterior possi-bility function will take non-negligible values. Although one cannot sample directlyfrom a given possibility function f x , the latter can be used within MCMC togetherwith a proposal (probability) distribution. In this case, there is no requirementof targeting a given probability distribution and there is no concern regarding theindependence between samples. One of the consequences is that low-discrepancy J. HOUSSINEAU, J. ZENG, AND A. JASRA sequences can be used instead of pseudorandom numbers. The generated chain,say { X n } n ≥ , will simply be used to approximate the expected value¯ E ( ϕ ( x )) ≈ max n ≥ ϕ ( X n ) f x ( X n ) , for any real-valued function ϕ on X . As opposed to the standard Monte Carloapproximation, the possibility function f x appears explicitly in the expression of¯ E ( ϕ ( x )) since the density of samples in a given area conveys no information about f x ; instead, the chain { X n } n ≥ simply provides support points for the approxima-tion of f x as a function. If only the expected value E ∗ ( x ) of x is of interest, then thepossibility function f γ x for some γ > X containingat least 100(1 − α )% of the subjective probability mass deﬁned in (1), then areaswhere f x has value α must also be explored, hence justifying the use of a power γ strictly lesser than 1.When using the possibility function f γ x in a MCMC algorithm, it is the proba-bility distribution on X deﬁned as the renormalised version of f γ x that is targeted(assuming f γ x is integrable). This is not however the only possible approach. Indeed,(1) suggests that a possibility function can be seen as inducing an upper bound forprobability distributions. It follows that selecting the sampling distribution fromthe set of upper-bounded probability distributions is also meaningful. A particularchoice that is appropriate in many settings is to follow the maximum-entropy prin-ciple [18] and consider the maximum-entropy distribution that is upper boundedby f x as in (1), as proposed in [17]. When X is discrete, it is possible to furtherincrease the entropy by replacing the set-wise upper bound of (1) by a point-wiseupper bound of the form p ( x ) ≤ f x ( x ), x ∈ X , with p a probability mass function on X . This approach will be particularly useful in the context of multi-object inferencesince it will lead to an increase of the diversity of explored data associations whencompared to sampling from the distribution proportional to f x .3.2. Problem formulation.

The objective in the remainder of this section is todesign a proposal distribution for identifying the mode of the possibility function ˆΠdeﬁned in (3) via the Metropolis-Hastings algorithm. We assume for the momentthat this proposal distribution is given and express it as a Markov kernel Φ from A toitself. A natural starting point for exploring the set A is to consider the case whereall observations are false alarms, that is, we start from the element A = ∅ ∈ A . Weﬁrst assume that ˆΠ can be evaluated everywhere so that, given a previous sample A , a new sample A (cid:48) can be obtained from the probability distribution Φ( · | A ) andaccepted with probability(4) ˆ α t ( A, A (cid:48) ) = min (cid:18) , ˆΠ( A (cid:48) ) ρ t Φ( A | A (cid:48) )ˆΠ( A ) ρ t Φ( A (cid:48) | A ) (cid:19) , where t is the current iteration and ρ t is the inverse temperature deﬁned by ρ = 1and ρ t = ρ t − / (1 − c ) for some constant c .The main diﬃculty with the Metropolis-Hastings algorithm in the context of in-terest is to design a proposal distribution Φ with adequate properties. In particular,there are two issues with this approach which we will aim to solve in the remainderof this section:i) The possibility function ˆΠ on A is highly multimodal in general so thatmoves that are local both in space and time are unlikely to yield a suﬃcientexploration of the space. NCERTAINTY MODELLING AND COMPUTATIONAL ASPECTS OF DATA ASSOCIATION 9 ii) Implementing moves on entire paths in the set O K would be more globalin nature; however this requires the non-trivial introduction of additionalstructure on this set.These two issues will be addressed in Sections 3.3 and 3.4 respectively. Section 3.5will then detail the construction of the proposal distribution Φ. Extensions ofthe MCMC algorithm introduced for ˆΠ to the possibility function Π on T will becovered in Section 4.3.3. Approximate multi-object ﬁltering.

In order to explore the diﬀerent pos-sible associations in the set T without getting stuck in local maxima and withoutincurring detrimental eﬀects on the mixing of the MCMC chain, we propose to usea multi-object ﬁlter to ensure that any proposed association is meaningful fromthe viewpoint of the model. The motivation for leveraging the capabilities of anapproximate ﬁltering algorithm to solve the corresponding smoothing problem isvery similar to the one behind particle MCMC [1]. To illustrate the challenge withproposing changes in data association, we consider the case where two objects havecrossing trajectories as in Figure 1a; if we only change one observation of a givenpath at a time, then it will take many moves to go from one high-credibility data as-sociation to another, and some of these moves will be in regions of arbitrarily smallcredibility. Alternatively, as is usual with MCMC algorithms, proposing biggermoves without taking into account the geometry of the target possibility functionwill result in an extremely low acceptance rate. This would be the case for instanceif we were to reassign paths by simply proposing new observations uniformly at ran-dom. The objective is therefore to obtain paths that are consistent with the modelgiven a restricted number of initial observations (ﬁrst observation in a path). Thecorresponding moves that we will construct will be global in the sense that theymight aﬀect all time steps but local in sense that only a restricted number of pathswill be (re)assigned. The considered ﬁltering algorithm should have a low com-plexity in order to limit the computational cost of the overall MCMC algorithm.A possible candidate could therefore be the probability hypothesis density (PHD)ﬁlter [22] or its analogue in the context of possibility theory [11]. However, the PHDﬁlter does not solve the data association problem and, as q consequence, cannot beused to propose paths. Instead, we consider an analogue of the hypothesised ﬁlterfor stochastic populations [15], or HISP ﬁlter, which is of the same complexity asthe PHD ﬁlter and which allows for distinguishing objects.At time step k ∈ { , . . . , K } , the HISP ﬁlter provides the marginal probabilitiesfor extending an existing path z k − ∈ ¯ Z × · · · × ¯ Z k − with an additional obser-vation z k ∈ ¯ Z k at the current time step. The standard version of the algorithmwould consider all such associations (at least the ones that are not too unlikely)and proceed to the next time step; however, we consider a modiﬁed version wherea feasible data association is drawn at every time step so that the number of con-sidered paths does not increase exponentially and the computational cost is furtherreduced. We also use the modelling based on possibility functions introduced inthe previous sections instead of the probabilistic modelling considered in [15]. Thediﬀerent steps of this modiﬁed HISP ﬁlter are given in the following sections.The context is as follows: since only part of the existing paths are reassignedand since observations can only be associated with one object, it follows that someof the observations are unavailable to the HISP ﬁlter; we denote by Z − k the setsof available observations at any time step k ∈ { , . . . , K } . We will assume in thissection that the credibility α nd of detection failure and the credibility α ns of non-survival are constant over the state space for the sake of simplicity; as opposed tothe probabilistic case, this can be achieved in general by selecting the (constant)credibility of detection failure to be sup x ∈ X α nd ( x ) and similarly for the credibility k=12345 (a) Reassignment of two paths when trajec-tories cross. k=123456 (b)

Track fragmentation and change of ob-servation.

Figure 1.

Examples of reassignment where black dots representobservations and diﬀerent line styles represent diﬀerent possibledata associations. Thin grey lines and red numbers show the timesteps to which diﬀerent observations belong.of non-survival. This operation can be seen as a voluntary loss of information withthe purpose of gaining a property of interest.3.3.1.

Initialisation.

We assume that, using local moves, the MCMC algorithmprovides a set Z c = { ( z i c , k i c ) } N c i =1 of N c pairs with z i c the initial observation for apath and with k i c the corresponding time step, i ∈ { , . . . , N c } . These observationsmight or might not be at the same time step but the pairs ( z i c , k i c ), i ∈ { , . . . , N c } ,are assumed to be diﬀerent from each other. A path will be initialised every timeone of these observations is encountered in the provided sets of observation Z − k .3.3.2. Prediction.

We denote by O k − the set of paths at time k −

1, that is thesubset of O k − = ¯ Z × · · · × ¯ Z k − \ { φ } k − composed of paths that have beenselected so far as potential sequences of object-originated observations. To eachpath o ∈ O k − corresponds a possibility function f k − ( · | o ) on the state space X .Recalling that g k is the Markov transition from X to itself describing the objects’dynamics, we obtain the predicted possibility function f k | k − ( x | o ) = sup x (cid:48) ∈ X g k ( x | x (cid:48) ) f k − ( x (cid:48) | o ) , x ∈ X . Such a prediction only considers the event where the object survives to the k -th timestep although it is possible for objects to disappear. We postpone considerationsof this aspect of the prediction to a further stage in the algorithm.3.3.3. Update.

At time step k , the set of observations Z − k is available to updatethe existing paths. For any path o in the set O k − of previously selected paths andfor any new observation z ∈ Z − k ∪ { φ } , the posterior possibility function associatedwith the extended path o : z , with “:” denoting concatenation, is deﬁned as f k ( x | o : z ) =  (cid:96) k ( z | x ) f k | k − ( x | o )sup x (cid:48) ∈ X (cid:96) k ( z | x (cid:48) ) f k | k − ( x (cid:48) | o ) if z ∈ Z − k f k | k − ( x | o ) if z = φ . NCERTAINTY MODELLING AND COMPUTATIONAL ASPECTS OF DATA ASSOCIATION11

We can then select which observation in Z − k ∪ { φ } will be used to propagate thepath based on the credibility of the corresponding association. However, beforeexpressing the latter, we ﬁrst have to introduce the prior credibility of presence,which depends on the consecutive number of time steps for which the path underconsideration has not been detected. Indeed, there is some remaining ambiguitywhenever the empty observation φ is selected for a path since it is unclear in thiscase whether the detection has failed for the corresponding object or the object hasnot survived the last prediction step. We purposefully maintain this ambiguity andpostpone the decision in order to better estimate which of these two events occur.Indeed, the credibility of non-survival is most often much lower than the credibilityof detection failure, e.g. α nd = 0 . α ns = 0 . l time steps, then the credibility of the corresponding events, i.e. α l nd for the casewhere the object remains and α ns for the case where the object has disappeared,will rapidly favour a disappearance as opposed to a sequence of detection failures.For any path o ∈ O k − , we denote by l o the number of consecutive time steps forwhich φ has been selected, e.g. if o is of the form ( o , . . . , o k − , φ, φ ) with o k − (cid:54) = φ then l o = 2. We then compute the credibility that the corresponding object hassurvived/not survived since the last detection asˆ α s ( o ) = α l o nd α ns ∨ α l o nd , ˆ α ns ( o ) = α ns α ns ∨ α l o nd , with a ∨ b . = max { a, b } for any a, b ∈ R . The binary operator ∨ is assumed to havelower precedence than multiplication, so that a ∨ bc = a ∨ ( bc ) for any a, b, c ∈ R .We can now express the marginal credibility of association on Z − k ∪ { φ } for thepath o ∈ O k − as γ k ( z | o ) ∝ Γ k (cid:0) Z − k \ { z } | O k − \ o (cid:1) L k ( z | o )for any observation z ∈ Z − k ∪ { φ } , with L k ( z | o ) = (cid:40) ˆ α s ( o ) sup x ∈ X (cid:96) k ( z | x ) f k | k − ( x | o ) if z ∈ Z − k ˆ α ns ( o ) ∨ ˆ α s ( o ) (cid:0) α ns ∨ α nd (cid:1) otherwisethe marginal likelihood for the observation z and with Γ k ( Z | O ) the credibility forpaths in the set O ⊆ O k − to be associated with observations in the set Z ⊆ Z − k ,which can be expressed asΓ k ( Z | O ) = max σ : O → Z (cid:48) ∪{ φ } f fa ( Z \ σ ( O )) (cid:89) o ∈ O L k ( σ ( o ) | o ) , where the maximum is over all mappings σ from O to Z (cid:48) ∪ { φ } that are injectiveon Z (cid:48) and where σ ( O ) is the image of O by σ , i.e. σ ( O ) = { σ ( o ) : o ∈ O } .Although, the number of simultaneously reassigned paths will be limited in thecontext of interest, the number of observations in Z can be extremely large so thatthe computation of Γ k ( Z | O ) can be challenging. Yet, it is possible to rewrite thisterm by assuming that any two paths in O are unlikely to obtain large marginallikelihoods from a single observation in Z , that is, for any o, o (cid:48) ∈ O such that o (cid:54) = o (cid:48) and any z ∈ Z , there exists z (cid:48) ∈ Z such that L k ( z | o ) L k ( z | o (cid:48) ) ≤ L k ( z | o ) L k ( z (cid:48) | o (cid:48) ) . In the probabilistic version of this assumption [15], the left hand side needs to beequal to 0, which is more constraining. It follows that Γ k ( Z | O ) can be expressedas Γ k ( Z | O ) = f fa ( Z ) (cid:89) o ∈ O (cid:20) L k ( φ | o ) ∨ max z ∈ Z L k ( z | o ) α fa ( z ) (cid:21) . This result can be proved easily by developing the product in the approximatedexpression and removing the terms where a single observation is associated withseveral tracks. Using this expression, all the terms Γ k ( Z − k \ { z } | O k − \ o ), forany z ∈ Z − k ∪ { φ } and any o ∈ O k − , can be calculated with a computationalcomplexity of order | O k − || Z − k | . The approach is similar to the one detailed in [15]for the probabilistic case.We then select an observation in Z k ∪ { φ } at random for each of the pathsin O k − using the marginal credibility of association γ k ( · | o ) from the maximumentropy approach. There are two ways of enforcing the modelling assumption thatpaths cannot contain the same observation:i) use a rejection sampling strategy to ensure that only acceptable data asso-ciations are proposed, andii) completely reject the proposed data association if it contains overlappingpaths.The main drawback with the ﬁrst option is that calculating the probability ofproposing a given acceptable data association is combinatorial in nature and be-comes a computational bottleneck when the number of observations is large. Wetherefore consider the second option.Finally, we initialise a new path for any ( z i c , k i c ), i ∈ { , . . . , N c } , such that k i c = k . This path is of the form o = ( φ, . . . , φ, z i c ).At the last time step, a decision is taken for all observations paths, even theone ending with empty observations, and a set A c is deﬁned as the set of all cre-ated paths. The conditional probability for generating the set of paths A c giventhe initial observations Z c and the available observations Z − , . . . , Z − K is denoted P c ( A c | Z c , Z − K ).3.4. Structure on the set of paths.

In order to help exploring the set of dataassociations A , it is useful to equip the underlying set of paths O K with additionalstructure. The only natural structure on O K is the one inherited from the fact thatthe observations are in the set Z which is a subset of an Euclidean space. This is nothowever suﬃcient since simply measuring the distance between two observations z k and z (cid:48) k (cid:48) at two diﬀerent time steps k (cid:54) = k (cid:48) as (cid:107) z k − z (cid:48) k (cid:48) (cid:107) , with (cid:107)·(cid:107) the Euclidean norm,does not take into account the structure of the problem. Moreover, the notion ofdistance is very model-dependent and what is considered as “close” or “far” wouldneed to be adjusted for each scenario. Instead, we use the objects’ dynamical andobservation model as a reference and relate observations via the credibility for theseobservations to be generated by the same object. These observations can be seenas consistent if that credibility is close to 1 and inconsistent if it is close to 0. Inorder to simplify the calculations, we assume the existence of an upper boundingfunction g for the Markov transition g k such that(5) g k ( x | x (cid:48) ) ≤ g ( x | x (cid:48) ) , x, x (cid:48) ∈ X , for any k ∈ { , . . . , K } , with g of the form g ( x | x (cid:48) ) = N( x ; F x (cid:48) , Q ) , x, x (cid:48) ∈ X , for some d X × d X matrices F and Q .We consider two time steps k, k (cid:48) ∈ { , . . . , K } such that k < k (cid:48) as well as twoobservations z and z (cid:48) at time steps k and k (cid:48) respectively and introduce f k,k (cid:48) ( z (cid:48) | z )as the possibility for an object initialised from z at time step k to be observed againat time step k (cid:48) at z (cid:48) in the absence of any other observation, that is(6) f k (cid:48) | k ( z (cid:48) | z ) = sup x,x (cid:48) ∈ X (cid:96) k ( z (cid:48) | x (cid:48) ) g l ( x (cid:48) | x ) f k ( x | z ) NCERTAINTY MODELLING AND COMPUTATIONAL ASPECTS OF DATA ASSOCIATION13 where l = k (cid:48) − k , where g l is the l -th fold convolution of the transition g , that is(7) g l ( x k (cid:48) | x k ) = sup x k +1 ,...,x k (cid:48)− ∈ X g ( x k (cid:48) | x k (cid:48) − ) . . . g ( x k +1 | x k ) , x k , x k (cid:48) ∈ X , and where f k ( · | z ) is the posterior possibility function deﬁned as f k ( x | z ) = (cid:96) k ( z | x ) f ( x )sup x (cid:48) ∈ X (cid:96) k ( z | x (cid:48) ) f ( x (cid:48) ) , x ∈ X . The possibility function g l ( · | x k ) is an upper bound for the convolution of theMarkov transitions g k +1 , . . . , g k (cid:48) . Assuming that f k ( · | z ) = N( m z , Σ ) and denotingby Σ l the covariance matrix after l predictions, e.g. Σ = F Σ F (cid:124) + Q , then thepossibility function f k (cid:48) | k ( · | z ) can be written f k (cid:48) | k ( z (cid:48) | z ) = N( z (cid:48) ; H k (cid:48) ,z,l F l m z , H k (cid:48) ,z,l Σ l H (cid:124) k (cid:48) ,z,l + R k (cid:48) )where H k (cid:48) ,z,l is the Jacobian of h k (cid:48) at the point F l m z ; the value of d k,k (cid:48) ( z, z (cid:48) ) canbe easily deduced.The main drawback of this notion of consistency is that observations tend tobecome more consistent as l increases since there is more uncertainty about thestate of the object as time passes by. To address this potential issue, we takethe credibility of detection α d ( · ) into account and focus on the credibility for anobservation z (cid:48) to be the next observation of the object after z . To ﬁt into theconsidered context, we introduce a lower bound a nd for the credibility of non-detection, i.e. a nd is such that α nd ( x ) ≥ a nd for any x ∈ X . It then follows that thepossibility for z (cid:48) to be the next observation after z isˆ f k (cid:48) | k ( z (cid:48) | z ) = a l − f k (cid:48) | k ( z (cid:48) | z ) , deﬁned for any l >

0. The function ˆ f k,k (cid:48) can be easily extended to Z ∪ { φ } bydeﬁning ˆ f k,k (cid:48) ( · | φ ) = ˆ f k,k (cid:48) ( φ | · ) = 0. Example 1.

To illustrate the use of the notion ˆ f k,k (cid:48) of consistency, a simple sce-nario consisting of 6 objects is considered as in Figure 2a. For each observation z ∈ Z k at some time step k ∈ { , . . . , K } we compute a marginal credibility for z as(8) ˆ f k ( z ) = max k (cid:48) : k (cid:48) >k (cid:16) max z (cid:48) ∈ Z k (cid:48) ˆ f k (cid:48) ,k ( z (cid:48) | z ) (cid:17) . The scalar ˆ f k ( z ) can be interpreted as the credibility for z to be followed by anotherobservation in Z k (cid:48) for some k (cid:48) > k . When creating a new track, we can then deﬁnethe probability for selecting z as the ﬁrst observation of the new track as a functionof a ˆ f k ( z ). A scatter plot displaying these credibilities for all observations is shownin Figure 2b.The advantage of relating observations in this way is that it can be easily ex-tended to paths. Indeed, we can deﬁne the consistency between an observation z ∈ Z k at some given time k with a path o in O K as(9) ˆ f k ( z, o ) = max (cid:110) max k (cid:48) : k (cid:48) k ˆ f k (cid:48) ,k ( o k (cid:48) | z ) (cid:111) , where the initial observation is either z or one of the observations in o . Similarly,the consistency between two paths o and o (cid:48) in O K is deﬁned asˆ f ( o, o (cid:48) ) = max (cid:110) max k,k (cid:48) : k (cid:48) k ˆ f k (cid:48) ,k ( o (cid:48) k (cid:48) | o k ) (cid:111) We can now propose to modify a given data association by changing nearby paths,and therefore focus the computational power on moves that are likely to be accepted.Although the approach considered here is not standard, it has two main advantages: -60 -40 -20 0 20 40 60 x position -60-40-200204060 y po s i t i on (a) True trajectories as coloured lines, ini-tial/ﬁnal positions as circles/squares andobservations as black dots. -60 -40 -20 0 20 40 60 x position -60-40-200204060 y po s i t i on (b) Scatter plot of the observations with asize proportional to the corresponding cred-ibilities of selection.

Figure 2.

Scenario with 6 objects as represented in (A) with thecorresponding scatter plot of the probability for ach observation tobe selected in (B).it relates observations together and applies to non-linear cases as long as a Gaussianupper-bounding function can be found.In practice, it might be necessary to reduce the time required for computingˆ f k (cid:48) ,k ( z (cid:48) | z ) between any pair ( z, z (cid:48) ) of observations, especially if the scenario runsover many times steps or if the number of observations at every time step is large. Inthat case, one can deﬁne a threshold τ (cid:48) such that if a l nd < τ (cid:48) then any observationsthat are l time steps apart will be arbitrarily assigned a credibility of 0.3.5. Design of the proposal distribution.

When designing a proposal distri-bution Φ for our MCMC algorithm, several requirements need to be considered: itshould be possible toi) reassign several paths simultaneously in order to address crossings as illus-trated in Figure 1a and track fragmentation,ii) reassign both the initial observation of a path and the subsequent path asin Figure 1b, andiii) create a new path.Requirement i) can be easily fulﬁlled by using the approach presented in Section 3.3however, instead of simply choosing the paths at random, it is more eﬃcient tofocus on nearby paths. In order to simultaneously reassign the initial observationsof a given set of paths A r as needed in Requirement ii), we consider the notion ofconsistency deﬁned in (9). Once a new initial observation has been selected, theapproach of Section 3.3 can be used to reassign the rest of the chosen path. Finally,the marginal consistency deﬁned in (8) can be used for Requirement iii) in orderto identify observations that are likely to be initial observations.The general objective is to ﬁnd a proposal distribution Φ that is as simple aspossible and such that the associated MCMC kernel is irreducible and reversible.Starting from a given set of paths A of size s = | A | , we suggest to proceed asfollows:1) Sample a number N r of paths to reassign from a probability mass function(p.m.f.) p r ( · | s ) such that N r ≤ s almost surely (a.s.), e.g. a truncated Poissondistribution. Then, sample the number N c of paths to be created from the p.m.f. NCERTAINTY MODELLING AND COMPUTATIONAL ASPECTS OF DATA ASSOCIATION15 p c ( · | N r ) on the set of non-negative integers N deﬁned as p c ( n | N r ) (cid:40) δ ( n ) if N r = 0˜ p c ( n − N r ) otherwise,with ˜ p c a p.m.f. on {− , , } to be deﬁned. With this model, there will be onecreated path a.s. when none are reassigned (there is limited interest in creating sev-eral paths at once in this case) and the number of paths will be increased by one,kept constant or decreased by one in case of reassignment. Reducing the number ofpaths by one will address the issue of track fragmentation, keeping the number ofpaths constant is appropriate when considering objects with crossing trajectories,and leaving the possibility of increasing the number of paths is required to en-sure reversibility. Indeed, when evaluating the probability of the reverse proposal,created paths will become reassigned paths and vice versa.2) If N r = 0 then deﬁne A r = ∅ and proceed to the next step, otherwise, selectthe set A r = { o i } N r i =1 of paths to be reassigned as follows: the ﬁrst path o is pickeduniformly at random from the set of paths A then the N r − o : o i ∼ P o i − (cid:0) ˆ f ( o , · ) (cid:1) for any 1 < i ≤ N r , where P o i − ( · ) is a function transforming possibility func-tions into probability distributions, e.g. the maximum-entropy distribution upper-bounded point-wise by ˆ f ( o , · ), which we assume to verify (cid:88) o ∈ A P o i − (cid:0) ˆ f ( o , · ) (cid:1) ( o ) = 1 and P o i − (cid:0) ˆ f ( o , · ) (cid:1) ( o j ) = 0 , j ∈ { , . . . , i − } . Therefore, the set of paths A r is sampled without replacement from the set A .When evaluating the probability P r ( A r | N r ) for sampling the subset A r of A , allpossible ways of obtaining such a subset must be taken into account, that is P r ( A r | N r , A ) =  s (cid:88) σ ∈ Sym( N r ) N r (cid:89) i =2 P ( o σ (1) ,...,o σ ( i − ) (cid:0) ˆ f ( o , · ) (cid:1) ( o σ ( i ) ) if N r > δ ∅ ( A r ) otherwise,where Sym( n ) is the set of permutations of { , . . . , n } . Although the computa-tional complexity for this term is combinatorial, N r is usually small so the actualcomputational time is limited.3) If N c = 0 then deﬁne Z c = ∅ and proceed to the next step, otherwise, selectthe N c initial observations Z c = { ( z i c , k i c ) } N c i =1 from the set (cid:83) Kk =1 { ( z, k ) : z ∈ Z − k } of available observations, with Z − k deﬁned for any k ∈ { , . . . , K } as Z − k = (cid:8) z ∈ Z k : ∀ ( o, k − ) ∈ A \ A r , z (cid:54) = o k (cid:9) . The selection of the initial observations is performed without replacement asˆ z i c ∼ P (ˆ z ,..., ˆ z i − ) (cid:0) ˆ f N r c (cid:1) where ˆ z j c stands for the pair ( z j c , k j c ) for any j ∈ { , . . . , N c } , and where the possi-bility function ˆ f N r c is deﬁned as the marginal consistency (8) if N r = 0 and as theconsistency (9) with the future observation in the paths in A r otherwise. Indeed,when reassigning N r > originate from objects. The probability of proposing the subset Z c of observationstakes a similar form as for path reassignment and can be expressed as˜ P c (cid:0) Z c | N c , Z − K (cid:1) =  (cid:88) σ ∈ Sym( N c ) N c (cid:89) i =1 P (ˆ z σ (1)c ,..., ˆ z σ ( i − ) (cid:0) ˆ f N r c (cid:1) (ˆ z σ ( i )c ) if N c > δ ∅ ( Z c ) otherwise.The comment regarding computational complexity made about P r ( · | N r ) appliesequally here.4) Apply the approximate multi-object ﬁlter of Section 3.3 to the set of initialobservations Z c and with the sets of available observations Z − , . . . , Z − K and denote A c the generated set of paths. If A c ∩ A r (cid:54) = ∅ then we reject the proposal, otherwise,the proposed set of paths is A (cid:48) = ( A \ A r ) ∪ A c . The reason for rejecting the proposalwhen A c ∩ A r (cid:54) = ∅ is to ensure that A c and A r can be recovered from A and A (cid:48) as A c = A (cid:48) \ A and A r = A \ A (cid:48) .If the proposal has not been already rejected during its construction, the prob-ability Φ( A (cid:48) | A ) to go from the previous set of paths A to the new set of paths A (cid:48) is computed asΦ( A (cid:48) | A ) = P c (cid:0) A c | Z c , Z − K (cid:1) ˜ P c (cid:0) Z c | N c , Z − K (cid:1) P r ( A r | N r , A ) p c ( N c | N r ) p r ( N r | s ) . The probability ˆ α ( A, A (cid:48) ) of accepting the proposed set of paths A (cid:48) can then becomputed using (4). 4. MCMC on the set of tracks

We now want to design a MCMC algorithm that targets the possibility functionΠ as introduced in (2). In this case, the Metropolis-Hastings acceptance ratio is(10) α t ( T, T (cid:48) ) = min (cid:18) , Π( T (cid:48) ) ρ t Ψ( T | A )Φ( A | A (cid:48) )Π( T ) ρ t Ψ( T (cid:48) | A (cid:48) )Φ( A (cid:48) | A ) (cid:19) , with A and A (cid:48) the set of paths in T and T (cid:48) respectively. We therefore have topropose a time of appearance and a last time of existence for each path in A .These time steps will sampled independently from their previous values in T .4.1. Proposing the interval of existence.

The objective in this section is topropose a time of appearance m and a last time of existence n for a given path o ∈ O K , using the diﬀerent quantities introduced in Section 3.3. We consider a path o ∈ O K of the form o (cid:48) : φ . One can sample the lag corresponding to the last timeof appearance according to the probability mass function p − ( · | o ) on { , . . . , l o } deﬁned as the maximum-entropy distribution bounded by l (cid:55)→ α ( l

Evaluating the marginal likelihood.

So far, the proposed approach doesnot assume a speciﬁc model for the dynamics and for the observation process.Indeed, although the likelihood (cid:96) k ( · | x ) is assumed to take the form of a Gaussianpossibility function, the function h relating states to observations is general. We willhowever distinguish two diﬀerent cases for the evaluation of the marginal likelihood:the linear-Gaussian case in which Kalman ﬁltering can be used and the non-linearcase where sequential Monte Carlo techniques are a natural alternative.4.2.1. Linear-Gaussian case.

If the Markov transition g k is of the form g k ( · | x (cid:48) ) =N( F k x (cid:48) , Q k ) for some d X × d X matrices F k and Q k and for any k ∈ { , . . . , K } and if the observation function h k is of the form h k ( x ) = H k x then the posteriordistribution of the state at any time step can be computed analytically via theKalman ﬁlter. In particular, for a given path o ∈ O k − , we denote by m ok and Σ ok the mean and variance of the state at time k ∈ { , . . . , K } given the observations inthe path o . The only diﬀerence with the standard Kalman ﬁltering equation is themarginal likelihood which, due to the form of the likelihood, is expressed at timestep k asˆ (cid:96) k ( z | o ) = sup x ∈ X (cid:96) k ( z | x ) f k | k − ( x | o ) = N( z ; H k m ok , H k Σ ok H (cid:124) k + R k )for any z ∈ Z k .4.2.2. Non-linear case.

If either the objects’ dynamics or the observation function isnot linear, then there is no analytical form for the ﬁltering distributions at diﬀerenttime steps in general. Sequential Monte Carlo (SMC) methods are an alternativeto the Kalman ﬁlter in this case. An analogue [17] of the bootstrap particle ﬁlter [9]can be used, see also [28, 29]. In particular, for a given path o ∈ O k − , we denoteby { ( w ok − ,i , x ok − ,i ) } Ni =1 the indexed family of weighted particles approximating thepredicted possibility function f k | k − ( · | o ), i.e.¯ E ( ϕ ( x k ) | o ) ≈ max ≤ i ≤ N w ok − ,i ϕ ( x ok − ,i )for any real-valued function ϕ on X , with the uncertain variable x k being describedby f k | k − ( · | o ). Then ¯ E ( ϕ ( x k ) | o : z ) ≈ max i w o : zk,i ϕ ( x ok − ,i )max i w o : zk,i for any z ∈ Z k , where w o : zk,i = w ok − ,i (cid:96) k ( z | x ok − ,i ) for any i ∈ { , . . . , N } . In thissituation, the marginal likelihood at time step k can be approximated byˆ (cid:96) k ( z | o ) ≈ max ≤ i ≤ N w o : zk,i . Simulations

In all the cases to be considered, K = 50 and X = R . States at time step k are of the form x k = ( x k , y k , ˙ x k , ˙ y k ) (cid:124) , where x k and y k are the coordinates ofthe position in the 2-dimensional Euclidean space and where ˙ x k and ˙ y k are thecoordinates of the velocity. The duration of one time step is denoted ∆ and themotion model is assumed to be of the form q k ( x k | x k − ) = N ( x k ; F x k − , Q )with F =   and Q = σ  ∆ / / / / /

00 ∆ /  , where σ a is the standard deviation of the zero-mean random acceleration, which isconsidered as a noise term. This model is referred to as the nearly-constant velocitymodel. We will consider in particular the case where ∆ = 1 and σ a = 0 . x k , y k ) (cid:124) of an object is observed directly, which leads to h ( x k ) = Hx k with H = (cid:20) (cid:21) . The variance R is of the form σ I with σ > I the identity matrix ofdimension 2. This model is useful when tracking directly in the coordinate systemsdeﬁned by a sensor such as the image plane of a camera. Other situations where thismodel arises are when multiple sensors provide complex observations which can becombined into a single observation before being used in a tracking algorithm suchas with GPS or with multiple-input multiple-outputs sensor systems [2, 10, 26]. Wewill consider in particular the case where σ = 0 . Y = [ − , × [ − , Parametrisation of the proposed algorithm.

If the probability of detec-tion is p d then the possibility of detection failure is set to α nd = 1 − p d and thepossibility of detection α d is set to 1. The same approach is used with the prob-ability of survival. The possibility function α k, fa is assumed to be constant andequal to 10 − for all scenarios; this is in spite of the fact that the number of falsealarms will vary signiﬁcantly across the considered settings. The reason for this isthat α nk, fa is seen as an upper bound for the probability of having n false alarms. Asimilar approach is used for appearing objects with f k, + ( n ) = α n + with α + = 10 − .The other model parameters such as σ and σ a are assumed to be known.The proposed approach is compared to the MCMC for Data Association (MCMC-DA) method introduced in [25]. In order to make the two methods comparable,the possibility function Π is used to evaluate the log-likelihood of the proposed setsof tracks. However, as opposed to the proposed approach, MCMC-DA is providedwith the true parameters of the model in the design of the corresponding proposaldistribution.5.2. Choice of parameter.

We assume that the current sample from Π is T ∈ T and denote by A = κ ( T ) the corresponding set of paths. We then comment on thechoice of parameters for the diﬀerent steps in the proposal mechanism.The number N r of tracks to reassign is chosen from a Poisson distribution withparameter λ r = 1, truncated to the interval { , . . . , | A |} . The parameter λ r can beadjusted depending on the considered scenario: if objects are expected to be veryclose to each other and to frequently have crossing trajectories, then λ r could beincreased to raise the average number of tracks that are reassigned at once. Largereassignments are however less likely to be accepted so that a trade-oﬀ betweenexploration and mixing must be found, as is usual with MCMC.The distribution ˜ p c ( · | N r ) on {− , , } drives the increase or decrease of thenumber of tracks in the proposal step. Since one of the main issues with theMCMC approach for data association is track fragmentation, i.e. the representationof a single object by a series of shorter tracks, it is generally helpful to focus onreducing the number of tracks. We therefore consider the following parametrisation:˜ p c ( δ | N r ) =  / δ = − / δ = 01 / δ = 1.5.3. MCMC on the data association set.

The choice of parameter as well asthe performance of the proposed approach are assessed on diﬀerent scenarios.

NCERTAINTY MODELLING AND COMPUTATIONAL ASPECTS OF DATA ASSOCIATION19 -60 -40 -20 0 20 40 60-60-40-200204060 (a)

Trajectories (with circles indicating ini-tial position) and observations (black dots). -60 -40 -20 0 20 40 60-60-40-200204060 (b)

Distance from one given observation tothe nearest observation. time (s) -3400-3300-3200-3100-3000-2900-2800-2700-2600-2500 l og - li k e li hood TrueHISPDA (c)

Comparison between diﬀerent methods. time (s) -2700-2680-2660-2640-2620-2600-2580 l og - li k e li hood c = 0.0005c = 0.001c = 0.005c = 0.01 (d) Trace plot for diﬀerent values of c . Figure 3.

Simple scenario with performance comparison for dif-ferent parameter choices.5.3.1.

Simple scenario.

We ﬁrst consider a simple scenario, as shown in Figure 3a,with 10 false alarms and 0 . p d = 0 .

9. The simplicity of the scenario is illustratedin Figure 3b where it appears that most of the false alarms are far from any otherobservation and, conversely, object-originated observations are close to each other.The performance of the two considered approaches is ﬁrst assessed on a single runin Figure 3c where the evolution of the log-likelihood is displayed as a function of thecomputational time. “HISP” refers to the proposed approach whereas “DA” refersto the MCMC-DA. The diﬀerence in behaviour between the proposed approachand MCMC-DA is due to the use of the simulated annealing in the former. Bothmethods provide satisfactory results in this case and the MCMC-DA’s chain mixeswell. Figure 3d, which displays the performance averaged over 50 repeats, showsthat setting the parameter c in the inverse temperature ρ t to 0 .

001 provides thebest performance throughout the duration of the runs.5.3.2.

Scenario with high false-alarm rate.

We consider a ﬁrst type of challengingscenario, depicted in Figure 4a, with the following challenging characteristics: thereare 100 false alarms and 0 . p d is equal to 0 .

8. In this case, it is the large numberof false alarms that make the estimation diﬃcult due to the fact that they arelikely to form coherent observation sequences over 2 to 3 time steps. This aspectis illustrated in Figure 4b where many false alarms can be seen to be near other observations. Figure 4c considers diﬀerent choices for the Poisson parameter λ r withthe log-likelihood being once again averaged over 50 repeats. The choice λ r = 1allows for rapidly creating tracks while proposing the simultaneous reassignmentof 2 tracks often enough to prevent track fragmentation, whereas setting λ r to 0 . . p c , with the log-likelihood being averaged over 50 repeats. Theassessed options are˜ p c (( − , , +1) | N r ) =  (1 / , / , /

3) as “uniform”(1 / , / , /

4) as “focus on − / , / , /

4) as “focus on 0”(1 / , / , /

2) as “focus on +1”,where ˜ p c (( δ , δ , δ ) | N r ) = ( p , p , p ) is a shorthand notation for ˜ p c ( δ i | N r ) = p i for i ∈ { , , } . The results in Figure 4d show that focusing on δ = − δ = 0. Once again, this can beattributed to the reduction in track fragmentation. The inﬂuence of the parameter c is considered once more in Figure 4e where it appears that c = 0 . c = 0 .

001 still provides good performancethroughout the run time and is considered for the other simulations. Figure 4fcompares the performance of the propose approach with MCMC-DA and showsthat the latter does not mix as well as in the ﬁrst scenario and fails to identify mostof the tracks. The fact that the proposed approach does not reach the true log-likelihood can be attributed to local maxima in the posterior possibility function Πas well as to identiﬁability issues. The trace plots are shown for 1 repeat as well asfor 50 repeats in order to show that the low average performance of the MCMC-DAis not due to averaging.5.3.3.

Scenario with low probability of detection.

To further assess the performanceof the considered approach, we consider another challenging scenario, as shown inFigure 5a, with the following characteristics: there are 25 false alarms and 0 . p d is equalto 0 .

5. The diﬃculty of this scenario is illustrated in Figure 5b where it appearsthat the inter-observation distance is not suﬃcient to clearly identify the objects;in particular, the observations belonging to the object at the bottom right barelyappear in Figure 5b, emphasising the fact that a probability of detection of 0 . References [1] C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain Monte Carlo methods.

Journal of the Royal Statistical Society: Series B (Statistical Methodology) , 72(3):269–342,2010.[2] I. Bekkerman and J. Tabrikian. Target detection and localization using mimo radars andsonars.

IEEE Transactions on Signal Processing , 54(10):3873–3883, 2006.[3] B. Benfold and I. Reid. Stable multi-target tracking in real-time surveillance video. In

CVPR2011 , pages 3457–3464. IEEE, 2011.[4] N. Chenouard et al. Objective comparison of particle tracking methods.

Nature methods ,11(3), 2014.[5] B. De Baets, E. Tsiporkova, and R. Mesiar. Conditioning in possibility theory with strictorder norms.

Fuzzy Sets and Systems , 106(2):221–229, 1999.[6] A. P. Dempster. Upper and lower probability inferences based on a sample from a ﬁniteunivariate population.

Biometrika , 54(3-4):515–528, 1967.

NCERTAINTY MODELLING AND COMPUTATIONAL ASPECTS OF DATA ASSOCIATION21 -60 -40 -20 0 20 40 60 x position -60-40-200204060 y po s i t i on (a) Trajectories (with circles indicating ini-tial position) and observations (black dots). -60 -40 -20 0 20 40 60 x position -60-40-200204060 y po s i t i on (b) Distance from one given observation tothe nearest observation. time (s) -2.42-2.41-2.4-2.39-2.38-2.37-2.36 l og - li k e li hood r = 0.5 r = 1 r = 1.5 (c) Trace plot for λ r ∈ { . , , . } . time (s) -2.42-2.41-2.4-2.39-2.38-2.37-2.36 l og - li k e li hood Uniformfocus on -1focus on 0focus on +1 (d)

Trace plot for diﬀerent choices for ˜ p c . time (s) -2.405-2.4-2.395-2.39-2.385-2.38-2.375-2.37-2.365-2.36-2.355 l og - li k e li hood c = 0.0005c = 0.001c = 0.005c = 0.01 (e) Trace plot for diﬀerent values of c . time (s) -2.5-2.45-2.4-2.35 l og - li k e li hood TrueHISP (50 repeats)DA (50 repeats)HISP (1 repeat)DA (1 repeat) (f)

Comparison between diﬀerent methods.

Figure 4.

Scenario with high false-alarm rate. [7] D. Dubois and H. Prade. Possibility theory and its applications: Where do we stand? In

Springer Handbook of Computational Intelligence , pages 31–60. Springer, 2015.[8] T. E. Fortmann, Y. Bar-Shalom, and M. Scheﬀe. Multi-target tracking using joint probabilis-tic data association. In , pages 807–812. IEEE, 1980.[9] N. J. Gordon, D. J. Salmond, and A. F. Smith. Novel approach to nonlinear/non-GaussianBayesian state estimation. In

IEE proceedings F (radar and signal processing) , volume 140,pages 107–113. IET, 1993.[10] A. M. Haimovich, R. S. Blum, and L. J. Cimini. MIMO radar with widely separated antennas.

IEEE Signal Processing Magazine , 25(1):116–129, 2007. -60 -40 -20 0 20 40 60 x position -60-40-200204060 y po s i t i on (a) Trajectories (with circles indicating ini-tial position) and observations (black dots). -60 -40 -20 0 20 40 60 x position -60-40-200204060 y po s i t i on (b) Distance from one given observation tothe nearest observation. time (s) -6900-6800-6700-6600-6500-6400-6300-6200-6100-6000 l og - li k e li hood TrueHISP (50 repeats)DA (50 repeats)HISP (1 repeat)DA (1 repeat) (c)

Comparison between diﬀerent methods.

Figure 5.

Scenario with low probability of detection. [11] J. Houssineau. A linear algorithm for multi-target tracking in the context of possibility theory. arXiv preprint arXiv:1801.00571 , 2018.[12] J. Houssineau. Parameter estimation with a class of outer probability measures. arXivpreprint arXiv:1801.00569 , 2018.[13] J. Houssineau and A. N. Bishop. Smoothing and ﬁltering with a class of outer measures.

SIAM/ASA Journal on Uncertainty Quantiﬁcation , 6(2):845–866, 2018.[14] J. Houssineau, N. K. Chada, and E. Delande. Elements of asymptotic theory with outerprobability measures. arXiv preprint arXiv:1908.04331 , 2019.[15] J. Houssineau and D. E. Clark. Multitarget ﬁltering with linearized complexity.

IEEE Trans-actions on Signal Processing , 66(18):4957–4970, 2018.[16] J. Houssineau and D. Laneuville. PHD ﬁlter with diﬀuse spatial prior on the birth processwith applications to GM-PHD ﬁlter. In , 2010.[17] J. Houssineau and B. Ristic. Sequential Monte Carlo algorithms for a class of outer measures. arXiv preprint arXiv:1708.06489 , 2017.[18] E. T. Jaynes. Information theory and statistical mechanics.

Physical review , 106(4):620, 1957.[19] L. Jiang and S. S. Singh. Tracking multiple moving objects in images using Markov ChainMonte Carlo.

Statistics and Computing , 28(3):495–510, 2018.[20] L. Jiang, S. S. Singh, and S. Yıldırım. Bayesian tracking and parameter learning for non-linearmultiple target tracking models.

IEEE Transactions on Signal Processing , 63(21):5733–5745,2015.[21] Z. Khan, T. Balch, and F. Dellaert. MCMC-based particle ﬁltering for tracking a variablenumber of interacting targets.

IEEE transactions on pattern analysis and machine intelli-gence , 27(11):1805–1819, 2005.[22] R. P. S. Mahler. Multitarget Bayes ﬁltering via ﬁrst-order multitarget moments.

IEEE Trans-actions on Aerospace and Electronic systems , 39(4):1152–1178, 2003.

NCERTAINTY MODELLING AND COMPUTATIONAL ASPECTS OF DATA ASSOCIATION23 [23] V. Maroulas and P. Stinis. Improved particle ﬁlters for multi-target tracking.

Journal ofComputational Physics , 231(2):602–611, 2012.[24] J. Mullane, B.-N. Vo, M. D. Adams, and B.-T. Vo. A random-ﬁnite-set approach to BayesianSLAM.

IEEE T. on Robotics , 27(2), 2011.[25] S. Oh, S. Russell, and S. Sastry. Markov chain Monte Carlo data association for multi-targettracking.

IEEE Transactions on Automatic Control , 54(3):481–497, 2009.[26] Y. Pailhas, J. Houssineau, Y. R. Petillot, and D. E. Clark. Tracking with MIMO sonarsystems: applications to harbour surveillance.

IET Radar, Sonar & Navigation , 11(4):629–639, 2016.[27] B. Ristic, D. Clark, B.-N. Vo, and B.-T. Vo. Adaptive target birth intensity for PHD andCPHD ﬁlters.

IEEE Transactions on Aerospace and Electronic Systems , 48(2):1656–1668,2012.[28] B. Ristic, J. Houssineau, and S. Arulampalam. Robust target motion analysis using thepossibility particle ﬁlter.

IET Radar, Sonar & Navigation , 13(1):18–22, 2018.[29] B. Ristic, J. Houssineau, and S. Arulampalam. Target tracking in the framework of possibilitytheory: The possibilistic Bernoulli ﬁlter.

Information Fusion , 62:81–88, 2020.[30] F. Septier, S. K. Pang, A. Carmi, and S. Godsill. On MCMC-based particle methods forBayesian ﬁltering: Application to multitarget tracking. In ,pages 360–363, 2009.[31] G. Shafer.

A mathematical theory of evidence , volume 42. Princeton university press, 1976.[32] M. I. Skolnik.

Radar handbook second edition . McGrawHill, 1990.[33] B.-N. Vo, B.-T. Vo, and D. Phung. Labeled random ﬁnite sets and the bayes multi-targettracking ﬁlter.

IEEE Transactions on Signal Processing , 62(24):6554–6567, 2014.[34] T. Vu, B.-N. Vo, and R. Evans. A particle marginal Metropolis-Hastings multi-target tracker.

IEEE Transactions on Signal Processing , 62(15):3953–3964, 2014.[35] P. Walley.

Statistical reasoning with imprecise probabilities . Chapman and Hall, 1991.[36] G. Zanella. Informed proposals for local MCMC in discrete spaces.

Journal of the AmericanStatistical Association , 2019.

Department of Statistics of the University of Warwick, UK

E-mail address : [email protected] Department of Statistics and Applied Probability, National University of Singapore

E-mail address : [email protected] Computer, Electrical and Mathematical Science and Engineering Division, King Ab-dullah University of Science and Technology, Thuwal, 23955-6900, KSA

E-mail address ::