Cluster Prediction for Opinion Dynamics from Partial Observations
Zehong Zhang and Fei Lu

Zehong Zhang and Fei Lu are with the Department of Mathematics, Johns Hopkins University, Baltimore, MD 21218, USA (e-mail: [email protected]; [email protected]).
Abstract: We present a Bayesian approach to predict the clustering of opinions for a system of interacting agents from partial observations. The Bayesian formulation overcomes the unobservability of the system and quantifies the uncertainty in the prediction. We characterize the clustering by the posterior of the clusters' sizes and centers, and we represent the posterior by samples. To overcome the challenge in sampling the high-dimensional posterior, we introduce an auxiliary implicit sampling (AIS) algorithm using two-step observations. Numerical results show that the AIS algorithm leads to accurate predictions of the sizes and centers for the leading clusters, in both cases of noiseless and noisy observations. In particular, the centers are predicted with high success rates, but the sizes exhibit a considerable uncertainty that is sensitive to observation noise and the observation ratio.
Index Terms: Clustering prediction, opinion dynamics, Bayesian approach, state space model, sequential Monte Carlo
I. INTRODUCTION

Clustering behavior in a network of interacting agents or particles arises in a vast range of disciplines [1]-[3]. In the context of opinion dynamics on social networks, local interactions among agents cause opinions to evolve, forming one or more clusters of opinions. While the striking phenomenon of consensus (one cluster) has attracted long-standing interest, non-consensus clustering, in which multiple stable clusters coexist, has drawn increasing attention because it better resembles real-life social networks [4]-[7]. Such clusters of opinions or communities have a profound impact on the network, so it is of great importance to predict these clusters from observations, which are often partial, at an early stage.

We investigate the prediction of clusters for multi-agent opinion dynamics with multiple clusters, from short-time partial observations which may be contaminated by noise. In particular, our objective is to predict the sizes and centers of the leading clusters. We assume the system is known (we refer to [8]-[10] and the references therein for the learning of the governing equation from data). To predict the clustering, one may estimate all agents' current opinions and use them as an initial configuration for prediction. However, we show that estimating the current state from partial observations is an ill-posed inverse problem (widely studied as observability in control; see, e.g., [11]). We propose a Bayesian formulation to make the problem well-posed: we estimate the posterior distribution of the states conditional on the observations. We represent the posterior by samples, which provide initial configurations for prediction.
This procedure yields a posterior for the clusters' sizes and centers, quantifying the uncertainty in the prediction.

The major challenge in the Bayesian approach is to generate samples for the high-dimensional posterior. Due to the intrinsic symmetry of the nonlinear opinion dynamics, the non-Gaussian posterior has multiple local extrema, which poses a hurdle for the performance of sequential Monte Carlo (SMC) methods [12]-[14], including the optimal (one-step-observation) importance sampling methods such as implicit sampling [15]. The symmetry between the agents also hinders the feedback control or nudging methods [16]-[22] based on dominating modes in the observation.

We overcome the challenge by introducing an auxiliary implicit sampling (AIS) algorithm that makes use of two-step observations; it is a sequential Monte Carlo method that combines ideas from auxiliary particle filters [23], implicit sampling [15], and feedback control [20]. We also introduce an MCMC-move step to reduce sample degeneracy and an information-move step to reject non-physical samples.

Numerical tests show that our AIS algorithm leads to accurate prediction of the sizes and centers of the leading clusters, in both cases of noiseless and noisy observations. In particular, the centers of the leading clusters are predicted with a high success rate, but the size of the leading cluster exhibits a considerable uncertainty that is sensitive to observation noise and the observation ratio. Our AIS algorithm improves on implicit sampling, and both outperform the sequential importance sampling with resampling (SIR) method.

The exposition proceeds as follows. In Section II, we define clusters for opinion dynamics with local interactions, prove that the inverse problem of state estimation from partial observation is ill-posed, and propose a Bayesian formulation for cluster prediction. To represent the posterior, we introduce in Section III an auxiliary implicit sampling algorithm that designs importance densities based on two-step observations. Section IV examines the performance of the AIS algorithm in numerical simulations. Finally, Section V concludes the paper with discussions.
II. BAYESIAN APPROACH TO CLUSTER PREDICTION
Consider a group of $N$ agents, each with an opinion at time $t$ quantified by $x_t^i \in \mathbb{R}^d$, interacting with each other according to the first-order difference system
$$ x_{t+1}^i - x_t^i = \frac{\alpha}{N} \sum_{j=1}^{N} \phi(\|x_t^j - x_t^i\|)\,(x_t^j - x_t^i). \qquad (1) $$
Here the positive constant $\alpha$ is a scaling parameter and the interaction kernel $\phi$ is a non-negative function supported on $[0, R]$. The agents interact locally, only with those opinions that are "close" in the sense that the pairwise distance $\|x_t^i - x_t^j\|$ is less than $R$, with the strength of interaction given by the kernel.

Our goal is to predict the clustering of the system, particularly the sizes and the centers of the leading clusters, supposing that we only observe the trajectories of $N_1$ of the $N$ agents for a relatively short time, far before the system forms clusters.

In this section, we provide a quantitative definition of clustering and discuss cluster prediction from partial observations. We show that it is an ill-posed inverse problem to predict the clustering by estimating all agents' trajectories. We introduce a Bayesian approach to make the problem well-posed, providing a probabilistic quantification of the uncertainty in the prediction.
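As a concrete illustration of the update rule (1), the following Python sketch implements one step of the forward map; the kernel, the scaling parameter, and the array shapes are illustrative assumptions rather than the exact settings used later in the experiments.

```python
import numpy as np

def forward_map(x, phi, alpha=1.0):
    """One step of the opinion dynamics (1).

    x    : array of shape (N, d), opinions of the N agents
    phi  : interaction kernel, a non-negative (vectorized) function of distance
    alpha: scaling parameter in front of the mean-field sum
    """
    N = x.shape[0]
    diff = x[None, :, :] - x[:, None, :]          # diff[i, j] = x^j - x^i
    dist = np.linalg.norm(diff, axis=-1)          # pairwise distances
    weights = phi(dist)                           # phi(||x^j - x^i||)
    # sum_j phi(||x^j - x^i||)(x^j - x^i), scaled by alpha / N
    return x + (alpha / N) * np.einsum('ij,ijd->id', weights, diff)

# example: a kernel supported on [0, R] with R = 1 (illustrative choice)
phi = lambda r: (np.asarray(r) < 1.0).astype(float)
x = np.random.uniform(-1, 1, size=(60, 2))        # 60 agents in R^2
x_next = forward_map(x, phi)
```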
A. Definition of clusters

Due to the local interaction between agents, clusters of opinions emerge, in which each agent only interacts with agents within the same cluster. More precisely, we define a clustered status of the system as follows:
Definition 1 (Clustered status):
Let $x_t \in \mathbb{R}^{dN}$ be the state of the system (1) with a local interaction kernel $\phi$ supported on $[0, R]$. We say the system is clustered if the index set $\{1, 2, \ldots, N\}$ of agents can be partitioned into disjoint clusters $\mathcal{C}_1(t), \ldots, \mathcal{C}_m(t)$ such that for any $i \in \mathcal{C}_k(t)$ and $j \in \mathcal{C}_{k'}(t)$:
(i) if $k = k'$, then $\|x_t^i - x_t^j\| < R$;
(ii) if $k \neq k'$, then $\|x_t^i - x_t^j\| > R$.

An essential feature of the clustered status is that it is invariant in time: a clustered system will remain clustered with the same clusters. In particular, each cluster is isolated from the other clusters; within each cluster, the agents form a self-contained dynamics and concentrate towards a local consensus, the center of the cluster, since the interaction is symmetric (we refer to [3] for detailed discussions on clustering for local interactions). We summarize this invariant feature as a property of the system.

Property 1 (Invariants of a clustered system):
Suppose that at time $t_c$, the system (1) is clustered into $\{\mathcal{C}_1, \ldots, \mathcal{C}_K\}$. Then the system will remain clustered with the same clusters for all $t \geq t_c$. In particular, the sizes and the centers of the clusters are invariant in time: for all $t \geq t_c$,
$$ |\mathcal{C}_k| := |\mathcal{C}_k(t)| = |\mathcal{C}_k(t_c)|, \qquad \bar{x}_{\mathcal{C}_k} := \frac{1}{|\mathcal{C}_k(t)|} \sum_{i \in \mathcal{C}_k(t)} x_t^i = \frac{1}{|\mathcal{C}_k(t_c)|} \sum_{i \in \mathcal{C}_k(t_c)} x_{t_c}^i, \qquad (2) $$
for each $k = 1, \ldots, K$, where $|\mathcal{C}_k|$ and $\bar{x}_{\mathcal{C}_k}$ denote the size (number of agents) and center (mean opinion of agents) of cluster $\mathcal{C}_k$, respectively.

These invariants characterize the clustering (the large-time behavior) of the opinion system. Therefore, our goal of cluster prediction is to estimate these invariants: the sizes and centers of the clusters, particularly those of the largest clusters.
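For reference, the sketch below detects the clusters of a given state by grouping agents into connected components of the graph whose edges connect pairs closer than $R$, and then computes the invariants in (2). The function names are illustrative helpers, not part of the paper.

```python
import numpy as np

def find_clusters(x, R):
    """Partition agents into clusters: connected components of the graph
    with an edge between i and j whenever ||x^i - x^j|| < R."""
    N = x.shape[0]
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    adjacency = dist < R
    labels = -np.ones(N, dtype=int)
    n_clusters = 0
    for i in range(N):
        if labels[i] >= 0:
            continue
        stack, labels[i] = [i], n_clusters      # search from agent i
        while stack:
            k = stack.pop()
            for j in np.nonzero(adjacency[k])[0]:
                if labels[j] < 0:
                    labels[j] = n_clusters
                    stack.append(j)
        n_clusters += 1
    return [np.nonzero(labels == c)[0] for c in range(n_clusters)]

def cluster_invariants(x, clusters):
    """Sizes |C_k| and centers of the clusters, as in (2)."""
    sizes = [len(c) for c in clusters]
    centers = [x[c].mean(axis=0) for c in clusters]
    return sizes, centers
```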
B. Cluster identification from partial observations

In practice, it is often the case that we can only observe or track a fraction of the agents. We consider the case in which
TABLE 1: Notation of variables in the state-space model

  $x = (x^1, \ldots, x^N) \in \mathbb{R}^{dN}$ : state variable of the system
  $Hx = (x^1, \ldots, x^{N_1})$, $H_i x = x^i$ : opinions of observed agents
  $Gx = (x^{N_1+1}, \ldots, x^N)$ : opinions of unobserved agents
  $|\mathcal{C}_i|$ and $\bar{x}_{\mathcal{C}_i}$ : size and center of cluster $\mathcal{C}_i$
  $x_{1:t} = (x_1, \ldots, x_t) \in \mathbb{R}^{tdN}$ : trajectory of all agents
  $z_{1:t} = (z_1, \ldots, z_t) \in \mathbb{R}^{tdN_1}$ : trajectory of observed agents

$N_1$ out of the $N$ agents are observed, with $z_{1:T} \in \mathbb{R}^{TdN_1}$ denoting their trajectories. We consider either noiseless or noisy observations. The original model (1), together with an observation equation, can be written as the state-space model
$$ x_{t+1} = g(x_t), \qquad z_t = Hx_t + \xi_t, \qquad (3) $$
where $g$ is the one-step map defined by (1) and $H : \mathbb{R}^{dN} \to \mathbb{R}^{dN_1}$ is a projection operator mapping the vector of opinions of all agents to its observed part. Without loss of generality, we assume that the first $N_1$ agents are observed. For simplicity of notation, we denote $Hx = (x^1, \ldots, x^{N_1}) \in \mathbb{R}^{dN_1}$ with $H = [I_{dN_1} \,|\, 0_{dN_1 \times d(N-N_1)}]$, and $H_i x = x^i$ denotes the $i$-th observed agent. Similarly, for the unobserved agents, we define the projection operator $G : \mathbb{R}^{dN} \to \mathbb{R}^{d(N-N_1)}$ from the state $x$ to its unobserved part, denoting $Gx = (x^{N_1+1}, \ldots, x^N) \in \mathbb{R}^{d(N-N_1)}$ with $G = [0_{d(N-N_1) \times dN_1} \,|\, I_{d(N-N_1)}]$, and $G_i x = x^{N_1+i}$ denotes the $i$-th unobserved agent. We summarize the notation in Table 1.

To predict the clustering, which is the large-time behavior of the dynamics, based on observations up to time $T$, a natural idea is to (i) estimate the state of the system at time $T$, and (ii) use the estimated state as an initial condition for a long-time simulation until the system is clustered. For Step (i), one may wish to find a trajectory of the state variable that fits the observation data. However, the following section shows that even with noiseless partial observations, it is an ill-posed inverse problem to identify the trajectory $x_{1:T}$ from the observations $z_{1:T}$. Also, whereas a regularization can make the problem well-posed in a variational approach, it leads to a challenging high-dimensional optimization problem on the path space, and there may be many local minima caused by the symmetry of the system. Instead, we adopt a Bayesian approach that avoids high-dimensional optimization and quantifies the uncertainty in prediction.
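For concreteness, the projections $H$ and $G$ and the generation of (possibly noisy) observations can be sketched as follows, with the observed agents taken to be the first $N_1$ and the noise level purely illustrative; forward_map is the hypothetical helper from the earlier sketch and states are flattened agent by agent.

```python
import numpy as np

def projections(d, N, N1):
    """H maps the flattened state to the observed block, G to the unobserved block."""
    H = np.zeros((d * N1, d * N)); H[:, : d * N1] = np.eye(d * N1)
    G = np.zeros((d * (N - N1), d * N)); G[:, d * N1:] = np.eye(d * (N - N1))
    return H, G

def generate_observations(x0, phi, T, d, N, N1, alpha=1.0, noise_std=0.0,
                          rng=np.random.default_rng()):
    """Simulate x_{1:T} with (1) and return z_{1:T} = Hx_t + xi_t as in (3)."""
    H, _ = projections(d, N, N1)
    x, Z = x0, []                                  # x0 has shape (N, d)
    for _ in range(T):
        Z.append(H @ x.reshape(-1) + noise_std * rng.normal(size=d * N1))
        x = forward_map(x, phi, alpha)
    return np.array(Z)
```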
C. State estimation and observability

In general, it is an ill-posed inverse problem to estimate the trajectories of all agents from partial noiseless observations. We demonstrate this by an example of symmetric trajectories and by proving that, for linear systems, the unobserved trajectories cannot be uniquely determined when more than one agent is unobserved; this is the notion of unobservability in control (see, e.g., [11]).

The next example shows that as long as at least two agents are unobserved, there can be symmetric trajectories, making it an ill-posed problem to identify the trajectories.
Fig. 1: Illustration of symmetric trajectories: the same observed trajectories (blue points) are generated from different configurations (with different unobserved trajectories in green). The color changes from light to dark to indicate time increasing from the initial time to the end time of observation.
Example 1 (Symmetric trajectories):
Consider a system with $N = 4$ agents in $\mathbb{R}^2$ and suppose that we observe $N_1 = 2$ of them. Figure 1 illustrates that two different configurations can lead to the same observations: the symmetric positions of the two unobserved agents cancel out their different influences on the observed agents.

The following theorem shows that, for linear systems, it is an ill-posed problem to estimate the states of the system when more than one agent is unobserved.

Theorem 1 (Observability for linear opinion dynamics):
Consider the linear dynamics with $\phi \equiv 1$ in (1), and suppose that we observe the trajectories of $N_1$ agents. Then the trajectories of the unobserved agents can be uniquely determined if and only if $N_1 \geq N - 1$.

Proof 1:
We only need to consider $N_1 \leq N - 1$. We can write the system as
$$ x_{t+1} = \alpha A x_t + x_t, \qquad z_t = H x_t, $$
where $A \in \mathbb{R}^{dN \times dN}$ is the constant matrix
$$ A = \begin{pmatrix} c_1 I_d & c_2 I_d & \cdots & c_2 I_d \\ c_2 I_d & c_1 I_d & \cdots & c_2 I_d \\ \vdots & \vdots & \ddots & \vdots \\ c_2 I_d & c_2 I_d & \cdots & c_1 I_d \end{pmatrix} $$
with $c_1 = -\frac{N-1}{N}$ and $c_2 = \frac{1}{N}$. By observability theory [11], the trajectory $x_{1:T}$ can be uniquely determined from the observations $z_{1:T}$ if and only if $\mathrm{rank}(W) = dN$, where
$$ W := \left[ H^T \,|\, A^T H^T \,|\, \cdots \,|\, (A^T)^{dN-1} H^T \right]. $$
To compute $\mathrm{rank}(W)$, note that $A^T = A$ and $A = Q \Lambda Q^T$, where $\Lambda = \mathrm{diag}(-I_{d(N-1)}, 0_{d \times d})$ and $Q$ is a unitary matrix. In particular, $A^2 = -A$, so $(A^T)^k H^T = (-1)^{k-1} A H^T$ for $k \geq 1$. Recalling that $H = [I_{dN_1} \,|\, 0]$, we thus have
$$ \mathrm{rank}(W) = \mathrm{rank}\big([H^T \,|\, A H^T]\big) = (N_1 + 1)\, d, $$
which equals $dN$ if and only if $N_1 = N - 1$.
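A quick numerical check of this rank computation, with illustrative values of $N$, $N_1$, and $d$, can be done by assembling $A$, $H$, and the observability matrix $W$ directly; this sketch is only a sanity check of Theorem 1, not part of the paper's experiments.

```python
import numpy as np

def observability_rank(N, N1, d):
    """Rank of the observability matrix W for the linear dynamics (phi = 1)."""
    c1, c2 = -(N - 1) / N, 1 / N
    A = np.kron(np.full((N, N), c2) + (c1 - c2) * np.eye(N), np.eye(d))
    H = np.zeros((d * N1, d * N))
    H[:, : d * N1] = np.eye(d * N1)            # observe the first N1 agents
    # W = [H^T | A^T H^T | ... | (A^T)^{dN-1} H^T]
    blocks, M = [H.T], H.T
    for _ in range(d * N - 1):
        M = A.T @ M
        blocks.append(M)
    return np.linalg.matrix_rank(np.hstack(blocks))

# rank should be (N1 + 1) * d, hence full (= d * N) only when N1 >= N - 1
print(observability_rank(N=6, N1=3, d=2), (3 + 1) * 2)
```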
D. Bayesian estimation of states and clusters

In a Bayesian approach, we view the states and the invariants of the clusters as random variables, and we aim to represent their posteriors conditional on the observations.
TABLE 2: Notation of variables in the Bayesian approach

  $p(x_{1:T} \mid z_{1:T})$ : posterior of $x_{1:T}$ conditional on $z_{1:T}$
  $\hat{p}(x_{1:T} \mid z_{1:T})$ : empirical approximation of $p(x_{1:T} \mid z_{1:T})$
  $\{x_{1:t}^{(s)}, w_t^{(s)}\}$ : samples and weights
  $p(|\mathcal{C}_i| \mid z_{1:T})$, $p(\bar{x}_{\mathcal{C}_i} \mid z_{1:T})$ : posteriors of $|\mathcal{C}_i|$ and $\bar{x}_{\mathcal{C}_i}$

We begin by writing the system in the form of a state-space model:
$$ x_{t+1} = g(x_t) + \epsilon_t, \quad x_1 \sim \mu_0(\cdot), \qquad z_t = H x_t + \xi_t, \qquad (4) $$
where the state noise $\epsilon_t$ and observation noise $\xi_t$ are zero in the noiseless case. When these two noise terms are zero, the randomness of the states comes only from the initial distribution $\mu_0$. Conditional on observations $z_{1:T}$, we denote by $p(|\mathcal{C}_i| \mid z_{1:T})$ and $p(\bar{x}_{\mathcal{C}_i} \mid z_{1:T})$ the posteriors of the size and center of cluster $\mathcal{C}_i$, and similarly for the state variables, as in Table 2.

These posteriors of the invariants depend on the initial distribution as well as on the system, and cannot be expressed analytically in general. They depend on the posterior of the state $p(x_{1:T} \mid z_{1:T})$, particularly on $p(x_T \mid z_{1:T})$ when the opinion dynamics is deterministic. The analytical expression of these posteriors of the state variables may be available, but they are high-dimensional and non-Gaussian.

We approximate these distributions by Monte Carlo methods: we draw a set of weighted samples (with normalized weights), $\{x_{1:t}^{(s)}, w_t^{(s)}\}_{s \in \{1, \ldots, S\}}$, by a sequential Monte Carlo method (to be introduced in the next section) from the target distribution $p(x_{1:T} \mid z_{1:T})$, and obtain empirical approximations of these distributions. For instance, the posterior $p(x_T \mid z_{1:T})$ is approximated by
$$ \hat{p}(x_T \mid z_{1:T}) = \sum_{s=1}^{S} w_T^{(s)} \, \delta_{x_T^{(s)}}(x). $$
By running the original system from each of the samples $\{x_T^{(s)}\}$ until the clustered status is reached, we obtain weighted samples for the invariants of the clusters, $\{\bar{x}_{\mathcal{C}_i}^{(s)}, w_T^{(s)}\}_{s \in \{1, \ldots, S\}}$ and $\{|\mathcal{C}_i^{(s)}|, w_T^{(s)}\}_{s \in \{1, \ldots, S\}}$. With these weighted samples, we have the empirical posteriors to quantify the uncertainty in cluster prediction:
$$ \hat{p}(\bar{x}_{\mathcal{C}_i} \mid z_{1:T}) = \sum_{s=1}^{S} w_T^{(s)} \, \delta_{\bar{x}_{\mathcal{C}_i}^{(s)}}(\bar{x}_{\mathcal{C}_i}), \qquad \hat{p}(|\mathcal{C}_i| \mid z_{1:T}) = \sum_{s=1}^{S} w_T^{(s)} \, \delta_{|\mathcal{C}_i^{(s)}|}(|\mathcal{C}_i|). \qquad (5) $$
With the weighted samples, we can efficiently approximate statistics. For example, the expectations of the size and center of cluster $\mathcal{C}_i$ are
$$ \mathbb{E}(\bar{x}_{\mathcal{C}_i}) \approx \widehat{x}_{\mathcal{C}_i} := \sum_{s=1}^{S} \bar{x}_{\mathcal{C}_i}^{(s)} \, w_T^{(s)}, \qquad \mathbb{E}(|\mathcal{C}_i|) \approx \widehat{|\mathcal{C}_i|} := \sum_{s=1}^{S} |\mathcal{C}_i^{(s)}| \, w_T^{(s)}. \qquad (6) $$
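The prediction step that turns weighted state samples into cluster posteriors can be sketched as follows; it reuses the hypothetical forward_map, find_clusters, and cluster_invariants helpers from the earlier sketches, and the fixed number of simulation steps used to declare a run clustered is an illustrative assumption.

```python
import numpy as np

def predict_cluster_posterior(samples_T, weights_T, phi, R, alpha=1.0, n_steps=2000):
    """Run the dynamics from each posterior sample of x_T until (approximately)
    clustered, and return weighted samples of sizes and centers as in (5)-(6)."""
    sizes, centers = [], []
    for x in samples_T:                        # each x has shape (N, d)
        for _ in range(n_steps):               # long-time simulation
            x = forward_map(x, phi, alpha)
        clusters = find_clusters(x, R)
        s, c = cluster_invariants(x, clusters)
        order = np.argsort(s)[::-1]            # index 0 is the leading cluster
        sizes.append([s[k] for k in order])
        centers.append([c[k] for k in order])
    # posterior mean of the leading cluster's size, as in (6)
    w = np.asarray(weights_T) / np.sum(weights_T)
    mean_size_leading = sum(w[i] * sizes[i][0] for i in range(len(sizes)))
    return sizes, centers, mean_size_leading
```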
III. SAMPLING THE POSTERIOR
To initiate the ensemble simulation for prediction, we draw samples from the conditional distribution of the current state, $p(x_T \mid z_{1:T})$, which is the marginal distribution of the posterior $p(x_{1:T} \mid z_{1:T})$. This posterior is high-dimensional, nonlinear, and non-Gaussian; it is therefore difficult to sample directly, even when its analytical form is explicitly available. We adopt a sequential Monte Carlo (SMC) strategy (we refer to [12] for a review), combining implicit sampling [15] and auxiliary particle filtering with specialized MCMC-move and information-move steps.

To avoid degenerate distributions, we introduce noise into the state-space model (4) from Section II-D by setting the noises $\epsilon_t$ and $\xi_t$ to be i.i.d. Gaussian:
$$ x_{t+1} = g(x_t) + \epsilon_t, \quad x_1 \sim \mu_0(\cdot), \qquad z_t = H x_t + \xi_t. $$

A. Sequential Monte Carlo sampling
SMC methods, or particle filters, are a family of sequential importance sampling algorithms that approximate the high-dimensional distribution $p(x_{1:t} \mid z_{1:t})$ by the empirical distribution of weighted samples $\{x_{1:t}^{(s)}, w_t^{(s)}\}_{s \in \{1, \ldots, S\}}$:
$$ \hat{p}(x_{1:t} \mid z_{1:t}) := \frac{1}{\sum_{s=1}^{S} w_t^{(s)}} \sum_{s=1}^{S} w_t^{(s)} \, \delta_{x_{1:t}^{(s)}}(x_{1:t}), $$
where $\delta$ is the Dirac delta mass. The samples $\{x_{1:t}^{(s)}\}$ are drawn from an importance distribution $q(x_{1:t} \mid z_{1:t})$ and the weights are computed from
$$ w(x_{1:t} \mid z_{1:t}) = \frac{p(x_{1:t} \mid z_{1:t})}{q(x_{1:t} \mid z_{1:t})}. \qquad (7) $$
The key idea of SMC is to generate the weighted samples sequentially from a recursive importance density,
$$ q(x_{1:t} \mid z_{1:t}) = q(x_1) \prod_{k=2}^{t} q(x_k \mid x_{1:k-1}, z_{1:k}), \qquad (8) $$
which is constructed based on the recursive representation of the posterior distribution,
$$ p(x_{1:t} \mid z_{1:t}) = p(x_{1:t-1} \mid z_{1:t-1}) \, \frac{p(x_t \mid x_{t-1}) \, p(z_t \mid x_t)}{p(z_t \mid z_{1:t-1})}. \qquad (9) $$
That is, at time $t$, conditional on the previous samples $\{x_{1:t-1}^{(s)}, w_{t-1}^{(s)}\}_{s \in \{1, \ldots, S\}}$, one generates samples $\{x_t^{(s)}\}$ from the importance densities $\{q(x_t \mid x_{1:t-1}^{(s)}, z_{1:t})\}$ and computes their weights by
$$ w_t^{(s)} = w_{t-1}^{(s)} \cdot \frac{p(z_t \mid x_t^{(s)}) \, p(x_t^{(s)} \mid x_{t-1}^{(s)})}{q(x_t^{(s)} \mid x_{1:t-1}^{(s)}, z_{1:t})}. \qquad (10) $$
Clearly, the above weight $w_t^{(s)}$ is proportional to the analytical weight $w(x_{1:t}^{(s)} \mid z_{1:t})$, since $p(x_{1:t}^{(s)} \mid z_{1:t}) \propto p(x_{1:t-1}^{(s)} \mid z_{1:t-1}) \, p(x_t^{(s)} \mid x_{t-1}^{(s)}) \, p(z_t \mid x_t^{(s)})$ and $q(x_{1:t}^{(s)} \mid z_{1:t}) = q(x_{1:t-1}^{(s)} \mid z_{1:t-1}) \, q(x_t^{(s)} \mid x_{1:t-1}^{(s)}, z_{1:t})$.

Due to the recursive computation in (10), all but a few of the weights will be almost zero as $t$ increases; this is called sample degeneracy [12]. As a result, the variance of the estimator based on $\{x_{1:t}^{(s)}\}$ may increase exponentially with $t$ (see, e.g., [24]). Resampling techniques are widely used to reduce sample degeneracy: they replace samples with relatively low weights by samples with relatively high weights. A simple criterion uses the effective sample size (ESS) [25],
$$ \mathrm{ESS}_t = \Big(\sum_{i=1}^{S} w_t^{(i)}\Big)^2 \Big/ \sum_{i=1}^{S} \big(w_t^{(i)}\big)^2. $$
If at time $t$ the ESS drops below a threshold (e.g., $S/2$), one resamples. In our study, we use the resampling algorithm in [26]: sample $u \sim \mathrm{Unif}[0, 1/S]$, define the points $U_j := u + \frac{j-1}{S}$ for $j = 1, \ldots, S$, and take as the number of "children" of sample $x^{(i')}$ the number of points $U_j$ that fall in the interval $\big(\sum_{i=1}^{i'-1} w_t^{(i)} / \sum_{i=1}^{S} w_t^{(i)}, \ \sum_{i=1}^{i'} w_t^{(i)} / \sum_{i=1}^{S} w_t^{(i)}\big]$.

The essence of SMC methods is the design of the importance densities, so that all samples have (almost) equal weights in each recursive step while staying on trajectories with high likelihood. The algorithm based on the simple choice $q(x_t \mid x_{1:t-1}, z_{1:t}) = p(x_t \mid x_{t-1})$, often referred to as sequential importance sampling with resampling (SIR), performs poorly (see Section IV-C). Inspired by the ideas of implicit sampling and auxiliary particle filtering, we construct Gaussian importance densities by combining them (see Sections III-B and III-C). To rejuvenate the samples, we also introduce MCMC-move and information-move algorithms, discussed in Sections III-D and III-E, respectively.
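A minimal sketch of the ESS criterion and of the systematic resampling step of [26] is given below; the resampling threshold is only an illustrative default.

```python
import numpy as np

def effective_sample_size(w):
    """ESS_t = (sum_i w_i)^2 / sum_i w_i^2 for (possibly unnormalized) weights."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

def systematic_resample(w, rng=np.random.default_rng()):
    """Indices of the 'children': U_j = u + (j-1)/S with u ~ Unif[0, 1/S],
    and sample i' is replicated once per U_j falling in its weight interval."""
    S = len(w)
    cdf = np.cumsum(np.asarray(w, dtype=float))
    cdf /= cdf[-1]
    U = rng.uniform(0.0, 1.0 / S) + np.arange(S) / S
    return np.searchsorted(cdf, U)

def maybe_resample(samples, w):
    """Resample only when the ESS falls below a threshold, e.g. S/2."""
    S = len(w)
    if effective_sample_size(w) < S / 2:
        idx = systematic_resample(w)
        return [samples[i] for i in idx], np.full(S, 1.0 / S)
    return samples, np.asarray(w, dtype=float)
```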
B. Optimal one-step importance sampling

The one-step optimal importance density, which is the one-step posterior density combining the prior from the state model and the likelihood of the observation, is
$$ q_{\mathrm{opt}}(x_t \mid x_{t-1}, z_t) = \frac{p(z_t \mid x_t) \, p(x_t \mid x_{t-1})}{p(z_t \mid x_{t-1})}. \qquad (11) $$
It is optimal in the sense that it minimizes the variance of the incremental weights in (7). In general, it is difficult to draw samples from the optimal importance density, and implicit sampling [15], [27] can efficiently draw samples in the high-probability region. Thanks to the simple Gaussian structure of the noises $\epsilon_t \sim \mathcal{N}(0, \sigma_{\epsilon,t}^2 I_{dN})$ and $\xi_t \sim \mathcal{N}(0, \sigma_{\xi,t}^2 I_{dN_1})$ in the state-space model (4), the optimal importance density is Gaussian and can be sampled directly. Following the procedure of implicit sampling [15], we compute the mean and covariance of this Gaussian posterior: the mean is the maximum a posteriori (MAP) point, and the covariance is the inverse Hessian of the negative logarithm of the posterior. More specifically, we compute the minimizer and Hessian of the negative log of $p(z_t \mid x)\, p(x \mid x_{t-1})$,
$$ F(x) = \frac{|z_t - Hx|^2}{2\sigma_{\xi,t}^2} + \frac{|x - g(x_{t-1})|^2}{2\sigma_{\epsilon,t}^2}. $$
This function is quadratic, and its minimizer, denoted by $x_t^* = x^*(x_{t-1}, z_t) = [Hx_t^*, Gx_t^*]$, is
$$ Hx_t^* = \frac{\sigma_{\epsilon,t}^2 z_t + \sigma_{\xi,t}^2 H g(x_{t-1})}{\sigma_{\epsilon,t}^2 + \sigma_{\xi,t}^2} = Hg(x_{t-1}) + \alpha_t \big(z_t - Hg(x_{t-1})\big), \qquad Gx_t^* = Gg(x_{t-1}), \qquad (12) $$
where $\alpha_t = \sigma_{\epsilon,t}^2 / (\sigma_{\epsilon,t}^2 + \sigma_{\xi,t}^2)$ depends on the ratio between the state noise and the observation noise. The Hessian matrix of $F(x)$ is
$$ \mathrm{Hess}(F)_{i,j} = \begin{cases} \frac{1}{\sigma_{\xi,t}^2} + \frac{1}{\sigma_{\epsilon,t}^2}, & i = j \in \{1, \ldots, dN_1\}, \\ \frac{1}{\sigma_{\epsilon,t}^2}, & i = j \in \{dN_1 + 1, \ldots, dN\}, \\ 0, & \text{otherwise}. \end{cases} \qquad (13) $$
Thus, the optimal importance density is Gaussian and can be sampled directly:
$$ q(x_t \mid x_{t-1}, z_t) \sim \mathcal{N}\big(x_t^*, \mathrm{Hess}(F)^{-1}\big), \qquad (14) $$
with $x_t^*$ in (12) and $\mathrm{Hess}(F)$ in (13).

The center $x_t^*$ of the optimal importance density can be viewed as nudging the solution toward the observation $z_t$. More generally, one can replace $\alpha_t (z_t - Hg(x_{t-1}))$ by a more general nudging term [20], [28],
$$ \lambda_t M_t \big(z_t - Hg(x_{t-1})\big), \qquad (15) $$
where $\lambda_t \in \mathbb{R}$ represents the nudging strength and $M_t \in \mathbb{R}^{dN \times dN_1}$ is called the nudging matrix. Clearly, $x_t^*$ fits this frame with the nudging matrix $M_t = [I_{dN_1 \times dN_1}; 0_{d(N-N_1) \times dN_1}]$ and $\lambda_t = \alpha_t$, optimal in the sense of maximizing the one-step posterior.

Though optimal for one-step sampling, the above importance density comes with a drawback: the mean of the unobserved variables, $Gx_t^* = Gg(x_{t-1})$, is simply a projection of the forward map from the previous state, without using any information from the current observation $z_t$. This can also be seen from the nudging matrix in (15), in which the zero block provides no update to the unobserved variables. Empirically, one may seek a nudging matrix that updates the unobserved variables to achieve better performance. Since the next observation $z_{t+1}$ is a function of the current unobserved variables $Gx_t$, it is natural to update $Gx_t$ using the information in $z_{t+1}$, as in the auxiliary particle filter [23].
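The one-step optimal (implicit-sampling) proposal (12)-(14) has a diagonal covariance and is cheap to sample; a sketch with flattened state vectors and illustrative parameter names follows.

```python
import numpy as np

def one_step_proposal(x_prev, z_t, g, d, N, N1, sig_eps, sig_xi,
                      rng=np.random.default_rng()):
    """Sample x_t ~ N(x_t^*, Hess(F)^{-1}) as in (12)-(14).

    x_prev : flattened previous state, shape (d*N,)
    z_t    : observation, shape (d*N1,)
    g      : forward map acting on flattened states
    """
    gx = g(x_prev)
    alpha_t = sig_eps**2 / (sig_eps**2 + sig_xi**2)
    x_star = gx.copy()
    # observed block is nudged toward z_t; the unobserved block keeps g(x_{t-1})
    x_star[: d * N1] += alpha_t * (z_t - gx[: d * N1])
    # diagonal Hessian (13): observed block 1/sig_xi^2 + 1/sig_eps^2, rest 1/sig_eps^2
    hess_diag = np.full(d * N, 1.0 / sig_eps**2)
    hess_diag[: d * N1] += 1.0 / sig_xi**2
    x_t = x_star + rng.normal(size=d * N) / np.sqrt(hess_diag)
    return x_t, x_star
```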
C. Auxiliary sampling with two observations

The auxiliary particle filter is an SMC algorithm that makes use of the information from the next observation. To keep the recursive form as in (9), we consider the target densities $p(x_{1:t} \mid z_{1:t+1})$ instead of $p(x_{1:t} \mid z_{1:t})$, and write them recursively as
$$ p(x_{1:t} \mid z_{1:t+1}) \propto p(x_{1:t-1} \mid z_{1:t}) \times \frac{p(x_t \mid x_{t-1}) \, p(z_t \mid x_t) \, p(z_{t+1} \mid x_t)}{p(z_t \mid x_{t-1})}. $$
Since the analytical expression of $p(z_{t+1} \mid x_t)$ is unknown, we approximate it by $p(z_{t+1} \mid x_t) \approx p(z_{t+1} \mid g(x_t))$ and obtain
$$ \hat{p}(x_{1:t} \mid z_{1:t+1}) \propto \hat{p}(x_{1:t-1} \mid z_{1:t}) \times \frac{p(x_t \mid x_{t-1}) \, p(z_t \mid x_t) \, p(z_{t+1} \mid g(x_t))}{p(z_t \mid g(x_{t-1}))}. \qquad (16) $$
With an importance density $q(x_t \mid x_{t-1}, z_{t:t+1})$ depending on $z_{t+1}$, the recursively updated weight becomes $w(x_{1:t} \mid z_{1:t+1}) = w(x_{1:t-1} \mid z_{1:t}) \, \alpha(x_{t-1:t}, z_{t:t+1})$, where the associated incremental weight is given by
$$ \alpha(x_{t-1:t}, z_{t:t+1}) = \frac{p(x_t \mid x_{t-1}) \, p(z_t \mid x_t) \, p(z_{t+1} \mid g(x_t))}{p(z_t \mid g(x_{t-1})) \, q(x_t \mid x_{t-1}, z_{t:t+1})}. \qquad (17) $$
Next, we construct the importance density $q(x_t \mid x_{t-1}, z_{t:t+1})$ and draw samples from it. We start from the negative log of the posterior density $p(z_{t+1} \mid g(x)) \, p(x \mid x_{t-1}) \, p(z_t \mid x)$:
$$ \widehat{F}(x) = \frac{|z_{t+1} - Hg(x)|^2}{2\sigma_{\xi,t+1}^2} + \frac{|x - g(x_{t-1})|^2}{2\sigma_{\epsilon,t}^2} + \frac{|z_t - Hx|^2}{2\sigma_{\xi,t}^2}. $$
Since the state variable is high-dimensional and its components are indistinguishable agents, it is difficult and computationally costly to find the minimizer of $\widehat{F}$, which is likely to have multiple modes. This rules out a direct application of implicit sampling. However, by a linear approximation of the nonlinear function $Hg$, we can directly construct a Gaussian importance density $q(x_t \mid x_{t-1}, z_{t:t+1})$ as in the previous section. We linearize $Hg(x)$ at $x_t^*$, since it is the most likely position before the next observation:
$$ Hg(x) \approx Hg(x_t^*) + \nabla Hg(x_t^*)^T (x - x_t^*), $$
where $\nabla Hg(x_t^*) \in \mathbb{R}^{dN \times dN_1}$ is the gradient of $Hg$. In practice, when the interaction function $\phi$ is piecewise constant (so that $\phi' = 0$ almost everywhere), $\nabla Hg$ is approximated block-wise by
$$ \nabla Hg(x) \approx R^{H}(x) + L^{H}(x) \in \mathbb{R}^{dN \times dN_1}, \qquad (18) $$
where the blocks $R^{H}_{i,j}(\cdot), L^{H}_{i,j}(\cdot) \in \mathbb{R}^{d \times d}$ are
$$ R^{H}_{i,j}(x) = \frac{\alpha}{N} \, \phi(\|x^i - H_j x\|) \, I_d, \qquad L^{H}_{i,j}(x) = \begin{cases} \Big(1 - \frac{\alpha}{N} \sum_{k=1}^{N} \phi(\|x^k - x^i\|)\Big) I_d, & 1 \leq j = i \leq N_1, \\ 0_{d \times d}, & \text{otherwise}. \end{cases} $$
Then $\widehat{F}(x)$ can be approximated by the quadratic function
$$ \widetilde{F}(x) = \frac{\big|z_{t+1} - Hg(x_t^*) - \nabla Hg(x_t^*)^T (x - x_t^*)\big|^2}{2\sigma_{\xi,t+1}^2} + \frac{|x - g(x_{t-1})|^2}{2\sigma_{\epsilon,t}^2} + \frac{|z_t - Hx|^2}{2\sigma_{\xi,t}^2} = \tfrac{1}{2} y^T A y - y^T b + C, $$
with $y = x - x_t^*$,
$$ C = \frac{|z_{t+1} - Hg(x_t^*)|^2}{2\sigma_{\xi,t+1}^2} + \frac{|g(x_{t-1}) - x_t^*|^2}{2\sigma_{\epsilon,t}^2} + \frac{|z_t - Hx_t^*|^2}{2\sigma_{\xi,t}^2}, $$
and
$$ A = \frac{\nabla Hg(x_t^*) \nabla Hg(x_t^*)^T}{\sigma_{\xi,t+1}^2} + \frac{I_{dN}}{\sigma_{\epsilon,t}^2} + \frac{H^T H}{\sigma_{\xi,t}^2}, \qquad b = \frac{\nabla Hg(x_t^*) \big[z_{t+1} - Hg(x_t^*)\big]}{\sigma_{\xi,t+1}^2} + \frac{g(x_{t-1}) - x_t^*}{\sigma_{\epsilon,t}^2} + \frac{H^T \big[z_t - Hx_t^*\big]}{\sigma_{\xi,t}^2}. \qquad (19) $$
Then $\widetilde{F}$ has the minimizer $\mu(x_{t-1}, z_t, z_{t+1})$ given by
$$ \mu(x_{t-1}, z_t, z_{t+1}) = x_t^* + A^{-1} b, \qquad (20) $$
and its Hessian is $A$. This suggests the following importance density:
$$ q(x_t \mid x_{t-1}, z_t, z_{t+1}) \sim \mathcal{N}\big(\mu(x_{t-1}, z_t, z_{t+1}), A^{-1}\big), \qquad (21) $$
where $\mu(x_{t-1}, z_t, z_{t+1})$ is defined by (20) and $A = A(x_{t-1}, z_t)$ is defined by (19).

We summarize the above in Algorithm 1.
Algorithm 1: Auxiliary implicit sampling

At time $t \leq T - 1$, for $s = 1, 2, \ldots, S$, do:
  • Evaluate $x_t^* = x^*(x_{t-1}^{(s)}, z_t)$ as in (12), and then compute $A(x_{t-1}^{(s)}, z_t)$ as in (19) and $\mu(x_{t-1}^{(s)}, z_t, z_{t+1})$ as in (20).
  • Draw a sample $x_t^{(s)}$ from the normal distribution with mean $\mu(x_{t-1}^{(s)}, z_t, z_{t+1})$ and covariance $A(x_{t-1}^{(s)}, z_t)^{-1}$; evaluate the weights $\widehat{w}_t^{(s)} = w_{t-1}^{(s)} \cdot \alpha_t^{(s)}$ as in (17).
  • Resample to obtain equally-weighted samples (if a criterion is met).
At time $t = T$, for $s = 1, 2, \ldots, S$, do implicit sampling:
  • Evaluate $x_T^* = x^*(x_{T-1}^{(s)}, z_T)$ as in (12).
  • Draw a sample $x_T^{(s)}$ from $\mathcal{N}(x_T^*, \mathrm{Hess}(F)^{-1})$ as in (14); evaluate the weights $\widehat{w}_T^{(s)}$ by (10).
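A sketch of the two-step (auxiliary) Gaussian proposal (18)-(21) is given below. It builds the linearization with a dense matrix for clarity, reuses the hypothetical one_step_proposal helper for $x_t^*$, uses a single observation-noise level for both steps, and all names are illustrative rather than the authors' implementation.

```python
import numpy as np

def ais_proposal(x_prev, z_t, z_tp1, g, phi, d, N, N1, alpha,
                 sig_eps, sig_xi, rng=np.random.default_rng()):
    """Sample x_t ~ N(mu, A^{-1}) with mu = x_t^* + A^{-1} b, as in (19)-(21).
    phi is assumed to act elementwise on arrays; g acts on flattened states."""
    # x_t^* from the one-step optimal proposal (12)
    _, x_star = one_step_proposal(x_prev, z_t, g, d, N, N1, sig_eps, sig_xi, rng)
    xs = x_star.reshape(N, d)
    dist = np.linalg.norm(xs[:, None, :] - xs[None, :, :], axis=-1)
    # blockwise approximation (18) of grad(Hg) at x_t^*, shape (dN, dN1)
    J = np.zeros((d * N, d * N1))
    for j in range(N1):
        for i in range(N):
            coef = alpha / N * phi(dist[i, j])
            if i == j:
                coef = 1.0 - alpha / N * phi(dist[i]).sum()
            J[d * i: d * (i + 1), d * j: d * (j + 1)] = coef * np.eye(d)
    H = np.zeros((d * N1, d * N)); H[:, : d * N1] = np.eye(d * N1)
    gx = g(x_prev)
    A = (J @ J.T) / sig_xi**2 + np.eye(d * N) / sig_eps**2 + (H.T @ H) / sig_xi**2
    b = (J @ (z_tp1 - H @ g(x_star))) / sig_xi**2 \
        + (gx - x_star) / sig_eps**2 + (H.T @ (z_t - H @ x_star)) / sig_xi**2
    mu = x_star + np.linalg.solve(A, b)
    x_t = rng.multivariate_normal(mu, np.linalg.inv(A))
    return x_t, mu, A
```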
D. MCMC-move

To further reduce the inevitable degeneracy of SMC algorithms, we introduce an MCMC-move step [29]. We consider two types of moves: a directional move aiming at agent-wise position improvement, and a local trajectory move aiming to replace a low-weight short trajectory by a higher-weighted one.

The directional move randomly selects $m$ of the unobserved agents for each sample, and resamples each of them using a Metropolis-Hastings step as follows. For each selected agent $k$, first draw a sample from $\mathcal{N}(x^{k*}, \Sigma^{k*})$, where $x^{k*}$ is a minimizer of the function $\widetilde{g}(x_t^k \mid z_{t+1}, x_t^{1:k-1,\,k+1:N}) = \|z_{t+1} - Hg(x_t)\|^2$ and $\Sigma^{k*}$ is the inverse Hessian of $\widetilde{g}$ at $x^{k*}$, that is,
$$ x^{k*} = \arg\min_{x^k \in \mathbb{R}^d} \widetilde{g}(x^k), \qquad \Sigma^{k*} = \big[\mathrm{Hess}\,\widetilde{g}(x^{k*})\big]^{-1}. \qquad (22) $$
Then the move is accepted or rejected in a Metropolis-Hastings step based on the likelihood of the observation $z_t$. The number $m$ is chosen as a fixed small fraction of the unobserved agents. We summarize the directional move in Algorithm 2.
Algorithm 2: Directional move

At time $t \in$ checking-times $\subset \{1, \ldots, T\}$, for each sample, do:
  • Randomly select $m$ of the unobserved agents.
  • Move the selected agents: for $k = N_1 + 1, \ldots, N$, if agent $k$ is among those selected, sample $\widetilde{x}_t^k \sim \mathcal{N}(x^{k*}, \Sigma^{k*})$, where $x^{k*}$ and $\Sigma^{k*}$ are defined in (22); else, set $\widetilde{x}_t^k = x_t^k$.
  • Accept the move and set $x'_{1:t} = [x_{1:t-1}, \widetilde{x}_t]$ if $u \sim \mathrm{Unif}[0,1] \leq \min\big\{1, \frac{p(z_t \mid \widetilde{x}_t)}{p(z_t \mid x_t)}\big\}$; otherwise, reject the move and set $x'_{1:t} = x_{1:t}$.

The local trajectory move randomly selects low-weighted samples and replaces their local trajectories by ones with a higher probability. More precisely, at a prescribed time, a sample with index $s$ is selected with probability $\max\big\{0, 1 - \frac{w_t^{(s)}}{c_t}\big\}$, where $c_t$ is the value of the lowest quartile of the weights $\{w_t^{(s)}\}_{s=1}^S$. Intuitively speaking, all samples with weight higher than $c_t$ are kept, and a sample with weight less than $c_t$ is selected randomly, with a probability that increases as its weight decreases. Once selected, its local trajectory $x_{t-T_0:t}^{(s)}$ over a prescribed lag $T_0$ is moved to $\widetilde{x}_{t-T_0:t}^{(s)}$ as in Algorithm 3.
Algorithm 3: Local-trajectory move

At time $t \in$ checking-times $\subset \{1, \ldots, T\}$, with samples $\{x_{1:t}^{(s)}, w_t^{(s)}\}_{s=1}^S$ and the weights $\{w_{t-T_0}^{(s)}\}_{s=1}^S$, do:
  • Select low-weighted samples: for $s \in \{1, \ldots, S\}$, set an indicator $\Theta_t^{(s)} = 1$ with probability $\max\big\{0, 1 - \frac{w_t^{(s)}}{c_t}\big\}$, where $c_t$ is the value of the lowest quartile of the weights $\{w_t^{(s)}\}_{s=1}^S$.
  • Move the low-weighted samples: for $s \in \{1, \ldots, S\}$, if $\Theta_t^{(s)} = 1$, replace the local trajectory $x_{t-T_0:t}^{(s)}$ as follows:
    - Draw a sample $\widetilde{x}_{t-T_0}^{(s)}$ from the weighted samples $\{x_{t-T_0}^{(s)}, w_{t-T_0}^{(s)}\}_{s=1}^S$;
    - Implement a directional move for $\widetilde{x}_{t-T_0}^{(s)}$ as in Algorithm 2, in which new positions for $m$ of the unobserved agents are drawn from the initial distribution instead of from $\mathcal{N}(x^{k*}, \Sigma^{k*})$;
    - Draw $\widetilde{x}_{t-T_0:t}^{(s)}$ by a one-sample SMC run with the importance density in (21) from time $t - T_0$ to $t$, with initial value $\widetilde{x}_{t-T_0}^{(s)}$;
    - Accept the move and set $x_{t-T_0:t}^{(s)} = \widetilde{x}_{t-T_0:t}^{(s)}$ and $w_t^{(s)} = c_t$ if $u \sim \mathrm{Unif}[0,1] \leq \min\big\{1, \frac{p(z_t \mid \widetilde{x}_t)}{p(z_t \mid x_t)}\big\}$; otherwise, reject the move and keep them as they are.
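The accept/reject rule shared by the moves above (and by Algorithm 4 below) only needs the observation likelihood ratio; with Gaussian observation noise it reduces to the following sketch, where the flattened-state convention and parameter names are illustrative.

```python
import numpy as np

def log_obs_likelihood(x, z_t, d, N1, sig_xi):
    """log p(z_t | x) for z_t = Hx + xi_t with xi_t ~ N(0, sig_xi^2 I), up to a constant."""
    resid = z_t - x[: d * N1]                       # Hx is the observed block
    return -0.5 * np.sum(resid**2) / sig_xi**2

def metropolis_accept(x_old, x_new, z_t, d, N1, sig_xi,
                      rng=np.random.default_rng()):
    """Accept x_new with probability min{1, p(z_t|x_new)/p(z_t|x_old)}."""
    log_ratio = (log_obs_likelihood(x_new, z_t, d, N1, sig_xi)
                 - log_obs_likelihood(x_old, z_t, d, N1, sig_xi))
    return np.log(rng.uniform()) <= min(0.0, log_ratio)
```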
E. Rejection of non-physical samples

To avoid non-physical samples, we introduce an information-move step that rejects them. We say a sample is non-physical if it violates basic properties of the opinion dynamics. For example, recall the following contraction-of-radius property of the opinion dynamics [3, Proposition 2.1]: for any constant $c \in \mathbb{R}^d$,
$$ \max_i \|x_t^i - c\| \leq \max_i \|x_{t'}^i - c\|, \qquad \forall\, t \geq t'. $$
This property requires information about all agents, so it cannot be applied directly to our partial observations. Since the mean position of all agents does not change in time, we say a sample at time $t$ is non-physical if
$$ \max_i \|x_t^i - \bar{x}_{t-t_0}\| > \max_i \|x_{t-t_0}^i - \bar{x}_{t-t_0}\| + \alpha_0, \qquad (23) $$
where $\bar{x}_{t-t_0}$ denotes the mean position of all agents at time $t - t_0$, and $t_0$ and $\alpha_0 > 0$ are constants (in practice, $t_0 = 10$ and $\alpha_0$ is a small fraction of the support radius of $\phi$), representing the time length we back-track for checking and the tolerance for extending the maximal distance, respectively.

We also reject unobserved agents that do not interact with any of the observed agents. Such agents may be connected to the observed agents through other unobserved agents (recall that $x_t^i$ and $x_t^j$ are connected if there exists a path of agents $\{x_t^{i_k}\}_{k=0}^K$ with $i_0 = i$ and $i_K = j$ such that $\phi(\|x_t^{i_k} - x_t^{i_{k+1}}\|) > 0$), but their positions are difficult to estimate from the limited information. It is of interest to replace them by agents that interact with the observations: if they evolve to disconnect from the observed agents, their clustering cannot be estimated from the observations, so we can view them as "non-physical" (or as carrying little information); otherwise, they will move toward the center of the cluster, and the replacement accelerates this move.

In addition, to avoid over-correction and to maintain computational efficiency, we apply the rejection moves at pre-specified time steps, performed progressively less frequently as more observations become available.

The information move is summarized in Algorithm 4.
Algorithm 4: Information move

For $t \in$ checking-times $\subset \{1, \ldots, T\}$ and for each sample, do:
  • Check whether $x_t$ is non-physical as in (23) or has unobserved agents disconnected from the observed agents. If yes, draw a sample $\widetilde{x}_t$ from the importance density in (21) and repeat until the sample is physical and connected with the observed agents.
  • If $u \sim \mathrm{Unif}[0,1] \leq \min\big\{1, \frac{p(z_t \mid \widetilde{x}_t)}{p(z_t \mid x_t)}\big\}$, set $x_t = \widetilde{x}_t$; else, repeat from the previous step until accepted.
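The two rejection criteria, the radius check (23) and connectivity to the observed agents, can be sketched as follows; find_clusters is the hypothetical helper from an earlier sketch and the tolerance is an illustrative argument.

```python
import numpy as np

def is_nonphysical(x_t, x_back, tol):
    """Radius check (23): compare the maximal distance to the mean position at
    the back-tracked time with the one at the current time."""
    center = x_back.mean(axis=0)                 # mean opinion is invariant in time
    r_now = np.linalg.norm(x_t - center, axis=1).max()
    r_back = np.linalg.norm(x_back - center, axis=1).max()
    return r_now > r_back + tol

def disconnected_from_observed(x_t, R, N1):
    """True if some unobserved agent lies in a cluster with no observed agent."""
    clusters = find_clusters(x_t, R)
    return any((c >= N1).all() for c in clusters)
```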
F. Summary

We combine all the above sampling techniques into Algorithm 5, which we refer to as auxiliary implicit sampling (AIS).
Algorithm 5: Auxiliary implicit sampling with MCMC moves (AIS)
At time $t = 1$, initialization: draw uniformly-weighted samples $\{x_1^{(s)}, w_0^{(s)}\}$ from $\mu_0(x_1)$. For time $t \geq 2$, do:
  (a) Draw weighted samples $\{x_{1:t}^{(s)}, w_t^{(s)}\}$ by the auxiliary implicit sampling in Algorithm 1.
  (b) Improve the samples by two MCMC moves: the directional move (Algorithm 2) and the local trajectory move (Algorithm 3).
  (c) Reject non-physical samples by the information move (Algorithm 4).

IV. NUMERICAL EXPERIMENTS
In this section, we predict the clustering of the opinion dynamics using partial observations, following the Bayesian approach discussed in Section II and using the auxiliary implicit sampling (AIS) algorithm introduced in Section III. We first describe the settings of the model and the sampling method in Section IV-A. Then, we present results on state estimation in Section IV-B. We report the prediction of clustering in a typical simulation in Section IV-C and in many simulations in Section IV-D.
A. Numerical settings
We consider the opinion dynamics (1) with $N = 60$ agents,
$$ x_{t+1}^i - x_t^i = \frac{1}{N} \sum_{j=1}^{N} \phi(\|x_t^j - x_t^i\|)\,(x_t^j - x_t^i)\, \Delta t, $$
where $x_t^i \in \mathbb{R}^d$ with $d = 2$ represents the opinion of agent $i$ at discrete times indexed by $t$. This system is an Euler approximation of the corresponding differential equations with time step size $\Delta t = 0.05$. The communication function $\phi$ is piecewise constant,
$$ \phi(r) = \begin{cases} a_1, & r \in [0, 1/\sqrt{2}), \\ a_2, & r \in [1/\sqrt{2}, 1), \\ 0, & r \in [1, \infty), \end{cases} $$
with constants $a_1 > a_2 > 0$.

We are interested in the cases when the system forms multiple clusters, instead of a consensus. For the above communication function, we obtain cases of multiple clusters by selecting the initial conditions as follows: we randomly draw an initial condition for each agent from a uniform distribution $\mu$ on a square in $\mathbb{R}^2$; in other words, the initial distribution of $(x^1, \ldots, x^N)$ is $\mu^{\otimes N}$ on $\mathbb{R}^{dN}$. We reject those initial conditions leading to consensus, and we call the distribution of the selected initial conditions the initial distribution of the opinion dynamics. This initial distribution injects randomness into the dynamics.

Our goal is to predict the clustering of the system, particularly the sizes and the locations of the largest clusters, supposing that we only observe the trajectories of $N_1$ of the $N$ agents for a relatively short time, far before the clusters are formed. In particular, we assume that we observe the system for only $n = 300$ time steps, when the observations cannot tell whether the clusters have formed. The clustering usually takes more than 30 time units, or equivalently 600 time steps. (For instance, the system described in Figure 4 clustered at about 1500 time steps.) As discussed in Section II, a Bayesian approach provides a probabilistic framework for state estimation and cluster prediction, with uncertainties quantified by the posterior. We sample the posterior by the auxiliary implicit sampling (AIS) algorithm in Algorithm 5 with ensemble size $S = 100$. For the sake of the AIS, we rewrite the system in the form of a state-space model,
$$ x_{t+1} = g(x_t) + \epsilon_t, \quad x_1^i \sim \mu \ \ \forall i, \qquad z_t = H x_t + \xi_t, $$
where $\epsilon_t \sim \mathcal{N}(0, \alpha^2 I_{dN})$ and $\xi_t \sim \mathcal{N}(0, \beta^2 I_{dN_1})$.

We consider systems with both noiseless and noisy observations. To avoid degenerate distributions, we set a small artificial noise for the deterministic state model. For the case of noiseless observations, both $\alpha$ and $\beta$ are set to small values. For noisy observations, $\beta$ is set so that the observation noise corresponds to a signal-to-noise ratio of about 1%-2%, and $\alpha$ is set to a comparable small value.
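The experimental setup can be reproduced in outline with the sketch below; the kernel plateau values, the size of the initial square, and the random seed are illustrative placeholders, since the exact constants are not fully specified here.

```python
import numpy as np

rng = np.random.default_rng(0)
N, N1, d, dt = 60, 30, 2, 0.05           # N1 = 30 observed agents as in the experiments

def phi(r):
    """Piecewise-constant communication function supported on [0, 1).
    The plateau values (1.0 and 0.1) are illustrative choices."""
    r = np.asarray(r, dtype=float)
    return np.where(r < 1 / np.sqrt(2), 1.0, np.where(r < 1.0, 0.1, 0.0))

def step(x):
    """Euler step of the opinion dynamics with kernel phi and step size dt."""
    diff = x[None, :, :] - x[:, None, :]
    w = phi(np.linalg.norm(diff, axis=-1))
    return x + (dt / N) * np.einsum('ij,ijd->id', w, diff)

# draw an initial configuration uniformly on a square (size is illustrative)
x = rng.uniform(-1.5, 1.5, size=(N, d))
trajectory = [x]
for _ in range(300):                      # observe the first 300 time steps
    x = step(x)
    trajectory.append(x)
```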
B. State estimation

As a Bayesian approach, our goal in state estimation is to represent the posterior of the states, which is approximated by the empirical measure of the samples in our sequential Monte Carlo algorithm. We demonstrate the state estimation by the marginal posteriors of the trajectories of the first coordinate of two unobserved agents. We also show the trajectories of all agents, comparing the estimated paths of the unobserved agents in a sample with the truth.

Fig. 2: State estimation (noiseless observations): estimation of the trajectories of agents for the system without noise, observing $N_1 = 30$ of the $N = 60$ agents. (2a) shows the paths of the first coordinate of two unobserved agents for all the $S = 100$ samples (blue dots). At each time, the blue curve is the smoothed empirical density of the samples (sample density), representing the marginal posterior. For each agent, the samples become concentrated around the truth (the red dashed line) as time increases, with the marginal posterior peaking near the truth. (2b) shows the trajectories of all agents, where the blue and green dots are the observed and unobserved truth, and the red diamonds are the unobserved agents estimated by a sample; all with color changing from light to dark as time increases. The estimated trajectories of the unobserved agents can be far from the truth, particularly at the initial time, but the clustering of the sample is close to the truth.

Fig. 3: State estimation (noisy observations): estimation of the trajectories of agents when observing $N_1 = 30$ of the $N = 60$ agents with additive Gaussian noise. In (3a), the samples (blue dots) of the first coordinate become concentrated around the truth (the red dashed line) as time increases, with the multi-modal sample density (blue line) peaking near the truth. (3b) shows the trajectories of all agents, with the same color conventions as in Fig. 2; again, the estimated trajectories of the unobserved agents can be far from the truth, particularly at the initial time, but the clustering of the sample is close to the truth.

a) Noiseless observations: Consider first the case when the system is deterministic and half of the $N = 60$ agents are observed without noise. Figure 2a shows the trajectories of all the $S = 100$ samples for the first coordinate of two unobserved agents, along with the smoothed sample density at each time, representing the marginal posterior. For each agent, the samples become concentrated around the truth (the red dashed line) as time increases, with the marginal posterior peaking near the truth. Such a concentration of the samples agrees with the intuition that the uncertainty in the posterior of the states decreases as more observations become available, since the system is deterministic and the randomness comes only from the initial condition. The marginal posterior has multiple modes, reflecting the symmetry between agents and the non-identifiability of the states discussed in Section II-C.

Figure 2b shows the trajectories of all agents, comparing the estimated paths of the unobserved agents in a sample with the truth. The estimated trajectories can be far away from the truth, particularly at the initial time, but the clustering of the sample is close to the truth.

b) Noisy observations: We also consider the case when half of the $N = 60$ agents are observed with additive Gaussian noise $\mathcal{N}(0, \beta^2 I_{dN_1})$. Similarly, Figure 3a shows the trajectories of all the $S = 100$ samples for the first coordinate of two unobserved agents, along with the smoothed empirical sample density at each time, representing the marginal posterior.
For each agent, the sample density is more wide-spread and has more modes than in Figure 2 for the deterministic system, indicating more uncertainty due to the noise in the system and in the observations. Nevertheless, all samples become concentrated around the truth as time increases, with the marginal posterior peaking near the truth.

Figure 3b shows the trajectories of all agents, comparing the estimated paths of the unobserved agents in a sample with the truth. Again, the estimated trajectories can be far from the truth, particularly at the initial time, but the clustering of the sample is close to the truth.

In summary, for both noiseless and noisy observations, the marginal posteriors of the state can be multi-modal, presenting a large uncertainty; the trajectories of the samples can be far from the truth, but the clustering pattern of the samples is close to the truth.

C. Clustering prediction: a typical simulation
In this and the next section, we consider the prediction of clusters from partial observations. We exhibit the cluster prediction in a typical simulation in this section, and we report the performance over many simulations in the next section. Recall that we characterize the clustering by the posteriors of the sizes and centers of the clusters, particularly the leading clusters.

Fig. 4: Clustering prediction (noiseless observations): prediction of centers and sizes of clusters when observing $N_1 = 30$ of the $N = 60$ agents for a short time. Panels: (a) posteriors of the cluster center for the largest (left) and second largest (right) cluster; (b) posteriors of the cluster size for the largest (left) and second largest (right) cluster; (c) sizes of all clusters (estimated by sample mean). The AIS algorithm outperforms SIR and IS at presenting posteriors of the centers and sizes that concentrate around the truth for the leading clusters, and at providing accurate estimates of the sizes of all clusters by sample mean.
Fig. 5: Clustering prediction (noisy observations): prediction of centers and sizes of clusters when observing $N_1 = 30$ of the $N = 60$ agents for a short time with additive Gaussian noise. Panels: (a) posteriors of the cluster center for the largest (left) and second largest (right) cluster; (b) posteriors of the cluster size for the largest (left) and second largest (right) cluster; (c) sizes of all clusters (estimated by sample mean). The AIS algorithm performs slightly better than IS and clearly outperforms SIR, at presenting posteriors of the centers and sizes of the leading clusters, and at providing accurate estimates of the sizes of all clusters by sample mean.

We compare our AIS algorithm with two other SMC algorithms: SIR and implicit sampling (denoted by IS). Since the degeneracy of samples in SIR is too severe for any meaningful prediction, we reduce the degeneracy by inflating its weights to keep more samples in each step. For the prediction of the sizes, we also compare AIS with the prediction based simply on the connected agents in the observation $x_T$, since in practice one may treat the observation as a random sample of the population.

a) Noiseless observations: Figures 4a-4b show the empirical posteriors of the centers and sizes of the largest and the second largest cluster (defined in Eq. (5)).

Consider first the centers of the clusters in Figure 4a. The true center of the largest cluster is located at the white bar in the left plot. All samples from AIS are close to the true center (with a distance small enough that their bar overlaps with the white bar of the true center). IS has about half of its samples at the true center and the other half far away from it. Nearly all samples of SIR mispredict the true center. Similar results can be seen in the right plot.

Consider next the cluster sizes in Figure 4b. The largest cluster has 27 agents, and the second largest cluster has 18 agents. The predicted sizes based on the observation $x_T$ only (denoted by "From Obs") are both 22, and are thus unable to identify the leading clusters. This suggests that the observation $x_T$ by itself is not enough for an accurate prediction of the clustering. Among the SMC algorithms, AIS leads to highly concentrated samples, at the true size for the largest cluster and near the true size for the second largest cluster. Implicit sampling (IS) leads to samples scattering around the truth. The samples of SIR scatter widely, tending to overestimate the size of the largest cluster and underestimate the size of the second largest cluster.

Since the system is deterministic and the observations are noiseless, the true posterior concentrates around the truth. AIS outperforms SIR and IS at representing such a concentrated posterior.

Figure 4c shows the sizes of all the clusters, estimated by the sample mean as in (6). AIS accurately captures the sizes of all the clusters except the smallest one, which is too small to be predicted. IS performs relatively well in predicting the largest cluster, but misses the second largest cluster. SIR identifies the largest cluster with a relatively large error (5 relative to 27), but misses all the other clusters.

b) Noisy observations: Figure 5 shows the predictions when the observations are noisy. Due to the observation noise, the posteriors of the centers and sizes present a larger uncertainty than in the case of noiseless observations. Figure 5a shows the posteriors of the centers for the two largest clusters.
For the largest cluster, all three SMC algorithms yield samples scattered close to the true center, with the samples of AIS and IS concentrating at the true center more than those of SIR. For the second largest cluster, an unusual result appears: all the SMC algorithms lead to samples that mostly miss the position of the true center. The reason is that the second and the third largest clusters have similar sizes (as shown in Figure 5c, the sizes are 16 and 13, respectively), which makes them difficult to distinguish.

Figure 5b presents the posteriors of the sizes for the two largest clusters. The largest cluster has 26 agents, and the second largest cluster has 16 agents. AIS leads to samples concentrating at the true size for the largest cluster and near the true size for the second largest cluster. IS leads to samples scattering slightly more widely than AIS, but still around the truth. The samples of SIR scatter widely, tending to overestimate the size of the largest cluster and underestimate the size of the second largest cluster. The predicted sizes based on the observation $x_T$ only (denoted by "From Obs") are 28 and 12, slightly overestimating the size of the largest cluster and underestimating the size of the second largest cluster.

Figure 5c shows the sizes of all the clusters, estimated by the sample mean as in (6). Only AIS accurately captures the size of the largest cluster; IS slightly overestimates it, and SIR's estimate is too large. All three algorithms lead to similar sizes for the second and the third clusters, with AIS being the closest to the truth. Note that all the SMC algorithms predict a spurious fifth cluster, but AIS has the smallest error.

In summary, for predicting the centers and sizes of clusters from either noiseless or noisy observations, the performance of the AIS algorithm is much better than that of SIR, which is usually not
We further investigate the clustering prediction by AuxiliaryImplicit Sampling (AIS) in 100 independent simulations. Weconsider three cases: observing , , and of the agents.We assess the performance by studying the success rate inpredicting the centers for the largest two clusters, and thedistribution of errors in size estimation. a) Assessment of the prediction performance: Recall thatwe estimate the centers and sizes of clusters by their posteriormeans. More precisely, the center x C i and size |C i | of cluster C i are estimated by their sample means c x C i and c |C i | as definedby (6). We denote by C and C the largest and the secondlargest clusters.For each simulation, we say the center of the largest cluster C is predicted successfully if there exists an estimated clusterwith a size in [ |C | − K, |C | + K ] and with a center suchthat dist( x C , c x C j ) < L . Here K and L are the levels of errortolerance. More specifically, we define an indicator functionfor a successful prediction of C by Ω = , if dist (cid:16) x C , c x C j (cid:17) ≤ L for some j such that |C | − K < d |C j | < |C | + K ;0 , otherwise , (24)In following simulations, we pick L = 0 . (in general L should depend on the communication function φ , recall thatour φ is supported in [0 , ) and the range of the value K ∈ { , , } . Similarly, we define a successful predictionfor the center of the second largest cluster and its indicatorfunction Ω .We access the prediction of the sizes of the largest twoclusters by the distribution of the absolute error: e i = (cid:12)(cid:12)(cid:12) |C i | − c |C i | (cid:12)(cid:12)(cid:12) , for i = 1 , . (25)The error e i should be close to zero in a successful prediction.A heavy tail in the distribution of e i would indicate that it isdifficult to predict the cluster size accurately. b) Noiseless observations: Figure 6 illustrates the per-formance of prediction of the largest two clusters in 100 inde-pendent simulations. We consider three observation ratios: , , or , that is, observing , and of the N = 60 agentsin the system. The distributions of errors in the estimation ofcluster sizes are shown in Figure (6a)-(6b), and the successrate in predicting the centers are shown in Figure (6c)-(6d).The prediction of cluster sizes depends on the observationratio. When observing or of all agents, the majoritysimulations (more than 70%) can predict the cluster sizes withan error within 4. But when the observation ratio is , manysimulations have large errors (larger than 4 for more than50% of the simulations), suggesting that the observations donot provide enough information for accurate prediction of thecluster sizes.The prediction of cluster centers exhibits a high success rate,regardless of the observation ratio. Figure (6c)-(6d) show that Error for size of the largest cluster pe r c en t age ( % ) (a) Absolute error of predicted sizeof the largest cluster Error for size of the second largest cluster pe r c en t age ( % ) (b) Absolute error of predicted sizeof the second largest cluster K, tolerance of cluster size s u cc e ss r a t e ( % ) (c) Center of the largest cluster K, tolerance of cluster size s u cc e ss r a t e ( % ) (d) Center of the 2nd largest clusterFig. 6: Cluster prediction in 100 simulations (noiseless case): observing , , or of the N = 60 agents in the system. 
b) Noiseless observations: Figure 6 illustrates the performance of the prediction of the largest two clusters in 100 independent simulations, for three observation ratios. The distributions of the errors in the estimation of cluster sizes are shown in Figures 6a-6b, and the success rates in predicting the centers are shown in Figures 6c-6d.

Fig. 6: Cluster prediction in 100 simulations (noiseless case), for three observation ratios of the $N = 60$ agents. Panels: (a) absolute error of the predicted size of the largest cluster; (b) absolute error of the predicted size of the second largest cluster; (c) success rate for the center of the largest cluster versus the tolerance $K$; (d) success rate for the center of the second largest cluster versus the tolerance $K$. For the two larger observation ratios, the majority of the simulations (more than 70%) predict the cluster sizes with an error within 4; for the smallest ratio, many simulations have large errors. Panels (c)-(d) show that the centers of the leading clusters are located with high probability even at the smallest observation ratio, and the success rate depends little on the tolerance level $K$. In short, the observation ratio affects the prediction of cluster sizes, but not that of the cluster centers.

The prediction of the cluster sizes depends on the observation ratio. For the two larger observation ratios, the majority of the simulations (more than 70%) predict the cluster sizes with an error within 4. For the smallest observation ratio, many simulations have large errors (larger than 4 for more than 50% of the simulations), suggesting that the observations do not provide enough information for an accurate prediction of the cluster sizes.

The prediction of the cluster centers exhibits a high success rate, regardless of the observation ratio. Figures 6c-6d show that the centers of the leading clusters are located with high probability, with the success rate for the largest cluster above that for the second largest, and that the success rate drops only slightly as the observation ratio decreases. Also, the success rate depends little on the tolerance level $K$.

c) Noisy observations: When the trajectories are observed with additive Gaussian noise, similarly to the case of noiseless observations, the prediction of the sizes is more sensitive to the observation ratio than the prediction of the centers. Due to the additional uncertainty from the observation noise, the error in the size estimation is larger than in the noiseless case. In Figures 7a-7b, for the two larger observation ratios, a majority of the simulations have a size error of less than 6; for the smallest ratio, a substantial fraction of the simulations have larger errors. In particular, the size of the second largest cluster is predicted more accurately than that of the largest cluster, indicating that the observation noise is mostly absorbed in the prediction of the leading cluster.

The observation noise also slightly reduces the success rate in the prediction of the cluster centers. In Figures 7c-7d, the centers are predicted with high probability for both the largest and the second largest clusters, regardless of the observation ratio.

Fig. 7: Cluster prediction in 100 simulations (noisy observations), for three observation ratios of the $N = 60$ agents with additive Gaussian noise. Panels (a)-(d) are as in Fig. 6. Similarly to the noiseless case, the prediction of the sizes is more sensitive to the observation ratio than the prediction of the centers: in (a)-(b) the size errors are relatively large, with many large errors at the smallest ratio, while in (c)-(d) the centers are predicted with high probability regardless of the observation ratio.

In summary, for either noiseless or noisy observations, the cluster centers can be predicted with a high success rate, regardless of the observation ratio. The cluster sizes, on the other hand, have a larger uncertainty that is sensitive to the observation noise and the observation ratio.
V. DISCUSSION AND CONCLUSION
We presented a Bayesian formulation for the clustering prediction of opinion dynamics from partial observations, characterizing the prediction by the posterior of the clusters' sizes and centers. To overcome the challenge of sampling the high-dimensional posterior with multiple local maxima, we introduced an auxiliary implicit sampling (AIS) algorithm using two-step observations, a sequential Monte Carlo method that combines ideas from auxiliary particle filters [23] and implicit particle filters [15]. In both cases of noiseless and noisy observations, the AIS algorithm leads to accurate predictions of the sizes and centers of the leading clusters. We found that the uncertainty increases when the ratio of the observed population decreases. Also, the cluster centers can be predicted with a high success rate, but the cluster sizes have a more considerable uncertainty that is sensitive to the observation noise and the observation ratio.

There are three directions for future research. First, improve the information in the observations by randomly selecting the agents to observe at each time. The observations in this study are trajectories of a fixed set of agents, which may yield little information about other clusters when these agents concentrate in one cluster. Random selection of agents may avoid such an information loss by providing an unbiased sampling of all the agents' opinions. Second, improve the scalability of the proposed techniques when the number of agents is large. One may describe the concentration density of the opinions by the mean-field equation [30] and apply sequential Monte Carlo methods for the mean-field equation [22], or use reduced models [31], [32] to achieve efficiency. Third, learn the communication function [9], [10] along with the network topology [33], as well as the clustering prediction, from partial observations.

ACKNOWLEDGMENT
The authors would like to thank Mauro Maggioni and Sui Tang for inspiring discussions. The authors are grateful for support from NSF-1913243, NSF-1821211, MARCC, and Johns Hopkins University.
REFERENCES

[1] U. Krause, "A discrete nonlinear and non-autonomous model of consensus formation," Communications in Difference Equations, vol. 2000, pp. 227-236, 2000.
[2] T. Vicsek and A. Zafeiris, "Collective motion," Physics Reports, vol. 517, pp. 71-140, 2012.
[3] S. Motsch and E. Tadmor, "Heterophilious dynamics enhances consensus," SIAM Review, vol. 56, no. 4, pp. 577-621, 2014.
[4] J. Shao, S. Havlin, and H. E. Stanley, "Dynamic opinion model and invasion percolation," Phys. Rev. Lett., vol. 103, no. 1, p. 018701, 2009.
[5] Q. Li, L. A. Braunstein, H. Wang, J. Shao, H. E. Stanley, and S. Havlin, "Non-consensus opinion models on complex networks," J. Stat. Phys., vol. 151, no. 1-2, pp. 92-112, 2013.
[6] A. V. Proskurnikov, A. S. Matveev, and M. Cao, "Opinion dynamics in social networks with hostile camps: Consensus vs. polarization," IEEE Trans. Automat. Contr., vol. 61, no. 6, pp. 1524-1536, 2016.
[7] A. Lancichinetti and S. Fortunato, "Consensus clustering in complex networks," Sci. Rep., vol. 2, no. 1, p. 336, 2012.
[8] M. Bongini, M. Fornasier, M. Hansen, and M. Maggioni, "Inferring interaction rules from observations of evolutive systems I: The variational approach," Math. Models Methods Appl. Sci., vol. 27, no. 05, pp. 909-951, 2017.
[9] F. Lu, M. Zhong, S. Tang, and M. Maggioni, "Nonparametric inference of interaction laws in systems of agents from trajectory data," Proc. Natl. Acad. Sci. USA, vol. 116, no. 29, pp. 14424-14433, 2019.
[10] F. Lu, M. Maggioni, and S. Tang, "Learning interaction kernels in heterogeneous systems of agents from multiple trajectories," arXiv preprint arXiv:1910.04832, 2019.
[11] M. Tucsnak and G. Weiss, Observation and Control for Operator Semigroups. Springer Science & Business Media, 2009.
[12] A. Doucet and A. M. Johansen, "A tutorial on particle filtering and smoothing: Fifteen years later," Handbook of Nonlinear Filtering, vol. 12, no. 656-704, p. 3, 2009.
[13] W. R. Gilks and C. Berzuini, "Following a moving target: Monte Carlo inference for dynamic Bayesian models," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 63, no. 1, pp. 127-146, 2001.
[14] C. Berzuini and W. Gilks, "Resample-move filtering with cross-model jumps," in Sequential Monte Carlo Methods in Practice. Springer, 2001, pp. 117-138.
[15] A. J. Chorin and X. Tu, "Implicit sampling for particle filters," Proc. Natl. Acad. Sci. USA, vol. 106, no. 41, pp. 17249-17254, 2009.
[16] E. Lunasin and E. S. Titi, "Finite determining parameters feedback control for distributed nonlinear dissipative systems: a computational study," arXiv:1506.03709 [math], 2017.
[17] C. Foias, C. F. Mondaini, and E. S. Titi, "A discrete data assimilation scheme for the solutions of the 2D Navier-Stokes equations and their statistics," arXiv:1602.05995 [math], 2016.
[18] X. Liu, W. Yu, J. Cao, and S. Chen, "Discontinuous Lyapunov approach to state estimation and filtering of jumped systems with sampled-data," Neural Networks, vol. 68, pp. 12-22, 2015.
[19] H. Nijmeijer, "A dynamical control view on synchronization," Physica D: Nonlinear Phenomena, vol. 154, no. 3-4, pp. 219-228, 2001.
[20] D. Auroux and J. Blum, "Back and forth nudging algorithm for data assimilation problems," Comptes Rendus Mathematique, vol. 340, no. 12, pp. 873-878, 2005.
[21] M. Zhu, P. J. Van Leeuwen, and J. Amezcua, "Implicit equal-weights particle filter," Quarterly Journal of the Royal Meteorological Society, vol. 142, no. 698, pp. 1904-1919, 2016.
[22] F. Lu, N. Weitzel, and A. Monahan, "Joint state-parameter estimation of a nonlinear SPDE model from sparse noisy data," preprint, 2019.
[23] M. K. Pitt and N. Shephard, "Filtering via simulation: Auxiliary particle filters," Journal of the American Statistical Association, vol. 94, no. 446, pp. 590-599, 1999.
[24] A. Kong, J. S. Liu, and W. H. Wong, "Sequential imputations and Bayesian missing data problems," Journal of the American Statistical Association, vol. 89, no. 425, pp. 278-288, 1994.
[25] J. S. Liu, Monte Carlo Strategies in Scientific Computing. Springer Science & Business Media, 2008.
[26] G. Kitagawa, "Monte Carlo filter and smoother for non-Gaussian nonlinear state space models," Journal of Computational and Graphical Statistics, vol. 5, no. 1, pp. 1-25, 1996.
[27] M. Morzfeld, X. Tu, J. Wilkening, and A. Chorin, "Parameter estimation by implicit sampling," Communications in Applied Mathematics and Computational Science, vol. 10, no. 2, pp. 205-225, 2015.
[28] C. Paniconi, M. Marrocu, M. Putti, and M. Verbunt, "Newtonian nudging for a Richards equation-based distributed hydrological model," Advances in Water Resources, vol. 26, no. 2, pp. 161-178, 2003.
[29] C. Berzuini, N. G. Best, W. R. Gilks, and C. Larizza, "Dynamic conditional independence models and Markov chain Monte Carlo methods," Journal of the American Statistical Association, vol. 92, no. 440, pp. 1403-1412, 1997.
[30] P.-E. Jabin, "A review of the mean field limits for Vlasov equations," Kinet. Relat. Models, vol. 7, no. 4, pp. 661-711, 2014.
[31] A. J. Chorin, F. Lu, R. M. Miller, M. Morzfeld, and X. Tu, "Sampling, feasibility, and priors in data assimilation," Discrete Contin. Dyn. Syst. A, vol. 36, no. 8, pp. 4227-4246, 2016.
[32] F. Lu, X. Tu, and A. J. Chorin, "Accounting for model error from unresolved scales in ensemble Kalman filters by stochastic parameterization," Mon. Wea. Rev., vol. 145, no. 9, pp. 3709-3723, 2017.
[33] M. Coutino, E. Isufi, T. Maehara, and G. Leus, "State-space network topology identification from partial observations."