Noname manuscript No. (will be inserted by the editor)
A Fully Bayesian Infinite Generative Model for Dynamic Texture Segmentation
Sahar Yousefi · M. T. Manzuri Shalmani · Antoni B. Chan
Received: date / Accepted: date
Abstract
Generative dynamic texture models (GDTMs) are widely used for dynamic texture (DT) segmentation in video sequences. GDTMs represent DTs as a set of linear dynamical systems (LDSs). A major limitation of these models concerns the automatic selection of a proper number of DTs. Dirichlet process mixture (DPM) models, which have recently emerged as a cornerstone of non-parametric Bayesian statistics, are a promising candidate for resolving this issue. Motivated by this, we propose a novel non-parametric, fully Bayesian approach for DT segmentation, formulated as a joint DPM and GDTM construction. This coupling allows the algorithm to determine the number of segments automatically. We derive a Variational Bayesian Expectation-Maximization (VBEM) inference procedure for the proposed model. Moreover, in the E-step of inference, we apply the Rauch-Tung-Striebel smoother (RTSS) to variational Bayesian LDSs. Finally, experiments on different video sequences are performed. The results indicate that the proposed algorithm noticeably outperforms previous methods in efficiency and accuracy.
Keywords
Dynamic Texture Segmentation · Generative Dynamic Texture Model · Nonparametric Models · Dirichlet Process · Variational Bayesian Approximation
Sahar Yousefi
Sharif University of Technology, Tehran, Iran
E-mail: syousefi@ce.sharif.edu

M. T. Manzuri Shalmani
Sharif University of Technology, Tehran, Iran
E-mail: [email protected]

Antoni B. Chan
City University of Hong Kong, Hong Kong
E-mail: [email protected]
1 Introduction

Dynamic texture (DT) segmentation has received considerable attention over the past decade [1]-[2]. DTs are composed of ensembles of particles subject to stochastic motion. In fact, DTs capture important characteristics of video sequences that are stationary in appearance, i.e. the spatial domain, and in motion, i.e. the time domain. From the appearance point of view, textures can be divided into structural and stochastic. The former are mostly artificial and periodic and can be described by Textons, the putative units of pre-attentive human texture perception [3]; examples include tartan and chessboard patterns. The latter are natural and quasi-periodic, such as ocean waves or a grass field. In the literature, DTs are divided, from the perspective of their particles, into three categories: (i) microscopic, such as plumes of smoke and water flow, (ii) macroscopic, such as leaves blowing in the wind, and (iii) objects, such as crowds of people or vehicles in traffic jams [4]. Since DT segmentation relies on both appearance and motion changes, effective texture segmentation in videos is one of the most complicated issues in video processing, and a great deal of effort is devoted to this purpose [5], [6]-[7]. Although [8] mentions five categories of approaches for DT recognition, DT segmentation approaches can be grouped into three categories: motion-based methods [5], [9], [10], spatiotemporal feature based methods [11], [12], [13], [14], and generative dynamic texture models (GDTMs) [4], [15], [16].

Motion-based methods are among the most commonly used approaches for DT segmentation. In these approaches, DTs are represented by estimating a sequence of motion patterns [13], and optical flow is frequently used for this purpose. Amiaz et al. presented an optical flow based method that segments videos into static and dynamic texture regions [10]; it relies on a variant of the brightness constancy assumption. They later extended this work using gradient constancy and color constancy assumptions [17]. These approaches detect dynamic regions but do not distinguish between co-occurring DTs. Vidal and Ravichandran proposed a three-step motion-based method for DT modeling and segmentation [9]. While good results can be achieved with these approaches, accurate motion analysis is itself challenging because of the aperture problem, occlusion, and video noise. Moreover, optical flow methods rest on the well-known brightness constancy assumption [18], so any variation in the lighting within the scene violates this constraint.

Gonçalves and Bruno proposed a memory-based approach using partially self-avoiding walks on three orthogonal planes [13], in which segmentation is performed by clustering appearance and motion features. The feature extraction process in this approach is illumination-sensitive, memory-consuming, and computationally heavy.

Chen et al. proposed an approach based on local descriptors and optical flow [5], [12]. In this method, the appearance mode is captured by spatial texture descriptors, i.e. the local binary pattern (LBP) and the Weber local descriptor
(WLD), while the motion mode is captured by optical flow and a local temporal texture descriptor. The LBP [14] is a gray-scale invariant texture descriptor which describes the texture by coding the Textons into binary patterns. Chen et al. proposed a volume-based LBP, in which a binary code is produced for each voxel by thresholding its neighborhood with the value of the center voxel. Using the extracted features, the method performs segmentation by hierarchical splitting, agglomerative merging and pixel-wise classification. Hence, the segmentation results exhibit obvious jaggedness at the boundaries. Furthermore, LBPs are sensitive to noise and illumination variations. Since the LBP describes Textons, it assumes that textures are structural and cannot achieve a proper result for stochastic textures with quasi-periodic components, such as ocean waves. Moreover, since different dynamic textures are composed of Textons of varying sizes, finding the best neighbor cube size is a challenging issue; as the neighbor cube grows larger, more binary patterns are needed.

In the literature, GDTMs are introduced as a convenient means for DT analysis. Figure 1 categorizes some of the recent GDT models for different applications. A wide variety of works consider GDTMs for DT segmentation [15], [7], [16], [19], [20]. The GDTM poses DTs as a stochastic visual process over time and space. In these approaches, DTs are modeled by linear dynamical systems (LDSs) [16]. By capturing a wide variety of complex patterns of motion in a spatiotemporal model, GDTMs are able to reach an admissible solution that can model complex scenes [4], [20]. The GDTM suffers from the restrictive assumption that the video sequences do not contain co-occurring DTs. To address this limitation, various extensions of GDTMs have been proposed [1], [15]. The dynamic texture mixture (DTM) [4] and the layered dynamic texture (LDT) [15] are two of these efforts. The DTM supposes that a collection of spatiotemporal video patches is modeled as samples from a set of underlying dynamic textures. As DTs are globally homogeneous and locally inhomogeneous, patch-based approaches lead to poor results. Moreover, the DTM, like all clustering models, is not a global generative model for video of co-occurring textures [21]. On the contrary, the LDT is a global generative model which supposes that videos are modeled as a superposition of DTs. In this method, the parameters of the model are estimated with the Expectation-Maximization (EM) algorithm for maximum likelihood, and a Gibbs sampler is used for inference. The DTM uses an initial partition, and the LDT is initialized with the result of the DTM segmentation.

To our knowledge, despite the promising results achieved by recent methods, all of them need an important prerequisite to segment the DTs: determination of the optimal number of DTs in the video sequences. As different frames of a video sequence may contain different numbers of DTs, relying on expert knowledge to derive region segments is an important restriction for systematic approaches. In this paper, a non-parametric fully Bayesian generative approach based on the Dirichlet process (DP) is introduced to address this limitation. A DP is a distribution over distributions which is commonly used as a prior on the parameters of a mixture model with countably infinite components.
Fig. 1: History of generative dynamic texture models. [Recovered timeline, 2000-2016: recognition [22] and synthesis [23] (2001); GDT modeling [24] (2002); synthesis and recognition [16], [25] (2003); segmentation [9] and classification [20] (2005); segmentation [4] (2008); MDTM [15], LDTM [15] and variational LDT [21] (2009); segmentation [26] (2010); registration [4] (2011); segmentation [7] (2014); registration [27] (2016).]
In other words, a DP is an infinite mixture of distributions with a given parametric distribution [28]. Although the model is defined for infinite mixtures, inference is tractable because only the parameters of a finite set of mixture components need to be determined. For this purpose, Monte Carlo techniques [29] or variational Bayesian approximation [30] can be used, where the latter outperforms the former in speed.

The idea of using DPs to define mixture models with an infinite number of components for image segmentation, the infinite hidden Markov random field (iHMRF), has been explored previously in [31]. Because of the motion mode in DTs, the iHMRF is inadequate for DT segmentation. Beal et al. proposed a model known as the infinite hidden Markov model (iHMM), in which the number of hidden states of the hidden Markov model is allowed to be countably infinite [32]. Since the hidden states of LDSs are continuous, the iHMM is not suitable for DT segmentation either. Beal et al. also proposed a variational Kalman smoother for single-layer LDSs [33]. Under this motivation, we introduce a novel non-parametric generative Bayesian model for DT segmentation. Combining the GDTM and the DP allows us to introduce a model that is able to set the number of DTs automatically. In our approach, the prior probabilities of the model are jointly affected by the DP and a GDTM with countably infinite DTs. For inference, we use the Variational Bayesian Expectation-Maximization (VBEM) approximation, facilitated by means of the mean-field approximation. In the VB-E step, first-order and second-order expected values of the hidden states are needed; for this purpose, we perform the Rauch-Tung-Striebel smoother (RTSS) on variational Bayesian LDSs.
Fig. 2: Graphical model of the generative dynamic texture model.

The two main contributions of this paper are: (i) a novel non-parametric formulation of the GDTM based on the DP for unsupervised co-occurring DT segmentation, which resolves the problem of determining the proper number of DTs; and (ii) a fully Bayesian generative model which resolves the sensitivity of the GDTM to the initialization of parameters.

This paper is organized as follows. In Section 2, a brief overview of the generative dynamic texture model and the Dirichlet process is provided. In Section 3, the proposed infinite generative dynamic texture model is introduced and a truncated variational Bayesian inference algorithm for the model is derived. Section 4 describes the implementation of the proposed method comprehensively. The evaluation of the efficiency of the proposed model on the three aforementioned types of dynamic textures (i.e. microscopic, macroscopic and objects), in comparison with previous works, is presented in Section 5. Finally, the conclusion is drawn in Section 6.

2 Background

2.1 Generative dynamic texture model

The GDTM represents a DT as an LDS over a sequence of hidden states, $x_t \in \mathbb{R}^N$, and a sequence of observations, $y_t \in \mathbb{R}^M$, as below

$$x_t = A x_{t-1} + \zeta_t, \qquad y_t = C x_t + \xi_t, \qquad t \in [1, T], \qquad (1)$$

in which $A \in \mathbb{R}^{N \times N}$ is the transition matrix, $C \in \mathbb{R}^{M \times N}$ is the observation matrix, $T$ is the temporal length of the video, and $\zeta_t \in \mathbb{R}^N$ and $\xi_t \in \mathbb{R}^M$ are the state noise and the observation noise respectively, which are independent and identically distributed (i.i.d.) sequences drawn from a known distribution such as a Gaussian. The initial state is distributed normally as $x_1 \sim \mathcal{N}(x_1 \mid \delta, S^{-1})$.

2.2 Dirichlet process

Given a set of parameters $\{\theta^*_i\}_{i=1}^{L}$, the random measures are drawn from a $DP(G_0, \alpha)$, where $G_0$ is a base distribution and $\alpha$ is a positive scaling parameter, i.e.

$$G \mid \{G_0, \alpha\} \sim DP(G_0, \alpha), \qquad \theta^*_i \mid G \sim G, \quad i = 1, \ldots, L. \qquad (2)$$

Let $\{\theta_j\}_{j=1}^{K}$ be the set of distinct values taken by the variables $\{\theta^*_i\}_{i=1}^{L-1}$. Denoting by $f^{L-1}_j$ the number of values in $\{\theta^*_i\}_{i=1}^{L-1}$ that equal $\theta_j$, the conditional distribution of $\theta^*_L$ given $\{\theta^*_i\}_{i=1}^{L-1}$ has the form

$$p\left(\theta^*_L \mid \{\theta^*_i\}_{i=1}^{L-1}, G_0, \alpha\right) = \frac{\alpha}{\alpha + L - 1}\, G_0 + \sum_{j=1}^{K} \frac{f^{L-1}_j}{\alpha + L - 1}\, \delta_{\theta_j}, \qquad (3)$$

in which $\delta_{\theta_j}$ denotes the distribution concentrated at the single point $\theta_j$ [34]. The parameter $\alpha$ is a tradeoff between sampling a new parameter from the base distribution and sharing a previously sampled parameter. Therefore $\alpha$ plays a key role in determining the number of distinct parameters: as $\alpha$ grows, $G$ converges to $G_0$; on the contrary, as $\alpha$ shrinks, all the values $\{\theta^*_i\}_{i=1}^{L}$ tend to a single random variable.

Drawing samples from a DP can be regarded in terms of the stick-breaking construction [35]. A stick-breaking prior on the space has the form

$$G = \sum_{j=1}^{\infty} \pi_j(\nu)\, \delta_{\theta_j}, \qquad (4)$$

where

$$\pi_j(\nu) = \nu_j \prod_{j'=1}^{j-1} (1 - \nu_{j'}), \qquad \pi_j(\nu) \in [0, 1], \qquad (5)$$

and

$$\sum_{j=1}^{\infty} \pi_j(\nu) = 1, \qquad (6)$$

in which $\{\nu_j\}_{j=1}^{\infty}$ and $\{\theta_j\}_{j=1}^{\infty}$ are random variables drawn i.i.d. from $\nu_j \sim Beta(1, \alpha)$ and $\theta_j \sim G_0$ respectively.
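To make these two background components concrete, the following minimal Python sketch (an illustration written for this overview, not code from the paper) samples a short observation sequence from the LDS of equation (1) and draws truncated stick-breaking weights as in equations (4)-(6). For simplicity the noise terms here are parametrized by covariance matrices rather than the precisions used later in the text, and all dimensions and values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_lds(A, C, Q, R, x1, T):
    """Sample a dynamic texture from the LDS of Eq. (1):
    x_t = A x_{t-1} + zeta_t,  y_t = C x_t + xi_t  (Q, R are noise covariances here)."""
    N, M = A.shape[0], C.shape[0]
    x, y = np.zeros((T, N)), np.zeros((T, M))
    x[0] = x1
    for t in range(T):
        if t > 0:
            x[t] = A @ x[t - 1] + rng.multivariate_normal(np.zeros(N), Q)
        y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(M), R)
    return x, y

def stick_breaking_weights(alpha, K):
    """Truncated stick-breaking construction of Eqs. (4)-(6):
    pi_j = nu_j * prod_{j'<j}(1 - nu_{j'}),  nu_j ~ Beta(1, alpha).
    Because of the truncation at K, the weights sum to slightly less than one."""
    nu = rng.beta(1.0, alpha, size=K)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - nu)[:-1]))
    return nu * remaining

# Tiny demo: weights of a truncated DP and one short synthetic texture.
pi = stick_breaking_weights(alpha=1.0, K=10)
print("stick-breaking weights:", np.round(pi, 3), "sum =", round(pi.sum(), 3))

N, M, T = 3, 5, 20
A = 0.9 * np.eye(N)
C = rng.normal(size=(M, N))
x, y = sample_lds(A, C, Q=0.1 * np.eye(N), R=0.05 * np.eye(M),
                  x1=rng.normal(size=N), T=T)
print("observation sequence shape:", y.shape)
```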
Suppose $y = \{y_i\}_{i=1}^{L}$ is a set of observations modeled by the DP. Each observation $y_i$ is assumed to be drawn from its conditional probability density function $p(y_i \mid \theta_j)$, which is parametrized by $\theta_j$. Introducing a discrete random variable $z = \{z_i\}_{i=1}^{L}$, in which $z_i = j$ denotes that $y_i$ is drawn from $\theta_j$, the Dirichlet process mixture model can be defined as

$$y_i \mid z_i = j; \theta_j \sim p(y_i \mid \theta_j), \quad z_i \mid \pi(\nu) \sim Mult(\pi(\nu)), \quad \nu_j \mid \alpha \sim Beta(1, \alpha), \quad \theta_j \mid G_0 \sim G_0, \qquad (7)$$

in which $\pi(\nu) = (\pi_j(\nu))_{j=1}^{\infty}$ is given by (5), and $Mult(\cdot)$ denotes the Multinomial distribution.

3 Infinite generative dynamic texture model

In this section, we propose a novel probabilistic model named the infinite generative dynamic texture model (IGDTM). In this model, it is supposed that the video sequences are composed of an infinite number of DTs and each DT can be modeled by an LDS. In other words, the model considers videos as a superposition of the outputs of countably infinite disjoint GDTMs. In this approach, fully Bayesian reasoning is used, which endeavors to estimate the parameters of an underlying distribution based on the observed data. This requires us to specify prior distributions on the parameters; for this purpose, the exponential families are used.

The graphical model of the proposed method is illustrated in Figure 3. In this model, the $j$th DT is modeled by a separate LDS containing a set of hidden states, $x^{(j)} = \{x^{(j)}_t \mid x^{(j)}_t \in \mathbb{R}^N\}_{t=1}^{T}$, in which $T$ is the temporal length of the video. Moreover, the model contains a set of observed variables, $y = \{y_t \mid y_t \in \mathbb{R}^M, y_t = \{y_{it}\}_{i=1}^{L}\}_{t=1}^{T}$ (where $y_{it}$ denotes the $i$th pixel, $i \in [1, L]$, of the $t$th frame, $t \in [1, T]$), and a lattice of sites, $z = \{z_i\}_{i=1}^{L}$, which represents a Markov random field. The linear dynamical equations of the IGDTM are

$$x^{(j)}_t = A^{(j)} x^{(j)}_{t-1} + \zeta_t, \quad j \in [1, \infty), \qquad y_{it} = C^{(z_i)}_i x^{(z_i)}_t + \xi^{(z_i)}_t, \quad t \in [1, T], \qquad (8)$$

where $A^{(j)}$ is the $N \times N$ state transition matrix,

$$A^{(j)} = \begin{pmatrix} a_{11} & \cdots & a_{1N} \\ \vdots & \ddots & \vdots \\ a_{N1} & \cdots & a_{NN} \end{pmatrix}, \qquad (9)$$

in which $a_{nn'} \overset{iid}{\sim} \mathcal{N}(0, \sigma_A)$ for $n, n' \in [1, N]$, and $C^{(j)}$ is the $L \times N$ observation matrix,

$$C^{(z_i = j)} = \begin{pmatrix} c_{11} & \cdots & c_{1N} \\ \vdots & & \vdots \\ c_{L1} & \cdots & c_{LN} \end{pmatrix}, \qquad (10)$$

where for the $i$th row $c_{ln} \overset{iid}{\sim} \mathcal{N}(0, \Sigma_C)$.

The initial state of the $j$th LDS is distributed as $x^{(j)}_1 \sim \mathcal{N}(x^{(j)}_1 \mid \delta_j, S_j^{-1})$, in which $\delta_j \in \mathbb{R}^N$ and $S_j \in \aleph^N_+$ are the mean vector and the precision matrix of the $j$th LDS's initial state respectively ($\aleph^N_+$ denotes the set of all symmetric positive definite $N \times N$ matrices).

Fig. 3: Graphical model of the IGDTM.

Moreover, the state noise process of the $j$th LDS is given by

$$\zeta_t \sim \mathcal{N}(\zeta_t \mid 0, Q_j^{-1}), \qquad (11)$$

where $Q_j \in \aleph^N_+$ is the precision matrix of the state noise. Furthermore, the observation noise process of the $j$th LDS is given by

$$\xi^{(z_i)}_t \sim \mathcal{N}(\xi^{(z_i)}_t \mid 0, r_j^{-1}), \qquad (12)$$

where $r_j$ is the precision of the observation noise.

As the IGDTM is the superposition of infinite disjoint DTs, understanding the videos requires understanding the DTs. The sequence of states of the $j$th LDS in the IGDTM is a Gauss-Markov process, defined as

$$p(x^{(j)}) = p(x^{(j)}_1) \prod_{t=2}^{T} p(x^{(j)}_t \mid x^{(j)}_{t-1}), \qquad (13)$$

in which the distribution of each state is defined by
$$x^{(j)}_t \mid x^{(j)}_{t-1}, A^{(j)}, Q_j \sim \mathcal{N}(x^{(j)}_t \mid A^{(j)} x^{(j)}_{t-1}, Q_j^{-1}). \qquad (14)$$

The distribution of the observations is defined as below:

$$y_{it} \mid x^{(j)}_t, z_i = j, r_j, \mu_j \sim \mathcal{N}(y_{it} \mid C^{(j)}_i x^{(j)}_t + \mu_j, r_j^{-1}), \qquad (15)$$

where $\mu_j \in \mathbb{R}$ is the mean of the observations of the $j$th LDS.

The distribution of the sites of the label field, $z_i$, is defined by

$$z_i \mid \tilde{z}_{\delta_i}, \pi(\nu) \sim p(z_i = j \mid \tilde{z}_{\delta_i}, \beta)\, p(z_i = j \mid \pi(\nu)), \qquad (16)$$

where $p(z_i = j \mid \tilde{z}_{\delta_i}, \beta)$ is defined as

$$p(z_i = j \mid \tilde{z}_{\delta_i}, \beta) = \frac{\exp\left(-\sum_{c \in \mathcal{C}} V_c(\tilde{z}_{ij} \mid \beta)\right)}{\sum_{j'=1}^{K} \exp\left(-\sum_{c \in \mathcal{C}} V_c(\tilde{z}_{ij'} \mid \beta)\right)}, \qquad (17)$$

which gives the approximate point-wise probabilities of the $\{z_i\}_{i=1}^{L}$ used to impose the MRF. In equation (17), $\tilde{z}_{ij} \equiv (z_i = j)$, $\beta$ is the inverse temperature of the model, $V_c(\tilde{z}_{ij} \mid \beta)$ are the clique potentials, $c$ is a member of the set of cliques of the neighbourhood system $\mathcal{C}$, and $\tilde{z}_{\delta_i}$ is the neighbour set of the site $z_i$. The potential $V_c(\tilde{z}_{ij} \mid \beta)$ is comprised of the singleton potential function

$$V_i(z_i) = \varsigma_j, \qquad j = 1, \ldots, K, \qquad (18)$$

and the doubleton potential function

$$\forall z_{i'} \in \tilde{z}_{\delta_i}: \quad V_{i,i'}(z_i, z_{i'}) = \begin{cases} \gamma_1, & z_i = z_{i'},\ z_i, z_{i'} \in f_t \\ \gamma_2, & z_i \neq z_{i'},\ z_i, z_{i'} \in f_t \end{cases}, \qquad (19)$$

where $\varsigma_1, \ldots, \varsigma_K, \gamma_1, \gamma_2$ are constants and $f_t$ denotes the $t$th frame. Figure 4 illustrates the neighbourhood system; the red sites indicate $z_i$ and the green sites indicate $z_{i'} \in f_t$.

Fig. 4: Markov neighbourhood system.

In this approach, we use fully Bayesian reasoning, in which a prior over all the hidden variables and unknown parameters, $p(\Psi)$, is introduced, and the computation of the posterior distribution given all the observations $y$ and the hyper-parameters $\Xi$, i.e. $p(\Psi \mid \Xi, y)$, is of interest. For simplicity, we use conjugate priors. In order to perform Bayesian inference, we apply a variational approximation, which is described in the next section.

3.1 Variational Bayesian inference

For tractable inference, we truncate the infinite measure $G$ in equation (4). This can be done by considering the truncated stick-breaking representation: the mixture proportions $\pi_j(\nu)$ are supposed to be zero for $j > K$, where $K$ is a fixed integer value [36].

Bayesian inference introduces a set of appropriate prior distributions over the parameters of the model. For simplicity [16], in this paper, conjugate exponential priors are defined. Hence, we impose a Wishart distribution over the precision matrix of the state noise, $Q_j$, as defined in equation (20),

$$Q_j \mid w_0, \Psi_0 \sim \mathcal{W}(Q_j \mid w_0, \Psi_0), \qquad (20)$$

in which $w_0$ is the degrees of freedom and $\Psi_0$ is the scale matrix of $Q_j$. Additionally, a joint one-dimensional Normal-Wishart distribution is imposed as the prior of the mean and precision of the observations, $\mu_j, r_j$, and an $N$-dimensional Normal-Wishart prior is imposed on the mean and precision of the initial state, $\delta_j, S_j$, as defined in equations (21) and (22) respectively.
$$\mu_j, r_j \mid m_1, \lambda_1, w_1, \Psi_1 \sim \mathcal{NW}(\mu_j, r_j \mid m_1, \lambda_1, w_1, \Psi_1), \qquad (21)$$

$$\delta_j, S_j \mid m_2, \lambda_2, w_2, \Psi_2 \sim \mathcal{NW}_N(\delta_j, S_j \mid m_2, \lambda_2, w_2, \Psi_2), \qquad (22)$$

in which $m_1, m_2$ are the mean vectors, $\lambda_1, \lambda_2, w_1, w_2$ are four real numbers, and $\Psi_1, \Psi_2$ are the scale matrices.

We use a Gamma prior for the scaling parameter of the DP,

$$\alpha \mid \eta_1, \eta_2 \sim Gam(\alpha \mid \eta_1, \eta_2), \qquad (23)$$

where $\eta_1, \eta_2$ are the shape and rate parameters.

Let $\Psi = \{x_{1:T}, z_i, \nu_j, \alpha, \delta_j, S_j, Q_j, \mu_j, r_j\}_{i=1, j=1}^{L, K}$ be the set of all hidden variables and unknown parameters, and let $\Xi = \{w_0, \Psi_0, m_1, \lambda_1, w_1, \Psi_1, m_2, \lambda_2, w_2, \Psi_2, \eta_1, \eta_2, \sigma^{(j)}_A, \Sigma^{(j)}_C\}$ be the set of all hyper-parameters of the imposed priors. The joint distribution $p(\Psi, y \mid \Xi)$ is defined as

$$p(\Psi, y \mid \Xi) = \prod_{t=2}^{T} \prod_{j=1}^{K} P\big(x^{(j)}_t \mid A^{(j)} x^{(j)}_{t-1}, Q_j\big) \prod_{j=1}^{K} P(Q_j \mid w_0, \Psi_0) \times \prod_{i=1}^{L} \prod_{j=1}^{K} \prod_{t=1}^{T} P\big(y_{it} \mid C^{(j)}_i x^{(j)}_t + \mu_j, z_i = j, r_j\big)^{I(z_i = j)} \prod_{j=1}^{K} P(\mu_j, r_j \mid m_1, \lambda_1, w_1, \Psi_1) \times \prod_{j=1}^{K} P\big(x^{(j)}_1 \mid \delta_j, S_j\big) P(\delta_j, S_j \mid m_2, \lambda_2, w_2, \Psi_2) \times \prod_{j=1}^{K} P(\nu_j \mid \alpha)\, P(A^{(j)} \mid 0, \sigma_A)\, P(C^{(j)} \mid 0, \Sigma_C)\, P(\alpha \mid \eta_1, \eta_2)\, P(Z), \qquad (24)$$

in which

$$P(C^{(j)} \mid 0, \Sigma_C) = \prod_{i=1}^{L} \mathcal{N}\big(C^{(j)\top}_i \mid 0, \Sigma_C\big), \qquad (25)$$

and

$$P(Z) = \prod_{i=1}^{L} \prod_{j=1}^{K} p(z_i = j \mid \tilde{z}_{\delta_i}, \beta)\, p(z_i = j \mid \nu(\pi_j)), \qquad (26)$$

where $p(z_i = j \mid \tilde{z}_{\delta_i}, \beta)$ are the point-wise prior probabilities of the MRF, given by (17), and $p(z_i = j \mid \nu(\pi_j))$ are the prior probabilities of the model states stemming from the imposed DP, given by (5) and (7).

Variational Bayesian inference is used for approximating intractable integrals arising in Bayesian inference and machine learning. In this method, the actual posterior over the set of all hidden variables and unknown parameters, i.e. $p(\Psi \mid \Xi, y)$, is approximated by a variational distribution $q_\Psi(\Psi)$. Mean-field variational Bayes is the most common type of variational Bayes; it uses the Kullback-Leibler divergence (KL-divergence) of $p(\Psi \mid \Xi, y)$ from $q_\Psi(\Psi)$ as the dissimilarity function. Under this assumption, the log marginal likelihood of the model yields

$$\log p(y) = \ell(q_\Psi) + \mathrm{KL}(q_\Psi \,\|\, p), \qquad (27)$$

in which

$$\ell(q_\Psi) = \int q_\Psi(\Psi) \log \frac{p(\Psi, y \mid \Xi)}{q_\Psi(\Psi)}\, d\Psi, \qquad (28)$$

where $\mathrm{KL}(\cdot)$ stands for the KL-divergence. Since the KL divergence is non-negative, $\ell(q_\Psi)$ forms a strict lower bound on the log marginal likelihood, with equality when $\mathrm{KL}(q_\Psi \,\|\, p) = 0$, i.e. $q_\Psi(\Psi) = p(\Psi \mid \Xi, y)$. Since the log marginal likelihood is constant, maximizing $\ell(q_\Psi)$ minimizes $\mathrm{KL}(q_\Psi \,\|\, p)$.

As we defined conjugate exponential priors, the variational posterior $q_\Psi(\Psi)$ is expected to take the same form as $p(\Psi \mid \Xi, y)$ [37]; therefore it is expected that $q_\Psi(\Psi)$ factorizes as below:

$$q(\Psi) = q_x(x_{1:T})\, q_z(z) \prod_{j=1}^{K} q_\nu(\nu_j)\, q_\alpha(\alpha) \prod_{j=1}^{K} q_{\delta,S}(\delta_j, S_j) \times \prod_{j=1}^{K} q_Q(Q_j) \prod_{j=1}^{K} q_{\mu,r}(\mu_j, r_j)\, q(A)\, q(C), \qquad (29)$$

where

$$q_x(x_{1:T}) = \prod_{t=1}^{T} q(x_t), \qquad q_z(z) = \prod_{i=1}^{L} q_z(z_i), \qquad (30)$$

and

$$q(A) = \prod_{j=1}^{K} \prod_{n=1}^{N} q(a^{(j)}_n), \qquad q(C) = \prod_{j=1}^{K} \prod_{i=1}^{L} q(C^{(j)}_i). \qquad (31)$$
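The decomposition in equations (27)-(28) can be checked numerically on a toy conjugate model. The sketch below (an illustration with assumed toy quantities, not part of the proposed model) estimates the ELBO by Monte Carlo for an arbitrary Gaussian $q(\theta)$ and verifies that adding the closed-form KL to the exact posterior recovers the log evidence.

```python
import numpy as np
from scipy.stats import norm

# Conjugate toy model: theta ~ N(0,1), y | theta ~ N(theta, 1), one observation y.
# Exact posterior: N(y/2, 1/2); exact evidence: log N(y | 0, 2).
y = 1.3
post_mean, post_var = y / 2.0, 0.5
log_evidence = norm.logpdf(y, loc=0.0, scale=np.sqrt(2.0))

def elbo(m, s2, n_samples=200000, seed=0):
    """Monte Carlo estimate of l(q) = E_q[log p(y, theta) - log q(theta)]."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(m, np.sqrt(s2), size=n_samples)
    log_joint = norm.logpdf(theta, 0.0, 1.0) + norm.logpdf(y, theta, 1.0)
    log_q = norm.logpdf(theta, m, np.sqrt(s2))
    return np.mean(log_joint - log_q)

def kl_gauss(m1, v1, m2, v2):
    """KL( N(m1,v1) || N(m2,v2) ) in closed form."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

m, s2 = 0.0, 1.0  # a deliberately poor q(theta)
print("log p(y)             :", round(log_evidence, 4))
print("ELBO + KL (estimate)  :", round(elbo(m, s2) + kl_gauss(m, s2, post_mean, post_var), 4))
```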
Table 1: Random variables and parameters in the IGDTM

Random variable/parameter | Name | Dimension | Distributed from
$x^{(j)}_t$ | States | $x^{(j)}_t \in \mathbb{R}^N$ | $\mathcal{N}(x^{(j)}_t \mid A^{(j)} x^{(j)}_{t-1}, Q_j^{-1})$
$\mu_j, r_j$ | Mean and precision of the observations | $\mu_j \in \mathbb{R}$, $r_j \in \mathbb{R}$ | $\mathcal{NW}(\mu_j, r_j \mid m_1, \lambda_1, w_1, \Psi_1)$
$Q_j$ | Precision matrix of the state noise | $Q_j \in \aleph^N_+$ | $\mathcal{W}(Q_j \mid w_0, \Psi_0)$
$\delta_j, S_j$ | Mean and precision matrix of the initial state | $\delta_j \in \mathbb{R}^N$, $S_j \in \aleph^N_+$ | $\mathcal{NW}_N(\delta_j, S_j \mid m_2, \lambda_2, w_2, \Psi_2)$
$\alpha$ | Scaling parameter of the DP | $\alpha \in \mathbb{R}$ | $Gam(\alpha \mid \eta_1, \eta_2)$
$\nu_j$ | Parameter of the DP | $\nu_j \in (0, 1]$ | $Beta(\nu_j \mid 1, \alpha)$
$\{z_i\}_{i=1}^{L}$ | Label field | $z_i \in \{1, \ldots, \infty\}$ | $Mult(z \mid \pi(\nu))$

Making a full mean-field assumption in which $q_x(x_{1:T}) = \prod_{t=1}^{T} q_x(x_t)$ loses crucial information about the hidden state chain needed for accurate inference. Thus, we employ the RTSS algorithm, which computes the expected statistics of the hidden states in time $O(T)$.

By replacing $q(\Psi)$ in (27), we will have

$$\ell(q_\Psi) = \int d\Psi\; q_x(x_{1:T}) \prod_{i=1}^{L} q_z(z_i) \prod_{j=1}^{K} q_\nu(\nu_j)\, q_\alpha(\alpha) \prod_{j=1}^{K} q_{\delta,S}(\delta_j, S_j) \prod_{j=1}^{K} q_Q(Q_j) \prod_{j=1}^{K} q_{\mu,r}(\mu_j, r_j) \prod_{j=1}^{K} \prod_{n=1}^{N} q(a^{(j)}_n) \prod_{j=1}^{K} \prod_{i=1}^{L} q(c^{(j)}_i) \times \Big[ \ln P(\Psi, y \mid \Xi) - \ln q_x(x_{1:T}) - \sum_{i=1}^{L} \ln q_z(z_i) - \sum_{j=1}^{K} \ln q_\nu(\nu_j) - \ln q_\alpha(\alpha) - \sum_{j=1}^{K} \ln q_{\delta,S}(\delta_j, S_j) - \sum_{j=1}^{K} \ln q_Q(Q_j) - \sum_{j=1}^{K} \ln q_{\mu,r}(\mu_j, r_j) - \sum_{j=1}^{K} \sum_{n=1}^{N} \ln q(a^{(j)}_n) - \sum_{j=1}^{K} \sum_{i=1}^{L} \ln q(C^{(j)}_i) \Big]. \qquad (32)$$

Variational Bayes can be seen as an extension of the expectation-maximization (EM) algorithm; it consists of a VB-E step and a VB-M step and successively converges to optimum parameter values [30]. In the M-step, by maximizing $\ell(q_\Psi)$ over each factor $q_\Psi(\Psi)$ in turn, holding the others fixed, the variational posterior distribution $q_\Psi(\Psi)$ is derived. The following section describes the computation of the approximated variational distributions in the M-step. Tables 1 and 2 describe the random variables/parameters and hyper-parameters of the proposed method in detail, respectively.
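Before detailing the individual updates, the overall VBEM iteration described above (and summarized later in Figure 6) can be sketched as follows. This is only a schematic written for this manuscript: the callbacks `vbe_step`, `rtss_smoother` and `vbm_step` are hypothetical placeholders standing in for equations (64)-(75), (79)-(98) and (33)-(62) respectively, and the label-field update of equation (62) is left abstract.

```python
import numpy as np

def vbem_igdtm(video, K=7, max_iters=50, tol=1e-4,
               vbe_step=None, rtss_smoother=None, vbm_step=None):
    """Schematic of the IGDTM inference loop (hypothetical helper signatures).

    video : array of shape (T, L)  -- L pixel observations per frame, T frames.
    Returns the hard label field z_hat of length L (eq. (63))."""
    T, L = video.shape
    rng = np.random.default_rng(0)
    resp = rng.dirichlet(np.ones(K), size=L)        # q(z_i = j), initialised randomly
    params = [dict() for _ in range(K)]             # per-LDS variational parameters
    prev_resp = resp.copy()

    for it in range(max_iters):
        for j in range(K):
            if vbe_step is not None:                # expectations, eqs. (64)-(75)
                params[j] = vbe_step(params[j], resp[:, j], video)
            if rtss_smoother is not None:           # E[x_t], E[x_t x_t^T], eqs. (79)-(98)
                params[j] = rtss_smoother(params[j], resp[:, j], video)
            if vbm_step is not None:                # posterior updates, eqs. (33)-(62)
                params[j] = vbm_step(params[j], resp[:, j], video)
        # The label-field update of eq. (62) would recompute `resp` here.
        if np.max(np.abs(resp - prev_resp)) < tol:  # convergence check
            break
        prev_resp = resp.copy()
    return resp.argmax(axis=1)                      # hard assignment, eq. (63)

# Smoke test with the update callbacks left as identity placeholders.
z_hat = vbem_igdtm(np.zeros((10, 25)), K=3,
                   vbe_step=lambda p, r, v: p,
                   rtss_smoother=lambda p, r, v: p,
                   vbm_step=lambda p, r, v: p)
print(z_hat.shape)
```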
Variational Bayesian M-step (VBM): In this section, the VBM is described (see Appendix A for the derivations). Following the mean-field variational Bayesian approach, we define $q(z_i = j') = \mathbb{E}[I(z_i = j')]$ for $j' \in [1, K]$, which is used in the updates below.
Table 2: Hyper-parameters in the IGDTM
Hyper-parameter | Name | Random variable/parameter | Properties/dimension
$w_0$ | Degrees of freedom | $Q_j$ | $w_0 > N - 1$
$\Psi_0$ | Scale matrix | $Q_j$ | $\Psi_0 \in \aleph^N_+$
$m_1$ | Mean | $\mu_j$ | $m_1 \in \mathbb{R}$
$\lambda_1$ | A real number | $\mu_j$ | $\lambda_1 > 0$
$w_1$ | A real number | $r_j$ | $w_1 > 0$
$\Psi_1$ | Scale | $r_j$ | $\Psi_1 \in \mathbb{R}$
$m_2$ | Mean | $\delta_j$ | $m_2 \in \mathbb{R}^N$
$\lambda_2$ | A real number | $\delta_j$ | $\lambda_2 > 0$
$w_2$ | A real number | $S_j$ | $w_2 > N - 1$
$\Psi_2$ | Scale matrix | $S_j$ | $\Psi_2 \in \aleph^N_+$
$\eta_1$ | Shape parameter | $\alpha$ | $\eta_1 > 0$
$\eta_2$ | Rate parameter | $\alpha$ | $\eta_2 > 0$
$\sigma^{(j)}_A$ | Variance | $A^{(j)}$ | $\sigma^{(j)}_A \in \mathbb{R}$
$\Sigma^{(j)}_C$ | Covariance matrix | $C^{(j)\top}_i$, $i \in [1, L]$ | $\Sigma^{(j)}_C \in \aleph^N_+$

The distribution of the precision matrix of the state noise is approximated by $q_Q(Q_j) = \mathcal{W}(Q_j \mid \hat{w}_0, \hat{\Psi}_0)$, in which
$$\hat{w}_0 = w_0 + (T - 1), \qquad (33)$$

$$\hat{\Psi}^{-1}_0 = \Psi^{-1}_0 + \sum_{t=2}^{T} \mathbb{E}_{\setminus Q_j}\Big[\big(x^{(j)}_t - A^{(j)} x^{(j)}_{t-1}\big)\big(x^{(j)}_t - A^{(j)} x^{(j)}_{t-1}\big)^\top\Big]. \qquad (34)$$

The distribution of the mean and precision of the observations is approximated by

$$q_{\mu,r}(\mu_j, r_j) = \mathcal{NW}\big(\hat{m}_1, \hat{\lambda}_1, \hat{w}_1, \hat{\Psi}_1\big), \qquad (35)$$

in which, using the notations

$$N^{(2)}_j = \sum_{i=1}^{L} q(z_i = j), \qquad (36)$$

$$\bar{y}_{tj} = \frac{1}{N^{(2)}_j} \sum_{i=1}^{L} q(z_i = j)\, y_{it}, \qquad (37)$$

we have

$$\hat{w}_1 = w_1 + T N^{(2)}_j, \qquad (38)$$

$$\hat{\lambda}_1 = \lambda_1 + T N^{(2)}_j, \qquad (39)$$

$$\hat{m}_1 = \frac{\lambda_1 m_1 + N^{(2)}_j \sum_{t=1}^{T} \bar{y}_{tj}}{\hat{\lambda}_1}, \qquad (40)$$

$$\hat{\Psi}^{-1}_1 = \Psi^{-1}_1 - \frac{1}{\hat{\lambda}_1}\Big( \lambda_1 m_1 m_1^\top + \sum_{t=1}^{T} \bar{y}_{tj}\, N^{(2)}_j m_1^\top \lambda_1 + m_1 \lambda_1 N^{(2)}_j \sum_{t=1}^{T} \bar{y}^{\top}_{tj} + \sum_{t=1}^{T} \bar{y}_{tj}\, N^{(2)}_j N^{(2)}_j \sum_{t=1}^{T} \bar{y}^{\top}_{tj} \Big) + \lambda_1 m_1 m_1^\top + \sum_{t=1}^{T} \sum_{i=1}^{L} q(z_i = j)\, y_{it} y_{it}^\top - \sum_{t=1}^{T} \sum_{i=1}^{L} q(z_i = j)\, \hat{\mu}^{(j)}_{C_i} \mathbb{E}[x^{(j)}_t]\, y_{it}^\top - \sum_{t=1}^{T} \sum_{i=1}^{L} q(z_i = j)\, y_{it}\, \mathbb{E}\big[x^{(j)\top}_t\big]\, \hat{\mu}^{(j)\top}_{C_i} + \sum_{t=1}^{T} \sum_{i=1}^{L} q(z_i = j) \sum_{n=1}^{N} \sum_{n'=1}^{N} \big( \hat{\Sigma}^{(j)}_{C_i, nn'} + \hat{\mu}^{(j)}_{C_i, n} \hat{\mu}^{(j)}_{C_i, n'} \big)\, \mathbb{E}\big[x^{(j)}_{t, n'} x^{(j)}_{t, n}\big]. \qquad (41)$$

The distribution of the mean and the precision matrix of the initial state is approximated by

$$q_{\delta,S}(\delta_j, S_j) = \mathcal{NW}(\hat{m}_2, \hat{\lambda}_2, \hat{w}_2, \hat{\Psi}_2), \qquad (42)$$

in which, using the notations

$$N^{(3)}_j = \sum_{i=1}^{L} q(z_i = j), \qquad (43)$$

$$\bar{x}^{(3)}_j = \frac{1}{N^{(3)}_j} \sum_{i=1}^{L} q(z_i = j)\, \mathbb{E}\big[x^{(j)}_1\big], \qquad (44)$$

$$\Delta^{(3)}_j = \sum_{i=1}^{L} q(z_i = j)\, \mathbb{E}\Big[\big(x^{(j)}_1 - \bar{x}^{(3)}_j\big)\big(x^{(j)}_1 - \bar{x}^{(3)}_j\big)^\top\Big], \qquad (45)$$

we have

$$\hat{w}_2 = w_2 + N^{(3)}_j, \qquad (46)$$

$$\hat{\lambda}_2 = \lambda_2 + N^{(3)}_j, \qquad (47)$$

$$\hat{m}_2 = \frac{\lambda_2 m_2 + N^{(3)}_j \bar{x}^{(3)}_j}{\hat{\lambda}_2}, \qquad (48)$$

$$\hat{\Psi}^{-1}_2 = \Psi^{-1}_2 + \Delta^{(3)}_j + \frac{\lambda_2 N^{(3)}_j}{\hat{\lambda}_2}\big(m_2 - \bar{x}^{(3)}_j\big)\big(m_2 - \bar{x}^{(3)}_j\big)^\top. \qquad (49)$$

The distribution of the Dirichlet process scaling parameter is approximated by

$$q_\alpha(\alpha) = Gam(\alpha \mid \hat{\eta}_1, \hat{\eta}_2), \qquad (50)$$

where

$$\hat{\eta}_1 = \eta_1, \qquad (51)$$

$$\hat{\eta}_2 = \eta_2 - \sum_{j=1}^{K-1} \mathbb{E}_{\setminus \alpha}[\ln(1 - \pi_j)]. \qquad (52)$$

The distribution of the Dirichlet process parameter is described by

$$q_\nu(\nu_j) = Beta\big(\nu_j \mid \hat{\beta}_{j,1}, \hat{\beta}_{j,2}\big), \qquad (53)$$

where

$$\hat{\beta}_{j,1} = \sum_{i=1}^{L} q(z_i > j) + 1, \qquad (54)$$

$$\hat{\beta}_{j,2} = \mathbb{E}[\alpha] + \sum_{i=1}^{L} q(z_i = j). \qquad (55)$$
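As a small illustration of how the DP-related updates above can be computed from the responsibilities, the sketch below transcribes equations (51)-(55) together with the Beta expectations (74)-(75) of the VBE step. It is a transcription of the updates as stated in this section (other treatments of the truncated DP assign the responsibilities to the first Beta parameter), it substitutes the $\nu_j$ expectation of (75) for the $\pi_j$ term in (52), and all variable names are illustrative.

```python
import numpy as np
from scipy.special import digamma

def dp_stick_updates(resp, eta1, eta2, E_alpha):
    """Truncated-DP variational updates as written in eqs. (51)-(55) and
    the Beta expectations of eqs. (74)-(75). resp[i, j] stands for q(z_i = j)."""
    L, K = resp.shape
    nj = resp.sum(axis=0)                                  # sum_i q(z_i = j)
    tail = resp[:, ::-1].cumsum(axis=1)[:, ::-1]           # sum_{k >= j} q(z_i = k)
    q_greater = (tail - resp).sum(axis=0)                  # sum_i q(z_i > j)

    beta1 = 1.0 + q_greater                                # eq. (54) as written
    beta2 = E_alpha + nj                                   # eq. (55) as written

    E_log_nu = digamma(beta1) - digamma(beta1 + beta2)     # eq. (74)
    E_log_1m_nu = digamma(beta2) - digamma(beta1 + beta2)  # eq. (75)

    eta1_hat = eta1                                        # eq. (51)
    eta2_hat = eta2 - E_log_1m_nu[: K - 1].sum()           # eq. (52)
    return beta1, beta2, eta1_hat, eta2_hat, E_log_nu, E_log_1m_nu

# Example with random responsibilities over K = 5 truncated components.
rng = np.random.default_rng(1)
resp = rng.dirichlet(np.ones(5), size=100)
b1, b2, e1, e2, _, _ = dp_stick_updates(resp, eta1=1.0, eta2=1.0, E_alpha=1.0)
print("Beta parameters per component:", np.round(b1, 2), np.round(b2, 2))
print("Gamma posterior over alpha   :", e1, round(e2, 2))
```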
The distribution of $q(A^{(j)})$ is described by

$$q(A^{(j)}) = \prod_{n=1}^{N} \mathcal{N}\big(a^{(j)}_n \mid \hat{\mu}^{(j)}_{A_n}, \hat{\sigma}^{(j)}_{A_n}\big), \qquad (56)$$

where

$$\hat{\sigma}^{(j)}_{A_n} = \Big( \hat{w}_0 \hat{\Psi}_{0, nn'} \sum_{t=2}^{T} \mathbb{E}\big[x^{(j)\top}_{t-1, n} x^{(j)}_{t-1, n}\big] + \sigma^{(j)}_A q(n, n) \Big), \qquad (57)$$

and

$$\hat{\mu}^{(j)}_{A_n} = \hat{\sigma}^{(j)-1}_{A_n} \Big( \sum_{n'=1}^{N} \hat{w}_0 \hat{\Psi}_{0, nn'} \sum_{t=2}^{T} \mathbb{E}\big[x^{(j)\top}_{t-1, n} x^{(j)}_{t, n'}\big] + \sum_{n'=1}^{N} \hat{w}_0 \hat{\Psi}_{0, nn'} \sum_{t=2}^{T} \mathbb{E}\big[x^{(j)\top}_{t-1, n} x^{(j)}_{t-1, n'}\big]\, a^{(j)}_{n'} + \sum_{n'=1}^{N} \sigma^{(j)}_A q(n, n')\, a^{(j)}_{n'} \Big). \qquad (58)$$

For $q(C^{(j)\top}_i)$,

$$q\big(C^{(j)\top}_i\big) = \mathcal{N}\big(C^{(j)\top}_i \mid \hat{\mu}^{(j)}_{C_i}, \hat{\Sigma}^{(j)}_{C_i}\big), \qquad (59)$$

where

$$\hat{\Sigma}^{(j)}_{C_i} = \mathbb{E}\big[x^{(j)}_t\big]\big(\mathbb{E}[r_j]\, y_{it} - \mathbb{E}[r_j \mu_j]\big), \qquad (60)$$

and

$$\hat{\mu}^{(j)}_{C_i} = \hat{\Sigma}^{(j)-1}_{C_i}\Big( \sum_{t=1}^{T} q(z_i = j)\, \mathbb{E}\big[x^{(j)}_t x^{(j)\top}_t\big]\, \mathbb{E}[r_j] + \Sigma^{(j)}_{C_i} \Big). \qquad (61)$$

Finally, the distribution of the label field is approximated by

$$\ln q_z(z_{i'} = j) = \ln p(z_{i'} = j \mid \tilde{z}_{\delta_{i'}}, \beta) + \sum_{j=1}^{K} q(z_{i'} = j) \times \Big[ \mathbb{E}\big( x^{(j)\top}_1 S_j x^{(j)}_1 - \delta^{\top}_j S_j x^{(j)}_1 - x^{(j)\top}_1 S_j \delta_j + \delta^{\top}_j S_j \delta_j \big) + \sum_{t=2}^{T} \mathbb{E}\big( x^{(j)\top}_t Q_j x^{(j)}_t - x^{(j)\top}_{t-1} A^{(j)\top} Q_j x^{(j)}_t - x^{(j)\top}_t Q_j A^{(j)} x^{(j)}_{t-1} + x^{(j)\top}_{t-1} A^{(j)\top} Q_j A^{(j)} x^{(j)}_{t-1} \big) + \sum_{t=1}^{T} \mathbb{E}\big( y^{\top}_{i't} R_j y_{i't} - x^{(j)\top}_t C^{(j)\top} R_j y_{i't} - y^{\top}_{i't} R_j C^{(j)} x^{(j)}_t + x^{(j)\top}_t C^{(j)\top} R_j C^{(j)} x^{(j)}_t \big) + \sum_{t=1}^{T} \mathbb{E}\big( y^{\top}_{i't} \Sigma_j y_{i't} - \mu^{\top}_j \Sigma_j y_{i't} - y^{\top}_{i't} \Sigma_j \mu_j + \mu^{\top}_j \Sigma_j \mu_j \big) - \mathbb{E}\big(\ln|S_j|\big) - (T - 1)\, \mathbb{E}\big(\ln|Q_j|\big) + T\, \mathbb{E}\big(\ln \nu_j\big) - T\, \mathbb{E}\big(\ln|R_j|\big) - T\, \mathbb{E}\big(\ln \Sigma_j\big) \Big] + \sum_{t=1}^{T} \sum_{j=1}^{K} q(z_{i'} > j)\, \mathbb{E}\big(\ln(1 - \nu_j)\big). \qquad (62)$$

In each iteration of variational Bayesian inference, after updating equations (33)-(62), the estimate $\hat{z}_i$ must be updated as the last step. For this purpose we update $\hat{z}_i$ by maximizing $q_z(z_i = j)$ over $j$:

$$\hat{z}_i = \arg\max_{j=1:K} q_z(z_i = j). \qquad (63)$$
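The point-wise MRF term of equation (17) and the hard assignment of equation (63) can be illustrated on a label grid as follows. The sketch is a simplified stand-in written for this manuscript: it uses a 4-connected neighbourhood with periodic borders, omits the singleton potentials of equation (18), and the potential constants are arbitrary choices rather than values from the paper.

```python
import numpy as np

def mrf_pointwise_prior(z_hat, beta, K, gamma_same=-1.0, gamma_diff=1.0):
    """Point-wise MRF probabilities in the spirit of eq. (17) on a 2-D label
    grid, using only doubleton potentials as in eq. (19). np.roll gives a
    periodic border, which is an illustrative simplification."""
    H, W = z_hat.shape
    energy = np.zeros((H, W, K))
    for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        nb = np.roll(np.roll(z_hat, dy, axis=0), dx, axis=1)
        for j in range(K):
            energy[:, :, j] += np.where(nb == j, gamma_same, gamma_diff)
    logits = -beta * energy
    logits -= logits.max(axis=2, keepdims=True)        # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=2, keepdims=True)

def hard_labels(q_z):
    """Hard assignment of eq. (63): z_hat_i = argmax_j q_z(z_i = j)."""
    return q_z.argmax(axis=-1)

# Demo: smooth a noisy two-region label grid with the MRF prior alone.
rng = np.random.default_rng(2)
z = (np.arange(20)[:, None] > 9).astype(int) * np.ones((20, 20), dtype=int)
z[rng.random((20, 20)) < 0.1] = rng.integers(0, 2)      # inject label noise
q = mrf_pointwise_prior(z, beta=1.0, K=2)
print("labels changed by one smoothing pass:", int((hard_labels(q) != z).sum()))
```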
In the variational Bayesian E-step, the expected values of the parameters are computed; the equations of this step are given below.

Variational Bayesian E-step (VBE): In this step, the following sufficient statistics are required:

$$\mathbb{E}_{q_Q}[Q_j] = \hat{w}_0 \hat{\Psi}_0, \qquad (64)$$
$$\mathbb{E}_{q_{\mu,r}}[r_j] = \hat{w}_1 \hat{\Psi}_1, \qquad (65)$$
$$\mathbb{E}_{q_{\delta,S}}[S_j] = \hat{w}_2 \hat{\Psi}_2, \qquad (66)$$
$$\mathbb{E}_{q_{\mu,r}}[r_j \mu_j] = \hat{w}_1 \hat{\Psi}_1 \hat{m}_1, \qquad (67)$$
$$\mathbb{E}_{q_{\delta,S}}[S_j \delta_j] = \hat{w}_2 \hat{\Psi}_2 \hat{m}_2, \qquad (68)$$
$$\mathbb{E}_{q_\alpha}[\ln \alpha] = \psi(\hat{\eta}_1) - \ln \hat{\eta}_2, \qquad (69)$$
$$\mathbb{E}_{q_\pi}[\ln(1 - \pi_j)] = \psi(\hat{\eta}_2) - \psi(\hat{\eta}_1 + \hat{\eta}_2), \qquad (70)$$
$$\mathbb{E}_{q_Q}[\ln|Q_j|] = N \ln 2 + \ln\big|\hat{\Psi}_0\big| + \sum_{n=1}^{N} \psi\Big(\frac{\hat{w}_0 - n + 1}{2}\Big), \qquad (71)$$
$$\mathbb{E}_{q_{\mu,r}}[\ln|r_j|] = \ln 2 + \ln\big|\hat{\Psi}_1\big| + \psi\Big(\frac{\hat{w}_1}{2}\Big), \qquad (72)$$
$$\mathbb{E}_{q_{\delta,S}}[\ln|S_j|] = N \ln 2 + \ln\big|\hat{\Psi}_2\big| + \sum_{n=1}^{N} \psi\Big(\frac{\hat{w}_2 - n + 1}{2}\Big), \qquad (73)$$
$$\mathbb{E}_{q_\nu}[\ln \nu_j] = \psi\big(\hat{\beta}_{j,1}\big) - \psi\big(\hat{\beta}_{j,1} + \hat{\beta}_{j,2}\big), \qquad (74)$$
$$\mathbb{E}_{q_\nu}[\ln(1 - \nu_j)] = \psi\big(\hat{\beta}_{j,2}\big) - \psi\big(\hat{\beta}_{j,1} + \hat{\beta}_{j,2}\big), \qquad (75)$$

where $\psi(\cdot)$ denotes the digamma function. Moreover, we need the expected values

$$\mathbb{E}\big[x^{(j)}_t\big], \qquad (76)$$
$$\mathbb{E}\big[x^{(j)}_t x^{(j)\top}_t\big], \qquad (77)$$
$$\mathbb{E}\big[x^{(j)}_t x^{(j)\top}_{t-1}\big]. \qquad (78)$$

For this purpose, we use the RTSS algorithm on the variational Bayesian linear dynamical system (VBLDS) model, which is described in the next section.

3.2 Rauch-Tung-Striebel smoother (RTSS) algorithm

In this section, the RTSS algorithm for the IGDTM is described. The RTSS method contains two steps: a forward recursion and a backward recursion. In the forward recursion we define $\alpha_t(x^{(j)}_t)$ to be the posterior over the hidden states at time $t$ given the observed data up to and including time $t$, i.e.

$$\alpha_t(x^{(j)}_t) \equiv p\big(x^{(j)}_t \mid \bar{y}^{(j)}_{1:t}\big), \qquad (79)$$

in which

$$\bar{y}^{(j)}_{t'} = \frac{\sum_{i=1}^{L} q(z_i = j)\, y_{it'}}{\widetilde{N}_{ij}}, \qquad t' \in [1, T],\ j \in [1, K], \qquad (80)$$

and

$$\widetilde{N}_{ij} = \sum_{i=1}^{L} q(z_i = j). \qquad (81)$$

It can be shown that $\alpha_t(x^{(j)}_t) = \mathcal{N}(x^{(j)}_t \mid \dot{\mu}_t, \dot{\Sigma}^{-1}_t)$, in which $\dot{\mu}_t$ and $\dot{\Sigma}_t$ are the mean and precision matrix respectively, defined as

$$\dot{\mu}_t = \dot{\Sigma}^{-1}_t \Big( \frac{1}{\widetilde{N}_{ij}} \sum_{i=1}^{L} q(z_i = j)\, C^{(j)\top}_i r_j \big(\bar{y}^{(j)}_t - \mu_j\big) + Q^{\top}_j A^{(j)} \ddot{\Sigma}^{-\top}_{t-1} \dot{\Sigma}_{t-1} \dot{\mu}_{t-1} \Big), \qquad (82)$$

$$\dot{\Sigma}_t = \Big( Q_j + \frac{1}{\widetilde{N}_{ij}} \sum_{i=1}^{L} q(z_i = j)\, C^{(j)\top}_i r_j\, \frac{1}{\widetilde{N}_{ij}} \sum_{i=1}^{L} q(z_i = j)\, C^{(j)}_i - Q^{\top}_j A^{(j)} \ddot{\Sigma}^{-\top}_{t-1} A^{(j)\top} Q_j \Big), \qquad (83)$$

in which $(\cdot)^{-\top}$ denotes the inverse transpose of a matrix, and $\ddot{\Sigma}_{t-1}$ is defined as

$$\ddot{\Sigma}_{t-1} = \big( \dot{\Sigma}_{t-1} + A^{(j)\top} Q_j A^{(j)} \big). \qquad (84)$$

The derivations can be found in Appendix B.
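For readers unfamiliar with the forward messages $\alpha_t(x^{(j)}_t)$, the sketch below shows the standard Kalman-filter forward pass in its conventional covariance form. It is not the precision-form recursion of equations (82)-(84), which additionally weights the observations by the responsibilities as in equation (80); it only illustrates the role the forward pass plays, with all quantities chosen arbitrarily.

```python
import numpy as np

def kalman_forward(y, A, C, Q, R, mu0, P0):
    """Standard Kalman-filter forward pass (covariance form), shown only to
    illustrate the forward messages of eq. (79)."""
    T, _ = y.shape
    N = A.shape[0]
    mus, Ps = np.zeros((T, N)), np.zeros((T, N, N))
    mu, P = mu0, P0
    for t in range(T):
        if t > 0:                                   # predict
            mu = A @ mu
            P = A @ P @ A.T + Q
        S = C @ P @ C.T + R                         # innovation covariance
        K = P @ C.T @ np.linalg.inv(S)              # Kalman gain
        mu = mu + K @ (y[t] - C @ mu)               # filtered mean
        P = (np.eye(N) - K @ C) @ P                 # filtered covariance
        mus[t], Ps[t] = mu, P
    return mus, Ps

# Demo on a scalar random walk observed in noise.
rng = np.random.default_rng(3)
T = 50
x = np.cumsum(rng.normal(0, 0.1, T))
y = (x + rng.normal(0, 0.5, T)).reshape(-1, 1)
mus, _ = kalman_forward(y, A=np.eye(1), C=np.eye(1),
                        Q=0.01 * np.eye(1), R=0.25 * np.eye(1),
                        mu0=np.zeros(1), P0=np.eye(1))
print("final filtered estimate vs truth:", float(mus[-1, 0]), float(x[-1]))
```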
In the backward recursion we define $\beta_{t-1}(x^{(j)}_{t-1})$ to be

$$\beta_{t-1}\big(x^{(j)}_{t-1}\big) = p\big(\bar{y}^{(j)}_{t:T} \mid x^{(j)}_{t-1}\big). \qquad (85)$$

In Appendix B, it is shown that $\beta_{t-1}(x^{(j)}_{t-1}) = \mathcal{N}(x^{(j)}_{t-1} \mid \eta_{t-1}, \psi^{-1}_{t-1})$, in which

$$\psi_{t-1} = \big( A^{(j)\top} Q_j A^{(j)} - A^{(j)\top} Q^{\top}_j \psi'^{-\top}_t Q_j A^{(j)} \big), \qquad (86)$$

$$\eta_{t-1} = \psi^{-1}_{t-1} A^{(j)\top} Q^{\top}_j \psi'^{-\top}_t \Big( \psi_t \eta_t + \frac{1}{\widetilde{N}_{ij}} \sum_{i=1}^{L} q(z_i = j)\, C^{(j)\top}_i r_j \big(\bar{y}^{(j)}_t - \mu_j\big) \Big), \qquad (87)$$

where $\psi'_t$ is defined as

$$\psi'_t = \Big( Q_j + \psi_t + \frac{1}{\widetilde{N}_{ij}} \sum_{i=1}^{L} q(z_i = j)\, C^{(j)\top}_i r_j\, \frac{1}{\widetilde{N}_{ij}} \sum_{i=1}^{L} q(z_i = j)\, C^{(j)}_i \Big). \qquad (88)$$

It is then straightforward to show that

$$\mathbb{E}\big[x^{(j)}_t\big] = \omega_t, \qquad (89)$$
$$\mathbb{E}\big[x^{(j)}_t x^{(j)\top}_t\big] = \Gamma^{-1}_{t,t} + \omega_t \omega^{\top}_t, \qquad (90)$$
$$\mathbb{E}\big[x^{(j)}_t x^{(j)\top}_{t-1}\big] = A^{(j)} \big( \Gamma^{-1}_{t-1, t-1} + \omega_{t-1} \omega^{\top}_{t-1} \big), \qquad (91)$$

in which

$$\Gamma_{t,t} = \big( \dot{\Sigma}_t + A^{(j)\top} Q_j A^{(j)} + \psi_t \big), \qquad (92)$$
$$\Gamma_{t+1, t+1} = \Big( Q_j + \frac{1}{\widetilde{N}_{ij}} \sum_{i=1}^{L} q(z_i = j)\, C^{(j)\top}_i R_j\, \frac{1}{\widetilde{N}_{ij}} \sum_{i=1}^{L} q(z_i = j)\, C^{(j)}_i \Big), \qquad (93)$$
$$\Gamma_{t+1, t} = \big( Q_j A^{(j)} \big), \qquad (94)$$
$$\Gamma_{t, t+1} = \big( A^{(j)\top} Q_j \big), \qquad (95)$$
$$X = \big( \Gamma^{-1}_{t, t+1} \Gamma_{t,t} - \Gamma^{-1}_{t+1, t+1} \Gamma_{t+1, t} \big), \qquad (96)$$
$$\omega_t = X^{-1} \Big( \Gamma^{-1}_{t, t+1} \big( \dot{\Sigma}_t \dot{\mu}_t + \psi_t \eta_t \big) - \Gamma^{-1}_{t+1, t+1} \Big( \frac{1}{\widetilde{N}_{ij}} \sum_{i=1}^{L} q(z_i = j)\, C^{(j)\top}_i r_j \bar{y}^{(j)}_{t+1} \Big) \Big), \qquad (97)$$
$$\omega_{t+1} = X^{-1} \Big( \Gamma^{-1}_{t,t} \big( \dot{\Sigma}_t \dot{\mu}_t + \psi_t \eta_t \big) - \Gamma^{-1}_{t+1, t} \Big( \frac{1}{\widetilde{N}_{ij}} \sum_{i=1}^{L} q(z_i = j)\, C^{(j)\top}_i r_j \bar{y}^{(j)}_{t+1} \Big) \Big). \qquad (98)$$

The derivations can be found in Appendix B.

4 Implementation

We explained the infinite generative dynamic texture model (IGDTM) in the previous sections. In this section, considering Tables 1 and 2 and the equations derived previously, we explain the steps of the proposed method. The process diagram of our method is presented in Figure 5. First, the input video sequences are de-interlaced using an algorithm such as the spatiotemporal median filter. Then, the IGDTM is applied to the features, and the label field results as the output.

The flowchart of the IGDTM, with respect to the equations derived in the previous sections, is shown in Figure 6. First, the hyper-parameters are initialized randomly. Then, while the convergence condition is not satisfied, the variational Bayesian EM algorithm (composed of the VBE step, the RTSS algorithm, and the VBM step) is performed for all of the LDSs. The VBEM steps are surrounded by the red rectangle. Finally, the label field is computed by the steps surrounded by the blue rectangle. Figure 7 illustrates the process of IGDTM segmentation for a video frame containing four segments, with the initial value K = 7.

5 Experimental results

Here the results of the IGDTM, the DTM [4], the LDT [15] and SAW [13] are compared. We evaluated the proposed method on three types of textures: microscopic, macroscopic and objects. For this goal, we used the UCSD Synthdb [4], Dyntex [38] and UCSD Pedestrian [4] datasets. UCSD Synthdb contains synthetic video sequences consisting of microscopic and macroscopic textures such as smoke, fire, sea water, vegetation, escalators, etc.
Dyntex is a comprehensive database of dynamic textures, providing a large and diverse set of high-quality dynamic texture sequences, which have been deinterlaced with a spatiotemporal median filter.
[Process diagram: input video sequence → deinterlacing → IGDTM segmentation → output label field.]
Fig. 5: Process diagram of the IGDTM.

Table 3: Properties of the datasets used in the experiments
Dataset | Resolution | Textures
UCSD Synthdb | 160 × 110 | Multiple co-occurring textures in a single video
Dyntex | 352 × 288 | Sea, grass, trees, vegetation, calm water, fountains, smoke, traffic, etc.
UCSD Pedestrian | 238 × 158 | Pedestrians on UCSD walkways
UCSD Pedestrian contains videos of pedestrians on UCSD walkways, from two viewpoints, taken with a stationary camera. Table 3 lists the properties of the datasets in detail. As mentioned before, the DTM uses an initial contour and the LDT uses the DTM result as the initial partition. Figure 8 illustrates the initial contour of the DTM for the UCSD Synthdb segmentation. Moreover, the ground truth of the UCSD Synthdb videos with 2 and 3 segments is shown in Figure 9.

Figure 10 illustrates the results of the IGDTM, the DTM [4] and the LDT [15] on the UCSD Synthdb dataset for two dynamic textures. The Rand index (r-index) of the segmentation results is given below the images. The number of dynamic textures estimated by the IGDTM, denoted by the "IGDTM Seg no" tag, is given below the IGDTM results. Figure 11 illustrates the results of the IGDTM, DTM and LDT on the UCSD Synthdb dataset for three dynamic textures. Similar to the previous figure, the Rand index of the segmentation results is given below the images.
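The figures report the Rand index (r-index) between the estimated label field and the ground truth. For reference, a minimal, unoptimized sketch of the plain Rand index between two label maps is given below (an illustration, not the evaluation code used for the reported numbers).

```python
import numpy as np
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Plain Rand index between two flat label maps: the fraction of pixel
    pairs on which the two segmentations agree (same-cluster vs different-
    cluster). O(n^2) pairs, so intended for small examples only."""
    a, b = np.asarray(labels_a).ravel(), np.asarray(labels_b).ravel()
    agree = 0
    pairs = list(combinations(range(a.size), 2))
    for i, j in pairs:
        agree += (a[i] == a[j]) == (b[i] == b[j])
    return agree / len(pairs)

# Toy example: a 4-pixel ground truth versus a prediction with one error.
print(rand_index([0, 0, 1, 1], [0, 0, 1, 0]))   # 0.5
```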
Fig. 6: Flowchart of the IGDTM. [Recovered flowchart steps: Start; read the images and extract the observations; initialize the hyper-parameters randomly; while not converged, for each LDS j = 1, ..., K perform the variational Bayesian expectation for the j-th LDS (eq. (64)-(75)), the forward-backward algorithm for the j-th LDS (1. compute the forward messages, eq. (79)-(84); 2. compute the backward messages, eq. (85)-(88); 3. compute the first-order and second-order expectations of the states, eq. (89)-(91)), and the variational Bayesian maximization for the j-th LDS (eq. (33)-(62)); then compute {q_z(z_i = j)} and set ẑ_i = arg max_{j=1:K} q_z(z_i = j); End.]
Fig. 7: IGDTM segmentation for K = 7. Red plate: the parameters of the K LDSs and their corresponding label fields are initialized randomly. Green plate: after performing the IGDTM, the number of textures is determined automatically; the non-black label fields illustrate the corresponding LDSs which are retained for segmentation.
Fig. 8: Initial contours of the DTM [4]: (a) initial contour of synthdb 2K, (b) initial contour of synthdb 3K.
Fig. 9: Ground truth of the UCSD Synthdb dataset: (a) ground truth of synthdb 2K, (b) ground truth of synthdb 3K.

Table 4: Rand index comparison for some approaches
Method | Synthdb 2K | Synthdb 3K
LDT [15] | 0.942 | 0.921
DTM [4] | 0.892 | 0.841
IGDTM | |
As mentioned before, the segmentation process in the DTM and the LDT is done subjectively using expert knowledge, which is a drastic limitation for systematic approaches on large databases. The IGDTM eliminates this restriction: the method finds the optimum number of textures automatically. Figures 12 and 11 illustrate two results in which the IGDTM over-segmented the video sequence; these videos are segmented with one extra texture.

Table 4 shows the segmentation results of the IGDTM, LDT and DTM on the Synthdb dataset quantitatively, based on the average R-index.

Figure 14 illustrates the segmentation results of the IGDTM in comparison with the methods proposed in [17] and [13]. As the results show, despite eliminating the initial contour and expert knowledge, the IGDTM solves the segmentation problem properly. Figure 13 illustrates the segmentation results of the IGDTM on some of the video sequences of the Dyntex dataset qualitatively.

Ultimately, we present segmentation results for some object-based dynamic textures. Figure 16 shows the segmentation results of two object-based dynamic textures from the Dyntex dataset; the IGDTM segmentation results are compared with the method proposed in [39]. As the results show, the IGDTM produces segmentation results with smoother boundaries.
Fig. 10: Results on the UCSD Synthdb database (synthdb 2K), left to right: a frame of the original video, the segmentation result of the DTM [4], the segmentation result of the LDT [15], the segmentation result of the IGDTM; the r-index and IGDTM Seg no of the segmentation are given below the images.

Moreover, Figure 17 illustrates the segmentation results for some of the video sequences from the UCSD Pedestrian dataset, compared with the ground truth available in the dataset. We set the value of K to 20.

[Fig. 11 panel annotations: r = 0.6588, 0.6820, 0.7483 (IGDTM Seg no = 3); r = 0.5951, 0.6163, 0.7893 (IGDTM Seg no = 3); r = 0.6101, 0.6272, 0.7276 (IGDTM Seg no = 3).]
Fig. 11: Results on the UCSD Synthdb database (synthdb 3K), left to right: a frame of the original video, the segmentation result of the DytexMicIC [4], the segmentation result of the LDT [15], the segmentation result of the IGDTM; the r-index and IGDTM Seg no of the segmentation are given below the images.
6 Conclusion

The aim of this paper was to propose a dynamic texture segmentation approach that eliminates the need for expert knowledge about the number of dynamic textures and for the initial partitioning required by previous methods. For this goal, we introduced a novel fully Bayesian non-parametric formulation of generative dynamic texture models for robust unsupervised dynamic texture segmentation. By deriving a suitable posterior distribution over the number of textures, the proposed model resolves the problem of determining the proper number of texture segments automatically. Because variational Bayesian approximation methods perform better than Monte Carlo techniques in terms of speed, we used variational Bayesian EM for inference. In order to compute the first-order and second-order sufficient statistics of the hidden states, a variational Bayesian Rauch-Tung-Striebel smoother (RTSS) was applied. Finally, a comprehensive comparison of the proposed method with state-of-the-art methods on the three dynamic texture categories (i.e. microscopic, macroscopic and object-based dynamic textures) was presented.
Fig. 12: Segmentation results for a video sequence of UCSD synthdb 2K: (a) a video frame; segmentation with (b) Ising initialized using the DTM [1], (c) the DTM, (d) the LDT, (e) the IGDTM; the r-index and IGDTM Seg no of the segmentation are given below the images.

[Panel annotations: r = 0.5373, r = 0.6549, r = 0.6563, r = 0.7672; IGDTM Seg no = 4.]
Fig. 13: Segmentation results for a video sequence of UCSD synthdb 3K: (a) a video frame of UCSD Synthdb 3K; segmentation with (b) GPCA [9], (c) the DTM, (d) the LDT, (e) the IGDTM; the r-index and IGDTM Seg no of the segmentation are given below the images.
[Fig. 14 panel annotations: IGDTM Seg no = 2, 2, 2.]
Fig. 14: Segmentation results for the Dyntex dataset, from left to right: a frame of the video, segmentation with the method proposed in [17], SAW [13], and the IGDTM; the number of DTs estimated by the IGDTM is shown below the results.

A Variational Bayesian M-step
As mentioned before, the variational posterior is expected to take the same form as $p(\Psi \mid \Xi, y)$ [37]. Since we defined conjugate exponential priors, the variational posterior will belong to the exponential family. It can be shown that the best distribution $q^*_j$ for each of the factors $q_j$ (in terms of minimizing the KL divergence described before in equations (27)-(28)) is expressed as

$$\ln q^*_\Psi(\Psi_j \mid y, \Xi) = \mathbb{E}_{\setminus j}[\ln p(\Psi, \Xi, y)] + \text{const}, \qquad (99)$$

where $\mathbb{E}_{\setminus j}[\cdot]$ is the expectation taken over all variables not in the partition [37]. In this appendix, the distribution $q^*_j$ of each factor is computed.

A.1 Estimation of the distribution $q(z_{i'})$

According to what was explained before, for estimating $q(z_{i'})$ we can write

$$P\big(z_i \mid x_{1:T}, z_{\setminus i}, \pi_{1:K}, \alpha, \delta_{1:K}, S_{1:K}, Q_{1:K}, \mu_{1:K}, \Sigma_{1:K}, R_{1:K}, \Xi\big) = \prod_{j=1}^{K} \mathcal{N}\big(x^{(j)}_1 \mid \delta_j, S^{-1}_j\big)^{I(z_i = j)} \prod_{t=2}^{T} \mathcal{N}\big(x^{(j)}_t \mid A^{(j)} x^{(j)}_{t-1}, Q^{-1}_j\big)^{I(z_i = j)} \times \prod_{t=1}^{T} \prod_{j=1}^{K} \Big( \pi_j(\nu)\, \mathcal{N}\big(y_{i't} \mid C^{(j)} x^{(j)}_t, R^{-1}_j\big)\, \mathcal{N}\big(y_{it} \mid \mu_j, \Sigma^{-1}_j\big) \Big)^{I(z_i = j)}. \qquad (100)$$

Fig. 15: Segmentation results of the IGDTM for the Dyntex dataset, left to right: a frame of the video sequence, the segmentation result of the IGDTM; the number of DTs estimated by the IGDTM is shown below the results. [Panel annotations: IGDTM Seg no = 2, 2, 3, 3, 4.]

It is easy to show that by substituting (100) into (99) and doing some calculations, the distribution of $q(z_i)$ is obtained via (62). By defining $\bar{p}_{i,j} = \mathbb{E}[z_i = j]$, we can derive $q(z_i > j) = \sum_{k=j+1}^{K} \bar{p}_{i,k}$.

A.2 Estimation of the distribution $q(\nu_j)$

The derivation of $q(\nu_j)$ is similar to the above. We can write

Fig. 16: Segmentation results for the Dyntex dataset, from left to right: a frame of the video, segmentation with the method proposed in [39] and with the IGDTM; the number of DTs estimated by the IGDTM is shown below the results. [Panel annotations: IGDTM Seg no = 2, 2, 3.]

$$P\big(\nu_j \mid \nu_{\setminus j}, x_{1:T}, z_{1:L}, \alpha, \delta_{1:K}, S_{1:K}, Q_{1:K}, \mu_{1:K}, r_{1:K}, A_{1:K}, C_{1:K}, \Xi\big) = \prod_{i=1}^{L} P\big(z_i = j \mid \pi_j(\nu)\big)^{I(z_i = j)}\, P(\nu_j \mid \alpha). \qquad (101)$$

According to equation (99), we can derive

$$q_\nu(\nu_j) \propto \exp\!\left( \Big( \mathbb{E}[\alpha] + \sum_{i=1}^{L} q(z_i = j) - 1 \Big) \ln \nu_j + \sum_{i=1}^{L} q(z_i > j)\, \ln(1 - \nu_j) \right). \qquad (102)$$

It is easy to show that by simplifying (102) the distribution of $q(\nu_j)$ is a Beta distribution and can be rewritten as (53)-(55).

A.3 Estimation of the distribution $q(\delta_j, S_j)$

Similar to the previous sections, we can write

$$P\big(\delta_j, S_j \mid x_{1:T}, z_{1:L}, \pi_{1:K}, \alpha, \delta_{\setminus j}, S_{\setminus j}, Q_{1:K}, \mu_{1:K}, r_{1:K}, A_{1:K}, C_{1:K}\big) = \mathcal{N}\big(x^{(j)}_1 \mid \delta_j, S^{-1}_j\big)\, \mathcal{NW}(\delta_j, S_j \mid m_2, \lambda_2, w_2, \Psi_2). \qquad (103)$$

Therefore, with respect to equation (99), we can derive

$$q(\delta_j, S_j) = \exp \mathbb{E}_{\setminus \delta_j, S_j}\!\left( \ln \mathcal{N}\big(x^{(j)}_1 \mid \delta_j, S^{-1}_j\big) + \ln \mathcal{NW}(\delta_j, S_j \mid m_2, \lambda_2, w_2, \Psi_2) \right). \qquad (104)$$

As described before, the distribution $q(\delta_j, S_j)$ will be Normal-Wishart. By simplifying equation (104), the distribution of $q(\delta_j, S_j)$ is obtained via (42)-(49).
Fig. 17: Segmentation results for the UCSD Pedestrian dataset, from top to bottom: a frame of the video, the ground truth, the segmentation with the IGDTM; the number of DTs estimated by the IGDTM is shown below the results. [Panel annotations: IGDTM Seg no = 2, 3, 2, 2, 3.]

A.4 Estimation of the distribution $q(\mu_j, r_j)$

Similar to the previous sections, we can derive

$$P\big(\mu_j, r_j \mid x_{1:T}, z_{1:L}, \pi_{1:K}, \alpha, \delta_{1:K}, S_{1:K}, Q_{1:K}, \mu_{\setminus j}, r_{\setminus j}, A_{1:K}, C_{1:K}, \Xi\big) = \prod_{t=1}^{T} \prod_{i=1}^{L} \mathcal{N}\big(y_{it} \mid C^{(j)}_i x^{(j)}_t + \mu_j, r^{-1}_j\big)^{I(z_i = j)}\, \mathcal{NW}(\mu_j, r_j \mid m_1, \lambda_1, w_1, \Psi_1). \qquad (105)$$

Using equations (99) and (105), the distribution of $q(\mu_j, r_j)$ can be written as

$$q(\mu_j, r_j) = \exp \mathbb{E}_{\setminus \mu_j, r_j}\!\left[ \sum_{t=1}^{T} \sum_{i=1}^{L} I(z_i = j)\, \ln \mathcal{N}\big(y_{it} \mid C^{(j)}_i x^{(j)}_t + \mu_j, r^{-1}_j\big) + \ln \mathcal{NW}(\mu_j, r_j \mid m_1, \lambda_1, w_1, \Psi_1) \right]. \qquad (106)$$

By simplifying equation (106), the distribution of $q(\mu_j, r_j)$ is derived as a Normal-Wishart and is given by (36)-(41).

A.5 Estimation of the distribution $q(Q_j)$

In order to compute the distribution of $q(Q_j)$, we can write

$$P\big(Q_j \mid x_{1:T}, z_{1:L}, \pi_{1:K}, \alpha, \delta_{1:K}, S_{1:K}, Q_{\setminus j}, \mu_{1:K}, r_{1:K}, A_{1:K}, C_{1:K}, \Xi\big) = \prod_{t=2}^{T} \mathcal{N}\big(x^{(j)}_t \mid A^{(j)} x^{(j)}_{t-1}, Q^{-1}_j\big)\, \mathcal{W}(Q_j \mid w_0, \Psi_0). \qquad (107)$$

So, with respect to equation (99) and simplifying equation (107), the distribution of $q(Q_j)$ is a Wishart distribution, $q(Q_j) = \mathcal{W}(Q_j \mid \hat{w}_0, \hat{\Psi}_0)$, whose parameters are given by (33)-(34).

A.6 Estimation of the distribution $q(\alpha)$

Similar to the previous sections, we have

$$P\big(\alpha \mid x_{1:T}, z_{1:L}, \pi_{1:K}, \delta_{1:K}, S_{1:K}, Q_{1:K}, \mu_{1:K}, r_{1:K}, A_{1:K}, C_{1:K}, \Xi\big) = \prod_{j=1}^{K-1} Beta(\pi_j \mid 1, \alpha)\, Gam(\alpha \mid \eta_1, \eta_2). \qquad (108)$$

According to equations (99), (74) and what was described before, the distribution of $q(\alpha)$ can be written as

$$q(\alpha) \propto \exp\!\left( -\alpha \Big( \eta_2 - \sum_{j=1}^{K-1} \mathbb{E}[\ln(1 - \pi_j)] \Big) + (\eta_1 - 1) \ln \alpha \right). \qquad (109)$$
By simplifying the above equation, the distribution of $q(\alpha)$ is a Gamma distribution and is obtained via (50)-(52).

A.7 Estimation of the distribution $q(a^{(j)}_n)$

Similar to the previous sections, we have

$$P\big(a^{(j)}_n \mid x_{1:T}, z_{1:L}, \delta_{1:K}, S_{1:K}, Q_{1:K}, \mu_{1:K}, r_{1:K}, a_{\setminus j \setminus n}, C_{1:K}, \Xi\big) = \prod_{t=2}^{T} \mathcal{N}\big(x^{(j)}_t \mid A^{(j)} x^{(j)}_{t-1}, Q^{-1}_j\big) \prod_{n=1}^{N} \mathcal{N}\big(a^{(j)}_n \mid 0, \sigma^{(j)}_A\big), \qquad (110)$$

therefore

$$q\big(a^{(j)}_n\big) \propto \exp \mathbb{E}_{\setminus a^{(j)}_n}\!\left[ \sum_{t=2}^{T} \ln \mathcal{N}\big(x^{(j)}_t \mid A^{(j)} x^{(j)}_{t-1}, Q^{-1}_j\big) + \sum_{n=1}^{N} \ln \mathcal{N}\big(a^{(j)}_n \mid 0, \sigma^{(j)}_A\big) \right], \qquad (111)$$

$$q\big(a^{(j)}_n\big) \propto \exp \mathbb{E}_{\setminus a^{(j)}_n}\!\Big[ \sum_{n=1}^{N} a^{(j)\top}_n \Big( \hat{w}_0 \hat{\Psi}_{0, nn'} \sum_{t=2}^{T} \mathbb{E}\big[x^{(j)\top}_{t-1, n} x^{(j)}_{t-1, n}\big] + \sigma^{(j)}_A q(n, n) \Big) a^{(j)}_n - \sum_{n=1}^{N} a^{(j)\top}_n \Big( \sum_{n'=1}^{N} \hat{w}_0 \hat{\Psi}_{0, nn'} \sum_{t=2}^{T} \mathbb{E}\big[x^{(j)\top}_{t-1, n} x^{(j)}_{t, n'}\big] + \sum_{n'=1}^{N} \hat{w}_0 \hat{\Psi}_{0, nn'} \sum_{t=2}^{T} \mathbb{E}\big[x^{(j)\top}_{t-1, n} x^{(j)}_{t-1, n'}\big] a^{(j)}_{n'} + \sum_{n'=1}^{N} \sigma^{(j)}_A q(n, n')\, a^{(j)}_{n'} \Big) \Big]. \qquad (112)$$

According to equation (9) and expanding equation (112), equations (56) to (58) are derived.

A.8 Estimation of the distribution $q(C^{(j)}_i)$

Similar to the previous sections, we have

$$P\big(C^{(j)}_i \mid x_{1:T}, z_{1:L}, \pi_{1:K}, \delta_j, S_j, Q_{1:K}, \mu_{1:K}, r_{1:K}, a_{1:K}, C_{\setminus j}\big) = \prod_{t=1}^{T} \mathcal{N}\big(y_{it} \mid C^{(j)}_i x^{(j)}_t + \mu_j, r^{-1}_j\big)^{I(z_i = j)}\, \mathcal{N}\big(C^{(j)}_i \mid 0, \Sigma^{(j)}_C\big), \qquad (113)$$

therefore

$$q\big(C^{(j)}_i\big) \propto \sum_{t=1}^{T} q(z_i = j) \Big( C^{(j)}_i \mathbb{E}\big[x^{(j)}_t x^{(j)\top}_t\big]\, \hat{w}_1 \hat{\Psi}_1 C^{(j)\top}_i - C^{(j)}_i \mathbb{E}\big[x^{(j)}_t\big]\big( \hat{w}_1 \hat{\Psi}_1 y_{it} - \hat{w}_1 \hat{\Psi}_1 \hat{m}_1 \big) - \big( y^{\top}_{it} \hat{w}_1 \hat{\Psi}_1 - \hat{m}^{\top}_1 \hat{w}_1 \hat{\Psi}_1 \big) \mathbb{E}\big[x^{(j)\top}_t\big] C^{(j)\top}_i \Big) - \Big( C^{(j)}_i \Sigma^{(j)}_{C_i} C^{(j)\top}_i \Big). \qquad (114)$$

According to equation (10) and expanding equation (114), equations (59) to (61) are derived.

B Inference of the Rauch-Tung-Striebel smoother (RTSS)
As mentioned before, the RTSS algorithm contains two steps: a forward recursion and a backward recursion. In the forward step, forward messages propagate along the LDS. The forward messages are defined as $\alpha_t(x^{(j)}_t) \equiv p(x^{(j)}_t \mid \bar{y}^{(j)}_{1:t})$, which can be written as

$$\alpha_t\big(x^{(j)}_t\big) = \frac{\int p\big(x^{(j)}_{t-1} \mid \bar{y}^{(j)}_{1:t-1}\big)\, p\big(x^{(j)}_t \mid x^{(j)}_{t-1}\big)\, p\big(\bar{y}^{(j)}_t \mid x^{(j)}_t\big)\, dx^{(j)}_{t-1}}{p\big(\bar{y}^{(j)}_t \mid \bar{y}^{(j)}_{1:t-1}\big)}, \qquad (115)$$

where $\bar{y}^{(j)}_t$ is defined by equation (80). Suppose $\Phi_t(\bar{y}^{(j)}_t) \equiv p(\bar{y}^{(j)}_t \mid \bar{y}^{(j)}_{1:t-1})$; then

$$\alpha_t(x_t) = \frac{1}{\Phi_t(\bar{y}^{(j)}_t)} \int \alpha_{t-1}\big(x^{(j)}_{t-1}\big)\, p\big(x^{(j)}_t \mid x^{(j)}_{t-1}\big)\, p\big(\bar{y}^{(j)}_t \mid x^{(j)}_t\big)\, dx_{t-1} = \frac{1}{\Phi_t(\bar{y}^{(j)}_t)} \int \mathcal{N}\big(x^{(j)}_{t-1} \mid \dot{\mu}_{t-1}, \dot{\Sigma}^{-1}_{t-1}\big)\, \mathcal{N}\big(x^{(j)}_t \mid A^{(j)} x^{(j)}_{t-1}, Q^{-1}_j\big)\, \mathcal{N}\big(\bar{y}^{(j)}_t \mid C^{(j)} x^{(j)}_t, R^{-1}_j\big)\, \mathcal{N}\big(\bar{y}^{(j)}_t \mid \mu_j, \Sigma^{-1}_j\big)\, dx^{(j)}_{t-1}. \qquad (116)$$

Equation (116) contains a Gaussian distribution $\mathcal{N}(x^{(j)}_{t-1} \mid \ddot{\mu}_{t-1}, \ddot{\Sigma}^{-1}_{t-1})$, where

$$\ddot{\mu}_{t-1} = \ddot{\Sigma}^{-1}_{t-1}\big( \dot{\Sigma}_{t-1} \dot{\mu}_{t-1} + A^{(j)\top} Q_j x^{(j)}_t \big), \qquad \ddot{\Sigma}_{t-1} = \big( \dot{\Sigma}_{t-1} + A^{(j)\top} Q_j A^{(j)} \big). \qquad (117)$$

So, with respect to the above distribution, equation (116) can be written as

$$\alpha_t(x_t) \propto \mathcal{N}\big(\bar{y}^{(j)}_t \mid \mu_j, \Sigma^{-1}_j\big) \times \int \exp\!\Big[ -\tfrac{1}{2}\Big( x^{(j)\top}_t \big( Q_j + C^{(j)\top} R_j C^{(j)} \big) x^{(j)}_t - x^{(j)\top}_t C^{(j)\top} R_j \bar{y}^{(j)}_t - \bar{y}^{(j)\top}_t R_j C^{(j)} x^{(j)}_t + \dot{\mu}^{\top}_{t-1} \dot{\Sigma}_{t-1} \dot{\mu}_{t-1} + \bar{y}^{(j)\top}_t R_j \bar{y}^{(j)}_t - \ddot{\mu}^{\top}_{t-1} \ddot{\Sigma}_{t-1} \ddot{\mu}_{t-1} \Big) \Big]\, \mathcal{N}\big(x^{(j)}_{t-1} \mid \ddot{\mu}_{t-1}, \ddot{\Sigma}^{-1}_{t-1}\big)\, dx^{(j)}_{t-1}. \qquad (118)$$

The distribution of the forward messages can be obtained as a Normal distribution by integrating equation (118):

$$\alpha_t(x_t) = \mathcal{N}\big(x^{(j)}_t \mid \dot{\mu}_t, \dot{\Sigma}_t\big), \quad \dot{\mu}_t = \dot{\Sigma}^{-1}_t\big( C^{(j)\top} \mathbb{E}[R_j]\, \bar{y}^{(j)}_t + \mathbb{E}[Q_j]^{\top} A^{(j)} \ddot{\Sigma}^{-\top}_{t-1} \dot{\Sigma}_{t-1} \dot{\mu}_{t-1} \big), \quad \dot{\Sigma}_t = \big( Q_j + C^{(j)\top} R_j C^{(j)} - Q^{\top}_j A^{(j)} \ddot{\Sigma}^{-\top}_{t-1} A^{(j)\top} Q_j \big). \qquad (119)$$

In the backward step of the RTSS, backward messages propagate back along the LDS. The backward messages are computed as $\beta_t(x^{(j)}_t) = p(\bar{y}^{(j)}_{t+1:T} \mid x^{(j)}_t)$, which can be written as

$$\beta_{t-1}\big(x^{(j)}_{t-1}\big) = \int p\big(x_t \mid x_{t-1}\big)\, p\big(\bar{y}^{(j)}_t \mid x_t\big)\, \beta_t\big(x^{(j)}_t\big)\, dx^{(j)}_t, \qquad (120)$$

in which $\beta_T(x^{(j)}_T) = 1$. Equation (120) can be written as

$$\beta_{t-1}\big(x^{(j)}_{t-1}\big) = \int \mathcal{N}\big(x^{(j)}_t \mid A^{(j)} x^{(j)}_{t-1}, Q^{-1}_j\big)\, \mathcal{N}\big(\bar{y}^{(j)}_t \mid C^{(j)} x^{(j)}_t, R^{-1}_j\big) \times \mathcal{N}\big(\bar{y}^{(j)}_t \mid \mu_j, \Sigma^{-1}_j\big)\, \mathcal{N}\big(x^{(j)}_t \mid \eta_t, \psi^{-1}_t\big)\, dx^{(j)}_t, \qquad (121)$$

which contains a Gaussian distribution $\mathcal{N}(x^{(j)}_t \mid \eta'_t, \psi'^{-1}_t)$, in which

$$\eta'_t = \psi'^{-1}_t\big( Q_j A^{(j)} x^{(j)}_{t-1} + \psi_t \eta_t + C^{(j)\top} R_j \bar{y}^{(j)}_t \big), \qquad \psi'_t = \big( Q_j + \psi_t + C^{(j)\top} R_j C^{(j)} \big). \qquad (122)$$
In the backward step of the RTSS, the backward messages move back along the LDSs. The backward messages are computed as $\beta_t\big(x_t^{(j)}\big) = p\big(\bar{y}_{t+1:T}^{(j)} \mid x_t^{(j)}\big)$, which can be written as
$$
\beta_{t-1}\big(x_{t-1}^{(j)}\big) = \int p\big(x_t^{(j)} \mid x_{t-1}^{(j)}\big)\, p\big(\bar{y}_t^{(j)} \mid x_t^{(j)}\big)\, \beta_t\big(x_t^{(j)}\big)\, dx_t^{(j)}, \qquad (120)
$$
in which $\beta_T\big(x_T^{(j)}\big) = 1$. Equation (120) can be written as
$$
\beta_{t-1}\big(x_{t-1}^{(j)}\big) = \int \mathcal{N}\big(x_t^{(j)} \mid A^{(j)} x_{t-1}^{(j)}, Q_j^{-1}\big)\, \mathcal{N}\big(\bar{y}_t^{(j)} \mid C^{(j)} x_t^{(j)}, R_j^{-1}\big)\, \mathcal{N}\big(\bar{y}_t^{(j)} \mid \mu_j, \Sigma_j^{-1}\big)\, \mathcal{N}\big(x_t^{(j)} \mid \eta_t, \psi_t^{-1}\big)\, dx_t^{(j)}, \qquad (121)
$$
whose integrand contains a Gaussian distribution $\mathcal{N}\big(x_t^{(j)} \mid \eta'_t, \psi_t'^{-1}\big)$, where
$$
\eta'_t = \psi_t'^{-1}\big(Q_j A^{(j)} x_{t-1}^{(j)} + \psi_t \eta_t + C^{(j)\intercal} R_j \bar{y}_t^{(j)}\big), \qquad
\psi'_t = Q_j + \psi_t + C^{(j)\intercal} R_j C^{(j)}. \qquad (122)
$$
So we can write equation (121) as
$$
\exp\Big\{-\tfrac{1}{2}\Big[x_{t-1}^{(j)\intercal}\big(A^{(j)\intercal} Q_j A^{(j)} - A^{(j)\intercal} Q_j^{\intercal}\psi_t'^{-\intercal} Q_j A^{(j)}\big) x_{t-1}^{(j)}
- \big(\eta_t^{\intercal}\psi_t^{\intercal} + \bar{y}_t^{(j)\intercal} R_j^{\intercal} C^{(j)}\big)\psi_t'^{-\intercal} Q_j A^{(j)} x_{t-1}^{(j)}
- x_{t-1}^{(j)\intercal} A^{(j)\intercal} Q_j^{\intercal}\psi_t'^{-\intercal}\big(\psi_t\eta_t + C^{(j)\intercal} R_j \bar{y}_t^{(j)}\big)
+ \eta_t^{\intercal}\psi_t\eta_t + \bar{y}_t^{(j)\intercal} R_j \bar{y}_t^{(j)}
- \big(\eta_t^{\intercal}\psi_t^{\intercal} + \bar{y}_t^{(j)\intercal} R_j^{\intercal} C^{(j)}\big)\psi_t'^{-\intercal}\psi_t\eta_t
- \big(\eta_t^{\intercal}\psi_t^{\intercal} + \bar{y}_t^{(j)\intercal} R_j^{\intercal} C^{(j)}\big)\psi_t'^{-\intercal} C^{(j)\intercal} R_j \bar{y}_t^{(j)}\Big]\Big\}. \qquad (123)
$$
By simplifying equation (123), the distribution of the backward messages is $\mathcal{N}\big(x_{t-1}^{(j)} \mid \eta_{t-1}, \psi_{t-1}^{-1}\big)$, where
$$
\eta_{t-1} = \psi_{t-1}^{-1} A^{(j)\intercal} Q_j^{\intercal}\psi_t'^{-\intercal}\big(\psi_t\eta_t + C^{(j)\intercal} R_j \bar{y}_t^{(j)}\big), \qquad
\psi_{t-1} = A^{(j)\intercal} Q_j A^{(j)} - A^{(j)\intercal} Q_j^{\intercal}\psi_t'^{-\intercal} Q_j A^{(j)}. \qquad (124)
$$
In order to compute the first-order and second-order expected values of the hidden states, we can write
$$
p\big(x_t^{(j)} \mid \bar{y}_{1:T}^{(j)}\big) \propto p\big(x_t^{(j)} \mid \bar{y}_{1:t}^{(j)}\big)\, p\big(\bar{y}_{t+1:T}^{(j)} \mid x_t^{(j)}\big) = \alpha_t\big(x_t^{(j)}\big)\,\beta_t\big(x_t^{(j)}\big) = \mathcal{N}\big(x_t^{(j)} \mid \dot{\mu}_t, \dot{\Sigma}_t^{-1}\big)\,\mathcal{N}\big(x_t^{(j)} \mid \eta_t, \psi_t^{-1}\big). \qquad (125)
$$
Simplifying equation (125) yields
$$
p\big(x_t^{(j)} \mid \bar{y}_{1:T}^{(j)}\big) = \mathcal{N}\big(x_t^{(j)} \mid \omega_t, \Gamma_{tt}^{-1}\big), \qquad
\Gamma_{tt} = \dot{\Sigma}_t + \psi_t, \qquad
\omega_t = \Gamma_{tt}^{-1}\big(\dot{\Sigma}_t\dot{\mu}_t + \psi_t\eta_t\big). \qquad (126)
$$
In addition, computing the first- and second-order expected values of the hidden states requires the joint probability $p\big(x_t^{(j)}, x_{t+1}^{(j)} \mid \bar{y}_{1:T}^{(j)}\big)$. For this goal we have
$$
p\big(x_t^{(j)}, x_{t+1}^{(j)} \mid \bar{y}_{1:T}^{(j)}\big) = \alpha_t\big(x_t^{(j)}\big)\, p\big(x_{t+1}^{(j)} \mid x_t^{(j)}\big)\, p\big(\bar{y}_{t+1}^{(j)} \mid x_{t+1}^{(j)}\big)\, \beta_{t+1}\big(x_{t+1}^{(j)}\big). \qquad (127)
$$
Simplifying equation (127) yields
$$
p\big(x_t^{(j)}, x_{t+1}^{(j)} \mid \bar{y}_{1:T}^{(j)}\big) = \mathcal{N}\!\left[\begin{pmatrix} x_t^{(j)} \\ x_{t+1}^{(j)} \end{pmatrix} \,\middle|\, \begin{pmatrix} \omega_t \\ \omega_{t+1} \end{pmatrix}, \begin{pmatrix} \Gamma_{t,t} & \Gamma_{t,t+1} \\ \Gamma_{t+1,t} & \Gamma_{t+1,t+1} \end{pmatrix}^{-1}\right], \qquad (128)
$$
where
$$
\Gamma_{t,t} = \dot{\Sigma}_t + A^{(j)\intercal} Q_j A^{(j)} + \psi_t, \qquad
\Gamma_{t+1,t+1} = Q_j + C^{(j)\intercal} R_j C^{(j)}, \qquad
\Gamma_{t+1,t} = Q_j A^{(j)}, \qquad
\Gamma_{t,t+1} = A^{(j)\intercal} Q_j. \qquad (129)
$$
According to equations (128) and (129), the first-order and second-order expected values of the hidden states used in the E-step are computed.
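The smoothed moments described by (125)-(129) can equivalently be assembled with the conventional RTS backward recursion run over the filtered quantities from the previous sketch. The function below is a standard textbook recursion, not the exact parameterization used in the paper; it returns the smoothed means, covariances, and lag-one cross-covariances, from which the first- and second-order expected sufficient statistics follow. Names are illustrative.

```python
import numpy as np

def rts_smoother(mus, Sigmas, A, Q):
    """Backward (RTS) pass over filtered means/covariances.

    Returns the expected sufficient statistics E[x_t], E[x_t x_t^T] and
    E[x_{t+1} x_t^T] under p(x_{1:T} | y_{1:T}) for a single LDS.
    """
    T, N = mus.shape
    m, V = mus.copy(), Sigmas.copy()          # smoothed means/covariances (filled backwards)
    V_cross = np.zeros((T - 1, N, N))         # Cov(x_{t+1}, x_t | y_{1:T})
    for t in range(T - 2, -1, -1):
        P_pred = A @ Sigmas[t] @ A.T + Q                 # predicted covariance of x_{t+1}
        J = Sigmas[t] @ A.T @ np.linalg.inv(P_pred)      # smoother gain
        m[t] = mus[t] + J @ (m[t + 1] - A @ mus[t])
        V[t] = Sigmas[t] + J @ (V[t + 1] - P_pred) @ J.T
        V_cross[t] = V[t + 1] @ J.T
    # First- and second-order expected values of the hidden states
    Ex = m
    Exx = V + np.einsum('ti,tj->tij', m, m)
    Exx_lag = V_cross + np.einsum('ti,tj->tij', m[1:], m[:-1])
    return Ex, Exx, Exx_lag
```

Used with the output of the forward pass above (`rts_smoother(filt_means, filt_covs, A, Q)`), these statistics are exactly the quantities the M-step updates consume.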
C VBE expected values

$$
\mathbb{E}\big[\mu_j^{\intercal} r_j \mu_j\big] = \frac{1}{\hat{\lambda}} + \hat{m}^{\intercal}\hat{w}\hat{\Psi}\hat{m} \qquad (130)
$$
$$
\mathbb{E}\big[C_i^{(j)\intercal} r_j C_i^{(j)}\big] = \Sigma_{C_i}^{(j)}\hat{w}\hat{\Psi} \qquad (131)
$$
$$
\mathbb{E}\big[\delta_j^{\intercal} S_j \delta_j\big] = \sum_{n'=1}^{N}\sum_{n=1}^{N}\Big(\frac{1}{\hat{\lambda}} + \hat{m}_n\hat{m}_{n'}\{\hat{w}\hat{\Psi}\}_{nn'}\Big) \qquad (132)
$$
$$
\mathbb{E}\big[C_i^{(j)} x_t^{(j)} x_t^{(j)\intercal} C_i^{(j)\intercal}\big] = \sum_{n=1}^{N}\sum_{n'=1}^{N}\Big(\hat{\Sigma}_{C_i,nn'}^{(j)} + \mu_{C_i,n}^{(j)}\mu_{C_i,n'}^{(j)}\Big)\,\mathbb{E}\big[x_{t,n'}^{(j)} x_{t,n}^{(j)}\big] \qquad (133)
$$
$$
\mathbb{E}\big[\big(x_t^{(j)} - A^{(j)} x_{t-1}^{(j)}\big)\big(x_t^{(j)} - A^{(j)} x_{t-1}^{(j)}\big)^{\intercal}\big]
= \mathbb{E}\big[x_t^{(j)} x_t^{(j)\intercal}\big] - \mu_A^{(j)}\,\mathbb{E}\big[x_{t-1}^{(j)} x_t^{(j)\intercal}\big] - \mathbb{E}\big[x_t^{(j)} x_{t-1}^{(j)\intercal}\big]\,\mu_A^{(j)\intercal} + \Big\{a_n^{(j)\intercal}\,\mathbb{E}\big[x_{t-1}^{(j)} x_{t-1}^{(j)\intercal}\big]\, a_{n'}^{(j)}\Big\}_{nn'}, \qquad (134)
$$
where $\{\cdot\}_{nn'}$ indicates the $(n,n')$-th element of a matrix.
$$
\mathbb{E}\big[\big(x_t^{(j)} - A^{(j)} x_{t-1}^{(j)}\big)^{\intercal} Q_j \big(x_t^{(j)} - A^{(j)} x_{t-1}^{(j)}\big)\big] = N \qquad (135)
$$
$$
\mathbb{E}\big[x_t^{(j)\intercal} C_i^{(j)\intercal} r_j C_i^{(j)} x_t^{(j)}\big] = \hat{w}\hat{\Psi}\sum_{n=1}^{N}\sum_{n'=1}^{N}\Sigma_{C_i,nn'}^{(j)}\,\mathbb{E}\big[x_{t,n'}^{(j)} x_{t,n}^{(j)}\big] \qquad (136)
$$
$$
\mathbb{E}\big[\big(y_{it} - C_i^{(j)} x_t^{(j)} - \mu_j\big)^{\intercal} r_j \big(y_{it} - C_i^{(j)} x_t^{(j)} - \mu_j\big)\big]
= y_{it}^{\intercal}\hat{w}\hat{\Psi}\, y_{it} - \mathbb{E}\big[x_t^{(j)\intercal}\big]\hat{\mu}_{C_i}^{(j)\intercal}\hat{w}\hat{\Psi}\, y_{it} - \hat{m}^{\intercal}\hat{w}\hat{\Psi}\, y_{it} - y_{it}^{\intercal}\hat{w}\hat{\Psi}\,\hat{\mu}_{C_i}^{(j)}\,\mathbb{E}\big[x_t^{(j)}\big]
+ \hat{w}\hat{\Psi}\sum_{n=1}^{N}\sum_{n'=1}^{N}\Sigma_{C_i,nn'}^{(j)}\,\mathbb{E}\big[x_{t,n'}^{(j)} x_{t,n}^{(j)}\big]
+ \hat{m}^{\intercal}\hat{w}\hat{\Psi}\,\hat{\mu}_{C_i}^{(j)}\,\mathbb{E}\big[x_t^{(j)}\big] - y_{it}^{\intercal}\hat{w}\hat{\Psi}\,\hat{m} + \mathbb{E}\big[x_t^{(j)\intercal}\big]\hat{\mu}_{C_i}^{(j)\intercal}\hat{w}\hat{\Psi}\,\hat{m} + \frac{1}{\hat{\lambda}} + \hat{m}^{\intercal}\hat{w}\hat{\Psi}\,\hat{m} \qquad (137)
$$
$$
\mathbb{E}\big[(\mu_j - m)^{\intercal}\lambda\, r_j (\mu_j - m)\big] = \frac{\lambda}{\hat{\lambda}} + \lambda\,\hat{m}^{\intercal}\hat{w}\hat{\Psi}\,\hat{m} - m^{\intercal}\lambda\,\hat{w}\hat{\Psi}\,\hat{m} - \lambda\,\hat{m}^{\intercal}\hat{w}\hat{\Psi}\, m + m^{\intercal}\lambda\,\hat{w}\hat{\Psi}\, m \qquad (138)
$$
$$
\mathbb{E}\big[x_1^{(j)\intercal} S_j x_1^{(j)}\big] = N\big(1 + \hat{\lambda}^{-1}\big) + \sum_{n'=1}^{N}\sum_{n=1}^{N}\hat{m}_n\{\hat{w}\hat{\Psi}\}_{nn'}\hat{m}_{n'} \qquad (139)
$$
$$
\mathbb{E}\big[\delta_j^{\intercal} S_j \delta_j\big] = N + \hat{m}^{\intercal}\hat{\Psi}\hat{m} \qquad (140)
$$
$$
\mathbb{E}\big[\delta_j^{\intercal} S_j x_1^{(j)}\big] = \sum_{n'=1}^{N}\sum_{n=1}^{N}\mathbb{E}\big[x_{1,n'}^{(j)}\big]\,\hat{m}_n\{\hat{w}\hat{\Psi}\}_{nn'} \qquad (141)
$$
$$
\mathbb{E}\big[x_1^{(j)\intercal} S_j \delta_j\big] = \sum_{n'=1}^{N}\sum_{n=1}^{N}\mathbb{E}\big[x_{1,n}^{(j)}\big]\{\hat{w}\hat{\Psi}\}_{nn'}\hat{m}_{n'} \qquad (142)
$$
$$
\mathbb{E}\big[\big(x_1^{(j)} - \delta_j\big)^{\intercal} S_j \big(x_1^{(j)} - \delta_j\big)\big]
= N\big(1 + \hat{\lambda}^{-1}\big) + \sum_{n'=1}^{N}\sum_{n=1}^{N}\hat{m}_n\{\hat{w}\hat{\Psi}\}_{nn'}\hat{m}_{n'}
- \sum_{n'=1}^{N}\sum_{n=1}^{N}\mathbb{E}\big[x_{1,n'}^{(j)}\big]\,\hat{m}_n\{\hat{w}\hat{\Psi}\}_{nn'}
- \sum_{n'=1}^{N}\sum_{n=1}^{N}\mathbb{E}\big[x_{1,n}^{(j)}\big]\{\hat{w}\hat{\Psi}\}_{nn'}\hat{m}_{n'}
+ N + \hat{m}^{\intercal}\hat{\Psi}\hat{m} \qquad (143)
$$
$$
\mathbb{E}\big[(\delta_j - m)^{\intercal}\lambda\, S_j (\delta_j - m)\big]
= \lambda\sum_{n'=1}^{N}\sum_{n=1}^{N}\Big(\frac{1}{\hat{\lambda}} + \hat{m}_n\hat{m}_{n'}\{\hat{w}\hat{\Psi}\}_{nn'}\Big)
- m^{\intercal}\lambda\,\hat{w}\hat{\Psi}\,\hat{m} - \hat{m}^{\intercal}\hat{w}\hat{\Psi}\,\lambda\, m + m^{\intercal}\lambda\,\hat{w}\hat{\Psi}\, m \qquad (144)
$$
$$
\mathbb{E}\big[a_n^{(j)\intercal}\sigma_A^{(j)} a_n^{(j)}\big] = \sigma_A^{(j)}\Big(\hat{\sigma}_{A_n}^{(j)} + \hat{\mu}_{A_n}^{(j)\intercal}\hat{\mu}_{A_n}^{(j)}\Big) \qquad (145)
$$
$$
\mathbb{E}\big[C_i^{(j)}\Sigma_C^{(j)} C_i^{(j)\intercal}\big] = \sum_{n=1}^{N}\sum_{n'=1}^{N}\{\Sigma_C^{(j)}\}_{nn'}\Big(\{\Sigma_{C_i}^{(j)}\}_{nn'} - \{\hat{\mu}_{C_i}^{(j)}\}_n\{\hat{\mu}_{C_i}^{(j)}\}_{n'}\Big) \qquad (146)
$$
$$
\mathbb{E}\big[\big(x_1^{(j)} - \mu_x^{(j)}\big)^{\intercal}\Sigma_x^{(j)}\big(x_1^{(j)} - \mu_x^{(j)}\big)\big]
= \sum_{n=1}^{N}\sum_{n'=1}^{N}\Big(\{\Sigma_x^{(j)}\}_{nn'} - \{\Sigma_x^{(j)}\}_{nn'}\{\mu_x^{(j)}\}_n\{\mu_x^{(j)}\}_{n'}\Big) \qquad (147)
$$
$$
\mathbb{E}\big[\big(x_1^{(j)} - \mu_x^{(j)}\big)^{\intercal}\Sigma_x^{(j)}\big(x_1^{(j)} - \mu_x^{(j)}\big)\big]
= \sum_{n=1}^{N}\sum_{n'=1}^{N}\Big(\{\Sigma_x^{(j)}\}_{nn'} - \{\Sigma_x^{(j)}\}_{nn'}\{\mu_x^{(j)}\}_n\{\mu_x^{(j)}\}_{n'}\Big)
- \sum_{n=1}^{N}\sum_{n'=1}^{N}\{\Sigma_x^{(j)}\}_{nn'}\{\mu_x^{(j)}\}_n\,\mathbb{E}\big[x_{1,n'}^{(j)}\big]
+ \sum_{n=1}^{N}\sum_{n'=1}^{N}\{\Sigma_x^{(j)}\}_{nn'}\{\mu_x^{(j)}\}_n\{\mu_x^{(j)}\}_{n'} \qquad (148)
$$
$$
\mathbb{E}\big[\big(x_t^{(j)} - \mu_{x_t}^{(j)}\big)^{\intercal}\Sigma_{x_t}^{(j)}\big(x_t^{(j)} - \mu_{x_t}^{(j)}\big)\big]
= \sum_{n=1}^{N}\sum_{n'=1}^{N}\Sigma_{x_t,nn'}^{(j)}\,\mathbb{E}\big[x_{t,n}^{(j)} x_{t,n'}^{(j)}\big]
- \sum_{n=1}^{N}\sum_{n'=1}^{N}\mu_{x_t,n}^{(j)}\,\Sigma_{x_t,nn'}\,\mathbb{E}\big[x_{t,n'}^{(j)}\big]
+ \sum_{n=1}^{N}\sum_{n'=1}^{N}\mu_{x_t,n}^{(j)}\,\Sigma_{x_t,nn'}\,\mu_{x_t,n'}^{(j)} \qquad (149)
$$
$$
\mathbb{E}\big[\big(\{A^{(j)}\}_n - \{\hat{\mu}_A^{(j)}\}_n\big)^{\intercal}\{\hat{\sigma}_A^{(j)}\}_n\big(\{A^{(j)}\}_n - \{\hat{\mu}_A^{(j)}\}_n\big)\big]
= \{\hat{\sigma}_A^{(j)}\}_n\Big(\{\hat{\sigma}_A^{(j)}\}_n - \{\hat{\mu}_A^{(j)}\}_n\Big) - \{\hat{\mu}_A^{(j)}\}_n\{\hat{\sigma}_A^{(j)}\}_n\{\hat{\mu}_A^{(j)}\}_n \qquad (150)
$$
$$
\mathbb{E}\big[\big(C_i^{(j)\intercal} - \hat{\mu}_{C_i}^{(j)}\big)^{\intercal}\hat{\Sigma}_{C_i}^{(j)}\big(C_i^{(j)\intercal} - \hat{\mu}_{C_i}^{(j)}\big)\big]
= \sum_{n=1}^{N}\sum_{n'=1}^{N}\{\hat{\Sigma}_{C_i}^{(j)}\}_{nn'}\Big(\{\hat{\Sigma}_{C_i}^{(j)}\}_{nn'} - \{\hat{\mu}_{C_i}^{(j)}\}_n\{\hat{\mu}_{C_i}^{(j)}\}_{n'}\Big)
- \sum_{n=1}^{N}\sum_{n'=1}^{N}\{\hat{\mu}_{C_i}^{(j)}\}_n\{\hat{\Sigma}_{C_i}^{(j)}\}_{nn'}\{\hat{\mu}_{C_i}^{(j)}\}_{n'} \qquad (151)
$$
$$
\mathbb{E}\big[(\mu_j - \hat{m})^{\intercal}\hat{\lambda}\, r_j (\mu_j - \hat{m})\big] = 1 + \hat{\lambda}\,\hat{m}^{\intercal}\hat{w}\hat{\Psi}\,\hat{m} - \hat{m}^{\intercal}\hat{\lambda}\,\hat{w}\hat{\Psi}\,\hat{m} \qquad (152)
$$
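Many of the expectations listed above reduce to the standard Gaussian identity $\mathbb{E}[x^{\intercal} M x] = \operatorname{tr}(M\Sigma) + \mu^{\intercal} M \mu$ for $x \sim \mathcal{N}(\mu, \Sigma)$. The snippet below is a small Monte Carlo sanity check of that identity with arbitrary toy values; it is only an aid for implementing the VBE terms, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 3
mu = rng.standard_normal(N)
L = rng.standard_normal((N, N))
Sigma = L @ L.T + np.eye(N)              # covariance of x
M = rng.standard_normal((N, N))          # arbitrary weighting matrix

# Closed form: E[x^T M x] = tr(M Sigma) + mu^T M mu
closed_form = np.trace(M @ Sigma) + mu @ M @ mu

# Monte Carlo estimate of the same expectation
x = rng.multivariate_normal(mu, Sigma, size=200_000)
monte_carlo = np.einsum('ti,ij,tj->t', x, M, x).mean()

print(closed_form, monte_carlo)          # the two values should be close
```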