Countable Partially Exchangeable Mixtures
Cecilia Prosdocimi∗   Lorenzo Finesso†   October 15, 2018

∗ Dipartimento di Economia e Finanza, Università LUISS, viale Romania 32, 00197 Roma, Italy, [email protected]
† Istituto di Elettronica e di Ingegneria dell'Informazione e delle Telecomunicazioni, Consiglio Nazionale delle Ricerche, via Gradenigo 6/a, Padova, Italy, lorenzo.fi[email protected]
Abstract
Partially exchangeable sequences representable as mixtures of Markov chains are completely specified by de Finetti's mixing measure. The paper characterizes, in terms of a subclass of hidden Markov models, the partially exchangeable sequences with mixing measure concentrated on a countable set, for sequences of random variables both on a discrete and on a Polish space.
Keywords — exchangeability, partial exchangeability, Markov exchangeability, countable mixtures of Markov chains, hidden Markov model, mixing measure
1 Introduction

In the Hewitt-Savage generalization of de Finetti's theorem, the distributions of exchangeable sequences of Polish valued random variables are shown to be in one to one correspondence with distributions of mixtures of i.i.d. sequences, and ultimately with the mixing measure defining the mixture. The mixing measure thus acts as a model of the exchangeable sequence, and its properties shed light on the random mechanism generating the sequence. In this regard [5], using the connection with the Markov moment problem, characterizes the exchangeable sequences whose mixing measure admits a density in L_p. It is also of interest to characterize the subclass for which the mixing measure is discrete (i.e. concentrated on a countable set). A contribution in this direction has been given by [3], where it is proved that an exchangeable sequence of discrete valued random variables has de Finetti mixing measure concentrated on a countable set if and only if it is a hidden Markov model (HMM); see Definition 2.8 for the precise notion.

The more general class of partially exchangeable sequences is in one to one correspondence with mixtures of Markov chains. As noted in [5], the results on the regularity of the mixing measure carry over to partially exchangeable sequences. On the other hand, to the best of our knowledge, no results have been reported concerning partially exchangeable sequences with discrete mixing measures.

The goal of the present paper is to characterize, in the spirit of [3], countable mixtures of Markov chains. Our results hold for sequences of both discrete and Polish space valued random variables. This has required the development of a few special results for HMMs, previously not available in the literature. Of independent interest are Propositions 3.1 and 4.1 on the rows of the array of the successors of an HMM, and most of the Appendix, on properties of sequences of stopping times with respect to the filtration generated by an HMM and its underlying Markov chain. For sequences of Polish valued random variables it has also been necessary to first extend [3] to show the equivalence between exchangeable HMMs and countable mixtures of i.i.d. sequences.

In Section 2 we review the basic definitions in the setup most convenient for our purpose. The reader should be aware of the fact that slightly different notions of partial exchangeability coexist in the literature for sequences of discrete valued random variables. We recall the original definition, introduced in [2] and elaborated in [8]. The latter paper clarifies the relationship with the alternative definition given in [4]. Section 3 deals with sequences of discrete valued random variables, Section 4 with sequences of Polish valued random variables. In Section 3.1 we constructively prove Proposition 3.1, which is instrumental in the balance of the paper. In Section 3.2 we characterize countable mixtures of Markov chains taking values on a discrete space. In Section 4 we extend the result of [3] to exchangeable sequences of Polish valued random variables. This allows us to prove Theorem 4.3, the main result of the paper. Section 5 contains final remarks and hints at possible extensions.
In the Appendix are collected results on sequences of stopping times for HMMs, unavailable elsewhere in the literature, and a technical result for a special class of HMMs representable as mixtures of i.i.d. sequences.

2 Basic definitions

Y = (Y_n)_{n≥0} denotes a sequence of random variables on a probability space (Ω, F, P), taking values in a Polish space S endowed with the Borel σ-field 𝒮. The generic element of S is denoted y, and y^N = y_0 y_1 ... y_N is an element (a string) of the (N+1)-fold Cartesian product S^{N+1}; likewise (Y^N = y^N) denotes the event (Y_0 = y_0, Y_1 = y_1, ..., Y_N = y_N).

Exchangeable and partially exchangeable sequences.
The notions of exchangeability and partial exchangeability both originate in the work of de Finetti. The former notion is well established in the literature; the latter has been presented in various disguises. For ease of reference we give below the definitions used in the paper.
Definition 2.1 ([9], page 24). A sequence of random variables (Y_n) on (S, 𝒮) is exchangeable if, for any N ∈ ℕ and distinct natural numbers m_0, ..., m_N,

(Y_{m_0}, ..., Y_{m_N}) =_d (Y_0, ..., Y_N),

where =_d denotes equality in distribution.

We adopt the definition of partial exchangeability given in [8], close to de Finetti's original [2]. The relation with the definition of partial exchangeability given in [4] is clarified in [8]. The definition requires the introduction of the successors array V of a given random sequence (Y_n) on S, and the extension of S to S* = S ∪ {∂}, where ∂ ∉ S is a fictitious state.

If S is discrete (as in Section 3 of the paper), V = (V_{y,n})_{y∈S, n≥1} is defined setting V_{y,n} equal to the value of Y immediately following the n-th visit of Y to y. If Y visits y only m < ∞ times, to avoid rows of V of finite length, one assigns V_{y,n} = ∂ for all n > m.

If S is uncountable (as in Section 4 of the paper), let E = {E_j}_{j≥0} be a fixed countable partition of S*, with E_0 = {∂}. The matrix V = (V_{j,n})_{j,n≥1} is defined setting V_{j,n} equal to the value of Y immediately following the n-th visit of Y to E_j. As in the discrete case, if Y visits E_j only m < ∞ times, set V_{j,n} = ∂ for all n > m. In the uncountable case V depends on the partition E.

Definition 2.2 ([8]). A sequence of random variables (Y_n) on (S, 𝒮) is partially exchangeable if its successors array V is distributionally invariant under finite, not necessarily identical, permutations within each of its rows.

Note that, if (Y_n) is partially exchangeable, the rows of its successors array V are exchangeable.
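The successors array is easy to compute on a finite sample path. The sketch below is ours, not from the paper; the helper name and the string representation of ∂ are illustrative. On a finite path only finitely many successors are observable, and the ∂-padding applies to visits that never occur.

```python
# Sketch: observable part of the successors array V on a finite path.
# V[y][n-1] is the value of Y immediately following the n-th visit of Y to y.
def successors_array(path, pad="∂", n_cols=None):
    V = {}
    for t in range(len(path) - 1):          # Y_t = y has successor Y_{t+1}
        V.setdefault(path[t], []).append(path[t + 1])
    if n_cols is not None:                  # pad rows to a common length with ∂
        for y in V:
            V[y] = (V[y] + [pad] * n_cols)[:n_cols]
    return V

path = list("abbaabab")
print(successors_array(path, n_cols=4))
# {'a': ['b', 'a', 'b', 'b'], 'b': ['b', 'a', 'a', '∂']}
```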
Mixtures of i.i.d. sequences and of Markov chains. The set of all probability measures on (S, 𝒮) is denoted M_S, and it is equipped with the σ-field generated by the maps p ↦ p(A), varying p in M_S and A in 𝒮.

Definition 2.3. (Y_n) is a mixture of i.i.d. sequences if there exists a random probability measure p̃ on M_S such that, for any N ∈ ℕ and any A_0, ..., A_N ∈ 𝒮,

P(Y_0 ∈ A_0, ..., Y_N ∈ A_N | p̃) = p̃(A_0) ··· p̃(A_N)   P-a.s.  (1)
Definition 2.4. A mixture of i.i.d. sequences is countable (finite) if the random probability measure p̃ is concentrated on a countable (finite) subset of M_S.

If (Y_n) is a countable (finite) mixture of i.i.d. sequences, let (p_h(·))_{h∈H}, where H is countable (finite), be the set of measures on which p̃ is concentrated; then, integrating Equation (1) over Ω, one has

P(Y_0 ∈ A_0, ..., Y_N ∈ A_N) = Σ_{h∈H} μ_h p_h(A_0) ··· p_h(A_N),  (2)

where μ_h := P(p̃ = p_h) > 0 and Σ_{h∈H} μ_h = 1.

A time homogeneous Markov chain with values in S* is characterized by a transition kernel k : S* × 𝒮* → [0, 1]. Only a subclass of kernels is needed here: those which, as functions of their first argument, are constant on each element of the partition E = (E_j)_{j≥0} of S*. Kernels k in this subclass can be represented in terms of the simpler kernels in the class T*, made up of all kernels t : ℕ_0 × 𝒮* → [0, 1], where ℕ_0 = ℕ ∪ {0}. For a given t ∈ T* one defines k = k_t as follows:

k_t(y, A) := Σ_{j≥0} I_{E_j}(y) t(j, A),  (3)

where I_{E_j}(·) is the indicator function of E_j. The reader is referred to [8] for a more detailed discussion.
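Equation (2) suggests the obvious two-stage sampler: draw the label h once, then sample the whole sequence i.i.d. from p_h. A minimal sketch, with toy values of (μ_h) and (p_h) assumed for illustration:

```python
# Sketch of Equation (2): draw the label h from (mu_h) once, then sample the
# whole sequence i.i.d. from p_h. All numeric values are toy assumptions.
import numpy as np

rng = np.random.default_rng(0)
S = ["a", "b", "c"]                      # toy discrete state space
mu = [0.3, 0.7]                          # mixing weights mu_h
p = [[0.8, 0.1, 0.1],                    # p_1 on S
     [0.1, 0.1, 0.8]]                    # p_2 on S

def sample_iid_mixture(N):
    h = rng.choice(len(mu), p=mu)        # realization of the random measure p~
    return [S[rng.choice(len(S), p=p[h])] for _ in range(N + 1)]

print(sample_iid_mixture(10))
```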
We are now ready to give the following definition.

Definition 2.5. Let (Y_n) be a sequence of random variables with P(Y_0 = y_0) = 1 for some y_0 ∈ E_1. Then (Y_n) is a mixture of homogeneous Markov chains if there exists a random kernel t̃ on T* such that, for any N ∈ ℕ and A_1, ..., A_N ∈ 𝒮*,

P(Y_1 ∈ A_1, ..., Y_N ∈ A_N | t̃) = ∫_{A_1} ··· ∫_{A_N} k_{t̃}(y_0, dy_1) ··· k_{t̃}(y_{N−1}, dy_N)   P-a.s.  (4)

By the definition of k_t in (3), Equation (4) gives

P(Y_1 ∈ A_1, ..., Y_N ∈ A_N | t̃) = Σ_{j^N ∈ (ℕ_0)^N} ∫_{A_1∩E_{j_1}} ··· ∫_{A_N∩E_{j_N}} t̃(1, dy_1) ··· t̃(j_{N−1}, dy_N)   P-a.s.  (5)
Definition 2.6. A mixture of Markov chains is countable (finite) if the random kernel t̃ is countably (finitely) valued.

For a countable (finite) mixture of Markov chains, Equation (4) reads, after integration over Ω,

P(Y_1 ∈ A_1, ..., Y_N ∈ A_N) = Σ_{h∈H} μ_h ∫_{A_1} ··· ∫_{A_N} k_{t_h}(y_0, dy_1) ··· k_{t_h}(y_{N−1}, dy_N),  (6)

where (t_h)_{h∈H} are the kernels on which t̃ is concentrated, μ_h := P(t̃ = t_h), and H is a countable (finite) set. Finally, we can write the finite distributions of a countable mixture of Markov chains as

P(Y_1 ∈ A_1, ..., Y_N ∈ A_N) = Σ_{h∈H} μ_h Σ_{j^N ∈ (ℕ_0)^N} ∫_{A_1∩E_{j_1}} ··· ∫_{A_N∩E_{j_N}} t_h(1, dy_1) ··· t_h(j_{N−1}, dy_N).  (7)
Remark 2.7. To define mixtures of Markov chains when S* is discrete, fewer technicalities are needed, since transition kernels reduce to transition matrices. When S* is discrete, Equation (4) reduces to

P(Y^N = y^N | P̃) = P̃_{y_0 y_1} P̃_{y_1 y_2} ··· P̃_{y_{N−1} y_N}   P-a.s.,  (8)

where P̃ = (P̃_{i,j})_{i,j∈S*} is a random matrix varying on the set P* of transition matrices, and P(Y_0 = y_0) = 1 for some y_0 ∈ S. The analog of Equation (6) is

P(Y^N = y^N) = Σ_{h∈H} μ_h P^h_{y_0 y_1} ··· P^h_{y_{N−1} y_N},  (9)

where (P^h)_{h∈H} are the possible values taken by P̃.
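In the discrete case of Remark 2.7, Equation (9) again yields a two-stage sampler: draw the random matrix P̃ = P^h once, then run the chain from y_0. A minimal sketch with assumed toy matrices:

```python
# Sketch of Equations (8)-(9) in the discrete case: draw the random transition
# matrix P~ = P^h once, then run the chain from y_0. Toy matrices assumed.
import numpy as np

rng = np.random.default_rng(1)
mu = [0.5, 0.5]                               # mixing weights mu_h
P = [np.array([[0.9, 0.1], [0.1, 0.9]]),      # P^1: sticky chain
     np.array([[0.1, 0.9], [0.9, 0.1]])]      # P^2: alternating chain
y0 = 0                                        # P(Y_0 = y_0) = 1

def sample_markov_mixture(N):
    h = rng.choice(len(mu), p=mu)             # realization of P~
    y, path = y0, [y0]
    for _ in range(N):
        y = int(rng.choice(2, p=P[h][y]))
        path.append(y)
    return path

print(sample_markov_mixture(15))
```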
Representation theorems. The classic de Finetti representation theorem characterizes exchangeable sequences as mixtures of i.i.d. sequences. The statement, the proof, and an extensive discussion of the ramifications of the theorem can be found in [1] and [9]. The representation theorem characterizing partially exchangeable sequences as mixtures of Markov chains was first proved in [4], for S discrete and using a slightly different notion of partial exchangeability, and later extended to S a general Polish space in [8].

Hidden Markov models. Let S be a Polish space endowed with the Borel σ-field 𝒮.

Definition 2.8.
A random sequence (Y_n) is a hidden Markov model (HMM) if there exists a pair (X_n, Ỹ_n), taking values in X × S for a discrete space X, such that:

1. (X_n) is a time homogeneous Markov chain on X;

2. (conditional independence property) for any N ∈ ℕ and for any S_0, ..., S_N ∈ 𝒮 it holds

P(Ỹ_0 ∈ S_0, ..., Ỹ_N ∈ S_N | X^N = x^N) = Π_{n=0}^N P(Ỹ_n ∈ S_n | X_n = x_n);

3. (Y_n) and (Ỹ_n) have the same distributions, i.e. for any N ∈ ℕ and for any S_0, ..., S_N ∈ 𝒮 it holds

P(Y_0 ∈ S_0, ..., Y_N ∈ S_N) = P(Ỹ_0 ∈ S_0, ..., Ỹ_N ∈ S_N).

It is often possible to verify the second property directly for the sequence (Y_n).

An HMM is characterized by the initial distribution π on X, by the transition matrix P = (P_{ij})_{i,j∈X} of the Markov chain (X_n), and by the read-out distributions f_x(S̄), with S̄ ∈ 𝒮, where

f_x(S̄) := P(Ỹ_n ∈ S̄ | X_n = x).  (10)

We refer to the sequence (X_n) as the "underlying Markov chain" of the HMM. For discrete S there are many equivalent definitions of HMMs (see [11]), not all making sense for S Polish.
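For discrete S, Definition 2.8 translates into a simple two-layer sampler. The sketch below is ours, with toy values of π, P, and the read-outs f_x assumed purely for illustration:

```python
# Sketch of Definition 2.8 for discrete S: sample the underlying chain (X_n),
# then emit Y_n independently from the read-out distribution f_{X_n}.
import numpy as np

rng = np.random.default_rng(2)
pi = np.array([0.5, 0.5])            # initial distribution pi on X = {0, 1}
P = np.array([[0.95, 0.05],
              [0.05, 0.95]])         # transition matrix of (X_n)
f = np.array([[0.9, 0.1],            # read-out f_0 on S = {0, 1}
              [0.2, 0.8]])           # read-out f_1 on S

def sample_hmm(N):
    x = int(rng.choice(2, p=pi))
    xs, ys = [x], [int(rng.choice(2, p=f[x]))]
    for _ in range(N):
        x = int(rng.choice(2, p=P[x]))
        xs.append(x)
        ys.append(int(rng.choice(2, p=f[x])))
    return xs, ys                    # (X_n) stays hidden, (Y_n) is observed

xs, ys = sample_hmm(20)
print(ys)
```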
Remark 2.9.
A countable mixture of i.i.d. sequences as in Equation (2) is an HMM: take as (X_n) the Markov chain with values in H, with identity transition matrix, initial distribution (μ_1, ..., μ_h, ...)_{h∈H}, and read-out distributions

P(Y_n ∈ S̄ | X_n = h) = p_h(S̄).
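Remark 2.9 is easy to see in code: with identity transitions the underlying chain never leaves its initial state X_0 = h, so the read-outs are i.i.d. draws from p_h, exactly the mixture law of Equation (2). A minimal sketch with assumed toy parameters:

```python
# Remark 2.9 as code: the underlying chain has the identity transition matrix,
# initial distribution (mu_h), and read-out distributions p_h. Toy values assumed.
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([0.3, 0.7])            # initial distribution of X_0 on H
p = np.array([[0.8, 0.1, 0.1],       # read-out distribution p_1 on S
              [0.1, 0.1, 0.8]])      # read-out distribution p_2 on S

def sample_iid_mixture_as_hmm(N):
    h = rng.choice(len(mu), p=mu)    # X_n = h for all n (identity transitions)
    return [int(rng.choice(p.shape[1], p=p[h])) for _ in range(N + 1)]

print(sample_iid_mixture_as_hmm(10))
```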
3 Sequences of discrete valued random variables

In this section S is a discrete set.

3.1 The successors array of an HMM

The following result will be instrumental later, and it is also of independent interest. It is based on some useful properties of HMMs that can be found in the Appendix. HMMs and the successors array are defined in Section 2.

Proposition 3.1.
Let (Y_n) be an HMM on a discrete space S with recurrent underlying Markov chain; then each row of the successors array (V_{y,n}) is an HMM with recurrent underlying Markov chain. (Here a time homogeneous Markov chain (X_n) is called recurrent if P(X_n = x i.o. in n | X_0 = x) = 1 for all x ∈ X such that P(X_0 = x) > 0; such Markov chains have no transient states, but possibly more than one recurrence class.)

Proof. Denote with X the discrete state space of the Markov chain (X_n) underlying the process (Y_n), and let, for any x ∈ X and y ∈ S,

f_x(y) := P(Y_n = y | X_n = x)

be the read-out distributions. Fix y ∈ S. To prove the proposition we construct a recurrent Markov chain (W^y_n)_{n≥1} such that the pair (W^y_n, V_{y,n})_{n≥1} satisfies the conditions of Definition 2.8 of HMM (note that for convenience we let time start at n = 1). The proof is divided in three main steps.
Step 1: Construction of the Markov chain (W^y_n). To construct the Markov chain (W^y_n), define inductively the random times of the n-th visit of (Y_n) to state y:

τ^y_1 := inf{t ≥ 0 : Y_t = y},   τ^y_n := inf{t > τ^y_{n−1} : Y_t = y},

with the usual convention inf ∅ = +∞. The random times (τ^y_n) are stopping times with respect to the filtration spanned by (Y_n), and so are the times (τ^y_n + 1). The random times (τ^y_n + 1) are actually hitting times of X × {y}, according to Definition 6.4 in the Appendix. Define the sequence

W^y_n := ε for τ^y_n = +∞,   W^y_n := X_{τ^y_n + 1} for τ^y_n < +∞,  (11)

where ε ∉ X is a fictitious state. The sequence (W^y_n) is either identically equal to ε, or it never hits ε, since the times τ^y_n are either all finite or all infinite (if Y_n = y for some finite n, then X_n = x for some x such that f_x(y) > 0; since X is recurrent it hits x infinitely many times, and thus Y hits y infinitely many times). Let us check that (W^y_n) is a Markov chain. If the case W^y_n ≡ ε obtains, (W^y_n) is a (recurrent) Markov chain. Otherwise a direct computation gives, for any N ∈ ℕ and any x_1, ..., x_N ∈ X,

P(W^y_N = x_N | W^y_{N−1} = x_{N−1}, ..., W^y_1 = x_1)
= P(X_{τ^y_N+1} = x_N | X_{τ^y_{N−1}+1} = x_{N−1}, ..., X_{τ^y_1+1} = x_1)
= P(X_{τ^y_N+1} = x_N | X_{τ^y_{N−1}+1} = x_{N−1})
= P(W^y_N = x_N | W^y_{N−1} = x_{N−1}),

where Remark 6.7 in the Appendix applies. Thus (W^y_n) is a Markov chain.
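To make the Step 1 construction concrete, the following sketch (ours, with toy HMM parameters assumed) computes, on a simulated path, the visit times τ^y_n, the chain W^y_n = X_{τ^y_n+1}, and the successors row V_{y,n} = Y_{τ^y_n+1}; on a finite path only finitely many visits are observed, so the ε-padding of (11) does not arise:

```python
# Sketch of the Step 1 construction on a simulated (X_n, Y_n) path.
import numpy as np

rng = np.random.default_rng(4)
P = np.array([[0.9, 0.1], [0.2, 0.8]])   # transition matrix of (X_n)
f = np.array([[0.7, 0.3], [0.1, 0.9]])   # read-outs f_x on S = {0, 1}

x = 0
xs, ys = [x], [int(rng.choice(2, p=f[x]))]
for _ in range(500):
    x = int(rng.choice(2, p=P[x]))
    xs.append(x)
    ys.append(int(rng.choice(2, p=f[x])))

y = 1
taus = [t for t in range(len(ys) - 1) if ys[t] == y]   # visit times tau^y_n
W = [xs[t + 1] for t in taus]                          # W^y_n = X_{tau^y_n + 1}
V = [ys[t + 1] for t in taus]                          # V_{y,n} = Y_{tau^y_n + 1}
print(W[:10], V[:10])
```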
Step 2: Check of the recurrence of (W^y_n). Since

P(W^y_n = x i.o. in n | W^y_1 = x) = P(X_{τ^y_n+1} = x i.o. in n | X_{τ^y_1+1} = x),

to check the recurrence of (W^y_n) we have to verify that, for all x ∈ X,

P(X_{τ^y_n+1} = x i.o. in n | X_{τ^y_1+1} = x) = 1.  (12)

Fix x ∈ X such that P(X_{τ^y_1+1} = x) > 0, and choose x̄ ∈ X such that f_{x̄}(y) > 0 and P(X_{n+1} = x | X_n = x̄) > 0 (there exists at least one such x̄). Define the auxiliary sequence of hitting times

σ^{x̄,y}_1 := inf{t ≥ 0 : X_t = x̄, Y_t = y},   σ^{x̄,y}_n := inf{t > σ^{x̄,y}_{n−1} : X_t = x̄, Y_t = y}.

The hitting times (σ^{x̄,y}_n) are finite whenever the (τ^y_n) are finite, and the sequence (σ^{x̄,y}_n) is a subsequence of (τ^y_n); thus

(X_{σ^{x̄,y}_n+1} = x) ⊆ ∪_{m≥n} (X_{τ^y_m+1} = x),  (13)

and trivially

(X_{σ^{x̄,y}_n+1} = x i.o. in n) ⊆ (X_{τ^y_n+1} = x i.o. in n).  (14)

The events (X_{σ^{x̄,y}_n+1} = x)_n are independent under the law P(· | X_{τ^y_1+1} = x), since {(X_{σ^{x̄,y}_n+1} = x)_n, (X_{τ^y_1+1} = x)} is a P-independent set. In fact, for any N ∈ ℕ and for any choice of m_1 < ··· < m_N ∈ ℕ,

P(X_{σ^{x̄,y}_{m_N}+1} = x, X_{σ^{x̄,y}_{m_{N−1}}+1} = x, ..., X_{σ^{x̄,y}_{m_1}+1} = x, X_{τ^y_1+1} = x)
= P(X_{σ^{x̄,y}_{m_N}+1} = x, X_{σ^{x̄,y}_{m_N}} = x̄, ..., X_{σ^{x̄,y}_{m_1}+1} = x, X_{σ^{x̄,y}_{m_1}} = x̄, X_{τ^y_1+1} = x)
= P(X_{σ^{x̄,y}_{m_N}+1} = x | X_{σ^{x̄,y}_{m_N}} = x̄) P(X_{σ^{x̄,y}_{m_N}} = x̄ | X_{σ^{x̄,y}_{m_{N−1}}+1} = x, ..., X_{σ^{x̄,y}_{m_1}+1} = x, X_{σ^{x̄,y}_{m_1}} = x̄, X_{τ^y_1+1} = x) × ··· × P(X_{σ^{x̄,y}_{m_1}+1} = x | X_{σ^{x̄,y}_{m_1}} = x̄) P(X_{σ^{x̄,y}_{m_1}} = x̄ | X_{τ^y_1+1} = x) P(X_{τ^y_1+1} = x)
= P(X_{σ^{x̄,y}_{m_N}+1} = x) ··· P(X_{σ^{x̄,y}_{m_1}+1} = x) P(X_{τ^y_1+1} = x),

where the second equality follows by Remark 6.3 in the Appendix, and the first and last equalities follow noting that X_{σ^{x̄,y}_m} = x̄ for any ω ∈ Ω, by definition of σ^{x̄,y}. (Note that by definition σ^{x̄,y}_n > τ^y_1 for any n > 1, but it could happen that σ^{x̄,y}_1 = τ^y_1, so the computation above needs some care for m_1 = 1; it can be easily recovered also in this case.)

The events (X_{σ^{x̄,y}_n+1} = x) are equiprobable, with strictly positive probability. By the Borel-Cantelli lemma,

P(X_{σ^{x̄,y}_n+1} = x i.o. in n | X_{τ^y_1+1} = x) = 1.  (15)

Equations (14) and (15) taken together give

P(X_{τ^y_n+1} = x i.o. in n | X_{τ^y_1+1} = x) ≥ P(X_{σ^{x̄,y}_n+1} = x i.o. in n | X_{τ^y_1+1} = x) = 1.

Condition (12) is satisfied; thus the recurrence of (W^y_n) is proved.
Step 3: Verification that the pair (W^y_n, V_{y,n}) is an HMM. Let us check that the pair (W^y_n, V_{y,n}) is as in Definition 2.8. Set P(V_{y,n} = ∂ | W^y_n = ε) = 1. For x ∈ X and ȳ ∈ S, the pair (W^y_n, V_{y,n}) inherits the read-out distributions of (X_n, Y_n):

P(V_{y,n} = ȳ | W^y_n = x) = P(Y_{τ^y_n+1} = ȳ | X_{τ^y_n+1} = x) = f_x(ȳ),  (16)

see Lemma 6.8 in the Appendix. Let us verify the conditional independence property, i.e. that for any N ∈ ℕ, any y^N ∈ S^N and any x^N ∈ X^N,

P(V_{y,1} = y_1, ..., V_{y,N} = y_N | W^y_1 = x_1, ..., W^y_N = x_N) = Π_{n=1}^N P(V_{y,n} = y_n | W^y_n = x_n).

It follows from the direct computation

P(V_{y,1} = y_1, ..., V_{y,N} = y_N | W^y_1 = x_1, ..., W^y_N = x_N)
= P(Y_{τ^y_1+1} = y_1, ..., Y_{τ^y_N+1} = y_N | X_{τ^y_1+1} = x_1, ..., X_{τ^y_N+1} = x_N)
= Π_{n=1}^N P(Y_{τ^y_n+1} = y_n | X_{τ^y_n+1} = x_n)
= Π_{n=1}^N P(V_{y,n} = y_n | W^y_n = x_n),

where the second equality is a direct consequence of Lemma 6.9 of the Appendix. The sequence (V_{y,n})_n is therefore an HMM with recurrent underlying Markov chain, and this concludes the proof of the proposition.

3.2 Countable mixtures of Markov chains: discrete state space

In [3] Dharmadhikari gives a characterization of countable mixtures of i.i.d. sequences, linking HMMs to the class of exchangeable sequences. The main result of [3] can be rephrased as follows (see Section 2 for the definitions of exchangeable sequences, mixtures of i.i.d. sequences and HMMs).

Theorem 3.2 (Dharmadhikari). Let (Y_n) be an exchangeable sequence on a discrete state space S. The sequence (Y_n) is a countable mixture of i.i.d. sequences if and only if (Y_n) is an HMM with recurrent underlying Markov chain.

In the original formulation of Theorem 3.2 the stationarity of the underlying Markov chain is one of the hypotheses, but close inspection of the proof in [3] reveals that only the absence of transient states is required.

The aim of this section is to extend the above theorem to partially exchangeable sequences, i.e. to characterize countable mixtures of Markov chains. The analog of Theorem 3.2 for mixtures of Markov chains is as follows (we refer to Section 2 for the definitions of partially exchangeable sequences, mixtures of Markov chains and HMMs).
Theorem 3.3.
Let (Y_n) be a partially exchangeable sequence on a discrete state space S, with P(Y_0 = y_0) = 1 for some y_0 ∈ S. The sequence (Y_n) is a countable mixture of Markov chains if and only if (Y_n) is an HMM with recurrent underlying Markov chain.

Proof. The standing hypothesis is that (Y_n) is a partially exchangeable sequence. We first prove that if (Y_n) is an HMM, then it is a countable mixture of Markov chains, i.e., in the notation of Remark 2.7, P̃ takes countably many values. By the partial exchangeability of (Y_n), the row (V_{y,n}), for any y ∈ S, is exchangeable, and therefore a mixture of i.i.d. sequences. As proved e.g. in Lemma 2.15 of [1] or in Proposition 1.1.4 of [9],

lim_{N→∞} (1/N) Σ_{n=1}^N I_{V_{y,n}}(·) = p̃_y(·)   P-a.s.,  (17)

where the limit has to be interpreted in the topology of weak convergence, and where p̃_y is the random probability measure with values in M_S corresponding to p̃ in Definition 2.3. It follows from the proof of Theorem 1 in [8] that the random probability measure p̃_y in Equation (17) is the y-th row of the random matrix P̃. By Proposition 3.1 above, each row (V_{y,n}) is an HMM, and therefore, by Theorem 3.2 above, it is a countable mixture of i.i.d. sequences. The random probability measure p̃_y is thus concentrated on a countable set, and so is the y-th row of P̃. Since this holds for each y, the conclusion is that P̃ takes countably many values.

To prove the converse one has to show that if (Y_n) is a given countable mixture of Markov chains, i.e. if Equation (9) holds for some countable family (P^h)_{h∈H}, then (Y_n) is an HMM. We construct a pair (X_n, Ỹ_n) satisfying the conditions of Definition 2.8, with (X_n) recurrent, and such that (Ỹ_n) and the given (Y_n) have the same distributions.

The Markov chain (X_n) is defined on the state space S × H, with transition probability matrix P the direct sum of the transition matrices P^h,

P := diag(P^1, P^2, ...),

and initial distribution π defined, for any y ∈ S and h ∈ H, as

π(y, h) := μ_h for y = y_0,   π(y, h) := 0 for y ≠ y_0,

with μ_h := P(P̃ = P^h). To show that (X_n) is recurrent, note that by Theorem 1 in [8] (Y_n) is conditionally recurrent; therefore the matrices {P^h} in the mixture correspond to recurrent chains. Since P is the direct sum of such matrices, (X_n) is recurrent.

Consider now a sequence (Ỹ_n), with fixed initial state Ỹ_0 = y_0, conditionally independent given (X_n), and with read-out distributions defined as follows:

f_{(x,h)}(y) := P(Ỹ_n = y | X_n = (x, h)) = δ_{x,y},

where δ_{·,·} is the Kronecker symbol. Let us compute the finite distributions of (Ỹ_n), for any N ∈ ℕ and any y^N ∈ (S*)^N (the states of S × H are ordered block by block, e.g. (1,1), (2,1), ..., (1,2), (2,2), ...):

P(Ỹ^N = y^N)
= Σ_{(x^N,h^N) ∈ S^{N+1}×H^{N+1}} P(Ỹ^N = y^N, X^N = (x^N, h^N))
= Σ_{(x^N,h^N)} P(X_0 = (x_0, h_0)) Π_{n=0}^N P(Ỹ_n = y_n | X_n = (x_n, h_n)) Π_{n=0}^{N−1} P(X_{n+1} = (x_{n+1}, h_{n+1}) | X_n = (x_n, h_n))
= Σ_{(x^N,h^N)} π(x_0, h_0) Π_{n=0}^N f_{(x_n,h_n)}(y_n) Π_{n=0}^{N−1} P_{(x_n,h_n),(x_{n+1},h_{n+1})}
= Σ_{h∈H} μ_h P^h_{y_0 y_1} ··· P^h_{y_{N−1} y_N},

where the second equality follows from the conditional independence of (Ỹ_n) given (X_n), and the fourth from the definition of the read-out densities and by the block structure of P. Comparing (9) with the last expression, we have that (Y_n) and (Ỹ_n) have the same distributions; thus (Y_n) is an HMM with recurrent underlying Markov chain, and the theorem is proved.

Note that, if S is finite, in Theorems 3.2 and 3.3 the state space of the underlying Markov chain is finite if and only if the mixture is finite.
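The direct-sum construction in the converse part of the proof is mechanical; the sketch below (ours, with toy ingredients assumed) assembles P and π for a two-component mixture, mapping state (y, h) to index h·|S| + y so that the blocks are laid out one after another:

```python
# Sketch of the converse construction in Theorem 3.3: underlying chain on S x H,
# transition matrix the direct sum diag(P^1, P^2, ...), initial mass mu_h on (y_0, h).
import numpy as np

mu = [0.5, 0.5]
P_h = [np.array([[0.9, 0.1], [0.1, 0.9]]),
       np.array([[0.2, 0.8], [0.7, 0.3]])]
y0, S_size = 0, 2

n = S_size * len(P_h)
P = np.zeros((n, n))                     # direct sum of the blocks P^h
for h, B in enumerate(P_h):
    P[h * S_size:(h + 1) * S_size, h * S_size:(h + 1) * S_size] = B

pi = np.zeros(n)                         # pi(y, h) = mu_h if y = y_0, else 0
for h, m in enumerate(mu):
    pi[h * S_size + y0] = m

# The read-out is deterministic, f_{(x,h)}(y) = delta_{x,y}: the observation
# reports the S-component of the state while the block index h stays hidden.
print(P)
print(pi)
```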
4 Sequences of Polish valued random variables

In this section S is a Polish space.

4.1 The successors array of an HMM: Polish state space

The proposition below is the analog of Proposition 3.1 for uncountable state space S (the definitions of HMM and of successors array are in Section 2).

Proposition 4.1.
Let (Y_n) be an HMM on a Polish space S with recurrent underlying Markov chain; then each row of the successors array (V_{j,n}) is an HMM with recurrent underlying Markov chain.

Proof. Let (X_n) be the underlying Markov chain of (Y_n). Consider the partition E = (E_j)_{j≥0} of S*, and for any element E_j of the partition define

τ^{E_j}_1 := inf{t ≥ 0 : Y_t ∈ E_j},   τ^{E_j}_n := inf{t > τ^{E_j}_{n−1} : Y_t ∈ E_j}.

The proof can be carried out exactly as the proof of Proposition 3.1, substituting τ^y_n there with τ^{E_j}_n, and σ^{x̄,y}_n with σ^{x̄,E_j}_n, defined below:

σ^{x̄,E_j}_1 := inf{t ≥ 0 : X_t = x̄, Y_t ∈ E_j},   σ^{x̄,E_j}_n := inf{t > σ^{x̄,E_j}_{n−1} : X_t = x̄, Y_t ∈ E_j},

where x̄ ∈ X is such that f_{x̄}(E_j) > 0 and P(X_{n+1} = x | X_n = x̄) > 0 (x has the same role as in Equation (12)).

4.2 Countable mixtures of Markov chains: Polish state space

This subsection mirrors Subsection 3.2 for the case of a Polish state space S. Theorem 4.2 below extends Theorem 3.2 to Polish state spaces. To the best of our knowledge the extension is not available in the literature. Based on Theorem 4.2 we prove Theorem 4.3, which is the counterpart of Theorem 3.3 and the main result of the subsection. Note that Theorem 3.2, i.e. Dharmadhikari's original result [3], cannot be directly generalized, as it relies on a definition of HMMs unsuitable for general state spaces. Exchangeable sequences, mixtures of i.i.d. sequences, and HMMs are defined in Section 2.
Theorem 4.2.
Let (Y_n) be an exchangeable sequence on a Polish space. The sequence (Y_n) is a countable mixture of i.i.d. sequences if and only if (Y_n) is an HMM with recurrent underlying Markov chain.

Proof. If (Y_n) is a countable mixture of i.i.d. sequences, then it is an HMM by Remark 2.9. To prove the converse, let (Y_n) be an exchangeable HMM whose recurrent underlying Markov chain (X_n) has transition probability matrix P and initial distribution π. The Markov chain (X_n) has no transient states, but possibly more than one recurrence class. As noted in [3], by the exchangeability of (Y_n) one can substitute P with the Cesàro limit P* := lim_{n→∞} (1/n) Σ_{k=1}^n P^k, where P^k is the k-th power of P. By the ergodic theorem P* has a block structure, being the direct sum of matrices P^h with identical rows, one block P^h for each recurrence class. By Lemma 6.10 of the Appendix, (Y_n) is then a countable mixture of i.i.d. sequences.
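The Cesàro limit used in the proof is easy to observe numerically. The sketch below (ours; the finite truncation n and the toy chain are assumptions) approximates P* for a chain with two recurrence classes and exhibits the predicted block structure with identical rows within each block:

```python
# Numeric sketch of the Cesaro limit P* = lim (1/n) sum_{k=1..n} P^k, truncated
# at finite n. Toy chain with two recurrence classes {0,1} and {2,3}.
import numpy as np

P = np.zeros((4, 4))
P[:2, :2] = [[0.9, 0.1], [0.4, 0.6]]     # recurrence class {0, 1}
P[2:, 2:] = [[0.5, 0.5], [0.5, 0.5]]     # recurrence class {2, 3}

n = 2000
Pk, acc = np.eye(4), np.zeros((4, 4))
for _ in range(n):
    Pk = Pk @ P
    acc += Pk
P_star = acc / n
print(np.round(P_star, 3))               # rows 0 and 1 agree, rows 2 and 3 agree
```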
See Section 2 for the definitions of partially exchangeable sequences, mixtures of Markov chains and HMMs.

Theorem 4.3.
Let (Y_n) be a partially exchangeable sequence on a Polish space with P(Y_0 = y_0) = 1 for some y_0 ∈ E_1. The sequence (Y_n) is a countable mixture of homogeneous Markov chains if and only if (Y_n) is an HMM with recurrent underlying Markov chain.

Proof. The partial exchangeability of (Y_n) is a standing hypothesis. Let (Y_n) be an HMM with recurrent underlying Markov chain. To prove that (Y_n) is a countable mixture of Markov chains, imitate the proof of Theorem 3.3. Note first that (1/N) Σ_{n=1}^N I_{V_{j,n}}(·) → θ_j(·), where θ_j is a probability measure on S. As in the proof of Theorem 4 in [8], for any j ∈ ℕ define t̃(j, ·) := θ_j(·). To conclude, use Proposition 4.1 and Theorem 4.2.

For the converse, assume that (Y_n) is a countable mixture of Markov chains, with random kernel t̃ taking values (t_h)_{h∈H}, with μ_h = P(t̃ = t_h), and finite distributions as in Equation (7). To prove that (Y_n) is an HMM with recurrent underlying Markov chain, we construct a recurrent Markov chain (X_n) and a sequence (Ỹ_n) satisfying the first two conditions in Definition 2.8, and then show that (Ỹ_n) has the same distributions as (Y_n). (The construction used for the proof of Theorem 3.3 cannot be used here; in fact, the Markov chain (X_n) there takes values in the product space H × S, which can now be uncountable, while we need a discrete underlying Markov chain.)

Consider thus a Markov chain (X_n) taking values in H × ℕ_0 × ℕ_0, with components X_n = (h_n, i_n, j_n) representing, respectively, the index of the running chain in the mixture, the discretized value of Y_{n−1} (i.e. the element of the partition to which Y_{n−1} belongs), and the discretized value of Y_n. The initial distribution of (X_n) is taken to be

P(X_0 = (h, i, j)) = μ_h 2^{−(i+1)} δ_{j,1},

where δ_{·,·} is again the Kronecker symbol (the i-component of X_0 is immaterial and is given an arbitrary distribution), and its transition probabilities are

P(X_n = (h_n, i_n, j_n) | X_{n−1} = (h_{n−1}, i_{n−1}, j_{n−1})) = δ_{h_{n−1},h_n} δ_{i_n,j_{n−1}} t_{h_n}(i_n, E_{j_n}).

The Markov chain (X_n) is recurrent since the kernels t_h correspond to recurrent Markov chains by Theorem 4 in [8]. Consider now a sequence (Ỹ_n), jointly distributed with (X_n), with fixed initial value Ỹ_0 = y_0, conditionally independent given (X_n), and with read-out distributions defined, for any A ∈ 𝒮, by

P(Ỹ_n ∈ A | X_n = (h, i, j)) = (1 / t_h(i, E_j)) ∫_{A∩E_j} t_h(i, dy)   for t_h(i, E_j) > 0,

and by an arbitrarily fixed distribution for t_h(i, E_j) = 0 (states with t_h(i, E_j) = 0 are never visited, so the choice is immaterial). For any N ∈ ℕ and any A_1, ..., A_N ∈ 𝒮*, the distributions of (Ỹ_n) are computed as follows:

P(Ỹ_1 ∈ A_1, ..., Ỹ_N ∈ A_N)
= Σ_{h^N ∈ H^{N+1}, i^N ∈ (ℕ_0)^{N+1}, j^N ∈ (ℕ_0)^{N+1}} P(Ỹ_1 ∈ A_1, ..., Ỹ_N ∈ A_N, X_0 = (h_0, i_0, j_0), ..., X_N = (h_N, i_N, j_N))
= Σ P(Ỹ_1 ∈ A_1, ..., Ỹ_N ∈ A_N | X_0 = (h_0, i_0, j_0), ..., X_N = (h_N, i_N, j_N)) P(X_0 = (h_0, i_0, j_0), ..., X_N = (h_N, i_N, j_N))
= Σ Π_{n=1}^N P(Ỹ_n ∈ A_n | X_n = (h_n, i_n, j_n)) Π_{n=1}^N P(X_n = (h_n, i_n, j_n) | X_{n−1} = (h_{n−1}, i_{n−1}, j_{n−1})) P(X_0 = (h_0, i_0, j_0))
= Σ Π_{n=1}^N [(1/t_{h_n}(i_n, E_{j_n})) ∫_{A_n∩E_{j_n}} t_{h_n}(i_n, dy)] Π_{n=1}^N [δ_{h_{n−1},h_n} δ_{i_n,j_{n−1}} t_{h_n}(i_n, E_{j_n})] μ_{h_0} 2^{−(i_0+1)} δ_{j_0,1}
= Σ_{h∈H, i_0∈ℕ_0, j^N ∈ (ℕ_0)^{N+1}} Π_{n=1}^N ∫_{A_n∩E_{j_n}} t_h(j_{n−1}, dy) μ_h 2^{−(i_0+1)} δ_{j_0,1}
= Σ_{h∈H, j^N ∈ (ℕ_0)^N} μ_h ∫_{A_1∩E_{j_1}} t_h(1, dy) Π_{n=2}^N ∫_{A_n∩E_{j_n}} t_h(j_{n−1}, dy).

Comparing the expression above with Equation (7), one concludes that the distributions of (Ỹ_n) coincide with those of (Y_n), therefore proving that (Y_n) is an HMM.
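A small simulation makes the (h, i, j) bookkeeping of the proof concrete. The sketch below is ours, for a toy partition of S = [0, 1) into two intervals and kernels t_h whose restriction to each E_j is uniform (assumptions made only to keep the sketch small); T[h][i-1][j-1] plays the role of t_h(i, E_j):

```python
# Sketch of the discrete chain X_n = (h_n, i_n, j_n) built in Theorem 4.3.
import numpy as np

rng = np.random.default_rng(5)
E = [(0.0, 0.5), (0.5, 1.0)]             # partition elements E_1, E_2
mu = [0.4, 0.6]
T = [np.array([[0.9, 0.1], [0.3, 0.7]]), # t_1(i, E_j)
     np.array([[0.2, 0.8], [0.6, 0.4]])] # t_2(i, E_j)

def sample(N):
    h = rng.choice(len(mu), p=mu)        # h_n is constant along the path
    j = 1                                # Y_0 = y_0 in E_1, hence j_0 = 1
    ys = []
    for _ in range(N):
        i = j                            # i_n = j_{n-1} (previous partition index)
        j = int(rng.choice(2, p=T[h][i - 1])) + 1   # j_n ~ t_h(i_n, E_.)
        lo, hi = E[j - 1]
        ys.append(rng.uniform(lo, hi))   # read-out: t_h(i, .) renormalized on E_j
    return ys

print(np.round(sample(8), 3))
```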
5 Final remarks

Throughout the paper we referred to the notion of partial exchangeability originally given by de Finetti and to the corresponding representation theorem as given in [8]. For a discrete state space, partial exchangeability can be defined in a slightly different way, and a representation theorem in this alternative framework is proved in [4]. According to [4], a sequence of random variables is partially exchangeable if its law is invariant under all permutations of a string that preserve the first value and the transition counts between any couple of states. A characterization of countable mixtures of Markov chains can be given also in the setup of [4], using different mathematical tools. The result is in [7], but for a complete proof see [10]. By the same token, the characterization of countable mixtures of Markov chains of order k holds true; for the proof see [10]. Unfortunately the approach of [7] and [10] does not readily generalize to a Polish state space.

Based on the results in [8], a de Finetti type representation theorem for mixtures of semi-Markov processes has been proved in [6]. The authors are confident that a characterization of countable mixtures of semi-Markov processes in terms of HMMs can be given by properly adapting the proofs of Proposition 4.1 and Theorem 4.3.

6 Appendix

6.1 Properties of HMMs

This section contains some useful properties of HMMs.
Lemma 6.1 (Splitting property). Let (Y_n) be an HMM with underlying Markov chain (X_n). Then the pair (X_n, Y_n) is a Markov chain. Moreover, for any N ∈ ℕ, any x_0, ..., x_N ∈ X, and S_0, ..., S_N ∈ 𝒮 such that P(X^{N−1} = x^{N−1}, Y^{N−1} ∈ S^{N−1}) > 0, we have

P(X_N = x_N, Y_N ∈ S_N | X^{N−1} = x^{N−1}, Y^{N−1} ∈ S^{N−1}) = P(X_N = x_N, Y_N ∈ S_N | X_{N−1} = x_{N−1}),

where (X^{N−1} = x^{N−1}, Y^{N−1} ∈ S^{N−1}) is shorthand for the event (X_0 = x_0, Y_0 ∈ S_0, ..., X_{N−1} = x_{N−1}, Y_{N−1} ∈ S_{N−1}).

Proof.

P(X_N = x_N, Y_N ∈ S_N | X^{N−1} = x^{N−1}, Y^{N−1} ∈ S^{N−1})
= P(X^N = x^N, Y^N ∈ S^N) / P(X^{N−1} = x^{N−1}, Y^{N−1} ∈ S^{N−1})
= [Π_{n=0}^N P(Y_n ∈ S_n | X_n = x_n) P(X^N = x^N)] / [Π_{n=0}^{N−1} P(Y_n ∈ S_n | X_n = x_n) P(X^{N−1} = x^{N−1})]
= P(Y_N ∈ S_N | X_N = x_N) P(X_N = x_N | X_{N−1} = x_{N−1})
= P(Y_N ∈ S_N | X_N = x_N, X_{N−1} = x_{N−1}) P(X_N = x_N | X_{N−1} = x_{N−1})
= P(Y_N ∈ S_N, X_N = x_N | X_{N−1} = x_{N−1}),

where the second and fourth equalities follow by the conditional independence of the observations (Y_n) in the definition of HMM.

Lemma 6.2 (Strong splitting property). Let (Y_n) be an HMM with underlying Markov chain (X_n), and let γ be a stopping time for (X_n, Y_n). Then, for any k ≥ 1, n ≥ 0, any x, x̃, x̄ ∈ X, and any S_1, S_2, S_3 ∈ 𝒮 such that P(X_γ = x̃, Y_γ ∈ S_2, X_{γ∧n} = x̄, Y_{γ∧n} ∈ S_3) > 0, it holds that

P(X_{γ+k} = x, Y_{γ+k} ∈ S_1 | X_γ = x̃, Y_γ ∈ S_2, X_{γ∧n} = x̄, Y_{γ∧n} ∈ S_3) = P(X_{γ+k} = x, Y_{γ+k} ∈ S_1 | X_γ = x̃).  (18)

Proof.
We manipulate separately the left-hand side (LHS) and the right-hand side (RHS) of Equation (18). For readability denote C_r := (γ = r, X_r = x̃, Y_r ∈ S_2, X_{r∧n} = x̄, Y_{r∧n} ∈ S_3). Applying Lemma 6.1, the numerator of the conditional probability on the LHS of Equation (18) is

P(X_{γ+k} = x, Y_{γ+k} ∈ S_1, X_γ = x̃, Y_γ ∈ S_2, X_{γ∧n} = x̄, Y_{γ∧n} ∈ S_3)
= Σ_{r≥0} P(X_{r+k} = x, Y_{r+k} ∈ S_1 | C_r) P(C_r)
= Σ_{r≥0} P(X_{r+k} = x, Y_{r+k} ∈ S_1 | X_r = x̃) P(C_r)
= Σ_{r≥0} P(Y_{r+k} ∈ S_1 | X_{r+k} = x) P(X_{r+k} = x | X_r = x̃) P(C_r)
= f_x(S_1) P^{(k)}_{x̃,x} Σ_{r≥0} P(C_r)
= f_x(S_1) P^{(k)}_{x̃,x} P(X_γ = x̃, Y_γ ∈ S_2, X_{γ∧n} = x̄, Y_{γ∧n} ∈ S_3),

where P^{(k)}_{x̃,x} is the (x̃, x)-entry of the k-step transition matrix of the Markov chain (X_n). The numerator of the conditional probability on the RHS of Equation (18), again applying Lemma 6.1, is

P(X_{γ+k} = x, Y_{γ+k} ∈ S_1, X_γ = x̃)
= Σ_{r≥0} P(X_{r+k} = x, Y_{r+k} ∈ S_1 | γ = r, X_r = x̃) P(γ = r, X_r = x̃)
= Σ_{r≥0} P(X_{r+k} = x, Y_{r+k} ∈ S_1 | X_r = x̃) P(γ = r, X_r = x̃)
= f_x(S_1) P^{(k)}_{x̃,x} P(X_γ = x̃).

The lemma is proved comparing the expressions of the LHS and the RHS derived above.

Taking S_1 = S_2 = S_3 = S we have

Remark 6.3.
Let (X_n), γ be as in Lemma 6.2. Then, for any x, x̃, x̄ ∈ X such that P(X_γ = x̃, X_{γ∧n} = x̄) > 0, it holds that

P(X_{γ+k} = x | X_γ = x̃, X_{γ∧n} = x̄) = P(X_{γ+k} = x | X_γ = x̃).  (19)

Definition 6.4. Let (Y_n) be an HMM with underlying Markov chain (X_n), and let A ⊂ X × S. We say that the sequence of random times (γ_n)_{n≥1} is a sequence of hitting times of A if

γ_1 := inf{t ≥ 0 : (X_t, Y_t) ∈ A},   γ_n := inf{t > γ_{n−1} : (X_t, Y_t) ∈ A}.

Lemma 6.5 (Generalized strong splitting property). Let (Y_n) be an HMM with underlying Markov chain (X_n). Let (γ_n) be a sequence of hitting times of A for (X_n, Y_n), where A ⊂ X × S. Then for any N, and any (x_1, S_1), ..., (x_N, S_N) ∈ A such that P(X_{γ_1}^{γ_{N−1}} = x^{N−1}, Y_{γ_1}^{γ_{N−1}} ∈ S^{N−1}) > 0, it holds

P(X_{γ_N} = x_N, Y_{γ_N} ∈ S_N | X_{γ_1}^{γ_{N−1}} = x^{N−1}, Y_{γ_1}^{γ_{N−1}} ∈ S^{N−1}) = P(X_{γ_N} = x_N, Y_{γ_N} ∈ S_N | X_{γ_{N−1}} = x_{N−1}),

where (X_{γ_1}^{γ_{N−1}} = x^{N−1}, Y_{γ_1}^{γ_{N−1}} ∈ S^{N−1}) is shorthand for the event (X_{γ_1} = x_1, Y_{γ_1} ∈ S_1, ..., X_{γ_{N−1}} = x_{N−1}, Y_{γ_{N−1}} ∈ S_{N−1}).

Proof.
Denote with A^c the complement of A in X × S, and with (A^c)^r the r-fold Cartesian product of A^c. Let B := (X_{γ_1}^{γ_{N−1}} = x^{N−1}, Y_{γ_1}^{γ_{N−1}} ∈ S^{N−1}). Applying Lemma 6.2 in the third equality below, the numerator of the conditional probability on the LHS is

P(X_{γ_1}^{γ_N} = x^N, Y_{γ_1}^{γ_N} ∈ S^N)
= Σ_{r≥1} P(γ_N = γ_{N−1} + r, X_{γ_{N−1}+r} = x_N, Y_{γ_{N−1}+r} ∈ S_N | B) P(B)
= Σ_{r≥1} P(X_{γ_{N−1}+r} = x_N, Y_{γ_{N−1}+r} ∈ S_N, (X_{γ_{N−1}+1}^{γ_{N−1}+r−1}, Y_{γ_{N−1}+1}^{γ_{N−1}+r−1}) ∈ (A^c)^{r−1} | B) P(B)
= Σ_{r≥1} P(X_{γ_{N−1}+r} = x_N, Y_{γ_{N−1}+r} ∈ S_N, (X_{γ_{N−1}+1}^{γ_{N−1}+r−1}, Y_{γ_{N−1}+1}^{γ_{N−1}+r−1}) ∈ (A^c)^{r−1} | X_{γ_{N−1}} = x_{N−1}) P(B)
= Σ_{r≥1} P(γ_N = γ_{N−1} + r, X_{γ_{N−1}+r} = x_N, Y_{γ_{N−1}+r} ∈ S_N | X_{γ_{N−1}} = x_{N−1}) P(B)
= P(X_{γ_N} = x_N, Y_{γ_N} ∈ S_N | X_{γ_{N−1}} = x_{N−1}) P(B),

and dividing by P(B) the lemma is proved.

Remark 6.6.
By the same token, for any (x_1, S_1), ..., (x_N, S_N) ∈ X × 𝒮,

P(X_{γ_N+1} = x_N, Y_{γ_N+1} ∈ S_N | X_{γ_1+1}^{γ_{N−1}+1} = x^{N−1}, Y_{γ_1+1}^{γ_{N−1}+1} ∈ S^{N−1}) = P(X_{γ_N+1} = x_N, Y_{γ_N+1} ∈ S_N | X_{γ_{N−1}+1} = x_{N−1}).

Taking S_1 = ··· = S_N = S, Remark 6.6 gives

Remark 6.7.
For any x_1, ..., x_N ∈ X,

P(X_{γ_N+1} = x_N | X_{γ_1+1}^{γ_{N−1}+1} = x^{N−1}) = P(X_{γ_N+1} = x_N | X_{γ_{N−1}+1} = x_{N−1}).

As a consequence of the conditional independence property of HMMs we have
Lemma 6.8.
Let (Y_n) be an HMM with underlying Markov chain (X_n). Then for any N ∈ ℕ, any x_0, ..., x_N ∈ X, and S_0, ..., S_N ∈ 𝒮 such that P(X^N = x^N, Y^{N−1} ∈ S^{N−1}) > 0, we have

P(Y_N ∈ S_N | X^N = x^N, Y^{N−1} ∈ S^{N−1}) = P(Y_N ∈ S_N | X_N = x_N).  (20)

Moreover, let σ, τ be two stopping times for (X_n, Y_n) such that σ < τ. Then for any S̄ ∈ 𝒮* and any x_1, x_2 ∈ X we have

P(Y_{τ+1} ∈ S̄ | X_{τ+1} = x_1) = f_{x_1}(S̄),  (21)

P(Y_{τ+1} ∈ S̄ | X_{τ+1} = x_1, X_{σ+1} = x_2) = P(Y_{τ+1} ∈ S̄ | X_{τ+1} = x_1).  (22)

Proof.
Equation (20) can be easily proved using the conditional independence property. Equation (21) can be seen disintegrating the stopping time τ + 1:

P(Y_{τ+1} ∈ S̄, X_{τ+1} = x_1)
= Σ_{m∈ℕ} P(τ + 1 = m, Y_m ∈ S̄, X_m = x_1)
= Σ_{m∈ℕ} P(Y_m ∈ S̄ | τ + 1 = m, X_m = x_1) P(τ + 1 = m, X_m = x_1)
= Σ_{m∈ℕ} P(Y_m ∈ S̄ | X_m = x_1) P(τ + 1 = m, X_m = x_1)
= f_{x_1}(S̄) Σ_{m∈ℕ} P(τ + 1 = m, X_m = x_1)
= f_{x_1}(S̄) P(X_{τ+1} = x_1),

where the third equality follows by Equation (20); the result then follows by the definition of conditional probability.

To get Equation (22), write

P(Y_{τ+1} ∈ S̄, X_{τ+1} = x_1, X_{σ+1} = x_2)
= Σ_{n∈ℕ} Σ_{m>n} P(τ = m, σ = n, Y_{m+1} ∈ S̄, X_{m+1} = x_1, X_{n+1} = x_2)
= Σ_{n∈ℕ} Σ_{m>n} P(Y_{m+1} ∈ S̄ | τ = m, σ = n, X_{m+1} = x_1, X_{n+1} = x_2) P(τ = m, σ = n, X_{m+1} = x_1, X_{n+1} = x_2)
= Σ_{n∈ℕ} Σ_{m>n} P(Y_{m+1} ∈ S̄ | X_{m+1} = x_1) P(τ = m, σ = n, X_{m+1} = x_1, X_{n+1} = x_2)
= f_{x_1}(S̄) Σ_{n∈ℕ} Σ_{m>n} P(τ = m, σ = n, X_{m+1} = x_1, X_{n+1} = x_2)
= f_{x_1}(S̄) P(X_{τ+1} = x_1, X_{σ+1} = x_2),

where the third equality follows by Equation (20); the result follows by the first statement and again by the definition of conditional probability.

Lemma 6.9.
Let (Y_n) be an HMM with underlying Markov chain (X_n), and let (γ_n) be a sequence of hitting times for A, where A ⊂ X × S. Then, for any N ∈ ℕ and for any (x_1, S_1), ..., (x_N, S_N) ∈ X × 𝒮,

P(Y_{γ_1+1}^{γ_N+1} ∈ S^N | X_{γ_1+1}^{γ_N+1} = x^N) = Π_{k=1}^N P(Y_{γ_k+1} ∈ S_k | X_{γ_k+1} = x_k).  (23)

Proof.
Let C := (Y_{γ_1+1}^{γ_{N−1}+1} ∈ S^{N−1}, X_{γ_1+1}^{γ_{N−1}+1} = x^{N−1}) for readability.

P(Y_{γ_1+1}^{γ_N+1} ∈ S^N, X_{γ_1+1}^{γ_N+1} = x^N)
= P(Y_{γ_N+1} ∈ S_N, X_{γ_N+1} = x_N | C) P(C)
= P(Y_{γ_N+1} ∈ S_N, X_{γ_N+1} = x_N | X_{γ_{N−1}+1} = x_{N−1}) P(C)
= P(Y_{γ_N+1} ∈ S_N | X_{γ_N+1} = x_N) P(X_{γ_N+1} = x_N | X_{γ_{N−1}+1} = x_{N−1}) P(C)
= Π_{k=1}^N P(Y_{γ_k+1} ∈ S_k | X_{γ_k+1} = x_k) P(X_{γ_1+1}^{γ_N+1} = x^N),

where the second equality follows by Remark 6.6, the third by Lemma 6.8, and the last equality follows iterating the procedure and using Remark 6.7. Dividing by P(X_{γ_1+1}^{γ_N+1} = x^N) proves the lemma.

6.2 HMMs and countable mixtures of i.i.d. sequences

The following fact was used in the proof of Theorem 4.2: if the HMM (Y_n) has an underlying Markov chain with block structured transition probability matrix, with identical rows within blocks, then (Y_n) is a countable mixture of i.i.d. sequences.

Consider a Markov chain (X_n) with values in X and transition matrix P as follows:

P := diag(P^1, P^2, ..., P^h, ...),   where P^h is the l_h × l_h matrix all of whose rows are equal to (p^h_{c^h_1}, p^h_{c^h_2}, ..., p^h_{c^h_{l_h}}),  (24)

with h ∈ H, a countable set. The block P^h has size l_h, and some of the p^h_c can be null. The Markov chain (X_n) clearly has one recurrence class for each block, and no transient states. Let us indicate with C_h the h-th recurrence class, corresponding to the states of the h-th block; set C_h = {c^h_1, ..., c^h_{l_h}}, where l_h can be infinite. Trivially X = ∪_{h∈H} C_h. An invariant distribution associated with the h-th block is p^h := (p^h_{c^h_1}, ..., p^h_{c^h_{l_h}}), and for any sequence μ_h > 0 with Σ_{h∈H} μ_h = 1, the vector

π = (μ_1 p^1, ..., μ_h p^h, ...)  (25)

is an invariant distribution for P.

Lemma 6.10.
Consider an HMM (Y_n) where the underlying Markov chain (X_n) has transition matrix P as in (24), invariant measure π as in (25), and assigned read-out distributions f_x(S̄). Then (Y_n) is a countable mixture of i.i.d. sequences, where p̃ takes values in the set {F_h, h ∈ H}, with

F_h(S̄) := p^h_{c^h_1} f_{c^h_1}(S̄) + ··· + p^h_{c^h_{l_h}} f_{c^h_{l_h}}(S̄),

and P(p̃ = F_h) = μ_h.

Proof. Let us compute the finite distributions of (Y_n). For any N ∈ ℕ, let S_0, ..., S_N ∈ 𝒮:

P(Y^N ∈ S^N) = Σ_{x^N ∈ X^{N+1}} P(Y^N ∈ S^N, X^N = x^N)
= Σ_{x^N} P(X_0 = x_0) Π_{n=0}^N P(Y_n ∈ S_n | X_n = x_n) Π_{n=1}^N P(X_n = x_n | X_{n−1} = x_{n−1})
= Σ_{x^N} π_{x_0} Π_{n=0}^N f_{x_n}(S_n) Π_{n=1}^N P_{x_{n−1},x_n}
= Σ_{x^N} π_{x_0} f_{x_0}(S_0) P_{x_0,x_1} f_{x_1}(S_1) ··· P_{x_{N−1},x_N} f_{x_N}(S_N)
= Σ_{h∈H} Σ_{x^N ∈ (C_h)^{N+1}} μ_h p^h_{x_0} f_{x_0}(S_0) p^h_{x_1} f_{x_1}(S_1) ··· p^h_{x_N} f_{x_N}(S_N)
= Σ_{h∈H} μ_h (Σ_{x_0∈C_h} p^h_{x_0} f_{x_0}(S_0)) (Σ_{x_1∈C_h} p^h_{x_1} f_{x_1}(S_1)) ··· (Σ_{x_N∈C_h} p^h_{x_N} f_{x_N}(S_N))
= Σ_{h∈H} μ_h F_h(S_0) F_h(S_1) ··· F_h(S_N),

where the second equality follows by the HMM properties, and the fifth follows noting that P_{x_n,x_{n+1}} is null for x_n and x_{n+1} in different recurrence classes, and is equal to p^h_{x_{n+1}} for x_n and x_{n+1} in the same recurrence class C_h. The expression above coincides with the representation of countable mixtures of i.i.d. sequences given in (2), thus completing the proof.
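Lemma 6.10 can be checked numerically on a small example. The sketch below is ours, with toy blocks and read-outs assumed: it computes the law of (Y_0, Y_1) directly from the HMM and via the mixture formula Σ_h μ_h F_h(S_0) F_h(S_1), and the two numbers agree.

```python
# Numeric check of Lemma 6.10 on a toy two-block example.
import numpy as np

mu = np.array([0.4, 0.6])
p_blocks = [np.array([0.3, 0.7]), np.array([0.5, 0.5])]   # invariant rows p^h
f = np.array([[0.9, 0.1],        # read-outs f_x on S = {0, 1};
              [0.2, 0.8],        # states 0, 1 form block 1,
              [0.6, 0.4],        # states 2, 3 form block 2
              [0.1, 0.9]])

# Block-constant transition matrix P (identical rows within each block).
P = np.zeros((4, 4))
P[:2, :2] = np.tile(p_blocks[0], (2, 1))
P[2:, 2:] = np.tile(p_blocks[1], (2, 1))
pi = np.concatenate([mu[0] * p_blocks[0], mu[1] * p_blocks[1]])

# Direct HMM computation of P(Y_0 = 0, Y_1 = 1).
direct = sum(pi[x0] * f[x0, 0] * P[x0, x1] * f[x1, 1]
             for x0 in range(4) for x1 in range(4))

# Mixture computation with F_h = sum over x in C_h of p^h_x f_x.
F = [p_blocks[0] @ f[:2], p_blocks[1] @ f[2:]]
mixture = sum(mu[h] * F[h][0] * F[h][1] for h in range(2))
print(direct, mixture)           # the two values agree
```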
References

[1] Aldous, D.J. (1985). Exchangeability and related topics. In: École d'Été de Probabilités de Saint-Flour XIII - 1983, Lecture Notes in Mathematics 1117, Springer, Berlin.

[2] de Finetti, B. (1938). Sur la condition d'équivalence partielle. Actualités Scientifiques et Industrielles, Hermann, Paris, 5-18.

[3] Dharmadhikari, S.W. (1964). Exchangeable processes which are functions of stationary Markov chains. The Annals of Mathematical Statistics, 429-430.

[4] Diaconis, P. and Freedman, D. (1980). de Finetti's theorem for Markov chains. The Annals of Probability, 115-130.

[5] Diaconis, P. and Freedman, D. (2004). The Markov moment problem and de Finetti's theorem, Part I and Part II. Mathematische Zeitschrift, 183-212.

[6] Epifani, I., Fortini, S. and Ladelli, L. (2002). A characterization for mixtures of semi-Markov processes. Statistics and Probability Letters, 445-457.

[7] Finesso, L. and Prosdocimi, C. (2009). Partially exchangeable hidden Markov models. Proceedings of the European Control Conference 2009, 3910-3914.

[8] Fortini, S., Ladelli, L., Petris, G. and Regazzini, E. (2002). On mixtures of distributions of Markov chains. Stochastic Processes and their Applications, 147-165.

[9] Kallenberg, O. (2005). Probabilistic Symmetries and Invariance Principles. Springer.

[10] Prosdocimi, C. (2010). Partial exchangeability and change detection for hidden Markov models. PhD Dissertation, cycle XXII, University of Padova.

[11] Vidyasagar, M. (2011). The complete realization problem for hidden Markov models: a survey and some new results. Mathematics of Control, Signals and Systems.