Approximation of probability density functions via location-scale finite mixtures in Lebesgue spaces

Abstract

The class of location-scale finite mixtures is of enduring interest both from applied and theoretical perspectives of probability and statistics. We prove the following results: to an arbitrary degree of accuracy, (a) location-scale mixtures of a continuous probability density function (PDF) can approximate any continuous PDF, uniformly, on a compact set; and (b) for any finite $p \ge 1$, location-scale mixtures of an essentially bounded PDF can approximate any PDF in $\mathcal{L}_p$, in the $\mathcal{L}_p$ norm.



TrungTin Nguyen ∗, Faicel Chamroukhi, Hien D Nguyen, and Geoffrey J McLachlan

Lab of Mathematics Nicolas Oresme LMNO, UMR CNRS, Caen, France. School of Engineering and Mathematical Sciences, Department of Mathematics and Statistics, La Trobe University, Melbourne, Victoria, Australia. School of Mathematics and Physics, University of Queensland, St. Lucia, Brisbane, Australia.

∗ Corresponding author, email: [email protected].

Keywords: Mixture models, approximation theory, uniform approximation, probability density functions.

1 Introduction

Define $(\mathbb{E}, \|\cdot\|_{\mathbb{E}})$ to be a normed vector space (NVS), and let $x \in (\mathbb{R}^n, \|\cdot\|_2)$, for some $n \in \mathbb{N}$, where $\|\cdot\|_2$ is the Euclidean norm. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a function satisfying $f \ge 0$ and $\int f \,\mathrm{d}\lambda = 1$, where $\lambda$ is the Lebesgue measure. We say that $f$ is a probability density function (PDF) on the domain $\mathbb{R}^n$ (which we will omit for brevity, from here on). Let $g : \mathbb{R}^n \to \mathbb{R}$ be another PDF and define the functional class $\mathcal{M}^g = \bigcup_{m \in \mathbb{N}} \mathcal{M}^g_m$, where

$$\mathcal{M}^g_m = \left\{ h^g_m : h^g_m(\cdot) = \sum_{i=1}^{m} \frac{c_i}{\sigma_i^n}\, g\!\left(\frac{\cdot - \mu_i}{\sigma_i}\right),\ \mu_i \in \mathbb{R}^n,\ \sigma_i \in \mathbb{R}_+,\ c \in \mathbb{S}^{m-1},\ i \in [m] \right\},$$

$c^\top = (c_1, \dots, c_m)$, $\mathbb{R}_+ = (0, \infty)$,

$$\mathbb{S}^{m-1} = \left\{ c \in \mathbb{R}^m : \sum_{i=1}^{m} c_i = 1,\ c_i \ge 0,\ i \in [m] \right\},$$

$[m] = \{1, \dots, m\}$, and $(\cdot)^\top$ is the matrix transposition operator. We say that $h^g_m \in \mathcal{M}^g_m$ is an $m$-component location-scale finite mixture of the PDF $g$. The class $\mathcal{M}^g$ has enjoyed enduring practical and theoretical interest throughout the years, as reported in the volumes of Everitt and Hand (1981), McLachlan and Basford (1988), Lindsay (1995), McLachlan and Peel (2000), Frühwirth-Schnatter (2006), Mengersen et al. (2011), and Frühwirth-Schnatter et al. (2019).

We say that $f$ is compactly supported on $\mathbb{K} \subset \mathbb{R}^n$, if $\mathbb{K}$ is compact and if $\mathbf{1}_{\mathbb{K}^\complement} f = 0$, where $\mathbf{1}_{\mathbb{X}}$ is the indicator function that takes value 1 when $x \in \mathbb{X}$, and 0 elsewhere, and where $(\cdot)^\complement$ is the set complement operator (i.e., $\mathbb{X}^\complement = \mathbb{R}^n \setminus \mathbb{X}$). Here, $\mathbb{X}$ is a generic subset of $\mathbb{R}^n$. Further, say that $f \in \mathcal{L}_p(\mathbb{X})$ for any $1 \le p < \infty$, if

$$\|f\|_{\mathcal{L}_p(\mathbb{X})} = \left( \int |\mathbf{1}_{\mathbb{X}} f|^p \,\mathrm{d}\lambda \right)^{1/p} < \infty,$$

and say that $f \in \mathcal{L}_\infty(\mathbb{X})$, the class of essentially bounded measurable functions, if

$$\|f\|_{\mathcal{L}_\infty(\mathbb{X})} = \inf \left\{ a \ge 0 : \lambda\left( \{ x \in \mathbb{X} : |f(x)| > a \} \right) = 0 \right\} < \infty,$$

where we call $\|\cdot\|_{\mathcal{L}_p(\mathbb{X})}$ the $\mathcal{L}_p$-norm on $\mathbb{X}$.
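To make the definition of a location-scale finite mixture concrete, the following is a minimal numerical sketch for the case $n = 1$. The choice of a standard normal base density $g$, the helper names `gaussian_pdf` and `location_scale_mixture`, and the particular weights, locations, and scales are all illustrative assumptions and do not come from the paper.

```python
import numpy as np

def gaussian_pdf(x):
    """Standard normal density on R (an assumed choice of base PDF g, n = 1)."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def location_scale_mixture(x, weights, locations, scales, g=gaussian_pdf, n=1):
    """Evaluate h(x) = sum_i c_i * sigma_i**(-n) * g((x - mu_i) / sigma_i)."""
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    locations = np.asarray(locations, dtype=float)
    scales = np.asarray(scales, dtype=float)
    # The weight vector must lie in the probability simplex.
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to one")
    if np.any(scales <= 0):
        raise ValueError("scales must be strictly positive")
    # Broadcast: the last axis indexes the mixture components.
    z = (x[..., None] - locations) / scales
    return np.sum(weights * g(z) / scales**n, axis=-1)

# Hypothetical two-component example.
grid = np.linspace(-30.0, 30.0, 60001)
h = location_scale_mixture(grid, [0.3, 0.7], [-2.0, 1.5], [0.5, 2.0])
dx = grid[1] - grid[0]
total_mass = float(np.sum(h) * dx)  # Riemann-sum approximation of the integral of h
```

Under these assumptions, `h` is non-negative and `total_mass` is numerically close to one, which is the sense in which every element of the mixture class is itself a PDF.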
Denote the class of all bounded functions on $\mathbb{X}$ by

$$\mathcal{B}(\mathbb{X}) = \left\{ f \in \mathcal{L}_\infty(\mathbb{X}) : \exists a \in [0, \infty), \text{ such that } |f(x)| \le a,\ \forall x \in \mathbb{X} \right\}$$

and write $\|f\|_{\mathcal{B}(\mathbb{X})} = \sup_{x \in \mathbb{X}} |f(x)|$. For brevity, we shall write $\mathcal{L}_p(\mathbb{R}^n) = \mathcal{L}_p$, $\mathcal{B}(\mathbb{R}^n) = \mathcal{B}$, $\|f\|_{\mathcal{L}_p(\mathbb{R}^n)} = \|f\|_{\mathcal{L}_p}$, and $\|f\|_{\mathcal{B}(\mathbb{R}^n)} = \|f\|_{\mathcal{B}}$. Lastly, we denote the classes of continuous functions and uniformly continuous functions by $\mathcal{C}$ and $\mathcal{C}^u$, respectively. The classes of bounded continuous and bounded uniformly continuous functions shall be denoted as $\mathcal{C}_b = \mathcal{C} \cap \mathcal{B}$ and $\mathcal{C}^u_b = \mathcal{C}_b \cap \mathcal{C}^u$, respectively. Note that the class of continuous functions that vanish at infinity, defined as

$$\mathcal{C}_0 = \left\{ f \in \mathcal{C} : \forall \epsilon > 0,\ \exists \text{ a compact } \mathbb{K} \subset \mathbb{R}^n, \text{ such that } \|f\|_{\mathcal{B}(\mathbb{K}^\complement)} < \epsilon \right\},$$

is a subset of $\mathcal{C}^u_b$.

An important characteristic of the class $\mathcal{M}^g$ is its capability of approximating larger classes of PDFs in various ways. Motivated by the incomplete proofs of Xu et al. (1993, Lem 3.1) and Theorem 5 from Cheney and Light (2000, Ch. 20), as well as the results of Nestoridis and Stefanopoulos (2007), Bacharoglou (2010), and Nestoridis et al. (2011), Nguyen et al. (2020) established and proved the following theorem regarding sequences of PDFs $\{h^g_m\}$ from $\mathcal{M}^g$.

Theorem 1 (Theorem 5 from Nguyen et al., 2020).
Let $h^g_m \in \mathcal{M}^g$ denote an $m$-component location-scale finite mixture PDF. If $f$ and $g$ are PDFs and $g \in \mathcal{C}_0$, then the following statements are true.

(a) For any $f \in \mathcal{C}_0$, there exists a sequence $\{h^g_m\}_{m=1}^{\infty} \subset \mathcal{M}^g$, such that $\lim_{m \to \infty} \|f - h^g_m\|_{\mathcal{L}_\infty} = 0$.

(b) For any $f \in \mathcal{C}_b$ and compact set $\mathbb{K} \subset \mathbb{R}^n$, there exists a sequence $\{h^g_m\}_{m=1}^{\infty} \subset \mathcal{M}^g$, such that $\lim_{m \to \infty} \|f - h^g_m\|_{\mathcal{L}_\infty(\mathbb{K})} = 0$.

(c) For any $p \in (1, \infty)$ and $f \in \mathcal{L}_p$, there exists a sequence $\{h^g_m\}_{m=1}^{\infty} \subset \mathcal{M}^g$, such that $\lim_{m \to \infty} \|f - h^g_m\|_{\mathcal{L}_p} = 0$.

(d) For any measurable $f$, there exists a sequence $\{h^g_m\}_{m=1}^{\infty} \subset \mathcal{M}^g$, such that $\lim_{m \to \infty} h^g_m = f$, almost everywhere.

(e) If $\nu$ is a $\sigma$-finite Borel measure on $\mathbb{R}^n$, then for any $\nu$-measurable $f$, there exists a sequence $\{h^g_m\}_{m=1}^{\infty} \subset \mathcal{M}^g$, such that $\lim_{m \to \infty} h^g_m = f$, almost everywhere, with respect to $\nu$.

Further, if we assume that

$$g \in \left\{ g \in \mathcal{C}_0 : \forall x \in \mathbb{R}^n,\ |g(x)| \le \theta_1 \left( 1 + \|x\|_2 \right)^{-n - \theta_2},\ (\theta_1, \theta_2) \in \mathbb{R}_+^2 \right\},$$

then the following is also true.

(f) For any $f \in \mathcal{C}$, there exists a sequence $\{h^g_m\}_{m=1}^{\infty} \subset \mathcal{M}^g$, such that $\lim_{m \to \infty} \|f - h^g_m\|_{\mathcal{L}_1} = 0$.

The goal of this work is to seek the weakest set of assumptions under which approximation-theoretic results can be established over the widest possible class of probability density problems. In this paper, we prove Theorem 2, which improves upon Theorem 1 in a number of ways. More specifically, while statements (a), (c), (d), and (e) still hold under the same assumptions as in Theorem 1, statement (b) from Theorem 1 is improved to apply to a larger class of target functions $f \in \mathcal{C}$ (cf. Theorem 2(a)), and statement (f) from Theorem 1 is drastically improved to apply to any $f \in \mathcal{L}_1$ and $g \in \mathcal{L}_\infty$ (cf. Theorem 2(b)). We note, in particular, that our improvement with respect to statement (b) from Theorem 1 yields exactly the result of Theorem 5 from Cheney and Light (2000, Ch. 20), which was incorrectly proved (see also DasGupta, 2008, Thm. 33.2).

The remainder of the article progresses as follows. The main result of this paper is stated in Section 2. Technical preliminaries to the proof of the main result are presented in Section 3. The proof is then established in Section 4. Additional technical results required throughout the paper are reported in Appendix A.

2 Main result

Theorem 2. Let $h^g_m \in \mathcal{M}^g$ denote an $m$-component location-scale finite mixture PDF. If $f$ and $g$ are PDFs, then the following statements are true.

(a) If $f, g \in \mathcal{C}$ and $\mathbb{K} \subset \mathbb{R}^n$ is a compact set, then there exists a sequence $\{h^g_m\}_{m=1}^{\infty} \subset \mathcal{M}^g$, such that $\lim_{m \to \infty} \|f - h^g_m\|_{\mathcal{B}(\mathbb{K})} = 0$.

(b) For $p \in [1, \infty)$, if $f \in \mathcal{L}_p$ and $g \in \mathcal{L}_\infty$, then there exists a sequence $\{h^g_m\}_{m=1}^{\infty} \subset \mathcal{M}^g$, such that $\lim_{m \to \infty} \|f - h^g_m\|_{\mathcal{L}_p} = 0$.

3 Technical preliminaries

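Let $f, g \in \mathcal{L}_1$, and denote the convolution of $f$ and $g$ by $f \star g = g \star f$. Further, we say that $g_k(\cdot) = k^n g(k \times \cdot)$ ($k \in \mathbb{R}_+$) is a dilate of $g$.

Notice that $\mathcal{M}^g_m$ can be parameterized via dilates. That is, we can write

$$\mathcal{M}^g_m = \left\{ h^g_m : h^g_m(\cdot) = \sum_{i=1}^{m} c_i k_i^n\, g(k_i \times \cdot - k_i \mu_i),\ \mu_i \in \mathbb{R}^n,\ k_i \in \mathbb{R}_+,\ c \in \mathbb{S}^{m-1},\ i \in [m] \right\},$$

where $k_i = 1/\sigma_i$.

Let $\mathbb{F}$ be a subset of $\mathbb{E}$, and denote the convex hull of $\mathbb{F}$ by $\operatorname{conv}(\mathbb{F})$, which is the smallest convex subset in $\mathbb{E}$ that contains $\mathbb{F}$ (cf. Brezis, 2010, Ch. 1). By definition, we may write

$$\operatorname{conv}(\mathbb{F}) = \left\{ \sum_{i \in [m]} \alpha_i f_i : f_i \in \mathbb{F},\ \alpha \in \mathbb{S}^{m-1},\ i \in [m],\ m \in \mathbb{N} \right\},$$

where $\alpha^\top = (\alpha_1, \dots, \alpha_m)$.

Define the class of "basic" densities, which will serve as the approximation building blocks, as follows:

$$\mathcal{G}^g = \left\{ k^n g(k \times \cdot - k\mu),\ \mu \in \mathbb{R}^n,\ k \in \mathbb{R}_+ \right\},$$

and suppose that we can choose a suitable NVS $(\mathbb{E}, \|\cdot\|_{\mathbb{E}})$, such that $\mathcal{G}^g \subset \mathcal{M}^g \subset \mathbb{E}$. Then, by definition, it holds that $\mathcal{M}^g$ is the convex hull of $\mathcal{G}^g$; that is, $\mathcal{M}^g = \operatorname{conv}(\mathcal{G}^g)$.

For $u \in \mathbb{E}$ and $r > 0$, we define the open and closed balls of radius $r$, centered around $u$, by

$$\mathbb{B}(u, r) = \left\{ v \in \mathbb{E} : \|u - v\|_{\mathbb{E}} < r \right\} \quad \text{and} \quad \overline{\mathbb{B}}(u, r) = \left\{ v \in \mathbb{E} : \|u - v\|_{\mathbb{E}} \le r \right\},$$

respectively. For brevity, we also write $\mathbb{B}_r = \mathbb{B}(0, r)$ and $\overline{\mathbb{B}}_r = \overline{\mathbb{B}}(0, r)$. A set $\mathbb{F} \subset \mathbb{E}$ is open, if for every $u \in \mathbb{F}$, there exists an $r > 0$, such that $\mathbb{B}(u, r) \subset \mathbb{F}$. We say that $\mathbb{F}$ is closed if its complement is open, and by definition, we say that $\mathbb{E}$ and the empty set are both closed and open. We call the smallest closed set containing $\mathbb{F}$ its closure, and we denote it by $\overline{\mathbb{F}}$. A sequence $\{u_m\} \subset \mathbb{E}$ converges to $u \in \mathbb{E}$, if $\lim_{m \to \infty} \|u_m - u\|_{\mathbb{E}} = 0$, and we denote it symbolically by $\lim_{m \to \infty} u_m = u$. That is, for every $\epsilon > 0$, there exists an $N(\epsilon) \in \mathbb{N}$, such that $m \ge N(\epsilon)$ implies that $\|u_m - u\|_{\mathbb{E}} < \epsilon$.

By Lemma 6, we can write the closure of $\mathbb{F}$ as

$$\overline{\mathbb{F}} = \left\{ u \in \mathbb{E} : u = \lim_{m \to \infty} u_m,\ u_m \in \mathbb{F} \right\}$$

and hence

$$\overline{\mathcal{M}^g} = \left\{ h \in \mathbb{E} : h = \lim_{m \to \infty} h^g_m,\ h^g_m \in \mathcal{M}^g \right\}.$$

Thus, by definition, it holds that $\overline{\mathcal{M}^g}$ is a closed and convex subset of $\mathbb{E}$.

If $f \in \mathcal{C}$ is a PDF on $\mathbb{R}^n$, we denote its support by

$$\operatorname{supp} f = \overline{\left\{ x \in \mathbb{R}^n : f(x) \ne 0 \right\}}$$

and furthermore, we denote the set of compactly supported continuous functions by $\mathcal{C}_c = \{ f \in \mathcal{C} : \operatorname{supp} f \text{ is compact} \}$. For open sets $\mathbb{V} \subset \mathbb{R}^n$, we will write $f \prec \mathbb{V}$ as shorthand for $f \in \mathcal{C}_c$, $0 \le f \le 1$, and $\operatorname{supp} f \subset \mathbb{V}$.

The following lemmas permit us to construct the primary technical mechanism that is used to prove our main result presented in Theorem 2.

Lemma 1. Let $f \in \mathcal{C}$ be a PDF. Then, for every compact $\mathbb{K} \subset \mathbb{R}^n$, we can choose $h \in \mathcal{C}_c$, such that $\operatorname{supp} h \subset \overline{\mathbb{B}}_r$, $0 \le h \le f$, and $h = f$ on $\mathbb{K}$, for some $r \in \mathbb{R}_+$.

Proof. Since $\mathbb{K}$ is bounded, there exists some $r \in \mathbb{R}_+$, such that $\mathbb{K} \subset \mathbb{B}_r$. Lemma 10 implies that there exists a function $u \prec \mathbb{B}_r$, such that $u(x) = 1$, for all $x \in \mathbb{K}$. We can then set $h = uf$ to obtain the desired result of Lemma 1.

Lemma 2.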

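Let $h \in \mathcal{C}_c$, such that $\operatorname{supp} h \subset \overline{\mathbb{B}}_r$, $0 \le h$, and $\int h \,\mathrm{d}\lambda \le 1$, and let $g \in \mathcal{C}$ be a PDF. Then, for any $k \in \mathbb{R}_+$, there exists a sequence $\{h^g_m\}_{m=1}^{\infty} \subset \mathcal{M}^g$, so that

$$\lim_{m \to \infty} \|g_k \star h - h^g_m\|_{\mathcal{B}(\overline{\mathbb{B}}_r)} = 0. \tag{1}$$

Furthermore, if $g \in \mathcal{C}^u_b$, we have the stronger result that

$$\lim_{m \to \infty} \|g_k \star h - h^g_m\|_{\mathcal{B}} = 0. \tag{2}$$

Proof. It suffices to show that given any $r, k, \epsilon \in \mathbb{R}_+$, there exists a sufficiently large $m(\epsilon, r, k) \in \mathbb{N}$ such that for all $m \ge m(\epsilon, r, k)$, there exists a $h^g_m \in \mathcal{M}^g_m$ satisfying

$$\|g_k \star h - h^g_m\|_{\mathcal{B}(\overline{\mathbb{B}}_r)} < \epsilon. \tag{3}$$

First, write

$$\begin{aligned}
(g_k \star h)(x) &= \int g_k(x - y)\, h(y) \,\mathrm{d}\lambda(y) = \int_{\{y : y \in \overline{\mathbb{B}}_r\}} g_k(x - y)\, h(y) \,\mathrm{d}\lambda(y) \\
&= \int_{\{y : y \in \overline{\mathbb{B}}_r\}} k^n g(kx - ky)\, h(y) \,\mathrm{d}\lambda(y) = \int_{\{z : z \in \overline{\mathbb{B}}_{rk}\}} g(kx - z)\, h\!\left(\frac{z}{k}\right) \mathrm{d}\lambda(z),
\end{aligned}$$

where the last equality uses the substitution $z = ky$, and where $\overline{\mathbb{B}}_{rk}$ is a continuous image of a compact set, and hence is also compact (cf. Rudin, 1976, Thm. 4.14). By Lemma 11, for any $\delta > 0$, there exist $\kappa_i \in \mathbb{R}^n$ ($i \in [m-1]$, $m \in \mathbb{N}$), such that $\overline{\mathbb{B}}_{rk} \subset \bigcup_{i=1}^{m-1} \mathbb{B}(\kappa_i, \delta/2)$. Further, if we write $\mathbb{B}^\delta_i = \overline{\mathbb{B}}_{rk} \cap \mathbb{B}(\kappa_i, \delta/2)$, then $\overline{\mathbb{B}}_{rk} = \bigcup_{i=1}^{m-1} \mathbb{B}^\delta_i$. Hence, we can obtain a disjoint covering of $\overline{\mathbb{B}}_{rk}$ by taking $\mathbb{A}^\delta_1 = \mathbb{B}^\delta_1$, and $\mathbb{A}^\delta_i = \mathbb{B}^\delta_i \setminus \bigcup_{j=1}^{i-1} \mathbb{B}^\delta_j$ for $i = 2, \dots, m-1$. Then $\overline{\mathbb{B}}_{rk} = \bigcup_{i=1}^{m-1} \mathbb{A}^\delta_i$, each $\mathbb{A}^\delta_i$ is a Borel set, and $\operatorname{diam}(\mathbb{A}^\delta_i) \le \delta$, by construction.

We shall denote the disjoint cover of $\overline{\mathbb{B}}_{rk}$ by $\Pi^\delta_m = \{\mathbb{A}^\delta_i\}_{i=1}^{m-1}$. We seek to show that there exists an $m \in \mathbb{N}$ and $\Pi^\delta_m$, such that

$$\left\| g_k \star h - \sum_{i=1}^{m} c_i k_i^n\, g(k_i \times \cdot - z_i) \right\|_{\mathcal{B}(\overline{\mathbb{B}}_r)} < \epsilon,$$

where $k_i = k$, $c_i = k^{-n} \int_{\{z : z \in \mathbb{A}^\delta_i\}} h(z/k) \,\mathrm{d}\lambda(z)$, and $z_i \in \mathbb{A}^\delta_i$, for $i \in [m-1]$. Further, $z_m = 0$ and $c_m = 1 - \sum_{i=1}^{m-1} c_i$. Here, $c_m$ depends only on $r$ and $\epsilon$. Suppose that $c_m > 0$ (if $c_m = 0$, the $m$th term vanishes and no choice of $k_m$ is needed). Then, since $g \ne 0$, there exists some $s \in \mathbb{R}_+$ such that $C_s = \sup_{w \in \overline{\mathbb{B}}_s} g(w) > 0$. We can choose

$$k_m = \min \left\{ \frac{s}{r},\ \left( \frac{\epsilon}{2 c_m C_s} \right)^{1/n} \right\},$$

so that $\|g(k_m \times \cdot)\|_{\mathcal{B}(\overline{\mathbb{B}}_r)} \le C_s$ and

$$\|c_m k_m^n\, g(k_m \times \cdot)\|_{\mathcal{B}(\overline{\mathbb{B}}_r)} \le c_m \frac{\epsilon}{2 c_m C_s} C_s = \frac{\epsilon}{2}.$$

If $g \in \mathcal{C}^u_b$, then there exists a constant $C \in (0, \infty)$ such that $\|g\|_{\mathcal{B}} \le C$. In this case, we can choose $k_m^n = \epsilon / (2 c_m C)$ to obtain $\|c_m k_m^n\, g(k_m \times \cdot - z_m)\|_{\mathcal{B}} \le \epsilon/2$.

Since $0 \le h$ and $\int h \,\mathrm{d}\lambda \in [0, 1]$, the sum $\sum_{i=1}^{m-1} c_i$ satisfies the inequalities:

$$0 \le \sum_{i=1}^{m-1} c_i = k^{-n} \sum_{i=1}^{m-1} \int_{\{z : z \in \mathbb{A}^\delta_i\}} h\!\left(\frac{z}{k}\right) \mathrm{d}\lambda(z) = k^{-n} \int_{\{z : z \in \overline{\mathbb{B}}_{rk}\}} h\!\left(\frac{z}{k}\right) \mathrm{d}\lambda(z) = \int_{\{x : x \in \overline{\mathbb{B}}_r\}} h \,\mathrm{d}\lambda \le 1.$$

Thus, $c_m \in [0, 1]$, and our construction of $h^g_m$ implies that $h^g_m = \sum_{i=1}^{m} c_i k_i^n\, g(k_i \times \cdot - z_i) \in \mathcal{M}^g_m$.

We can then bound the left-hand side of (3) as follows:

$$\begin{aligned}
\|g_k \star h - h^g_m\|_{\mathcal{B}(\overline{\mathbb{B}}_r)}
&\le \left\| g_k \star h - \sum_{i=1}^{m-1} c_i k_i^n\, g(k_i \times \cdot - z_i) \right\|_{\mathcal{B}(\overline{\mathbb{B}}_r)} + \|c_m k_m^n\, g(k_m \times \cdot - z_m)\|_{\mathcal{B}(\overline{\mathbb{B}}_r)} \\
&\le \left\| g_k \star h - \sum_{i=1}^{m-1} c_i k_i^n\, g(k_i \times \cdot - z_i) \right\|_{\mathcal{B}(\overline{\mathbb{B}}_r)} + \frac{\epsilon}{2} \\
&= \left\| \int_{\{z : z \in \overline{\mathbb{B}}_{rk}\}} g(k \times \cdot - z)\, h\!\left(\frac{z}{k}\right) \mathrm{d}\lambda(z) - \sum_{i=1}^{m-1} \int_{\{z : z \in \mathbb{A}^\delta_i\}} g(k \times \cdot - z_i)\, h\!\left(\frac{z}{k}\right) \mathrm{d}\lambda(z) \right\|_{\mathcal{B}(\overline{\mathbb{B}}_r)} + \frac{\epsilon}{2} \\
&\le \sup_{x \in \overline{\mathbb{B}}_r} \sum_{i=1}^{m-1} \int_{\{z : z \in \mathbb{A}^\delta_i\}} |g(kx - z) - g(kx - z_i)|\, h\!\left(\frac{z}{k}\right) \mathrm{d}\lambda(z) + \frac{\epsilon}{2}.
\end{aligned} \tag{4}$$

Since $x \in \overline{\mathbb{B}}_r$, $z \in \mathbb{A}^\delta_i$, and $z_i \in \overline{\mathbb{B}}_{rk}$, it holds that $\|kx - z_i\|_2 \le 2rk$ and $\|kx - z\|_2 \le 2rk$, and

$$\|kx - z - (kx - z_i)\|_2 = \|z - z_i\|_2 \le \operatorname{diam}(\mathbb{A}^\delta_i) \le \delta.$$

Note that $g \in \mathcal{C}$, and thus $g$ is uniformly continuous on the compact set $\overline{\mathbb{B}}_{2rk}$, implying that

$$|g(kx - z) - g(kx - z_i)| \le w(g, 2rk, \delta),$$

for each $i \in [m-1]$, where

$$w(g, r, \delta) = \sup \left\{ |g(x) - g(y)| : \|x - y\|_2 \le \delta \text{ and } x, y \in \overline{\mathbb{B}}_r \right\}$$

denotes a modulus of continuity. Since $\lim_{\delta \to 0} w(g, 2rk, \delta) = 0$ (cf. Makarov and Podkorytov, 2013, Thm. 4.7.3), we may choose a $\delta(\epsilon, r, k) > 0$, such that $w(g, 2rk, \delta(\epsilon, r, k)) < \epsilon / (2k^n)$.

We then proceed from (4) as follows:

$$\begin{aligned}
\|g_k \star h - h^g_m\|_{\mathcal{B}(\overline{\mathbb{B}}_r)}
&\le w(g, 2rk, \delta(\epsilon, r, k)) \int_{\{z : z \in \overline{\mathbb{B}}_{rk}\}} h\!\left(\frac{z}{k}\right) \mathrm{d}\lambda(z) + \frac{\epsilon}{2} \\
&= w(g, 2rk, \delta(\epsilon, r, k))\, k^n \int h \,\mathrm{d}\lambda + \frac{\epsilon}{2} \\
&\le w(g, 2rk, \delta(\epsilon, r, k))\, k^n + \frac{\epsilon}{2} < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.
\end{aligned} \tag{5}$$

To conclude the proof of (1), it suffices to choose an appropriate sequence of partitions $\Pi^{\delta(\epsilon, r, k)}_m$, such that $m \ge m(\epsilon, r, k)$, for some sufficiently large $m(\epsilon, r, k)$, so that (4) and (5) hold. This is possible via Lemma 11. When $g \in \mathcal{C}^u_b$, we notice that (4) and (5) both hold for all $x \in \mathbb{R}^n$. Thus, we have the stronger result of (2).

We present the primary tools for proving Theorem 2 in the following pair of lemmas. Lemma 3 permits the approximation of convolutions of the form $g_k \star f$ in the $\mathcal{L}_1$ functional space, and Lemma 4 generalizes this first result to the spaces $\mathcal{L}_p$, where $p \in [1, \infty)$, under an essential boundedness assumption.

Lemma 3. If $f$ and $g$ are PDFs in the NVS $(\mathcal{L}_1, \|\cdot\|_{\mathcal{L}_1})$, then $\mathcal{M}^g \subset \mathcal{L}_1$ and $g_k \star f \in \mathcal{L}_1$, for every $k \in \mathbb{R}_+$. Furthermore, there exists a sequence $\{h^g_m\}_{m=1}^{\infty} \subset \mathcal{M}^g$, such that $\lim_{m \to \infty} \|g_k \star f - h^g_m\|_{\mathcal{L}_1} = 0$.

Proof. For any $k \in \mathbb{R}_+$, we can show that $g_k \in \mathcal{L}_1$, since

$$\|g_k\|_{\mathcal{L}_1} = \int g_k \,\mathrm{d}\lambda = \int k^n g(kx) \,\mathrm{d}\lambda(x) = \int g \,\mathrm{d}\lambda = 1.$$

If $h^g_m \in \mathcal{M}^g_m$, then $h^g_m \in \mathcal{L}_1$, since it is a finite sum of functions in $\mathcal{L}_1$, and thus $\mathcal{M}^g \subset \mathcal{L}_1$. Note that since $f$ is a PDF, we have $f \in \mathcal{L}_1$, and by Lemma 13, we also have that $g_k \star f \in \mathcal{L}_1$. By Lemma 14, it then follows that

$$\begin{aligned}
\|g_k \star f\|_{\mathcal{L}_1} &= \int g_k \star f \,\mathrm{d}\lambda = \int \left[ \int g_k(x - y)\, f(y) \,\mathrm{d}\lambda(y) \right] \mathrm{d}\lambda(x) \\
&= \int \left[ \int g_k(x - y) \,\mathrm{d}\lambda(x) \right] f(y) \,\mathrm{d}\lambda(y) = \|g_k\|_{\mathcal{L}_1} \|f\|_{\mathcal{L}_1} = 1.
\end{aligned}$$

By definition of the closure of $\mathcal{M}^g$ in $\mathcal{L}_1$, it suffices to show that for any $k \in \mathbb{R}_+$, $g_k \star f \in \overline{\mathcal{M}^g}$. We seek a contradiction by assuming that $g_k \star f \notin \overline{\mathcal{M}^g}$. Then, we can choose $A = \overline{\mathcal{M}^g}$ and $B = \{g_k \star f\}$ so that $A, B \subset \mathcal{L}_1$ are nonempty convex subsets, such that $A \cap B = \emptyset$.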
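Furthermore, $A$ is closed and $B$ is compact. By Lemma 7, there exists a continuous linear functional $\phi \in \mathcal{L}_1^*$ and an $\alpha \in \mathbb{R}$, such that $\phi(v) < \alpha < \phi(w)$, for all $v \in A$ and $w \in B$. By definition of $A$ and $B$, for all $v \in \mathcal{M}^g \subset \mathcal{L}_1$ we have

$$\phi(v) < \alpha < \phi(g_k \star f).$$

By Lemma 9, with $\phi \in \mathcal{L}_1^*$, there exists a unique function $u \in \mathcal{L}_\infty$, such that, for all $v \in \mathcal{L}_1$,

$$\phi(v) = \int u(x)\, v(x) \,\mathrm{d}\lambda(x).$$

If we let $v = g_k(\cdot - \mu) \in \mathcal{M}^g \subset \mathcal{L}_1$, then we obtain the inequalities

$$\sup_{\mu \in \mathbb{R}^n} \int u(x)\, g_k(x - \mu) \,\mathrm{d}\lambda(x) < \alpha < \int u(x)\, (g_k \star f)(x) \,\mathrm{d}\lambda(x).$$

The right-hand inequality can be expanded as follows:

$$\begin{aligned}
\alpha &< \int u(x)\, (g_k \star f)(x) \,\mathrm{d}\lambda(x) \\
&= \int u(x) \left[ \int g_k(x - \mu)\, f(\mu) \,\mathrm{d}\lambda(\mu) \right] \mathrm{d}\lambda(x) \\
&= \int f(\mu) \left[ \int u(x)\, g_k(x - \mu) \,\mathrm{d}\lambda(x) \right] \mathrm{d}\lambda(\mu) \\
&< \alpha \int f(\mu) \,\mathrm{d}\lambda(\mu) = \alpha,
\end{aligned}$$

where the third line is due to Lemma 14, the fourth line uses the left-hand inequality, and the final equality holds because $f$ is a PDF. This yields the sought contradiction.

Lemma 4. If $f, g \in \mathcal{L}_\infty$ are PDFs in the NVS $(\mathcal{L}_p, \|\cdot\|_{\mathcal{L}_p})$, for $p \in [1, \infty)$, then $\mathcal{M}^g \subset \mathcal{L}_p$ and $g_k \star f \in \mathcal{L}_p$, for any $k \in \mathbb{R}_+$. Furthermore, there exists a sequence $\{h^g_m\}_{m=1}^{\infty} \subset \mathcal{M}^g$, such that $\lim_{m \to \infty} \|g_k \star f - h^g_m\|_{\mathcal{L}_p} = 0$.

Proof. We obtain the result for $p = 1$ via Lemma 3. Otherwise, since $g \in \mathcal{L}_1 \cap \mathcal{L}_\infty$, we know that $g \in \mathcal{L}_p$ and $g_k \in \mathcal{L}_p$, for each $k \in \mathbb{R}_+$, via Lemma 12. For any $h^g_m \in \mathcal{M}^g_m$, we then have $h^g_m \in \mathcal{L}_p$ via finite summation, and hence $\mathcal{M}^g \subset \mathcal{L}_p$. Since $f \in \mathcal{L}_1$, Lemma 13 implies that $g_k \star f \in \mathcal{L}_p$. By definition of the closure of $\mathcal{M}^g$, it suffices to show that $g_k \star f \in \overline{\mathcal{M}^g}$, for any $k \in \mathbb{R}_+$. This can be achieved by seeking a contradiction under the assumption that $g_k \star f \notin \overline{\mathcal{M}^g}$ and using Lemma 8 in the same manner as Lemma 9 is used in the proof of Lemma 3.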
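4 Proof of the main result

Proof of Theorem 2(a). To prove statement (a) of Theorem 2, it suffices to show that, for any $\epsilon > 0$ and compact $\mathbb{K} \subset \mathbb{R}^n$, there exists a sufficiently large $m(\epsilon, \mathbb{K}) \in \mathbb{N}$, such that for all $m \ge m(\epsilon, \mathbb{K})$, there exists a $h^g_m \in \mathcal{M}^g_m$, such that $\|f - h^g_m\|_{\mathcal{B}(\mathbb{K})} < \epsilon$.

First, Lemma 1 implies that we can choose a $h \in \mathcal{C}_c$, such that $\operatorname{supp} h \subset \overline{\mathbb{B}}_r$, $0 \le h \le f$, and $h = f$ on $\mathbb{K}$, for some $r > 0$, where $\mathbb{K} \subset \mathbb{B}_r$. We then have $\|f - h\|_{\mathcal{B}(\mathbb{K})} = 0$.

Since $h \in \mathcal{C}_c \subset \mathcal{C}^u_b$, Lemma 5 and Corollary 1 then imply that there exists a $k(\epsilon) \in \mathbb{R}_+$, such that for all $k \ge k(\epsilon)$, $\|h - g_k \star h\|_{\mathcal{B}(\mathbb{K})} < \epsilon/2$. We shall assume that $k \ge k(\epsilon)$, from here on.

Lemma 2 then implies that there exists an $m(\epsilon, r, k) \in \mathbb{N}$, such that for any $m \ge m(\epsilon, r, k)$, there exists a $h^g_m \in \mathcal{M}^g_m$, such that $\|g_k \star h - h^g_m\|_{\mathcal{B}(\mathbb{K})} \le \|g_k \star h - h^g_m\|_{\mathcal{B}(\overline{\mathbb{B}}_r)} < \epsilon/2$. The triangle inequality then completes the proof, since

$$\|f - h^g_m\|_{\mathcal{B}(\mathbb{K})} \le \|f - h\|_{\mathcal{B}(\mathbb{K})} + \|h - g_k \star h\|_{\mathcal{B}(\mathbb{K})} + \|g_k \star h - h^g_m\|_{\mathcal{B}(\mathbb{K})} < 0 + \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$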
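The construction behind Lemma 2 and the proof of Theorem 2(a) can be sketched numerically in one dimension. In the sketch below, the triangular target density on $[-1, 1]$, the Gaussian choice of $g$, the equal-width partition, and the function names are assumptions made only for illustration. Each cell of the partition contributes one component whose weight is the mass of the target on that cell and whose shape is the dilate $k g(k(x - y_i))$ centred at the cell midpoint $y_i$; because the weights already sum to one, no extra remainder component is needed here.

```python
import numpy as np

def gaussian_pdf(x):
    """Standard normal density (assumed base PDF g, continuous, n = 1)."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def triangular_pdf(x):
    """Assumed continuous target PDF, supported on [-1, 1]."""
    return np.maximum(0.0, 1.0 - np.abs(x))

def triangular_cdf(x):
    """Distribution function of the triangular density, used for exact cell masses."""
    x = np.clip(x, -1.0, 1.0)
    return np.where(x < 0.0, 0.5 * (1.0 + x) ** 2, 1.0 - 0.5 * (1.0 - x) ** 2)

def riemann_mixture(x, k, m):
    """m-component mixture sum_i c_i * k * g(k * (x - y_i)) on an m-cell partition."""
    edges = np.linspace(-1.0, 1.0, m + 1)
    weights = np.diff(triangular_cdf(edges))      # c_i: target mass of each cell
    centres = 0.5 * (edges[:-1] + edges[1:])      # y_i: one point per cell
    components = k * gaussian_pdf(k * (x[:, None] - centres))
    return (weights * components).sum(axis=1)

x_eval = np.linspace(-1.0, 1.0, 401)              # evaluation grid on the compact set
sup_errors = [
    float(np.max(np.abs(triangular_pdf(x_eval) - riemann_mixture(x_eval, k, 2000))))
    for k in (5.0, 20.0, 80.0)
]
```

With these choices, the grid-based sup-norm errors in `sup_errors` shrink as the dilation parameter `k` grows, which is the behaviour that the combination of Lemma 5, Corollary 1, and Lemma 2 predicts on a compact set.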

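Proof of Theorem 2(b). To prove statement (b) of Theorem 2, it suffices to show that, for any $\epsilon > 0$, there exists a sufficiently large $m(\epsilon) \in \mathbb{N}$, such that for all $m \ge m(\epsilon)$, there exists a $h^g_m \in \mathcal{M}^g_m$, such that $\|f - h^g_m\|_{\mathcal{L}_p} < \epsilon$.

First, Lemma 5 and Corollary 1 imply that there exists a $k(\epsilon) \in \mathbb{R}_+$, such that for any $k \ge k(\epsilon)$, it follows that $\|f - g_k \star f\|_{\mathcal{L}_p} < \epsilon/2$. We shall assume $k \ge k(\epsilon)$, from here on.

Lemmas 3 and 4 imply that there exists an $m(\epsilon) \in \mathbb{N}$, such that for all $m \ge m(\epsilon)$, there exists a $h^g_m \in \mathcal{M}^g_m$, such that $\|g_k \star f - h^g_m\|_{\mathcal{L}_p} < \epsilon/2$. The triangle inequality then completes the proof, since

$$\|f - h^g_m\|_{\mathcal{L}_p} \le \|f - g_k \star f\|_{\mathcal{L}_p} + \|g_k \star f - h^g_m\|_{\mathcal{L}_p} < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$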
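The role of the $\mathcal{L}_p$ norm in statement (b) can be illustrated with a discontinuous target. The sketch below assumes a uniform target density on $[0, 1]$ and a standard normal $g$, for which the smoothed density $g_k \star f$ has the closed form $\Phi(kx) - \Phi(k(x - 1))$, with $\Phi$ the standard normal distribution function. The target, the grid, and the helper names are illustrative assumptions rather than part of the paper.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # element-wise wrapper around the scalar math.erf

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))

def uniform_pdf(x):
    """Assumed target f: uniform density on [0, 1], bounded but not continuous."""
    return ((x >= 0.0) & (x <= 1.0)).astype(float)

def smoothed_uniform(x, k):
    """Closed form of (g_k * f)(x) for Gaussian g: Phi(k x) - Phi(k (x - 1))."""
    return normal_cdf(k * x) - normal_cdf(k * (x - 1.0))

def lp_error(k, p, grid):
    """Riemann-sum approximation of the L_p norm of f - g_k * f on the grid."""
    dx = grid[1] - grid[0]
    diff = np.abs(uniform_pdf(grid) - smoothed_uniform(grid, k))
    return float((np.sum(diff**p) * dx) ** (1.0 / p))

grid = np.linspace(-2.0, 3.0, 50001)
dilations = (10.0, 100.0, 1000.0)
l1_errors = [lp_error(k, 1.0, grid) for k in dilations]
l2_errors = [lp_error(k, 2.0, grid) for k in dilations]
sup_errors = [
    float(np.max(np.abs(uniform_pdf(grid) - smoothed_uniform(grid, k))))
    for k in dilations
]
```

In this example the approximate $\mathcal{L}_1$ and $\mathcal{L}_2$ errors decrease with `k`, while the sup-norm error stays near one half at the jumps of the target. This is consistent with statement (b) being phrased in the $\mathcal{L}_p$ norm for finite $p$, and with uniform approximation requiring a continuous target as in statement (a).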

A Technical results

In this appendix, we state a number of technical results that are used throughout the main text. Sources for unproved results are provided at the end of the section.

Lemma 5. Let $\{g_k\}$ be a sequence of PDFs in $\mathcal{L}_1$, such that for every $\delta > 0$,

$$\lim_{k \to \infty} \int_{\{x : \|x\|_2 > \delta\}} g_k \,\mathrm{d}\lambda = 0.$$

Then, for $f \in \mathcal{L}_p$ and $p \in [1, \infty)$, $\lim_{k \to \infty} \|g_k \star f - f\|_{\mathcal{L}_p} = 0$. Furthermore, for $f \in \mathcal{C}_b$ and compact $\mathbb{K} \subset \mathbb{R}^n$, $\lim_{k \to \infty} \|g_k \star f - f\|_{\mathcal{L}_\infty(\mathbb{K})} = 0$.

The sequences $\{g_k\}$ of Lemma 5 are often referred to as approximate identities or approximations of identity (cf. Makarov and Podkorytov, 2013, Sec. 7.6). A typical construction of approximate identities is to consider the sequence of dilations of the form $g_k(\cdot) = k^n g(k \times \cdot)$, which permits the following corollary.

Corollary 1. Let $g$ be a PDF. Then, the sequence $\{g_k : g_k(\cdot) = k^n g(k \times \cdot)\}$ satisfies the hypothesis of Lemma 5 and hence permits its conclusion.

Lemma 6. Let $(\mathbb{E}, \|\cdot\|_{\mathbb{E}})$ be an NVS, and let $\mathbb{F} \subset \mathbb{E}$ and $u \in \mathbb{E}$. Then the following statements are equivalent: (a) $u \in \overline{\mathbb{F}}$; (b) $\mathbb{B}(u, r) \cap \mathbb{F} \ne \emptyset$, for all $r > 0$; and (c) there exists a sequence $\{u_m\} \subset \mathbb{F}$ that converges to $u$.

Let $\mathbb{E}$ be a locally convex linear topological space over $\mathbb{R}$ and recall that a functional is a function defined on $\mathbb{E}$ (or some subspace of $\mathbb{E}$), with values in $\mathbb{R}$. We denote the dual space of $\mathbb{E}$ (the space of all continuous linear functionals on $\mathbb{E}$) by $\mathbb{E}^*$.

Lemma 7 (Second geometric form of the Hahn-Banach theorem). Let $A, B \subset \mathbb{E}$ be two nonempty convex subsets, such that $A \cap B = \emptyset$. Assume that $A$ is closed and that $B$ is compact. Then, there exists a continuous linear functional $\phi \in \mathbb{E}^*$, such that its corresponding hyperplane $H = \{u \in \mathbb{E} : \phi(u) = \alpha\}$ ($\alpha \in \mathbb{R}$) strictly separates $A$ and $B$. That is, there exists some $\epsilon > 0$, such that $\phi(u) \le \alpha - \epsilon$ and $\phi(v) \ge \alpha + \epsilon$, for all $u \in A$ and $v \in B$. Or, in other words, $\sup_{u \in A} \phi(u) < \inf_{v \in B} \phi(v)$.

Lemma 8 (Riesz representation theorem for $\mathcal{L}_p$, $p \in (1, \infty)$). If $p \in (1, \infty)$ and $\phi \in (\mathcal{L}_p)^*$, then there exists a unique function $u \in \mathcal{L}_q$, such that for all $v \in \mathcal{L}_p$,

$$\phi(v) = \int u(x)\, v(x) \,\mathrm{d}\lambda(x),$$

where $1/p + 1/q = 1$.

Lemma 9 (Riesz representation theorem for $\mathcal{L}_1$). If $\phi \in (\mathcal{L}_1)^*$, then there exists a unique $u \in \mathcal{L}_\infty$, such that for all $v \in \mathcal{L}_1$,

$$\phi(v) = \int u(x)\, v(x) \,\mathrm{d}\lambda(x).$$

Lemma 10. Let $\mathbb{V}_1, \dots, \mathbb{V}_n$ be open subsets of $\mathbb{R}^n$, and let $\mathbb{K}$ be a compact set, such that $\mathbb{K} \subset \bigcup_{i=1}^{n} \mathbb{V}_i$. Then, there exist functions $h_i \prec \mathbb{V}_i$ ($i \in [n]$), such that $\sum_{i=1}^{n} h_i(x) = 1$, for all $x \in \mathbb{K}$. The set $\{h_i\}$ is referred to as the partition of unity on $\mathbb{K}$, subordinated to the cover $\{\mathbb{V}_i\}$.

Lemma 11. If $\mathbb{X} \subset \mathbb{R}^n$ is bounded, then for any $r > 0$, $\mathbb{X}$ can be covered by $\bigcup_{i=1}^{m} \mathbb{B}(x_i, r)$, for some finite $m \in \mathbb{N}$, where $x_i \in \mathbb{R}^n$ and $i \in [m]$.

Lemma 12. If $1 \le p \le q \le r \le \infty$, then $\mathcal{L}_p \cap \mathcal{L}_r \subset \mathcal{L}_q$.

Lemma 13. If $f \in \mathcal{L}_p$ ($1 \le p \le \infty$) and $g \in \mathcal{L}_1$, then $f \star g$ exists and we have $\|f \star g\|_{\mathcal{L}_p} \le \|f\|_{\mathcal{L}_p} \|g\|_{\mathcal{L}_1}$. Furthermore, if $p$ and $q$ are such that $1/p + 1/q = 1$, $f \in \mathcal{L}_p$, and $g \in \mathcal{L}_q$, then $f \star g$ exists, is bounded and uniformly continuous, and $\|f \star g\|_{\mathcal{L}_\infty} \le \|f\|_{\mathcal{L}_p} \|g\|_{\mathcal{L}_q}$. In particular, if $p \in (1, \infty)$, then $f \star g \in \mathcal{C}_0$.

Lemma 14 (Fubini's Theorem). Let $(\mathbb{X}, \mathcal{X}, \nu_1)$ and $(\mathbb{Y}, \mathcal{Y}, \nu_2)$ be $\sigma$-finite measure spaces, and assume that $f$ is a $(\mathcal{X} \times \mathcal{Y})$-measurable function on $\mathbb{X} \times \mathbb{Y}$. If

$$\int_{\mathbb{X}} \left[ \int_{\mathbb{Y}} |f(x, y)| \,\mathrm{d}\nu_2(y) \right] \mathrm{d}\nu_1(x) < \infty,$$

then

$$\int_{\mathbb{X} \times \mathbb{Y}} |f| \,\mathrm{d}(\nu_1 \times \nu_2) = \int_{\mathbb{X}} \left[ \int_{\mathbb{Y}} |f(x, y)| \,\mathrm{d}\nu_2(y) \right] \mathrm{d}\nu_1(x) = \int_{\mathbb{Y}} \left[ \int_{\mathbb{X}} |f(x, y)| \,\mathrm{d}\nu_1(x) \right] \mathrm{d}\nu_2(y) < \infty.$$

Sources for results

Lemma 5 appears in Makarov and Podkorytov (2013, Thm. 9.3.3) and Cheney and Light (2000, Ch. 20, Thm. 2). Corollary 1 is obtained from Cheney and Light (2000, Ch. 20, Thm. 4). Lemmas 6, 12, and 13 are taken from Propositions 0.22, 6.10, and 8.8 of Folland (1999). Lemmas 7–9 appear in Brezis (2010) as Theorems 1.7, 4.11, and 4.14, respectively. Lemmas 10 and 14 can be found in Rudin (1987) as Theorems 2.13 and 8.8, respectively. Lemma 11 is obtained from Conway (2012, Thm. 1.2.2).
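As a small numerical check of the approximate-identity hypothesis in Lemma 5 for the dilates of Corollary 1, the sketch below evaluates the tail mass $\int_{\{|x| > \delta\}} g_k \,\mathrm{d}\lambda$ in one dimension. By the change of variables $t = kx$, this equals the mass of $g$ outside $[-k\delta, k\delta]$, which has a closed form for the two base densities assumed here, the standard normal and the standard Cauchy. The choice of densities, of $\delta = 0.1$, and the function names are illustrative assumptions.

```python
import math

def gaussian_tail_mass(k, delta):
    """Mass of g_k outside [-delta, delta] when g is the standard normal density."""
    # Change of variables t = k * x: this is P(|Z| > k * delta) = erfc(k * delta / sqrt(2)).
    return math.erfc(k * delta / math.sqrt(2.0))

def cauchy_tail_mass(k, delta):
    """Same tail mass when g is the standard Cauchy density (heavy tails)."""
    # P(|X| > a) = 1 - (2 / pi) * arctan(a) for a standard Cauchy variable X.
    return 1.0 - (2.0 / math.pi) * math.atan(k * delta)

delta = 0.1
dilations = [1, 10, 100, 1000]
gaussian_tails = [gaussian_tail_mass(k, delta) for k in dilations]
cauchy_tails = [cauchy_tail_mass(k, delta) for k in dilations]
```

Both lists decrease towards zero as `k` grows, although the Cauchy tails do so far more slowly. This matches the statement of Corollary 1, which requires only that $g$ be a PDF and places no condition on its tails.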

B Acknowledgements

The authors would like to very much thank Pr. Eric Ricard for the interesting discussions with him and for his suggestions. TTN is supported by "Contrat doctoral" from the French Ministry of Higher Education and Research and by the French National Research Agency (ANR) grant SMILES ANR-18-CE40-0014. HDN and GJM are funded by Australian Research Council grant number DP180101192.

References

Bacharoglou, A. G. (2010). Approximation of probability distributions by convex mixtures of Gaussian measures. Proceedings of the American Mathematical Society, 138:2619–2628.

Brezis, H. (2010). Functional Analysis, Sobolev Spaces and Partial Differential Equations. Springer, New York.

Cheney, W. and Light, W. (2000). A Course in Approximation Theory. Brooks/Cole, Pacific Grove.

Conway, J. B. (2012). A Course in Abstract Analysis. American Mathematical Society, Providence.

DasGupta, A. (2008). Asymptotic Theory of Statistics and Probability. Springer, New York.

Everitt, B. S. and Hand, D. J. (1981). Finite Mixture Distributions. Chapman and Hall, London.

Folland, G. B. (1999). Real Analysis: Modern Techniques and Their Applications. Wiley, New York.

Frühwirth-Schnatter, S. (2006). Finite Mixture and Markov Switching Models. Springer, New York.

Frühwirth-Schnatter, S., Celeux, G., and Robert, C. P., editors (2019). Handbook of Mixture Analysis. CRC Press, Boca Raton.

Lindsay, B. G. (1995). Mixture models: theory, geometry and applications. In NSF-CBMS Regional Conference Series in Probability and Statistics.

Makarov, B. and Podkorytov, A. (2013). Real Analysis: Measures, Integrals and Applications. Springer, London.

McLachlan, G. J. and Basford, K. E. (1988). Mixture Models: Inference and Applications to Clustering. Marcel Dekker, New York.

McLachlan, G. J. and Peel, D. (2000). Finite Mixture Models. Wiley, New York.

Mengersen, K. L., Robert, C., and Titterington, M., editors (2011). Mixtures: Estimation and Applications. Wiley, Hoboken.

Nestoridis, V., Schmutzhard, S., and Stefanopoulos, V. (2011). Universal series induced by approximate identities and some relevant applications. Journal of Approximation Theory, 163:1783–1797.

Nestoridis, V. and Stefanopoulos, V. (2007). Universal series and approximate identities. Technical report, University of Cyprus.

Nguyen, T. T., Nguyen, H. D., Chamroukhi, F., and McLachlan, G. J. (2020). Approximation by finite mixtures of continuous density functions that vanish at infinity. Cogent Mathematics & Statistics, 7(1):1750861.

Rudin, W. (1976). Principles of Mathematical Analysis. McGraw-Hill, New York.

Rudin, W. (1987). Real and Complex Analysis. McGraw-Hill, New York.

Xu, Y., Light, W. A., and Cheney, E. W. (1993). Constructive methods of approximation by ridge functions and radial functions. Numerical Algorithms, 4:205–223.