On the Identifiability of Latent Class Models for Multiple-Systems Estimation

Abstract

Latent class models have recently become popular for multiple-systems estimation in human rights applications. However, it is currently unknown when a given family of latent class models is identifiable in this context. We provide necessary and sufficient conditions on the number of latent classes needed for a family of latent class models to be identifiable. Along the way we provide a mechanism for verifying identifiability in a class of multiple-systems estimation models that allow for individual heterogeneity.


Serge Aleshin-Guendel
Department of Biostatistics, University of [email protected]
August 25, 2020
arXiv [math.ST]

Keywords: Capture-recapture; Heterogeneity; Population size estimation.

Multiple-systems estimation, also known as capture-recapture in ecological settings, is an approach to estimating the size of hard-to-reach populations which has been used in a number of fields, including epidemiology, official statistics, and human rights (Hook & Regal, 1995; Bird & King, 2018; Ball & Price, 2019). In this setting, multiple sources have incompletely sampled from a closed population of interest, and individuals sampled by more than one source can be uniquely identified. The population size is estimated based on the observed overlap of the sources using a model that describes how individuals were sampled by the sources. A common problem in multiple-systems estimation is that individuals may be sampled heterogeneously, i.e. different individuals may have different probabilities of being sampled by each source. To obtain reliable estimates of the population size in these settings, one needs to account for this heterogeneity in the model describing the sampling process of the sources.

Two classic models that incorporate individual heterogeneity are the $M_{th}$ and $M_h$ models (Otis et al., 1978). The $M_{th}$ model assumes that each individual is independently sampled by each source, conditional on latent probabilities of being sampled by each source. The $M_h$ model additionally assumes that an individual has the same probability of being sampled by each source. The $M_h$ model can be plausible in ecological settings where researchers can design experiments in which animals have the same probability of being sampled by each source. However, this assumption is typically not plausible in human populations, where sources often use convenience samples (Ball & Price, 2019).
This motivates the use of $M_{th}$ models in such settings. In particular, Manrique-Vallier (2016) recently proposed a family of latent class models with a large number of classes, a type of $M_{th}$ model, which has become popular in human rights and other human population settings (Sadinle, 2018; Ball & Harrison, 2018; Ball et al., 2018; Manrique-Vallier et al., 2019; Ángel & Ball, 2019; Ball et al., 2019; Doshi et al., 2019; Okiria et al., 2019).

When using either $M_h$ or $M_{th}$ models, it is known that one must restrict oneself to a parametric family of models for identification, which has generated a literature characterizing identifiability in $M_h$ models (Huggins, 2001; Link, 2003; Holzmann et al., 2006; Link, 2006). Once identifiability is settled for a family of models, one can begin discussing properties of population size estimates using that family, such as consistency (Sanathanan, 1972) or finite-sample risk (Johndrow et al., 2019). Currently, no literature exists characterizing identifiability in $M_{th}$ models, even though a large number of parametric $M_{th}$ model families have been proposed (Agresti, 1994; Coull & Agresti, 1999; Fienberg et al., 1999; Pledger, 2000; Bartolucci et al., 2004; Durban & Elston, 2005; King & Brooks, 2008).

In this paper we partially close the gap between theory and methodology for $M_{th}$ models through two contributions. The first contribution is a mechanism for verifying identifiability in $M_{th}$ models based on moments of the distribution for the latent sampling probabilities. The second contribution is a necessary and sufficient condition for families of latent class models to be identified. Our result shows that recent applications using latent class models for multiple-systems estimation have been based on nonidentifiable families of models.

Suppose $K \geq 2$ sources sample from a closed population of $N$ individuals, of which only $n < N$ are sampled by at least one source.
We let $H = \{0,1\}^K$ denote the possible inclusion patterns of individuals in the sources, $H^* = H \setminus \{0\}^K$ denote the possible inclusion patterns of the $n$ individuals sampled in at least one source, and $x_i \in H$ denote the inclusion pattern for individual $i$. For example, if $K = 4$ and $x_i = (1,1,1,0)$, then individual $i$ was sampled by sources 1, 2, and 3, but not by source 4. The $x_i$ can be aggregated into a $2^K$ contingency table, with cells indexed by $h \in H$ and cell counts $n_h = \sum_{i=1}^{N} I(x_i = h)$. We do not observe the count for the cell $\{0\}^K$, $n_{(0,\ldots,0)} = N - n$. The target of inference is the population size $N$.

In this article, we assume that the individuals' inclusion patterns follow an $M_{th}$ model (Otis et al., 1978), i.e.

$$x_{i,k} \mid \lambda_{i,k} \overset{iid}{\sim} \text{Bernoulli}(\lambda_{i,k}), \qquad (\lambda_{i,1}, \ldots, \lambda_{i,K}) \overset{iid}{\sim} Q, \tag{1}$$

where $Q \in \mathcal{Q}$ and $\mathcal{Q}$ is a family of mixing distributions on $(0,1)^K$ for the latent sampling probabilities. Under this model, conditional on an individual's sampling probabilities, $(\lambda_{i,1}, \ldots, \lambda_{i,K})$, the individual is sampled by each source $k$ with probability $\lambda_{i,k}$, independently of all other sources. By imposing the restriction that $\lambda_{i,k} \in (0,1)$ for each individual $i$ and source $k$, we are assuming that each of the $N$ individuals has non-zero probability of being sampled by at least one source. The $M_h$ model (Otis et al., 1978) is a submodel of the $M_{th}$ model which further assumes that the latent sampling probabilities are equal for all sources, i.e. $\lambda_{i,1} = \cdots = \lambda_{i,K}$.

$M_{th}$ Models

Marginalizing over $Q$ in (1), we find that the complete $2^K$ contingency table of counts is multinomially distributed, i.e.

$$(n_h)_{h \in H} \mid N, \pi_Q \sim \text{Multinomial}(N, \pi_Q), \tag{2}$$

where $\pi_Q = (\pi_{Q,h})_{h \in H}$, $\pi_{Q,h} = E_Q\{\prod_{k=1}^{K} \lambda_k^{h_k} (1 - \lambda_k)^{1 - h_k}\}$, and $E_Q$ denotes an expectation with respect to the mixing distribution $Q$. Throughout this article, when a vector or matrix is indexed by $h \in H$ or $H^*$ we use the order given by viewing $h$ as binary digits. For example, when $K = 3$ we order $H$ as $(0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), (1,1,1)$. The multinomial distribution in (2) can be decomposed as

$$n \mid N, \pi_{Q,0} \sim \text{Binomial}(N, 1 - \pi_{Q,0}), \qquad (n_h)_{h \in H^*} \mid n, \tilde{\pi}_Q \sim \text{Multinomial}(n, \tilde{\pi}_Q), \tag{3}$$

where $\pi_{Q,0} = \pi_{Q,(0,\ldots,0)}$, $\tilde{\pi}_Q = (\tilde{\pi}_{Q,h})_{h \in H^*}$, and $\tilde{\pi}_{Q,h} = \pi_{Q,h} / (1 - \pi_{Q,0})$. The multinomial likelihood for the observed cell counts, $(n_h)_{h \in H^*}$, conditional on their sum, $n$, in (3) is referred to as the conditional likelihood (Fienberg, 1972). Intuitively, in order to estimate $N$, the conditional cell probabilities, $\tilde{\pi}_Q$, estimated from the conditional likelihood need to determine the missing cell probability, $\pi_{Q,0}$. The following definition of identifiability, modified from Link (2003), codifies this intuition. If a family of distributions $\mathcal{Q}$ is identifiable according to this definition, then $N$ can be consistently estimated within $\mathcal{Q}$ (Sanathanan, 1972).

Definition 1. A family of distributions $\mathcal{Q}$ on $(0,1)^K$ is identifiable if, for $Q, R \in \mathcal{Q}$, $\tilde{\pi}_Q = \tilde{\pi}_R$ implies that $\pi_{Q,0} = \pi_{R,0}$.

Latent class models are a classical tool for the analysis of multivariate categorical data that describe populations which can be stratified into $J$ classes, in which the latent sampling probabilities are homogeneous for individuals within each class (Goodman, 1974; Haberman, 1979). They form a special case of the $M_{th}$ model, where the mixing distribution is a discrete finite mixture, and have been used for multiple-systems estimation many times (Agresti, 1994; Coull & Agresti, 1999; Pledger, 2000; Bartolucci et al., 2004; Manrique-Vallier, 2016).
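As a rough illustration of (2) and (3), the following Python sketch computes the full cell probabilities and the conditional cell probabilities when the mixing distribution is a finite mixture over classes. The function names, the class weights `nu`, and the per-class sampling probabilities `lam` are illustrative assumptions and do not come from the paper.

```python
from itertools import product


def cell_probabilities(nu, lam):
    """Return {h: pi_{Q,h}} for a finite-mixture (latent class) Q.

    nu  : list of J class weights summing to 1 (hypothetical values)
    lam : J x K nested list, lam[j][k] is the probability that source k
          samples an individual belonging to class j
    """
    K = len(lam[0])
    pi = {}
    # itertools.product yields the patterns in binary-digit order.
    for h in product((0, 1), repeat=K):
        total = 0.0
        for weight, probs in zip(nu, lam):
            term = weight
            for hk, p in zip(h, probs):
                term *= p if hk == 1 else 1.0 - p
            total += term
        pi[h] = total
    return pi


def conditional_probabilities(pi):
    """Return (pi_0, {h: pi_tilde_{Q,h}}) following the decomposition (3)."""
    K = len(next(iter(pi)))
    zero = (0,) * K
    pi0 = pi[zero]
    tilde = {h: p / (1.0 - pi0) for h, p in pi.items() if h != zero}
    return pi0, tilde


# Illustrative two-class, three-source model (all values are assumptions).
nu = [0.3, 0.7]
lam = [[0.6, 0.5, 0.4], [0.1, 0.2, 0.3]]
pi = cell_probabilities(nu, lam)
pi0, tilde = conditional_probabilities(pi)
```

Only `tilde` is informed by the observed counts through the conditional likelihood; `pi0` is the quantity that identifiability requires `tilde` to determine.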
We denote the family of latent class models with $J$ classes by

$$\mathcal{Q}_J = \Big\{ Q = \sum_{j=1}^{J} \nu_{Q,j} \prod_{k=1}^{K} \delta_{\lambda_{Q,jk}} \;\Big|\; \nu_{Q,j} \geq 0, \ \sum_{j=1}^{J} \nu_{Q,j} = 1, \ \lambda_{Q,jk} \in (0,1) \Big\}.$$

It is currently unknown when $\mathcal{Q}_J$ is identified.

In this section, we aim to provide a mechanism for directly checking Definition 1, to verify identifiability of a given family $\mathcal{Q}$. Before proving the main theorem of this section, we state the following lemma, which tells us that the cell probabilities for any $M_{th}$ model depend on the mixing distribution, $Q$, only through mixed moments of $Q$.

Lemma 1.

For any $h \in H^*$, $\pi_{Q,h} = \sum_{h' \in H^*} c_{h,h'} m_{Q,h'}$, where $c_{h,h'} = (-1)^{\sum_{k=1}^{K} (h'_k - h_k)} \prod_{k=1}^{K} I(h_k \leq h'_k)$ and $m_{Q,h'} = E_Q(\prod_{k=1}^{K} \lambda_k^{h'_k})$.

Proof. For all $h \in H^*$, $\prod_{k=1}^{K} \lambda_k^{h_k} (1 - \lambda_k)^{1 - h_k} = \sum_{h' \in H^*} c_{h,h'} \prod_{k=1}^{K} \lambda_k^{h'_k}$ by an application of the multi-binomial theorem. The result follows from taking the expectation of both sides with respect to $Q$.

We can restate Lemma 1 in matrix form. Letting $\pi^*_Q = (\pi_{Q,h})_{h \in H^*}$ and $m_Q = (m_{Q,h})_{h \in H^*}$, we have that $\pi^*_Q = C m_Q$, where $C = (c_{h,h'})_{h \in H^*, h' \in H^*}$. $C$ is invertible as it is upper triangular with non-zero diagonal entries. We are now ready to prove Theorem 1.

Theorem 1.

For any two distributions $Q, R$ on $(0,1)^K$, $\tilde{\pi}_Q = \tilde{\pi}_R$ is equivalent to $m_Q = A m_R$ for some $A > 0$.

Proof. $\tilde{\pi}_Q = \tilde{\pi}_R$ is equivalent to $\pi^*_Q / (1 - \pi_{Q,0}) = \pi^*_R / (1 - \pi_{R,0})$. Rearranging terms we have that $\pi^*_Q = \pi^*_R (1 - \pi_{Q,0}) / (1 - \pi_{R,0})$, and thus $\pi^*_Q = A \pi^*_R$, where $A = (1 - \pi_{Q,0}) / (1 - \pi_{R,0}) > 0$. Using Lemma 1, this is equivalent to $C m_Q = A C m_R$, and thus $m_Q = A m_R$ due to the invertibility of $C$.

The immediate consequence of Theorem 1 is that to verify identifiability of a family $\mathcal{Q}$, one can demonstrate that if $m_Q = A m_R$ for some $Q, R \in \mathcal{Q}$, then $\pi_{Q,0} = \pi_{R,0}$. We use this mechanism in the next section to characterize when latent class models are identifiable.

To provide necessary and sufficient conditions for the family of $J$-class latent class models, $\mathcal{Q}_J$, to be identifiable, we restrict the family defined in Section 2.3 to

$$\mathcal{Q}_J = \Big\{ Q = \sum_{j=1}^{J} \nu_{Q,j} \prod_{k=1}^{K} \delta_{\lambda_{Q,jk}} \;\Big|\; \nu_{Q,j} \geq 0, \ \sum_{j=1}^{J} \nu_{Q,j} = 1, \ \lambda_{Q,jk} \in (0,1), \ \lambda_{Q,jk} \neq \lambda_{Q,j'k} \text{ for } j \neq j' \Big\}.$$

This restriction makes the mild assumption that each class' sampling probabilities are distinct, which simplifies the proof of Theorem 2. Loosening this restriction could only make the conditions on $J$ for $\mathcal{Q}_J$ to be identifiable stricter, and thus the conclusions we reach in the following section would still stand for families where this restriction is violated.

There are $J(K+1) - 1$ free parameters in $\mathcal{Q}_J$, thus when $\mathcal{Q}_J$ is identifiable, $J$ satisfies $J(K+1) - 1 \leq 2^K - 2$, as the conditional cell probabilities, $\tilde{\pi}_Q$, are $2^K - 2$ dimensional. The following theorem shows that $J$ must satisfy a stricter condition for $\mathcal{Q}_J$ to be identifiable.

Theorem 2. $\mathcal{Q}_J$ is identifiable iff $2J \leq K$.

Proof. We will first show that if $2J \leq K$, then $\mathcal{Q}_J$ is identifiable. The proof of this direction is similar in spirit to the proofs of Theorem 2 in Holzmann et al. (2006) and Theorem 1 in Pezzott et al. (2019), which were both concerned with characterizing the identifiability of the $M_h$ analogue of $\mathcal{Q}_J$. Assume $2J \leq K$, and let $Q, R \in \mathcal{Q}_J$ be such that $m_Q = A m_R$ for some $A > 0$, so that we have the following system of equations:

$$\sum_{j=1}^{J} \nu_{Q,j} \prod_{k=1}^{K} \lambda_{Q,jk}^{h_k} - A \sum_{j=1}^{J} \nu_{R,j} \prod_{k=1}^{K} \lambda_{R,jk}^{h_k} = 0 \qquad (h \in H^*). \tag{4}$$

Let $I_Q = \{ j \mid \lambda_{Q,j} \notin (\lambda_{R,1}, \ldots, \lambda_{R,J}) \}$ and $I_R = \{ j \mid \lambda_{R,j} \notin (\lambda_{Q,1}, \ldots, \lambda_{Q,J}) \}$, where $\lambda_{Q,j} = (\lambda_{Q,j1}, \ldots, \lambda_{Q,jK})$ and $\lambda_{R,j} = (\lambda_{R,j1}, \ldots, \lambda_{R,jK})$. We can then rewrite (4) as

$$\sum_{j=1}^{J} y_j \prod_{k=1}^{K} \lambda_{Q,jk}^{h_k} - A \sum_{i \in I_R} \nu_{R,i} \prod_{k=1}^{K} \lambda_{R,ik}^{h_k} = 0 \qquad (h \in H^*), \tag{5}$$

where $y_j = \nu_{Q,j}$ if $j \in I_Q$, and otherwise $y_j = \nu_{Q,j} - A \nu_{R,j'}$ for the $j' \in \{1, \ldots, J\} \setminus I_R$ with $\lambda_{R,j'} = \lambda_{Q,j}$. Letting $m = |I_R| = |I_Q|$ and labelling the elements of $I_R$ as $i_1, \ldots, i_m$, the system of equations in (5) can be written in matrix form as $\Lambda y = 0$, where

$$\Lambda = \begin{pmatrix}
\lambda_{Q,1K} & \cdots & \lambda_{Q,JK} & \lambda_{R,i_1 K} & \cdots & \lambda_{R,i_m K} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\prod_{k=1}^{K} \lambda_{Q,1k}^{h_k} & \cdots & \prod_{k=1}^{K} \lambda_{Q,Jk}^{h_k} & \prod_{k=1}^{K} \lambda_{R,i_1 k}^{h_k} & \cdots & \prod_{k=1}^{K} \lambda_{R,i_m k}^{h_k} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\prod_{k=1}^{K} \lambda_{Q,1k} & \cdots & \prod_{k=1}^{K} \lambda_{Q,Jk} & \prod_{k=1}^{K} \lambda_{R,i_1 k} & \cdots & \prod_{k=1}^{K} \lambda_{R,i_m k}
\end{pmatrix},
\qquad
y = \begin{pmatrix} y_1 \\ \vdots \\ y_J \\ -A \nu_{R,i_1} \\ \vdots \\ -A \nu_{R,i_m} \end{pmatrix},$$

and the rows of $\Lambda$ are indexed by $h \in H^*$. In Appendix 1, we prove that $\Lambda$ is full rank, and thus $y = 0$, for any $m \in \{0, \ldots, J\}$. The proof of this direction concludes by examining three possible cases.

Case 1.

Suppose $m = 0$, i.e. for each $j \in \{1, \ldots, J\}$, there exists some $j' \in \{1, \ldots, J\}$ such that $\lambda_{Q,j} = \lambda_{R,j'}$ and $\nu_{Q,j} = A \nu_{R,j'}$. As $\sum_{j=1}^{J} \nu_{Q,j} = \sum_{j=1}^{J} \nu_{R,j} = 1$, this implies that $A = 1$ and thus $\pi_{Q,0} = \pi_{R,0}$.

Case 2.

Suppose $m \in \{1, \ldots, J-1\}$, i.e. for each $j \in \{1, \ldots, J\} \setminus I_Q$, there exists some $j' \in \{1, \ldots, J\} \setminus I_R$ such that $\lambda_{Q,j} = \lambda_{R,j'}$ and $\nu_{Q,j} = A \nu_{R,j'}$. Further, for each $j \in I_Q$ and $j' \in I_R$, $\nu_{Q,j} = \nu_{R,j'} = 0$. We can thus ignore the classes $j \in I_Q$ and $j' \in I_R$. As $\sum_{j=1}^{J} \nu_{Q,j} = \sum_{j=1}^{J} \nu_{R,j} = 1$, this implies that $A = 1$ and thus $\pi_{Q,0} = \pi_{R,0}$.

Case 3.

Suppose $m = J$, i.e. for each $j \in \{1, \ldots, J\}$, there exists no $j' \in \{1, \ldots, J\}$ such that $\lambda_{Q,j} = \lambda_{R,j'}$. Then $\nu_{Q,j} = \nu_{R,j} = 0$ for $j \in \{1, \ldots, J\}$, which is a contradiction.

We will now show that if $2J > K$, then $\mathcal{Q}_J$ is not identifiable. To do so we will provide explicit $Q, R \in \mathcal{Q}_J$ such that $\pi_{Q,0} \neq \pi_{R,0}$, but $m_Q = A m_R$ for some $A > 0$. This counterexample is modified from Tahmasebi et al. (2018), who studied identifiability of families of latent class models outside of the multiple-systems estimation context, where $n_{(0,\ldots,0)}$ is observed. Choose $J$ such that $2J > K$. For $j \in \{1, \ldots, J\}$, let $\nu_{Q,j} = \binom{2J}{2j} / (2^{2J-1} - 1)$ and $\nu_{R,j} = \binom{2J}{2j-1} / 2^{2J-1}$. For $j \in \{1, \ldots, J\}$ and $k \in \{1, \ldots, K\}$, let $\lambda_{Q,jk} = \alpha(2j)$ and $\lambda_{R,jk} = \alpha(2j - 1)$, where $0 < \alpha < 1/(2J)$. We thus have that $Q, R \in \mathcal{Q}_J$, where clearly $Q \neq R$. In Appendix 2 we prove that for these choices of $Q, R$, $m_Q = A m_R$ for some $A > 0$ with $A \neq 1$, and thus $\pi_{Q,0} \neq \pi_{R,0}$.

Recently, Manrique-Vallier (2016) proposed to use a family of latent class models with an infinite number of classes, i.e. $\mathcal{Q}_\infty = \cup_{J=1}^{\infty} \mathcal{Q}_J$, for multiple-systems estimation. In practice, Manrique-Vallier (2016) restricted the actual family used to $\mathcal{Q}_{J^*}$ for some large $J^*$, for computational purposes. Theorem 2 tells us that such a family is nonidentifiable if $2J^* > K$. Manrique-Vallier (2016) suggested setting $J^* = K$, which always results in a nonidentifiable family. In the R (R Core Team, 2019) package LCMCR (Manrique-Vallier, 2020), which implements the methodology of Manrique-Vallier (2016), the default value of $J^*$ is 5. Unless one is working with at least $K = 10$ sources, which is rare outside of ecological applications, the family being used will not be identifiable. Extensions of Manrique-Vallier (2016), such as Manrique-Vallier et al. (2019) and Kang et al. (2020), share the same problem with nonidentifiability when too many latent classes are used.

In their discussion, Manrique-Vallier (2016) write, "[a]s Fienberg (1972) warns, multiple-recapture estimation – as any other extrapolation technique – relies on the untestable assumption that the model that describes the observed counts also applies to the unobserved ones." However, the problem is graver than this when working with a nonidentifiable family $\mathcal{Q}$, as there can be multiple models that describe the observed counts. For example, in the simplest case, consider data from $K = 2$ sources generated from the two-class latent class model $Q$ with parameters given in Table 1. Under $Q$, $\tilde{\pi}_{Q,(0,1)}$ = 0.[…], $\tilde{\pi}_{Q,(1,0)}$ = 0.[…], $\tilde{\pi}_{Q,(1,1)}$ = 0.[…], and $\pi_{Q,0}$ = 0.[…]. However, there is a second two-class latent class model, $R$, with parameters given in Table 1, such that $\tilde{\pi}_Q = \tilde{\pi}_R$ but $\pi_{R,0}$ = 0.[…].
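To make this kind of example concrete, the sketch below builds a pair of two-class models for $K = 2$ sources from the construction used in the proof of Theorem 2, with the assumed value $\alpha = 1/5$. These are not the Table 1 parameters, and the function names and the observed count `n_observed` are hypothetical. Exact rational arithmetic is used so that equality of the conditional cell probabilities can be checked without rounding error.

```python
from fractions import Fraction
from itertools import product
from math import comb


def theorem2_pair(J, K, alpha):
    """Class weights and sampling probabilities of Q and R from the proof."""
    if not (2 * J > K and 0 < alpha < Fraction(1, 2 * J)):
        raise ValueError("need 2J > K and 0 < alpha < 1/(2J)")
    half = 2 ** (2 * J - 1)
    nu_q = [Fraction(comb(2 * J, 2 * j), half - 1) for j in range(1, J + 1)]
    nu_r = [Fraction(comb(2 * J, 2 * j - 1), half) for j in range(1, J + 1)]
    lam_q = [[alpha * (2 * j)] * K for j in range(1, J + 1)]
    lam_r = [[alpha * (2 * j - 1)] * K for j in range(1, J + 1)]
    return (nu_q, lam_q), (nu_r, lam_r)


def missing_and_conditional(nu, lam):
    """Return (pi_0, conditional cell probabilities indexed by H*)."""
    K = len(lam[0])
    pi = {}
    for h in product((0, 1), repeat=K):
        total = Fraction(0)
        for weight, probs in zip(nu, lam):
            term = weight
            for hk, p in zip(h, probs):
                term *= p if hk else 1 - p
            total += term
        pi[h] = total
    pi0 = pi.pop((0,) * K)
    return pi0, {h: p / (1 - pi0) for h, p in pi.items()}


(q_nu, q_lam), (r_nu, r_lam) = theorem2_pair(J=2, K=2, alpha=Fraction(1, 5))
pi0_q, tilde_q = missing_and_conditional(q_nu, q_lam)
pi0_r, tilde_r = missing_and_conditional(r_nu, r_lam)

# Population size for which n_observed equals the expected observed count.
n_observed = 240  # hypothetical number of individuals seen at least once
n_implied_q = n_observed / (1 - pi0_q)
n_implied_r = n_observed / (1 - pi0_r)
```

Under these assumed values both models give conditional cell probabilities of 1/3 for each observable pattern, while the missing cell probabilities are 11/35 and 2/5, so the same observed table is compatible with implied population sizes of 350 and 400.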
As $\mathcal{Q}_2$ is not identified, if we try to perform estimation within $\mathcal{Q}_2$, which contains the true data generating model, there is no guarantee that we can estimate well, in any traditional sense, the cell probabilities and population size which generated the data. In particular, nonidentifiability precludes consistent estimation as "there will be uncertainty in parameter estimates that is not washed out as more data are collected" (Linero, 2017). The proof of Theorem 2 shows us that such an example can be constructed whenever $2J > K$.

Table 1: Parameters of two latent class models which produce identical conditional cell probabilities, but different missing cell probabilities

         $\nu_1$   $\nu_2$   $\lambda_{11}$   $\lambda_{12}$   $\lambda_{21}$   $\lambda_{22}$
    Q    […]       […]       […]              […]              […]              […]
    R    […]       […]       […]              […]              […]              […]

[…] $K = 3$ sources and used $J^* = 10$ latent classes. Ball et al. (2019) used $J^* = 5$ latent classes to produce results for six different strata, in which four of the strata had fewer than $K = 10$ sources. Thus both of these applications presented results using nonidentifiable families of latent class models. In all of the other applications there were fewer than $K = 10$ sources. Thus, if the default setting of $J^* = 5$ in the R package LCMCR was used, or any other $J^*$ not satisfying Theorem 2, none of the families used were identified. Moving forward, we believe that it is imperative that families of models used for multiple-systems estimation in such sensitive contexts are known to be identified.

References

Agresti, A. (1994). Simple capture-recapture models permitting unequal catchability and variable sampling effort. Biometrics, 494–500.

Ángel, V. R. & Ball, P. (2019). Killings of social movement leaders in Colombia: an estimation of the total population of victims-update 2018. Tech. rep., Human Rights Data Analysis Group.

Ball, P. & Asher, J. (2002). Statistics and Slobodan: Using data analysis and statistics in the war crimes trial of former President Milosevic. Chance, 17–24.

Ball, P., Coronel, S., Padilla, M. & Mora, D. (2019). Drug-Related Killings in the Philippines. Tech. rep., Human Rights Data Analysis Group and the Stabile Center for Investigative Journalism.

Ball, P. & Harrison, F. (2018). How many people disappeared on 17–19 May 2009 in Sri Lanka? Tech. rep., Human Rights Data Analysis Group.

Ball, P., Hee-Seok Shin, E. & Yang, H. (2018). There may have been 14 undocumented Korean "comfort women" in Palembang, Indonesia. Tech. rep., Human Rights Data Analysis Group and Transitional Justice Working Group.

Ball, P. & Price, M. (2018). The statistics of genocide. CHANCE, 38–45.

Ball, P. & Price, M. (2019). Using statistics to assess lethal violence in civil and inter-state war. Annual review of statistics and its application, 63–84.

Bartolucci, F., Mira, A. & Scaccia, L. (2004). Answering two biological questions with a latent class model via MCMC applied to capture-recapture data. In Applied Bayesian statistical studies in biology and medicine. Springer, pp. 7–23.

Bird, S. M. & King, R. (2018). Multiple systems estimation (or capture-recapture estimation) to inform public policy. Annual review of statistics and its application, 95–118.

Coull, B. A. & Agresti, A. (1999). The use of mixed logit models to reflect heterogeneity in capture-recapture studies. Biometrics, 294–301.

Doshi, R. H., Apodaca, K., Ogwal, M., Bain, R., Amene, E., Kiyingi, H., Aluzimbi, G., Musinguzi, G., Serwadda, D., McIntyre, A. F. et al. (2019). Estimating the size of key populations in Kampala, Uganda: 3-source capture-recapture study. JMIR public health and surveillance, e12118.

Durban, J. W. & Elston, D. A. (2005). Mark-recapture with occasion and individual effects: abundance estimation through Bayesian model selection in a fixed dimensional parameter space. Journal of agricultural, biological, and environmental statistics, 291.

Fienberg, S. E. (1972). The multiple recapture census for closed populations and incomplete $2^k$ contingency tables. Biometrika, 591–603.

Fienberg, S. E., Johnson, M. S. & Junker, B. W. (1999). Classical multilevel and Bayesian approaches to population size estimation using multiple lists. Journal of the Royal Statistical Society: Series A (Statistics in Society), 383–405.

Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 215–231.

Haberman, S. J. (1979). Analysis of Qualitative Data. Volume 2, New Developments. Academic Press.

Holzmann, H., Munk, A. & Zucchini, W. (2006). On identifiability in capture–recapture models. Biometrics, 934–936.

Hook, E. B. & Regal, R. R. (1995). Capture-recapture methods in epidemiology: methods and limitations. Epidemiologic reviews, 243–264.

Huggins, R. (2001). A note on the difficulties associated with the analysis of capture–recapture experiments with heterogeneous capture probabilities. Statistics & probability letters, 147–152.

Johndrow, J., Lum, K. & Manrique-Vallier, D. (2019). Low-risk population size estimates in the presence of capture heterogeneity. Biometrika, 197–210.

Kang, S., Gile, K. & Price, M. (2020). Nested Dirichlet Process For Population Size Estimation From Multi-list Recapture Data. arXiv preprint arXiv:2007.06160.

King, R. & Brooks, S. (2008). On the Bayesian estimation of a closed population size in the presence of heterogeneity and model uncertainty. Biometrics, 816–824.

Linero, A. R. (2017). Bayesian nonparametric analysis of longitudinal studies in the presence of informative missingness. Biometrika, 327–341.

Link, W. A. (2003). Nonidentifiability of population size from capture-recapture data with heterogeneous detection probabilities. Biometrics, 1123–1130.

Link, W. A. (2006). Rejoinder to "On Identifiability in Capture-Recapture Models". Biometrics, 936–939.

Manrique-Vallier, D. (2016). Bayesian population size estimation using Dirichlet process mixtures. Biometrics, 1246–1254.

Manrique-Vallier, D. (2020). LCMCR: Bayesian Non-Parametric Latent-Class Capture-Recapture. R package version 0.4.11.

Manrique-Vallier, D., Ball, P. & Sulmont, D. (2019). Estimating the Number of Fatal Victims of the Peruvian Internal Armed Conflict, 1980-2000: an application of modern multi-list Capture-Recapture techniques. arXiv preprint arXiv:1906.04763.

Okiria, A. G., Bolo, A., Achut, V., Arkangelo, G. C., Michael, A. T. I., Katoro, J. S., Wesson, J., Gutreuter, S., Hundley, L. & Hakim, A. (2019). Novel approaches for estimating female sex worker population size in conflict-affected South Sudan. JMIR public health and surveillance, e11576.

Otis, D. L., Burnham, K. P., White, G. C. & Anderson, D. R. (1978). Statistical inference from capture data on closed animal populations. Wildlife monographs, 3–135.

Pezzott, G. L. M., Salasar, L. E. B., Leite, J. G. & Louzada-Neto, F. (2019). A note on identifiability and maximum likelihood estimation for a heterogeneous capture-recapture model. Communications in Statistics-Theory and Methods, 1–21.

Pledger, S. (2000). Unified maximum likelihood estimates for closed capture–recapture models using mixtures. Biometrics, 434–442.

R Core Team (2019). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Sadinle, M. (2018). Bayesian propagation of record linkage uncertainty into population size estimation of human rights violations. The Annals of Applied Statistics, 1013–1038.

Sanathanan, L. (1972). Estimating the size of a multinomial population. The Annals of Mathematical Statistics, 142–152.

Tahmasebi, B., Motahari, S. A. & Maddah-Ali, M. A. (2018). On the Identifiability of Finite Mixtures of Finite Product Measures. arXiv preprint arXiv:1807.05444.

Appendix 1

We will prove that $\Lambda$ is full rank for any $m \in \{0, \ldots, J\}$ by proving a stronger result. Recall that $K \geq 2$. Let $x_{\ell k} \in (0,1)$ for $\ell \in \{1, \ldots, K\}$ and $k \in \{1, \ldots, K\}$, such that $x_{\ell k} \neq x_{\ell' k}$ for $\ell \neq \ell'$. Let

$$X_K = \begin{pmatrix}
x_{1K} & \cdots & x_{KK} \\
\vdots & \ddots & \vdots \\
\prod_{k=1}^{K} x_{1k}^{h_k} & \cdots & \prod_{k=1}^{K} x_{Kk}^{h_k} \\
\vdots & \ddots & \vdots \\
\prod_{k=1}^{K} x_{1k} & \cdots & \prod_{k=1}^{K} x_{Kk}
\end{pmatrix},$$

where the rows of $X_K$ are indexed by $h \in H^*$. We will show that $X_K$ is full rank by induction on $K$. This implies that $\Lambda$ is full rank, as $J + m \leq 2J \leq K$ by assumption for any $m \in \{0, \ldots, J\}$.

For the base case when $K = 2$, verifying that $X_2$ is full rank is straightforward. Assume that $X_{K-1}$ is full rank. Let $v \in \mathbb{R}^{K \times 1}$ be such that $X_K v = 0$. For each $h \in \{ h' \in H^* \mid h'_K = 0 \}$ we have that $v_K \prod_{k=1}^{K-1} x_{Kk}^{h_k} = -\sum_{\ell=1}^{K-1} v_\ell \prod_{k=1}^{K-1} x_{\ell k}^{h_k}$, which, combined with the row of $X_K v = 0$ for the pattern that agrees with $h$ except that its $K$th entry is 1, implies that $\sum_{\ell=1}^{K-1} v_\ell (x_{\ell K} - x_{KK}) \prod_{k=1}^{K-1} x_{\ell k}^{h_k} = 0$. For $\ell \in \{1, \ldots, K-1\}$, let $v'_\ell = v_\ell (x_{\ell K} - x_{KK})$ and $v' = (v'_1, \ldots, v'_{K-1})$. This leads to the system of equations $X_{K-1} v' = 0$. By the inductive assumption, $v' = 0$. Since $x_{\ell K} \neq x_{KK}$ for $\ell \in \{1, \ldots, K-1\}$, we have that $v_\ell = 0$ for $\ell \in \{1, \ldots, K-1\}$, and thus $v_K = 0$.

Appendix 2

We will now prove that $m_{Q,h} = A m_{R,h}$ for all $h \in H^*$, where $A = 2^{2J-1} / (2^{2J-1} - 1) \neq 1$. Define the function $h(x) = (1 - e^{\alpha x})^{2J} = \sum_{i=0}^{2J} \binom{2J}{i} (-1)^i e^{\alpha i x}$. For $t \in \{1, \ldots, K\}$, we can differentiate the series representation of $h$ to find that $h^{(t)}(x) = \sum_{i=0}^{2J} \binom{2J}{i} (-1)^i (\alpha i)^t e^{\alpha i x}$ and thus $h^{(t)}(x)|_{x=0} = \sum_{i=0}^{2J} \binom{2J}{i} (-1)^i (\alpha i)^t = \sum_{i=1}^{2J} \binom{2J}{i} (-1)^i (\alpha i)^t$. We can alternatively differentiate the non-series representation of $h$ using the fact that $t \leq K < 2J$ and the chain rule for higher order derivatives to find that $h^{(t)}(x)|_{x=0} = 0$, as every term of the derivative retains a factor of $(1 - e^{\alpha x})$, which vanishes at $x = 0$. Let $h \in H^*$ and $t = \sum_{k=1}^{K} h_k \in \{1, \ldots, K\}$. The desired result follows as

$$\begin{aligned}
m_{Q,h} - A m_{R,h} &= \sum_{j=1}^{J} \nu_{Q,j} \prod_{k=1}^{K} \lambda_{Q,jk}^{h_k} - A \sum_{j=1}^{J} \nu_{R,j} \prod_{k=1}^{K} \lambda_{R,jk}^{h_k} \\
&= \sum_{j=1}^{J} \binom{2J}{2j} (2^{2J-1} - 1)^{-1} \prod_{k=1}^{K} \{\alpha(2j)\}^{h_k} - A \sum_{j=1}^{J} \binom{2J}{2j-1} (2^{2J-1})^{-1} \prod_{k=1}^{K} \{\alpha(2j-1)\}^{h_k} \\
&= (2^{2J-1} - 1)^{-1} \sum_{i=1}^{2J} \binom{2J}{i} (-1)^i (\alpha i)^t \\
&= (2^{2J-1} - 1)^{-1} \{ h^{(t)}(x)|_{x=0} \} = 0.
\end{aligned}$$
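The key step above is that the alternating sum $\sum_{i=0}^{2J} \binom{2J}{i} (-1)^i i^t$ vanishes for every $1 \leq t < 2J$. A small Python check of this identity in exact integer arithmetic is sketched below; the function name is an assumption made for illustration, and the factor $\alpha^t$ is dropped because it does not affect whether the sum is zero.

```python
from math import comb, factorial


def alternating_power_sum(J, t):
    """Sum over i = 0..2J of C(2J, i) * (-1)^i * i^t, i.e. h^(t)(0) / alpha^t."""
    return sum(comb(2 * J, i) * (-1) ** i * i ** t for i in range(2 * J + 1))


for J in range(1, 6):
    # The sum vanishes for every order t below 2J, which covers all t <= K
    # whenever K < 2J.
    for t in range(1, 2 * J):
        assert alternating_power_sum(J, t) == 0
    # At order t = 2J the sum equals (2J)!, so it no longer vanishes.
    assert alternating_power_sum(J, 2 * J) == factorial(2 * J)
```

The non-zero value at $t = 2J$ shows why the construction needs $K < 2J$: once some pattern has $t = 2J$ ones, the moments of $Q$ and $R$ are no longer proportional.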