On the Identifiability of Latent Class Models for Multiple-Systems Estimation
arXiv [math.ST], Aug
Serge Aleshin-Guendel
Department of Biostatistics, University of [email protected]
August 25, 2020
Abstract
Latent class models have recently become popular for multiple-systems estimation in human rights applications. However, it is currently unknown when a given family of latent class models is identifiable in this context. We provide necessary and sufficient conditions on the number of latent classes needed for a family of latent class models to be identifiable. Along the way we provide a mechanism for verifying identifiability in a class of multiple-systems estimation models that allow for individual heterogeneity.
Keywords: Capture-recapture; Heterogeneity; Population size estimation.
Multiple-systems estimation, also known as capture-recapture in ecological settings, is an approach to estimating hard-to-reach population sizes which has been used in a number of fields, including epidemiology, official statistics, and human rights (Hook & Regal, 1995; Bird & King, 2018; Ball & Price, 2019). In this setting, multiple sources have incompletely sampled from a closed population of interest, and individuals sampled by more than one source are able to be uniquely identified. The population size is estimated based on the observed overlap of the sources using a model that describes how individuals were sampled by the sources. A common problem in multiple-systems estimation is that individuals may be sampled heterogeneously, i.e. different individuals may have different probabilities of being sampled by each source. In order to come up with reliable estimates of the population size in these settings, one needs to take this heterogeneity into account in the model describing the sampling process of the sources.

Two classic models that incorporate individual heterogeneity are the $M_{th}$ and $M_h$ models (Otis et al., 1978). The $M_{th}$ model assumes that each individual is independently sampled by each source, conditional on latent probabilities of being sampled by each source. The $M_h$ model additionally assumes that an individual has the same probability of being sampled by each source. The $M_h$ model can be plausible in ecological settings where researchers have the ability to design experiments where animals have the same probability of being sampled by each source. However, the assumption that an individual has the same probability of being sampled by each source is typically not plausible in human populations where sources often use convenience samples (Ball & Price, 2019).
This motivates the use of $M_{th}$ models in such settings. In particular, Manrique-Vallier (2016) recently proposed a family of latent class models with a large number of classes, a type of $M_{th}$ model, which has become popular in human rights and other human population settings (Sadinle, 2018; Ball & Harrison, 2018; Ball et al., 2018; Manrique-Vallier et al., 2019; Ángel & Ball, 2019; Ball et al., 2019; Doshi et al., 2019; Okiria et al., 2019).

When using either $M_h$ or $M_{th}$ models, it is known that one must restrict oneself to a parametric family of models for identification, which has generated a literature characterizing identifiability in $M_h$ models (Huggins, 2001; Link, 2003; Holzmann et al., 2006; Link, 2006). Once identifiability is settled for a family of models, one can begin discussing properties of population size estimates using that family, such as consistency (Sanathanan, 1972) or finite-sample risk (Johndrow et al., 2019). Currently, no literature exists characterizing identifiability in $M_{th}$ models, even though a large number of parametric $M_{th}$ model families have been proposed in the literature (Agresti, 1994; Coull & Agresti, 1999; Fienberg et al., 1999; Pledger, 2000; Bartolucci et al., 2004; Durban & Elston, 2005; King & Brooks, 2008).

In this paper we partially close the gap between theory and methodology for $M_{th}$ models through two contributions. The first contribution is a mechanism for verifying identifiability in $M_{th}$ models based on moments of the distribution for the latent sampling probabilities. The second contribution is a necessary and sufficient condition for families of latent class models to be identified. Our result shows that recent applications using latent class models for multiple-systems estimation have been based on nonidentifiable families of models.

Suppose $K \geq 2$ sources sample from a closed population of size $N$, of which only $n < N$ individuals are sampled by at least one source.
We let $\mathcal{H} = \{0,1\}^K$ denote the possible inclusion patterns of individuals in the sources, $\mathcal{H}^* = \mathcal{H} \setminus \{0\}^K$ denote the possible inclusion patterns of the $n$ individuals sampled in at least one source, and $\mathbf{x}_i \in \mathcal{H}$ denote the inclusion pattern for individual $i$. For example, if $K = 4$ and $\mathbf{x}_i = (1,1,1,0)$, then individual $i$ was sampled by sources 1, 2, and 3, but not by source 4. The $\mathbf{x}_i$ can be aggregated into a $2^K$ contingency table, with cells indexed by $\mathbf{h} \in \mathcal{H}$ and cell counts $n_{\mathbf{h}} = \sum_{i=1}^{N} I(\mathbf{x}_i = \mathbf{h})$. We do not observe the count for the cell $\{0\}^K$, $n_{(0,\dots,0)} = N - n$. The target of inference is the population size $N$.

In this article, we assume that the individuals' inclusion patterns follow an $M_{th}$ model (Otis et al., 1978), i.e.
$$x_{i,k} \mid \lambda_{i,k} \overset{iid}{\sim} \text{Bernoulli}(\lambda_{i,k}), \qquad (\lambda_{i,1}, \dots, \lambda_{i,K}) \overset{iid}{\sim} Q, \tag{1}$$
where $Q \in \mathcal{Q}$ and $\mathcal{Q}$ is a family of mixing distributions on $(0,1)^K$ for the latent sampling probabilities. Under this model, conditional on an individual's sampling probabilities, $(\lambda_{i,1}, \dots, \lambda_{i,K})$, the individual is sampled by each source $k$ with probability $\lambda_{i,k}$, independently of all other sources. By imposing the restriction that $\lambda_{i,k} \in (0,1)$ for each individual $i$ and source $k$, we are assuming that each of the $N$ individuals has non-zero probability of being sampled by at least one source. The $M_h$ model (Otis et al., 1978) is a submodel of the $M_{th}$ model which further assumes that the latent sampling probabilities are equal for all sources, i.e. $\lambda_{i,1} = \cdots = \lambda_{i,K}$.

$M_{th}$ Models
Marginalizing over $Q$ in (1), we find that the complete $2^K$ contingency table of counts is multinomially distributed, i.e.
$$(n_{\mathbf{h}})_{\mathbf{h} \in \mathcal{H}} \mid N, \pi_Q \sim \text{Multinomial}(N, \pi_Q), \tag{2}$$
where $\pi_Q = (\pi_{Q,\mathbf{h}})_{\mathbf{h} \in \mathcal{H}}$, $\pi_{Q,\mathbf{h}} = E_Q\{\prod_{k=1}^{K} \lambda_k^{h_k} (1 - \lambda_k)^{1 - h_k}\}$, and $E_Q$ denotes an expectation with respect to the mixing distribution $Q$. Throughout this article when a vector or matrix is indexed by $\mathbf{h} \in \mathcal{H}$ or $\mathcal{H}^*$ we use the order given by viewing $\mathbf{h}$ as binary digits. For example, when $K = 3$ we order $\mathcal{H}$ as $(0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), (1,1,1)$. The likelihood in (2) factors as
$$n \mid N, \pi_{Q,0} \sim \text{Binomial}(N, 1 - \pi_{Q,0}), \qquad (n_{\mathbf{h}})_{\mathbf{h} \in \mathcal{H}^*} \mid n, \tilde{\pi}_Q \sim \text{Multinomial}(n, \tilde{\pi}_Q), \tag{3}$$
where $\pi_{Q,0} = \pi_{Q,(0,\dots,0)}$, $\tilde{\pi}_Q = (\tilde{\pi}_{Q,\mathbf{h}})_{\mathbf{h} \in \mathcal{H}^*}$, and $\tilde{\pi}_{Q,\mathbf{h}} = \pi_{Q,\mathbf{h}} / (1 - \pi_{Q,0})$. The multinomial likelihood for the observed cell counts, $(n_{\mathbf{h}})_{\mathbf{h} \in \mathcal{H}^*}$, conditional on their sum, $n$, in (3) is referred to as the conditional likelihood (Fienberg, 1972). Intuitively, in order to estimate $N$, the conditional cell probabilities, $\tilde{\pi}_Q$, estimated from the conditional likelihood need to determine the missing cell probability, $\pi_{Q,0}$. The following definition of identifiability, modified from Link (2003), codifies this intuition. If a family of distributions $\mathcal{Q}$ is identifiable according to this definition, then $N$ can be consistently estimated within $\mathcal{Q}$ (Sanathanan, 1972).

Definition 1. A family of distributions $\mathcal{Q}$ on $(0,1)^K$ is identifiable if, for $Q, R \in \mathcal{Q}$, $\tilde{\pi}_Q = \tilde{\pi}_R$ implies that $\pi_{Q,0} = \pi_{R,0}$.

Latent class models are a classical tool for the analysis of multivariate categorical data that describe populations which can be stratified into $J$ classes, in which the latent sampling probabilities are homogeneous for individuals within each class (Goodman, 1974; Haberman, 1979). They form a special case of the $M_{th}$ model, where the mixing distribution is a discrete finite mixture, and have been used for multiple-systems estimation many times (Agresti, 1994; Coull & Agresti, 1999; Pledger, 2000; Bartolucci et al., 2004; Manrique-Vallier, 2016).
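As a concrete illustration of the quantities entering Definition 1, the following minimal Python sketch computes the cell probabilities $\pi_{Q,\mathbf{h}}$, the missing cell probability $\pi_{Q,0}$, and the conditional cell probabilities $\tilde{\pi}_Q$ when the mixing distribution has finitely many support points. The function names and the numerical values of the weights and sampling probabilities are hypothetical choices made for illustration only; they are not taken from this paper.

```python
from itertools import product
from math import prod


def cell_probabilities(nu, lam):
    """Cell probabilities for a finite mixing distribution.

    nu  : list of J mixture weights (assumed to sum to one)
    lam : J lists of K sampling probabilities, each in (0, 1)
    Returns the inclusion patterns in binary-digit order and the
    probability of each pattern.
    """
    K = len(lam[0])
    patterns = list(product((0, 1), repeat=K))  # (0,...,0) comes first
    pi = []
    for h in patterns:
        total = 0.0
        for weight, probs in zip(nu, lam):
            # Product over sources of lambda^h_k * (1 - lambda)^(1 - h_k)
            total += weight * prod(p if hk else 1.0 - p for p, hk in zip(probs, h))
        pi.append(total)
    return patterns, pi


def conditional_probabilities(pi):
    """Split pi into the missing cell probability and the conditional ones."""
    pi0 = pi[0]
    return pi0, [p / (1.0 - pi0) for p in pi[1:]]


# Hypothetical two-class example with K = 2 sources.
nu = [0.4, 0.6]
lam = [[0.2, 0.5], [0.7, 0.3]]
patterns, pi = cell_probabilities(nu, lam)
pi0, pi_tilde = conditional_probabilities(pi)
print(patterns)
print(pi0, pi_tilde)
```

Only `pi_tilde` is informed by the conditional likelihood in (3); identifiability in the sense of Definition 1 asks whether `pi_tilde` pins down `pi0` within the chosen family.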
We denote the family of latent class models with $J$ classes by
$$\mathcal{Q}_J = \Big\{ Q = \sum_{j=1}^{J} \nu_{Q,j} \prod_{k=1}^{K} \delta_{\lambda_{Q,jk}} \;\Big|\; \nu_{Q,j} \geq 0, \; \sum_{j=1}^{J} \nu_{Q,j} = 1, \; \lambda_{Q,jk} \in (0,1) \Big\}.$$
It is currently unknown when $\mathcal{Q}_J$ is identified.

In this section, we aim to provide a mechanism for directly checking Definition 1, to verify identifiability of a given family $\mathcal{Q}$. Before proving the main theorem of this section, we have the following lemma, which tells us that the cell probabilities for any $M_{th}$ model depend on the mixing distribution, $Q$, only through mixed moments of $Q$.

Lemma 1.
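For any $\mathbf{h} \in \mathcal{H}^*$, $\pi_{Q,\mathbf{h}} = \sum_{\mathbf{h}' \in \mathcal{H}^*} c_{\mathbf{h},\mathbf{h}'} m_{Q,\mathbf{h}'}$, where $c_{\mathbf{h},\mathbf{h}'} = (-1)^{\sum_{k=1}^{K} (h'_k - h_k)} \prod_{k=1}^{K} I(h_k \leq h'_k)$ and $m_{Q,\mathbf{h}'} = E_Q(\prod_{k=1}^{K} \lambda_k^{h'_k})$.

Proof. For all $\mathbf{h} \in \mathcal{H}^*$, $\prod_{k=1}^{K} \lambda_k^{h_k} (1 - \lambda_k)^{1 - h_k} = \sum_{\mathbf{h}' \in \mathcal{H}^*} c_{\mathbf{h},\mathbf{h}'} \prod_{k=1}^{K} \lambda_k^{h'_k}$ by an application of the multi-binomial theorem. The result follows from taking the expectation over both sides with respect to $Q$.

We can restate Lemma 1 in matrix form. Letting $\pi^*_Q = (\pi_{Q,\mathbf{h}})_{\mathbf{h} \in \mathcal{H}^*}$ and $m_Q = (m_{Q,\mathbf{h}})_{\mathbf{h} \in \mathcal{H}^*}$, we have that $\pi^*_Q = C m_Q$, where $C = (c_{\mathbf{h},\mathbf{h}'})_{\mathbf{h} \in \mathcal{H}^*, \mathbf{h}' \in \mathcal{H}^*}$. The matrix $C$ is invertible as it is upper triangular with non-zero diagonal entries. We are now ready to prove Theorem 1.

Theorem 1.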
For any two distributions $Q, R$ on $(0,1)^K$, $\tilde{\pi}_Q = \tilde{\pi}_R$ is equivalent to $m_Q = A m_R$ for some $A > 0$.

Proof. $\tilde{\pi}_Q = \tilde{\pi}_R$ is equivalent to $\pi^*_Q / (1 - \pi_{Q,0}) = \pi^*_R / (1 - \pi_{R,0})$. Rearranging terms we have that $\pi^*_Q = \pi^*_R (1 - \pi_{Q,0}) / (1 - \pi_{R,0})$, and thus $\pi^*_Q = A \pi^*_R$, where $A = (1 - \pi_{Q,0}) / (1 - \pi_{R,0}) > 0$. Using Lemma 1, this is equivalent to $C m_Q = A C m_R$, and thus $m_Q = A m_R$ due to the invertibility of $C$.

The immediate consequence of Theorem 1 is that to verify identifiability of a family $\mathcal{Q}$, one can demonstrate that if $m_Q = A m_R$ for some $Q, R \in \mathcal{Q}$, then $\pi_{Q,0} = \pi_{R,0}$. We use this mechanism in the next section to characterize when latent class models are identifiable.

To provide necessary and sufficient conditions for the family of $J$-class latent class models, $\mathcal{Q}_J$, to be identifiable, we restrict the family defined in Section 2.3 to
$$\mathcal{Q}_J = \Big\{ Q = \sum_{j=1}^{J} \nu_{Q,j} \prod_{k=1}^{K} \delta_{\lambda_{Q,jk}} \;\Big|\; \nu_{Q,j} \geq 0, \; \sum_{j=1}^{J} \nu_{Q,j} = 1, \; \lambda_{Q,jk} \in (0,1), \; \lambda_{Q,jk} \neq \lambda_{Q,j'k} \text{ for } j \neq j' \Big\}.$$
This restriction makes the mild assumption that each class's sampling probabilities are distinct, which simplifies the proof of Theorem 2. Loosening this restriction could only make the conditions on $J$ for $\mathcal{Q}_J$ to be identifiable stricter, and thus the conclusions we reach in the following section would still stand for families where this restriction is violated.

There are $J(K+1) - 1$ parameters in $\mathcal{Q}_J$, thus when $\mathcal{Q}_J$ is identifiable, $J$ satisfies $J(K+1) - 1 \leq 2^K - 2$, as the conditional cell probabilities, $\tilde{\pi}_Q$, are $2^K - 2$ dimensional. The following theorem shows that $J$ must satisfy a stricter condition for $\mathcal{Q}_J$ to be identifiable.

Theorem 2. $\mathcal{Q}_J$ is identifiable iff $2J \leq K$.

Proof. We will first show that if $2J \leq K$, then $\mathcal{Q}_J$ is identifiable. The proof of this direction is similar in spirit to the proofs of Theorem 2 in Holzmann et al. (2006) and Theorem 1 in Pezzott et al. (2019), which were both concerned with characterizing the identifiability of the $M_h$ analogue of $\mathcal{Q}_J$. Assume $2J \leq K$, and let $Q, R \in \mathcal{Q}_J$ such that $m_Q = A m_R$ for some $A > 0$, so that we have the following system of equations:
$$\sum_{j=1}^{J} \nu_{Q,j} \prod_{k=1}^{K} \lambda_{Q,jk}^{h_k} - A \sum_{j=1}^{J} \nu_{R,j} \prod_{k=1}^{K} \lambda_{R,jk}^{h_k} = 0 \qquad (\mathbf{h} \in \mathcal{H}^*). \tag{4}$$
Let $\mathcal{I}_Q = \{ j \mid \lambda_{Q,j} \notin (\lambda_{R,1}, \dots, \lambda_{R,J}) \}$ and $\mathcal{I}_R = \{ j \mid \lambda_{R,j} \notin (\lambda_{Q,1}, \dots, \lambda_{Q,J}) \}$, where $\lambda_{Q,j} = (\lambda_{Q,j1}, \dots, \lambda_{Q,jK})$ and $\lambda_{R,j} = (\lambda_{R,j1}, \dots, \lambda_{R,jK})$. We can then rewrite (4) as
$$\sum_{j=1}^{J} y_j \prod_{k=1}^{K} \lambda_{Q,jk}^{h_k} - A \sum_{i \in \mathcal{I}_R} \nu_{R,i} \prod_{k=1}^{K} \lambda_{R,ik}^{h_k} = 0 \qquad (\mathbf{h} \in \mathcal{H}^*), \tag{5}$$
where $y_j = \nu_{Q,j}$ if $j \in \mathcal{I}_Q$ and $y_j = \nu_{Q,j} - A \nu_{R,j'}$ for some $j' \in \{1, \dots, J\} \setminus \mathcal{I}_R$ otherwise. Letting $m = |\mathcal{I}_R| = |\mathcal{I}_Q|$ and labelling the elements of $\mathcal{I}_R$ as $i_1, \dots, i_m$, the system of equations in (5) can be written in matrix form as $\Lambda y = 0$, where
$$\Lambda = \begin{pmatrix}
\lambda_{Q,1K} & \cdots & \lambda_{Q,JK} & \lambda_{R,i_1 K} & \cdots & \lambda_{R,i_m K} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\prod_{k=1}^{K} \lambda_{Q,1k}^{h_k} & \cdots & \prod_{k=1}^{K} \lambda_{Q,Jk}^{h_k} & \prod_{k=1}^{K} \lambda_{R,i_1 k}^{h_k} & \cdots & \prod_{k=1}^{K} \lambda_{R,i_m k}^{h_k} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\prod_{k=1}^{K} \lambda_{Q,1k} & \cdots & \prod_{k=1}^{K} \lambda_{Q,Jk} & \prod_{k=1}^{K} \lambda_{R,i_1 k} & \cdots & \prod_{k=1}^{K} \lambda_{R,i_m k}
\end{pmatrix},
\qquad
y = \begin{pmatrix} y_1 \\ \vdots \\ y_J \\ -A \nu_{R,i_1} \\ \vdots \\ -A \nu_{R,i_m} \end{pmatrix},$$
and the rows of $\Lambda$ are indexed by $\mathbf{h} \in \mathcal{H}^*$. In Appendix 1, we prove that $\Lambda$ is full rank, and thus $y = 0$, for any $m \in \{0, \dots, J\}$. The proof of this direction concludes by examining three possible cases.

Case 1.
Suppose $m = 0$, i.e. for each $j \in \{1, \dots, J\}$, there exists some $j' \in \{1, \dots, J\}$ such that $\lambda_{Q,j} = \lambda_{R,j'}$ and $\nu_{Q,j} = A \nu_{R,j'}$. As $\sum_{j=1}^{J} \nu_{Q,j} = \sum_{j=1}^{J} \nu_{R,j} = 1$, this implies that $A = 1$ and thus $\pi_{Q,0} = \pi_{R,0}$.

Case 2.
Suppose $m \in \{1, \dots, J - 1\}$, i.e. for each $j \in \{1, \dots, J\} \setminus \mathcal{I}_Q$, there exists some $j' \in \{1, \dots, J\} \setminus \mathcal{I}_R$ such that $\lambda_{Q,j} = \lambda_{R,j'}$ and $\nu_{Q,j} = A \nu_{R,j'}$. Further, for each $j \in \mathcal{I}_Q$ and $j' \in \mathcal{I}_R$, $\nu_{Q,j} = \nu_{R,j'} = 0$. We can thus ignore the classes $j \in \mathcal{I}_Q$ and $j' \in \mathcal{I}_R$. As $\sum_{j=1}^{J} \nu_{Q,j} = \sum_{j=1}^{J} \nu_{R,j} = 1$, this implies that $A = 1$ and thus $\pi_{Q,0} = \pi_{R,0}$.

Case 3.
Suppose $m = J$, i.e. for each $j \in \{1, \dots, J\}$, there exists no $j' \in \{1, \dots, J\}$ such that $\lambda_{Q,j} = \lambda_{R,j'}$. Then $\nu_{Q,j} = \nu_{R,j} = 0$ for $j \in \{1, \dots, J\}$, which is a contradiction.

We will now show that if $2J > K$, then $\mathcal{Q}_J$ is not identifiable. To do so we will provide explicit $Q, R \in \mathcal{Q}_J$ such that $\pi_{Q,0} \neq \pi_{R,0}$, but $m_Q = A m_R$ for $A > 0$. This counterexample is modified from Tahmasebi et al. (2018), who studied identifiability of families of latent class models outside of the multiple-systems estimation context where $n_{(0,\dots,0)}$ is observed. Choose $J$ such that $2J > K$. For $j \in \{1, \dots, J\}$, let $\nu_{Q,j} = \binom{2J}{2j} / (2^{2J-1} - 1)$ and $\nu_{R,j} = \binom{2J}{2j-1} / 2^{2J-1}$. For $j \in \{1, \dots, J\}$ and $k \in \{1, \dots, K\}$, let $\lambda_{Q,jk} = \alpha (2j)$ and $\lambda_{R,jk} = \alpha (2j - 1)$ where $0 < \alpha < 1/(2J)$. We thus have that $Q, R \in \mathcal{Q}_J$, where clearly $Q \neq R$. In Appendix 2 we prove that for these choices of $Q, R$, $m_Q = A m_R$ for $A > 0$ where $A \neq 1$, and thus $\pi_{Q,0} \neq \pi_{R,0}$.

Recently, Manrique-Vallier (2016) proposed to use a family of latent class models with an infinite number of classes, i.e. $\mathcal{Q}_\infty = \cup_{J=1}^{\infty} \mathcal{Q}_J$, for multiple-systems estimation. In practice, Manrique-Vallier (2016) restricted the actual family used to $\mathcal{Q}_{J^*}$ for some large $J^*$, for computational purposes. Theorem 2 tells us that such a family is nonidentifiable if $2J^* > K$. Manrique-Vallier (2016) suggested setting $J^* = K$, which always results in a nonidentifiable family. In the R (R Core Team, 2019) package LCMCR (Manrique-Vallier, 2020), which implements the methodology of Manrique-Vallier (2016), the default value of $J^*$ is 5. Unless one is working with at least $K = 10$ sources, which is rare outside of ecological applications, the family being used will not be identifiable. Extensions of Manrique-Vallier (2016), such as Manrique-Vallier et al. (2019) and Kang et al. (2020), share the same problem with nonidentifiability when too many latent classes are used.

In their discussion, Manrique-Vallier (2016) write, "[a]s Fienberg (1972) warns, multiple-recapture estimation – as any other extrapolation technique – relies on the untestable assumption that the model that describes the observed counts also applies to the unobserved ones." However, the problem is graver than this when working with a nonidentifiable family $\mathcal{Q}$, as there can be multiple models that describe the observed counts. For example in the simplest case, consider data from $K = 2$ sources generated from the two-class latent class model $Q$ with parameters given in Table 1. Under $Q$, $\tilde{\pi}_{Q,(0,1)} = 0.$[…], $\tilde{\pi}_{Q,(1,0)} = 0.$[…], $\tilde{\pi}_{Q,(1,1)} = 0.$[…], and $\pi_{Q,0} = 0.$[…]. There is another two-class latent class model $R$, with parameters given in Table 1, such that $\tilde{\pi}_Q = \tilde{\pi}_R$ but $\pi_{R,0} = 0.$[…]. When $\mathcal{Q}$ is not identified, if we try to perform estimation within $\mathcal{Q}$, which contains the true data generating model, there is no guarantee that we can estimate well, in any traditional sense, the cell probabilities and population size which generated the data. In particular, nonidentifiability precludes consistent estimation as "there will be uncertainty in parameter estimates that is not washed out as more data are collected" (Linero, 2017). The proof of Theorem 2 shows us that such an example can be constructed whenever $2J > K$.

Table 1: Parameters of two latent class models which produce identical conditional cell probabilities, but different missing cell probabilities
      $\nu_1$   $\nu_2$   $\lambda_{11}$   $\lambda_{12}$   $\lambda_{21}$   $\lambda_{22}$
$Q$   […]       […]       […]              […]              […]              […]
$R$   […]       […]       […]              […]              […]              […]

[…] $K = 3$ sources and used $J^* = 10$ latent classes. Ball et al. (2019) used $J^* = 5$ latent classes to produce results for six different strata, in which four of the strata had fewer than $K = 10$ sources. Thus both of these applications presented results using nonidentifiable families of latent class models. In all of the other applications there were fewer than $K = 10$ sources. Thus, if the default setting of $J^* = 5$ in the R package LCMCR was used, or any other $J^*$ not satisfying Theorem 2, none of the families used were identified. Moving forward, we believe that it is imperative that families of models used for multiple-systems estimation in such sensitive contexts are known to be identified.

References
Agresti, A. (1994). Simple capture-recapture models permitting unequal catchability and variable sampling effort. Biometrics, 494–500.
Ángel, V. R. & Ball, P. (2019). Killings of social movement leaders in Colombia: an estimation of the total population of victims-update 2018. Tech. rep., Human Rights Data Analysis Group.
Ball, P. & Asher, J. (2002). Statistics and Slobodan: Using data analysis and statistics in the war crimes trial of former President Milosevic. Chance, 17–24.
Ball, P., Coronel, S., Padilla, M. & Mora, D. (2019). Drug-Related Killings in the Philippines. Tech. rep., Human Rights Data Analysis Group and the Stabile Center for Investigative Journalism.
Ball, P. & Harrison, F. (2018). How many people disappeared on 17–19 May 2009 in Sri Lanka? Tech. rep., Human Rights Data Analysis Group.
Ball, P., Hee-Seok Shin, E. & Yang, H. (2018). There may have been 14 undocumented Korean "comfort women" in Palembang, Indonesia. Tech. rep., Human Rights Data Analysis Group and Transitional Justice Working Group.
Ball, P. & Price, M. (2018). The statistics of genocide. Chance, 38–45.
Ball, P. & Price, M. (2019). Using statistics to assess lethal violence in civil and inter-state war. Annual Review of Statistics and Its Application, 63–84.
Bartolucci, F., Mira, A. & Scaccia, L. (2004). Answering two biological questions with a latent class model via MCMC applied to capture-recapture data. In Applied Bayesian Statistical Studies in Biology and Medicine. Springer, pp. 7–23.
Bird, S. M. & King, R. (2018). Multiple systems estimation (or capture-recapture estimation) to inform public policy. Annual Review of Statistics and Its Application, 95–118.
Coull, B. A. & Agresti, A. (1999). The use of mixed logit models to reflect heterogeneity in capture-recapture studies. Biometrics, 294–301.
Doshi, R. H., Apodaca, K., Ogwal, M., Bain, R., Amene, E., Kiyingi, H., Aluzimbi, G., Musinguzi, G., Serwadda, D., McIntyre, A. F. et al. (2019). Estimating the size of key populations in Kampala, Uganda: 3-source capture-recapture study. JMIR Public Health and Surveillance, e12118.
Durban, J. W. & Elston, D. A. (2005). Mark-recapture with occasion and individual effects: abundance estimation through Bayesian model selection in a fixed dimensional parameter space. Journal of Agricultural, Biological, and Environmental Statistics, 291.
Fienberg, S. E. (1972). The multiple recapture census for closed populations and incomplete $2^k$ contingency tables. Biometrika, 591–603.
Fienberg, S. E., Johnson, M. S. & Junker, B. W. (1999). Classical multilevel and Bayesian approaches to population size estimation using multiple lists. Journal of the Royal Statistical Society: Series A (Statistics in Society), 383–405.
Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 215–231.
Haberman, S. J. (1979). Analysis of Qualitative Data. Volume 2, New Developments. Academic Press.
Holzmann, H., Munk, A. & Zucchini, W. (2006). On identifiability in capture–recapture models. Biometrics, 934–936.
Hook, E. B. & Regal, R. R. (1995). Capture-recapture methods in epidemiology: methods and limitations. Epidemiologic Reviews, 243–264.
Huggins, R. (2001). A note on the difficulties associated with the analysis of capture–recapture experiments with heterogeneous capture probabilities. Statistics & Probability Letters, 147–152.
Johndrow, J., Lum, K. & Manrique-Vallier, D. (2019). Low-risk population size estimates in the presence of capture heterogeneity. Biometrika, 197–210.
Kang, S., Gile, K. & Price, M. (2020). Nested Dirichlet Process For Population Size Estimation From Multi-list Recapture Data. arXiv preprint arXiv:2007.06160.
King, R. & Brooks, S. (2008). On the Bayesian estimation of a closed population size in the presence of heterogeneity and model uncertainty. Biometrics, 816–824.
Linero, A. R. (2017). Bayesian nonparametric analysis of longitudinal studies in the presence of informative missingness. Biometrika, 327–341.
Link, W. A. (2003). Nonidentifiability of population size from capture-recapture data with heterogeneous detection probabilities. Biometrics, 1123–1130.
Link, W. A. (2006). Rejoinder to "On Identifiability in Capture-Recapture Models". Biometrics, 936–939.
Manrique-Vallier, D. (2016). Bayesian population size estimation using Dirichlet process mixtures. Biometrics, 1246–1254.
Manrique-Vallier, D. (2020). LCMCR: Bayesian Non-Parametric Latent-Class Capture-Recapture. R package version 0.4.11.
Manrique-Vallier, D., Ball, P. & Sulmont, D. (2019). Estimating the Number of Fatal Victims of the Peruvian Internal Armed Conflict, 1980-2000: an application of modern multi-list Capture-Recapture techniques. arXiv preprint arXiv:1906.04763.
Okiria, A. G., Bolo, A., Achut, V., Arkangelo, G. C., Michael, A. T. I., Katoro, J. S., Wesson, J., Gutreuter, S., Hundley, L. & Hakim, A. (2019). Novel approaches for estimating female sex worker population size in conflict-affected South Sudan. JMIR Public Health and Surveillance, e11576.
Otis, D. L., Burnham, K. P., White, G. C. & Anderson, D. R. (1978). Statistical inference from capture data on closed animal populations. Wildlife Monographs, 3–135.
Pezzott, G. L. M., Salasar, L. E. B., Leite, J. G. & Louzada-Neto, F. (2019). A note on identifiability and maximum likelihood estimation for a heterogeneous capture-recapture model. Communications in Statistics-Theory and Methods, 1–21.
Pledger, S. (2000). Unified maximum likelihood estimates for closed capture–recapture models using mixtures. Biometrics, 434–442.
R Core Team (2019). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Sadinle, M. (2018). Bayesian propagation of record linkage uncertainty into population size estimation of human rights violations. The Annals of Applied Statistics, 1013–1038.
Sanathanan, L. (1972). Estimating the size of a multinomial population. The Annals of Mathematical Statistics, 142–152.
Tahmasebi, B., Motahari, S. A. & Maddah-Ali, M. A. (2018). On the Identifiability of Finite Mixtures of Finite Product Measures. arXiv preprint arXiv:1807.05444.

Appendix 1
We will prove that $\Lambda$ is full rank for any $m \in \{0, \dots, J\}$ by proving a stronger result. Recall that $K \geq 2$. Let $x_{\ell k} \in (0,1)$ for $\ell \in \{1, \dots, K\}$ and $k \in \{1, \dots, K\}$, such that $x_{\ell k} \neq x_{\ell' k}$ for $\ell \neq \ell'$. Let
$$X_K = \begin{pmatrix}
x_{1K} & \cdots & x_{KK} \\
\vdots & \ddots & \vdots \\
\prod_{k=1}^{K} x_{1k}^{h_k} & \cdots & \prod_{k=1}^{K} x_{Kk}^{h_k} \\
\vdots & \ddots & \vdots \\
\prod_{k=1}^{K} x_{1k} & \cdots & \prod_{k=1}^{K} x_{Kk}
\end{pmatrix},$$
where the rows of $X_K$ are indexed by $\mathbf{h} \in \mathcal{H}^*$. We will show that $X_K$ is full rank by induction on $K$. This implies that $\Lambda$ is full rank, as $J + m \leq 2J \leq K$ by assumption for any $m \in \{0, \dots, J\}$.

For the base case when $K = 2$, verifying $X_2$ is full rank is straightforward. Assume that $X_{K-1}$ is full rank. Let $v \in \mathbb{R}^{K \times 1}$ be such that $X_K v = 0$. For each $\mathbf{h} \in \{ \mathbf{h}' \in \mathcal{H}^* \mid h'_K = 0 \}$ we have that $v_K \prod_{k=1}^{K-1} x_{Kk}^{h_k} = - \sum_{\ell=1}^{K-1} v_\ell \prod_{k=1}^{K-1} x_{\ell k}^{h_k}$, which, combined with the row for the pattern that agrees with $\mathbf{h}$ except that its $K$th entry is 1, implies that $\sum_{\ell=1}^{K-1} v_\ell (x_{\ell K} - x_{KK}) \prod_{k=1}^{K-1} x_{\ell k}^{h_k} = 0$. For $\ell \in \{1, \dots, K-1\}$, let $v'_\ell = v_\ell (x_{\ell K} - x_{KK})$ and $v' = (v'_1, \dots, v'_{K-1})$. This leads to the system of equations $X_{K-1} v' = 0$. By the inductive assumption, $v' = 0$. Since $x_{\ell K} \neq x_{KK}$ for $\ell \in \{1, \dots, K-1\}$, we have that $v_\ell = 0$ for $\ell \in \{1, \dots, K-1\}$, and thus $v_K = 0$.

Appendix 2
We will now prove that $m_{Q,\mathbf{h}} = A m_{R,\mathbf{h}}$ for all $\mathbf{h} \in \mathcal{H}^*$, where $A = 2^{2J-1} / (2^{2J-1} - 1) \neq 1$. Define the function $h(x) = (1 - e^{\alpha x})^{2J} = \sum_{i=0}^{2J} \binom{2J}{i} (-1)^i e^{\alpha i x}$. For $t \in \{1, \dots, K\}$, we can differentiate the series representation of $h$ to find that $h^{(t)}(x) = \sum_{i=0}^{2J} \binom{2J}{i} (-1)^i (\alpha i)^t e^{\alpha i x}$ and thus $h^{(t)}(x) |_{x=0} = \sum_{i=0}^{2J} \binom{2J}{i} (-1)^i (\alpha i)^t = \sum_{i=1}^{2J} \binom{2J}{i} (-1)^i (\alpha i)^t$. We can alternatively differentiate the non-series representation of $h$ using the fact that $t \leq K < 2J$ and the chain rule for higher order derivatives to find that $h^{(t)}(x) |_{x=0} = 0$. Let $\mathbf{h} \in \mathcal{H}^*$ and $t = \sum_{k=1}^{K} h_k \in \{1, \dots, K\}$. The desired result follows as
$$\begin{aligned}
m_{Q,\mathbf{h}} - A m_{R,\mathbf{h}}
&= \sum_{j=1}^{J} \nu_{Q,j} \prod_{k=1}^{K} \lambda_{Q,jk}^{h_k} - A \sum_{j=1}^{J} \nu_{R,j} \prod_{k=1}^{K} \lambda_{R,jk}^{h_k} \\
&= \sum_{j=1}^{J} \binom{2J}{2j} (2^{2J-1} - 1)^{-1} \prod_{k=1}^{K} \{\alpha (2j)\}^{h_k} - A \sum_{j=1}^{J} \binom{2J}{2j-1} (2^{2J-1})^{-1} \prod_{k=1}^{K} \{\alpha (2j-1)\}^{h_k} \\
&= (2^{2J-1} - 1)^{-1} \sum_{i=1}^{2J} \binom{2J}{i} (-1)^i (\alpha i)^t \\
&= (2^{2J-1} - 1)^{-1} \{ h^{(t)}(x) |_{x=0} \} = 0.
\end{aligned}$$
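The identity above can also be checked numerically. The following Python sketch, using exact rational arithmetic, builds the two models from the proof of Theorem 2 for one particular choice of $J$, $K$ and $\alpha$, and confirms that the mixed moments are proportional with constant $A$ while the missing cell probabilities differ. The function names and the values $J = 2$, $K = 3$, $\alpha = 1/5$ are hypothetical choices made for illustration; they are not the entries of Table 1.

```python
from fractions import Fraction
from itertools import product
from math import comb, prod


def counterexample(J, K, alpha):
    """Weights and sampling probabilities of Q and R from the proof of Theorem 2."""
    assert 2 * J > K and 0 < alpha < Fraction(1, 2 * J)
    nu_Q = [Fraction(comb(2 * J, 2 * j), 2 ** (2 * J - 1) - 1) for j in range(1, J + 1)]
    nu_R = [Fraction(comb(2 * J, 2 * j - 1), 2 ** (2 * J - 1)) for j in range(1, J + 1)]
    lam_Q = [[alpha * (2 * j)] * K for j in range(1, J + 1)]
    lam_R = [[alpha * (2 * j - 1)] * K for j in range(1, J + 1)]
    return (nu_Q, lam_Q), (nu_R, lam_R)


def mixed_moments(nu, lam):
    """Mixed moments m_{Q,h} for every pattern h in H* (binary-digit order)."""
    K = len(lam[0])
    h_star = list(product((0, 1), repeat=K))[1:]  # drop the all-zero pattern
    return [
        sum(w * prod(p ** hk for p, hk in zip(probs, h)) for w, probs in zip(nu, lam))
        for h in h_star
    ]


def missing_cell_probability(nu, lam):
    """Probability of the unobserved all-zero inclusion pattern."""
    return sum(w * prod(1 - p for p in probs) for w, probs in zip(nu, lam))


J, K, alpha = 2, 3, Fraction(1, 5)  # illustrative values with 2J > K
(nu_Q, lam_Q), (nu_R, lam_R) = counterexample(J, K, alpha)
A = Fraction(2 ** (2 * J - 1), 2 ** (2 * J - 1) - 1)

m_Q = mixed_moments(nu_Q, lam_Q)
m_R = mixed_moments(nu_R, lam_R)
moments_match = all(q == A * r for q, r in zip(m_Q, m_R))

pi0_Q = missing_cell_probability(nu_Q, lam_Q)
pi0_R = missing_cell_probability(nu_R, lam_R)
print(moments_match, pi0_Q, pi0_R)
```

By Theorem 1, proportional moments mean the two models share the same conditional cell probabilities, so no amount of observed data could distinguish them even though they imply different missing cell probabilities. Exact fractions are used here so that the equality test does not depend on floating-point tolerance.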