Finite mixture modeling of censored and missing data using the multivariate skew-normal distribution
Francisco H. C. de Alencar, Christian E. Galarza, Larissa A. Matos, Victor H. Lachos
aa r X i v : . [ s t a t . M E ] S e p F INITE MIXTUR E MODELING OF C ENSORED AND MISSING DATAUSING THE MULTIVARIATE SKEW - NORMAL DISTR IBUTION
Francisco H. C. de Alencar
Departamento de EstatísticaUniversidade Estadual de CampinasCampinas, Brazil [email protected]
Christian E. Galarza
Departamento de EstadísticaEscuela Superior Politecnica del LitoralGuayaquil, Ecuador [email protected]
Larissa A. Matos
Departamento de EstatísticaUniversidade Estadual de CampinasCampinas, Brazil [email protected]
Victor H. Lachos
Department of StatisticsUniversity of ConnecticutStorrs CT 06269, U.S.A. [email protected]
September 24, 2020 A BSTRACT
Finite mixture models have been widely used to model and analyze data from a heterogeneous pop-ulations. Moreover, data of this kind can be missing or subject to some upper and/or lower detectionlimits because of the restriction of experimental apparatuses. Another complication arises whenmeasures of each population depart significantly from normality, for instance, asymmetric behavior.For such data structures, we propose a robust model for censored and/or missing data based on finitemixtures of multivariate skew-normal distributions. This approach allows us to model data with greatflexibility, accommodating multimodality and skewness, simultaneously, depending on the structureof the mixture components. We develop an analytically simple, yet efficient, EM-type algorithm forconducting maximum likelihood estimation of the parameters. The algorithm has closed-form ex-pressions at the E-step that rely on formulas for the mean and variance of the truncated multivariateskew-normal distributions. Furthermore, a general information-based method for approximating theasymptotic covariance matrix of the estimators is also presented. Results obtained from the analysisof both simulated and real datasets are reported to demonstrate the effectiveness of the proposedmethod. The proposed algorithm and method are implemented in the new R package CensMFM . K eywords Censored data · Detection limit · EM-type algorithms · Finite mixture models · Multivariate skew-normaldistribution · Truncated distributions.
Modeling based on finite mixture distributions is a rapidly developing area with a wide range of applications. Finitemixture models are now applied in such diverse areas as biology, biometrics, genetics, medicine and marketing, amongothers. There are various features of finite mixture distributions that make them useful in statistical modeling. Forinstance, statistical models which are based on finite mixture distributions capture many specific properties of real datasuch as multimodality, skewness, kurtosis, and unobserved heterogeneity. The importance of mixture distributions canbe noted from the large number of books on mixtures, including [31], [14], [29], [20] and [9].In many research areas, such as environmental pollution and infectious diseases measurements often exhibit complexfeatures such as censored responses and missing values [24, 23]. Moreover, the proportion of censoring in these stud-ies may be substantial, so the use of crude/ad hoc methods, such as substituting a threshold value or some arbitrary
PREPRINT - S
EPTEMBER
24, 2020point like a midpoint between zero and cutoff for detection, might lead to biased estimates of the model parameters.Furthermore, multivariate data are commonly seen with simultaneous occurrence of multimodality and skewness andinferential procedures become complicated when the data exhibit these features. The mixture distribution can be usedquite effectively to analyze this kind of data. [21] proposed a flexible mixture modeling framework using the multivari-ate skew-normal distribution, where a feasible EM algorithm is developed for finding the maximum likelihood (ML)estimates. In the context of finite mixtures for correlated censored data, [16] proposed a Gaussian mixture model toflexibly approximate the underlying distribution of the observed data, where an EM algorithm in a multivariate settingwas developed to cope with the censored data. More recently, [19] proposed a robust model for censored data based onfinite mixtures of multivariate Student-t distributions (FM-MtC model), including the implementation of an exact EMalgorithm for ML estimation. This approach allows modeling data with great flexibility, accommodating multimodal-ity, and kurtosis depending on the structure of the mixture components. These methods are undoubtedly very flexible,but the problems related to the simultaneous occurrence of skewness, anomaly observations and multimodality remain.Even when modeling using Student-t mixtures, overestimation of the number of components necessary to capture theasymmetric nature of each subpopulation can occur [11]. So far, to the best of our knowledge there are no studiessimultaneously accounting for multivariate censored responses, missing values, heterogeneity and skewness.In this article, we propose a robust mixture model for censored data based on the multivariate skew-normal distributionso that the FM-MSNC model is defined and a fully likelihood-based approach is carried out, including the implemen-tation of an exact EM-type algorithm for the ML estimation. The interval censoring mechanism of the proposed modelallows us to handle missing and censored values simultaneously. We show that the E-step reduces to computing thefirst two moments of a truncated multivariate skew-normal distribution. The general formulas for these moments werederived efficiently by [15], for which we use the
MomTrunc package in R . The likelihood function is easily computedas a byproduct of the E-step and is used for monitoring convergence and for model selection. Furthermore, we considera general information-based method for obtaining the asymptotic covariance matrix of the ML estimate. The methodproposed in this paper is implemented in the R package CensMFM , which is available for download from the CRANrepository.The remainder of the paper is organized as follows. In Section 2, we briefly discuss some preliminary results relatedto the multivariate extended skew-normal (ESN) and related truncated extended skew-normal (TESN) distributions, inaddition, to some of their key properties are presented. In section 3, we present the multivariate skew-normal censored(MSNC) model and the related ML estimation. In Section 4, we introduce the robust FM-MSNC model, includingthe EM algorithm for ML estimation, and derive the empirical information matrix analytically to obtain the standarderrors. In Sections 5 and 6, numerical examples using both simulated and real data, respectively, are given to illustratethe performance of the proposed method. Finally, some concluding remarks are presented in Section 7.
In this subsection we present the skew-normal distribution and some of its properties. We say that a p × randomvector Y follows a multivariate SN distribution with p × location vector µ , p × p positive definite dispersion matrix Σ and p × skewness parameter vector λ ∈ R p , and we write Y ∼ SN p ( µ , Σ , λ ) , if its pdf is given by SN p ( y ; µ , Σ , λ ) = 2 φ p ( y ; µ , Σ )Φ ( λ ⊤ Σ − / ( y − µ )) , (1) where Φ ( · ) represents the cumulative distribution function (cdf) of the standard univariate normal distribution. If λ = then (1) reduces to the symmetric N p ( µ , Σ ) pdf which is denoted by φ p ( y ; µ , Σ ) . Except by a straightforwarddifference in the parameterization considered in (1), this model corresponds to that introduced by [5], whose propertieswere extensively studied in[4] [see also, 2]. Proposition 1 If Y ∼ SN p ( µ , Σ , λ ) , then for any y ∈ R p F Y ( y ) = P ( Y ≤ y ) = 2Φ p +1 (cid:0) ( z ⊤ , ⊤ ; , Ω (cid:1) , (2) where z = y − µ and Ω = (cid:18) Σ − ∆ − ∆ ⊤ (cid:19) , with ∆ = Σ / λ / (1 + λ ⊤ λ ) / . It is worth mentioning that the multivariate skew-normal distribution is not closed to marginalization and conditioning.Next we present its extended version which has these properties, called the multivariate ESN distribution.
We say that a p × random vector Y follows an ESN distribution with p × location vector µ , p × p positivedefinite dispersion matrix Σ , a p × skewness parameter vector λ ∈ R p , and shift parameter τ ∈ R , denoted by2 PREPRINT - S
EPTEMBER
24, 2020 Y ∼ ESN p ( µ , Σ , λ , τ ) , if its pdf is given by ESN p ( y ; µ , Σ , λ , τ ) = ξ − φ p ( y ; µ , Σ )Φ ( τ + λ ⊤ Σ − / ( y − µ )) , (3) with ξ = Φ ( τ / (1 + λ ⊤ λ ) / ) . Note that when τ = 0 , we retrieve the skew-normal distribution defined in (1), that is, ESN p ( y ; µ , Σ , λ , ≡ SN p ( y ; µ , Σ , λ ) . It is also interesting to note that ESN p ( y ; µ , Σ , λ , τ ) −→ φ p ( y ; µ , Σ ) , a s τ → + ∞ . The following propositions are crucial to develop our methods. The proofs are given in [3].
Proposition 2
Let Y ∼ ESN p ( µ , Σ , λ , τ ) and Y is partitioned as Y = ( Y ⊤ , Y ⊤ ) ⊤ of dimensions p and p ( p + p = p ) , respectively. Let Σ = (cid:18) Σ Σ Σ Σ (cid:19) , µ = ( µ ⊤ , µ ⊤ ) ⊤ , λ = ( λ ⊤ , λ ⊤ ) ⊤ and ϕ = ( ϕ ⊤ , ϕ ⊤ ) ⊤ be the corresponding partitions of Σ , µ , λ and ϕ = Σ − / λ . Then, Y ∼ ESN p ( µ , Σ , c Σ / ˜ ϕ , c τ ) , Y | Y = y ∼ ESN p ( µ . , Σ . , Σ / . ϕ , τ . ) where c = (1 + ϕ ⊤ Σ . ϕ ) − / , ˜ ϕ = ϕ + Σ − Σ ϕ , Σ . = Σ − Σ Σ − Σ , µ . = µ + Σ Σ − ( y − µ ) and τ . = τ + ˜ ϕ ⊤ ( y − µ ) . Proposition 3 If Y ∼ ESN p ( µ , Σ , λ , τ ) , then for any y ∈ R p F Y ( y ) = P ( Y ≤ y ) = Φ p +1 (cid:0) ( z ⊤ , ˜ τ ) ⊤ ; , Ω (cid:1) Φ (˜ τ ) , (4) with z and Ω as defined in Proposition 1, and ˜ τ = τ / (1 + λ ⊤ λ ) / . Hereafter, for Y ∼ ESN p ( µ , Σ , λ , τ ) , we will denote its cdf as F Y ( y ) ≡ ˜Φ p ( y ; µ , Σ , λ ,τ ) for simplicity.Let A be a Borel set in R p . We say that the random vector Y has a truncated extended skew-normal distribution on A when Y has the same distribution as Y | ( Y ∈ A ) . In this case, the pdf of Y is given by f ( y | µ , Σ , ν ; A ) = ESN p ( y ; µ , Σ , λ , τ ) P ( Y ∈ A ) A ( y ) , where A is the indicator function of A . We use the notation Y ∼ TESN p ( µ , Σ , λ , τ ; A ) . If A has the form A = { ( x , . . . , x p ) ∈ R p : a ≤ x ≤ b , . . . , a p ≤ x p ≤ b p } = { x ∈ R p : a ≤ x ≤ b } , (5) then we use the notation { Y ∈ A } = { a ≤ Y ≤ b } , where a = ( a , . . . , a p ) ⊤ and b = ( b , . . . , b p ) ⊤ . Here, we saythat the distribution of Y is doubly truncated. Analogously we define { Y ≥ a } and { Y ≤ b } . Thus, we say that thedistribution of Y is truncated from below and truncated from above, respectively. For convenience, we also use thenotation Y ∼ TESN p ( µ , Σ , λ , τ ; [ a , b ]) . In particular, we denote W to follow a truncated p -variate normal distributionon [ a , b ] as W ∼ TN p ( µ , Σ ; [ a , b ]) .For the general doubly truncated case, we define the normalizing constant L p ( a , b ; µ , Σ , λ , τ ) = P ( a ≤ Y ≤ b ) as L p ( a , b ; µ , Σ , λ , τ ) = Z ba ESN p ( y ; µ , Σ , λ , τ )d y . When all λ and τ are equal to zero, we have a normal integral L p ( a , b ; µ , Σ , ,
0) = L p ( a , b ; µ , Σ ) = R ba φ p ( y ; µ , Σ ) d y .Note that we use calligraphic style L p when we work with the skewed extended version and Roman style L p for thesymmetric case.The following properties of the truncated multivariate extended skew-normal distributions are useful for implementa-tion of the EM-algorithm. The proofs are given in [15]. Proposition 4
Let Y ∼ T ESN p ( µ , Σ , λ , τ ; [ a , b ]) . For any measurable function g ( · ) , we have that E (cid:20) g ( Y ) φ ( τ + λ ⊤ Σ − / ( Y − µ ))Φ ( τ + λ ⊤ Σ − / ( Y − µ )) (cid:21) = η L p ( a , b ; µ − µ b , Γ ) L p ( a , b ; µ , Σ , λ , τ ) E [ g ( W )] , (6) with η = φ ( τ ; 0 , λ ⊤ λ ) /ξ , µ b = ˜ τ ∆ , Γ = Σ − ∆∆ ⊤ and W ∼ T N p ( µ − µ b , Γ ; [ a , b ]) . PREPRINT - S
EPTEMBER
24, 2020
Proposition 5
Let Y ∼ ESN p ( µ , Σ , λ , τ ; [ a , b ]) , where Y is partitioned as Y = ( Y ⊤ , Y ⊤ ) ⊤ of dimensions p and p ( p + p = p ), with corresponding partitions of a , b , µ , Σ , λ and ϕ . Then, for any measurable function g ( · ) , we havethat E Y (cid:20) g ( Y ) φ ( τ + λ ⊤ Σ − / ( Y − µ ))Φ ( τ + λ ⊤ Σ − / ( Y − µ )) (cid:12)(cid:12)(cid:12)(cid:12) Y (cid:21) = η . L . L . E [ g ( W )] , (7) where L . = L p ( a , b ; µ . − µ b . , Γ . ) , L . = L p ( a , b ; µ . , Σ . , λ . , τ . ) and W ∼ T N p ( µ . − µ b . , Γ . , [ a , b ]) with λ . = Σ / . ϕ , µ . , Σ . , and τ . as in proposition 2, and η . , µ b . and Γ . canbe computed as expressions η , µ b and Γ in proposition 4 but using the new set of parameters µ . , Σ . , λ . and τ . (instead of µ , Σ , λ and τ ). Observe that Propositions 4 and 5 depend on formulas for g ( Y ) , where Y ∼ T ESN ( µ , Σ , λ , τ ; [ a , b ]) . Closed formexpressions for these expectations were obtained recently by [15], for which the meanvarTMD() function of the RMomTrunc library can be used.
Now we present the robust multivariate skew-normal model for censored data. So, we write Y , . . . , Y n ∼ SN p ( µ , Σ , λ ) , (8) where for each i ∈ { , . . . , n } , Y i = ( Y i , . . . , Y ip ) ⊤ is a p × vector of responses for sample unit i , µ = ( µ , . . . , µ p ) ⊤ isthe location vector and the dispersion matrix Σ = Σ ( α ) depends on an unknown and reduced parameter vector α andskewness parameter λ . We assume that Y , . . . , Y n are independent and identically distributed. We consider a similarapproach to that proposed by [19] to model the censored responses. Thus, the observed data for the i th subject aregiven by ( V i , C i ) , where each element of V i = ( V i , . . . , V ip ) ⊤ represents either the vector of uncensored observations ( V ik = V i ) or the interval censoring level ( V ik ∈ [ V ik , V ik ]) , and C i = ( C i , . . . , C ip ) ⊤ is the vector of censoringindicators, satisfying C ik = (cid:26) if V ik ≤ Y ik ≤ V ik ;0 if Y ik = V i . (9) for all i ∈ { , . . . , n } and k ∈ { , . . . , p } , i.e., C ik = 1 if Y ik is located within a specific interval. In this case, (8) and(9) define the multivariate skew-normal interval censored model (hereafter, the MSNC model). Missing observationscan be handled by considering V ik = −∞ and V ik = + ∞ . Let y = ( y ⊤ , . . . , y ⊤ n ) ⊤ , where y i = ( y i , . . . , y ip ) ⊤ is a realization of Y i ∼ SN p ( µ , Σ , λ ) . To obtain the likelihood function of the MSNC model, we first treat the observed and censored components of y i ,separately, i.e., y i = ( y o ⊤ i , y c ⊤ i ) ⊤ , where C ik = 0 for all elements in the p oi -dimensional vector y oi , and C ik = 1 for allelements in the p ci -dimensional vector y ci . Accordingly, we write V i = vec( V oi , V ci ) , where V ci = ( V c i , V c i ) with µ i = ( µ o ⊤ i , µ c ⊤ i ) ⊤ , Σ = Σ ( α ) = Σ ooi Σ oci Σ coi Σ cci ! , λ i = ( λ o ⊤ i , λ c ⊤ i ) ⊤ and ϕ i = ( ϕ o ⊤ i , ϕ c ⊤ i ) ⊤ . (10) Then, using Proposition 2, we have that Y oi ∼ SN p oi ( µ oi , Σ ooi , c oci Σ oo / i ˜ ϕ oi ) and Y ci | Y oi = y oi ∼ ESN p ci ( µ coi , Σ cc.oi , Σ cc.o / i ϕ ci , τ coi ) , where µ coi = µ ci + Σ coi Σ oo − i ( y oi − µ oi ) , Σ cc.oi = Σ cci − Σ coi ( Σ ooi ) − Σ oci , (11) ˜ ϕ oi = ϕ oi + Σ oo − i Σ oci ϕ ci , c oci = (1 + ϕ c ⊤ i Σ cc.oi ϕ ci ) − / and τ coi = ˜ ϕ o ⊤ i ( y oi − µ oi ) . (12) Let V = vec( V , . . . , V n ) and C = vec( C , . . . , C n ) denote the observed data. Therefore, the log-likelihood functionof θ = ( µ ⊤ , α ⊤ , λ ⊤ ) ⊤ , given the observed data ( V , C ) is ℓ ( θ | V , C ) = n X i =1 ln L i , (13) PREPRINT - S
EPTEMBER
24, 2020where L i represents the likelihood function of θ for the i th sample, given by L i ≡ L i ( θ | V i , C i ) = f ( V i | C i , θ ) = f ( V c i ≤ y ci ≤ V c i | y oi , θ ) f ( y oi | θ )= L p ci ( V c i , V c i ; µ coi , Σ cc.oi , Σ cc.o / i ϕ ci , τ coi ) × SN p oi ( y oi ; µ oi , Σ ooi , c oci Σ oo / i ˜ ϕ oi ) . We now describe how to carry out ML estimation for the MSNC model. The EM algorithm, originally proposed by[13], is a very popular iterative optimization strategy and commonly used to obtain ML estimates for incomplete-dataproblems. This algorithm has many attractive features, such as numerical stability, simplicity of implementation andquite reasonable memory requirement [27].By the essential property of a multivariate SN distribution, we can write Y i | ( T i = t i ) ∼ N p ( µ + ∆ t i , Γ ) and T i ∼ HN (0 , , (14) with HN referring to a half normal distribution and with ∆ and Γ as defined in the previous section. The complete-data log-likelihood function of an equivalent set of parameters θ = ( µ ⊤ , ∆ ⊤ , α ⊤ Γ ) ⊤ , where α Γ = vech ( Γ ) , is given by ℓ c ( θ ) = P ni =1 ℓ ic ( θ ) , where the individual complete-data log-likelihood is ℓ ic ( θ ) = − (cid:8) ln | Γ | + ( y i − µ − ∆ t i ) ⊤ Γ − ( y i − µ − ∆ t i ) (cid:9) + c, with c being a constant that does not depend on θ . Subsequently, the EM algorithm for the MSNC model can besummarized as follows: E-step:
Given the current estimate b θ ( k ) = ( b µ ( k ) , b ∆ ( k ) , b α ( k )Γ ) at the k th step of the algorithm, the E-step provides theconditional expectation of the complete data log-likelihood function Q ( θ | b θ ( k ) ) = E h ℓ c ( θ ) | V , C , b θ ( k ) i = n X i =1 Q i ( θ | b θ ( k ) ) , where Q i ( θ | b θ ( k ) ) ∝ −
12 ln | b Γ ( k ) | − tr (cid:20)(cid:26)c y i ( k ) + b µ ( k ) b µ ( k ) ⊤ + b t i ( k ) b ∆ ( k ) b ∆ ( k ) ⊤ − b µ ( k ) b y i ( k ) ⊤ − b y i ( k ) b µ ( k ) ⊤ − b t y ( k ) i b ∆ ( k ) ⊤ − b ∆ ( k ) b t y ( k ) ⊤ i + b t i ( k ) b ∆ ( k ) b µ ( k ) ⊤ + b t i ( k ) b µ ( k ) b ∆ ( k ) ⊤ o b Γ − k ) i , with c y ri ( k ) = E T i Y i [ Y ri | V i , C i , b θ ( k ) ] , b t ri ( k ) = E T i Y i [ T ri | V i , C i , b θ ( k ) ] (for r = { , , } , with Y i = 1 , Y i = Y i and Y i = Y i Y ⊤ i ) and b t y ( k ) i = E T i Y i [ T i Y i | V i , C i , b θ ( k ) ] . Then, we can use Propositions 4 and 5 to obtain closed formexpressions for these conditional expectations as follows:1. If the i th subject has only non-censored components, then c y ri ( k ) = E Y i [ Y ri | V i , C i , b θ ( k ) ] = y ri , b t ri ( k ) = E T i Y i [ T ri | V i , C i , b θ ( k ) ] = E T i [ T ri | Y i , b θ ( k ) ] , c t y i ( k ) = E T i Y i [ T i Y i | V i , C i , b θ ( k ) ] = y i E T i [ T i | Y i , b θ ( k ) ] , with y i = 1 , y i = y i and y i = y i y ⊤ i and E T i [ T ri | Y i , b θ ( k ) ] = E T i [ T ri | Y i ] | θ = b θ ( k ) for r = { , } . These lastconditional expectations can be obtained directly from the results given in [11].2. If the i th subject has only censored components, from Proposition 4 we have c y ri ( k ) = E Y i [ Y ri | V i , C i , b θ ( k ) ] = c w ri ( k ) , b t i ( k ) = M ( b θ ( k ) ) b ∆ ( k ) ⊤ b Γ − k ) ( c w i ( k ) − b µ ( k ) ) + b γ ( k ) i M ( b θ ( k ) ) , b t i ( k ) = M ( b θ ( k ) ) b ∆ ( k ) ⊤ b Γ − k ) ( c w i ( k ) − c w i ( k ) b µ ( k ) ⊤ + b µ ( k ) b µ ( k ) ⊤ ) b Γ − k ) b ∆ ( k ) + M ( b θ ( k ) ) + b γ ( k ) i M ( b θ ( k ) ) b ∆ ( k ) ⊤ b Γ − k ) ( c w k ) i − b µ ( k ) ) , c t y i ( k ) = M ( b θ ( k ) )( c w i k ) − c w i ( k ) b µ ( k ) ⊤ ) b Γ − k ) b ∆ ( k ) + b γ ( k ) i M ( b θ ( k ) ) c w k ) i , PREPRINT - S
EPTEMBER
24, 2020where M ( θ ) = (1 + ∆ ⊤ Γ − ∆ ) − , b w ( k ) i = E [ W i | b θ ( k ) ] , b w k ) i = E [ W i W ⊤ i | b θ ( k ) ] and c w k ) i = E [ W i | b θ ( k ) ] , with W i ∼ TSN p ( b µ ( k ) , b Σ ( k ) , b λ ( k ) , [ v i , v i ]) , W i ∼ TN p ( b µ ( k ) , b Γ ( k ) , [ v i , v i ]) and b γ ( k ) i = 1 q π (cid:0) b λ ( k ) ⊤ b λ ( k ) (cid:1) L p ( v i , v i , b µ ( k ) , b Γ ( k ) ) L p ( v i , v i , b µ ( k ) , b Σ ( k ) , b λ ( k ) , .
3. If the i th subject has both censored and uncensored components and given that ( Y i | V i , C i ) , ( Y i | V i , C i , Y oi ) ,and ( Y ci | V i , C i , Y oi ) are equivalent processes, we have from Proposition 5 that b y ( k ) i = E( Y i | Y oi , V i , C i , b θ ( k ) ) = vec( y oi , b w c ( k ) i ) , c y i ( k ) = E( Y i Y ⊤ i | Y oi , V i , C i , b θ ( k ) ) = y oi y o ⊤ i y oi b w c ( k ) ⊤ i b w c ( k ) i y o ⊤ i b w c ( k ) i ! , b y ( k )0 i = vec( y oi , b w c ( k )0 i ) , b t i ( k ) = M ( b θ ( k ) ) b ∆ ( k ) ⊤ b Γ − k ) ( b y i ( k ) − b µ ( k ) ) + b γ ( k ) i M ( b θ ( k ) ) , b t i ( k ) = M ( b θ ( k ) ) b ∆ ( k ) ⊤ b Γ − k ) ( c y i ( k ) − b y i ( k ) b µ ( k ) ⊤ + b µ ( k ) b µ ( k ) ⊤ ) b Γ − k ) b ∆ ( k ) + M ( b θ ( k ) ) + b γ ( k ) i M ( b θ ( k ) ) b ∆ ( k ) ⊤ b Γ − k ) ( c y k ) i − b µ ( k ) ) , c t y i ( k ) = M ( b θ ( k ) )( b y i k ) − b y i ( k ) b µ ( k ) ⊤ ) b Γ − k ) b ∆ ( k ) + b γ ( k ) i M ( b θ ( k ) ) c y k ) i , where b w c ( k ) i = E [ W ci | b θ ( k ) ] , b w c ( k ) i = E [ W ci W c ⊤ i | b θ ( k ) ] and c w c ( k ) i = E [ W c i | b θ ( k ) ] , with W ci ∼ TESN p ci (cid:0)b µ co ( k ) i , b Σ cc.o ( k ) i , b λ co ( k ) i , b τ co ( k ) i , [ v c i , v c i ] (cid:1) , W c i ∼ TN p ( b m co ( k ) i , b Γ cc.o ( k ) i , [ v c i , v c i ]) and b γ ( k ) i = η coi L p ( v c i , v c i ; b m co ( k ) i , b Γ cc.o ( k ) i ) L p ( v c i , v c i ; b µ co ( k ) i , b Σ cc.o ( k ) i , b λ co ( k ) i , b τ co ( k ) i ) , where λ coi = Σ cc.o / i ϕ ci , m coi = µ coi − µ cobi , and η coi , µ cobi and Γ cc.oi can be computed as expressions η , µ b and Γ in Proposition 4 but using the new set of parameters µ coi , Σ cc.oi , λ coi and τ coi (instead of µ , Σ , λ and τ ).To compute E [ W i ] , E [ W i ] and E [ W i W ⊤ i ] in items 2 and 3, we use the R library MomTrunc . M-step:
Conditionally maximizing Q ( θ | b θ ( k ) ) = P ni =1 Q i ( θ | b θ ( k ) ) with respect to each entry of θ , we update theestimate b θ ( k ) = (ˆ µ ( k ) , b ∆ ( k ) , b α ( k )Γ ) by b µ ( k +1) = 1 n n X i =1 n b y i ( k ) − b t i ( k ) b ∆ ( k ) o , (15) b ∆ ( k +1) = ( n X i =1 b t i ( k ) ) − n X i =1 n c t y i ( k ) − b t ( k ) i b µ ( k +1) o , (16) b Γ ( k +1) = 1 n n X i =1 (cid:26)c y i ( k ) + b µ ( k ) b µ ( k ) ⊤ + b t i ( k ) b ∆ ( k ) b ∆ ( k ) ⊤ − b µ ( k ) b y i ( k ) ⊤ − b y i ( k ) b µ ( k ) ⊤ − b t y ( k ) i b ∆ ( k ) ⊤ − b ∆ ( k ) b t y ( k ) ⊤ i + b t i ( k ) b ∆ ( k ) b µ ( k ) ⊤ + b t i ( k ) b µ ( k ) b ∆ ( k ) ⊤ o . (17) The algorithm is iterated until a suitable convergence rule is satisfied. In the later analysis, the algorithm is terminatedwhen the relative distance between two successive evaluations of the log-likelihood defined in (13) is less than atolerance, i.e., | ℓ ( b θ ( k +1) | V , C ) /ℓ ( b θ ( k ) | V , C ) − | < ǫ , for example, ǫ = 10 − . Once converged, we can recover b λ and b Σ using the expressions b Σ = b Γ + b ∆ b ∆ ⊤ and b λ = b Σ − / b ∆ (1 − b ∆ ⊤ b Σ − b ∆ ) / . PREPRINT - S
EPTEMBER
24, 2020It is important to stress that, from Eqs. (15)-(17), the E-step reduces to the computation of c y i , b y i , b t i , b t i and b t y i , forwhich we have implementable expressions. As pointed out for an anonymous referee, since missing values as treatedas interval censored data, the computation burden relies heavily on the dimension of censored vector for evaluating theexpectations of TESN and TN random vectors. In next subsection, we briefly discuss how to circumvent this problem,such that missing values do not represent neither a mathematical or computational burden. In the event that there are missing values, we can partition the censored vector as Y cens = ( Y ⊤ c , Y ⊤ m ) ⊤ , that is,as missing and (truly) censored, in order to avoid unnecessary calculation of integrals for obtaining its expectation.Considering the partition above such that dim ( Y c ) = p cc , dim ( Y m ) = p cm , where p cc + p cm = p c , it follows that E [ Y cens | Y obs ] = E (cid:20) E [ Y m | Y c , Y obs ] Y c | Y obs (cid:21) (18) and var[ Y cens | Y obs ] is given by (cid:20) E [var[ Y m | Y c , Y obs ]] + var[ E [ Y m | Y c , Y obs ]] cov[ E [ Y m | Y c , Y obs ] , Y c | Y obs ]cov[ Y c | Y obs , E [ Y m | Y c , Y obs ]] var[ Y c | Y obs ] (cid:21) . (19) By noting that Y m = ( V = ( − ∞ , ∞ ) , C = ) , we have that Y m | Y c , Y obs is a non-truncated partition following aESN distribution which moments have closed forms. Then, the computation of the first two moments of Y cens | Y obs can be calculated using Eqs. (18) and (19), these last only depending on the computation of the truncated moments of Y c | Y obs , these are E [ Y c | Y obs ] and var[ Y c | Y obs ] . As can be seen, we can use the latter equations to treat missing dataas censored in a neat manner, where the truncated moments are computed only over the p cc -variate partition, avoidingsome unnecessary integrals and saving a significant computational effort. Remark 1
In general, TESN distributions are not closed under marginalization but conditioning. For instance, Y m | Y obs does not follow a TESN distribution but its conditional distribution Y m | Y c , Y obs does. Furthermore, since V = ( − ∞ , ∞ ) for missing observations, we have that Y m | Y c , Y obs is a (conditionally) non-truncated partition, fol-lowing a ESN distribution. For this particular case, Y c | Y obs follow a TESN distribution due to the aforementionedcondition. Ignoring censoring for the moment, we consider a more general and robust framework for the multivariate responsevariable Y i of the model defined in (8), which is assumed to follow a mixture of multivariate skew normal distributions: Y i ∼ G X j =1 π j SN p ( µ j , Σ j , λ j ) , (20) where π j are weights adding to 1 and G is the number of groups, also called components in mixture models. Themixture model considered in (20) can also be by letting Z ij be a latent class variable, such that Z ij = ( if the i th observation is from the j th component, otherwise . Thus, given Z ij = 1 , the response Y i follows a multivariate skew-normal distribution Y i ∼ SN p ( µ j , Σ j , λ j ) , i ∈ { , . . . , n } , j ∈ { , . . . , G } . (21) Now, suppose
Pr( Z ij = 1) = π j . Then the density of Y i , without observing Z ij , is given by f ( y i | θ ) = G X j =1 π j SN p ( y i ; µ j , Σ j , λ j ) , (22) where θ = ( θ ⊤ , . . . , θ ⊤ G ) ⊤ , with θ j = ( π j , µ ⊤ j , Σ j , λ j ) ⊤ .We treat the observed and censored components of Y i , separately, i.e. y i = ( y o ⊤ i , y c ⊤ i ) ⊤ , with respective partitionedparameters as in (10). Following [19], we define the mixture model for censored data as a mixture of the MSNC7 PREPRINT - S
EPTEMBER
24, 2020models given in (13), viz. f ( V i | C i , θ ) = G X j =1 π j f ij ( V i | C i , θ ) , (23) with f ij ( V i | C i , θ ) = L p ci ( V c i , V c i ; µ coi , Σ cc.oi , Σ cc.o / i ϕ ci , τ coi ) × SN p oi ( y oi ; µ oi , Σ ooi , c oci Σ oo / i ˜ ϕ oi ) , where, for each component j , the arguments are defined as (11) and (12), respectively. The model defined in (23) willbe called the FM-MSNC model. Thus, the log-likelihood function given the observed data ( V , C ) is given by ℓ ( θ | V , C ) = n X i =1 ln { f ( V i | C i , θ ) } . In this section, we present an EM algorithm for the ML estimation of the FM-MSNC model. To do so, we presentthe FM-MSNC model in an incomplete-data framework, using the results presented in Section 3. We recall that thelikelihood associated with finite mixtures of skew-normal distributions may be unbounded, as shown by [11]. Using astraightforward extension of their argument, it can be shown that the likelihood may be unbounded in the FM-MSNCcase as well. Despite this, following [32] (p. 41), we shall henceforth refer to the solution provided by the EMalgorithm as the ML estimate even in situations where it may not globally maximize the likelihood.Using the stochastic representation of the skew-normal distribution given in (14), it follows that the complete datalog-likelihood function is ℓ c ( θ ) = P ni =1 ℓ ic ( θ ) , where, for each i ∈ { , . . . , n } , ℓ ic ( θ ) = c + G X j =1 z ij ln π j − G X j =1 z ij ln ( | Γ j | ) − G X j =1 z ij ( y i − µ j − ∆ j t i ) ⊤ Γ − j ( y i − µ j − ∆ j t i ) , (24) with c being a constant which is independent of the parameter vector θ .For each j ∈ { , . . . , G } , let b θ ( k ) j = ( b π ( k ) j , b µ ( k ) j , b Σ ( k ) j , b λ ( k ) j ) ⊤ , and let b θ ( k ) = ( b θ ( k ) ⊤ , . . . , b θ ( k ) ⊤ G ) ⊤ be the estimate of θ at the k th iteration. It follows, after some simple algebra, that the conditional expectationof the complete log-likelihood function has the form Q ( θ | b θ ( k ) ) ∝ n X i =1 G X j =1 Z ij ( b θ ( k ) ) ln π j − n X i =1 G X j =1 Z ij ( θ ( k ) ) ln ( | c Γ j ( k ) | ) − n X i =1 G X j =1 tr hb Γ − k ) j n E ij ( b θ ( k ) ) − b µ ( k ) j E ⊤ ij ( b θ ( k ) ) − E ij ( b θ ( k ) ) b µ ( k ) ⊤ j − E ij ( b θ ( k ) ) b ∆ ( k ) ⊤ j − b ∆ ( k ) j E ij ( b θ ( k ) ⊤ ) + Z ij ( b θ ( k ) ) b µ ( k ) j b µ ( k ) ⊤ j + E ij ( b θ ( k ) ) b ∆ ( k ) j b ∆ ( k ) ⊤ j + E ij ( b θ ( k ) ) b ∆ ( k ) j b µ ( k ) ⊤ j + E ij ( b θ ( k ) ) b µ ( k ) j b ∆ ( k ) ⊤ j oi , where E ij ( b θ ( k ) ) = E( Z ij Y i | V i , C i , b θ ( k ) ) , E ij ( b θ ( k ) ) = E( Z ij Y i Y ⊤ i | V i , C i , b θ ( k ) ) , E ij ( b θ ( k ) ) = E( Z ij T i Y i | V i , C i , b θ ( k ) ) , E ij ( b θ ( k ) ) = E( Z ij T i | V i , C i , b θ ( k ) ) , E ij ( b θ ( k ) ) = E( Z ij T i | V i , C i , b θ ( k ) ) and Z ij ( b θ ( k ) ) = E( Z ij | V i , C i , b θ ( k ) ) . PREPRINT - S
EPTEMBER
24, 2020By using known properties of conditional expectation, we obtain Z ij ( b θ ( k ) ) = b π ( k ) j f ij ( V i | C i , b θ ( k ) j ) G X j =1 b π ( k ) j f ij ( V i | C i , b θ ( k ) j ) , (25) E ij ( b θ ( k ) ) = Z ij ( b θ ( k ) )E( Y i | V i , C i , b θ ( k ) , Z ij = 1) E ij ( b θ ( k ) ) = Z ij ( b θ ( k ) )E( Y i Y ⊤ i | V i , C i , b θ ( k ) , Z ij = 1) , E ij ( b θ ( k ) ) = Z ij ( b θ ( k ) )E( T i Y i | V i , C i , b θ ( k ) , Z ij = 1) , (26) E ij ( b θ ( k ) ) = Z ij ( b θ ( k ) )E( T i | V i , C i , b θ ( k ) , Z ij = 1) and E ij ( b θ ( k ) ) = Z ij ( b θ ( k ) )E( T i | V i , C i , b θ ( k ) , Z ij = 1) . The conditional expectations E( Y i | V i , C i , b θ ( k ) , Z ij = 1) , E( Y i Y ⊤ i | V i , C i , b θ ( k ) ,Z ij = 1) , E( T i Y i | V i , C i , b θ ( k ) , Z ij = 1) , E( T i | V i , C i , b θ ( k ) , Z ij = 1) and E( T i | V i , C i , b θ ( k ) , Z ij = 1) can bedirectly obtained from expressions b y i , c y i , b t y i , b t i and b t i , respectively, given in Subsection 3.2. Thus, we have closedform expressions for all the quantities involved in the E-step of the algorithm. Next, we describe the EM algorithmfor maximum likelihood estimation of the parameters in the FM-MSNC model. E-step : Given θ = b θ ( k ) , compute E sij ( b θ ( k ) ) for all s ∈ { , , , , } and Z ij ( b θ ( k ) ) for all i ∈ { , . . . , n } , j ∈ { , . . . , G } . M-step : Update b θ ( k +1) by maximizing Q ( θ | b θ ( k ) ) over θ , which leads to the following closed form expressions: b π ( k +1) j = 1 n n X i =1 Z ij ( b θ ( k ) ) , b µ ( k +1) j = ( n X i =1 Z ij ( b θ ( k ) ) ) − n X i =1 {E ij ( b θ ( k ) ) − E ij ( b θ ( k ) ) b ∆ ( k ) j } b ∆ ( k +1) j = ( n X i =1 E ij ( b θ ( k ) ) ) − n X i =1 {E ij ( b θ ( k ) ) − E ij ( b θ ( k ) ) b µ ( k +1) j } b Γ ( k +1) j = ( n X i =1 Z ij ( b θ ( k ) ) ) − n X i =1 n E ij ( b θ ( k ) ) − b µ ( k ) j E ⊤ ij ( b θ ( k ) ) − E ij ( b θ ( k ) ) b µ ( k ) ⊤ j − E ij ( b θ ( k ) ) b ∆ ( k ) ⊤ j − b ∆ ( k ) j E ij ( b θ ( k ) ⊤ ) + Z ij ( b θ ( k ) ) b µ ( k ) j b µ ( k ) ⊤ j + E ij ( b θ ( k ) ) b ∆ ( k ) j b ∆ ( k ) ⊤ j + E ij ( b θ ( k ) ) b ∆ ( k ) j b µ ( k ) ⊤ j + E ij ( b θ ( k ) ) b µ ( k ) j b ∆ ( k ) ⊤ j o , for all j ∈ { , . . . , G } .It is well known that mixture models can provide a multimodal log-likelihood function. In this sense, the method ofmaximum likelihood estimation through the EM algorithm may not give global solutions if the starting values are farfrom the real parameter values. Thus, the choice of starting values for the EM algorithm in the mixture context playsa big role in parameter estimation. In our examples and simulation studies, we consider the following procedure forthe FM-MSNC model:(i) Partition the data (censoring levels replacing the censored observations) into G groups using the K-meansclustering algorithm [11].(ii) Compute the proportion of data points belonging to the same cluster j , say π (0) j , j ∈ { , . . . , G } . This givesthe initial value for π j .(iii) For each group j , compute the initial values µ (0) j , Σ (0) j , λ (0) j using the R package mixsmsn [33].9 PREPRINT - S
EPTEMBER
24, 2020
Because there is no universal criterion for mixture model selection, we chose three criteria to compare the modelsconsidered in this work, namely, the Akaike information criterion (AIC) [1], Bayesian information criterion (BIC)[35] and efficient determination criterion (EDC) [6]. Like the AIC and BIC, EDC has the form − ℓ ( b θ ) + ρc n , where ℓ ( θ ) is the actual log-likelihood, ρ is the number of free parameters that has to be estimated in the model and the penaltyterm c n is a convenient sequence of positive numbers. Here, we use c n = 0 . √ n , a proposal that was considered in [8]and [11]. Note that c n constant is given by c n = 2 for AIC and c n = log n for BIC, with n being the sample size. In this section, we describe how to obtain the standard errors of the ML estimates for the FM-MSNC model. Wefollow the information-based method exploited by [7] to compute the asymptotic covariance of the ML estimates. Theempirical information matrix, according to [30]’s formula, is defined as I e ( θ | y ) = n X i =1 s ( y i | θ ) s ⊤ ( y i | θ ) − n S ( y i | θ ) S ⊤ ( y i | θ ) , (27) where S ( y i | θ ) = P Ni =1 s ( y i | θ ) and s ( y i | θ ) is the empirical score function for the i th subject. It is noted from the resultof [26] that the individual score can be determined as s ( y i | θ ) = E (cid:18) ∂ℓ i ( θ | y c ) ∂ θ (cid:12)(cid:12)(cid:12)(cid:12) V i , C i , θ (cid:19) . (28) Using the ML estimates b θ in s ( y i | θ ) , leads to S ( y i | b θ ) = 0 , so from (27) we have that I e ( b θ | y ) = n X i =1 b s i b s ⊤ i , (29) where b s i is an individual score vector given by b s i = ( b s i, µ , . . . , b s i, µ G , b s i, σ , . . . , b s i, σ G , b s i, λ , . . . , b s i, λ G , b s i,π , . . . , b s i,π G − ) ⊤ , where σ j is a vector with p ( p + 1) / distinct elements of Σ j .First we reparameterize Σ j = F j for ease of computation and theoretical derivation, where F j is the square root of Σ j containing p ( p + 1) / distinct elements.Now we have that b s i = ( b s i, µ , . . . , b s i, µ G , b s i, α , . . . , b s i, α G , b s i, λ , . . . , b s i, λ G , b s i,π , . . . , b s i,π G − ) ⊤ . So, the expressions for the elements of b s i are given by: b s i,π j = Z ij ( b θ ) b π j − Z ij ( b θ ) b π G , b s i, µ j = ( b s i,µ j , . . . , b s i,µ jp ) = b Γ − j (cid:16) E ij ( b θ ) − Z ij ( b θ ) b µ j − E ij ( b θ ) b ∆ j (cid:17) , b s i, α j = ( b s i,α j, , . . . , b s i,α j,pp ) = − Z ij ( b θ )2 tr (cid:16)b Γ − j A j ( b θ ) (cid:17) − n E ij ( b θ ) ⊤ b Γ − j A j ( b θ ) b Γ − j b µ j + b µ ⊤ j b Γ − j A j ( b θ ) b Γ − j E ij ( b θ ) − E ij ( b θ ) ⊤ (cid:16)b Γ − j ˙ F j ( r ) b δ j − b Γ − j A j ( b θ ) b Γ − j b ∆ j (cid:17) − tr (cid:16) E ij ( b θ ) b Γ − j A j ( b θ ) b Γ − j (cid:17) − (cid:16)b δ ⊤ j ˙ F j ( r ) b Γ − j − b ∆ ⊤ j b Γ − j A j ( b θ ) b Γ − j b ∆ j (cid:17) E ij ( b θ ) − b µ ⊤ j b Γ − j A j ( b θ ) b Γ − j b µ j + E ij ( b θ ) b µ ⊤ j (cid:16)b Γ − j ˙ F j ( r ) b δ j − b Γ − j A j ( b θ ) b Γ − j b ∆ j (cid:17) + E ij ( b θ ) (cid:16)b δ ⊤ j ˙ F j ( r ) b Γ − j b ∆ j − b ∆ ⊤ j b Γ − j A j ( b θ ) b Γ − j b ∆ j + b ∆ ⊤ j b Γ − j ˙ F j ( r ) b δ j (cid:17) + (cid:16)b δ ⊤ j ˙ F j ( r ) b Γ − j − b ∆ ⊤ j b Γ − j A j ( b θ ) b Γ − j (cid:17) b µ j E ij ( b θ ) o , PREPRINT - S
EPTEMBER
24, 2020 b s i, λ j = ( b s i,λ j , . . . , b s i,λ jp ) = Z ij ( b θ )2 tr (cid:16)b Γ − j B j ( b θ ) (cid:17) − n tr (cid:16) E ij ( b θ ) b Γ − j B j ( b θ ) b Γ − j (cid:17) − E ij ( b θ ) ⊤ b Γ − j B j ( b θ ) b Γ − j b µ j + b µ ⊤ j b Γ − j B j ( b θ ) b Γ − j b µ j − b µ ⊤ j b Γ − j B j ( b θ ) b Γ − j E ij ( b θ ) − E ij ( b θ ) ⊤ (cid:16)b Γ − j B j ( b θ ) b Γ − j b ∆ j + b Γ − j b j ( b θ ) (cid:17) − (cid:16) b ∆ ⊤ j b Γ − j B j ( b θ ) b Γ − j + b j ( b θ ) ⊤ b Γ − j (cid:17) E ij ( b θ )+ E ij ( b θ ) b µ ⊤ j (cid:16)b Γ − j B j ( b θ ) b Γ − j b ∆ j + b Γ − j b j ( b θ ) (cid:17) + (cid:16) b ∆ ⊤ j b Γ − j B j ( b θ ) b Γ − j + b j ( b θ ) ⊤ b Γ − j (cid:17) b µ j E ij ( b θ )+ E ij ( b θ ) (cid:16) b j ( b θ ) ⊤ b Γ − j b ∆ j + b ∆ ⊤ j b Γ − j B j ( b θ ) b Γ − j b ∆ j + b ∆ ⊤ j b Γ − j b j ( b θ ) (cid:17)o , where A j ( b θ ) = (cid:16) ˙ F j ( r )( I − b δ j b δ ⊤ j ) b F j + b F j ( I − b δ j b δ ⊤ j ) ˙ F j ( r ) (cid:17) ,B j ( b θ ) = b F j ˙ R j ( r )(1 + b λ ⊤ j b λ j ) − λ jr b λ j b λ ⊤ j (1 + b λ ⊤ j b λ j ) ! b F j ,b j ( b θ ) = b F j ˙ λ j ( r )(1 + b λ ⊤ j b λ j ) − λ jr b λ j (1 + b λ ⊤ j b λ j ) / ! , ˙ F j ( r ) = ∂ F j ∂σ jr (cid:12)(cid:12)(cid:12)(cid:12) σ = c σ , ˙ R j ( r ) = ∂ λ j λ ⊤ j ∂λ jr (cid:12)(cid:12)(cid:12)(cid:12) λ = b λ , and ˙ λ j ( r ) = ∂ λ j ∂λ jr (cid:12)(cid:12)(cid:12)(cid:12) λ = b λ , with r = 1 , , . . . , p . In order to study the performance of our proposed method, we present five simulation studies. The first and secondstudy investigates whether we can estimate the true parameter values and their respective standard errors accurately byusing the proposed EM algorithm and approximated empirical information matrix, respectively involving censoringand missing data. The third one investigates the number of mixture components by comparing the
FM-MSNC with twogroups and
FM-MNC with various groups. The fourth study investigates the ability of the
FM-MSNC model to clusterobservations. Finally, the last one shows the asymptotic behavior of the EM estimates for the proposed model. Thecomputations were done using the R package CensMFM . This simulation study is designed to verify if we can estimate the true parameter values of the
FM-MSNC model accu-rately when we have censoring data by using the proposed EM algorithm. We simulated several datasets consideringmixtures with two components from model (23) with two left-censoring proportion settings ( and ), taken ineach mixture component, and different samples sizes n ∈ (500 , , . For each combination, we generated Monte Carlo (MC) samples. Summary statistics of the estimates across the
MC samples were computed, such asthe mean estimate (MC mean), the empirical standard error (MC Sd), and the mean of the approximate standard errorsof the estimates, obtained through the method described in Section 4.3 (IM SE).We consider small and different variances with the following parameter setup: . SN (cid:18)(cid:20) − − (cid:21) , (cid:20) . (cid:21) , (cid:20) − (cid:21)(cid:19) + 0 . SN (cid:18)(cid:20) (cid:21) , (cid:20) . (cid:21) , (cid:20) − (cid:21)(cid:19) . Figure 1 shows the simulated data from the
FM-MSNC model with their respective density contours for the skew-normal (top panel) and normal (bottom panel) distributions and the allocations in each group for samples sizes , and with left-censoring proportion of . The black points represent the first component and the red trianglesrepresent the second component of the mixture. One can note that the contour lines of the skew-normal distributionare more appropriate to represent the shape assumed by the generated data.The results are presented in Table 1. This table shows that, regardless the sample size, the Monte Carlo mean of theparameter estimates deviates further the true values as the censoring level increases, i.e., the parameter estimates are11 PREPRINT - S
EPTEMBER
24, 2020 −10−50510 −8 −4 0 4 (a) n = 500 −10−505 −5 0 5 (b) n = 1000 −10−505 −5 0 5 (c) n = 2000 −10−50510 −8 −4 0 4 (d) n = 500 −10−505 −5 0 5 (e) n = 1000 −10−505 −5 0 5 (f) n = 2000
Figure 1:
Simulated data: Performance of the ML Estimates over censoring data. Scatter plot for some simulated data fromFM-MSNC model with the respective density contours, skew-normal (top panel) and normal (bottom panel) with censoringlevel. affected by the censoring level. In particular, the estimates of µ and µ appear to be less affected by increasing thecensoring level than the other parameters. Furthermore, the estimates of the standard errors, i.e., MC Sd and IM SE,provide relatively close results, which may indicate that the asymptotic approach proposed for the standard errors ofthe ML estimates is reliable. Table 1: Simulated data: Performance of the ML Estimates over censoring data. Parameter estimates based on 500simulated samples. Monte Carlo (MC) mean, MC Sd are the respective mean estimates and standard deviations. IM SEis the average value of the approximate standard error obtained through the information-based method.Censoring j Measure Parameter µ j µ j α j, α j, α j, λ j λ j πn = 5005% − ) ( − ) ( . ) ( . ) ( . ) ( − ) ( ) ( . )MC mean -3.1797 -4.1235 1.6262 0.2923 2.2499 -1.7236 2.1533 0.6607MC Sd 0.3101 0.3705 0.1510 0.1138 0.1906 0.5262 0.5739 0.0274IM SE 0.2844 0.3406 0.1361 0.0924 0.1944 0.5397 0.6202 0.02362 True ( ) ( ) ( . ) ( . ) ( . ) ( − ) ( )MC mean 2.0196 2.0954 1.3152 0.3177 1.7883 -2.7361 3.7058MC Sd 0.2381 0.2808 0.1431 0.0928 0.1595 1.0896 1.3303IM SE 0.2677 0.2922 0.1391 0.0981 0.1816 1.2411 1.5434 − ) ( − ) ( . ) ( . ) ( . ) ( − ) ( ) ( . )MC mean -3.3139 -4.1708 1.5445 0.3483 2.2909 -1.3744 2.0529 0.7030MC Sd 0.3360 0.4545 0.1580 0.1413 0.2700 0.5082 0.7388 0.0238IM SE 0.4719 0.5176 0.2080 0.1396 0.2762 0.7304 0.8737 0.02382 True ( ) ( ) ( . ) ( . ) ( . ) ( − ) ( )MC mean 2.2651 2.5165 1.3373 0.2533 1.6079 -2.3831 3.1945MC Sd 0.5131 0.5107 0.2661 0.1682 0.2207 1.5120 1.4642IM SE 0.3213 0.3134 0.1929 0.1202 0.1726 1.4296 1.4575 n = 10005% − ) ( − ) ( . ) ( . ) ( . ) ( − ) ( ) ( . )MC mean -3.1812 -4.1450 1.6219 0.2804 2.2407 -1.6988 2.1169 0.6594MC Sd 0.1876 0.2248 0.0911 0.0615 0.1354 0.3611 0.4228 0.0165IM SE 0.1988 0.2351 0.0962 0.0607 0.1369 0.3660 0.4177 0.01672 True ( ) ( ) ( . ) ( . ) ( . ) ( − ) ( ) Continued on next page PREPRINT - S
EPTEMBER
24, 2020
Table 1 –
Continued from previous page
Censored j Measure Parameter µ j µ j α j, α j, α j, λ j λ j π MC mean 2.0488 2.1190 1.3265 0.3071 1.7823 -2.6601 3.5305MC Sd 0.1698 0.2069 0.0912 0.0683 0.1091 0.6830 0.8175IM SE 0.1938 0.2113 0.0994 0.0676 0.1266 0.7723 0.9274 − ) ( − ) ( . ) ( . ) ( . ) ( − ) ( ) ( . )MC mean -3.3168 -4.2133 1.5402 0.3276 2.2758 -1.3758 2.0334 0.7021MC Sd 0.2029 0.2736 0.0978 0.0776 0.1758 0.3513 0.5272 0.0162IM SE 0.3071 0.3289 0.1374 0.0878 0.1840 0.4791 0.5681 0.01602 True ( ) ( ) ( . ) ( . ) ( . ) ( − ) ( )MC mean 2.3043 2.5243 1.3347 0.2208 1.5994 -2.3618 3.0762MC Sd 0.3216 0.2956 0.1899 0.1000 0.1513 1.0722 0.8913IM SE 0.2144 0.2025 0.1295 0.0743 0.1172 0.8982 0.8670 n = 20005% − ) ( − ) ( . ) ( . ) ( . ) ( − ) ( ) ( . )MC mean -3.2080 -4.1723 1.6098 0.2779 2.2581 -1.6429 2.1188 0.6600MC Sd 0.1577 0.1728 0.0678 0.0525 0.1010 0.2864 0.3225 0.0118IM SE 0.1411 0.1681 0.0672 0.0429 0.0978 0.2515 0.2926 0.01182 True ( ) ( ) ( . ) ( . ) ( . ) ( − ) ( )MC mean 2.0480 2.1143 1.3204 0.3059 1.7857 -2.5958 3.4666MC Sd 0.1283 0.1995 0.0642 0.0553 0.0845 0.5176 0.6986IM SE 0.2882 0.3822 0.0739 0.0501 0.0888 0.6434 0.8032 − ) ( − ) ( . ) ( . ) ( . ) ( − ) ( ) ( . )MC mean -3.3559 -4.2548 1.4775 0.3046 2.1378 -1.3237 2.0660 0.7036MC Sd 0.1785 0.2527 0.1310 0.0859 0.3274 0.2934 0.4413 0.0109IM SE 0.2138 0.2299 0.0921 0.0641 0.1320 0.3289 0.4023 0.01132 True ( ) ( ) ( . ) ( . ) ( . ) ( − ) ( )MC mean 2.2991 2.5248 1.3155 0.2141 1.6032 -2.2326 2.9887MC Sd 0.2436 0.2913 0.1381 0.0824 0.1245 0.7816 0.7760IM SE 0.1601 0.1569 0.0922 0.0542 0.0847 0.5911 0.5862 In order to evaluate the performance of
FM-MSNC model for dealing with partially incomplete data, a simulationstudy was conducted. Various ways of using models for imputation are described in [25], among them, one of themost relevant is the missing completely at random (MCAR). We simulated several datasets considering mixtures withtwo components from model (23) with two missing data proportion settings ( and ), taken in each mixturecomponent, and different samples sizes n ∈ (500 , , . For each combination, we generated Monte Carlo(MC) samples. Summary statistics of the estimates across the
MC samples were computed, such as the meanestimate (MC mean), the empirical standard error (MC Sd), and the mean of the approximate standard errors of theestimates, obtained through the method described in Section 4.3 (IM SE). We consider small and different varianceswith the following parameter as in the simulation about asymptotic properties in 5.5.Table 2 shows the results for this simulation. The results obtained are similar to those of simulation 5.1 and the sameconclusions can be drawn. Additionally, we note that λ estimates appear to be more strongly affected as we increasethe proportion of missing data in the sample. Table 2: Simulated data: Performance of the ML Estimates over missing data. Parameter estimates based on 500 simu-lated samples. Monte Carlo (MC) mean, MC Sd are the respective mean estimates and standard deviations. IM SE is theaverage value of the approximate standard error obtained through the information-based method.Missing j Measure Parameter µ j µ j α j, α j, α j, λ j λ j πn = 5005% − ) ( − ) ( . ) ( . ) ( . ) ( − ) ( ) ( . )MC mean -5.2150 -3.3411 1.6844 0.4594 1.9442 -1.0396 1.4896 0.6495MC Sd 0.5773 0.8843 0.1075 0.2005 0.2071 1.0020 1.5677 0.0219IM SE 0.8618 0.9443 0.2666 0.1876 0.2603 0.9352 1.0664 0.02202 True ( ) ( ) ( . ) ( . ) ( . ) ( − ) ( )MC mean 1.7699 3.4694 1.3709 0.4857 1.7308 -0.9892 1.5876MC Sd 0.5614 0.7703 0.1231 0.1904 0.1922 1.1875 1.6737IM SE 0.8408 0.9545 0.2759 0.2176 0.3247 1.2244 1.3864 − ) ( − ) ( . ) ( . ) ( . ) ( − ) ( ) ( . )MC mean -5.2797 -3.2378 1.6751 0.4903 1.9116 -0.8182 1.1958 0.6495MC Sd 0.6079 0.9007 0.1120 0.1908 0.2215 0.9494 1.5060 0.0228IM SE 1.0618 1.1557 0.3267 0.2297 0.3216 1.1195 1.2620 0.02272 True ( ) ( ) ( . ) ( . ) ( . ) ( − ) ( ) Continued on next page PREPRINT - S
EPTEMBER
24, 2020
Table 2 –
Continued from previous page
Missing j Measure Parameter µ j µ j α j, α j, α j, λ j λ j π MC mean 1.6992 3.5252 1.3680 0.5125 1.7031 -0.7323 1.3092MC Sd 0.5987 0.7648 0.1359 0.1924 0.2047 1.1693 1.5322IM SE 1.2860 1.4228 0.3530 0.2721 0.4106 1.6672 1.7740 n = 7005% − ) ( − ) ( . ) ( . ) ( . ) ( − ) ( ) ( . )MC mean -5.1275 -3.2608 1.6902 0.4589 1.9299 -1.0984 1.4147 0.6493MC Sd 0.5118 0.9054 0.0931 0.2000 0.1867 0.9515 1.5788 0.0188IM SE 0.6699 0.7212 0.2189 0.1539 0.2158 0.7571 0.8440 0.01862 True ( ) ( ) ( . ) ( . ) ( . ) ( − ) ( )MC mean 1.8256 3.4568 1.3733 0.4711 1.7356 -1.1133 1.6816MC Sd 0.4959 0.7484 0.1053 0.1725 0.1741 1.0682 1.5869IM SE 0.6973 0.8268 0.2228 0.1766 0.2621 1.0000 1.1773 − ) ( − ) ( . ) ( . ) ( . ) ( − ) ( ) ( . )MC mean -5.1948 -3.2079 1.6799 0.4817 1.8994 -0.9175 1.1938 0.6488MC Sd 0.5467 0.8905 0.0937 0.1907 0.1882 0.9160 1.4792 0.0192IM SE 0.8447 0.8977 0.2686 0.1866 0.2680 0.9194 1.0006 0.01922 True ( ) ( ) ( . ) ( . ) ( . ) ( − ) ( )MC mean 1.7787 3.5349 1.3666 0.4987 1.7052 -0.8822 1.3254MC Sd 0.5295 0.7466 0.1172 0.1741 0.1832 1.0146 1.4570IM SE 0.9450 1.0692 0.2765 0.2118 0.3275 1.2557 1.3766 n = 9005% − ) ( − ) ( . ) ( . ) ( . ) ( − ) ( ) ( . )MC mean -5.1377 -3.4011 1.6847 0.4243 1.9552 -1.1829 1.6348 0.6488MC Sd 0.4678 0.8574 0.0794 0.1869 0.1790 0.8858 1.4904 0.0169IM SE 0.4769 0.4857 0.1699 0.1183 0.1781 0.5865 0.6471 0.01632 True ( ) ( ) ( . ) ( . ) ( . ) ( − ) ( )MC mean 1.8644 3.3518 1.3744 0.4342 1.7513 -1.2648 1.9203MC Sd 0.4578 0.6741 0.0891 0.1616 0.1540 0.9946 1.4206IM SE 0.5043 0.5860 0.1656 0.1297 0.1999 0.7796 0.9222 − ) ( − ) ( . ) ( . ) ( . ) ( − ) ( ) ( . )MC mean -5.2002 -3.3434 1.6760 0.4495 1.9244 -0.9997 1.4014 0.6485MC Sd 0.5123 0.8387 0.0839 0.1816 0.1866 0.8815 1.4234 0.0175IM SE 0.6459 0.6892 0.2138 0.1502 0.2198 0.7351 0.8213 0.01682 True ( ) ( ) ( . ) ( . ) ( . ) ( − ) ( )MC mean 1.7946 3.4393 1.3699 0.4709 1.7147 -0.9902 1.5475MC Sd 0.5079 0.6797 0.0986 0.1640 0.1650 1.0069 1.3840IM SE 0.7262 0.7971 0.2178 0.1622 0.2598 1.0105 1.0992 To exemplify the predictive accuracies on the imputation of missing values, we compare the
FM-MSNC with thetraditional randomization-based mean imputation ( MI ) predictor [25], known as a common heuristic by filling in asingle value for each missing value with the observed sample mean of the associated attribute. As a measure ofprecision, we use the mean absolute error (MAE) and the mean absolute relative error (MARE). They are defined as MAE = 1 m n X i =1 g X j =1 | y ij − ˆ y ij | and MARE = 1 m n X i =1 g X j =1 (cid:12)(cid:12)(cid:12)(cid:12) y ij − ˆ y ij y ij (cid:12)(cid:12)(cid:12)(cid:12) , (30) where m is the number of missing entries, y ij is the actual value and ˆ y ij is the respective predictive value. TheMAE and MARE measures, for both FM-MSCN and MI method, are listed in Table 3. We can see that the FM-MSCN predictor exhibits considerable promising accuracy in the prediction of missing values when compared with those ofMI imputations for all cases.Table 3: Simulated data: Performance of the ML Estimates over missing data. Average prediction accuracies for theboth imputation methods FM-MSNC and mean imputation (MI) with varying sample size, n ∈ (500 , , , andproportions of missing values , and . Imputation method Missing rate( % ) MAE MARE500 700 900 500 700 900FM-MSNC PREPRINT - S
EPTEMBER
24, 2020
In this section, we compare the ability of some classic model selection criteria discussed in Subsection 4.2 to selectthe appropriate model. One may argue that an arbitrary multivariate density can always be approximated by a finitemixture of normal multivariate distributions, see [31, Chapter 1], for example. Thus, an interesting comparison canbe made if we consider a sample from a two-component
FM-MSNC (2) and use some model choice criteria to comparethis model with the
FM-MNC and several components under different censoring levels. Here we consider samplesof size from a two-component
FM-MSNC (2) model with left censoring levels at , or , and parametervalues set at . SN (cid:18)(cid:20) (cid:21) , (cid:20) . . (cid:21) , (cid:20) − (cid:21)(cid:19) + 0 . SN (cid:18)(cid:20) − − (cid:21) , (cid:20) . . (cid:21) , (cid:20) − (cid:21)(cid:19) . The results are presented in Table 4, under different censoring levels, where it can be seen that all criteria favor thetrue model, that is, the
FM-MSNC (2) model instead the
FM-MNC model with two, three and four components, asexpected. This is evidence that these measures are capable of detecting departures from normality. It is important toemphasize that the
FM-MNC models with three and four components have and parameters respectively, while the FM-MSNC (2) model has parameters.Table 4: Simulated data: Number of mixture components. Percentage when the FM-MSNC model with two compo-nents is preferred over the other adjusted FM-MNC models. Censoring
5% 10% 20%
Group 2 3 4 2 3 4 2 3 4Criteria AIC 100 100 100 95 100 100 100 100 100BIC 100 100 100 93 100 100 100 100 100EDC 100 100 100 93 100 100 100 100 100
As pointed out for an anonymus referee, we can see in Table 4 that the case with censoring and two componentsis the only case where the correct model was not preferred by model selection criteria for a small number of instances.According to Figure 2, the preferred model in these (atypical) cases was the
FM-MNC(2) , however the differences inthe criteria values related to a
FM-MSNC(2) are close to zero. We believe that the amount of data sets generated in thesimulation ( data sets) may not be sufficient and a more intensive simulation study would be required. However,due to the computational burden of the simulation, it would be too time consuming to move over to bigger simulations,say, data sets.
Mixture models in general can be used for two main purposes: 1. estimation, and 2. model-based clustering [28]. Inthis section, we investigate the ability of the
FM-MSNC model to cluster observations, that is, to allocate them intogroups of observations that are similar in some sense. We know that each data point belongs to g heterogeneouspopulations, but we don’t know how to discriminate between them. Fitting the data with mixture models allowsclustering the data in terms of the estimated posterior probability that a single point belongs to a given group. For thispurpose, we follow the method proposed by [38], to assess the quality of the clustering of each mixture model usingan index measure called correct classification rate (CCR), which is based on the posterior assigned to each subject. Forthe investigation of the clustering ability of the FM-MSNC model, we simulated
MC samples considering mixtureswith two components from model (22), with sample size n ∈ (100 , , , without censoring and left-censoringproportion settings (5% , , taken in each mixture component, and parameter values set at . SN (cid:18)(cid:20) (cid:21) , (cid:20) (cid:21) , (cid:20) (cid:21)(cid:19) + 0 . SN (cid:18)(cid:20) (cid:21) , (cid:20) (cid:21) , (cid:20) (cid:21)(cid:19) . To fit the data we used the models
FM-MSNC and
FM-MNC amd for each model we obtain the estimate of the posteriorprobability that an observation y i belongs to the j th component of the mixture, b Z ij . So, if max j c Z ij occurs in compo-nent j , then y i is classified into group j . For the m th sample of the MC, we computed the correct classification rate,denoted by CCRm, then obtained the average of the correct classification rate (ACCR) of CCRm. Table 5 shows theACCR values. From this table it is possible to observe that the model produces a high correct classification rate inboth fitted models. We see that the rate decreases when the censoring proportion increases, this decrease is strongerfor n = 100 . Looking at the n samples, keeping the censoring proportion fixed, the rate increased when the samplesize increased. 15 PREPRINT - S
EPTEMBER
24, 2020
Index A I C v a l ue s AIC
Index B I C v a l ue s BIC
Index E DC v a l ue s EDC
Figure 2:
Simulated data: Number of mixture components. AIC, BIC and EDC values for 100 samples and left-censoring level . Solid line: FM-MSNC, long dashed: FM-MNC(2), dot dashed: FM-MNC(3) and dotted: FM-MNC(4).
Table 5: Simulated data: Clustering. ACCR for fitted models FM-MSNC and FM-MNC for the simulated. n FM-MSNC FM-MNC
0% 5% 10% 20% 0% 5% 10% 20%
100 0.9685 0.9619 0.9558 0.945 0.8801 0.8809 0.8772 0.8735200 0.9729 0.9661 0.9599 0.9508 0.9206 0.9191 0.916 0.9016300 0.9733 0.9661 0.9608 0.9545 0.9248 0.9229 0.9233 0.9155
Figure 3 shows the allocations in each group for sample size n = 200 and left-censoring proportion of , , and , where the groups are represented by black and red points. The first line of graphics (a - d) contains the scatterplot of the generated real data, where the black circles represent an observation erroneously classified as belonging tothe black group. The second line of graphics (e - h) contains the scatter plot of the fitted FM-MSNC model, where theblack circles represent an observation erroneously classified as belonging to the black group. The last line of graphics(i - l) contains the scatter plot of the fitted
FM-MNC model, where the red circles represent an observation erroneouslyclassified as belonging to the red group.
In this simulation study, we analyze the absolute bias and the mean square error (mse) of the estimates obtained fromthe
FM-MSNC model through the proposed EM algorithm. The idea of this simulation is to provide empirical evidenceabout the consistency of the ML estimates. These measures are defined by bias ( θ i ) = 1 M M X m =1 | b θ ( m ) i − θ i | and mse ( θ i ) = 1 M M X m =1 ( b θ ( m ) i − θ i ) , (31) where M is the number of MC samples, and b θ ( m ) i is the estimated ML of the parameter θ i for the m th sample. Fourdifferent sample sizes ( n = 300 , , , are considered. For each sample size, we generated Monte Carlo16
PREPRINT - S
EPTEMBER
24, 2020 (a) (b) (c)
369 2.5 5.0 7.5 (d) (e) (f) (g)
369 2.5 5.0 7.5 (h) (i) (j) (k)
369 2.5 5.0 7.5 (l)
Figure 3:
Simulated data: Clustering. Scatter plots for some simulated data from FM-MSNC model with n = 200 and therespective density contours (first line). Clustering scatter plots from fitted skew-normal (second line) and normal (last line) withmultiples censoring level. samples with , , censoring proportion. Using the EM algorithm, the absolute bias and mean squared errorfor each parameter over the datasets were computed. The parameter setup is as follows . SN (cid:18)(cid:20) − − (cid:21) , (cid:20) . (cid:21) , (cid:20) − (cid:21)(cid:19) + 0 . SN (cid:18)(cid:20) (cid:21) , (cid:20) . (cid:21) , (cid:20) − (cid:21)(cid:19) . The results of the estimates of µ , F , λ , µ , F , λ and π are shown in Figures 4, 5, and 6. As a general rule, we cansay that the bias and mse tend to approach zero when the sample size increases, indicating that the estimates basedon the proposed EM algorithm, under the FM-MSNC model, provide good asymptotic properties.
To illustrate the performance of the proposed model and algorithm, we consider a dataset of trace metal concentrations collected by the Virginia Department of Environmental Quality (VDEQ) that was previously analyzed by [16] and [19] using the normal and Student-t distributions, respectively. This dataset consists of p = 5 concentration levels of dissolved trace metals in n = 184 independently selected freshwater streams across the Commonwealth of Virginia. The five attributes are the trace metals copper (Cu), lead (Pb), zinc (Zn), calcium (Ca) and magnesium (Mg). The Cu, Pb and Zn concentrations are reported in µg/L of water, whereas the Ca and Mg concentrations are reported in mg/L of water. Since the measurements were taken at different times, multiple limits of detection are possible for each trace metal [34].
Figure 4: Simulated data: Asymptotic properties. Bias and MSE of the µ_1 and µ_2 estimates in the FM-MSNC model under different censoring levels (solid, dashed and dot-dashed lines).
Figure 5: Simulated data: Asymptotic properties. Bias and MSE of the λ_1 and λ_2 estimates in the FM-MSNC model under different censoring levels (solid, dashed and dot-dashed lines).

The limits of detection for Cu and Pb are both 0.1 µg/L and 1.0 µg/L for Zn, while Ca and Mg have limits of 0.5 mg/L and 1.0 mg/L, respectively. The percentages of left-censored values, 2.7% for Ca, 4.9% for Cu and 9.8% for Mg, are small in comparison with 78.3% for Pb and 38.6% for Zn. The number of non-detected trace metals also varies across streams, ranging from none to all five. As the concentration levels are strictly positive measures, we consider an interval-censoring analysis by setting all lower limits of detection equal to zero for all trace metals, which guarantees positivity. Also, due to the different scales of the trace metals, we standardize the dataset to have zero mean and unit variance, as in [37] (a small sketch of this construction is given below). The works mentioned before treated this dataset as left censored, without taking into account the possibility of predicting negative concentration levels for the trace metals; note, for instance, that the censored Pb concentrations take values in the small interval [0, 0.1]. Thus, after transforming the data, the new limits of detection are negative for all five trace metals. Figure 7 shows the histogram of each original trace metal, with its detection limit, and of all of them together. It can be seen that most of the distributions associated with the variables have two or more modes and are right skewed. For this reason we propose to fit an FM-MSNC model. We fit the data with 1, 2 and 3 components considering the
FM-MSNC, FM-MtC and FM-MNC models; for the FM-MtC model we consider fixed degrees of freedom, as described in [19]. The number of groups is chosen according to the information criteria shown in Table 6. According to all model selection criteria, the
FM-MSNC model with three components fits the data best. We considered the variance-covariance matrices (Γ) to be equal across components in order to reduce the number of parameters to be estimated (parsimonious model). The ML estimates of the parameters were obtained using the EM algorithm described in Subsection 4.1; the results are shown in Table 7.
Figure 6: Simulated data: Asymptotic properties. Bias and MSE of the F_1 = Σ_1^{1/2}, F_2 = Σ_2^{1/2} and π_1 estimates in the FM-MSNC model under different censoring levels (solid, dashed and dot-dashed lines).
Figure 7: VDEQ data. Histograms of the five original trace metal concentrations (Cu: detection limit 0.1 µg/L, censoring rate 4.9%; Pb: 0.1 µg/L, 78.3%; Zn: 1.0 µg/L, 38.6%; Ca: 0.5 mg/L, 2.7%; Mg: 1.0 mg/L, 9.8%) and of all attributes together. The red line marks the detection limit for each concentration.

As can be seen, the freshwater streams are split among the three clusters according to the estimated mixing proportions (π_1, π_2, π_3) reported in Table 7. Table 6 shows that the best FM-MNC model has two components, so in that case the streams are divided between two clusters only. For the FM-MtC model, the best model also has two components, with three degrees of freedom, and the streams are again divided between two clusters. Figure 8 displays the data fitted with the FM-MSNC model with three components. The scatter plots of the observations y_i (i = 1, ..., 184) for each pair of trace metals reveal that it is difficult to classify the freshwater streams by visual inspection, because observations from different groups largely mix together.
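The interval-censoring construction described above (non-detects known only to lie between zero and their detection limits, followed by column-wise standardization) can be sketched as follows. The matrix X and the vector dl are hypothetical stand-ins for the VDEQ measurements and their detection limits; this code is not taken from the CensMFM package.

```r
## Minimal sketch: interval-censoring bounds for non-detects, followed by
## column-wise standardization, as described in the text above.
## X  : n x p matrix of raw concentrations, with non-detects recorded at
##      their detection limits; dl : length-p vector of detection limits.
build_censored <- function(X, dl) {
  cc    <- sweep(X, 2, dl, "<=")            # TRUE where the value is a non-detect
  lower <- X
  upper <- X
  lower[cc] <- 0                            # lower limit of detection set to zero
  upper[cc] <- matrix(dl, nrow(X), ncol(X), byrow = TRUE)[cc]  # upper = detection limit
  m <- colMeans(X)                          # standardize with the column means
  s <- apply(X, 2, sd)                      # and standard deviations of the data
  std <- function(A) sweep(sweep(A, 2, m), 2, s, "/")
  list(cc = cc, y = std(X), lower = std(lower), upper = std(upper))
}
```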
Table 6: VDEQ data. Model selection criteria for various FM-MSNC, FM-MNC and FM-MtC models. Values in bold correspond to the best model according to each criterion.
                        FM-MSNC                                    FM-MNC
Criteria          G = 1        G = 2        G = 3          G = 1        G = 2        G = 3
Log-likelihood   -1269.302    -910.3387    -697.6815      -1351.596    -1268.848    -1210.626
AIC               2588.604     1892.677

                        FM-MtC (ν = 3)                             FM-MtC (ν = 4)
Criteria          G = 1        G = 2        G = 3          G = 1        G = 2        G = 3
Log-likelihood   -1040.276    -1018.943    -1074.852      -1061.702    -1036.393    -1072.487
AIC               2120.553     2089.887     2213.705       2163.404     2124.786     2208.974
BIC               2184.852     2173.475     2316.583       2227.702     2208.375     2311.852
EDC               2134.812     2108.423     2236.519       2177.662     2143.323     2231.788
Time              16.672 sec.  35.9952 sec. 2.8734 min.    13.1202 sec. 29.9673 sec. 2.68 min.
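The criteria reported in Table 6 are penalized versions of the maximized log-likelihood. The sketch below computes them from the log-likelihood, the number of free parameters k and the sample size n; the EDC penalty constant 0.2*sqrt(n) is an assumption here (a common choice in the related literature), although it reproduces the EDC values in Table 6.

```r
## Minimal sketch: information criteria for a fitted mixture model.
## loglik = maximized log-likelihood, k = number of free parameters,
## n = sample size; the EDC constant 0.2*sqrt(n) is an assumption.
info_criteria <- function(loglik, k, n) {
  c(AIC = -2 * loglik + 2 * k,
    BIC = -2 * loglik + log(n) * k,
    EDC = -2 * loglik + 0.2 * sqrt(n) * k)
}
```

For instance, info_criteria(-1018.943, k = 26, n = 184), where k = 26 is an assumed parameter count for a two-component model with a common scale matrix, returns approximately 2089.9, 2173.5 and 2108.4, matching the FM-MtC (ν = 3, G = 2) column of Table 6.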
Table 7: VDEQ data. ML estimates of parameters from fitting the FM-MSNC model with 3 components to the Virginiatrace metal concentration data.
Parameter            Estimate
(π_1, π_2, π_3)      (0.  ,   .  ,   . )
µ_1                  (−.  , −.  , −.  , −.  , −. )
µ_2                  (−.  , −.  , −.  ,  .  ,  . )
µ_3                  ( 1.  , −.  , −.  , −.  , −. )
λ_1                  ( 0.  , −.  ,  .  , −.  , −. )
λ_2                  (28.  , −.  ,  .  ,  .  , −. )
λ_3                  (−.  ,  .  ,  .  , −.  ,  . )
F_1 = Σ_1^{1/2}      (symmetric 5 × 5 matrix)
F_2 = Σ_2^{1/2}      (symmetric 5 × 5 matrix)
F_3 = Σ_3^{1/2}      (symmetric 5 × 5 matrix)
In this paper, a novel approach for analyzing multiply censored and missing data is presented, based on finite mixtures of multivariate skew-normal distributions. This approach generalizes several previously proposed solutions for censored data, such as the finite mixture of Gaussian components [17, 12, 16] and the finite mixture of Student-t components [19], which are also restricted to left- or right-censoring problems. A simple and efficient EM-type algorithm was developed, which has closed-form expressions at the E-step and relies on formulas for the mean vector and covariance matrix of the multivariate truncated skew-normal distribution, for which the R MomTrunc library is used [15]. The proposed EM algorithm was implemented as part of the R package CensMFM and is available for download from the CRAN repository. The experimental results and the analysis of a real dataset support the usefulness and effectiveness of our proposal.

The method proposed in this paper can be extended to other types of mixture distributions, for example, multivariate scale mixtures of skew-normal distributions [11] or generalized hyperbolic mixtures [10]. It is also of interest to develop an effective Markov chain Monte Carlo algorithm for the FM-MSNC model within a fully Bayesian treatment. Finally, the proposed methods can also be applied to other areas in which the data being analyzed contain censored and/or missing observations, for instance, factor analysis models [36] and linear mixed models [22, 18].
Figure 8: Scatter plots and histograms of the fitted values of y_i for each predicted cluster of the VDEQ data. Lower diagonal entries: • cluster 1; ▲ cluster 2; + cluster 3.

References

[1] Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Cont 19:716–723
[2] Arellano-Valle RB, Genton MG (2005) On fundamental skew distributions. Journal of Multivariate Analysis 96:93–116
[3] Arellano-Valle RB, Genton MG (2010) Multivariate extended skew-t distributions and related families. Metron LXVIII:201–234
[4] Azzalini A, Capitanio A (1999) Statistical applications of the multivariate skew-normal distribution. Journal of the Royal Statistical Society, Series B 61:579–602
[5] Azzalini A, Dalla-Valle A (1996) The multivariate skew-normal distribution. Biometrika 83(4):715–726
[6] Bai Z, Krishnaiah P, Zhao L (1989) On rates of convergence of efficient detection criteria in signal processing with white noise. Inform Theory IEEE Trans 35:380–388
[7] Basford K, Greenway D, McLachlan G, Peel D (1997) Standard errors of fitted component means of normal mixtures. Computational Statistics 12:1–18
[8] Basso RM, Lachos VH, Cabral CRB, Ghosh P (2010) Robust mixture modeling based on scale mixtures of skew-normal distributions. Computational Statistics & Data Analysis 54(12):2926–2941
[9] Bouveyron C, Celeux G, Murphy T, Raftery A (2019) Model-Based Clustering and Classification for Data Science: With Applications in R. Cambridge University Press
[10] Browne RP, McNicholas PD (2015) A mixture of generalized hyperbolic distributions. Canadian Journal of Statistics 43(2):176–198
[11] Cabral CRB, Lachos VH, Prates MO (2012) Multivariate mixture modeling using skew-normal independent distributions. Computational Statistics & Data Analysis 56:126–142
[12] Caudill SB (2012) A partially adaptive estimator for the censored regression model based on a mixture of normal distributions. Statistical Methods & Applications 21:121–137
[13] Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39:1–38
[14] Frühwirth-Schnatter S (2006) Finite mixture and Markov switching models. Springer Science & Business Media
[15] Galarza CE, Kan R, Lachos VH (2020) MomTrunc: Moments of Folded and Doubly Truncated Multivariate Distributions. R package version 5.87, URL http://CRAN.R-project.org/package=MomTrunc
[16] He J (2013) Mixture model based multivariate statistical analysis of multiply censored environmental data. Advances in Water Resources 59:15–24
[17] Karlsson M, Laitila T (2014) Finite mixture modeling of censored regression models. Statistical Papers 55(3):627–642
[18] Lachos VH, Bandyopadhyay D, Dey DK (2011) Linear and nonlinear mixed-effects models for censored HIV viral loads using normal/independent distributions. Biometrics 67:1594–1604
[19] Lachos VH, Moreno EJL, Chen K, Cabral CRB (2017) Finite mixture modeling of censored data using the multivariate Student-t distribution. Journal of Multivariate Analysis 159:151–167
[20] Lachos VH, Cabral CRB, Zeller CB (2018) Finite Mixture of Skewed Distributions. Springer
[21] Lin TI (2009) Maximum likelihood estimation for multivariate skew normal mixture models. Journal of Multivariate Analysis 100(2):257–265
[22] Lin TI, Ho HJ, Chen CL (2009) Analysis of multivariate skew normal models with incomplete data. Journal of Multivariate Analysis 100(19):2337–2351
[23] Lin TI, Wang WL (2019) Multivariate-t linear mixed models with censored responses, intermittent missing values and heavy tails. Statistical Methods in Medical Research p 0962280219857103
[24] Lin TI, Lachos VH, Wang WL (2018) Multivariate longitudinal data analysis with censored and intermittent missing responses. Statistics in Medicine 37(19):2822–2835
[25] Little RJ, Rubin DB (2002) Statistical analysis with missing data, vol 793. John Wiley & Sons
[26] Louis TA (1982) Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B 44:226–233
[27] McLachlan GJ, Krishnan T (2008) The EM Algorithm and Extensions, 2nd edn. Wiley
[28] McLachlan GJ, Peel D (2000) Finite Mixture Models. Wiley, New York
[29] McNicholas PD (2016) Mixture model-based classification. Chapman and Hall/CRC
[30] Meilijson I (1989) A fast improvement to the EM algorithm on its own terms. Journal of the Royal Statistical Society, Series B (Methodological) 51(1):127–138
[31] Peel D, McLachlan GJ (2000) Finite Mixture Models. John Wiley & Sons
[32] Peel D, McLachlan GJ (2000) Robust mixture modelling using the t distribution. Statistics and Computing 10(4):339–348
[33] Prates MO, Lachos VH, Cabral C (2013) mixsmsn: Fitting finite mixture of scale mixture of skew-normal distributions. Journal of Statistical Software 54(12):1–20
[34] Richmond V (2003) The quality of Virginia non-tidal streams: First year report
[35] Schwarz G (1978) Estimating the dimension of a model. The Annals of Statistics 6:461–464
[36] Wang WL, Liu M, Lin TI (2017) Robust skew-t factor analysis models for handling missing data. Statistical Methods & Applications 26(4):649–672
[37] Wang WL, Castro LM, Lachos VH, Lin TI (2019) Model-based clustering of censored data via mixtures of factor analyzers. Computational Statistics & Data Analysis 140:104–121
[38] Zeller CB, Cabral CR, Lachos VH (2016) Robust mixture regression modeling based on scale mixtures of skew-normal distributions. Test 25(2):375–396