On moments of folded and doubly truncated multivariate extended skew-normal distributions
Christian E. Galarza, Larissa A. Matos, Dipak K. Dey, Victor H. Lachos
Christian E. Galarza
Departamento de Estadística, Escuela Superior Politécnica del Litoral, Guayaquil, Ecuador. [email protected]

Larissa A. Matos
Departamento de Estatística, Universidade Estadual de Campinas, Campinas, Brazil. [email protected]

Dipak K. Dey
Department of Statistics, University of Connecticut, Storrs, CT 06269, U.S.A. [email protected]

Victor H. Lachos
Department of Statistics, University of Connecticut, Storrs, CT 06269, U.S.A. [email protected]
September 29, 2020

ABSTRACT
This paper develops recurrence relations for integrals directly related to the density of the multivariate extended skew-normal (ESN) distribution, a family that includes the well-known skew-normal (SN) distribution introduced by [1] and the popular multivariate normal distribution. These recursions offer a fast computation of arbitrary order product moments of the multivariate truncated extended skew-normal and multivariate folded extended skew-normal distributions, with the product moments of the untruncated distributions as a by-product. In addition to the recurrence approach, we show that any arbitrary moment of the truncated multivariate extended skew-normal distribution can be computed from a corresponding moment of a truncated multivariate normal distribution, pointing the way to a faster algorithm, since fewer integrals are required and the integrals involved are much simpler to evaluate. Since several methods are available to calculate the first two moments of a multivariate truncated normal distribution, we propose an optimized method that offers better performance in terms of time and accuracy, and that in addition handles extreme cases in which other methods fail. The R package MomTrunc provides these new efficient methods to practitioners.

Keywords: Extended skew-normal distribution · Folded normal distribution · Product moments · Truncated distributions.
In many applications involving simulations or experimental studies, researchers often generate a large number of datasets with values restricted to fixed intervals. For example, variables such as pH, grades, viral load in HIV studies and humidity in environmental studies have upper and lower bounds due to detection limits, and the support of their densities is restricted to given intervals. Thus, the need to study truncated distributions, along with their properties, naturally arises. In this context, there has been growing interest in evaluating the moments of truncated distributions. These variables are also often skewed, departing from the traditional assumption of symmetric distributions. For instance, [2] provided formulas for the first two moments of truncated multivariate normal (TN) distributions. [3] gave expressions for the moments of truncated bivariate log-normal distributions, with applications to testing the Houthakker effect ([4]) in futures markets. [5] derived the truncated moments of several continuous univariate distributions commonly applied to hydrologic problems. [6] provided analytical formulas for the moments of the truncated univariate Student-t distribution in recursive form. [7] obtained expressions for the moments of truncated univariate skew-normal distributions ([8]) and applied the results to model relative humidity data. [9] studied the moments of a doubly truncated member of the symmetrical class of univariate normal/independent distributions and their applications to actuarial data. [10] presented a general formula based on the slice sampling algorithm to approximate the first two moments of the truncated multivariate Student-t (TT) distribution under double truncation. [11] provided explicit expressions for computing arbitrary order product moments of the TN distribution by using the moment generating function (MGF); however, this approach relies on differentiation of the MGF and can be somewhat time consuming.

Instead of differentiating the MGF of the TN distribution, [12] recently presented recurrence relations for integrals directly related to the density of the multivariate normal distribution for computing arbitrary order product moments of the TN distribution. These recursions offer a fast computation of the moments of folded normal (FN) and TN distributions, which require evaluating p-dimensional integrals involving the normal (N) density, and explicit expressions for some low order moments of FN and TN distributions are presented in a clever way. Although some proposals to calculate the moments of the truncated univariate skew-normal distribution ([7]) and the truncated univariate skew-normal/independent distribution ([7]) have recently been published, so far, to the best of our knowledge, there has been no attempt to study either moments or product moments of the multivariate folded extended skew-normal (FESN) and truncated multivariate extended skew-normal (TESN) distributions. Moreover, our proposed methods allow us to compute, as a by-product, the product moments of the folded and truncated versions of the N ([12]) and SN ([1]) distributions, as well as of their respective univariate versions.
The proposed algorithms and methods are implemented in the new R package MomTrunc. The rest of this paper is organized as follows. In Section 2 we briefly discuss some preliminary results related to the multivariate SN, ESN and TESN distributions and some of their key properties. Section 3 presents a recurrence formula for an integral that is essential in the evaluation of moments of the TESN distribution, as well as explicit expressions for the first two moments of the TESN and TN distributions. A direct relation between the moments of the TESN and TN distributions is also presented, which is used to improve the proposed methods. In Section 4, by means of approximations, we propose strategies to circumvent some numerical problems that arise for limiting distributions and extreme cases. We compare our proposal with other popular methods from the literature in Section 5. Section 6 is devoted to the moments of the FESN distribution, where several related results are discussed; explicit expressions are presented for high order moments in the univariate case, and for the mean vector and variance-covariance matrix of the multivariate FESN distribution. Finally, some concluding remarks are presented in Section 7.

We start our exposition by defining some notation and presenting the basic concepts used throughout the development of our theory. As is usual in probability theory and its applications, we denote a random variable by an upper-case letter and its realization by the corresponding lower-case letter, and we use boldface letters for vectors and matrices. Let I_p and J_p represent an identity matrix and a matrix of ones, respectively, both of dimension p × p, let A^⊤ be the transpose of A, and let |X| = (|X_1|, …, |X_p|)^⊤ denote the componentwise absolute value of the vector X. For multiple integrals, we use the shorthand notation

∫_a^b f(x) dx = ∫_{a_1}^{b_1} … ∫_{a_p}^{b_p} f(x_1, …, x_p) dx_1 … dx_p,

where a = (a_1, …, a_p)^⊤ and b = (b_1, …, b_p)^⊤.
In this subsection we present the skew-normal distribution and some of its properties. We say that a p × 1 random vector Y follows a multivariate SN distribution with p × 1 location vector µ, p × p positive definite dispersion matrix Σ and p × 1 skewness parameter vector λ, and we write Y ∼ SN_p(µ, Σ, λ), if its joint probability density function (pdf) is given by

SN_p(y; µ, Σ, λ) = 2 φ_p(y; µ, Σ) Φ_1(λ^⊤ Σ^{−1/2}(y − µ)),  (1)

where φ_p(·; µ, Σ) represents the pdf of a p-variate normal distribution with mean vector µ and variance-covariance matrix Σ, and Φ_1(·) stands for the cumulative distribution function (cdf) of a standard univariate normal distribution. If λ = 0, then (1) reduces to the symmetric N_p(µ, Σ) pdf. Except for a straightforward difference in the parametrization considered in (1), this model corresponds to the one introduced by [1], whose properties were extensively studied in [14] (see also [13]).
Proposition 1 (cdf of the SN). If Y ∼ SN_p(µ, Σ, λ), then for any y ∈ R^p,

F_Y(y) = P(Y ≤ y) = 2 Φ_{p+1}((y^⊤, 0)^⊤; µ*, Ω),

where µ* = (µ^⊤, 0)^⊤ and

Ω = ( Σ     −∆
      −∆^⊤   1 ),

with ∆ = Σ^{1/2} λ / (1 + λ^⊤λ)^{1/2}.

It is worth mentioning that the multivariate skew-normal distribution is not closed under marginalization and conditioning. Next, we present its extended version, which does possess these properties: the multivariate ESN distribution.
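Proposition 1 reduces the p-variate SN cdf to a single (p + 1)-variate normal cdf evaluation. As a quick numerical illustration (our own sketch, not part of the paper, using SciPy; the parameter value is arbitrary), for p = 1, µ = 0 and Σ = 1 the identity can be checked against direct numerical integration of the SN density in (1):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

lam = 2.0                              # skewness parameter (p = 1, mu = 0, sigma = 1)
delta = lam / np.sqrt(1.0 + lam**2)    # Delta of Proposition 1 when Sigma = 1
Omega = np.array([[1.0, -delta],
                  [-delta, 1.0]])      # Omega = ((Sigma, -Delta), (-Delta^T, 1))

def sn_cdf(y):
    """F_Y(y) = 2 * Phi_2((y, 0)^T; 0, Omega), as in Proposition 1."""
    return 2.0 * multivariate_normal(mean=[0.0, 0.0], cov=Omega).cdf([y, 0.0])

def sn_cdf_numeric(y, lower=-10.0, n=400_001):
    """Direct numerical integration of the SN_1 density 2*phi(x)*Phi(lam*x)."""
    x = np.linspace(lower, y, n)
    dens = 2.0 * norm.pdf(x) * norm.cdf(lam * x)
    return float(np.sum(dens) * (x[1] - x[0]))
```

The two evaluations agree to numerical-integration accuracy, which is the content of the proposition for this special case.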
We say that a p × 1 random vector Y follows an ESN distribution with p × 1 location vector µ, p × p positive definite dispersion matrix Σ, p × 1 skewness parameter vector λ, and shift parameter τ ∈ R, denoted by Y ∼ ESN_p(µ, Σ, λ, τ), if its pdf is given by

ESN_p(y; µ, Σ, λ, τ) = ξ^{−1} φ_p(y; µ, Σ) Φ_1(τ + λ^⊤ Σ^{−1/2}(y − µ)),  (2)

with ξ = Φ_1(τ/(1 + λ^⊤λ)^{1/2}). Note that when τ = 0, we retrieve the skew-normal distribution defined in (1), that is, ESN_p(y; µ, Σ, λ, 0) = SN_p(y; µ, Σ, λ). Here, we use a slightly different parametrization of the ESN distribution than the one given in [15] and [16]. Furthermore, [16] deals with the multivariate extended skew-t (EST) distribution, of which the ESN is a particular case when the degrees of freedom ν go to infinity. From this last work, it is straightforward to see that ESN_p(y; µ, Σ, λ, τ) → φ_p(y; µ, Σ) as τ → +∞.

Also, letting Z = Σ^{−1/2}(Y − µ), it follows that Z ∼ ESN_p(0, I_p, λ, τ), with mean vector and variance-covariance matrix

E[Z] = η λ  and  cov[Z] = I_p − E[Z] (E[Z] + τ/(1 + λ^⊤λ) λ)^⊤,

with η = φ_1(τ; 0, 1 + λ^⊤λ)/ξ. Then, the mean vector and variance-covariance matrix of Y can be easily computed as E[Y] = µ + Σ^{1/2} E[Z] and cov[Y] = Σ^{1/2} cov[Z] Σ^{1/2}.

The following propositions are crucial to developing our methods. The proofs can be found in Appendix A.

Proposition 2 (Marginal and conditional distributions of the ESN). Let Y ∼ ESN_p(µ, Σ, λ, τ) be partitioned as Y = (Y_1^⊤, Y_2^⊤)^⊤, of dimensions p_1 and p_2 (p_1 + p_2 = p), respectively. Let

Σ = ( Σ_{11}  Σ_{12}
      Σ_{21}  Σ_{22} ),  µ = (µ_1^⊤, µ_2^⊤)^⊤,  λ = (λ_1^⊤, λ_2^⊤)^⊤,  ϕ = (ϕ_1^⊤, ϕ_2^⊤)^⊤

be the corresponding partitions of Σ, µ, λ and ϕ = Σ^{−1/2}λ. Then,

Y_1 ∼ ESN_{p_1}(µ_1, Σ_{11}, c Σ_{11}^{1/2} ϕ̃, c τ)  and  Y_2 | Y_1 = y_1 ∼ ESN_{p_2}(µ_{2.1}, Σ_{22.1}, Σ_{22.1}^{1/2} ϕ_2, τ_{2.1}),

where c = (1 + ϕ_2^⊤ Σ_{22.1} ϕ_2)^{−1/2}, ϕ̃ = ϕ_1 + Σ_{11}^{−1} Σ_{12} ϕ_2, Σ_{22.1} = Σ_{22} − Σ_{21} Σ_{11}^{−1} Σ_{12}, µ_{2.1} = µ_2 + Σ_{21} Σ_{11}^{−1}(y_1 − µ_1), and τ_{2.1} = τ + ϕ̃^⊤(y_1 − µ_1).

Proposition 3 (Stochastic representation of the ESN). Let X = (X_1^⊤, X_0)^⊤ ∼ N_{p+1}(µ*, Ω). If Y =_d (X_1 | X_0 < τ̃), then Y ∼ ESN_p(µ, Σ, λ, τ), with µ* and Ω as defined in Proposition 1, and τ̃ = τ/(1 + λ^⊤λ)^{1/2}.

The stochastic representation above can be derived from Proposition 1 in [16].
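To make the stochastic representation and the moment formulas above concrete, the following sketch (ours, not from the paper; p = 1 with arbitrary parameter values, NumPy/SciPy) checks the closed-form mean E[Z] = ηλ against Monte Carlo draws generated through Z =_d (X_1 | X_0 < τ̃):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
lam, tau = 1.5, 0.7                        # skewness and shift parameters
v = 1.0 + lam**2                           # 1 + lambda^T lambda
tau_t = tau / np.sqrt(v)                   # tilde-tau of Proposition 3
xi = norm.cdf(tau_t)                       # normalizing constant xi
eta = norm.pdf(tau, loc=0.0, scale=np.sqrt(v)) / xi

# Closed-form mean of Z ~ ESN_1(0, 1, lam, tau): E[Z] = eta * lam
mean_closed = eta * lam

# Monte Carlo via the stochastic representation Z =d (X1 | X0 < tau_t),
# where (X1, X0) is bivariate standard normal with cov(X1, X0) = -delta
delta = lam / np.sqrt(v)
n = 2_000_000
x1 = rng.standard_normal(n)
x0 = -delta * x1 + np.sqrt(1.0 - delta**2) * rng.standard_normal(n)
mean_mc = x1[x0 < tau_t].mean()
```

Up to Monte Carlo error, `mean_mc` matches `mean_closed`, illustrating both the representation and the mean formula.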
Proposition 4 (cdf of the ESN). If Y ∼ ESN_p(µ, Σ, λ, τ), then for any y ∈ R^p,

F_Y(y) = P(Y ≤ y) = ξ^{−1} Φ_{p+1}((y^⊤, τ̃)^⊤; µ*, Ω).

The proof is direct from Proposition 3, noting that ξ = P(X_0 < τ̃). Hereinafter, for Y ∼ ESN_p(µ, Σ, λ, τ), we denote its cdf by F_Y(y) ≡ Φ̃_p(y; µ, Σ, λ, τ) for simplicity.

Let A be a Borel set in R^p. We say that the random vector Y has a truncated extended skew-normal distribution on A when Y has the same distribution as Y | (Y ∈ A). In this case, the pdf of Y is given by

f(y | µ, Σ, λ, τ; A) = [ESN_p(y; µ, Σ, λ, τ) / P(Y ∈ A)] 1_A(y),

where 1_A is the indicator function of A. We use the notation Y ∼ TESN_p(µ, Σ, λ, τ; A). If A has the form

A = {(x_1, …, x_p) ∈ R^p : a_1 ≤ x_1 ≤ b_1, …, a_p ≤ x_p ≤ b_p} = {x ∈ R^p : a ≤ x ≤ b},

then we use the notation {Y ∈ A} = {a ≤ Y ≤ b}, where a = (a_1, …, a_p)^⊤ and b = (b_1, …, b_p)^⊤, and we say that the distribution of Y is doubly truncated. Analogously, we define {Y ≥ a} and {Y ≤ b}, in which cases we say that the distribution of Y is truncated from below and truncated from above, respectively. For convenience, we also use the notation Y ∼ TESN_p(µ, Σ, λ, τ; [a, b]).

For two p-dimensional vectors x = (x_1, …, x_p)^⊤ and κ = (k_1, …, k_p)^⊤, let x^κ stand for (x_1^{k_1}, …, x_p^{k_p}), and let a_{(i)} be the vector a with its ith element removed. For a matrix A, let A_{i(j)} stand for the ith row of A with its jth element removed. Similarly, A_{(i,j)} stands for the matrix A with its ith row and jth column removed. Besides, let e_i denote a p × 1 vector whose ith element equals one, with zeros elsewhere. Let

𝓛_p(a, b; µ, Σ, λ, τ) = ∫_a^b ESN_p(x; µ, Σ, λ, τ) dx.

We are interested in evaluating the integral

𝓕^p_κ(a, b; µ, Σ, λ, τ) = ∫_a^b x^κ ESN_p(x; µ, Σ, λ, τ) dx.  (3)

The boundary condition is obviously 𝓕^p_0(a, b; µ, Σ, λ, τ) = 𝓛_p(a, b; µ, Σ, λ, τ). When λ = 0 and τ = 0, we recover the multivariate normal case, and then

𝓕^p_κ(a, b; µ, Σ, 0, 0) ≡ F^p_κ(a, b; µ, Σ) = ∫_a^b x^κ φ_p(x; µ, Σ) dx,  (4)

with boundary condition

𝓛_p(a, b; µ, Σ, 0, 0) ≡ L_p(a, b; µ, Σ) = ∫_a^b φ_p(x; µ, Σ) dx.  (5)

Note that we use calligraphic style for the integrals of interest 𝓕^p_κ and 𝓛_p when we work with the skewed version. In both expressions (4) and (5), for the normal case, we use notation compatible with that of [12].
When p = 1, it is straightforward to use integration by parts to show that

𝓕_0(a, b; µ, σ², λ, τ) = ξ^{−1}[Φ_2((b − µ, τ̃)^⊤; 0, Ω_1) − Φ_2((a − µ, τ̃)^⊤; 0, Ω_1)],

𝓕_{k+1}(a, b; µ, σ², λ, τ) = µ 𝓕_k(a, b; µ, σ², λ, τ) + kσ² 𝓕_{k−1}(a, b; µ, σ², λ, τ)
  + σ²(a^k ESN_1(a; µ, σ², λ, τ) − b^k ESN_1(b; µ, σ², λ, τ)) + ησλ F_k(a, b; µ − µ_b, γ²),  for k ≥ 0,

where

Ω_1 = ( σ²   −σψ
        −σψ   1  ),

ψ = λ/(1 + λ²)^{1/2}, µ_b = λτγ²/σ and γ² = σ²/(1 + λ²). When p > 1, we need a similar recurrence relation in order to compute 𝓕^p_κ(a, b; µ, Σ, λ, τ), which is presented in the next theorem.

Theorem 1. For p ≥ 1 and i = 1, …, p,

𝓕^p_{κ+e_i}(a, b; µ, Σ, λ, τ) = µ_i 𝓕^p_κ(a, b; µ, Σ, λ, τ) + δ_i F^p_κ(a, b; µ − µ_b, Γ) + e_i^⊤ Σ d_κ,  (6)

where δ = (δ_1, …, δ_p)^⊤ = η Σ^{1/2} λ, µ_b = τ̃ ∆, Γ = Σ − ∆∆^⊤, and d_κ is a p-vector with jth element

d_{κ,j} = k_j 𝓕^p_{κ−e_j}(a, b; µ, Σ, λ, τ)
  + a_j^{k_j} ESN_1(a_j; µ_j, σ_j², c_j σ_j ϕ̃_j, c_j τ) 𝓕^{p−1}_{κ_{(j)}}(a_{(j)}, b_{(j)}; µ̃_{a_j}, Σ̃_j, Σ̃_j^{1/2} ϕ_{(j)}, τ̃_{a_j})
  − b_j^{k_j} ESN_1(b_j; µ_j, σ_j², c_j σ_j ϕ̃_j, c_j τ) 𝓕^{p−1}_{κ_{(j)}}(a_{(j)}, b_{(j)}; µ̃_{b_j}, Σ̃_j, Σ̃_j^{1/2} ϕ_{(j)}, τ̃_{b_j}),  (7)

where µ̃_{a_j} = µ_{(j)} + Σ_{(j),j}(a_j − µ_j)/σ_j², µ̃_{b_j} = µ_{(j)} + Σ_{(j),j}(b_j − µ_j)/σ_j², ϕ̃_j = ϕ_j + Σ_{j,(j)} ϕ_{(j)}/σ_j², Σ̃_j = Σ_{(j),(j)} − Σ_{(j),j} Σ_{j,(j)}/σ_j², c_j = (1 + ϕ_{(j)}^⊤ Σ̃_j ϕ_{(j)})^{−1/2}, τ̃_{a_j} = τ + ϕ̃_j(a_j − µ_j), and τ̃_{b_j} = τ + ϕ̃_j(b_j − µ_j).

Proof.
Let X = (X_1^⊤, X_0)^⊤ ∼ N_{p+1}(µ*, Ω) be as in Proposition 3. From the conditional distribution of a multivariate normal, it is straightforward to show that X_1 | X_0 ∼ N_p(µ − X_0 ∆, Γ) and X_0 | X_1 ∼ N_1(−∆^⊤Σ^{−1}(X_1 − µ), 1 − ∆^⊤Σ^{−1}∆). Then, from the identity f_{X_0|X_1}(τ̃ | x) f_{X_1}(x) = f_{X_0}(τ̃) f_{X_1|X_0}(x | τ̃), it holds that

φ_1(τ̃; −∆^⊤Σ^{−1}(x − µ), 1 − ∆^⊤Σ^{−1}∆) φ_p(x; µ, Σ) = φ_1(τ̃) φ_p(x; µ − τ̃∆, Γ).

Using that 1 − ∆^⊤Σ^{−1}∆ = 1/(1 + λ^⊤λ) and (1 + λ^⊤λ)^{1/2} ∆^⊤Σ^{−1} = λ^⊤Σ^{−1/2}, the last display can be rewritten as

φ_1(τ + λ^⊤Σ^{−1/2}(x − µ)) φ_p(x; µ, Σ) = φ_1(τ; 0, 1 + λ^⊤λ) φ_p(x; µ − µ_b, Γ).  (8)

Now, taking the derivative of the ESN density, we have

−∂/∂x ESN_p(x; µ, Σ, λ, τ)
= −ξ^{−1} { ∂/∂x φ_p(x; µ, Σ) × Φ_1(τ + λ^⊤Σ^{−1/2}(x − µ)) + φ_p(x; µ, Σ) × ∂/∂x Φ_1(τ + λ^⊤Σ^{−1/2}(x − µ)) }
= ξ^{−1} { Σ^{−1}(x − µ) φ_p(x; µ, Σ) Φ_1(τ + λ^⊤Σ^{−1/2}(x − µ)) − Σ^{−1/2}λ φ_1(τ + λ^⊤Σ^{−1/2}(x − µ)) φ_p(x; µ, Σ) }
= Σ^{−1}(x − µ) ESN_p(x; µ, Σ, λ, τ) − ξ^{−1}Σ^{−1/2}λ φ_1(τ + λ^⊤Σ^{−1/2}(x − µ)) φ_p(x; µ, Σ)
= Σ^{−1}(x − µ) ESN_p(x; µ, Σ, λ, τ) − η Σ^{−1/2}λ φ_p(x; µ − µ_b, Γ)    [by (8)]
= Σ^{−1} { (x − µ) ESN_p(x; µ, Σ, λ, τ) − δ φ_p(x; µ − µ_b, Γ) },

with η = φ_1(τ; 0, 1 + λ^⊤λ)/ξ and δ = η Σ^{1/2}λ. Multiplying both sides by x^κ and integrating from a to b, we have (after suppressing the arguments of 𝓕^p_κ and F^p_κ) that

d_κ = Σ^{−1} ( 𝓕^p_{κ+e_1} − µ_1 𝓕^p_κ − δ_1 F^p_κ, …, 𝓕^p_{κ+e_p} − µ_p 𝓕^p_κ − δ_p F^p_κ )^⊤,

where 𝓕^p_κ carries the arguments (a, b; µ, Σ, λ, τ) and F^p_κ the arguments (a, b; µ − µ_b, Γ), and the jth element of the left-hand side is

d_{κ,j} = −∫_{a_{(j)}}^{b_{(j)}} [x^κ ESN_p(x; µ, Σ, λ, τ)]_{x_j=a_j}^{x_j=b_j} dx_{(j)} + ∫_a^b k_j x^{κ−e_j} ESN_p(x; µ, Σ, λ, τ) dx,

by using integration by parts.
Using Proposition 2, we know that

ESN_p(x; µ, Σ, λ, τ)|_{x_j=a_j} = ESN_1(a_j; µ_j, σ_j², c_j σ_j ϕ̃_j, c_j τ) ESN_{p−1}(x_{(j)}; µ̃_{a_j}, Σ̃_j, Σ̃_j^{1/2} ϕ_{(j)}, τ̃_{a_j}),
ESN_p(x; µ, Σ, λ, τ)|_{x_j=b_j} = ESN_1(b_j; µ_j, σ_j², c_j σ_j ϕ̃_j, c_j τ) ESN_{p−1}(x_{(j)}; µ̃_{b_j}, Σ̃_j, Σ̃_j^{1/2} ϕ_{(j)}, τ̃_{b_j}),

and we obtain the expression for d_{κ,j} given in (7). Finally, multiplying both sides by Σ, we obtain (6). This completes the proof.

This delivers a simple way to compute any arbitrary moment 𝓕^p_κ of the multivariate TESN distribution from lower order terms: at most p + 1 of them are p-dimensional ESN integrals, up to 2p of them are (p − 1)-dimensional ESN integrals, and one is a normal integral F^p_κ, which can be easily computed through our proposed R package MomTrunc, available at CRAN. When k_j = 0, the first term in (7) vanishes. When a_j = −∞, the second term vanishes, and when b_j = +∞, the third term vanishes. When there is no truncation, that is, all the a_i's are −∞ and all the b_i's are +∞, for Y ∼ ESN_p(µ, Σ, λ, τ), we have that 𝓕^p_κ(−∞, +∞; µ, Σ, λ, τ) = E[Y^κ], and in this case the recursive relation is

E[Y^{κ+e_i}] = µ_i E[Y^κ] + δ_i E[W^κ] + Σ_{j=1}^p σ_{ij} k_j E[Y^{κ−e_j}],  i = 1, …, p,

with W ∼ N_p(µ − µ_b, Γ).

It is worth stressing that any arbitrary truncated moment of Y, that is,

E[Y^κ | a ≤ Y ≤ b] = 𝓕^p_κ(a, b; µ, Σ, λ, τ) / 𝓛_p(a, b; µ, Σ, λ, τ),  (9)

can be computed using the recurrence relation given in Theorem 1. In the next section, we propose another approach to compute (9) using a single corresponding arbitrary moment of a truncated normal vector.

Theorem 2. We have that

𝓕^p_κ(a, b; µ, Σ, λ, τ) = ξ^{−1} F^{p+1}_{κ*}(a*, b*; µ*, Ω),

with µ* and Ω as defined in Proposition 1, κ* = (κ^⊤, 0)^⊤, a* = (a^⊤, −∞)^⊤ and b* = (b^⊤, τ̃)^⊤. In particular, for κ = 0,

𝓛_p(a, b; µ, Σ, λ, τ) = ξ^{−1} L_{p+1}(a*, b*; µ*, Ω).  (10)

Proof.
The proof is straightforward from Proposition 3. Since an ESN variate can be written as Y =_d (X_1 | X_0 < τ̃), it follows that

𝓕^p_κ(a, b; µ, Σ, λ, τ) = ∫_a^b y^κ f_Y(y) dy
= [1/P(X_0 < τ̃)] ∫_a^b y^κ f_{X_1}(y) P(X_0 < τ̃ | X_1 = y) dy
= ξ^{−1} ∫_a^b ∫_{−∞}^{τ̃} y^κ f_{X_1}(y) f_{X_0|X_1}(x_0 | y) dx_0 dy
= ξ^{−1} ∫_{a*}^{b*} x^{κ*} f_X(x_1, x_0) dx = ξ^{−1} F^{p+1}_{κ*}(a*, b*; µ*, Ω),

since X = (X_1^⊤, X_0)^⊤ is distributed as N_{p+1}(µ*, Ω).

Equation (10) offers a very convenient manner to compute 𝓛_p(a, b; µ, Σ, λ, τ), since efficient algorithms already exist to calculate L_p(a, b; µ, Σ) (e.g., see [17]); this avoids performing p evaluations of the cdf of the multivariate N distribution.

Corollary 1. For Y ∼ ESN_p(µ, Σ, λ, τ) and X ∼ N_{p+1}(µ*, Ω), it follows from Theorem 2 that

E[Y^κ | a ≤ Y ≤ b] = E[X^{κ*} | a* ≤ X ≤ b*],

with a*, b*, κ*, µ* and Ω as defined in Theorem 2.
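As a quick numerical illustration of the normal relation (10) (our own sketch, not from the paper; p = 1 with arbitrary parameter values, SciPy), the ESN interval probability computed through the bivariate normal cdf can be compared with direct integration of the ESN density:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

lam, tau = 2.0, -0.5
v = 1.0 + lam**2
tau_t = tau / np.sqrt(v)                   # tilde-tau
xi = norm.cdf(tau_t)                       # normalizing constant
delta = lam / np.sqrt(v)
Omega = np.array([[1.0, -delta],
                  [-delta, 1.0]])
mvn = multivariate_normal(mean=[0.0, 0.0], cov=Omega)

a, b = -0.8, 1.2
# Relation (10): L_1(a, b; 0, 1, lam, tau) = [Phi_2((b, tau_t)) - Phi_2((a, tau_t))] / xi
prob_nr = (mvn.cdf([b, tau_t]) - mvn.cdf([a, tau_t])) / xi

# Direct numerical integration of the ESN_1 density on [a, b]
x = np.linspace(a, b, 400_001)
dens = norm.pdf(x) * norm.cdf(tau + lam * x) / xi
prob_direct = float(np.sum(dens) * (x[1] - x[0]))
```

Both routes give the same probability, while the normal-relation route needs only standard multivariate normal cdf software.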
Let us consider Y ∼ TESN_p(µ, Σ, λ, τ; [a, b]). In light of Theorem 1, we have that

E[Y_i] = µ_i + (1/𝓛) { δ_i L + Σ_{j=1}^p σ_{ij} [ ESN_1(a_j; µ_j, σ_j², c_j σ_j ϕ̃_j, c_j τ) 𝓛_{p−1}(a_{(j)}, b_{(j)}; µ̃_{a_j}, Σ̃_j, Σ̃_j^{1/2} ϕ_{(j)}, τ̃_{a_j}) − ESN_1(b_j; µ_j, σ_j², c_j σ_j ϕ̃_j, c_j τ) 𝓛_{p−1}(a_{(j)}, b_{(j)}; µ̃_{b_j}, Σ̃_j, Σ̃_j^{1/2} ϕ_{(j)}, τ̃_{b_j}) ] },

for i = 1, …, p, where 𝓛 ≡ 𝓛_p(a, b; µ, Σ, λ, τ) and L ≡ L_p(a, b; µ − µ_b, Γ). It follows that

E[Y] = µ + (1/𝓛)[L δ + Σ(q_a − q_b)],  (11)

where the jth elements of q_a and q_b are

q_{a,j} = ESN_1(a_j; µ_j, σ_j², c_j σ_j ϕ̃_j, c_j τ) 𝓛_{p−1}(a_{(j)}, b_{(j)}; µ̃_{a_j}, Σ̃_j, Σ̃_j^{1/2} ϕ_{(j)}, τ̃_{a_j}),
q_{b,j} = ESN_1(b_j; µ_j, σ_j², c_j σ_j ϕ̃_j, c_j τ) 𝓛_{p−1}(a_{(j)}, b_{(j)}; µ̃_{b_j}, Σ̃_j, Σ̃_j^{1/2} ϕ_{(j)}, τ̃_{b_j}).

Denoting D = [d_{e_1}, …, d_{e_p}], we can write

E[YY^⊤] = µ E[Y]^⊤ + (1/𝓛)[L δ E[W]^⊤ + Σ D],
cov[Y] = (µ − E[Y]) E[Y]^⊤ + (1/𝓛)[L δ E[W]^⊤ + Σ D],

where W ∼ TN_p(µ − µ_b, Γ; [a, b]), that is, a p-variate truncated normal distribution on [a, b].

Besides, from Corollary 1, we have that the first two moments of Y can also be computed as

E[Y] = E[X]_{(p+1)},  (12)
E[YY^⊤] = E[XX^⊤]_{(p+1,p+1)},  (13)

with X ∼ TN_{p+1}(µ*, Ω; [a*, b*]). Note that cov[Y] = E[YY^⊤] − E[Y]E[Y]^⊤. Equations (12) and (13) are more convenient for computing E[Y] and cov[Y], since everything boils down to computing the mean and variance-covariance matrix of a (p + 1)-variate TN distribution, whose integrals are less complex than the ESN ones.

Several approaches exist to compute the moments of a TN distribution. For instance, for double truncation, [18] (method available through the tmvtnorm R package) computed the mean and variance of X by directly differentiating the MGF of the TN distribution.
On the other hand, [12] (method available through the MomTrunc R package) is able to compute arbitrary higher order TN moments using a recursive approach obtained by differentiating the multivariate normal density. For right truncation, [19] (see its Supplemental Material) proposed a method to compute the mean and variance of X also by differentiating the MGF, but where the off-diagonal elements of the Hessian matrix are recycled in order to compute its diagonal, leading to a faster algorithm. Next, we present an extension of the algorithm of [19] to handle double truncation.

Theorem 3. Let X ∼ TN_p(0, R; [a, b]), with R being a correlation matrix of order p × p, and let m(t) denote the MGF of X. Then the first two moments of X are given by

E[X] = ∂m(t)/∂t |_{t=0} = (1/L) Rq,  E[XX^⊤] = ∂²m(t)/∂t∂t^⊤ |_{t=0} = R + (1/L) RHR,

and consequently,

cov[X] = R + (1/L²) R (L H − qq^⊤) R,

where L ≡ L_p(a, b; 0, R), q = q_a − q_b, with the ith elements of q_a and q_b given by

q_{a,i} = φ_1(a_i) L_{p−1}(a_{(i)}, b_{(i)}; a_i R_{(i),i}, R̃_i)  and  q_{b,i} = φ_1(b_i) L_{p−1}(a_{(i)}, b_{(i)}; b_i R_{(i),i}, R̃_i),

and H being a symmetric matrix of dimension p × p, with off-diagonal elements h_{ij} given by

h_{ij} = h^{aa}_{ij} − h^{ba}_{ij} − h^{ab}_{ij} + h^{bb}_{ij}
= φ_2(a_i, a_j; ρ_{ij}) L_{p−2}(a_{(i,j)}, b_{(i,j)}; µ^{aa}_{ij}, R̃_{ij}) − φ_2(b_i, a_j; ρ_{ij}) L_{p−2}(a_{(i,j)}, b_{(i,j)}; µ^{ba}_{ij}, R̃_{ij})
− φ_2(a_i, b_j; ρ_{ij}) L_{p−2}(a_{(i,j)}, b_{(i,j)}; µ^{ab}_{ij}, R̃_{ij}) + φ_2(b_i, b_j; ρ_{ij}) L_{p−2}(a_{(i,j)}, b_{(i,j)}; µ^{bb}_{ij}, R̃_{ij}),

and diagonal elements

h_{ii} = a_i q_{a,i} − b_i q_{b,i} − R_{i,(i)} H_{(i),i},  (14)

with R̃_i = R_{(i),(i)} − R_{(i),i} R_{i,(i)}, µ^{αβ}_{ij} = R_{(i,j),[i,j]} R_{[i,j],[i,j]}^{−1} (α_i, β_j)^⊤ and R̃_{ij} = R_{(i,j),(i,j)} − R_{(i,j),[i,j]} R_{[i,j],[i,j]}^{−1} R_{[i,j],(i,j)}.

Proof. See Appendix A.

The main difference between our proposal in Theorem 3 and other approaches that differentiate the MGF lies in (14), where the diagonal elements are recycled from the off-diagonal elements h_{ij}, 1 ≤ i ≠ j ≤ p. Furthermore, for W ∼ TN_p(µ, Σ; [ã, b̃]), with Σ a positive-definite matrix, we have that

E[W] = µ + S E[X],  (15)
cov[W] = S cov[X] S,  (16)

where S = diag(σ_1, σ_2, …, σ_p) holds the square roots of the diagonal of Σ, R = S^{−1}ΣS^{−1}, and the truncation limits ã and b̃ are such that a = S^{−1}(ã − µ) and b = S^{−1}(b̃ − µ).

Let us consider Y ∼ ESN_p(µ, Σ, λ, τ). As τ → +∞, we have that ξ = Φ_1(τ̃) → 1. Besides, as τ → −∞, we have that ξ → 0, and consequently the factor ξ^{−1} in 𝓕^p_κ(a, b; µ, Σ, λ, τ) = ξ^{−1} F^{p+1}_{κ*}(a*, b*; µ*, Ω) diverges. Thus, for sufficiently small negative τ̃ values, we are not able to compute E[Y^κ] due to finite computational precision; for instance, in the R software, Φ_1(τ̃) underflows to exactly 0 once τ̃ is small enough. The next proposition helps us to circumvent this problem.

Proposition 5.
(Limiting distribution of the ESN) As τ → −∞,

ESN_p(y; µ, Σ, λ, τ) → φ_p(y; µ − µ_b, Γ).

Proof.
Let X_0 ∼ N(0, 1). As τ̃ → −∞, we have that P(X_0 ≤ τ̃) → 0, E[X_0 | X_0 ≤ τ̃] → τ̃ and var[X_0 | X_0 ≤ τ̃] → 0 (i.e., X_0 becomes degenerate at τ̃). In light of Proposition 3, Y =_d (X_1 | X_0 = τ̃), and by the conditional distribution of a multivariate normal, it is straightforward to show that E[X_1 | X_0 = τ̃] = µ − µ_b and cov[X_1 | X_0 = τ̃] = Γ, which concludes the proof.

While using the normal relations (12) and (13), we may also face numerical problems for extreme settings of λ and τ, because the scale matrix Ω depends on them. The most common problem is that the normalizing constant L_{p+1}(a*, b*; µ*, Ω) is approximately zero, because the probability density has been shifted far from the integration region. It is worth mentioning that, for these cases, it is not even possible to estimate the moments by generating Monte Carlo (MC) samples, due to the high rejection rate when restricting to a small integration region.

For instance, consider a bivariate truncated normal vector X = (X_1, X_2)^⊤, with X_1 and X_2 having zero mean and unit variance, negative correlation, and a truncation box [a, b] lying far out in the tail of the distribution, so that P(a ≤ X ≤ b) ≈ 0. For this case, both the mtmvnorm function from the tmvtnorm R package and the Matlab codes provided in [12] return wrong mean values lying outside the truncation interval (a, b) and negative variances, with absurdly large magnitudes. When the upper limit of the first coordinate is pushed even further into the tail, both routines return Inf and NaN values for all the elements.

Although the above scenarios may seem unusual, extreme situations requiring correction are more common than expected. In fact, this development was motivated by our identification of this problem when fitting censored regression models with high asymmetry and presence of outliers. Hence, we present a correction method to approximate the mean and variance-covariance matrix of a multivariate TN distribution even when the numerical precision of the software is a limitation.
Dealing with out-of-bounds limits
Consider the partition X = (X_1^⊤, X_2^⊤)^⊤ such that dim(X_1) = p_1 and dim(X_2) = p_2, where p_1 + p_2 = p. It is well known that

E[X] = ( E[E[X_1|X_2]] ; E[X_2] )  and  cov[X] = ( E[cov[X_1|X_2]] + cov[E[X_1|X_2]]   cov[E[X_1|X_2], X_2]
                                                   cov[X_2, E[X_1|X_2]]                 cov[X_2] ).

Now, consider X ∼ TN_p(µ, Σ; [a, b]) partitioned as above, together with the corresponding partitions of µ, Σ, a = (a_1^⊤, a_2^⊤)^⊤ and b = (b_1^⊤, b_2^⊤)^⊤. We say that the limits [a_2, b_2] of X_2 are out-of-bounds if P(a_2 ≤ X_2 ≤ b_2) ≈ 0.

Let us consider the case where we are not able to compute any moment of X because there exists a partition X_2 of X, of dimension p_2, that is out-of-bounds. Note that this happens because L_p(a, b; µ, Σ) ≤ P(a_2 ≤ X_2 ≤ b_2) ≈ 0. Also, we take the partition X_1 such that P(a_1 ≤ X_1 ≤ b_1) > 0. Since the limits of X_2 are out-of-bounds (and a_2 < b_2), we have two possible cases: b_2 → −∞ or a_2 → ∞. For convenience, let ξ_2 = E[X_2] and Ψ_2 = cov[X_2]. For the first case, as b_2 → −∞, we have that ξ_2 → b_2 and Ψ_2 → 0_{p_2 × p_2}. Analogously, ξ_2 → a_2 and Ψ_2 → 0_{p_2 × p_2} as a_2 → ∞. Then X_2 is degenerate at ξ_2, and

X_1 | X_2 ∼ TN_{p_1}(µ_1 + Σ_{12}Σ_{22}^{−1}(ξ_2 − µ_2), Σ_{11} − Σ_{12}Σ_{22}^{−1}Σ_{21}; [a_1, b_1]).

Given that cov[E[X_1|X_2]] = 0_{p_1 × p_1} and cov[E[X_1|X_2], X_2] = 0_{p_1 × p_2}, it follows that

E[X] = ( ξ_{1.2} ; ξ_2 )  and  cov[X] = ( Ψ_{1.2}          0_{p_1 × p_2}
                                          0_{p_2 × p_1}    0_{p_2 × p_2} ),  (17)

with ξ_{1.2} = E[X_1|X_2] and Ψ_{1.2} = cov[X_1|X_2] being the mean and variance-covariance matrix of the TN distribution above, which can be computed using (15) and (16).

In the event that there are double infinite limits, we can partition the vector as well, in order to avoid unnecessary calculation of these integrals.

Dealing with double infinite limits
Let p_1 be the number of pairs in [a, b] that are both infinite. We consider the partition X = (X_1^⊤, X_2^⊤)^⊤ such that the upper and lower truncation limits associated with X_1 are both infinite, while at least one of the truncation limits associated with X_2 is finite. Since a_1 = −∞ and b_1 = ∞, the truncation operates only on X_2; hence it follows that

X_2 ∼ TN_{p_2}(µ_2, Σ_{22}; [a_2, b_2])  and  X_1 | X_2 ∼ N_{p_1}(µ_1 + Σ_{12}Σ_{22}^{−1}(X_2 − µ_2), Σ_{11} − Σ_{12}Σ_{22}^{−1}Σ_{21}).

This leads to

E[X] = E( µ_1 + Σ_{12}Σ_{22}^{−1}(X_2 − µ_2) ; X_2 ) = ( µ_1 + Σ_{12}Σ_{22}^{−1}(ξ_2 − µ_2) ; ξ_2 ),  (18)

and

cov[X] = ( Σ_{11} − Σ_{12}Σ_{22}^{−1}(I_{p_2} − Ψ_2 Σ_{22}^{−1})Σ_{21}    Σ_{12}Σ_{22}^{−1}Ψ_2
           Ψ_2 Σ_{22}^{−1}Σ_{21}                                           Ψ_2 ),  (19)

with ξ_2 and Ψ_2 being the mean vector and variance-covariance matrix of the TN distribution of X_2, which can be computed using (15) and (16) as well.

As can be seen, we can use equations (18) and (19) to deal with double infinite limits, where the truncated moments are computed only over a p_2-variate partition, avoiding unnecessary integrals and saving computational effort. On the other hand, expression (17) lets us approximate the mean and the variance-covariance matrix in cases where computational precision is a limitation.
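As a sanity check of the partitioning idea behind (18) (our own sketch with hypothetical parameter values, using SciPy's univariate truncnorm for the truncated block; here the truncated coordinate is listed first, so the roles of the two blocks are swapped relative to the text), the mean of the unbounded coordinate follows from the truncated coordinate through the conditional-mean relation, and agrees with plain rejection sampling:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 1.0]])
a1, b1 = -0.5, 1.0    # first coordinate doubly truncated; second has limits (-inf, inf)

# Monte Carlo: rejection sampling from the bivariate normal
L = np.linalg.cholesky(Sigma)
draws = (L @ rng.standard_normal((2, 2_000_000))).T + mu
kept = draws[(draws[:, 0] >= a1) & (draws[:, 0] <= b1)]
mc_mean = kept.mean(axis=0)

# Partition formula: mean of the truncated coordinate from univariate TN moments,
# then the unbounded coordinate via the conditional-mean relation of (18).
# (truncnorm takes standardized limits; mu = 0, scale = 1 here, so a1, b1 are already standardized.)
xi1 = truncnorm.mean(a1, b1, loc=mu[0], scale=1.0)
e2 = mu[1] + Sigma[1, 0] / Sigma[0, 0] * (xi1 - mu[0])
```

Only a univariate truncated-normal moment is ever computed, which is the computational saving the partition is after.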
Since this is the first attempt to compute the moments of a TESN distribution, it is not possible to compare our approach with methods already implemented in statistical software, for instance in R or Stata. However, this section compares possible approaches, all based on our results, for computing the mean vector and variance-covariance matrix of a p-variate TESN distribution. We consider our first proposal, derived in Theorem 1 directly from the ESN pdf, as well as the normal relation given in Theorem 2; for the latter, we use different (some existing) methods for computing the mean and variance-covariance of a TN distribution. The methods we compare are the following:

Proposal 1: Theorem 1, i.e., equations (11) and (13),
Proposal 2: the normal relation (NR) in Theorem 2 using Theorem 3,
Proposal 3: the NR in Theorem 2 using the Matlab routine from [12],
Proposal 4: the NR in Theorem 2 using the tmvtnorm R function from [18].

The left panel of Figure 1 shows the number of integrals required for different dimensions p. We compare proposal 1 for a p-variate TESN distribution with the equivalent (p + 1)-variate normal approaches, proposals 2 and 3.
Figure 1: Number of integrals required and absolute processing time (in seconds) for computing the mean vector and variance-covariance matrix of a p-variate TESN distribution, for the different approaches under double truncation.

The importance of the newly proposed method is clear, since under double truncation it reduces the number of integrals involved almost by half compared with the direct TESN results of proposal 1. In particular, for left/right truncation, the equivalent (p + 1)-variate normal approach along with [19] (now a special case of proposal 2) requires up to 4 times fewer integrals than proposal 3. As seen before, the normal-relation proposal 2 outperforms proposal 1; that is, the equivalent normal approach is always faster even though it works in one more dimension, i.e., with a (p + 1)-variate normal vector, because its integrals are less complex than the ESN ones.

Processing times when using the equivalent normal approach are depicted in the right panel of Figure 1. Here, we compare the absolute processing time for the mean and variance-covariance of a TN distribution under the methods in proposals 2, 3 and 4, for different dimensions p. In general, our proposal is the fastest one, as expected. Proposal 3 performed better only for small p, which confirms the need for a faster algorithm able to deal with high dimensional problems. Proposal 4 was by far the slowest.

Computational time in real life:
For applications where a single truncated expectation is required (for example, conditional tail expectations as a measure of risk in finance), the computational cost may seem insignificant; however, iterative algorithms that depend on these quantities become computationally intensive. For instance, in longitudinal censored models under a frequentist point of view, an EM algorithm reduces to the computation of the moments of a multivariate truncated distribution (Lachos et al., 2017) at each iteration, for all censored observations across subjects. Hence, 125K integrals would be required for an algorithm that converges in 250 iterations on a modest dataset with 100 subjects and only four censored observations each. Other models, such as geostatistical models, are even more demanding, so small differences in time can separate a tractable problem from an intractable one; moreover, without closed forms, these expectations must be approximated by invoking Monte Carlo methods.
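To make the arithmetic of the example above explicit, it can be reproduced in a few lines. This is a rough cost model only: the value of roughly five one-dimensional integrals per subject per iteration is an assumed figure chosen to reproduce the 125K total quoted in the text, since the exact count depends on the dimension of each censored block and on the method used.

```python
# Rough cost model for the EM example in the text. The per-subject,
# per-iteration integral count (5) is an ASSUMED illustrative value.

def total_integrals(iterations: int, subjects: int, per_subject: int) -> int:
    """Total number of integrals evaluated over a complete EM run."""
    return iterations * subjects * per_subject

# 250 EM iterations, 100 subjects, ~5 integrals per subject per iteration:
print(total_integrals(250, 100, 5))  # -> 125000
```

Even small per-integral savings are multiplied by this factor, which is why the differences in Figure 1 matter in practice.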
First, we establish some general results for the pdf, cdf and moments of multivariate folded distributions (MFD). These extend the results found in [20] for the FN distribution to arbitrary multivariate distributions, including the multivariate location-scale family. The proofs are given in Appendix A.
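The sign-flip construction behind these results can be checked numerically. The sketch below (Python, assuming a bivariate normal $X$; the parameter values are illustrative) verifies that summing the density of $X$ over all sign matrices $\Lambda_s$ yields a valid density on the positive orthant, and that the signed sum of cdfs matches a Monte Carlo estimate of $P(|X| \le y)$.

```python
import itertools

import numpy as np
from scipy.integrate import dblquad
from scipy.stats import multivariate_normal

# Sanity check of the folding construction for Y = |X|, X ~ N_2(mu, Sigma):
#   f_Y(y) = sum_{s in {-1,1}^2} f_X(Lambda_s y)
#   F_Y(y) = sum_{s in {-1,1}^2} pi_s F_X(Lambda_s y),  pi_s = s_1 * s_2.
mu = np.array([0.5, -1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
rv = multivariate_normal(mean=mu, cov=Sigma)
signs = list(itertools.product([-1.0, 1.0], repeat=2))

def f_Y(y1, y2):
    y = np.array([y1, y2])
    return sum(rv.pdf(np.diag(s) @ y) for s in signs)

# The folded density must integrate to 1 over the positive orthant.
mass, _ = dblquad(f_Y, 0, np.inf, 0, np.inf)
print(round(mass, 6))  # -> 1.0

# The signed sum of cdfs must match a Monte Carlo estimate of P(Y <= y).
y = np.array([1.0, 1.5])
cdf_folded = sum(np.prod(s) * rv.cdf(np.diag(s) @ y) for s in signs)
rng = np.random.default_rng(0)
samples = np.abs(rng.multivariate_normal(mu, Sigma, size=200_000))
emp = np.mean(np.all(samples <= y, axis=1))
assert abs(cdf_folded - emp) < 0.01
```

The same check scales to any $p$ by iterating over the $2^p$ elements of $\{-1,1\}^p$, which is exactly the cost issue discussed later for the folded moments.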
Theorem 4 (pdf and cdf of a MFD). Let $X \in \mathbb{R}^p$ be a $p$-variate random vector with pdf $f_X(x;\theta)$ and cdf $F_X(x;\theta)$, with $\theta$ being the set of parameters characterizing the distribution. If $Y = |X|$, then the joint pdf and cdf of $Y$, which follows the folded distribution of $X$, are given respectively by
$$f_Y(y) = \sum_{s\in S(p)} f_X(\Lambda_s y;\theta) \quad\text{and}\quad F_Y(y) = \sum_{s\in S(p)} \pi_s F_X(\Lambda_s y;\theta),$$
for $y \ge 0$, where $S(p) = \{-1,1\}^p$ is a Cartesian product with $2^p$ elements, each of the form $s = (s_1,\ldots,s_p)$, $\Lambda_s = \mathrm{Diag}(s)$ and $\pi_s = \prod_{i=1}^p s_i$.

Corollary 2. If $X \sim f_X(x;\xi,\Psi)$ belongs to the location-scale family of distributions, with location and scale parameters $\xi$ and $\Psi$ respectively, then $Z_s = \Lambda_s X \sim f_X(z;\Lambda_s\xi,\Lambda_s\Psi\Lambda_s)$ and consequently the joint pdf and cdf of $Y = |X|$ are given by
$$f_Y(y) = \sum_{s\in S(p)} f_X(y;\Lambda_s\xi,\Lambda_s\Psi\Lambda_s) \quad\text{and}\quad F_Y(y) = \sum_{s\in S(p)} \pi_s F_X(\Lambda_s y;\xi,\Psi),$$
for $y \ge 0$. Hence, the $\kappa$-th moment of $Y$ follows as
$$\mathbb{E}[Y^{\kappa}] = \sum_{s\in S(p)} \mathbb{E}[(Z_s^{\kappa})^{+}],$$
where $X^{+}$ denotes the positive component of the random vector $X$.

Let $X \sim \mathrm{ESN}_p(\mu,\Sigma,\lambda,\tau)$. We now turn our attention to the computation of an arbitrary order moment of $|X|$, which follows a FESN distribution. Define the function $I^p_{\kappa} \equiv I^p_{\kappa}(\mu,\Sigma,\lambda,\tau)$ as
$$I^p_{\kappa}(\mu,\Sigma,\lambda,\tau) = \int_{0}^{\infty} y^{\kappa}\,\mathrm{ESN}_p(y;\mu,\Sigma,\lambda,\tau)\,\mathrm{d}y.$$
Note that $I^p_{\kappa}$ is the special case of $F^p_{\kappa}$ that occurs when $a_i = 0$ and $b_i = +\infty$, $i = 1,\ldots,p$. In this scenario we have $I^p_{\kappa}(\mu,\Sigma,\lambda,\tau) = F^p_{\kappa}(0,+\infty;\mu,\Sigma,\lambda,\tau)$. When $\lambda = 0$ and $\tau = 0$, that is, the normal case, we write $I^p_{\kappa}(\mu,\Sigma,0,0) = I^p_{\kappa}(\mu,\Sigma)$.

Proposition 6. If $X \sim \mathrm{ESN}_p(\mu,\Sigma,\lambda,\tau)$, then $Z_s = \Lambda_s X \sim \mathrm{ESN}_p(\mu_s,\Sigma_s,\lambda_s,\tau)$ and consequently the joint pdf, cdf and $\kappa$-th raw moment of $Y = |X|$ are respectively given by
$$f_Y(y) = \sum_{s\in S(p)} \mathrm{ESN}_p(y;\mu_s,\Sigma_s,\lambda_s,\tau), \qquad F_Y(y) = L_p(-y,y;\mu,\Sigma,\lambda,\tau),$$
and
$$\mathbb{E}[Y^{\kappa}] = \sum_{s\in S(p)} I^p_{\kappa}(\mu_s,\Sigma_s,\lambda_s,\tau),$$
where $y_s = \Lambda_s y$, $\mu_s = \Lambda_s\mu$, $\Sigma_s = \Lambda_s\Sigma\Lambda_s$ and $\lambda_s = \Lambda_s\lambda$.

Proof. It suffices to show that, if $X \sim \mathrm{ESN}_p(\mu,\Sigma,\lambda,\tau)$, then $Z_s = \Lambda_s X \sim \mathrm{ESN}_p(\mu_s,\Sigma_s,\lambda_s,\tau)$, since the rest of the proposition is straightforward. We have that
$$\begin{aligned}
\mathrm{ESN}_p(x;\mu_s,\Sigma_s,\lambda_s,\tau) &= \xi^{-1}\phi_p(x;\Lambda_s\mu,\Lambda_s\Sigma\Lambda_s)\,\Phi\big(\tau + (\Lambda_s\lambda)^{\top}(\Lambda_s\Sigma\Lambda_s)^{-1/2}(x-\Lambda_s\mu)\big)\\
&= \xi^{-1}|\Lambda_s\Lambda_s|^{1/2}\phi_p(\Lambda_s^{-1}x;\mu,\Sigma)\,\Phi\big(\tau + \lambda^{\top}\Lambda_s(\Lambda_s\Sigma\Lambda_s)^{-1/2}\Lambda_s(\Lambda_s^{-1}x-\mu)\big)\\
&= \xi^{-1}\phi_p(\Lambda_s x;\mu,\Sigma)\,\Phi\big(\tau + \lambda^{\top}\Lambda_s(\Lambda_s\Sigma\Lambda_s)^{-1/2}\Lambda_s(\Lambda_s x-\mu)\big) \qquad (20)\\
&= \xi^{-1}\phi_p(\Lambda_s x;\mu,\Sigma)\,\Phi\big(\tau + \lambda^{\top}\Sigma^{-1/2}(\Lambda_s x-\mu)\big) \qquad (21)\\
&= \mathrm{ESN}_p(\Lambda_s x;\mu,\Sigma,\lambda,\tau),
\end{aligned}$$
where $\xi = \Phi\big(\tau/\sqrt{1+\lambda_s^{\top}\lambda_s}\big)$ is unchanged, due to $\lambda_s^{\top}\lambda_s = \lambda^{\top}\lambda$.

In order to equate (20) and (21), we see that it suffices to show that $\Sigma^{-1/2} = \Lambda_s(\Lambda_s\Sigma\Lambda_s)^{-1/2}\Lambda_s$. This is equivalent to showing that $A = B$ for $A = (\Lambda_s\Sigma\Lambda_s)^{1/2}$ and $B = \Lambda_s\Sigma^{1/2}\Lambda_s$. Both $A$ and $B$ are positive-definite matrices, since $(\Lambda_s\Sigma\Lambda_s)^{1/2}$ and $\Sigma^{1/2}$ are too, as a consequence of their being obtained via the singular value decomposition (SVD). Finally, given that $A^2 = B^2 = \Lambda_s\Sigma\Lambda_s$ and that any positive-definite matrix has a unique positive-definite square root, we conclude that $A = B$ by uniqueness, which concludes the proof. Remark 1.
As a consequence of Proposition 6, we also have the new vectors $\delta_s = \Lambda_s\delta$, $\mu_{bs} = \Lambda_s\mu_b$, $\varphi_s = \Lambda_s\varphi$, $\tilde\varphi_s = \Lambda_s\tilde\varphi$, $\tilde\mu_{a_j s} = \Lambda_{s(j)}\tilde\mu_{a_j}$ and $\tilde\mu_{b_j s} = \Lambda_{s(j)}\tilde\mu_{b_j}$, and the matrix $\Gamma_s = \Lambda_s\Gamma\Lambda_s$, while the constants $\xi$, $\eta$, $c_j$, $\tilde\Sigma_j$ and $\tilde\tau_j$ remain invariant with respect to $s$.

From Proposition 6, we can compute any arbitrary moment of a FESN distribution as a sum of $I^p_{\kappa}$ integrals. In light of Theorem 1, the recurrence relation for $I^p_{\kappa}$ can be written as
$$I^p_{\kappa+e_i}(\mu,\Sigma,\lambda,\tau) = \mu_i I^p_{\kappa}(\mu,\Sigma,\lambda,\tau) + \delta_i I^p_{\kappa}(\mu-\mu_b,\Gamma) + \sum_{j=1}^p \sigma_{ij}\,d_{\kappa,j}, \quad i = 1,\ldots,p, \qquad (22)$$
where
$$d_{\kappa,j} = \begin{cases} k_j\, I^p_{\kappa-e_j}(\mu,\Sigma,\lambda,\tau), & \text{for } k_j > 0,\\[2pt] \mathrm{ESN}_1(0;\mu_j,\sigma_j^2,c_j\sigma_j\tilde\varphi_j,c_j\tau)\; I^{p-1}_{\kappa_{(j)}}(\tilde\mu_j,\tilde\Sigma_j,\tilde\Sigma_j^{1/2}\varphi_{(j)},\tilde\tau_j), & \text{for } k_j = 0,\end{cases}$$
with $\tilde\mu_j = \mu_{(j)} - \dfrac{\mu_j}{\sigma_j^2}\Sigma_{(j)j}$ and $\tilde\tau_j = \tau - \tilde\varphi_j\mu_j$.

It is also possible to use the normal relation in Theorem 2 to compute $\mathbb{E}[|X|^{\kappa}]$ in a simpler manner, as in the next proposition. Proposition 7.
Let $Y = |X|$, with $X \sim \mathrm{ESN}_p(\mu,\Sigma,\lambda,\tau)$. In light of Theorem 4, it follows that
$$\mathbb{E}[Y^{\kappa}] = \xi^{-1} \sum_{s\in S(p)} I^{p+1}_{\kappa^{*}}(\mu^{*}_s,\Omega^{-}_s),$$
where $I^p_{\kappa}(\mu,\Sigma) \equiv F^p_{\kappa}(0,\infty;\mu,\Sigma)$, $\mu^{*}_s = (\mu_s^{\top},\tilde\tau)^{\top}$ and
$$\Omega_s = \begin{pmatrix}\Sigma_s & -\Delta_s\\ -\Delta_s^{\top} & 1\end{pmatrix},$$
with $\mu_s = \Lambda_s\mu$, $\Sigma_s = \Lambda_s\Sigma\Lambda_s$, $\Delta_s = \Lambda_s\Delta$, and $\Omega^{-}_s$ standing for the block matrix $\Omega_s$ with the signs of all its off-diagonal block elements changed. The proof is direct from Theorem 2, since $I^p_{\kappa}$ is a special case of $F^p_{\kappa}$.

From Proposition 2, we have that the mean and variance-covariance matrix can also be calculated as a sum of $2^p$ terms, that is,
$$\mathbb{E}[Y] = \sum_{s\in S(p)} \mathbb{E}[Z^{+}_s], \qquad (23)$$
$$\mathrm{cov}[Y] = \sum_{s\in S(p)} \mathbb{E}\big[Z^{+}_s Z^{+\top}_s\big] - \mathbb{E}[Y]\,\mathbb{E}[Y]^{\top}, \qquad (24)$$
where $Z^{+}_s$ is the positive component of $Z_s = \Lambda_s X \sim \mathrm{ESN}_p(\mu_s,\Sigma_s,\lambda_s,\tau)$. Note that there are $2^p$ times more integrals to be calculated than in the non-folded case, representing a huge computational effort for high-dimensional problems.

In order to circumvent this, we can use the fact that $\mathbb{E}[Y] = (\mathbb{E}[Y_1],\ldots,\mathbb{E}[Y_p])^{\top}$ and that the elements of $\mathbb{E}[YY^{\top}]$ are given by the second moments $\mathbb{E}[Y_i^2]$ and $\mathbb{E}[Y_iY_j]$, $1 \le i \neq j \le p$. Thus, it is possible to obtain the mean vector and variance-covariance matrix of the FESN based only on the marginal univariate means and variances of the $Y_i$, together with the covariance terms $\mathrm{cov}(Y_i,Y_j)$. Next, we propose explicit expressions for the mean and variance-covariance matrix of the multivariate FESN distribution.

Let $X \sim \mathrm{ESN}_p(\mu,\Sigma,\lambda,\tau)$. Obtaining the mean and covariance matrix of $|X|$ boils down to computing $\mathbb{E}[|X_i|]$, $\mathbb{E}[|X_i|^2]$ and $\mathbb{E}[|X_iX_j|]$. Consider $X_i$ to be the $i$-th marginal partition of $X$, distributed as $X_i \sim \mathrm{ESN}_1(\mu_i,\sigma_i^2,\lambda_i,\tau_i)$. In light of Proposition 6, it follows that
$$\mathbb{E}[|X_i|^k] = I^1_{k}(\mu_i,\sigma_i^2,\lambda_i,\tau_i) + I^1_{k}(-\mu_i,\sigma_i^2,-\lambda_i,\tau_i).$$
Thus, using the recurrence relation for $I^1_k$ in (22), and following the notation of Subsection 3.1.1, we can write explicit expressions for $\mathbb{E}[|X_i|]$ and $\mathbb{E}[|X_i|^2]$. Higher-order moments for the univariate FESN and other related distributions are detailed in Appendix B.

It remains to obtain $\mathbb{E}[|X_iX_j|]$ for $i \neq j$, which can be computed as
$$\begin{aligned}\mathbb{E}[|X_iX_j|] ={}& I_{1,1}(\mu_i,\mu_j,\sigma_i^2,\sigma_{ij},\sigma_j^2,\lambda_i,\lambda_j,\tau) + I_{1,1}(\mu_i,-\mu_j,\sigma_i^2,-\sigma_{ij},\sigma_j^2,\lambda_i,-\lambda_j,\tau)\\ &+ I_{1,1}(-\mu_i,\mu_j,\sigma_i^2,-\sigma_{ij},\sigma_j^2,-\lambda_i,\lambda_j,\tau) + I_{1,1}(-\mu_i,-\mu_j,\sigma_i^2,\sigma_{ij},\sigma_j^2,-\lambda_i,-\lambda_j,\tau),\end{aligned} \qquad (25)$$
as pointed out in Proposition 6, with $(X_i,X_j)$ denoting an arbitrary bivariate partition of $X$. Without loss of generality, consider the partition $(X_1,X_2)^{\top} \sim \mathrm{ESN}_2(\mu,\Sigma,\lambda,\tau)$ and $(W_1,W_2)^{\top} \sim N_2(m,\Gamma)$ with $m = \mu - \mu_b$. For simplicity, we denote $I_{1,1} \equiv I_{1,1}(\mu,\Sigma,\lambda,\tau)$, and the normalizing constants $L \equiv L_2(0,\infty;\mu,\Sigma,\lambda,\tau)$ and $L_0 \equiv L_2(0,\infty;\mu-\mu_b,\Gamma)$.

Using the recurrence relation for $I_{\kappa+e_i}$ in (22), we can obtain $I_{1,1}$ for $\kappa = (1,0)^{\top}$ and $e_i = (0,1)^{\top}$ as
$$\begin{aligned}I_{1,1} ={}& (\mu_1\mu_2+\sigma_{12})L + \big(\delta_1\mu_2+\delta_2(\mu_1-\mu_{b1})\big)L_0 + \mu_2\big(\sigma_1^2\,\tilde\phi^{(1)}(1-\tilde\Phi_{(2.1)}) + \sigma_{12}\,\tilde\phi^{(2)}(1-\tilde\Phi_{(1.2)})\big)\\ &+ \delta_2\big[\gamma_{11}\,\phi_1(\mu_1;\mu_{b1},\gamma_{11})\big(1-\Phi_1(0;m_{2.1},\gamma_{2.1})\big) + \gamma_{12}\,\phi_1(\mu_2;\mu_{b2},\gamma_{22})\big(1-\Phi_1(0;m_{1.2},\gamma_{1.2})\big)\big]\\ &+ \sigma_2^2\,\tilde\phi^{(2)}\, I^1_{1}(\mu_{1.2},\sigma_{1.2},\sigma_{1.2}^{1/2}\varphi_1,\tau_{1.2}),\end{aligned}$$
where $m_{1.2} = m_1 - \gamma_{12}m_2/\gamma_{22}$, $m_{2.1} = m_2 - \gamma_{12}m_1/\gamma_{11}$, $\gamma_{1.2} = \gamma_{11} - \gamma_{12}^2/\gamma_{22}$, $\gamma_{2.1} = \gamma_{22} - \gamma_{12}^2/\gamma_{11}$, and, in light of Proposition 2, $\tilde\Phi_{(2.1)} \equiv \tilde\Phi(0;\mu_{2.1},\sigma_{2.1},\sigma_{2.1}^{1/2}\varphi_2,\tau_{2.1})$, $\tilde\Phi_{(1.2)} \equiv \tilde\Phi(0;\mu_{1.2},\sigma_{1.2},\sigma_{1.2}^{1/2}\varphi_1,\tau_{1.2})$, and $\tilde\phi^{(\ell)} \equiv \mathrm{ESN}_1(0;\mu_\ell,\sigma_\ell^2,c_\ell\sigma_\ell\tilde\varphi_\ell,c_\ell\tau)$ for $\ell \in \{1,2\}$.

Using Remark 1 along with (25), we finally obtain an explicit expression for $\mathbb{E}[|X_iX_j|]$ as
$$\begin{aligned}\mathbb{E}[|X_iX_j|] ={}& (\mu_i\mu_j+\sigma_{ij})\big(1-2(\tilde\Phi^{(1)}+\tilde\Phi^{(2)})\big) + \big(\delta_i\mu_j+\delta_j(\mu_i-\mu_{bi})\big)\big(1-2(\Phi^{(1)}+\Phi^{(2)})\big)\\ &+ 2\mu_j\big[\sigma_i^2\,\tilde\phi^{(i)}\big(1-\tilde\Phi_{(j.i)}\big) + \sigma_{ij}\,\tilde\phi^{(j)}\big(1-\tilde\Phi_{(i.j)}\big)\big]\\ &+ 2\delta_j\big[\gamma_{ii}\,\phi_1(\mu_i;\mu_{bi},\gamma_{ii})\big(1-\Phi_1(0;m_{j.i},\gamma_{j.i})\big) + \gamma_{ij}\,\phi_1(\mu_j;\mu_{bj},\gamma_{jj})\big(1-\Phi_1(0;m_{i.j},\gamma_{i.j})\big)\big]\\ &+ 2\sigma_j^2\,\tilde\phi^{(j)}\,\mathbb{E}[|Y_{i.j}|],\end{aligned}$$
where $Y_{i.j} = |X_{i.j}|$ with $X_{i.j} \sim \mathrm{ESN}_1(\mu_{i.j},\sigma_{i.j},\sigma_{i.j}^{1/2}\varphi_i,\tau_{i.j})$. Furthermore, $\tilde\Phi^{(1)} \equiv \tilde\Phi_2(0;(-\mu_i,\mu_j)^{\top},\Sigma^{-},(-\lambda_i,\lambda_j)^{\top},\tau)$, $\tilde\Phi^{(2)} \equiv \tilde\Phi_2(0;(\mu_i,-\mu_j)^{\top},\Sigma^{-},(\lambda_i,-\lambda_j)^{\top},\tau)$, $\Phi^{(1)} \equiv \Phi_2(0;(-m_i,m_j)^{\top},\Gamma^{-})$ and $\Phi^{(2)} \equiv \Phi_2(0;(m_i,-m_j)^{\top},\Gamma^{-})$, with $\Sigma^{-}$ ($\Gamma^{-}$) denoting the matrix $\Sigma = [\sigma_{ij}]$ ($\Gamma = [\gamma_{ij}]$) with the signs of its covariances (off-diagonal elements) changed. Here, we have simplified using the equivalences
$$L_p(0,\infty;\mu_s,\Sigma_s,\lambda_s,\tau) = \tilde\Phi_p(0;-\mu_s,\Sigma_s,-\lambda_s,\tau), \quad s \in S(p),$$
$$\mathrm{ESN}_p(0;\mu_q,\Sigma_q,\lambda_q,\tau) = \mathrm{ESN}_p(0;\mu_r,\Sigma_r,\lambda_r,\tau), \quad q,r \in S(p),$$
$$P(Y_1Y_2\cdots Y_p > 0) = \sum_{s\in S(p)} \pi_s L_p(0,\infty;\mu_s,\Sigma_s,\lambda_s,\tau),$$
with $\pi_s = \prod_{i=1}^p s_i$ as in Theorem 4 and $\sum_{s\in S(p)} L_p(0,\infty;\mu_s,\Sigma_s,\lambda_s,\tau) = 1$. It is worth mentioning that these expressions also hold for the normal case, when $\lambda = 0$ and $\tau = 0$.

As expected, this approach is much faster than the one based on equations (23) and (24). For instance, for a trivariate folded ESN distribution, it is approximately 56 times faster than MC methods and 10 times faster than using equations (23) and (24). Time comparisons (summarized in the right panel of the Figure in the Supplementary material) as well as sample codes for our MomTrunc R package are provided in Appendices C and D, respectively.
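In the limiting normal case just mentioned ($\lambda = 0$, $\tau = 0$) and in one dimension, the skewness terms of the recurrence vanish and it collapses to a two-term recursion that is easy to verify numerically. The sketch below (Python; parameter values are illustrative, and the univariate reduction is our own restatement of the recurrence, not code from the paper) checks the recursion against direct quadrature and verifies the folded mean identity $\mathbb{E}|X| = I_1(\mu,\sigma^2) + I_1(-\mu,\sigma^2)$ against the classical closed form.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Univariate normal sketch: with I_k(mu, s2) = int_0^inf y^k phi(y; mu, s2) dy,
# integration by parts gives
#   I_1     = mu * I_0 + s2 * phi(0; mu, s2),
#   I_{k+1} = mu * I_k + s2 * k * I_{k-1},   k >= 1.

def I_seq(k_max, mu, s2):
    """Return [I_0, ..., I_{k_max}] computed via the recurrence."""
    sd = np.sqrt(s2)
    out = [1.0 - norm.cdf(0.0, loc=mu, scale=sd)]          # I_0 = P(X > 0)
    if k_max >= 1:
        out.append(mu * out[0] + s2 * norm.pdf(0.0, loc=mu, scale=sd))
    for k in range(1, k_max):
        out.append(mu * out[k] + s2 * k * out[k - 1])
    return out

mu, s2 = 0.7, 1.3
vals = I_seq(3, mu, s2)

# Each I_k agrees with direct numerical quadrature.
for k in range(4):
    ref, _ = quad(lambda y, k=k: y**k * norm.pdf(y, mu, np.sqrt(s2)), 0, np.inf)
    assert abs(vals[k] - ref) < 1e-6

# Folded-normal mean: E|X| = I_1(mu, s2) + I_1(-mu, s2),
# matching the classical closed form.
mean_abs = I_seq(1, mu, s2)[1] + I_seq(1, -mu, s2)[1]
sd = np.sqrt(s2)
closed = sd * np.sqrt(2 / np.pi) * np.exp(-mu**2 / (2 * s2)) \
         + mu * (1 - 2 * norm.cdf(-mu / sd))
assert abs(mean_abs - closed) < 1e-12
```

The multivariate recursion works in the same spirit, with the conditional $(p-1)$-dimensional integrals playing the role of the boundary term $\phi(0;\mu,\sigma^2)$.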
In this paper, we have developed a recurrence approach for computing arbitrary order product moments of TESN and FESN distributions, as well as explicit expressions for the first two moments as a byproduct, generalizing results obtained by [12] for the normal case. The proposed methods also include the moments of the well-known truncated multivariate
SN distribution, introduced by [1]. For the TESN, we have proposed an optimized robust algorithm based only on normal integrals, which in the limiting normal case outperforms the existing popular method for computing the first two moments, and which computes these moments even in extreme cases where all available algorithms fail. The proposed method (including its limiting and special cases) has been coded and implemented in the
R MomTrunc package, which is available to users on the CRAN repository.

During the last decade or so, censored modeling approaches have been used in various ways to accommodate increasingly complicated applications. Many of these extensions involve the normal ([19]) and Student-t ([21, 22]) distributions; however, statistical models based on distributions that accommodate censoring and skewness simultaneously have so far remained relatively unexplored in the statistical literature. We hope that by making our codes available to the community, we will encourage researchers in different fields to use our new methods. For instance, it is now possible to derive analytical expressions for the E-step of the EM algorithm for multivariate SN responses with censored observations, as in [21].

Finally, we anticipate extending these results in the near future to the extended skew-t distribution ([23]). We conjecture that our method can also be extended to the context of other scale mixtures of skew-normal distributions ([24]). An in-depth investigation of such extensions is beyond the scope of the present paper, but it is an interesting topic for further research.
SUPPLEMENTARY MATERIAL
The Supplementary Material, which is available upon request, contains the following files:

A. Proofs of propositions and theorems;
B. Explicit expressions for moments of some folded univariate distributions;
C. Figures;
D. The R MomTrunc package.
References

[1] A. Azzalini and A. Dalla-Valle. The multivariate skew-normal distribution. Biometrika, 83(4):715–726, 1996.
[2] G. M. Tallis. The moment generating function of the truncated multi-normal distribution. Journal of the Royal Statistical Society, Series B, 23(1):223–229, 1961.
[3] D.-H. D. Lien. Moments of truncated bivariate log-normal distributions. Economics Letters, 19(3):243–247, 1985.
[4] H. Houthakker. The scope and limits of futures trading. Cowles Foundation for Research in Economics at Yale University, 1959.
[5] J. W. Jawitz. Moments of truncated continuous univariate distributions. Advances in Water Resources, 27(3):269–281, 2004.
[6] H. M. Kim. A note on scale mixtures of skew normal distribution. Statistics and Probability Letters, 78:1694–1701, 2008.
[7] C. Flecher, D. Allard, and P. Naveau. Truncated skew-normal distributions: moments, estimation by weighted moments and application to climatic data. Metron, 68:331–345, 2010.
[8] A. Azzalini. A class of distributions which includes the normal ones. Scandinavian Journal of Statistics, 12:171–178, 1985.
[9] A. İ. Genç. Moments of truncated normal/independent distributions. Statistical Papers, 54:741–764, 2013.
[10] H. J. Ho, T. I. Lin, H. Y. Chen, and W. L. Wang. Some results on the truncated multivariate t distribution. Journal of Statistical Planning and Inference, 142:25–40, 2012.
[11] J. C. Arismendi. Multivariate truncated moments. Journal of Multivariate Analysis, 117:41–75, 2013.
[12] R. Kan and C. Robotti. On moments of folded and truncated multivariate normal distributions. Journal of Computational and Graphical Statistics, 25(1):930–934, 2017.
[13] R. B. Arellano-Valle and M. G. Genton. On fundamental skew distributions. Journal of Multivariate Analysis, 96:93–116, 2005.
[14] A. Azzalini and A. Capitanio. Statistical applications of the multivariate skew-normal distribution. Journal of the Royal Statistical Society, Series B, 61:579–602, 1999.
[15] R. B. Arellano-Valle and A. Azzalini. On the unification of families of skew-normal distributions. Scandinavian Journal of Statistics, 33(3):561–574, 2006.
[16] R. B. Arellano-Valle and M. G. Genton. Multivariate extended skew-t distributions and related families. Metron, 68(3):201–234, 2010.
[17] A. Genz. Numerical computation of multivariate normal probabilities. Journal of Computational and Graphical Statistics, 1(2):141–149, 1992.
[18] B. G. Manjunath and S. Wilhelm. Moments calculation for the double truncated multivariate normal density. Available at SSRN 1472153, 2009.
[19] F. Vaida and L. Liu. Fast implementation for normal mixed effects models with censored response. Journal of Computational and Graphical Statistics, 18:797–817, 2009.
[20] A. K. Chakraborty and M. Chatterjee. On multivariate folded normal distribution. Sankhya B, 75(1):1–15, 2013.
[21] L. A. Matos, M. O. Prates, M. H. Chen, and V. H. Lachos. Likelihood-based inference for mixed-effects models with censored response using the multivariate-t distribution. Statistica Sinica, 23:1323–1342, 2013.
[22] V. H. Lachos, E. J. López Moreno, K. Chen, and C. R. B. Cabral. Finite mixture modeling of censored data using the multivariate student-t distribution. Journal of Multivariate Analysis, 159:151–167, 2017.
[23] A. Azzalini and A. Capitanio. Distributions generated by perturbation of symmetry with emphasis on the multivariate skew-t distribution. Journal of the Royal Statistical Society, Series B, 65:367–389, 2003.
[24] M. D. Branco and D. K. Dey. A general class of multivariate skew-elliptical distributions.