[PDF] EMMIXcskew: an R Package for the Fitting of a Mixture of Canonical Fundamental Skew t-Distributions

Abstract

This paper presents an R package EMMIXcskew for the fitting of the canonical fundamental skew t-distribution (CFUST) and finite mixtures of this distribution (FM-CFUST) via maximum likelihood (ML). The CFUST distribution provides a flexible family of models to handle non-normal data, with parameters for capturing skewness and heavy-tails in the data. It formally encompasses the normal, t, and skew-normal distributions as special and/or limiting cases. A few other versions of the skew t-distributions are also nested within the CFUST distribution. In this paper, an Expectation-Maximization (EM) algorithm is described for computing the ML estimates of the parameters of the FM-CFUST model, and different strategies for initializing the algorithm are discussed and illustrated. The methodology is implemented in the EMMIXcskew package, and examples are presented using two real datasets. The EMMIXcskew package contains functions to fit the FM-CFUST model, including procedures for generating different initial values. Additional features include random sample generation and contour visualization in 2D and 3D.

Full PDF

JJSS

EMMIXcskew : an R Package for the Fitting of aMixture of Canonical Fundamental Skew t -Distributions Sharon X. Lee

University of Queensland

Geoﬀrey J. McLachlan

University of Queensland

Abstract

This paper presents an R package

EMMIXcskew for the ﬁtting of the canonical funda-mental skew t -distribution (CFUST) and ﬁnite mixtures of this distribution (FM-CFUST)via maximum likelihood (ML). The CFUST distribution provides a ﬂexible family of mod-els to handle non-normal data, with parameters for capturing skewness and heavy-tails inthe data. It formally encompasses the normal, t , and skew-normal distributions as specialand/or limiting cases. A few other versions of the skew t -distributions are also nestedwithin the CFUST distribution.In this paper, an Expectation-Maximization (EM) algorithm is described for comput-ing the ML estimates of the parameters of the FM-CFUST model, and diﬀerent strategiesfor initializing the algorithm are discussed and illustrated. The methodology is imple-mented in the EMMIXcskew package, and examples are presented using two real datasets.The

EMMIXcskew package contains functions to ﬁt the FM-CFUST model, includingprocedures for generating diﬀerent initial values. Additional features include randomsample generation and contour visualization in 2D and 3D.

Keywords : mixture models, fundamental skew distributions, skew normal distribution, skew t -distribution, EM algorithm, R .

1. Introduction

Finite mixture models, in particular normal mixture models, have been widely used in statis-tics and a diverse range of applied ﬁelds such as bioinformatics, biomedicine, economics,ﬁnance, genetics, image analysis, psychometrics, and social science. They provide a power-ful and ﬂexible tool for the probabilistic modelling of data, with applications ranging fromdensity estimation to clustering, classiﬁcation, and discriminant analysis. For a survey on mix-ture models and their applications, see Everitt and Hand (1981), Titterington et al. (1985),McLachlan and Basford (1988), Lindsay (1995), B¨ohning (2000), McLachlan and Peel (2000),and Fr¨uhwirth-Schnatter (2006), the edited volume of Mengersen et al. (2011), and also thepapers by Banﬁeld and Raftery (1993) and Fraley and Raftery (1998). a r X i v : . [ s t a t . C O ] F e b EMMIXcskew : Finite Mixtures of Canonical Fundamental Skew t -distributions In recent years, mixture models with skew component distributions have received increasingattention. These models adopt densities that can take more ﬂexible distributional shapes thanthe traditional normal and t -distributions as component distributions, rendering them suitablefor a wider range of applications. Of these, the skew t -distribution is gaining popularity dueto its ability to handle both the asymmetry and heavy-tailedness in the data. In particular,a number of diﬀerent formulations of skew t -distribution have been used extensively in themodel-based clustering literature (see, for example, Lee and McLachlan (2014a, 2016a), Azza-lini et al. (2016), McLachlan and Lee (2016) and the references therein). They have also foundmany applications in a range of ﬁelds, including astrophysics (Riggi and Ingrassia 2013), ﬁnan-cial risk analysis and modelling (Soltyk and Gupta 2011, Bernardi 2013, Lee and McLachlan2013b, Abanto-Valle et al. et al. et al. et al. et al. et al. et al. et al. et al. et al. et al. et al. (2006), the book edited by Genton (2004), and the recent monograph by Azzalini andCapitanio (2014).Recently, Lee and McLachlan (2016a) introduced a ﬁnite mixture of canonical fundamentalskew t (FM-CFUST) distribution. This formulation of the skew t -distribution has a gen-eral p × q matrix of skewness of parameters (Arellano-Valle and Genton 2005). It thus pro-vides a more general characterisation than the restricted and unrestricted skew t -distributions(adopting the terminology of Lee and McLachlan (2013c)). This paper describes an R pack-age EMMIXcskew for the ﬁtting of the FM-CFUST model. It implements the EM algorithmdescribed in Lee and McLachlan (2016a) and provides other functionalities such as randomsample generation, density evaluation, and the plotting of contours in 2D and 3D.The remainder of this paper is organized as follows. Section 2 provides a brief description ofthe CFUST distribution and its nested models. Section 3 outlines an EM algorithm for ﬁttingﬁnite mixtures of CFUST distributions and examines diﬀerent approaches for generatingstarting values for this EM algorithm. In the next two sections, the usage of the

EMMIXcskew package is illustrated using real and simulated examples. Finally, we conclude with some briefremarks in Section 6.

2. The CFUST and related distributions

To establish notation, let Y be p -dimensional random vector that follows a multivariateCFUST distribution, denoted by Y ∼ CF U ST p,q ( µ , Σ , ∆ , ν ). Then the density of Y is givenby f ( y ; µ , Σ , ∆ , ν ) = 2 q t p ( y ; µ , Ω , ν ) T q (cid:18) c ( y ) r ν + pν + d ( y ) ; , Λ , ν + p (cid:19) , (1) haron Lee & Geoﬀrey McLachlan Ω = Σ + ∆∆ > , c ( y ) = ∆ > Ω − ( y − µ ) , Λ = I q − ∆ > Ω − ∆ ,d ( y ) = ( y − µ ) > Ω − ( y − µ ) . It can observed from (1) above that the CFUST distribution is described by the parameters( µ , Σ , ∆ , ν ), where µ is a p -dimensional vector of location parameters, Σ is a positive deﬁnitescale matrix, ∆ is a p × q matrix of skewness parameters, and ν is a scalar degrees of freedomthat regulate the tails of the distribution. In the above, we let t p ( y ; µ , Ω , ν ) denote the p -dimensional t -distribution with location parameter µ , scale matrix Ω , and degrees of freedom ν , and T q ( . ) is the q -dimensional (cumulative) t -distribution function.As mentioned previously, the CFUST distribution includes some commonly used distributionsas special and/or limiting cases. Taking ∆ = reduces (1) to the symmetric multivariate t -density t p ( µ , Ω , ν ), and further letting ν → ∞ and ν = 1 leads to the multivariate normal N p ( µ , Ω ) and Cauchy C p ( µ , Ω ) distributions, respectively. If ∆ is constrained to be a diagonalmatrix, we obtain the skew t -distribution of Sahu et al. (2003) which is referred to as theunrestricted skew t -distribution using the terminology in Lee and McLachlan (2014a, 2013c).To obtain the classical skew t -distribution by Azzalini and Capitanio (2003) from (1), one canset q = 1 or take ∆ to be a matrix of zeros except for one column (Lee and McLachlan 2016a).This formulation of the skew t -distribution, referred to as the restricted skew t -distribution,is equivalent to that given by Branco and Dey (2001), Gupta (2003), Lachos et al. (2010)and Pyne et al. (2009); see Lee and McLachlan (2013c). Analogously, the restricted andunrestricted skew normal distributions can be obtained by placing appropriate constraintson ∆ and letting ν → ∞ . Some properties of the CFUST distribution are described inArellano-Valle and Genton (2005). It is of interest to note that this distribution suﬀers anidentiﬁability issue as discussed in Lee and McLachlan (2016). In brief, this means that theCFUST density is invariant under permutations of the columns of the skewness matrix ∆ ,but this does not aﬀect parameter estimation. Hence, in practice, the user only needs to beaware that changing the order of the columns of ∆ does not aﬀect the density of the CFUSTmodel.There are several R packages on CRAN that deal with (multivariate) mixture models withskewed component densities. In particular, the restricted and unrestricted versions of skew t -mixture models are implemented in EMMIXskew (Wang et al.

EMMIXuskew (Lee and McLachlan 2013d), respectively. The package mixsmsn (Prates et al. t -distribution that is equivalent to the re-stricted skew normal distribution and restricted skew t -distribution, respectively. However,the estimation procedure used in mixsmsn imposes the condition that all components of theskew t -mixture model share a common value for the degrees of freedom. A recently developedpackage MixGHD (Tortora et al. t -distributions,the sn package (Azzalini 2014) can be used. For traditional normal mixture model and relatedtools, a number of other packages are available on CRAN, such as bgmm (Biecek et al. ﬂexmix (Leisch 2004, Gr¨un and Leisch 2008), mclust (Fraley and Raftery 2007, Scrucca et al. mixtools (Benaglia et al. EMMIXcskew : Finite Mixtures of Canonical Fundamental Skew t -distributions

3. Fitting mixtures of CFUST distributions via the EM algorithm

The density of a ﬁnite mixture model is given by a convex linear combination of componentdensities. More formally, adopting the CFUST distribution as component densities, we obtaina ﬁnite mixture of CFUST (FM-CFUST) distribution with density given by f ( y ; Ψ ) = g X h =1 π h f ( y ; µ h , Σ h , ∆ h , ν h ) , (2)where π h ( h = 1 , . . . , g ) are the mixing proportions and f ( . ) denotes the CFUST den-sity given by (1). The mixing proportions satisfy π h ≥ P gh =1 π h = 1. The vector Ψ = ( π , . . . , π g − , θ T , . . . , θ Tg ) contains all the unknown parameters of the model, with θ h containing the elements of µ h and δ h , the distinct elements of Σ h , and ν h .For the ﬁtting of the FM-CFUST model, we employ the EM algorithm (Dempster et al. Z j (corresponding to the j = 1 , . . . , n independentobservations on Y ), alongside two random variables U j and W j that follow a half-normaldistribution and a gamma distribution, respectively. Thus, the complete-data for the FM-CFUST model consist of these missing variables and the observations y j . This leads to afour-level hierarchical characterization of the FM-CFUST model, given by Y j | u j , w j , z hj = 1 ∼ N p (cid:18) µ + ∆ h u j , w j Σ h (cid:19) , U j | w j , z hj = 1 ∼ HN q (cid:18) , w j I q (cid:19) ,W j | z hj = 1 ∼ gamma (cid:16) ν h , ν h (cid:17) , Z j ∼ Mult g (1 , π ) , (3)where Z j is a g -dimensional vector of binary component labels such that Z hj = ( Z j ) h = 1if the j th observation belongs to the h th component and zero otherwise. Here, HN q ( , Σ )denotes the q -dimensional half-normal distribution with scale matrix Σ , gamma( α, β ) denotesthe gamma distribution with mean αβ , and Mult g (1 , π ) denotes the multinomial distributionwith one draw and g categories with probabilities speciﬁed by π . We now outline the E- andM-steps of the EM algorithm for ﬁtting the FM-CFUST model. The E-step of the EM algorithm requires the calculation of the so-called Q -function, Q ( Ψ ; Ψ ( k ) ),which is the conditional expectation of the complete-data log likelihood given the observeddata y , using the current estimate of Ψ , which is denoted by θ ( k ) after the k th iteration.It follows that on the ( k + 1)th iteration, the E-step requires the following ﬁve conditional haron Lee & Geoﬀrey McLachlan z ( k ) hj = E Ψ ( k ) (cid:2) z hj = 1 | y j (cid:3) , (4) w ( k ) hj = E Ψ ( k ) (cid:2) w hj | y j , z hj = 1 (cid:3) , (5) e ( k )1 hj = E Ψ ( k ) (cid:2) log( w hj ) | y j , z hj = 1 (cid:3) , (6) e ( k )2 hj = E Ψ ( k ) (cid:2) w hj u hj | y j , z hj = 1 (cid:3) , (7) e ( k )3 hj = E Ψ ( k ) h w hj u hj u > hj | y j , z hj = 1 i . (8)The expressions for (4) to (8) are given in Lee and McLachlan (2016a) and therefore notrepeated here. However, it should be noted that e ( k )1 hj can be evaluated using diﬀerent ap-proaches, two of which are described in the above reference. For simplicity, the EMMIXcskew package implements the one-step-late (OSL) approach for this conditional expectation. Itshould be noted that the use of the approximate OSL approach to calculate e ( k )1 hj can result inthe incomplete-data likelihood not increasing monotonically. This conditional expectation canbe calculated more accurately by a power series derived in Lee and McLachlan (2014b, 2016a)for which monotonicity of the likelihood is preserved. An implementation of this option willbe provided in a future update of this package. On the ( k + 1)th iteration of the the M-step, the current estimate of Ψ , Ψ ( k ) , is updated to Ψ ( k +1) , which is chosen to globally maximize Q ( Ψ ; Ψ ( k ) ) over Ψ . For the FM-CFUST model,the M-step leads to the following updates: π ( k +1) h = 1 n n X j =1 z ( k ) hj , µ ( k +1) h = P nj =1 z hj w ( k ) hj y j − ∆ ( k ) h P nj =1 z ( k ) hj e ( k )2 hj P nj =1 n z ( k ) hj w ( k ) hj , ∆ ( k +1) h =  n X j =1 z ( k ) hj (cid:16) y j − µ ( k +1) h (cid:17) e ( k ) > hj   n X j =1 z ( k ) hj e ( k )3 hj  − , (9) Σ ( k +1) h =  n X j =1 z ( k ) hj (cid:20) w ( k ) hj (cid:16) y j − µ ( k +1) h (cid:17) (cid:16) y j − µ ( k +1) h (cid:17) T − ∆ ( k +1) h e ( k )2 hj (cid:16) y j − µ ( k +1) h (cid:17) > − (cid:16) y j − µ ( k +1) h (cid:17) e ( k ) > hj ∆ ( k +1) > h + ∆ ( k +1) h e ( k ) > hj ∆ ( k +1) > h io  n X j =1 z ( k ) hj  − . (10)An update of the degrees of freedom ν h is obtained by solving the following equation for EMMIXcskew : Finite Mixtures of Canonical Fundamental Skew t -distributions ν ( k +1) h , 0 = n X h =1 z ( k ) hj ! " log ν ( k +1) h ! − ψ ν ( k +1) h ! + 1 − n X j =1 z ( k ) hj (cid:16) e ( k )1 hj − w ( k ) hj (cid:17) , where ψ ( · ) denotes the digamma function. As the log likelihood function may exhibit a complicated proﬁle with many local maximaand the EM algorithm is sensitive to its initial values, it is important to choose good start-ing values. In this section, we consider three strategies for generating valid initial values forthe EM algorithm for the FM-CFUST model. For the remainder of this section, we sup-press the subscript h (denoting the index of a component in a mixture model) for notationalconvenience. Nested approach

An intuitive approach is to start the EM algorithm with the solution given by one of the nestedmodels of a CFUST distribution, for example, the results from ﬁtting a normal or t -mixturemodel. This option is available in EMMIXcskew with the fmcfust.init function (see Section5.2), which accepts the outputs from the packages

EMMIXskew and

EMMIXuskew . Theformer package provide routines to ﬁt mixtures of (multivariate) normal, t -, (restricted) skewnormal, and skew t -distributions, whereas the latter package ﬁts a mixture of unrestrictedskew t -distributions. Method of moments-based approach

Another approach is based on the moments of an unrestricted multivariate skew normal(uMSN) distribution. As noted earlier, the uMSN distribution is a nested case of the CFUSTdistribution. It can be characterized as the convolution of a truncated normal variable and amultivariate normal variable as follows, Y j = µ + ∆ | U j | + U j , (11)where µ ∈ R , ∆ = diag( δ ) is a diagonal matrix of skewness parameters with diagonal elementsgiven by δ , U j ∼ N p ( , I p ) and U j ∼ N p ( , Σ ). The uMSN distribution (11) has mean andvariance given by E ( Y j ) = µ + r π δ , andcov( Y j ) = Σ + (1 − π ) ∆ , (12)respectively. On rearranging the above expressions, we obtain an expression for µ and Σ interms of δ (recall that by deﬁnition ∆ = diag( δ ) for the uMSN distribution). To obtain aninitial value for δ (0) , one can consider reducing the values of the diagonal elements of Σ (0) by haron Lee & Geoﬀrey McLachlan − a ) where a ∈ [0 ,

1] (Lin 2010). This leads a set of expressionsgiven by δ (0) = ± r π (1 − a ) π − s ∗ , Σ (0) = S + ( a −

1) diag( s ∗ ) , µ (0) = ¯ y − r π δ (0) , (13)where the sign of each element of δ (0) is given by the sign of the third-order sample momentof the corresponding variable about its sample mean. In (13) above, s ∗ is a p -dimensionalvector containing the diagonal elements of the sample covariance matrix S , and ¯ y denotes thesample mean. Concerning the degrees of freedom, it can be set (initially) to a large numberto reﬂect a uMSN distribution. Transformation approach

A third approach is based on a transformation of Y j in an attempt to better handle thecorrelation of the variables in Y j . We consider the transformation vector X j = CY j , where C is an orthogonal matrix such that the covariance matrix of X j , cov( X j ), is diagonal.In practice, we work with the sample covariance matrix of Y j . Then we can ﬁt a uMSTdistribution to the transformed vector X j , where each X j can be characterized as X j = CY j = µ + ∆ | U j | + U j , (14)where U j and U j follow a central multivariate t -distribution with ν degrees of freedomand scale matrix given by I q and Σ , respectively. Note that in (14) above, we have used astochastic representation of the CFUST distribution that is analogous to (11) for a uMSNdistribution. On pre-multiplying X j by C > in (14) to obtain Y j , we have Y j = C > µ + C > ∆ | U j | + C > U j . (15)This suggests that an initial value for µ and for ∆ in a CFUST distribution can be given by C > ˆ µ and C > ˆ ∆ , respectively, where ˆ µ and ˆ ∆ are the estimates of µ and ∆ obtained by ﬁttingthe uMST distribution to the X j . However, it should be noted that if the true distributionof Y j were a CFUSN distribution, then the transformed data X j may not necessarily havea uMSN distribution even though cov( X j ) is diagonal. This happens when the oﬀ-diagonalelements of Σ cancel out the oﬀ-diagonal elements of ∆∆ T . But it might be expected thatin most situations where the sample covariance matrix of the X j is approximately diagonal,the matrix Σ and the skewness matrix ∆ are both diagonal or close to it.In the case where a mixture of CFUST distributions is to be ﬁtted rather than a singlecomponent distribution, we would need to ﬁrst cluster the Y j into g clusters and proceedseparately within each cluster as described above.The above three methods are implemented in the function fmcfust.init of the EMMIXcskew package. By default, it adopts the moments method. An example on a real dataset demon-strating the use of these approaches is given in Section 5.2.

EMMIXcskew : Finite Mixtures of Canonical Fundamental Skew t -distributions We adopt the Aitken acceleration-based stopping criterion as the default strategy to assess theconvergence of the EM algorithm for the FM-CFUST model. The

EMMIXcskew package alsoprovides a few other criteria as an alternative, including those based on the relative changein the log likelihood value and estimates of the parameters of the model.

Aiken acceleration-based approach

In brief, when using the default strategy, our algorithm terminates when the absolute diﬀer-ence between a log likelihood value and its asymptotic estimate is smaller than a speciﬁedtolerance, (cid:15) ; that is, when (cid:12)(cid:12)(cid:12) ‘ ( k +1) ∞ − ‘ ( k +1) (cid:12)(cid:12)(cid:12) < (cid:15), (16)where ‘ ( k +1) is the log likelihood value at the ( k + 1)th iteration and ‘ ( k +1) ∞ is its asymptoticestimate, given by ‘ ( k +1) ∞ = ‘ ( k ) + ‘ ( k +1) − ‘ ( k ) − a ( k ) . (17)In the above, a ( k ) = ‘ ( k +1) − ‘ ( k ) ‘ ( k ) − ‘ ( k − denotes the Aitken’s acceleration at the k th iteration (B¨ohning et al. (cid:15) = 10 − is applied tothe examples in the following sections, but the user can specify a diﬀerent value. Relative likelihood-based approach

Another commonly used stopping criterion is to monitor the relative changes in the log like-lihood values at the end of each iteration and to stop the algorithm when the (relative)diﬀerence between two successive log likelihood values is less than a speciﬁed threshold. Moreformally, our algorithm terminates when (cid:12)(cid:12) ‘ ( k +1) − ‘ ( k ) (cid:12)(cid:12)(cid:12)(cid:12) ‘ ( k +1) (cid:12)(cid:12) < (cid:15), (18)where the threshold (cid:15) is set to 10 − by default. Again, the user can specify a diﬀerentthreshold. Relative parameters-based approach

Apart from tracking the changes in the log likelihood value, one can also monitor the changesin the parameter estimates. In this case, the algorithm is considered to have converged whenthe relative change in all the parameter estimates is less than a speciﬁed threshold (cid:15) . Notethat this criterion implies that the relative changes of all the free parameters needs to besmaller than (cid:15) , that is, all elements of Ψ including the mixing proportions. Thus, the EMalgorithm terminates when (cid:12)(cid:12)(cid:12) Ψ ( k +1) − Ψ ( k ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Ψ ( k +1) (cid:12)(cid:12)(cid:12) < (cid:15). (19) haron Lee & Geoﬀrey McLachlan (cid:15) = 10 − . In the EM algorithm described in this paper, the number of components g and the dimension q of the skewing vector must be speciﬁed. In practice, these are typically unknown and modelselection criteria are employed to aid in choosing an appropriate value of g and q . Some of themore commonly used information criterion include the Bayesian information criterion (BIC;Schwarz (1978)), given by BIC = m log n − ‘, (20)and the Akaike information criterion (AIC; Akaike (1974)), given by BIC = 2 m − ‘, (21)where m is the number of free parameters, n is the number of observations, and ‘ is the valueof the log likelihood function at the ﬁtted parameter vector. As is typical in ﬁtting a mixtureof factor analyzers (MFA) model, one may ﬁt the FM-CFUST model for a range of values of p and q and choose the combination of p and q corresponding to the lowest AIC or BIC. Analternative strategy for automatically selecting an appropriate g was considered in Lee andMcLachlan (2016c) which is based on the minimum message length (MML) criteria. However,concerning the value of q , it was observed in Lee and McLachlan (2016a) that when ﬁttingthe CFUST model with q = p it was able to model data generated from the rMST ( q = 1)and uMST ( q = p and ∆ is a diagonal matrix) distributions quite well. In particular, theestimated ∆ matches the true ∆ reasonably well. For example, in the case of data generatedfrom a rMST distribution, all but one of the columns of the estimated ∆ has elements beingrelatively small, thus resembling the q = 1 case. Hence, in the implementation of the EMalgorithm in the EMMIXcskew package, the default value of q is set to p . But the user isencouraged to experiment with diﬀerent values of q when ﬁtting the FM-CFUST model.

4. Using the EMMIXcskew package

The

EMMIXcskew package implements the EM algorithm described in Section 3 and providesadditional functions such as random sample generation, density evaluation, and graphicsoutputs. The software is primarily written in R . EMMIXcskew will be made available on theComprehensive R Archive Network (CRAN).In this sequel, we now demonstrate the basic usage of the

EMMIXcskew package using simpleexamples. In particular, the following main routines are discussed: • dfmcfust : evaluation of density values; • rfmcfust : generation of a random sample from a FM-CFUST distribution; • fmcfust : ﬁtting a FM-CFUST model; • init.cfust : generation of initial values for use in fmcfust ; • fmcfust.contour.2d : plotting a 2D graph of the contours of a FM-CFUST model;0 EMMIXcskew : Finite Mixtures of Canonical Fundamental Skew t -distributions Parameters R arguments Dimensions Description µ mu p × × g the location parameters Σ sigma p × p × g the scale matrices ∆ delta p × q × g the skewness parameters ν dof g × π pro g × EMMIXcskew . • fmcfust.contour.3d : plotting 3D surface contours of a FM-CFUST model. The density of a FM-CFUST distribution is calculated by the dcfust function. The inputsto be passed into this function are similar to that in dfmmst from the

EMMIXuskew package,and must be structured as described in Table 1. Brieﬂy, the parameters µ , Σ , and δ are eachimplemented as a list of g matrices, where g is the number of components in the ﬁtted model.Each element of the list objects mu , sigma , and delta ( h = 1 , . . . , g ) must be speciﬁedas a p × p × p , and p × q matrix, respectively. The parameters dof and pro are g by 1arrays, representing the degrees of freedom and the mixing proportions for each component,respectively. Finally, the input data are speciﬁed by dat , an n × p matrix where each rowrepresents an individual observation.Typically, one may be interested in calculating density values for a ﬁtted FM-CFUST model(obtained from the fmcfust function). In this case, the output object of the ﬁtted model canbe directly passed in to dfmcfust as a single argument known . Note that if both known andall the model parameters were provided by he user, only the values speciﬁed by known wouldbe used. Issuing the following command will return a vector of n × dfmcfust(dat, mu, sigma, delta, dof, pro, known=NULL) For a single CFUST distribution (that is, g=1 ), the dcfust function can be used. Here, thearguments mu , sigma , and delta need not be a list object, but mu can be a numeric array,and sigma and delta are matrices. Similar to the above function call, the dcfust can becalled at the R command prompt as follows, which will return a numeric vector of densityvalues. dcfust(dat, mu, sigma, delta, dof) To ﬁt the FM-CFUST model with a single component ( g = 1), the main routine fmcfust canbe used. This implements the EM algorithm described in Section 3, with the default strategyfor initial values given in (13). By default, q is assumed to be the same as p . A typical callof fmcfust is: fmcfust(g, dat, q=p, initial=NULL, known=NULL, itmax=100, eps=1e-6,nkmeans=20, verbose=TRUE) haron Lee & Geoﬀrey McLachlan iris dataset, available directly in R . For illustrationpurposes, we look at the Versicolor species and focus on the two variables Sepal.Widthh and

Pedal.Length . We ﬁrst create a new data object iris.versicolor with the required data,then execute the fmcfust function with g = 1. This is the minimum information that mustbe supplied to fmcfust . R> library(EMMIXcskew)R> data(iris)R> iris.versicolor <- iris[iris$Species=="versicolor",2:3]R> Fit.versicolor <- fmcfust(1, iris.versicolor)

The above command will return a fmcfust object, containing the ﬁnal estimate of the pa-rameters, the predicted cluster labels, and a number of measures associated with the ﬁt-ted model. The ﬁnal estimated parameters can be accessed using

Fit.versicolor$mu , Fit.versicolor$sigma , Fit.versicolor$delta , Fit.versicolor$dof , and

Fit.versicolor$pro .To view these parameters, summary can be called:

R > summary(Fit.versicolor)Finite Mixture of Multivariate CFUST Distributionwith 1 componentMean: [,1][1,] 3.415878[2,] 4.886890Scale matrix:[,1] [,2][1,] 0.006138577 -0.007283746[2,] -0.007283746 0.020649780Skewness matrix:[,1] [,2][1,] -0.4901844 -0.37242352[2,] -0.9067630 0.03643873Degrees of freedom:[1] 87.47343

The other arguments of fmcfust are similar to that used in fmmst from the

EMMIXuskew package. Brieﬂy, known is a list of model parameters that are known a priori and, if sup-plied, will not be updated in the iterations of EM algorithm. The arguments itmax and eps determine when the EM algorithm is terminated. If either the maximum number of itera-tions as speciﬁed by itmax is reached or the tolerance as speciﬁed by eps is obtained, theEM loop will terminate. User speciﬁed initial values can be supplied using initial . Notethat this must be a list object structured as in Table 1. The argument nkmeans speci-ﬁes the number of k -means trials to be preformed when using the default starting strategy.2 EMMIXcskew : Finite Mixtures of Canonical Fundamental Skew t -distributions Figure 1: Contours of a FM-CFUST model ﬁtted to the versicolor data.With verbose=TRUE , fmcfust prints the log likelihood value at each iteration and displays asummary of the estimated parameters of the model.Note that in the above example we have used the default starting strategy to generate initialvalues, and assume q = p . As pointed out in Section 3, this may not always give the optimalﬁt. It is highly recommended that the EM algorithm is run from a range of diﬀerent startingvalues. Some alternative methods for generating diﬀerent starting values are discussed inSection 3.3. These are implemented in init.cfust (see Section 5.2 for further discussions).In addition, the user can also experiment with diﬀerent values of q . However, it is interestingto note that for the simulated dataset of Section 4.1 in Lee and McLachlan (2016a), it wasobserved that the FM-CFUST model is able to (roughly) recover the structure of ∆ withoutprior knowledge of any constraints on the matrix of skewness parameters.To assist in choosing a suitable model for the data from a range of diﬀerent ﬁtted results(for example, using diﬀerent starting values), log likelihood values and information measuressuch as AIC and BIC can be compared. These values are available as part of the out-put of fmcfust and can be accessed using Fit.versicolor$loglik , Fit.versicolor$aic ,and

Fit.versicolor$bic . They can also be viewed using the print command as shownbelow for the example above. The contours of the ﬁtted model can be visualized using fmcfust.contour.2d (see Figure 1). Further details and examples of contour plots will begiven in Section 5.5. > print(Fit.versicolor)Finite Mixture of Multivariate CFUST Distributionwith 1 component haron Lee & Geoﬀrey McLachlan ... the first five elements omitted ...$tau [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11][1,] 1 1 1 1 1 1 1 1 1 1 1$clusters[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1... the rest omitted..$loglik[1] -32.30904$aic[1] 84.61809$bic[1] 103.7383 Consider now the ﬁtting of a three-component FM-CFUST model to the entire iris dataset.It consists of four geometric measurements on 150 observations of Iris, with 50 observationsfrom each of three species of Iris (Setosa, Virginica, and Versicolor). The following code ﬁtsa FM-CFUST model using the results of a FM-uMST model as starting values.

R> fit.unrestricted <- fmmst(3,iris[,-5])R> fit.iris <- fmcfust(3, iris[,-5], initial=fit.unrestricted, method="EMMIXuskew")

Again, this returns a fmcfust object.

Finite Mixture of Multivariate CFUST Distributionswith 3 components--------------------------------------------------Iteration 0 : loglik = -171.9228Iteration 1 : loglik = -170.8919Iteration 2 : loglik = -170.419Iteration 3 : loglik = -170.027Iteration 4 : loglik = -169.6717Iteration 5 : loglik = -169.3511... rest omitted ...--------------------------------------------------Iteration 100: loglik = -154.561 EMMIXcskew : Finite Mixtures of Canonical Fundamental Skew t -distributions Component means:[,1] [,2] [,3][1,] 4.8679805 6.345333 5.888617[2,] 3.2808574 3.066393 2.808073[3,] 1.4284854 4.474453 5.037550[4,] 0.1343135 1.317850 2.072705Component scale matrices:[[1]] [,1] [,2] [,3] [,4][1,] 0.0443155106 0.062560933 -0.001555864 -0.0009055587[2,] 0.0625609328 0.113034979 -0.005108048 -0.0020813146[3,] -0.0015558645 -0.005108048 0.004406555 -0.0001740700[4,] -0.0009055587 -0.002081315 -0.000174070 0.0014520658[[2]] [,1] [,2] [,3] [,4][1,] 0.09076830 0.025306942 0.025998408 0.019130182[2,] 0.02530694 0.017246048 0.006082013 0.013000612[3,] 0.02599841 0.006082013 0.018261259 0.007659824[4,] 0.01913018 0.013000612 0.007659824 0.013618464[[3]] [,1] [,2] [,3] [,4][1,] 0.21080765 0.09796696 0.14128463 0.08171739[2,] 0.09796696 0.06083267 0.07166802 0.04816533[3,] 0.14128463 0.07166802 0.11360874 0.07217753[4,] 0.08171739 0.04816533 0.07217753 0.06425322Component skewness matrices:[[1]] [,1] [,2] [,3] [,4][1,] 0.31881607 -0.27858936 0.007451723 0.12347998[2,] 0.07624952 -0.11956708 0.069916382 0.16058854[3,] -0.04518459 -0.15517472 0.171888937 0.07288276[4,] 0.01427354 -0.01726205 0.004160663 0.14294473[[2]] [,1] [,2] [,3] [,4][1,] -0.26806780 0.08357397 -0.5825049 0.254300197[2,] 0.15339662 -0.23236022 -0.3423885 0.045900090[3,] 0.05249894 0.22566719 -0.6678188 0.115936808[4,] 0.12559134 0.08309988 -0.2042694 0.007060828[[3]] haron Lee & Geoﬀrey McLachlan [,1] [,2] [,3] [,4][1,] 0.136541936 -0.13366639 0.51628869 0.3692744[2,] -0.017338343 0.15012982 -0.15683670 0.2290092[3,] -0.219878985 0.03899513 0.53218928 0.3051550[4,] -0.006770662 0.10861986 -0.04394092 -0.1217400Component degrees of freedom:51.74962 185.87390 126.98517Component mixing proportions:0.3333333 0.3314298 0.3352368 In this section, we focus on the restricted and unrestricted versions of MST mixture models.For the normal and t -mixture models, routines for ﬁtting them are implemented in EM-MIXskew . As noted earlier, the rMST distribution corresponds to a CFUST distributionwith q = 1. Thus setting q=1 in fmcfust will ﬁt a FM-rMST model. However, as EM-MIXskew uses a specialized implementation of the EM algorithm for this model, the user isencouraged to use this package when ﬁtting a FM-rMST model. Similarly, the

EMMIXuskew package can be used for the ﬁtting of a FM-uMST model. To ﬁt the FM-rMST model to thesame dataset as in the example above using the EMMIXcskew package, the following codecan be used. > fit.restricted <- fmcfust(3, iris[,-5],q=1)

The above model can also be ﬁtted using the

EMMIXskew package with the command fit.restricted <- EmSkew(iris[,-5],3,"mst",debug=FALSE) . We can compare the pre-dicted cluster labels of these models against the true class labels. For all three models, thepredicted cluster labels are stored as clust in the output object. A cross-tabulation of theselabels suggests that the ﬁtted FM-CFUST model performs well with only one misclassiﬁedobservation, whereas the FM-rMST and FM-uMST models have three and six misclassiﬁedobservations, respectively.

R> table(iris$Species, fit.iris$clust)1 2 3setosa 0 50 0versicolor 49 0 1virginica 0 0 50R> table(iris$Species, fit.restricted$clust)1 2 3setosa 0 6 44versicolor 50 0 0virginica 50 0 0 EMMIXcskew : Finite Mixtures of Canonical Fundamental Skew t -distributions Model

FM-CFUST FM-rMST FM-uMST

MCR t -mixture models ﬁtted to the iris dataset. R> table(iris$Species, fit.unrestricted$clust)1 2 3setosa 50 0 0versicolor 0 47 3virginica 0 0 50

The misclassiﬁcation rate (MCR) against the true labels can be calculated using error.rate from the

EMMIXskew package. In this example, the FM-CFUST model obtained the lowestMCR compared to the FM-rMST and FM-uMST models (see Table 2 and the code below).Figure 2 shows the clustering of the iris dataset using these three models.

R> error.rate(unclass(iris$Species), fit.iris$clust)[1] 0.006666667R> error.rate(unclass(iris$Species), fit.restricetd$clust)[1] 0.3733333R> error.rate(unclass(iris$Species), fit.unrestricted$clust)[1] 0.02R>R> panel1 <- function(x,y,...){points(x,y,col=c("red","green3","blue")+ [fit.iris$clust],pch=20)}R> panel2 <- function(x,y,...){points(x,y,col=c("red","green3","blue")+ [fit.unrestricted$clust],pch=20)}R> panel3 <- function(x,y,...){points(x,y,col=c("red","green3","blue")+ [fit.restricted$clust],pch=20)}R> pairs(iris[1:4], main = "Iris Data", pch = 20, col = c("red","green3","blue")+ [unclass(iris$Species)], lower.panel=panel1)R> pairs(iris[1:4], main = "Iris Data", upper.panel=panel2, lower.panel=panel3)

In the case of univariate data, note that all three models become identical, and thus the use of

EmSkew from the

EMMIXskew package is recommended as it provides a more computationallyeﬃcient implementation. It should be noted that the ﬁtting of the FM-uMST and FM-CFUSTmodels can be time consuming due to the amount of calculations required, especially when q is large. When tested on a 3.4GHz machine, the example in Section 4.2 took around 3.5seconds to complete. For the examples in this Section, the CPU time for the FM-CFUST,FM-uMST, and FM-rMST models is around 1856, 1927, and 13 seconds, respectively. Notethat if the specialised implementation of the EMMIXskew package is used, the CPU time forthe FM-rMST model in this example reduces to 0.8 seconds.

5. Miscellaneous functions haron Lee & Geoﬀrey McLachlan iris dataset. The upper panels of the ﬁgure on the left showthe true labels, whereas the lower panels are the predicted labels given by the FM-CFUSTmodel. For the ﬁgure on the right, the upper panels correspond to the results given by theFM-uMST model and the lower panels correspond to that given by the FM-rMST model.This section presents some illustrative examples on how to generate random observations,generate/provide initial values (for the EM algorithm), use diﬀerent stopping criteria, anddraw contour plots for a FM-CFUST model using

EMMIXcskew . The CFUST admits a convolution-type stochastic representation that facilitates random sam-ple generation. More speciﬁcally, let U and U be independent random variables followingmultivariate normal distributions given by N p ( , Σ ) and N q ( , I q ), respectively. Let also w be a scalar random variable with the gamma( ν , ν ) distribution. Then Y = µ + 1 √ w ∆ | U | + 1 √ w U (22)has a CFUST distribution with density given by (1). The rcfust function adopts (22) togenerate a random sample of CFUST observations. Its mixture version is implemented as rfmcfust in EMMIXcskew . These two functions are given, respectively, by rcfust(n, mu, sigma, delta, dof, known=NULL, ...)rfmcfust(g, n, mu, sigma, delta, dof, pro, known=NULL, ...)

Input arguments for the above functions follow the same structure as described in Section4.1, permitting the parameters of the CFUST (or FM-CFUST) model to be speciﬁed eitherindividually using mu , sigma , delta , and dof (and also pro for a FM-CFUST model) or withina list object using known . The argument n speciﬁes the number of random observations tobe generated. In the case of a FM-CFUST model, n is either a single integer (which represents8 EMMIXcskew : Finite Mixtures of Canonical Fundamental Skew t -distributions the total number of observations to be generated) or a vector of g integers representing thenumber of observations to be generated from each of the g component. Note that if n is asingle value, rfmcfust will determine the sample size for each component using the mixingproportion speciﬁed by pro .As an example, suppose one would like to generate 10 random observations from a CFUSTdistribution with µ = (1 , > , Σ = I , ∆ = (cid:20) (cid:21) , and ν = 4, the following command canbe issued at the R command prompt. A 10 × R> rfust(10,c(1,2),diag(2),matrix(c(2,1,1,2),2,2),4)[,1] [,2][1,] 5.836001 5.600793[2,] 3.080172 4.213700[3,] 3.305617 4.888012[4,] 4.390739 3.109635[5,] 4.003996 4.686407[6,] 1.609795 1.599386[7,] 3.361534 5.326190[8,] 3.449745 4.474217[9,] 10.886028 7.964134[10,] 5.752894 7.049037

Generating observations from a mixture of CFUST distributions is also quite simple using rfmcfust . We ﬁrst create an object with the required parameters. This can then be directlypassed in to rfmcfust . An example is shown below. A n × ( p + 1) matrix is returned wherethe last column gives component label for each data point generated. R> obj <- list()R> obj$mu <- list(c(17,19), c(5,22), c(6,10))R> obj$sigma <- list(diag(2), matrix(c(2,0,0,1),2),+ matrix(c(3,7,7,24),2))R> obj$delta <- list(matrix(c(3,0,2,1.5),2,2), matrix(c(5,0,0,10),2,2),+ matrix(c(2,0,5,0),2,2))R> obj$dof <- c(1, 2, 3)R> obj$pro <- c(0.25, 0.25, 0.5)R> rfmcfust(3, 100, known=obj) [,1] [,2] [,3][1,] 46.143907 25.56151304 1[2,] 17.816665 18.22572581 1[3,] 33.915805 25.54308697 1[4,] 44.609637 15.81099978 1[5,] 54.766995 16.10253015 1[6,] 18.610320 19.74165026 1[7,] 25.303312 20.80981782 1[8,] 20.608770 21.58460735 1[9,] 19.679756 20.51390429 1 haron Lee & Geoﬀrey McLachlan [10,] 20.988970 16.10998335 1... the rest omitted ... Three diﬀerent strategies for generating starting values for the FM-CFUST model were de-scribed in Section 3.3. These are implemented in the

EMMIXcskew package. Apart fromthe default starting strategy (13) which makes use of moment-based estimates of a uMSNdistribution, the init.cfust function implements the transformation approach (15) as oneof its options, and accepts starting values based on the results of its nested models as anotheroption. The arguments of the function are the following: init.cfust(g, dat, q=p, initial=NULL, known=NULL, clust=NULL, nkmeans=20,method=c("moments","transformation","EMMIXskew","EMMIXuskew"))

To use a ﬁtted model given by the packages

EMMIXskew and

EMMIXuskew , set method to "EMMIXskew" and "EMMIXuskew" , respectively, and the output of the functions EmSkew and fmmst can be directly passed into init.fust using the argument initial . If an initial valueof the parameter vector is not supplied (that is, initial=NULL ), then the default option is toprovide an initial value for each component-parameter vector obtained by applying the methodof moments (that is, method="moments" ) to the clusters corresponding to the components.These clusters are obtained by using the k -means procedure, but the user can specify aninitial partition obtained using some other method of clustering, for example, a model-basedapproach using a mixture of t -distributions. The user-speciﬁed initial partition is passed inusing clust . If the transformation approach is preferred, set method="transformation" .Again, in this case, if an initial partition is not supplied, the partition given by k -meansclustering is used.An example session is shown below demonstrating how to use the above function to generatediﬀerent starting values. We use the Geyser dataset (Azzalini and Bowman 1990), whichcontain measurements on 299 successive eruptions of the Old Faithful Geyser during 1 Augustto 15 August 1985. The two variables recorded were the waiting time between eruptions andthe duration of each eruption, both measured in minutes. This dataset is available from the MASS package.

R> library(MASS)R> data(geyser)R> plot(geyser, pch=20)

An initial inspection of the dataset (Figure 3(a)) suggests three clusters. Hence, we set g to 3. In the example below, initial.default and initial.transformation refers todefault (moment-based) approach and the transformation approach, respectively. For thenested approach, we have demonstrated in Section 4.3 how to use the results of a ﬁtted FM-uMST model as initial values. In that example, the model was ﬁtted using fmmst() in ourpackage, which is a replica of the same function in the EMMIXuskew package, and hence theoption method="EMMIXuskew" was used. This option can be used in the same way to supplyinitial values from the

EMMIXuskew package. In addition, the

EMMIXcskew package also0

EMMIXcskew : Finite Mixtures of Canonical Fundamental Skew t -distributions accepts outputs from the EMMIXskew package which provide routines to ﬁt ﬁnite mixturesof normal, t , (restricted) skew-normal, and (restricted) skew t -distributions. In this case, theoption method="EMMIXskew" is used to pass the results to fmcfust() . R> initial.default <- init.cfust(3, geyser)R> initial.transformation <- init.cfust(3, geyser, method="transformation")R> fit.geyser.restricted <- EmSkew(geyser, 3, "mst", debug=FALSE)R> initial.restricted <- init.cfust(3, geyser, initial=fit.gesyser.restricted,+ method="EMMIXskew")R> fit.geyser.unrestricted <- fmmst(3, geyser)R> initial.unrestricted <- init.cfust(3, geyser, initial=fit.gesyser.unrestricted,+ method="EMMIXuskew")R> fit.geyser.t <- EmSkew(geyser, 3, "mvt", debug=FALSE)R> initial.t <- init.cfust(3, geyser, initial=fit.gesyser.t, method="EMMIXskew")

To help choose an appropriate starting value, we can compare the log likelihood values forthe FM-CFUST model ﬁtted using these initial values.

R> initial.default$loglik[1] -1903.598R> initial.transformation$loglik[1] -1448.33R> initial.restricted$loglik[1] -1335.038R> initial.unrestricted$loglik[1] -1404.772R> initial.t$loglik[1] -1347.385

In this case, the starting value that corresponds to the ﬁtted model of FM-rMST gave thehighest log likelihood value. We now proceed to ﬁt a FM-CFUST model using the defaultstarting strategy, the transformation approach, and the ﬁtted model of FM-rMST.

R> fit.geyser1 <- fmcfust(3, geyser, initial=initial.default)R> fit.geyser2 <- fmcfust(3, geyser, initial=initial.transformation)R> fit.geyser3 <- fmcfust(3, geyser, initial=initial.restricted)R> fit.geyser1$loglik[1] -1415.442R> fit.geyser2$loglik[1] -1344.724R> fit.geyser3$loglik[1] -1333.533

According to the use of the ﬁnal log likelihood value, the results of fit.geyser3 are preferred.This corresponds to the result using the initial value with the highest log likelihood valueidentiﬁed above. Note that this may not always be the case; that is, using the initial value haron Lee & Geoﬀrey McLachlan fit.geyser2 perhaps gave a more natural partition of the data, althoughits log likelihood value is lower than that given by fit.geyser3 . R> plot(geyser, pch=20, col=c("red","blue","green")[fit.geyser1$clust])R> plot(geyser, pch=20, col=c("red","blue","green")[fit.geyser2$clust])R> plot(geyser, pch=20, col=c("red","blue","green")[fit.geyser3$clust])

The stopping criteria described in Section 3.4 are available through the convergence option in fmcfust() . The default is using Aitken’s acceleration-based approach ( convergence="Aitken" ).The other two options are convergence="likelihood" and convergence="parameters" , re-ferring to relative likelihood based and relative parameters-based approach respectively. Forillustration, using the Geyser dataset and intiial.restricted as initial values as an exam-ple, we can run the EM algorithm with the relative likelihood-based and relative parameter-based convergence criteria using the following commands.

R> fit.geyser4 <- fmcfust(3, geyser, initial = initial.restricted,+ convergence=c("likelihood"))

Finite Mixture of Multivariate CFUST Distributionswith 3 components----------------------------------------------------Iteration 0 : loglik = -1335.038Iteration 1 : loglik = -1335.01Iteration 2 : loglik = -1334.987Iteration 3 : loglik = -1334.965Iteration 4 : loglik = -1334.943Iteration 5 : loglik = -1334.922... rest omitted ...--------------------------------------------------Iteration 100: loglik = -1333.533

R. fit.geyser5 <- fmcfust(3, geyser, initial = initial.restricted,+ convergence=c("parameters"))

Finite Mixture of Multivariate CFUST Distributionswith 3 components---------------------------------------------------- EMMIXcskew : Finite Mixtures of Canonical Fundamental Skew t -distributions Figure 3: The Old Faithful geyser data from the

MASS package. (a) Scatter plot of thedata. (b) Clustering results of the FM-CFUST model using the default (moments-based)approach for generating starting values. (c) Clustering results of the FM-CFUST modelusing the transformation approach for generating starting values. (d) Clustering results ofthe FM-CFUST model using the results of FM-rMST model as starting values. haron Lee & Geoﬀrey McLachlan Iteration 0 : loglik = -1335.038Iteration 1 : loglik = -1335.01Iteration 2 : loglik = -1334.987Iteration 3 : loglik = -1334.965Iteration 4 : loglik = -1334.943Iteration 5 : loglik = -1334.922... rest omitted ...--------------------------------------------------Iteration 100: loglik = -1333.533

In both cases, we can observe from the output above that the EM algorithm terminates inthe same iteration for this example. g with BIC Model selection criteria are typically used to guide the choice of g when ﬁtting ﬁnite mixturemodels. The fmcfust() function provides the values of AIC and BIC as part of the outputwhen ﬁtting a FM-CFUST model. We show here a short example of using BIC to assistin choosing the optimal value of g . We ﬁt the FM-CFUST model with g = 1 , . . . , g = 2. R> fit.geyser.g1 <- fmcfust(1, geyser)R> fit.geyser.g2 <- fmcfust(2, geyser)R> fit.geyser.g3 <- fit.geyser1R> fit.geyser.g4 <- fmcfust(4, geyser)

R> fit.geyser.g1$bic[1] 3166.698R> fit.geyser.g2$bic[1] 2963.833R> fit.geyser.g3$bic[1] 3013.298R> fit.geyser.g4$bic[1] 3069.547

Contour plots for a FM-CFUST model can be produced easily using the functions fmcfust.contour.2d and fmcfust.contour.3d for a two-dimensional and three-dimensionalspace, respectively. These two functions take a number of arguments described below. fmcfust.contour.2d(dat, model, grid=50, drawpoints=TRUE, clust=NULL,nlevels=10, component=NULL, ...)fmcfust.contour.3d(dat, model, grid=20, drawpoints=TRUE, clust=NULL,levels=0.9, component=NULL, ...) EMMIXcskew : Finite Mixtures of Canonical Fundamental Skew t -distributions Brieﬂy, dat is a dataset that is either a matrix or data.frame . Note that if dat is notspeciﬁed, then the limits of the axes of the plot must be speciﬁed (using the standard xlim , ylim , and zlim arguments). The parameters of the FM-CFUST model are speciﬁed using model . Typically, this is an output from fmcfust . The argument grid determines the gridsize of the plots. Thus the higher the number in grid , the smoother the contours (at the costof longer computation time). The data points (if provided) are plotted by default. By setting drawpoints=FALSE , only the contours will be plotted. In the case where g >

1, a user-speciﬁedpartition of the data can be provided using clust . This is used when drawpoints=TRUE andthe data points in the plot will be colour-coded using the labels in clust . The arguments nlevels and levels control how many contours are displayed and at which percentiles arethey computed, respectively. Finally, component speciﬁes which component is included inthe plot. This option allows the components to be plotted individually without taking intoaccount the mixing proportions. In contrast, the default is to return the contours of a mixturedensity.To illustrate the use of fmcfust.contour.2d , we reconsider the iris.versicolor examplein Section 4.2. Figure 1 can be generated by the following command in R . R> fmcfust.contour.2d(iris.versicolor, fit.Versicolor, drawpoints=FALSE,+ main="versicolor")

We now turn to an example of generating a 3D contour plot of a FM-CFUST model. Sup-pose we would like to draw the contours of a two-component FM-CFUST distribution withparameters µ = (0 , , > , µ = (5 , , > , ν = ν = 3, π = (0 . , . Σ =   , Σ = 2 I , ∆ =   , and ∆ =   . We ﬁrst create an object obj with these parameters. By default, a mixture density is producedon running the fmcfust.contour.3d (see Figure 4(a)). A ﬁrst remark on this ﬁgure is thatthe two components seem to be ‘joined’ together. To gain a better view of the shapes of thesetwo components, we may set components=1:2 so that their mixing proportions are ignored.In addition, we generate 500 random observations from the speciﬁed FM-CFUST model andinclude them in the plot. Observe now in Figure 4(b) that the two components are plottedas two separately objects and their colours are matched with the simulated data.

R> obj <- list()R> obj$mu <- list(matrix(c(0,0,0),3), matrix(c(5,5,5),3))R> obj$sigma <- list(matrix(c(5,2,1,2,5,1,1,1,1),3,3), 2*diag(3))R> obj$delta <- list(matrix(c(1,0,0,1,0,0,1,0,0),3,3),+ matrix(c(5,0,0,0,10,0,0,0,15),3,3))R> obj$dof <- c(3,3)R> obj$pro <- c(0.2, 0.8)R> fmcfust.contour.3d(model=obj, level=0.98, drawpoints=TRUE, xlab="X",+ ylab="Y", zlab="Z", xlim=c(-20, 50), ylim=c(-20,50), zlim=c(-20,80))R> X <- rfmcfust(2, 500, known=obj)R> fmcfust.contour.3d(X, model=obj, level=c(0.99, 0.92), drawpoints=TRUE,+ clust=X[,4], xlab="X", ylab="Y", zlab="Z", xlim=c(-20, 50), ylim=c(-20, 50),+ zlim=c(-20, 80), component=1:2) haron Lee & Geoﬀrey McLachlan

EMMIXcskew package. (a) 3Dcontours of the density of a FM-CFUST distribution. (b) 3D contours of the density of eachcomponent of the FM-CFUST distribution shown in (b).

6. Concluding remarks

This paper has presented the

EMMIXcskew package for the ﬁtting of a CFUST distributionand ﬁnite mixtures of CFUST distributions to heterogeneous data that exhibit non-normalfeatures. In addition to computing the maximum likelihood estimates of the model param-eters, the

EMMIXcskew package provides routines for random number generation, densityevaluation, the plotting of 2D and 3D contours, and a few diﬀerent methods for initial valuesgeneration for the FM-CFUST model. A ﬁnite mixture of CFUST distributions provides amodel for the robust extension of traditional normal mixture models, with greater ﬂexibilityin handling asymmetry and heavy tails. The skewness parameters of a CFUST distributionare characterized by a general matrix, which provides an elegant uniﬁcation of the restrictedand unrestricted skew t -distributions. The aim of this package is to provide users with theoption of ﬁtting this ﬂexible distribution to their dataset. Model selection criteria such as theAIC and BIC are provided for the FM-CFUST model to assist the user in choosing betweendiﬀerent models for their data.It is noted that the ﬁtting of a FM-CFUST model can be quite computationally intensive when q is large. This is due to the calculations of some of the conditional expectations involved inthe E-step of the EM algorithm. Future work will consider applicable strategies to speed upthe parameter estimation process for this model such as parallel implementations describedin Lee et al. (2016b,a) and Lee and McLachlan (2016b). Acknowledgments

This work is supported by grants from the Australian Research Council.6

EMMIXcskew : Finite Mixtures of Canonical Fundamental Skew t -distributions References

Abanto-Valle CA, Lachos VH, Dey DK (2015). “Bayesian Estimation of a Skew-Student- t Stochastic Volatility Model.”

Methodology and Computing in Applied Probability , ,721–738.Akaike H (1974). “A New Look at the Statistical Model Identiﬁcation.” Automatic Control , , 716–723.Arellano-Valle RB, Azzalini A (2006). “On the Uniﬁcation of Families of Skew-Normal Dis-tributions.” Scandinavian Journal of Statistics , , 561–574.Arellano-Valle RB, Branco MD, Genton MG (2006). “A Uniﬁed View on Skewed DistributionsArising from Selections.” The Canadian Journal of Statistics , , 581–601.Arellano-Valle RB, Genton MG (2005). “On Fundamental Skew Distributions.” Journal ofMultivariate Analysis , , 93–116.Asparouhov T, Muth´en B (2016). “Structural Equation Models And Mixture Models WithContinuous Nonnormal Skewed Distributions.” Structural Equation Modeling , , 1–19.Azzalini A (2005). “The Skew-normal Distribution and Related Multivariate Families.” Scan-dinavian Journal of Statistics , , 159–188.Azzalini A (2014). The R sn package : The Skew-Normal and Skew- t Distributions (version1.0-0) . Universit`a di Padova, Italia. URL http://azzalini.stat.unipd.it/SN .Azzalini A, Bowman AW (1990). “A Look at Some Data on the Old Faithful Geyser.”

AppliedStatistics , , 357–365.Azzalini A, Browne RP, Genton MG, McNicholas P (2016). “On nomenclature for, and therelative merits of, two formulations of skew distributions.” Statistics and Probaility Letters , , 201–206.Azzalini A, Capitanio A (2003). “Distributions Generated by Perturbation of Symmetry withEmphasis on a Multivariate Skew t Distribution.”

Journal of the Royal Statistical SocietyB , , 367–389.Azzalini A, Capitanio A (2014). The Skew-Normal and Related Families . Institute of Math-ematical Statistics Monographs. Cambridge University Press, UK.Banﬁeld JD, Raftery AE (1993). “Model-Based Gaussian and Non-Gaussian Clustering.”

Biometrics , , 803–821.Benaglia T, Chauveau D, Hunter DR, Young DS (2009). “ mixtools : An R Package for Ana-lyzing Mixture Models.”

Journal of Statistical Software , (6), 1–29.Bernardi M (2013). “Risk Measures for Skew Normal Mixtures.” Statistics & ProbabilityLetters , , 1819–1824.Biecek P, Szczurek E, Vingron M, Tiuryn J (2012). “The R Package bgmm : Mixture Modelingwith Uncertain Knowledge.”

Journal of Statistical Software , (3), 1–31. haron Lee & Geoﬀrey McLachlan Computer Assisted Analysis of Mixtures and Applications: Meta-Analysis,Disease Mapping, and Others . Chapman and Hall/CRC, London.B¨ohning D, Dietz E, Schaub R, Schlattmann, P Lindsay BG (1994). “The Distribution of theLikelihood Ratio for Mixtures of Densities from the One-parameter Exponential Family.”

Annals of the Institute of Statistical Mathematics , , 373–388.Branco MD, Dey DK (2001). “A General Class of Multivariate Skew-Elliptical Distributions.” Journal of Multivariate Analysis , , 99–113.Contreras-Reyes JE, Arellano-Valle RB (2013). “Growth Estimates of Cardinalﬁsh (EpigonusCrassicaudus) Based on Scale Mixtures of Skew-Normal Distributions.” Fisheries Research , , 137–144.Dempster AP, Laird NM, Rubin DB (1977). “Maximum Likelihood From Incomplete Datavia the EM Algorithm.” Journal of Royal Statistical Society B , , 1–38.Everitt BS, Hand DJ (1981). Finite Mixture Distributions . Chapman & Hall, London.Fraley C, Raftery AE (1998). “How Many Clusters? Which Clustering Methods? Answersvia Model-Based Cluster Analysis.”

Computer Journal , , 578–588.Fraley C, Raftery AE (2007). “Model-Based Methods of Classiﬁcation: Using the mclust Software in Chemometrics.”

Journal of Statistical Software , (6), 1–13.Fr¨uhwirth-Schnatter S (2006). Finite Mixture and Markov Switching Models . Springer-Verlag,London.Fr¨uhwirth-Schnatter S, Pyne S (2010). “Bayesian Inference for Finite Mixtures of Univariateand Multivariate Skew-Normal and Skew- t Distributions.”

Biostatistics , , 317–336.Genton MG (ed.) (2004). Skew-Elliptical Distributions and Their Applications: A JourneyBeyond Normality . Chapman & Hall, CRC.Gr¨un B, Leisch F (2008). “

FlexMix

Version 2: Finite Mixtures with Concomitant Variablesand Varying and Constant Parameters.”

Journal of Statistical Software , (4), 1–35.Gupta AK (2003). “Multivariate Skew- t Distribution.”

Statistics , , 359–363.Ho HJ, Lin TI, Chang HH, Haase HB, Huang S, Pyne S (2012). “Parametric Modeling ofCellular State Transitions as Measured With Flow Cytometry Diﬀerent Tissues.” BMCBioinformatics , , (Suppl 5): S5.Hu X, Kim H, Brennan PJ, Han B, Baecher-Allan CM, De Jager PL, Brenner MB, Ray-chaudhuri S (2013). “Application of User-Guided Automated Cytometric Data Analysis toLarge-Scale Immunoproﬁling of Invariant natural killer T cells.” Proceedings of the NationalAcademy of Sciences USA , , 19030–19035. doi:10.1073/pnas.1318322110 .Lachos VH, Ghosh P, Arellano-Valle RB (2010). “Likelihood Based Inference for Skew NormalIndependent Linear Mixed Models.” Statistica Sinica , , 303–322.Lee S, McLachlan GJ (2014a). “Finite Mixtures of Multivariate Skew t -Distributions: SomeRecent and New Results.” Statistics and Computing , , 181–202.8 EMMIXcskew : Finite Mixtures of Canonical Fundamental Skew t -distributions Lee SX, Leemaqz KL, McLachlan GJ (2016a). “A block EM algorithm for multivariate skewnormal and skew t-mixture models.” arXiv:1608.02797 .Lee SX, Leemaqz KL, McLachlan GJ (2016b). “A simple parallel EM algorithm for statisticallearning via mixture models.” In

Proceedings of the 2016 International Conference onDigital Image Computing: Techniques and Applications , pp. 295–302.Lee SX, McLachlan GJ (2013a). “Model-Based Clustering and Classiﬁcation with Non-NormalMixture Distributions.”

Statistical Methods and Applications , , 427–454.Lee SX, McLachlan GJ (2013b). “Modelling Asset Return Using Multivariate Asym- met-ric Mixture Nodels with Applications to Wstimation of Value-at-Risk.” In J Piantadosi,RS Anderssen, J Boland (eds.), MODSIM 2013 (20th International Congress on Modellingand Simulation) , pp. 1228–1234. Adelaide, Australia.Lee SX, McLachlan GJ (2013c). “On Mixtures of Skew-Normal and Skew t -Distributions.” Advances in Data Analysis and Classiﬁcation , , 241–266.Lee SX, McLachlan GJ (2013d). “ EMMIXuskew : An R Package for Fitting Mixtures ofMultivariate Skew t Distributions via the EM Algorithm.”

Journal of Statistical Software , (12), 1–22. URL .Lee SX, McLachlan GJ (2014b). “Maximum Likelihood Estimation for Finite Mixtures ofCanonical Fundamental Skew t -Distributions: the Uniﬁcation of the Unrestricted and Re-stricted Skew t-Mixture Models.” arXiv:1401.8182 .Lee SX, McLachlan GJ (2016a). “Finite Mixtures of Canonical Fundamental Skew t -Distributions: The Uniﬁcation of the Restricted and Unrestricted Skew t -Mixture Models.” Statistics and Computing , , 573–589.Lee SX, McLachlan GJ (2016b). “On mixture modelling with multivariate skew distributions.”In A Colubi, A Blanco, C Gatu (eds.), Proceedings of COMPSTAT 2016 , pp. 137–147.The Hague: The International Statistical Institute/International Association for StatisticalComputing.Lee SX, McLachlan GJ (2016c). “Unsupervised component-wise EM learning for ﬁnite mix-tures of skew t-distributions.” In

Lecture Notes in Artiﬁcial Intelligence , volume 10086, pp.692–699. Berlin: Springer.Lee SX, McLachlan GJ, Pyne S (2014). “Supervised Classiﬁcation of Flow Cytometric Samplesvia the Joint Clustering and Matching (JCM) Procedure.” arXiv:1411.2820 .Lee SX, McLachlan GJ, Pyne S (2016c). “Modelling of Inter-sample Variation in Flow Cy-tometric Data with the Joint Clustering and Matching (JCM) Procedure.”

Cytometry A . doi:10.1002/cyto.a.22789 .Leisch F (2004). “ FlexMix : A General Framework for Finite Mixture Models and LatentClass Regression in R .” Journal of Statistical Software , (8), 1–18.Lin TI (2010). “Robust Mixture Modeling using Multivariate Skew- t Distribution.”

Statisticsand Computing , , 343–356. haron Lee & Geoﬀrey McLachlan Journal of Multivariate Analysis , ,398–413.Lin TI, Wu PH, McLachlan GJ, Lee SX (2015). “A Robust Factor Analysis Model Using theRestricted Skew t -Distribution.” TEST , , 510–531.Lindsay BG (1995). Mixture Models: Theory, Geometry, and Applications . NSF-CBMSRegional Conference Series in probability and Statistics, Vol. 5 (Institute of MathematicalStatistics and the American Statistical Association), Alexandria, VA.McLachlan GJ, Basford KE (1988).

Mixture Models: Inference and Applications . MarcelDekker, New York.McLachlan GJ, Krishnan T (1997).

The EM Algorithm and Extensions . John Wiley & Sons,Hokoben, New York.McLachlan GJ, Lee SX (2016). “Comment on ”On nomenclature for, and the relative meritsof, two formulations of skew distributions” by A. Azzalini, R. Browne, M. Genton, and P.McNicholas.”

Statistics and Probaility Letters , , 1–5.McLachlan GJ, Peel D (2000). Finite Mixture Models . Second edition. John Wiley & Sons,New York.Mengersen KL, Robert CP, Titterington DM (2011).

Mixtures: Estimation and Applications .John Wiley & Sons, New York.Muth´en B, Asparouhov T (2014). “Growth Mixture Modeling with Non-Normal Distribu-tions.”

Statistics and Medicine , , 1041–1058.Prates MO, Cabral CRB, Lachos VH (2013). “ mixsmsn : Fitting Finite Mixture of ScaleMixture of Skew-Normal Distributions.” Journal of Statistical Software , (12), 1–20.Pyne S, Hu X, Wang K, Rossin E, Lin TI, Maier LM, Baecher-Allan C, McLachlan GJ,Tamayo P, Haﬂer DA, De Jager PL, Mesirow JP (2009). “Automated High-DimensionalFlow Cytometric Data Analysis.” Proceedings of the National Academy of Sciences USA , , 8519–8524.Pyne S, Lee S, McLachlan G (2015). “Nature and Man: The Goal of Bio-security in theCourse of Rapid and Inevitable Human Development.” Journal of the Indian Society ofAgricultural Statistics , , 117–125.Pyne S, Lee SX, Wang K, Irish J, Tamayo P, Nazaire MD, Duong T, Ng SK, Haﬂer D,Levy R, Nolan GP, Mesirov J, McLachlan G (2014). “Joint Modeling and Registration ofCell Populations in Cohorts of High-Dimensional Flow Cytometric Data.” PLOS ONE , ,e100334. doi:10.1371/journal.pone.0100334 .Riggi S, Ingrassia S (2013). “A Model-Based Clustering Approach for Mass CompositionAnalysis of High Energy Cosmic Rays.” Astroparticle Physics , , 86–96.Rossin E, Lin TI, Ho HJ, Mentzer SJ, Pyne S (2011). “A Framework for Analytical Char-acterization of Monoclonal Antibodies Based on Reactivity Proﬁles in Diﬀerent Tissues.” Bioinformatics , , 2746–2753.0 EMMIXcskew : Finite Mixtures of Canonical Fundamental Skew t -distributions Sahu SK, Dey DK, Branco MD (2003). “A New Class of Multivariate Skew Distributionswith Applications to Bayesian Regression Models.”

The Canadian Journal of Statistics , , 129–150.Schaarschmidt F, Hofmann M, Jaki T, Gr¨un B, Hothorn LA (2015). “Statistical Approachesfor the Determination of Cut Points in Anti-Drug Antibody Bioassays.” Journal of Im-munological Methods , , 295–306.Schwarz G (1978). “Estimating the Dimension of a Model.” Annals of Statistics , , 461–464.Scrucca L, Fop M, Murphy TB, Raftery AE (2016). “ mclust

5: Clustering, Classiﬁcation andDensity Estimation Using Gaussian Finite Mixture Models.”

The R Journal , (1).Soltyk S, Gupta R (2011). “Application of the Multivariate Skew Normal Mixture Modelwith the EM Algorithm to Value-at-Risk.” In F Chan, D Marinova, RS Anderssen (eds.), MODSIM 2011 (19th International Congress on Modelling and Simulation) , pp. 1638–1644.Perth, Australia.Titterington DM, Smith AFM, Markov UE (1985).

Statistical Analysis of Finite MixtureDistributions . John Wiley & Sons, New York.Tortora C, Browne RP, Franczak BC, McNicholas PD (2015).

MixGHD : Model BasedClustering, Classiﬁcation and Discriminant Analysis Using the Mixture of Generalized Hy-perbolic Distributions. R package version 1.7, URL http://cran.r-project.org/web/packages/MixGHD .Wang K, McLachlan GJ, Ng SK, Peel D (2009). EMMIXskew : EM Algorithm for Mixtureof Multivariate Skew Normal/ t Distributions. R package version 1.0.20, URL http://CRAN.R-project.org/package=EMMIXskew . Aﬃliation:

Sharon X. LeeSchool of Mathematics and PhysicsUniversity of QueenslandBrisbane, AustraliaE-mail: [email protected]