Publication


Featured research published by Paul D. McNicholas.


Statistics and Computing | 2008

Parsimonious Gaussian mixture models

Paul D. McNicholas; Thomas Brendan Murphy

Parsimonious Gaussian mixture models are developed using a latent Gaussian model which is closely related to the factor analysis model. These models provide a unified modeling framework which includes the mixtures of probabilistic principal component analyzers and mixtures of factor analyzers models as special cases. In particular, a class of eight parsimonious Gaussian mixture models which are based on the mixtures of factor analyzers model is introduced and the maximum likelihood estimates for the parameters in these models are found using an AECM algorithm. The class of models includes parsimonious models that have not previously been developed. These models are applied to the analysis of chemical and physical properties of Italian wines and the chemical properties of coffee; the models are shown to give excellent clustering performance.
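As a sketch of where the parsimony comes from, the factor-analytic covariance structure underlying these models can be written down directly and its free parameters counted against a full covariance. The dimensions below are illustrative, not taken from the paper.

```python
import numpy as np

# Sketch of the factor-analytic covariance behind a parsimonious
# Gaussian mixture model: Sigma_g = Lambda_g Lambda_g^T + Psi_g,
# where Lambda_g is p x q (loadings) and Psi_g is diagonal (noise).
rng = np.random.default_rng(0)
p, q = 10, 2                                  # data dimension, latent factors
Lambda = rng.normal(size=(p, q))              # factor loadings
Psi = np.diag(rng.uniform(0.5, 1.5, size=p))  # diagonal noise variances
Sigma = Lambda @ Lambda.T + Psi               # constrained covariance

# Free parameters per component: the constrained structure needs
# p*q - q*(q-1)/2 + p parameters versus p*(p+1)/2 for an unconstrained
# covariance, which is the source of the parsimony.
n_constrained = p * q - q * (q - 1) // 2 + p
n_full = p * (p + 1) // 2
print(n_constrained, n_full)  # 29 vs 55
```

The gap widens quickly with p, which is why these models remain tractable for high-dimensional data.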


Bioinformatics | 2010

Model-based clustering of microarray expression data via latent Gaussian mixture models

Paul D. McNicholas; Thomas Brendan Murphy

MOTIVATION: In recent years, work has been carried out on clustering gene expression microarray data. Some approaches are developed from an algorithmic viewpoint whereas others are developed via the application of mixture models. In this article, a family of eight mixture models which utilizes the factor analysis covariance structure is extended to 12 models and applied to gene expression microarray data. This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of 12 mixture models, including parsimonious models. This family of models allows for the modelling of the correlation between gene expression levels even when the number of samples is small. Parameter estimation is carried out using a variant of the expectation-maximization algorithm and model selection is achieved using the Bayesian information criterion. This expanded family of Gaussian mixture models, known as the expanded parsimonious Gaussian mixture model (EPGMM) family, is then applied to two well-known gene expression data sets.

RESULTS: The performance of the EPGMM family of models is quantified using the adjusted Rand index. This family of models gives very good performance, relative to existing popular clustering techniques, when applied to real gene expression microarray data.

AVAILABILITY: The reduced, preprocessed data that were analysed are available at www.paulmcnicholas.info
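The select-by-BIC, score-by-adjusted-Rand workflow can be illustrated in a few lines, with scikit-learn's Gaussian mixtures standing in for the EPGMM family; the data below are simulated, not the paper's microarray sets.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

# Simulated two-group data (illustrative stand-in for real expression data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (60, 4)), rng.normal(4, 1, (60, 4))])
labels = np.repeat([0, 1], 60)

# Select the number of components by BIC (lower is better in scikit-learn).
fits = {g: GaussianMixture(n_components=g, random_state=1).fit(X)
        for g in range(1, 5)}
best_g = min(fits, key=lambda g: fits[g].bic(X))

# Quantify agreement with the known labels via the adjusted Rand index.
ari = adjusted_rand_score(labels, fits[best_g].predict(X))
print(best_g, round(ari, 3))
```

The adjusted Rand index is corrected for chance, so a value near 1 indicates near-perfect recovery of the true partition, regardless of label permutation.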


Computational Statistics & Data Analysis | 2010

Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models

Paul D. McNicholas; Thomas Brendan Murphy; Aaron F. McDaid; Dermot Frost

Model-based clustering using a family of Gaussian mixture models, with parsimonious factor analysis-like covariance structure, is described and an efficient algorithm for its implementation is presented. This algorithm uses the alternating expectation-conditional maximization (AECM) variant of the expectation-maximization (EM) algorithm. Two central issues around the implementation of this family of models, namely model selection and convergence criteria, are discussed. These central issues also have implications for other model-based clustering techniques and for the implementation of techniques like the EM algorithm, in general. The Bayesian information criterion (BIC) is used for model selection and Aitken's acceleration, which is shown to outperform the lack of progress criterion, is used to determine convergence. A brief introduction to parallel computing is then given before the implementation of this algorithm in parallel is facilitated within the master-slave paradigm. A simulation study is then carried out to confirm the effectiveness of this parallelization. The resulting software is applied to two data sets to demonstrate its effectiveness when compared to existing software.
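The Aitken stopping rule favoured above can be sketched in a few lines: it extrapolates the log-likelihood sequence to its estimated limit and stops when that limit is within a tolerance of the current value. The tolerance and the toy log-likelihood sequence are invented for illustration.

```python
# Minimal sketch of an Aitken-acceleration stopping rule for EM-type
# algorithms; assumes a strictly increasing log-likelihood sequence.
def aitken_converged(loglik, eps=1e-6):
    """Stop when the Aitken estimate of the asymptotic log-likelihood
    is within eps of the current value. Needs at least 3 iterates."""
    if len(loglik) < 3:
        return False
    l0, l1, l2 = loglik[-3], loglik[-2], loglik[-1]
    a = (l2 - l1) / (l1 - l0)          # Aitken acceleration factor
    l_inf = l1 + (l2 - l1) / (1 - a)   # estimated limiting log-likelihood
    return 0 <= l_inf - l2 < eps

# A geometrically converging toy sequence: l_k = -1 - 0.5**k.
seq = [-1 - 0.5 ** k for k in range(30)]
hits = [k for k in range(len(seq)) if aitken_converged(seq[: k + 1])]
print(hits[0])  # first iteration at which the rule fires: 20
```

Unlike a lack-of-progress rule, this fires based on the estimated distance to the limit rather than the size of the last step, which is why it behaves better when EM is converging slowly.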


Statistics and Computing | 2011

Extending mixtures of multivariate t-factor analyzers

Jeffrey L. Andrews; Paul D. McNicholas

Model-based clustering typically involves the development of a family of mixture models and the imposition of these models upon data. The best member of the family is then chosen using some criterion and the associated parameter estimates lead to predicted group memberships, or clusterings. This paper describes the extension of the mixtures of multivariate t-factor analyzers model to include constraints on the degrees of freedom, the factor loadings, and the error variance matrices. The result is a family of six mixture models, including parsimonious models. Parameter estimates for this family of models are derived using an alternating expectation-conditional maximization algorithm and convergence is determined based on Aitken’s acceleration. Model selection is carried out using the Bayesian information criterion (BIC) and the integrated completed likelihood (ICL). This novel family of mixture models is then applied to simulated and real data where clustering performance meets or exceeds that of established model-based clustering methods. The simulation studies include a comparison of the BIC and the ICL as model selection techniques for this novel family of models. Application to simulated data with larger dimensionality is also explored.
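An ICL-style criterion can be obtained from BIC by adding an entropy penalty on the soft assignments, which favours well-separated components. The sketch below uses scikit-learn Gaussian mixtures rather than the t-factor analyzers of the paper, and follows scikit-learn's lower-is-better BIC sign convention.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# ICL approximation under a lower-is-better BIC convention:
# ICL = BIC - 2 * sum_ig z_ig * log(z_ig), where z_ig are the
# posterior component memberships (responsibilities).
def icl(model, X):
    z = model.predict_proba(X)
    entropy = -np.sum(z * np.log(np.clip(z, 1e-300, None)))
    return model.bic(X) + 2.0 * entropy

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (80, 2)), rng.normal(5, 1, (80, 2))])

fits = {g: GaussianMixture(n_components=g, random_state=2).fit(X)
        for g in (1, 2, 3)}
best_bic = min(fits, key=lambda g: fits[g].bic(X))
best_icl = min(fits, key=lambda g: icl(fits[g], X))
print(best_bic, best_icl)
```

On well-separated data like this the two criteria agree; ICL mostly diverges from BIC when components overlap, since the entropy term then penalises the extra components.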


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2014

Mixtures of Shifted Asymmetric Laplace Distributions

Brian C. Franczak; Ryan P. Browne; Paul D. McNicholas

A mixture of shifted asymmetric Laplace distributions is introduced and used for clustering and classification. A variant of the EM algorithm is developed for parameter estimation by exploiting the relationship with the generalized inverse Gaussian distribution. This approach is mathematically elegant and relatively computationally straightforward. Our novel mixture modelling approach is demonstrated on both simulated and real data to illustrate clustering and classification applications. In these analyses, our mixture of shifted asymmetric Laplace distributions performs favourably when compared to the popular Gaussian approach. This work, which marks an important step in the non-Gaussian model-based clustering and classification direction, concludes with discussion as well as suggestions for future work.
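One convenient representation of the shifted asymmetric Laplace distribution is as a normal variance-mean mixture with a standard exponential mixing weight, the exponential being a special case of the generalized inverse Gaussian mentioned above. The sketch below simulates from that representation; all parameter values are illustrative.

```python
import numpy as np

# Variance-mean mixture representation:
#   Y = mu + W * alpha + sqrt(W) * Sigma^{1/2} Z,  W ~ Exp(1),  Z ~ N(0, I).
rng = np.random.default_rng(3)
mu = np.array([0.0, 0.0])      # shift
alpha = np.array([2.0, -1.0])  # skewness direction
L = np.linalg.cholesky(np.array([[1.0, 0.3], [0.3, 1.0]]))  # Sigma^(1/2)

n = 200_000
W = rng.exponential(1.0, size=n)        # latent mixing weights
Z = rng.standard_normal((n, 2))
Y = mu + W[:, None] * alpha + np.sqrt(W)[:, None] * (Z @ L.T)

# Since E[W] = 1 for Exp(1), the mean is mu + alpha.
print(Y.mean(axis=0).round(1))
```

The same latent-W construction is what makes an EM scheme natural: conditional on W the model is Gaussian, and the conditional expectations of W given the data involve the generalized inverse Gaussian distribution.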


BMC Genomics | 2013

Genome-wide expression profiling of maize in response to individual and combined water and nitrogen stresses

Sabrina Humbert; Sanjeena Subedi; Jonathan Cohn; Bin Zeng; Yong-Mei Bi; Xi Chen; Tong Zhu; Paul D. McNicholas; Steven J. Rothstein

Background: Water and nitrogen are two of the most critical inputs required to achieve the high yield potential of modern corn varieties. In most agricultural settings, however, they are often scarce and costly. Fortunately, tremendous progress has been made in the past decades in terms of modeling to assist growers in the decision making process, and many tools are now available to achieve more sustainable practices, both environmentally and economically. Nevertheless, large gaps remain between our empirical knowledge of the physiological changes observed in the field in response to nitrogen and water stresses, and our limited understanding of the molecular processes leading to those changes.

Results: This work examines in particular the impact of simultaneous stresses on the transcriptome. In a greenhouse setting, corn plants were grown under tightly controlled nitrogen and water conditions, allowing sampling of various tissues and stress combinations. A microarray profiling experiment was performed using this material and showed that the concomitant presence of nitrogen and water limitation affects gene expression to an extent much larger than anticipated. A clustering analysis also revealed how the interaction between the two stresses shapes the patterns of gene expression over various levels of water stresses and recovery.

Conclusions: Overall, this study suggests that the molecular signature of a specific combination of stresses on the transcriptome might be as unique as the impact of individual stresses, and hence underlines the difficulty of extrapolating conclusions obtained from the study of individual stress responses to more complex settings.


Computational Statistics & Data Analysis | 2014

Mixtures of skew-t factor analyzers

Paula M. Murray; Ryan P. Browne; Paul D. McNicholas

A mixture of skew-t factor analyzers is introduced as well as a family of mixture models based thereon. The particular formulation of the skew-t distribution used arises as a special case of the generalized hyperbolic distribution. Like their Gaussian and t-distribution analogues, mixtures of skew-t factor analyzers are very well-suited for model-based clustering of high-dimensional data. The alternating expectation–conditional maximization algorithm is used for model parameter estimation and the Bayesian information criterion is used for model selection. The models are applied to both real and simulated data, giving superior clustering results when compared to a well-established family of Gaussian mixture models.
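The skew-t formulation used here arises from the generalized hyperbolic family; equivalently, it admits a normal variance-mean mixture representation with an inverse-gamma mixing weight, W ~ IG(nu/2, nu/2), which the sketch below uses for simulation. All parameter values are illustrative.

```python
import numpy as np

# Variance-mean mixture representation of one common skew-t formulation:
#   Y = mu + delta * W + sqrt(W) * sigma * Z,  W ~ InvGamma(nu/2, nu/2).
rng = np.random.default_rng(5)
nu, mu, delta, sigma = 8.0, 0.0, 1.0, 1.0  # delta controls the skewness

n = 200_000
# If G ~ Gamma(shape=nu/2, rate=nu/2) then 1/G ~ InvGamma(nu/2, nu/2).
W = 1.0 / rng.gamma(shape=nu / 2, scale=2.0 / nu, size=n)
Y = mu + delta * W + sigma * np.sqrt(W) * rng.standard_normal(n)

# E[W] = nu / (nu - 2) for nu > 2, so E[Y] = mu + delta * nu / (nu - 2).
print(round(Y.mean(), 2))
```

Setting delta = 0 recovers the ordinary multivariate t construction, which is why the skew-t factor analyzers are described as generalising their Gaussian and t analogues.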


Canadian Journal of Statistics / Revue canadienne de statistique | 2015

A mixture of generalized hyperbolic distributions

Ryan P. Browne; Paul D. McNicholas

We introduce a mixture of generalized hyperbolic distributions as an alternative to the ubiquitous mixture of Gaussian distributions as well as their near relatives, of which the mixtures of multivariate t and skew-t distributions are predominant. The mathematical development of our mixture of generalized hyperbolic distributions model relies on its relationship with the generalized inverse Gaussian distribution. The latter is reviewed before our mixture models are presented along with details of the aforesaid reliance. Parameter estimation is outlined within the expectation-maximization framework before the clustering performance of our mixture models is illustrated via applications on simulated and real data. In particular, the ability of our models to recover parameters for data from underlying Gaussian and skew-t distributions is demonstrated. Finally, the role of generalized hyperbolic mixtures within the wider model-based clustering, classification, and density estimation literature is discussed.
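The relationship with the generalized inverse Gaussian (GIG) distribution can be made concrete: a generalized hyperbolic variate is a normal variance-mean mixture whose mixing weight W is GIG. The sketch below simulates from this representation using scipy's `geninvgauss`, whose density is proportional to w**(p-1) * exp(-b*(w + 1/w)/2); the parameter values are illustrative.

```python
import numpy as np
from scipy.stats import geninvgauss

# Generalized hyperbolic via GIG mixing:
#   Y = mu + beta * W + sqrt(W) * sigma * Z,  W ~ GIG(p, b),  Z ~ N(0, 1).
rng = np.random.default_rng(4)
p_gig, b_gig = 1.0, 2.0          # GIG index and concentration
mu, beta, sigma = 0.0, 1.5, 1.0  # location, skewness, scale

n = 200_000
W = geninvgauss.rvs(p_gig, b_gig, size=n, random_state=rng)
Y = mu + beta * W + sigma * np.sqrt(W) * rng.standard_normal(n)

# E[Y] = mu + beta * E[W]; scipy can supply E[W] for the GIG law.
print(round(Y.mean(), 2), round(mu + beta * geninvgauss.mean(p_gig, b_gig), 2))
```

Special choices of the GIG parameters recover the Gaussian, t, skew-t, and asymmetric Laplace cases, which is why the generalized hyperbolic mixture subsumes the earlier families.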


Statistics and Computing | 2012

Model-based clustering, classification, and discriminant analysis via mixtures of multivariate t-distributions

Jeffrey L. Andrews; Paul D. McNicholas

The last decade has seen an explosion of work on the use of mixture models for clustering. The use of the Gaussian mixture model has been common practice, with constraints sometimes imposed upon the component covariance matrices to give families of mixture models. Similar approaches have also been applied, albeit with less fecundity, to classification and discriminant analysis. In this paper, we begin with an introduction to model-based clustering and a succinct account of the state-of-the-art. We then put forth a novel family of mixture models wherein each component is modeled using a multivariate t-distribution with an eigen-decomposed covariance structure. This family, which is largely a t-analogue of the well-known MCLUST family, is known as the tEIGEN family. The efficacy of this family for clustering, classification, and discriminant analysis is illustrated with both real and simulated data. The performance of this family is compared to its Gaussian counterpart on three real data sets.
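The robustness of the t-based components can be seen in the E-step of the associated EM algorithm, where each observation receives a weight u_i = (nu + p) / (nu + d_i), with d_i the squared Mahalanobis distance to the component mean; distant points are automatically downweighted. The numbers below are illustrative.

```python
import numpy as np

# E-step weights for a single multivariate t component.
def t_weights(X, mu, Sigma, nu):
    """u_i = (nu + p) / (nu + d_i), d_i the squared Mahalanobis distance."""
    p = X.shape[1]
    diff = X - mu
    d = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(Sigma), diff)
    return (nu + p) / (nu + d)

X = np.array([[0.0, 0.0],
              [0.5, -0.5],
              [10.0, 10.0]])  # last row is a gross outlier
u = t_weights(X, mu=np.zeros(2), Sigma=np.eye(2), nu=4.0)
print(u.round(3))  # the outlier's weight is near zero
```

In the Gaussian limit (nu to infinity) every weight tends to 1, so the t family's outlier resistance comes entirely from these weights.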


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2012

Model-Based Learning Using a Mixture of Mixtures of Gaussian and Uniform Distributions

Ryan P. Browne; Paul D. McNicholas; Matthew D. Sparling

We introduce a mixture model whereby each mixture component is itself a mixture of a multivariate Gaussian distribution and a multivariate uniform distribution. Although this model could be used for model-based clustering (model-based unsupervised learning) or model-based classification (model-based semi-supervised learning), we focus on the more general model-based classification framework. In this setting, we fit our mixture models to data where some of the observations have known group memberships and the goal is to predict the memberships of observations with unknown labels. We also present a density estimation example. A generalized expectation-maximization algorithm is used to estimate the parameters and thereby give classifications in this mixture of mixtures model. To simplify the model and the associated parameter estimation, we suggest holding some parameters fixed; this leads to the introduction of more parsimonious models. A simulation study is performed to illustrate how the model allows for bursts of probability and locally higher tails. Two further simulation studies illustrate how the model performs on data simulated from multivariate Gaussian distributions and on data from multivariate t-distributions. This novel approach is also applied to real data and the performance of our approach under the various restrictions is discussed.
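The density of one such Gaussian-plus-uniform component is just a convex combination, and evaluating it shows where the "bursts of probability and locally higher tails" come from: inside the uniform box the density never falls below the uniform floor. The box, weight, and evaluation point below are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Density of one component: w * N(x; mu, Sigma) + (1 - w) * Unif(box).
def component_density(x, w, mu, Sigma, lo, hi):
    gauss = multivariate_normal.pdf(x, mean=mu, cov=Sigma)
    vol = np.prod(hi - lo)                          # volume of the box
    in_box = np.all((x >= lo) & (x <= hi), axis=-1)  # indicator of the box
    return w * gauss + (1 - w) * in_box / vol

mu, Sigma = np.zeros(2), np.eye(2)
lo, hi = np.full(2, -4.0), np.full(2, 4.0)

x_tail = np.array([3.5, 3.5])  # deep in the Gaussian tail, inside the box
pure = multivariate_normal.pdf(x_tail, mean=mu, cov=Sigma)
mixed = component_density(x_tail, w=0.9, mu=mu, Sigma=Sigma, lo=lo, hi=hi)
print(pure < mixed)
```

Holding the box or the weight fixed across components, as the paper suggests for some parameters, reduces the parameter count and gives the more parsimonious members of the family.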
