Publication


Featured research published by Geoffrey J. McLachlan.


Archive | 2005

Finite Mixture Models

Geoffrey J. McLachlan; David Peel

The important role of finite mixture models in the statistical analysis of data is underscored by the ever-increasing rate at which articles on mixture applications appear in the statistical and ge...


Knowledge and Information Systems | 2007

Top 10 algorithms in data mining

Xindong Wu; Vipin Kumar; J. Ross Quinlan; Joydeep Ghosh; Qiang Yang; Hiroshi Motoda; Geoffrey J. McLachlan; Angus F. M. Ng; Bing Liu; Philip S. Yu; Zhi-Hua Zhou; Michael Steinbach; David J. Hand; Dan Steinberg

This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining, which are all among the most important topics in data mining research and development.


Journal of the American Statistical Association | 1989

Mixture models: inference and applications to clustering

Geoffrey J. McLachlan; K. E. Basford

General Introduction: Introduction; History of Mixture Models; Background to the General Classification Problem; Mixture Likelihood Approach to Clustering; Identifiability; Likelihood Estimation for Mixture Models via EM Algorithm; Start Values for EM Algorithm; Properties of Likelihood Estimators for Mixture Models; Information Matrix for Mixture Models; Tests for the Number of Components in a Mixture; Partial Classification of the Data; Classification Likelihood Approach to Clustering
Mixture Models with Normal Components: Likelihood Estimation for a Mixture of Normal Distributions; Normal Homoscedastic Components; Asymptotic Relative Efficiency of the Mixture Likelihood Approach; Expected and Observed Information Matrices; Assessment of Normality for Component Distributions: Partially Classified Data; Assessment of Typicality: Partially Classified Data; Assessment of Normality and Typicality: Unclassified Data; Robust Estimation for Mixture Models
Applications of Mixture Models to Two-Way Data Sets: Introduction; Clustering of Hemophilia Data; Outliers in Darwin's Data; Clustering of Rare Events; Latent Classes of Teaching Styles
Estimation of Mixing Proportions: Introduction; Likelihood Estimation; Discriminant Analysis Estimator; Asymptotic Relative Efficiency of Discriminant Analysis Estimator; Moment Estimators; Minimum Distance Estimators; Case Study; Homogeneity of Mixing Proportions
Assessing the Performance of the Mixture Likelihood Approach to Clustering: Introduction; Estimators of the Allocation Rates; Bias Correction of the Estimated Allocation Rates; Estimated Allocation Rates of Hemophilia Data; Estimated Allocation Rates for Simulated Data; Other Methods of Bias Corrections; Bias Correction for Estimated Posterior Probabilities
Partitioning of Treatment Means in ANOVA: Introduction; Clustering of Treatment Means by the Mixture Likelihood Approach; Fitting of a Normal Mixture Model to a RCBD with Random Block Effects; Some Other Methods of Partitioning Treatment Means; Example 1; Example 2; Example 3; Example 4
Mixture Likelihood Approach to the Clustering of Three-Way Data: Introduction; Fitting a Normal Mixture Model to Three-Way Data; Clustering of Soybean Data; Multidimensional Scaling Approach to the Analysis of Soybean Data
References
Appendix


Proceedings of the National Academy of Sciences of the United States of America | 2002

Selection bias in gene extraction on the basis of microarray gene-expression data

Christophe Ambroise; Geoffrey J. McLachlan

In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type containing the expression data on very many (possibly thousands) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only a few genes such that it has a negligible prediction error rate. However, in these results the test error or the leave-one-out cross-validated error is calculated without allowance for the selection bias. There is no allowance because the rule is either tested on tissue samples that were used in the first instance to select the genes being used in the rule or because the cross-validation of the rule is not external to the selection process; that is, gene selection is not performed in training the rule at each stage of the cross-validation process. We describe how in practice the selection bias can be assessed and corrected for by either performing a cross-validation or applying the bootstrap external to the selection process. We recommend using 10-fold rather than leave-one-out cross-validation, and concerning the bootstrap, we suggest using the so-called .632+ bootstrap error estimate designed to handle overfitted prediction rules. Using two published data sets, we demonstrate that when correction is made for the selection bias, the cross-validated error is no longer zero for a subset of only a few genes.
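The difference between internal and external cross-validation described above can be illustrated with a minimal sketch on pure-noise data, where no gene carries any real signal. Everything here is an assumption made for illustration and is not taken from the paper: the helper names (`select_genes`, `centroid_predict`, `cv_error`), the mean-difference gene ranking, the nearest-centroid rule, and the simulated data set of 40 samples by 1000 genes.

```python
import random

def select_genes(X, y, k):
    """Rank genes by absolute difference in class means; return the top-k column indices."""
    scores = []
    for j in range(len(X[0])):
        g0 = [row[j] for row, lab in zip(X, y) if lab == 0]
        g1 = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append((abs(sum(g1) / len(g1) - sum(g0) / len(g0)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]

def centroid_predict(X_train, y_train, x, genes):
    """Nearest-centroid rule restricted to the selected genes (illustrative classifier)."""
    dist = {}
    for lab in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == lab]
        dist[lab] = sum((x[j] - sum(r[j] for r in rows) / len(rows)) ** 2 for j in genes)
    return 0 if dist[0] <= dist[1] else 1

def cv_error(X, y, k, n_folds, external):
    """Cross-validated error; gene selection is redone inside each fold only if external=True."""
    n = len(X)
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]
    genes_all = select_genes(X, y, k)  # selection on ALL samples (the biased route)
    errors = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in range(n) if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        genes = select_genes(X_tr, y_tr, k) if external else genes_all
        for i in test_idx:
            errors += centroid_predict(X_tr, y_tr, X[i], genes) != y[i]
    return errors / n

# Simulated noise: labels are unrelated to the expression values.
rng = random.Random(0)
n_samples, n_genes = 40, 1000
X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
y = [0] * 20 + [1] * 20

internal_err = cv_error(X, y, k=10, n_folds=10, external=False)
external_err = cv_error(X, y, k=10, n_folds=10, external=True)
print("internal CV error:", internal_err)
print("external CV error:", external_err)
```

Because the labels are independent of the data, an honest error estimate should sit near one half. The internal estimate is typically far lower, since the held-out samples already influenced which genes were kept.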


Applied statistics | 1987

On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture

Geoffrey J. McLachlan

An important but difficult problem in practice is assessing the number of components g in a mixture. An obvious way of proceeding is to use the likelihood ratio test statistic λ to test for the smallest value of g consistent with the data. Unfortunately, with mixture models, regularity conditions do not hold for -2 log λ to have its usual asymptotic null distribution of chi-squared. In this paper the role of the bootstrap is highlighted for the assessment of the null distribution of -2 log λ for the test of a single normal density versus a mixture of two normal densities in the univariate case.
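A parametric bootstrap of -2 log λ for one versus two univariate normal components might be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: the function names, the EM starting values (lower and upper halves of the sorted data), the standard-deviation floor, the iteration count, and the choice of 19 bootstrap replications are all hypothetical.

```python
import math
import random

def fit_normal(xs):
    """ML fit of a single normal density; returns (mean, sd, maximised log-likelihood)."""
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return mu, sd, -0.5 * n * (math.log(2 * math.pi * sd * sd) + 1.0)

def norm_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def fit_two_normals(xs, n_iter=200):
    """EM for a two-component normal mixture; returns the final log-likelihood."""
    n = len(xs)
    s = sorted(xs)
    half = n // 2
    _, sd0, _ = fit_normal(xs)
    floor = 0.1 * sd0  # assumed floor to avoid degenerate, near-zero variances
    mu = [sum(s[:half]) / half, sum(s[half:]) / (n - half)]
    sd = [sd0, sd0]
    pi = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 0
        resp = []
        for x in xs:
            a = pi[0] * norm_pdf(x, mu[0], sd[0])
            b = pi[1] * norm_pdf(x, mu[1], sd[1])
            resp.append(a / (a + b) if a + b > 0 else 0.5)
        # M-step: weighted means, standard deviations and mixing proportions
        for k in (0, 1):
            w = resp if k == 0 else [1.0 - r for r in resp]
            nk = max(sum(w), 1e-8)
            mu[k] = sum(wi * x for wi, x in zip(w, xs)) / nk
            var = sum(wi * (x - mu[k]) ** 2 for wi, x in zip(w, xs)) / nk
            sd[k] = max(math.sqrt(var), floor)
            pi[k] = nk / n
    dens = (pi[0] * norm_pdf(x, mu[0], sd[0]) + pi[1] * norm_pdf(x, mu[1], sd[1]) for x in xs)
    return sum(math.log(max(d, 1e-300)) for d in dens)

def lrt_stat(xs):
    """-2 log lambda for g = 1 versus g = 2, truncated at zero."""
    return max(0.0, 2.0 * (fit_two_normals(xs) - fit_normal(xs)[2]))

def bootstrap_pvalue(xs, n_boot, rng):
    """Simulate the null distribution of -2 log lambda from the fitted single normal."""
    mu, sd, _ = fit_normal(xs)
    observed = lrt_stat(xs)
    exceed = 0
    for _ in range(n_boot):
        sample = [rng.gauss(mu, sd) for _ in xs]
        exceed += lrt_stat(sample) >= observed
    return (1 + exceed) / (n_boot + 1)

# Illustrative data: two well-separated groups, so the null should be rejected.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(30)] + [rng.gauss(6.0, 1.0) for _ in range(30)]
p_value = bootstrap_pvalue(data, n_boot=19, rng=rng)
print("bootstrap p-value:", p_value)
```

With B bootstrap replications the smallest attainable p-value is 1/(B + 1), so B should be chosen with the intended significance level in mind.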


Bioinformatics | 2002

A mixture model-based approach to the clustering of microarray expression data

Geoffrey J. McLachlan; Richard Bean; David Peel

MOTIVATION: This paper introduces the software EMMIX-GENE, which has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic, used in conjunction with a threshold on the size of a cluster, allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so mixtures of factor analyzers are used to reduce effectively the dimension of the feature space of genes. RESULTS: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes can be selected that reveal interesting clusterings of the tissues that are consistent either with the external classification of the tissues or with background and biological knowledge of these sets. AVAILABILITY: EMMIX-GENE is available at http://www.maths.uq.edu.au/~gjm/emmix-gene/


Statistics and Computing | 2000

Robust mixture modelling using the t distribution

David Peel; Geoffrey J. McLachlan

Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer than normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.
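One way to see why t components are more robust is through the weights that arise when fitting them: each observation is weighted by (ν + p) / (ν + d), where d is its squared Mahalanobis distance from the component mean and p is the dimension, so atypical points count for less. The sketch below is a deliberately reduced illustration and not the ECM algorithm of the paper: it assumes a single univariate component (p = 1) with the scale and the degrees of freedom held fixed, and the function names and data are hypothetical.

```python
def t_weight(x, mu, sigma, nu):
    """Weight (nu + 1) / (nu + d) for a univariate t component, d = squared standardised distance."""
    d = ((x - mu) / sigma) ** 2
    return (nu + 1.0) / (nu + d)

def robust_location(xs, sigma=1.0, nu=3.0, n_iter=100):
    """Iteratively reweighted mean: location estimate for a t model with fixed scale and nu (assumed)."""
    mu = sum(xs) / len(xs)  # start from the ordinary sample mean
    for _ in range(n_iter):
        w = [t_weight(x, mu, sigma, nu) for x in xs]
        mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
    return mu

# Five typical observations and one gross outlier (made-up data).
data = [-1.0, -0.5, 0.0, 0.5, 1.0, 50.0]
sample_mean = sum(data) / len(data)
t_mean = robust_location(data)
print("sample mean:", sample_mean)
print("t-based location:", t_mean)
```

The sample mean is pulled far towards the outlier, whereas the reweighted estimate stays close to the bulk of the data because the outlier receives a weight near zero.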


Archive | 2008

The EM Algorithm and Extensions: Second Edition

Geoffrey J. McLachlan; Thriyambakam Krishnan



Computational Statistics & Data Analysis | 2003

Modelling high-dimensional data by mixtures of factor analyzers

Geoffrey J. McLachlan; David Peel; Richard Bean

We focus on mixtures of factor analyzers from the perspective of a method for model-based density estimation from high-dimensional data, and hence for the clustering of such data. This approach enables a normal mixture model to be fitted to a sample of n data points of dimension p, where p is large relative to n. The number of free parameters is controlled through the dimension of the latent factor space. By working in this reduced space, it allows a model for each component-covariance matrix with complexity lying between that of the isotropic and full covariance structure models. We shall illustrate the use of mixtures of factor analyzers in a practical example that considers the clustering of cell lines on the basis of gene expressions from microarray experiments.
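As a worked illustration of how the latent dimension controls the parameter count, consider the usual factor-analytic form of a component-covariance matrix. The notation below (loading matrix B_i, diagonal matrix D_i, latent dimension q) and the numerical values p = 100, q = 2 are assumptions chosen for the example and are not quoted from the abstract.

```latex
% Factor-analytic covariance for component i (B_i is p x q, D_i is diagonal):
\Sigma_i = B_i B_i^{\top} + D_i .

% Free parameters per component covariance, allowing for the rotational
% indeterminacy of B_i:
pq + p - \tfrac{1}{2}\, q(q-1)
\quad\text{versus}\quad
\tfrac{1}{2}\, p(p+1) \ \text{(full covariance)}
\quad\text{and}\quad
1 \ \text{(isotropic, } \Sigma_i = \sigma_i^2 I_p \text{)} .

% Example with p = 100 and q = 2:
100 \cdot 2 + 100 - \tfrac{1}{2}\cdot 2 \cdot 1 = 299
\quad\text{versus}\quad
\tfrac{1}{2}\cdot 100 \cdot 101 = 5050 .
```

The count of 299 lies between the isotropic and full covariance extremes, which is the intermediate complexity the abstract refers to.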


Proceedings of the National Academy of Sciences of the United States of America | 2012

Conservation and divergence in Toll-like receptor 4-regulated gene expression in primary human versus mouse macrophages

Kate Schroder; Katharine M. Irvine; Martin S. Taylor; Nilesh J. Bokil; Kim-Anh Lê Cao; Kelly-Anne Masterman; Larisa I. Labzin; Colin A. Semple; Ronan Kapetanovic; Lynsey Fairbairn; Altuna Akalin; Geoffrey J. Faulkner; John Kenneth Baillie; Milena Gongora; Carsten O. Daub; Hideya Kawaji; Geoffrey J. McLachlan; Nick Goldman; Sean M. Grimmond; Piero Carninci; Harukazu Suzuki; Yoshihide Hayashizaki; Boris Lenhard; David A. Hume; Matthew J. Sweet

Evolutionary change in gene expression is generally considered to be a major driver of phenotypic differences between species. We investigated innate immune diversification by analyzing interspecies differences in the transcriptional responses of primary human and mouse macrophages to the Toll-like receptor (TLR)–4 agonist lipopolysaccharide (LPS). By using a custom platform permitting cross-species interrogation coupled with deep sequencing of mRNA 5′ ends, we identified extensive divergence in LPS-regulated orthologous gene expression between humans and mice (24% of orthologues were identified as “divergently regulated”). We further demonstrate concordant regulation of human-specific LPS target genes in primary pig macrophages. Divergently regulated orthologues were enriched for genes encoding cellular “inputs” such as cell surface receptors (e.g., TLR6, IL-7Rα) and functional “outputs” such as inflammatory cytokines/chemokines (e.g., CCL20, CXCL13). Conversely, intracellular signaling components linking inputs to outputs were typically concordantly regulated. Functional consequences of divergent gene regulation were confirmed by showing LPS pretreatment boosts subsequent TLR6 responses in mouse but not human macrophages, in keeping with mouse-specific TLR6 induction. Divergently regulated genes were associated with a large dynamic range of gene expression, and specific promoter architectural features (TATA box enrichment, CpG island depletion). Surprisingly, regulatory divergence was also associated with enhanced interspecies promoter conservation. Thus, the genes controlled by complex, highly conserved promoters that facilitate dynamic regulation are also the most susceptible to evolutionary change.

Collaboration

Top co-authors of Geoffrey J. McLachlan:

Sharon X. Lee (University of Queensland)
David Peel (University of Queensland)
K. E. Basford (University of Queensland)
Richard Bean (University of Queensland)
Kui Wang (University of Queensland)
Christophe Ambroise (Centre national de la recherche scientifique)