Murray A. Jorgensen
University of Waikato
Publications
Featured research published by Murray A. Jorgensen.
Communications in Statistics-theory and Methods | 2004
William J. Reed; Murray A. Jorgensen
A family of probability densities, which has proved useful in modelling the size distributions of various phenomena, including incomes and earnings, human settlement sizes, oil-field volumes and particle sizes, is introduced. The distribution, named herein the double Pareto-lognormal or dPlN distribution, arises as that of the state of a geometric Brownian motion (GBM), with lognormally distributed initial state, after an exponentially distributed length of time (or equivalently as the distribution of the killed state of such a GBM with constant killing rate). A number of phenomena can be viewed as resulting from such a process (e.g., incomes, settlement sizes), which explains the good fit. Properties of the distribution are derived and estimation methods discussed. The distribution exhibits Paretian (power-law) behaviour in both tails, and when plotted on logarithmic axes, its density exhibits hyperbolic-type behaviour.
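The generative mechanism described in this abstract is simple enough to sketch directly. Below is a minimal Python simulation of a dPlN sample as the state of a GBM with a lognormal initial state, observed after an exponential time; the parameter names (mu, sigma, lam, nu, tau) are illustrative, not the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dpln(n, mu=0.0, sigma=1.0, lam=1.0, nu=0.0, tau=0.5):
    """Draw n samples from the state of a geometric Brownian motion
    (drift mu, volatility sigma), started from a lognormal(nu, tau)
    initial state and observed after an Exp(lam) killing time."""
    x0 = rng.lognormal(mean=nu, sigma=tau, size=n)    # lognormal initial state
    t = rng.exponential(scale=1.0 / lam, size=n)      # exponential observation time
    z = rng.standard_normal(n)
    # GBM solution: X_t = X_0 * exp((mu - sigma^2/2) t + sigma W_t)
    return x0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * z)

x = sample_dpln(100_000)
# Both tails should look roughly linear on a log-log density plot
# (the Paretian power-law behaviour the abstract describes).
```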
Computational Statistics & Data Analysis | 2003
Lynette A. Hunt; Murray A. Jorgensen
One difficulty in classification studies is the unobserved or missing observations that often occur in multivariate datasets. The mixture likelihood approach to clustering has been well developed and is much used, particularly for mixtures where the component distributions are multivariate normal. It is shown that this approach can be extended to analyse data with mixed categorical and continuous attributes and where some of the data are missing at random in the sense of Little and Rubin (Statistical Analysis with Missing Data, Wiley, New York).
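To make the extension concrete, here is a minimal sketch of the E-step for a diagonal-covariance Gaussian mixture when some entries are missing at random: each point's component responsibilities use only its observed coordinates, whose marginal is itself Gaussian. This is a simplified illustration of the idea, not the authors' implementation (which also handles categorical attributes).

```python
import numpy as np
from scipy.stats import norm

def responsibilities(X, pi, mu, sd):
    """E-step responsibilities for a K-component diagonal-covariance
    Gaussian mixture when some entries of X are NaN (missing at random).
    pi : component weights; mu, sd : per-component mean/sd vectors."""
    n, _ = X.shape
    K = len(pi)
    log_r = np.zeros((n, K))
    obs = ~np.isnan(X)                               # mask of observed entries
    for k in range(K):
        lp = norm.logpdf(X, loc=mu[k], scale=sd[k])  # NaN where X is missing
        # Sum log-densities over observed coordinates only.
        log_r[:, k] = np.log(pi[k]) + np.where(obs, lp, 0.0).sum(axis=1)
    log_r -= log_r.max(axis=1, keepdims=True)        # stabilise before exponentiating
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)
```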
Australian & New Zealand Journal of Statistics | 1999
Lynette A. Hunt; Murray A. Jorgensen
Hunt (1996) implemented the finite mixture model approach to clustering in a program called MULTIMIX. The program is designed to cluster multivariate data that have categorical and continuous variables and that possibly contain missing values. This paper describes the approach taken to design MULTIMIX and how some of the statistical problems were dealt with. As an example, the program is used to cluster a large medical dataset.
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery | 2011
Lynette A. Hunt; Murray A. Jorgensen
Mixture model clustering proceeds by fitting a finite mixture of multivariate distributions to data, the fitted mixture density then being used to allocate the data to one of the components. Common model formulations assume that either all the attributes are continuous or all the attributes are categorical. In this paper, we consider options for model formulation in the more practical case of mixed data: multivariate data sets that contain both continuous and categorical attributes.
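One common formulation, sketched below under an assumed conditional-independence structure given the component, writes each component density as a product of Gaussian terms for the continuous attributes and multinomial terms for the categorical ones. The paper surveys this and richer alternatives, so treat the sketch as one illustrative choice rather than the paper's recommendation.

```python
import numpy as np
from scipy.stats import norm

def log_component_density(x_cont, x_cat, mu, sd, probs):
    """Log-density of one observation under one mixture component,
    assuming continuous and categorical attributes are conditionally
    independent given the component.
    x_cont : array of continuous values
    x_cat  : array of integer category codes
    mu, sd : per-attribute Gaussian parameters for this component
    probs  : list of per-attribute category-probability vectors"""
    lp = norm.logpdf(x_cont, loc=mu, scale=sd).sum()       # Gaussian part
    lp += sum(np.log(p[c]) for p, c in zip(probs, x_cat))  # multinomial part
    return lp
```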
The Computer Journal | 2008
Murray A. Jorgensen; Geoffrey J. McLachlan
We describe the Snob program for unsupervised learning as it has evolved from its beginnings in the 1960s to its present form. Snob uses the minimum message length principle expounded in Wallace and Freeman (Wallace, C.S. and Freeman, P.R. (1987) Estimation and inference by compact coding. J. Roy. Statist. Soc. Ser. B, 49, 240–252), and we indicate how Snob estimates class parameters using the approach of that paper. We survey the evolution of Snob from these beginnings to the state it has reached as described by Wallace and Dowe (Wallace, C.S. and Dowe, D.L. (2000) MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions. Stat. Comput., 10, 73–83). We pay particular attention to the revision of Snob in the 1980s, in which the definite assignment of things to classes was abandoned.
Biometrika | 1993
Murray A. Jorgensen
Hampel's influence function and its finite-sample counterparts are the basis for a number of diagnostic statistics. These diagnostics can be expensive to compute in the natural way when the estimation calculations are iterative, as they frequently are when maximum likelihood or robust methods are used. We show how the influence function can be calculated in these situations by implicit differentiation of the fixed-point equation satisfied by the limit of the iterative process. We consider in particular the cases of Newton's method and iteratively reweighted least squares, where interesting analytic results are available. As an application we consider the generalization of Pregibon's (1981) logistic regression diagnostics to cover generalized linear models with noncanonical link functions, such as probit regression.
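In schematic notation (ours, not necessarily the paper's), the implicit-differentiation step works as follows: if the iteration converges to a fixed point of some map g, perturbing the data distribution and differentiating the fixed-point equation reduces the influence function to a single linear solve.

```latex
% Schematic version of the implicit-differentiation idea (notation ours).
% Let the iteration's limit \hat\theta(F) satisfy the fixed-point equation
%   \hat\theta(F) = g\bigl(\hat\theta(F), F\bigr).
% Perturb F along F_\varepsilon = (1-\varepsilon)F + \varepsilon\,\delta_x
% and differentiate both sides at \varepsilon = 0:
\[
  \dot\theta
  = \frac{\partial g}{\partial \theta}\,\dot\theta
  + \frac{\partial g}{\partial \varepsilon}
  \quad\Longrightarrow\quad
  \operatorname{IF}(x)
  = \Bigl(I - \frac{\partial g}{\partial \theta}\Bigr)^{-1}
    \frac{\partial g}{\partial \varepsilon},
\]
% so the influence function requires one linear solve rather than
% re-running the iterative fit for each perturbed dataset.
```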
Statistics and Computing | 1999
Murray A. Jorgensen
We investigate the use of a dynamic form of the EM algorithm to estimate proportions in finite mixtures of known distributions. We prove a consistency result for this algorithm, which employs only a single EM update for each new observation. Our aim is to demonstrate that the slow convergence rate of the EM algorithm in many applications is of little practical consequence in situations where the data are frequently updated.
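A minimal sketch of such a single-update scheme is below, with an assumed step-size sequence gamma(n) = 1/(n+1); the abstract does not specify the exact recursion, so this is one standard way to realise "a single EM update per new observation" when the component densities are known.

```python
import numpy as np
from scipy.stats import norm

def online_em_proportions(stream, densities, pi0, gamma=lambda n: 1.0 / (n + 1)):
    """Recursive EM-style update of mixture proportions, one step per new
    observation: compute responsibilities under the current proportions,
    then move the proportions a step gamma(n) toward them.
    densities : list of callables f_k(x), the known component densities."""
    pi = np.asarray(pi0, dtype=float)
    for n, x in enumerate(stream):
        f = np.array([fk(x) for fk in densities])
        r = pi * f / (pi * f).sum()       # E-step: responsibilities for x
        pi = pi + gamma(n) * (r - pi)     # single M-style update
    return pi

# Example: estimate the weights of an equal mix of N(0,1) and N(4,1).
rng = np.random.default_rng(0)
data = np.where(rng.random(20_000) < 0.5,
                rng.normal(0, 1, 20_000), rng.normal(4, 1, 20_000))
pi_hat = online_em_proportions(data,
                               [lambda x: norm.pdf(x, 0, 1),
                                lambda x: norm.pdf(x, 4, 1)],
                               pi0=[0.5, 0.5])
```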
Modeling and Optimization in Mobile, Ad-Hoc and Wireless Networks | 2009
Scott Raynel; Anthony James McGregor; Murray A. Jorgensen
Low power devices such as common wireless router platforms are not capable of performing reliable full packet capture due to resource constraints. For such devices to perform link-level measurement on IEEE 802.11 networks, a packet sampling technique is required to reliably capture a representative sample of frames. The traditional Berkeley Packet Filter mechanism found in UNIX-like operating systems does not directly support packet sampling, as it provides no way of generating pseudo-random numbers and does not allow a filter program to keep state between invocations. This paper explores the use of the IEEE 802.11 Frame Check Sequence as a source of pseudo-random numbers for use when deciding whether to sample a packet. This theory is tested by analysing the distribution of Frame Check Sequences from a large, real world capture. Finally, a BPF program fragment is presented which can be used to efficiently select packets for sampling.
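The decision rule the abstract describes can be illustrated in a few lines. The Python sketch below stands in for the paper's BPF fragment: the low-order bits of the frame's FCS play the role of the uniform random number that a stateless BPF filter cannot generate itself. The function name and the 16-bit choice are ours.

```python
def should_sample(fcs: int, rate: float) -> bool:
    """Decide whether to sample a frame, treating part of its 32-bit
    IEEE 802.11 Frame Check Sequence as a pseudo-random draw.
    rate : target sampling probability, e.g. 0.1 for ~10% of frames."""
    threshold = int(rate * 0x10000)
    # Low 16 bits of the FCS treated as uniform on [0, 65536).
    return (fcs & 0xFFFF) < threshold
```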
Journal of Hydrology | 1999
W.E. Bardsley; Murray A. Jorgensen; Pinhas Alpert; T Ben-Gai
Regression analysis is usually the statistical tool of choice in hydrological studies when there is a strong correlation between two variables. However, weak correlations can also be of interest if a region within the scatter plot is data-free. This could direct attention to seeking some underlying physical process that might create regions with low probability of generating data points. A necessary prior requirement here is to verify that the data-free area in the plot is sufficiently large to be a real effect and not a visual illusion. This check can be most simply carried out in a hypothesis-testing framework. A permutation approach to hypothesis testing is suggested for the particular case where a data-free region occupies one of the corners of a scatter plot, and a test statistic Δ is presented for testing the statistical significance of the size of this “empty corner”. Application to some rainfall data from southern Israel shows that the new test can sometimes yield higher levels of statistical significance than linear regression when applied to the same data.
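The abstract does not give the form of Δ, so the sketch below substitutes one plausible corner statistic, the largest empty rectangle anchored at the upper-right corner measured in ranks, and wraps it in the permutation scheme described: permute one variable, recompute the statistic, and report the exceedance proportion as the p-value.

```python
import numpy as np

def empty_corner_pvalue(x, y, n_perm=999, seed=0):
    """Permutation test for an unusually large data-free region in the
    upper-right corner of a scatter plot. The corner statistic here is
    an illustrative stand-in for the paper's Delta."""
    rng = np.random.default_rng(seed)
    rx = np.argsort(np.argsort(x))          # x-ranks, 0..n-1
    n = len(rx)

    def corner_area(ry):
        # Largest empty upper-right rectangle, in rank units: for each
        # x cutoff, the region above the highest y-rank to its right.
        best = 0
        for cx in range(n):
            ys = ry[rx > cx]
            top = ys.max() if ys.size else -1
            best = max(best, (n - 1 - cx) * (n - 1 - top))
        return best

    ry = np.argsort(np.argsort(y))
    observed = corner_area(ry)
    exceed = sum(corner_area(rng.permutation(ry)) >= observed
                 for _ in range(n_perm))
    return (1 + exceed) / (1 + n_perm)
```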
Communications in Statistics-theory and Methods | 1999
Murray A. Jorgensen
Approximating a parameter estimator that is not a linear function of the regression response vector y by one that is linear suggests a generalization, beyond the original linear-model context, of a class of model-robust dispersion estimates proposed by Glasbey (1988). The dispersion estimators that we propose can be applied whenever parameters are estimated by the iteratively reweighted least squares algorithm, regardless of the theoretical motivation for using this algorithm. We compare by simulation the performance of several such estimators in a robustified probit regression on overdispersed binomial data.
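As a concrete backdrop, here is a generic IRLS fit of a probit regression to binomial counts, followed by a Pearson-type dispersion estimate. It shows the ingredients such estimators work with; it is not Glasbey's (1988) estimator or the authors' proposal, whose exact forms the abstract does not give.

```python
import numpy as np
from scipy.stats import norm

def probit_irls(X, y, n_trials, n_iter=25):
    """Fit a probit regression to binomial counts y (out of n_trials)
    by iteratively reweighted least squares, then estimate dispersion
    from Pearson residuals."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -8, 8)       # guard against extreme fitted values
        mu = norm.cdf(eta)                   # inverse probit link
        dmu = norm.pdf(eta)                  # d mu / d eta
        var = n_trials * mu * (1 - mu)       # binomial variance of the counts
        w = n_trials**2 * dmu**2 / var       # IRLS weights
        z = eta + (y - n_trials * mu) / (n_trials * dmu)  # working response
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, X.T @ (w * z))
    pearson = (y - n_trials * mu) / np.sqrt(var)
    phi = (pearson**2).sum() / (len(y) - X.shape[1])      # dispersion estimate
    return beta, phi
```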