David J. C. MacKay
University of Cambridge
Publications
Featured research published by David J. C. MacKay.
Neural Computation | 1992
David J. C. MacKay
Learning can be made more efficient if we can actively select particularly salient data points. Within a Bayesian learning framework, objective functions are discussed that measure the expected informativeness of candidate measurements. Three alternative specifications of what we want to gain information about lead to three different criteria for data selection. All these criteria depend on the assumption that the hypothesis space is correct, which may prove to be their main weakness.
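A minimal sketch of this idea (my illustration, not the paper's derivation) for a Bayesian linear model: with Gaussian noise, the posterior entropy of the weights does not depend on the unseen target value, so candidate measurements can be ranked by the drop in log-determinant of the posterior covariance they would produce.

```python
import numpy as np

# Sketch: rank candidate measurements for a Bayesian linear model by the
# expected reduction in posterior entropy of the weights.  The gain is
# 0.5 * (log det A_new - log det A_old), where A is the posterior precision.

def posterior_precision(X, alpha, sigma2):
    """Precision of the weight posterior given design matrix X."""
    d = X.shape[1]
    return alpha * np.eye(d) + X.T @ X / sigma2

def information_gain(X_seen, x_candidate, alpha=1.0, sigma2=0.1):
    """Entropy reduction (nats) from measuring at x_candidate."""
    A_old = posterior_precision(X_seen, alpha, sigma2)
    A_new = A_old + np.outer(x_candidate, x_candidate) / sigma2
    _, logdet_old = np.linalg.slogdet(A_old)
    _, logdet_new = np.linalg.slogdet(A_new)
    return 0.5 * (logdet_new - logdet_old)

rng = np.random.default_rng(0)
X_seen = rng.normal(size=(20, 3))
candidates = rng.normal(size=(5, 3))
gains = [information_gain(X_seen, x) for x in candidates]
best = int(np.argmax(gains))   # index of the most informative candidate
```

Note that the ranking depends only on the assumed model, which is exactly the dependence on the hypothesis space that the abstract flags as the approach's main weakness.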
Network: Computation in Neural Systems | 1995
David J. C. MacKay
Bayesian probability theory provides a unifying framework for data modelling. In this framework the overall aims are to find models that are well-matched to the data, and to use these models to make optimal predictions. Neural network learning is interpreted as an inference of the most probable parameters for the model, given the training data. The search in model space (i.e., the space of architectures, noise models, preprocessings, regularizers and weight decay constants) can then also be treated as an inference problem, in which we infer the relative probability of alternative models, given the data. This review describes practical techniques based on Gaussian approximations for implementation of these powerful methods for controlling, comparing and using adaptive networks.
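The Gaussian approximation at the heart of these techniques can be illustrated on a model where it is exact. The sketch below is my own, assuming a Bayesian linear model with prior precision alpha and noise precision beta; it evaluates the Laplace-style log evidence log P(D|w_MP) + log P(w_MP) + (k/2) log 2*pi - (1/2) log|A| that is used to compare alternative models.

```python
import numpy as np

# Sketch: Laplace-style log evidence for a Bayesian linear model with
# Gaussian prior (precision alpha) and Gaussian noise (precision beta).
# For this model the posterior is exactly Gaussian, so the formula is exact;
# for a neural network the same formula is the Gaussian approximation.

def log_evidence(X, t, alpha, beta):
    n, k = X.shape
    A = alpha * np.eye(k) + beta * X.T @ X           # Hessian of -log posterior
    w_mp = beta * np.linalg.solve(A, X.T @ t)        # most probable weights
    err = t - X @ w_mp
    log_lik = 0.5 * n * np.log(beta / (2 * np.pi)) - 0.5 * beta * err @ err
    log_prior = 0.5 * k * np.log(alpha / (2 * np.pi)) - 0.5 * alpha * w_mp @ w_mp
    _, logdetA = np.linalg.slogdet(A)
    return log_lik + log_prior + 0.5 * k * np.log(2 * np.pi) - 0.5 * logdetA
```

Model comparison then amounts to evaluating and comparing the log evidence of alternative models (different regularizers, architectures, noise models) on the same data.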
Proceedings of the NATO Advanced Study Institute on Learning in Graphical Models | 1998
David J. C. MacKay
This chapter describes a sequence of Monte Carlo methods: importance sampling, rejection sampling, the Metropolis method, and Gibbs sampling. For each method, we discuss whether the method is expected to be useful for high-dimensional problems such as arise in inference with graphical models. After the methods have been described, the terminology of Markov chain Monte Carlo methods is presented. The chapter concludes with a discussion of advanced methods, including methods for reducing random walk behaviour.
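As an illustration, here is a minimal random-walk Metropolis sampler (my sketch, not code from the chapter): a symmetric proposal accepted with probability min(1, P(x')/P(x)), which requires the target density only up to a normalizing constant.

```python
import numpy as np

# Sketch: random-walk Metropolis sampling from an unnormalised 1-D target.

def metropolis(log_p, x0, n_samples=10000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        x_prop = x + step * rng.normal()              # symmetric proposal
        if np.log(rng.uniform()) < log_p(x_prop) - log_p(x):
            x = x_prop                                # accept
        samples[i] = x                                # otherwise keep the old state
    return samples

# Example target: an unnormalised mixture of two Gaussians.
log_target = lambda x: np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)
chain = metropolis(log_target, x0=0.0)
```

The step size controls the usual trade-off: small steps give high acceptance but slow, diffusive exploration, which is exactly the random walk behaviour the chapter's advanced methods aim to reduce.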
Neural Computation | 1994
Kenneth D. Miller; David J. C. MacKay
Models of unsupervised, correlation-based (Hebbian) synaptic plasticity are typically unstable: either all synapses grow until each reaches the maximum allowed strength, or all synapses decay to zero strength. A common method of avoiding these outcomes is to use a constraint that conserves or limits the total synaptic strength over a cell. We study the dynamic effects of such constraints. Two methods of enforcing a constraint are distinguished, multiplicative and subtractive. For otherwise linear learning rules, multiplicative enforcement of a constraint results in dynamics that converge to the principal eigenvector of the operator determining unconstrained synaptic development. Subtractive enforcement, in contrast, typically leads to a final state in which almost all synaptic strengths reach either the maximum or minimum allowed value. This final state is often dominated by weight configurations other than the principal eigenvector of the unconstrained operator. Multiplicative enforcement yields a graded receptive field in which most mutually correlated inputs are represented, whereas subtractive enforcement yields a receptive field that is sharpened to a subset of maximally correlated inputs. If two equivalent input populations (e.g., two eyes) innervate a common target, multiplicative enforcement prevents their segregation (ocular dominance segregation) when the two populations are weakly correlated; whereas subtractive enforcement allows segregation under these circumstances. These results may be used to understand constraints both over output cells and over input cells. A variety of rules that can implement constrained dynamics are discussed.
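The contrast between the two enforcement schemes can be seen in a toy simulation. The sketch below uses illustrative assumptions rather than the paper's simulations: a linear Hebbian rule dw/dt = Cw on three inputs, with the total synaptic strength fixed either multiplicatively (rescaling all weights) or subtractively (subtracting the same amount from every weight), and hard bounds on each synapse.

```python
import numpy as np

# Sketch: multiplicative vs subtractive enforcement of a total-strength
# constraint for a linear Hebbian rule dw/dt = C w, where C is the input
# correlation matrix.

def hebbian_constrained(C, w0, method, lr=0.01, steps=5000, w_max=1.0):
    w = w0.copy()
    total = w.sum()
    for _ in range(steps):
        dw = C @ w                                   # unconstrained Hebbian growth
        if method == "multiplicative":
            w = w + lr * dw
            w *= total / w.sum()                     # rescale to conserve the sum
        else:                                        # subtractive
            w = w + lr * (dw - dw.mean())            # subtract equally from all synapses
        w = np.clip(w, 0.0, w_max)                   # hard bounds on each synapse
    return w

rng = np.random.default_rng(1)
C = np.array([[1.0, 0.6, 0.2],
              [0.6, 1.0, 0.2],
              [0.2, 0.2, 1.0]])
w0 = np.full(3, 0.4) + 0.01 * rng.normal(size=3)
w_mult = hebbian_constrained(C, w0, "multiplicative")  # graded weights along the principal eigenvector
w_sub = hebbian_constrained(C, w0, "subtractive")      # weights driven to the bounds
```

In this toy setting the multiplicative run keeps all three weights at intermediate, graded values, while the subtractive run sharpens the receptive field onto the two mutually correlated inputs and drives the third weight to zero.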
Archive | 2001
David J. C. MacKay; Matthew C. Davey
Gallager codes with large block length and low rate (e.g., N ≃ 10,000–40,000, R ≃ 0.25–0.5) have been shown to have record-breaking performance for low signal-to-noise applications. In this paper we study Gallager codes at the other end of the spectrum. We first explore the theoretical properties of binary Gallager codes with very high rates and observe that Gallager codes of any rate offer runlength-limiting properties at no additional cost.
Neural Computation | 1999
David J. C. MacKay
I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models that include unknown hyperparameters such as regularization constants and noise levels. In the evidence framework, the model parameters are integrated over, and the resulting evidence is maximized over the hyperparameters. The optimized hyperparameters are used to define a Gaussian approximation to the posterior distribution. In the alternative MAP method, the true posterior probability is found by integrating over the hyperparameters. The true posterior is then maximized over the model parameters, and a Gaussian approximation is made. The similarities of the two approaches and their relative merits are discussed, and comparisons are made with the ideal hierarchical Bayesian solution. In moderately ill-posed problems, integration over hyperparameters yields a probability distribution with a skew peak, which causes significant biases to arise in the MAP method. In contrast, the evidence framework is shown to introduce negligible predictive error under straightforward conditions. General lessons are drawn concerning inference in many dimensions.
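For a linear-Gaussian model the evidence-framework updates take a particularly simple form. The sketch below is my illustration, assuming a single regularization constant alpha with the noise precision beta known: it alternates between finding the most probable weights and re-estimating alpha as gamma / ||w_MP||^2, where gamma counts the well-determined parameters.

```python
import numpy as np

# Sketch: evidence-framework re-estimation of a regularisation constant alpha
# for a linear model with Gaussian noise of known precision beta.

def evidence_alpha(X, t, beta=25.0, alpha=1.0, iters=50):
    eigvals = np.linalg.eigvalsh(beta * X.T @ X)     # eigenvalues of the data term
    for _ in range(iters):
        A = alpha * np.eye(X.shape[1]) + beta * X.T @ X
        w_mp = beta * np.linalg.solve(A, X.T @ t)    # most probable weights
        gamma = np.sum(eigvals / (eigvals + alpha))  # effective number of well-determined parameters
        alpha = gamma / (w_mp @ w_mp)                # evidence-maximising update
    return alpha, w_mp

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 6))
t = X @ rng.normal(size=6) + 0.2 * rng.normal(size=50)
alpha_hat, w_hat = evidence_alpha(X, t)
```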
IEEE Transactions on Information Theory | 2004
David J. C. MacKay; Graeme Mitchison; Paul McFadden
Sparse-graph codes appropriate for use in quantum error-correction are presented. Quantum error-correcting codes based on sparse graphs are of interest for three reasons. First, the best codes currently known for classical channels are based on sparse graphs. Second, sparse-graph codes keep the number of quantum interactions associated with the quantum error-correction process small: a constant number per quantum bit, independent of the block length. Third, sparse-graph codes often offer great flexibility with respect to block length and rate. We believe some of the codes we present are unsurpassed by previously published quantum error-correcting codes.
ICA | 2000
James W. Miskin; David J. C. MacKay
In this chapter, ensemble learning is applied to the problem of blind source separation and deconvolution of images. It is assumed that the observed images were constructed by mixing a set of images (consisting of independent, identically distributed pixels), convolving the mixtures with unknown blurring filters and then adding Gaussian noise.
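The assumed generative model is easy to state in code. The sketch below uses hypothetical shapes and kernels of my choosing, and the ensemble-learning (variational) inference itself is not shown: source images are mixed, each mixture is blurred by its own filter, and Gaussian noise is added.

```python
import numpy as np
from scipy.signal import convolve2d

# Sketch: forward (generative) model assumed for blind source separation and
# deconvolution of images.

def generate_observations(sources, mixing, filters, noise_std=0.05, seed=0):
    """sources: (K, H, W) i.i.d.-pixel images; mixing: (M, K); filters: list of M 2-D kernels."""
    rng = np.random.default_rng(seed)
    _, H, W = sources.shape
    observations = []
    for m in range(mixing.shape[0]):
        mixed = np.tensordot(mixing[m], sources, axes=1)       # linear mixture of sources
        blurred = convolve2d(mixed, filters[m], mode="same")   # unknown blurring filter
        observations.append(blurred + noise_std * rng.normal(size=(H, W)))
    return np.stack(observations)

rng = np.random.default_rng(4)
sources = rng.laplace(size=(2, 32, 32))                # i.i.d.-pixel source images
mixing = np.array([[0.8, 0.2], [0.3, 0.7], [0.5, 0.5]])
filters = [np.ones((3, 3)) / 9.0] * 3                  # simple blur kernels
observed = generate_observations(sources, mixing, filters)
```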
Archive | 1996
David J. C. MacKay
Bayesian probability theory provides a unifying framework for data modeling. In this framework, the overall aims are to find models that are well matched to the data, and to use these models to make optimal predictions. Neural network learning is interpreted as an inference of the most probable parameters for the model, given the training data. The search in model space (i.e., the space of architectures, noise models, preprocessings, regularizers, and weight decay constants) also then can be treated as an inference problem, in which we infer the relative probability of alternative models, given the data. This provides powerful and practical methods for controlling, comparing, and using adaptive network models. This chapter describes numerical techniques based on Gaussian approximations for implementation of these methods.
IEEE Transactions on Biomedical Engineering | 2008
Oliver Stegle; Sebastian V. Fallert; David J. C. MacKay; Soren Brage
Heart rate data collected during nonlaboratory conditions present several data-modeling challenges. First, the noise in such data is often poorly described by a simple Gaussian; it has outliers, and errors come in bursts. Second, in large-scale studies the ECG waveform is usually not recorded in full, so one has to deal with missing information. In this paper, we propose a robust postprocessing model for such applications. Our model to infer the latent heart rate time series consists of two main components: unsupervised clustering followed by Bayesian regression. The clustering component uses auxiliary data to learn the structure of outliers and noise bursts. The subsequent Gaussian process regression model uses the cluster assignments as prior information and incorporates expert knowledge about the physiology of the heart. We apply the method to a wide range of heart rate data and obtain convincing predictions along with uncertainty estimates. In a quantitative comparison with existing postprocessing methodology, our model achieves a significant increase in performance.
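A stripped-down version of the regression component might look like the sketch below (my assumptions: an RBF kernel and per-sample noise variances standing in for the cluster assignments); points flagged as noise bursts receive a large noise variance, so they barely influence the inferred heart-rate curve.

```python
import numpy as np

# Sketch: GP regression over a heart-rate series with per-point noise
# variances set from (assumed) cluster labels for noise bursts.

def rbf_kernel(t1, t2, lengthscale=30.0, variance=100.0):
    d = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior_mean(t_obs, y_obs, noise_var, t_query):
    K = rbf_kernel(t_obs, t_obs) + np.diag(noise_var)
    K_star = rbf_kernel(t_query, t_obs)
    return K_star @ np.linalg.solve(K, y_obs)

t = np.arange(0.0, 600.0, 5.0)                        # seconds
hr = 70 + 10 * np.sin(t / 60.0)                       # latent heart rate (toy signal)
is_burst = np.zeros_like(t, dtype=bool)
is_burst[40:48] = True                                # pretend the clusterer flagged these samples
rng = np.random.default_rng(2)
hr_obs = hr + np.where(is_burst, 40.0, 2.0) * rng.normal(size=t.shape)
noise_var = np.where(is_burst, 40.0**2, 2.0**2)       # per-point noise from cluster labels
hr_mean = hr_obs.mean()
hr_clean = gp_posterior_mean(t, hr_obs - hr_mean, noise_var, t) + hr_mean  # centre: GP prior has zero mean
```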