Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Christopher M. Bishop is active.

Publication


Featured research published by Christopher M. Bishop.


Neural Computation | 1999

Mixtures of probabilistic principal component analyzers

Michael E. Tipping; Christopher M. Bishop

Principal component analysis (PCA) is one of the most popular techniques for processing, compressing, and visualizing data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Therefore, previous attempts to formulate mixture models for PCA have been ad hoc to some extent. In this article, PCA is formulated within a maximum likelihood framework, based on a specific form of gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analyzers, whose parameters can be determined using an expectation-maximization algorithm. We discuss the advantages of this model in the context of clustering, density modeling, and local dimensionality reduction, and we demonstrate its application to image compression and handwritten digit recognition.
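
For a concrete picture of the building block, here is a minimal NumPy sketch (not the authors' code) of the closed-form maximum-likelihood fit of a single probabilistic PCA model, whose Gaussian density the mixture then combines via EM; the function names and toy data are illustrative assumptions.

```python
# Minimal sketch: closed-form maximum-likelihood fit of a single probabilistic
# PCA (PPCA) model, the component density that the mixture model combines via EM.
# Function names, the latent dimension q, and the toy data are illustrative.
import numpy as np

def fit_ppca(X, q):
    """Fit x ~ N(mu, W W^T + sigma^2 I) with a q-dimensional latent space."""
    n, d = X.shape
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False)                      # d x d sample covariance
    eigval, eigvec = np.linalg.eigh(S)               # ascending eigenvalues
    eigval, eigvec = eigval[::-1], eigvec[:, ::-1]   # reorder to descending
    sigma2 = eigval[q:].mean()                       # ML noise variance: mean of discarded eigenvalues
    W = eigvec[:, :q] * np.sqrt(np.maximum(eigval[:q] - sigma2, 0.0))
    return mu, W, sigma2

def ppca_log_density(X, mu, W, sigma2):
    """Gaussian log density under the PPCA covariance C = W W^T + sigma^2 I."""
    d = X.shape[1]
    C = W @ W.T + sigma2 * np.eye(d)
    diff = X - mu
    _, logdet = np.linalg.slogdet(C)
    mahal = np.sum(diff @ np.linalg.inv(C) * diff, axis=1)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + mahal)

# In the mixture, each component carries its own (mu, W, sigma2) and EM alternates
# responsibilities (computed from these densities) with per-component refits.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 10))   # toy 10-D data near a 5-D subspace
mu, W, sigma2 = fit_ppca(X, q=2)
print(ppca_log_density(X[:3], mu, W, sigma2))
```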


Neural Computation | 1998

GTM: the generative topographic mapping

Christopher M. Bishop; Markus Svensén; Christopher K. I. Williams

Latent variable models represent the probability density of data in a space of several dimensions in terms of a smaller number of latent, or hidden, variables. A familiar example is factor analysis, which is based on a linear transformation between the latent space and the data space. In this article, we introduce a form of nonlinear latent variable model called the generative topographic mapping (GTM), for which the parameters of the model can be determined using the expectation-maximization algorithm. GTM provides a principled alternative to the widely used self-organizing map (SOM) of Kohonen (1982) and overcomes most of the significant limitations of the SOM. We demonstrate the performance of the GTM algorithm on a toy problem and on simulated data from flow diagnostics for a multiphase oil pipeline.
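
The following is a hedged, self-contained sketch of the GTM ingredients as described above: a regular grid of latent points mapped through RBF basis functions into data space, with EM updates for the mapping weights and noise precision. Grid sizes, the basis width, the small ridge term, and all variable names are illustrative choices, not the paper's settings.

```python
# Illustrative GTM-style sketch: latent grid -> RBF basis -> data space, with EM.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))          # toy 3-D data set

def grid(n):
    """Regular n x n grid of points in [-1, 1]^2."""
    g = np.linspace(-1, 1, n)
    return np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)

Z = grid(10)                           # K = 100 latent grid points
centres = grid(4)                      # M = 16 RBF basis centres
width = 0.3
Phi = np.exp(-np.sum((Z[:, None, :] - centres[None, :, :]) ** 2, axis=2) / (2 * width ** 2))  # K x M

M = centres.shape[0]
W = rng.normal(scale=0.1, size=(M, X.shape[1]))   # basis-to-data-space weights
beta = 1.0                                        # noise precision

for _ in range(20):                               # EM iterations
    Y = Phi @ W                                   # images of the latent grid in data space, K x D
    dist2 = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=2)   # squared distances, N x K
    logR = -0.5 * beta * dist2
    R = np.exp(logR - logR.max(axis=1, keepdims=True))
    R /= R.sum(axis=1, keepdims=True)             # E-step: responsibilities of grid points
    G = np.diag(R.sum(axis=0))
    W = np.linalg.solve(Phi.T @ G @ Phi + 1e-3 * np.eye(M), Phi.T @ (R.T @ X))   # M-step for W
    beta = X.size / np.sum(R * np.sum((X[:, None, :] - (Phi @ W)[None, :, :]) ** 2, axis=2))

latent_means = R @ Z                   # posterior-mean latent position of each data point
print(latent_means[:3])
```

Each data point can then be plotted at its responsibility-weighted mean position in the two-dimensional latent space, which is what gives the method its topographic, SOM-like character.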


Review of Scientific Instruments | 1994

Neural networks and their applications

Christopher M. Bishop

Neural networks provide a range of powerful new techniques for solving problems in pattern recognition, data analysis, and control. They have several notable features including high processing speeds and the ability to learn the solution to a problem from a set of examples. The majority of practical applications of neural networks currently make use of two basic network models. We describe these models in detail and explain the various techniques used to train them. Next we discuss a number of key issues which must be addressed when applying neural networks to practical problems, and highlight several potential pitfalls. Finally, we survey the various classes of problem which may be addressed using neural networks, and we illustrate them with a variety of successful applications drawn from a range of fields. It is intended that this review should be accessible to readers with no previous knowledge of neural networks, and yet also provide new insights for those already making practical use of these techniques.
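
As a hedged illustration of one of the two basic network models the review covers, the sketch below trains a small two-layer feed-forward network by gradient descent on a sum-of-squares error; the layer sizes, learning rate, and toy regression task are arbitrary choices, not drawn from the paper.

```python
# Minimal two-layer feed-forward network (multilayer perceptron) trained by
# gradient descent on a sum-of-squares error. Purely illustrative settings.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 1))
T = np.sin(np.pi * X)                      # toy regression target

H = 8                                      # number of hidden units
W1, b1 = rng.normal(scale=0.5, size=(1, H)), np.zeros(H)
W2, b2 = rng.normal(scale=0.5, size=(H, 1)), np.zeros(1)
lr = 0.05

for _ in range(2000):
    A = np.tanh(X @ W1 + b1)               # hidden-unit activations
    Y = A @ W2 + b2                        # linear output unit
    err = Y - T                            # gradient of 0.5 * sum((Y - T)^2) w.r.t. Y
    dW2, db2 = A.T @ err, err.sum(axis=0)
    dA = err @ W2.T * (1 - A ** 2)         # back-propagate through tanh
    dW1, db1 = X.T @ dA, dA.sum(axis=0)
    for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= lr * g / len(X)               # gradient-descent step on the mean gradient

print(float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - T) ** 2)))   # final mean squared error
```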


Neural Computation | 1995

Training with noise is equivalent to Tikhonov regularization

Christopher M. Bishop

It is well known that the addition of noise to the input data of a neural network during training can, in some circumstances, lead to significant improvements in generalization performance. Previous work has shown that such training with noise is equivalent to a form of regularization in which an extra term is added to the error function. However, the regularization term, which involves second derivatives of the error function, is not bounded below, and so can lead to difficulties if used directly in a learning algorithm based on error minimization. In this paper we show that for the purposes of network training, the regularization term can be reduced to a positive semi-definite form that involves only first derivatives of the network mapping. For a sum-of-squares error function, the regularization term belongs to the class of generalized Tikhonov regularizers. Direct minimization of the regularized error function provides a practical alternative to training with noise.
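
The equivalence is easiest to see numerically in the simplest case of a linear model, where training with Gaussian input noise matches ridge (Tikhonov) regression in expectation; the sketch below compares the two fits on toy data. The data sizes and the noise level are arbitrary choices for illustration.

```python
# Numerical sketch: for a linear model y = X w, least squares with Gaussian noise
# added to the inputs agrees (in expectation) with Tikhonov/ridge regression using
# the penalty N * sigma^2 * ||w||^2. Data and noise level are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N, D = 200, 5
X = rng.normal(size=(N, D))
w_true = rng.normal(size=D)
t = X @ w_true + 0.1 * rng.normal(size=N)
sigma = 0.3                                    # standard deviation of the input noise

# (a) Training with noise: least-squares fit over many noisy replicas of the inputs.
copies = 2000
Xn = np.concatenate([X + sigma * rng.normal(size=X.shape) for _ in range(copies)])
tn = np.tile(t, copies)
w_noise = np.linalg.lstsq(Xn, tn, rcond=None)[0]

# (b) Tikhonov (ridge) regression with the matching penalty N * sigma^2.
w_ridge = np.linalg.solve(X.T @ X + N * sigma ** 2 * np.eye(D), X.T @ t)

print(np.round(w_noise, 3))
print(np.round(w_ridge, 3))                    # the two solutions agree closely
```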


IEE Proceedings - Vision, Image, and Signal Processing | 1994

Novelty detection and neural network validation

Christopher M. Bishop

One of the key factors limiting the use of neural networks in many industrial applications has been the difficulty of demonstrating that a trained network will continue to generate reliable outputs once it is in routine use. An important potential source of errors arises from input data which differs significantly from that used to train the network. In this paper we investigate the relation between the degree of novelty of input data and the corresponding reliability of the output data. We provide a quantitative procedure for measuring novelty, and we demonstrate its performance using an application involving the monitoring of oil flow in multi-phase pipelines.
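
A common way to realize this idea, sketched below as an illustration rather than the paper's exact procedure, is to fit a density model to the training inputs and flag test inputs whose estimated density falls below a low quantile of the training values; the Gaussian-mixture density and the 1% threshold are assumptions.

```python
# Illustrative novelty check: estimate the density of the training inputs and treat
# low-density test inputs as novel, i.e. regions where the network's outputs
# should not be trusted. Model choice and threshold are assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 4))                 # inputs the network was trained on
density = GaussianMixture(n_components=5, random_state=0).fit(X_train)

threshold = np.quantile(density.score_samples(X_train), 0.01)   # 1st percentile of training log-density

X_test = np.vstack([rng.normal(size=(5, 4)),              # in-distribution inputs
                    rng.normal(loc=6.0, size=(5, 4))])    # inputs far from the training data
novel = density.score_samples(X_test) < threshold
print(novel)    # expect roughly [False]*5 + [True]*5
```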


International Conference on Machine Learning | 2005

The 2005 PASCAL visual object classes challenge

Mark Everingham; Andrew Zisserman; Christopher K. I. Williams; Luc Van Gool; Moray Allan; Christopher M. Bishop; Olivier Chapelle; Navneet Dalal; Thomas Deselaers; Gyuri Dorkó; Stefan Duffner; Jan Eichhorn; Jason Farquhar; Mario Fritz; Christophe Garcia; Thomas L. Griffiths; Frédéric Jurie; Daniel Keysers; Markus Koskela; Jorma Laaksonen; Diane Larlus; Bastian Leibe; Hongying Meng; Hermann Ney; Bernt Schiele; Cordelia Schmid; Edgar Seemann; John Shawe-Taylor; Amos J. Storkey; Sandor Szedmak

The PASCAL Visual Object Classes Challenge ran from February to March 2005. The goal of the challenge was to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). Four object classes were selected: motorbikes, bicycles, cars and people. Twelve teams entered the challenge. In this chapter we provide details of the datasets, algorithms used by the teams, evaluation criteria, and results achieved.


Neural Computation | 1991

Improving the generalization properties of radial basis function neural networks

Christopher M. Bishop

An important feature of radial basis function neural networks is the existence of a fast, linear learning algorithm in a network capable of representing complex nonlinear mappings. Satisfactory generalization in these networks requires that the network mapping be sufficiently smooth. We show that a modification to the error functional allows smoothing to be introduced explicitly without significantly affecting the speed of training. A simple example is used to demonstrate the resulting improvement in the generalization properties of the network.
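
As a rough illustration of the fast linear training step with an explicit smoothing term, the sketch below solves for the output weights of an RBF network by regularized least squares; the quadratic weight penalty used here is a common stand-in for a smoothing term, not necessarily the exact modification to the error functional proposed in the paper.

```python
# RBF network: fixed Gaussian basis functions plus a linear output layer whose
# weights are found in one step by regularized least squares. The ridge penalty
# here stands in for an explicit smoothing term; all settings are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(-1, 1, 40)[:, None]
t = np.sin(2 * np.pi * X[:, 0]) + 0.2 * rng.normal(size=40)    # noisy targets

centres = np.linspace(-1, 1, 15)[:, None]                      # basis-function centres
width = 0.15
Phi = np.exp(-((X - centres.T) ** 2) / (2 * width ** 2))       # 40 x 15 design matrix

lam = 1e-2                                                     # smoothing / regularization strength
w_smooth = np.linalg.solve(Phi.T @ Phi + lam * np.eye(15), Phi.T @ t)
w_plain = np.linalg.lstsq(Phi, t, rcond=None)[0]               # unsmoothed solution for comparison

x_new = np.linspace(-1, 1, 5)[:, None]
Phi_new = np.exp(-((x_new - centres.T) ** 2) / (2 * width ** 2))
print(Phi_new @ w_smooth)      # a smoother interpolant than Phi_new @ w_plain
```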


American Journal of Respiratory and Critical Care Medicine | 2010

Beyond atopy: multiple patterns of sensitization in relation to asthma in a birth cohort study

Angela Simpson; Vincent Y. F. Tan; John Winn; Markus Svensén; Christopher M. Bishop; David Heckerman; Iain Buchan; Adnan Custovic

RATIONALE: The pattern of IgE response (over time or to specific allergens) may reflect different atopic vulnerabilities which are related to the presence of asthma in a fundamentally different way from the current definition of atopy.

OBJECTIVES: To redefine the atopic phenotype by identifying latent structure within a complex dataset, taking into account the timing and type of sensitization to specific allergens, and relating these novel phenotypes to asthma.

METHODS: In a population-based birth cohort in which multiple skin and IgE tests have been taken throughout childhood, we used a machine learning approach to cluster children into multiple atopic classes in an unsupervised way. We then investigated the relation between these classes and asthma (symptoms, hospitalizations, lung function and airway reactivity).

MEASUREMENTS AND MAIN RESULTS: A five-class model indicated a complex latent structure, in which children with atopic vulnerability were clustered into four distinct classes (Multiple Early [112/1053, 10.6%]; Multiple Late [171/1053, 16.2%]; Dust Mite [47/1053, 4.5%]; and Non-dust Mite [100/1053, 9.5%]), with a fifth class describing children with No Latent Vulnerability (623/1053, 59.2%). The association with asthma was considerably stronger for Multiple Early compared with other classes and conventionally defined atopy (odds ratio [95% CI]: 29.3 [11.1-77.2] versus 12.4 [4.8-32.2] versus 11.6 [4.8-27.9] for Multiple Early class versus Ever Atopic versus Atopic age 8). Lung function and airway reactivity were significantly poorer among children in the Multiple Early class. Cox regression demonstrated a highly significant increase in risk of hospital admissions for wheeze/asthma after age 3 yr only among children in the Multiple Early class (HR 9.2 [3.5-24.0], P < 0.001).

CONCLUSIONS: IgE antibody responses do not reflect a single phenotype of atopy, but several different atopic vulnerabilities which differ in their relation with asthma presence and severity.


computer vision and pattern recognition | 2006

Principled Hybrids of Generative and Discriminative Models

Julia Lasserre; Christopher M. Bishop; Thomas P. Minka

When labelled training data is plentiful, discriminative techniques are widely used since they give excellent generalization performance. However, for large-scale applications such as object recognition, hand labelling of data is expensive, and there is much interest in semi-supervised techniques based on generative models in which the majority of the training data is unlabelled. Although the generalization performance of generative models can often be improved by ‘training them discriminatively’, they can then no longer make use of unlabelled data. In an attempt to gain the benefit of both generative and discriminative approaches, heuristic procedures have been proposed [2, 3] which interpolate between these two extremes by taking a convex combination of the generative and discriminative objective functions. In this paper we adopt a new perspective which says that there is only one correct way to train a given model, and that a ‘discriminatively trained’ generative model is fundamentally a new model [7]. From this viewpoint, generative and discriminative models correspond to specific choices for the prior over parameters. As well as giving a principled interpretation of ‘discriminative training’, this approach opens the door to very general ways of interpolating between generative and discriminative extremes through alternative choices of prior. We illustrate this framework using both synthetic data and a practical example in the domain of multi-class object recognition. Our results show that, when the supply of labelled training data is limited, the optimum performance corresponds to a balance between the purely generative and the purely discriminative extremes.
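
The heuristic interpolation that the paper takes as its starting point can be sketched directly: a convex combination alpha * (joint log-likelihood) + (1 - alpha) * (conditional log-likelihood) for a toy two-class Gaussian model, optimized numerically. The model, data, parameterization, and use of a generic optimizer are illustrative assumptions, not the paper's experimental setup.

```python
# Sketch of the heuristic generative/discriminative blend: maximize
# alpha * sum log p(x, y) + (1 - alpha) * sum log p(y | x)
# for a two-class Gaussian model with shared variance. Illustrative only.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-1, 1, 100), rng.normal(1, 1, 100)])
y = np.concatenate([np.zeros(100), np.ones(100)]).astype(int)

def objectives(params):
    mu0, mu1, log_s, logit_pi = params
    s = np.exp(log_s)
    pi = 1.0 / (1.0 + np.exp(-logit_pi))
    log_px_y0 = norm.logpdf(x, mu0, s) + np.log(1 - pi)     # log p(x, y=0)
    log_px_y1 = norm.logpdf(x, mu1, s) + np.log(pi)         # log p(x, y=1)
    joint = np.where(y == 1, log_px_y1, log_px_y0)          # generative term: log p(x, y)
    cond = joint - np.logaddexp(log_px_y0, log_px_y1)       # discriminative term: log p(y | x)
    return joint.sum(), cond.sum()

def fit(alpha):
    loss = lambda p: -(alpha * objectives(p)[0] + (1 - alpha) * objectives(p)[1])
    return minimize(loss, x0=np.array([-0.5, 0.5, 0.0, 0.0]), method="Nelder-Mead").x

for alpha in (1.0, 0.5, 0.0):     # purely generative, blended, purely discriminative
    print(alpha, np.round(fit(alpha), 3))
```

At alpha = 1 this is the usual maximum-likelihood generative fit; at alpha = 0 only the induced decision boundary is constrained, which is one way to see that the two extremes really are different training criteria.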


IEEE Transactions on Pattern Analysis and Machine Intelligence | 1998

A hierarchical latent variable model for data visualization

Christopher M. Bishop; Michael E. Tipping

Visualization has proven to be a powerful and widely-applicable tool for the analysis and interpretation of multivariate data. Most visualization algorithms aim to find a projection from the data space down to a two-dimensional visualization space. However, for complex data sets living in a high-dimensional space, it is unlikely that a single two-dimensional projection can reveal all of the interesting structure. We therefore introduce a hierarchical visualization algorithm which allows the complete data set to be visualized at the top level, with clusters and subclusters of data points visualized at deeper levels. The algorithm is based on a hierarchical mixture of latent variable models, whose parameters are estimated using the expectation-maximization algorithm. We demonstrate the principle of the approach on a toy data set, and we then apply the algorithm to the visualization of a synthetic data set in 12 dimensions obtained from a simulation of multiphase flows in oil pipelines, and to data in 36 dimensions derived from satellite images.
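
The hierarchical structure can be illustrated with off-the-shelf pieces: a single two-dimensional projection of the whole data set at the top level, a mixture model to form clusters, and a separate two-dimensional projection fitted to each cluster at the second level. The paper's model is a hierarchical mixture of latent variable models trained jointly by EM; the PCA-plus-GaussianMixture combination below is only a stand-in to show the shape of the hierarchy.

```python
# Stand-in sketch of a two-level visualization hierarchy: one top-level 2-D view,
# then per-cluster 2-D views. The clustering and projections here are off-the-shelf
# substitutes for the paper's jointly trained hierarchical latent variable model.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(150, 12)) for c in (-3, 0, 3)])   # toy 12-D data

top_view = PCA(n_components=2).fit_transform(X)          # top-level visualization coordinates

labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)  # clusters for level two
sub_views = {k: PCA(n_components=2).fit_transform(X[labels == k]) for k in range(3)}

print(top_view.shape, {k: v.shape for k, v in sub_views.items()})
```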

Collaboration


Dive into Christopher M. Bishop's collaborations.

Top Co-Authors

Iain Buchan
University of Manchester

Angela Simpson
University of Manchester