Publication


Featured research published by John Shawe-Taylor.


Cambridge University Press | 2000

An Introduction to Support Vector Machines

Nello Cristianini; John Shawe-Taylor

This book is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation of learning systems based on recent advances in statistical learning theory. The book also introduces Bayesian analysis of learning and relates SVMs to Gaussian Processes and other kernel-based learning methods. SVMs deliver state-of-the-art performance in real-world applications such as text categorisation, hand-written character recognition, image classification, and biosequence analysis. Their introduction in the early 1990s led to an explosion of applications and deepening theoretical analysis that has now established Support Vector Machines, along with neural networks, as one of the standard tools for machine learning and data mining. Students will find the book both stimulating and accessible, while practitioners will be guided smoothly through the material required for a good grasp of the theory and application of these techniques. The concepts are introduced gradually in accessible and self-contained stages, though in each stage the presentation is rigorous and thorough. Pointers to relevant literature and web sites containing software ensure that it forms an ideal starting point for further study. Equally, the book will equip the practitioner to apply the techniques, and an associated web site provides pointers to updated literature, new applications, and on-line software.
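
As a minimal illustration of the kind of learner the book develops, a soft-margin SVM with an RBF kernel can be trained in a few lines. The use of scikit-learn and all parameter values here are our illustrative choices, not the book's:

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class problem: points around two Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Soft-margin SVM with an RBF kernel; C trades margin width against
# training errors, gamma sets the kernel bandwidth.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)
```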


Neural Computation | 2001

Estimating the Support of a High-Dimensional Distribution

Bernhard Schölkopf; John Platt; John Shawe-Taylor; Alexander J. Smola; Robert C. Williamson

Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a simple subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1. We propose a method to approach this problem by trying to estimate a function f that is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabeled data.
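
scikit-learn's OneClassSVM implements a single-class support vector algorithm of this kind; a minimal sketch of its use (data and parameter values are illustrative):

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Unlabeled training sample drawn from the underlying distribution P.
rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, (200, 5))

# nu upper-bounds the fraction of training points allowed to fall
# outside the estimated region S (the a priori specified value).
ocsvm = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X_train)

# decision_function is the learned f: positive inside S, negative outside.
X_test = np.vstack([rng.normal(0, 1, (5, 5)), rng.normal(6, 1, (5, 5))])
print(ocsvm.decision_function(X_test))  # signed distance to the boundary
print(ocsvm.predict(X_test))            # +1 inside S, -1 outside
```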


Journal of Machine Learning Research | 2002

Text classification using string kernels

Huma Lodhi; Craig Saunders; John Shawe-Taylor; Nello Cristianini; Chris Watkins

We introduce a novel kernel for comparing two text documents. The kernel is an inner product in the feature space consisting of all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text, though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences that are close to contiguous. A direct computation of this feature vector would involve a prohibitive amount of computation even for modest values of k, since the dimension of the feature space grows exponentially with k. The paper describes how, despite this fact, the inner product can be efficiently evaluated by a dynamic programming technique. A preliminary experimental comparison of the kernel with a standard word feature space kernel [6] shows encouraging results.
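
A direct, memoized transcription of the paper's recursive definition might look as follows. This naive version costs O(k·|s|·|t|²); the paper's dynamic program reduces it to O(k·|s|·|t|). The decay value and all names are our illustrative choices:

```python
from functools import lru_cache

LAM = 0.5  # decay factor lambda; illustrative value

@lru_cache(maxsize=None)
def k_prime(i, s, t):
    # Auxiliary kernel counting length-i subsequence prefixes, weighted
    # by their distance from the end of the string.
    if i == 0:
        return 1.0
    if min(len(s), len(t)) < i:
        return 0.0
    x, s_ = s[-1], s[:-1]
    total = LAM * k_prime(i, s_, t)
    for j in range(len(t)):
        if t[j] == x:
            total += k_prime(i - 1, s_, t[:j]) * LAM ** (len(t) - j + 1)
    return total

@lru_cache(maxsize=None)
def ssk(n, s, t):
    # Gap-weighted subsequence kernel K_n(s, t) over length-n subsequences.
    if min(len(s), len(t)) < n:
        return 0.0
    x, s_ = s[-1], s[:-1]
    total = ssk(n, s_, t)
    for j in range(len(t)):
        if t[j] == x:
            total += k_prime(n - 1, s_, t[:j]) * LAM ** 2
    return total

def ssk_normalised(n, s, t):
    # Normalise so that K(s, s) = 1, removing document-length effects.
    return ssk(n, s, t) / (ssk(n, s, s) * ssk(n, t, t)) ** 0.5

print(ssk_normalised(2, "cat", "car"))  # shared subsequence "ca"
```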


Neural Information Processing Systems | 2001

On Kernel-Target Alignment

Nello Cristianini; John Shawe-Taylor; André Elisseeff; Jaz S. Kandola

We introduce the notion of kernel-alignment, a measure of similarity between two kernel functions or between a kernel and a target function. This quantity captures the degree of agreement between a kernel and a given learning task, and has very natural interpretations in machine learning, leading also to simple algorithms for model selection and learning. We analyse its theoretical properties, proving that it is sharply concentrated around its expected value, and we discuss its relation with other standard measures of performance. Finally we describe some of the algorithms that can be obtained within this framework, giving experimental results showing that adapting the kernel to improve alignment on the labelled data significantly increases the alignment on the test set, giving improved classification accuracy. Hence, the approach provides a principled method of performing transduction.
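
For a two-class task with labels y in {-1, +1}, the target kernel is y yᵀ and the alignment is the normalised Frobenius inner product between kernel matrices; a minimal NumPy sketch (the toy data is ours):

```python
import numpy as np

def alignment(K1, K2):
    # A(K1, K2) = <K1, K2>_F / sqrt(<K1, K1>_F <K2, K2>_F)
    num = np.sum(K1 * K2)
    return num / np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))

def kernel_target_alignment(K, y):
    # Target kernel for labels y in {-1, +1} is the rank-one matrix y y^T.
    y = np.asarray(y, dtype=float)
    return alignment(K, np.outer(y, y))

# Illustrative use: a linear kernel on toy two-class data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (20, 3)), rng.normal(1, 1, (20, 3))])
y = np.array([-1] * 20 + [1] * 20)
print(kernel_target_alignment(X @ X.T, y))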


Machine Learning | 2002

Linear Programming Boosting via Column Generation

Ayhan Demiriz; Kristin P. Bennett; John Shawe-Taylor

We examine linear program (LP) approaches to boosting and demonstrate their efficient solution using LPBoost, a column generation based simplex method. We formulate the problem as if all possible weak hypotheses had already been generated. The labels produced by the weak hypotheses become the new feature space of the problem. The boosting task becomes to construct a learning function in the label space that minimizes misclassification error and maximizes the soft margin. We prove that for classification, minimizing the 1-norm soft margin error function directly optimizes a generalization error bound. The equivalent linear program can be efficiently solved using column generation techniques developed for large-scale optimization problems. The resulting LPBoost algorithm can be used to solve any LP boosting formulation by iteratively optimizing the dual misclassification costs in a restricted LP and dynamically generating weak hypotheses to make new LP columns. We provide algorithms for soft margin classification, confidence-rated, and regression boosting problems. Unlike gradient boosting algorithms, which may converge only in the limit, LPBoost converges in a finite number of iterations to a global solution satisfying mathematically well-defined optimality conditions. The optimal solutions of LPBoost are very sparse, in contrast with gradient-based methods. LPBoost is competitive with AdaBoost in both solution quality and computational cost.
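
A compact sketch of the column generation loop, using decision stumps as weak learners and scipy's linprog on the restricted dual over misclassification costs u and bound beta; D, the stump learner, and all names are our illustrative choices:

```python
import numpy as np
from scipy.optimize import linprog

def best_stump(X, y, u):
    # Weak learner: the decision stump maximising the edge
    # sum_i u_i y_i h(x_i) under the current misclassification costs u.
    best_edge, best_h = -np.inf, None
    for f in range(X.shape[1]):
        for theta in np.unique(X[:, f]):
            pred = np.where(X[:, f] > theta, 1.0, -1.0)
            for s in (1.0, -1.0):
                edge = np.sum(u * y * s * pred)
                if edge > best_edge:
                    best_edge, best_h = edge, (f, theta, s)
    return best_edge, best_h

def stump_predict(h, X):
    f, theta, s = h
    return s * np.where(X[:, f] > theta, 1.0, -1.0)

def lpboost(X, y, D=0.2, tol=1e-6, max_iter=50):
    m = X.shape[0]
    u = np.full(m, 1.0 / m)   # dual misclassification costs
    beta, H, res = 0.0, [], None
    for _ in range(max_iter):
        edge, h = best_stump(X, y, u)
        if edge <= beta + tol:        # no new column violates the dual
            break
        H.append(h)
        # Restricted dual LP: min beta  s.t.  sum_i u_i y_i h(x_i) <= beta
        # for every generated h,  sum_i u_i = 1,  0 <= u_i <= D.
        cols = np.array([y * stump_predict(h, X) for h in H])
        res = linprog(np.r_[np.zeros(m), 1.0],
                      A_ub=np.hstack([cols, -np.ones((len(H), 1))]),
                      b_ub=np.zeros(len(H)),
                      A_eq=np.r_[np.ones(m), 0.0][None, :], b_eq=[1.0],
                      bounds=[(0, D)] * m + [(None, None)], method="highs")
        u, beta = res.x[:m], res.x[m]
    if res is None:
        return [], np.array([])
    # Primal hypothesis weights are the duals of the <= constraints
    # (sign follows scipy's HiGHS convention).
    return H, -res.ineqlin.marginals
```

The learned ensemble predicts sign(sum_h a_h h(x)); per the abstract, the weights returned by the LP are typically far sparser than those produced by gradient-based boosters.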


International Conference on Machine Learning | 2005

The 2005 PASCAL visual object classes challenge

Mark Everingham; Andrew Zisserman; Christopher K. I. Williams; Luc Van Gool; Moray Allan; Christopher M. Bishop; Olivier Chapelle; Navneet Dalal; Thomas Deselaers; Gyuri Dorkó; Stefan Duffner; Jan Eichhorn; Jason Farquhar; Mario Fritz; Christophe Garcia; Thomas L. Griffiths; Frédéric Jurie; Daniel Keysers; Markus Koskela; Jorma Laaksonen; Diane Larlus; Bastian Leibe; Hongying Meng; Hermann Ney; Bernt Schiele; Cordelia Schmid; Edgar Seemann; John Shawe-Taylor; Amos J. Storkey; Sandor Szedmak

The PASCAL Visual Object Classes Challenge ran from February to March 2005. The goal of the challenge was to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). Four object classes were selected: motorbikes, bicycles, cars and people. Twelve teams entered the challenge. In this chapter we provide details of the datasets, algorithms used by the teams, evaluation criteria, and results achieved.


International Conference on Machine Learning | 2001

Latent Semantic Kernels

Nello Cristianini; John Shawe-Taylor; Huma Lodhi

Kernel methods like support vector machines have successfully been used for text categorization. A standard choice of kernel function has been the inner product between the vector-space representations of two documents, in analogy with classical information retrieval (IR) approaches. Latent semantic indexing (LSI) has been used successfully in IR as a technique for capturing semantic relations between terms and incorporating them into the similarity measure between two documents. One of its main drawbacks in IR is its computational cost. In this paper we describe how the LSI approach can be implemented in a kernel-defined feature space. We provide experimental results demonstrating that the approach can significantly improve performance, and that at worst it does not impair it.
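
One way to realise this in code is a kernel-PCA-style projection of the Gram matrix onto its top-k eigendirections; the following NumPy sketch (uncentred, with an illustrative bag-of-words kernel and choice of k) shows the idea:

```python
import numpy as np

def latent_semantic_kernel(K, k):
    # Eigendecompose the Gram matrix K = V diag(w) V^T and keep the
    # top-k eigendirections: an LSI-style projection carried out
    # entirely in kernel-defined feature space.
    w, V = np.linalg.eigh(K)            # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]
    Vk, wk = V[:, idx], w[idx]
    return (Vk * wk) @ Vk.T             # projected Gram matrix Vk diag(wk) Vk^T

# Illustrative use: term-count vectors for four tiny "documents".
X = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 2, 1],
              [0, 0, 1, 2]], dtype=float)
K = X @ X.T                             # bag-of-words inner-product kernel
print(latent_semantic_kernel(K, k=2))
```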


Machine Learning | 2011

Sparse canonical correlation analysis

David R. Hardoon; John Shawe-Taylor

We present a novel method for solving Canonical Correlation Analysis (CCA) in a sparse convex framework using a least squares approach. The presented method focuses on the scenario where one is interested in (or limited to) a primal representation for the first view while having a dual representation for the second view. Sparse CCA (SCCA) minimises the number of features used in both the primal and dual projections while maximising the correlation between the two views. The method is compared to alternative sparse solutions as well as demonstrated on paired corpora for mate-retrieval. In the mate-retrieval experiments we observe that when the number of original features is large, SCCA outperforms Kernel CCA (KCCA), learning the common semantic space from a sparse set of features.
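
The paper's least-squares primal-dual formulation is more involved than space allows here; as background, the generic sparse-CCA idea can be sketched as alternating l1-penalised regressions between the two views. This heuristic is our illustration, not the paper's algorithm, and all names and values are assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_cca(X, Y, alpha=0.05, n_iter=30, seed=0):
    # Alternating l1-penalised least squares between two centred views:
    # each step regresses one view's projection onto the other view with
    # a lasso penalty, then rescales the projection to unit variance.
    rng = np.random.default_rng(seed)
    wy = rng.standard_normal(Y.shape[1])
    wy /= np.std(Y @ wy)
    wx = np.zeros(X.shape[1])
    for _ in range(n_iter):
        wx = Lasso(alpha=alpha, fit_intercept=False).fit(X, Y @ wy).coef_
        if not np.any(wx):
            break                       # alpha too large: direction vanished
        wx /= np.std(X @ wx)
        wy = Lasso(alpha=alpha, fit_intercept=False).fit(Y, X @ wx).coef_
        if not np.any(wy):
            break
        wy /= np.std(Y @ wy)
    return wx, wy

# Illustrative use: two views sharing one latent signal in their first column.
rng = np.random.default_rng(1)
z = rng.standard_normal((100, 1))
X = np.hstack([z + 0.1 * rng.standard_normal((100, 1)),
               rng.standard_normal((100, 20))])
Y = np.hstack([z + 0.1 * rng.standard_normal((100, 1)),
               rng.standard_normal((100, 20))])
wx, wy = sparse_cca(X - X.mean(0), Y - Y.mean(0))
print("non-zero weights per view:", np.count_nonzero(wx), np.count_nonzero(wy))
```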


Monthly Notices of the Royal Astronomical Society | 2010

Results of the GREAT08 Challenge: an image analysis competition for cosmological lensing

Sarah Bridle; Sreekumar T. Balan; Matthias Bethge; Marc Gentile; Stefan Harmeling; Catherine Heymans; Michael Hirsch; Reshad Hosseini; M. Jarvis; D. Kirk; Thomas D. Kitching; Konrad Kuijken; Antony Lewis; Stephane Paulin-Henriksson; Bernhard Schölkopf; Malin Velander; Lisa Voigt; Dugan Witherick; Adam Amara; G. M. Bernstein; F. Courbin; M. S. S. Gill; Alan Heavens; Rachel Mandelbaum; Richard Massey; Baback Moghaddam; A. Rassat; Alexandre Refregier; Jason Rhodes; Tim Schrabback

We present the results of the Gravitational LEnsing Accuracy Testing 2008 (GREAT08) Challenge, a blind analysis challenge to infer weak gravitational lensing shear distortions from images. The primary goal was to stimulate new ideas by presenting the problem to researchers outside the shear measurement community. Six GREAT08 Team methods were presented at the launch of the Challenge and five additional groups submitted results during the 6-month competition. Participants analyzed 30 million simulated galaxies with a range in signal-to-noise ratio, point spread function ellipticity, galaxy size and galaxy type. The large quantity of simulations allowed shear measurement methods to be assessed, for the first time, at a level of accuracy suitable for currently planned cosmic shear observations. Different methods perform well in different parts of simulation parameter space and come close to the target level of accuracy in several of these. A number of fresh ideas have emerged as a result of the Challenge, including a re-examination of the process of combining information from different galaxies, which reduces the dependence on realistic galaxy modelling. The image simulations will become increasingly sophisticated in future GREAT Challenges; meanwhile, the GREAT08 simulations remain a benchmark for further developments in shear measurement algorithms.


International Conference on Neural Information Processing | 2013

Challenges in Representation Learning: A Report on Three Machine Learning Contests

Ian J. Goodfellow; Dumitru Erhan; Pierre Carrier; Aaron C. Courville; Mehdi Mirza; Ben Hamner; Will Cukierski; Yichuan Tang; David Thaler; Dong-Hyun Lee; Yingbo Zhou; Chetan Ramaiah; Fangxiang Feng; Ruifan Li; Xiaojie Wang; Dimitris Athanasakis; John Shawe-Taylor; Maxim Milakov; John Park; Radu Tudor Ionescu; Marius Popescu; Cristian Grozea; James Bergstra; Jingjing Xie; Lukasz Romaszko; Bing Xu; Zhang Chuang; Yoshua Bengio

The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results of the competitions. We provide suggestions for organizers of future challenges and some comments on what kind of knowledge can be gained from machine learning competitions.

Collaboration


Dive into John Shawe-Taylor's collaborations.

Top Co-Authors

Zakria Hussain | University College London
Martin Anthony | London School of Economics and Political Science
Tom Diethe | University College London