
Publications


Featured research published by Nick Littlestone.


Information & Computation | 1994

The weighted majority algorithm

Nick Littlestone; Manfred K. Warmuth

We study the construction of prediction algorithms in a situation in which a learner faces a sequence of trials, with a prediction to be made in each, and the goal of the learner is to make few mistakes. We are interested in the case where the learner has reason to believe that one of some pool of known algorithms will perform well, but the learner does not know which one. A simple and effective method, based on weighted voting, is introduced for constructing a compound algorithm in such a circumstance. We call this method the Weighted Majority Algorithm. We show that this algorithm is robust in the presence of errors in the data. We discuss various versions of the Weighted Majority Algorithm and prove mistake bounds for them that are closely related to the mistake bounds of the best algorithms of the pool. For example, given a sequence of trials, if there is an algorithm in the pool A that makes at most m mistakes then the Weighted Majority Algorithm will make at most c(log |A| + m) mistakes on that sequence, where c is a fixed constant.
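The weighted-voting scheme the abstract describes can be sketched in a few lines. This is a minimal illustration under assumed interfaces (experts as plain callables, binary labels), not the authors' code; beta is the usual multiplicative penalty parameter.

```python
def weighted_majority(experts, trials, beta=0.5):
    """Run the Weighted Majority scheme over a sequence of (x, label) trials.

    experts: list of callables x -> 0/1 (the "pool" of known algorithms).
    beta:    penalty factor in (0, 1) applied to mistaken experts.
    Returns the number of mistakes made by the combined predictor.
    """
    weights = [1.0] * len(experts)
    mistakes = 0
    for x, label in trials:
        votes = [e(x) for e in experts]
        # Weighted vote between the two possible predictions.
        weight_for_1 = sum(w for w, v in zip(weights, votes) if v == 1)
        weight_for_0 = sum(w for w, v in zip(weights, votes) if v == 0)
        prediction = 1 if weight_for_1 >= weight_for_0 else 0
        if prediction != label:
            mistakes += 1
        # Multiplicatively demote every expert that erred on this trial.
        weights = [w * beta if v != label else w
                   for w, v in zip(weights, votes)]
    return mistakes
```

If one expert in the pool is perfect (m = 0), the mistake count stays within the c·log |A| bound stated above regardless of how badly the other experts behave.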


Machine Learning | 1988

Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm

Nick Littlestone

Valiant (1984) and others have studied the problem of learning various classes of Boolean functions from examples. Here we discuss incremental learning of these functions. We consider a setting in which the learner responds to each example according to a current hypothesis. Then the learner updates the hypothesis, if necessary, based on the correct classification of the example. One natural measure of the quality of learning in this setting is the number of mistakes the learner makes. For suitable classes of functions, learning algorithms are available that make a bounded number of mistakes, with the bound independent of the number of examples seen by the learner. We present one such algorithm that learns disjunctive Boolean functions, along with variants for learning other classes of Boolean functions. The basic method can be expressed as a linear-threshold algorithm. A primary advantage of this algorithm is that the number of mistakes grows only logarithmically with the number of irrelevant attributes in the examples. At the same time, the algorithm is computationally efficient in both time and space.
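The linear-threshold method the abstract refers to is Winnow; a minimal sketch of one variant (promotion by doubling, demotion by zeroing, with the conventional threshold n) follows. The parameter choices here are illustrative assumptions, not the paper's exact formulation.

```python
def winnow(trials, n, alpha=2.0):
    """Winnow-style learner for disjunctions over n Boolean attributes.

    trials: sequence of (x, label) with x a 0/1 list of length n.
    alpha:  promotion factor (doubling by default).
    Returns the number of mistakes made over the sequence.
    """
    weights = [1.0] * n
    theta = n  # threshold; a common choice, assumed here for illustration
    mistakes = 0
    for x, label in trials:
        total = sum(w for w, xi in zip(weights, x) if xi == 1)
        prediction = 1 if total > theta else 0
        if prediction != label:
            mistakes += 1
            if label == 1:
                # Promotion: multiply the weights of the active attributes.
                weights = [w * alpha if xi else w for w, xi in zip(weights, x)]
            else:
                # Demotion (Winnow1 style): zero the active attributes' weights.
                weights = [0.0 if xi else w for w, xi in zip(weights, x)]
    return mistakes
```

Each relevant attribute needs only about log2(theta) promotions before its weight crosses the threshold, which is the source of the logarithmic dependence on the number of irrelevant attributes.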


Foundations of Computer Science | 1987

Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm

Nick Littlestone

Valiant and others have studied the problem of learning various classes of Boolean functions from examples. Here we discuss on-line learning of these functions. In on-line learning, the learner responds to each example according to a current hypothesis. Then the learner updates the hypothesis, if necessary, based on the correct classification of the example. One natural measure of the quality of learning in the on-line setting is the number of mistakes the learner makes. For suitable classes of functions, on-line learning algorithms are available that make a bounded number of mistakes, with the bound independent of the number of examples seen by the learner. We present one such algorithm, which learns disjunctive Boolean functions, and variants of the algorithm for learning other classes of Boolean functions. The algorithm can be expressed as a linear-threshold algorithm. A primary advantage of this algorithm is that the number of mistakes that it makes is relatively little affected by the presence of large numbers of irrelevant attributes in the examples; we show that the number of mistakes grows only logarithmically with the number of irrelevant attributes. At the same time, the algorithm is computationally time and space efficient.


Conference on Learning Theory | 1991

Redundant noisy attributes, attribute errors, and linear-threshold learning using winnow

Nick Littlestone

First we study the performance of the linear-threshold algorithm, Winnow, in the presence of attribute errors in the data available for learning. We study how such errors affect bounds on the number of mistakes that Winnow will make. We allow the errors to be generated by an adversary. In the presence of such errors the bounds for Winnow retain the logarithmic dependence on the number of relevant attributes that they have in the noise-free case. There is an additional additive term proportional to a weighted sum of the number of errors occurring in each relevant attribute during a sequence of trials. We next examine probabilistic mistake bounds that can be obtained by making stronger assumptions about the instances seen by the learner. We are particularly interested in models of noisy redundant information. The situation that we consider is motivated by the case where there are two prototypical instances each described by a number of attributes, one prototype with classification 1 and the other with classification 0. The instances seen by the learner are noisy versions of these prototypes. The noise affecting irrelevant attributes can be arbitrary (even generated by an adversary), but we assume that the relevant attributes are affected by random noise independently of one another. We obtain bounds (holding with high probability) that grow in proportion to the number of trials. In a typical case the constant of proportionality that we obtain decreases exponentially with the number of independent relevant attributes.


Conference on Learning Theory | 1997

General convergence results for linear discriminant updates

Adam J. Grove; Nick Littlestone; Dale Schuurmans

The problem of learning linear-discriminant concepts can be solved by various mistake-driven update procedures, including the Winnow family of algorithms and the well-known Perceptron algorithm. In this paper we define the general class of “quasi-additive” algorithms, which includes Perceptron and Winnow as special cases. We give a single proof of convergence that covers a broad subset of algorithms in this class, including both Perceptron and Winnow, but also many new algorithms. Our proof hinges on analyzing a generic measure of progress construction that gives insight as to when and how such algorithms converge.
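One common reading of the quasi-additive template: an additively updated vector z determines the hypothesis weights through a componentwise link function, with the identity link giving Perceptron-style behavior and an exponential link giving Winnow/multiplicative-style behavior. The sketch below is an illustration of that template under those assumptions, not the paper's code.

```python
def quasi_additive(trials, n, link, passes=5):
    """Mistake-driven quasi-additive learner on +1/-1 labels.

    z is updated additively on each mistake; the hypothesis weights are
    w = link(z) applied componentwise (link plays the role of the
    transfer/link function that distinguishes the algorithms).
    Returns the total number of mistakes over all passes.
    """
    z = [0.0] * n
    mistakes = 0
    for _ in range(passes):
        for x, label in trials:
            w = [link(zi) for zi in z]
            score = sum(wi * xi for wi, xi in zip(w, x))
            prediction = 1 if score > 0 else -1
            if prediction != label:
                mistakes += 1
                z = [zi + label * xi for zi, xi in zip(z, x)]
    return mistakes
```

With `link=lambda t: t` this is the Perceptron update; with `link=math.exp` it behaves like an exponentiated, Winnow-style rule. The two share the identical additive bookkeeping in z, which is the point of the unified analysis.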


Information & Computation | 1994

Predicting {0, 1}-functions on randomly drawn points

David Haussler; Nick Littlestone; Manfred K. Warmuth

We consider the problem of predicting {0, 1}-valued functions on R^n and smaller domains, based on their values on randomly drawn points. Our model is related to Valiant's PAC learning model, but does not require the hypotheses used for prediction to be represented in any specified form. In our main result we show how to construct prediction strategies that are optimal to within a constant factor for any reasonable class F of target functions. This result is based on new combinatorial results about classes of functions of finite VC dimension. We also discuss more computationally efficient algorithms for predicting indicator functions of axis-parallel rectangles, more general intersection closed concept classes, and halfspaces in R^n. These are also optimal to within a constant factor. Finally, we compare the general performance of prediction strategies derived by our method to that of those derived from methods in PAC learning theory.


Conference on Learning Theory | 1999

The robustness of the p-norm algorithms

Claudio Gentile; Nick Littlestone

We consider two on-line learning frameworks: binary classification through linear threshold functions and linear regression. We study a family of on-line algorithms, called p-norm algorithms, introduced by Grove, Littlestone and Schuurmans in the context of deterministic binary classification. We show how to adapt these algorithms for use in the regression setting, and prove worst-case bounds on the square loss, using a technique from Kivinen and Warmuth. As pointed out by Grove et al., these algorithms can be made to approach a version of the classification algorithm Winnow as p goes to infinity; similarly they can be made to approach the corresponding regression algorithm EG in the limit. Winnow and EG are notable for having loss bounds that grow only logarithmically in the dimension of the instance space. Here we describe another way to use the p-norm algorithms to achieve this logarithmic behavior. With the way to use them that we propose, it is less critical than with Winnow and EG to retune the parameters of the algorithm as the learning task changes. Since the correct setting of the parameters depends on characteristics of the learning task that are not typically known a priori by the learner, this gives the p-norm algorithms a desirable robustness. Our elaborations yield various new loss bounds in these on-line settings. Some of these bounds improve or generalize known results. Others are incomparable.
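A sketch of the classification side of the family, assuming one common presentation of the p-norm link (the gradient of half the squared p-norm); the exact normalization is an assumption here, not taken from this paper. Setting p = 2 makes the link the identity and recovers the Perceptron, while large p pushes the algorithm toward Winnow-like multiplicative behavior, as the abstract notes.

```python
import math

def p_norm_weights(theta, p):
    """Map the additively updated vector theta to hypothesis weights via
    f_i(theta) = sign(theta_i) * |theta_i|**(p-1) / ||theta||_p**(p-2),
    i.e. the gradient of (1/2)*||theta||_p**2 (an assumed, common form).
    For p = 2 this is the identity map, recovering the Perceptron."""
    norm = sum(abs(t) ** p for t in theta) ** (1.0 / p)
    if norm == 0.0:
        return [0.0] * len(theta)
    return [math.copysign(abs(t) ** (p - 1), t) / norm ** (p - 2)
            for t in theta]

def p_norm_classify(trials, n, p=2.0, passes=5):
    """Mistake-driven p-norm classifier on +1/-1 labels: theta is updated
    additively on each mistake, exactly as in the Perceptron; only the
    weight map changes with p."""
    theta = [0.0] * n
    mistakes = 0
    for _ in range(passes):
        for x, label in trials:
            w = p_norm_weights(theta, p)
            score = sum(wi * xi for wi, xi in zip(w, x))
            if (1 if score > 0 else -1) != label:
                mistakes += 1
                theta = [t + label * xi for t, xi in zip(theta, x)]
    return mistakes
```

The appeal described in the abstract is that a moderate choice of p already yields near-logarithmic bounds without the parameter retuning Winnow and EG require.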


Conference on Learning Theory | 1989

From on-line to batch learning

Nick Littlestone

We contrast on-line and batch settings for concept learning, and describe an on-line learning model in which no probabilistic assumptions are made. We briefly mention some of our recent results pertaining to on-line learning algorithms developed using this model. We then turn to the main topic, which is an analysis of a conversion to improve the performance of on-line learning algorithms in a batch setting. For the batch setting we use the PAC-learning model. The conversion is straightforward, consisting of running the given on-line algorithm, collecting the hypotheses it uses for making predictions, and then choosing the hypothesis among them that does the best in a subsequent hypothesis testing phase. We have developed an analysis, using a version of Chernoff bounds applied to supermartingales, that shows that for some target classes the converted algorithm will be asymptotically optimal.
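The conversion described here is simple enough to sketch directly: run the on-line learner over the training sequence, keep a snapshot of each hypothesis it predicts with, and return the snapshot that wins the subsequent hypothesis-testing phase on separate data. The learner interface below (`update`/`hypothesis` methods) is an illustrative assumption, not from the paper.

```python
def online_to_batch(learner, train, validation):
    """Convert an on-line learner to a batch learner: collect the hypotheses
    the on-line algorithm uses for prediction, then return the one with the
    fewest errors on a separate validation sample (the hypothesis-testing
    phase). `learner` is assumed to expose update(x, label) and
    hypothesis() -> callable snapshot of the current predictor."""
    hypotheses = [learner.hypothesis()]
    for x, label in train:
        learner.update(x, label)
        hypotheses.append(learner.hypothesis())
    return min(hypotheses,
               key=lambda h: sum(1 for x, y in validation if h(x) != y))
```

The number of candidate hypotheses is at most the length of the training sequence plus one, so the testing phase stays cheap; the paper's contribution is the supermartingale analysis showing the selected hypothesis is asymptotically optimal for some target classes.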


Machine Learning | 2001

General Convergence Results for Linear Discriminant Updates

Adam J. Grove; Nick Littlestone; Dale Schuurmans

The problem of learning linear-discriminant concepts can be solved by various mistake-driven update procedures, including the Winnow family of algorithms and the well-known Perceptron algorithm. In this paper we define the general class of “quasi-additive” algorithms, which includes Perceptron and Winnow as special cases. We give a single proof of convergence that covers a broad subset of algorithms in this class, including both Perceptron and Winnow, but also many new algorithms. Our proof hinges on analyzing a generic measure of progress construction that gives insight as to when and how such algorithms converge. Our measure of progress construction also permits us to obtain good mistake bounds for individual algorithms. We apply our unified analysis to new algorithms as well as existing algorithms. When applied to known algorithms, our method “automatically” produces close variants of existing proofs (recovering similar bounds)—thus showing that, in a certain sense, these seemingly diverse results are fundamentally isomorphic. However, we also demonstrate that the unifying principles are more broadly applicable, and analyze a new class of algorithms that smoothly interpolate between the additive-update behavior of Perceptron and the multiplicative-update behavior of Winnow.


Journal of Computer and System Sciences | 1995

Learning in the Presence of Finitely or Infinitely Many Irrelevant Attributes

Avrim Blum; Lisa Hellerstein; Nick Littlestone

This paper addresses the problem of learning Boolean functions in query and mistake-bound models in the presence of irrelevant attributes. In learning a concept, a learner may observe a great many more attributes than those that the concept depends upon, and in some sense the presence of extra, irrelevant attributes does not change the underlying concept being learned. Because of this, we are interested not only in the learnability of concept classes, but also in whether the classes can be learned by an algorithm that is attribute-efficient in that the dependence of the mistake bound (or number of queries) on the number of irrelevant attributes is low. The results presented here apply to projection and embedding-closed (p.e.c.) concept classes. We show that if a p.e.c. class is learnable attribute-efficiently in the mistake-bound model, then it is learnable in the infinite-attribute mistake-bound model as well. We show in addition how to convert any algorithm that learns a p.e.c. class in the mistake-bound model with membership queries into an algorithm that learns the class attribute-efficiently in that model, or even in the infinite-attribute version. In the membership-query-only model we show that learnability does not always imply attribute-efficient learnability for deterministic algorithms. However, we describe a large class of functions, including the set of monotone functions, for which learnability does imply attribute-efficient learnability in this model.

Collaboration


Dive into Nick Littlestone's collaborations.

Top Co-Authors

David Haussler (University of California)

Philip M. Long (National University of Singapore)

Avrim Blum (Carnegie Mellon University)