Publications


Featured research published by Philip D. Laird.


Machine Learning | 1988

Learning From Noisy Examples

Dana Angluin; Philip D. Laird

The basic question addressed in this paper is: how can a learning algorithm cope with incorrect training examples? Specifically, how can algorithms that produce an “approximately correct” identification with “high probability” for reliable data be adapted to handle noisy data? We show that when the teacher may make independent random errors in classifying the example data, the strategy of selecting the most consistent rule for the sample is sufficient, and usually requires a feasibly small number of examples, provided noise affects less than half the examples on average. In this setting we are able to estimate the rate of noise using only the knowledge that the rate is less than one half. The basic ideas extend to other types of random noise as well. We also show that the search problem associated with this strategy is intractable in general. However, for particular classes of rules the target rule may be efficiently identified if we use techniques specific to that class. For an important class of formulas – the k-CNF formulas studied by Valiant – we present a polynomial-time algorithm that identifies concepts in this form when the rate of classification errors is less than one half.
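As a rough illustration of the strategy the abstract describes, the sketch below selects the hypothesis that disagrees with the fewest noisy labels, over a finite hypothesis class given as Python predicates. The sample-size expression follows the style of the Angluin-Laird analysis for noise rate bounded by eta_b < 1/2; the function names and this exact constant are illustrative, not taken from the paper.

```python
import math

def min_disagreement(hypotheses, sample):
    """Pick the hypothesis that disagrees with the fewest observed labels.

    hypotheses: list of callables x -> bool
    sample: list of (x, label) pairs; labels may be independently flipped
    """
    return min(hypotheses,
               key=lambda h: sum(1 for x, y in sample if h(x) != y))

def sufficient_sample_size(eps, delta, eta_b, num_hypotheses):
    """Examples sufficient (in an Angluin-Laird style bound) for
    minimum-disagreement selection to be eps-accurate with probability
    1 - delta, given a known bound eta_b < 1/2 on the noise rate."""
    return math.ceil(2.0 / (eps ** 2 * (1.0 - 2.0 * eta_b) ** 2)
                     * math.log(2.0 * num_hypotheses / delta))
```

The (1 - 2*eta_b)^2 factor reflects the abstract's proviso that noise must affect less than half the examples on average: as eta_b approaches 1/2, the required sample size blows up.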


Archive | 1988

Learning from Good and Bad Data

Philip D. Laird

I Identification in the Limit from Indifferent Teachers
  1 The Identification Problem
    1.1 Learning from Indifferent Teachers
    1.2 A Working Assumption
    1.3 Convergence
    1.4 A General Strategy
    1.5 Examples from Existing Research
    1.6 Basic Definitions
    1.7 A General Algorithm
    1.8 Additional Comments
  2 Identification by Refinement
    2.1 Order Homomorphisms
    2.2 Refinements
      2.2.1 Introduction
      2.2.2 Upward and Downward Refinements
      2.2.3 Summary
    2.3 Identification by Refinement
    2.4 Conclusion
  3 How to Work With Refinements
    3.1 Introduction
    3.2 Three Useful Properties
    3.3 Normal Forms and Monotonic Operations
    3.4 Universal Refinements
      3.4.1 Abstract Formulation
      3.4.2 A Refinement for Clause-Form Sentences
      3.4.3 Inductive Bias
    3.5 Conclusions
    3.6 Appendix to Chapter 3
      3.6.1 Summary of Logic Notation and Terminology
      3.6.2 Proof of Theorem 3.32
      3.6.3 Refinement Properties of Figure 3.2
II Probabilistic Identification from Random Examples
  4 Probabilistic Approximate Identification
    4.1 Probabilistic Identification in the Limit
    4.2 The Model of Valiant
      4.2.1 Pac-Identification
      4.2.2 Identifying Normal-Form Expressions
      4.2.3 Related Results about Valiant's Model
    4.3 Using the Partial Order
    4.4 Summary
  5 Identification from Noisy Examples
    5.1 Introduction
    5.2 Prior Research Results
    5.3 The Classification Noise Process
    5.4 Pac-Identification
      5.4.1 Finite Classes
      5.4.2 Infinite Classes
      5.4.3 Estimating the Noise Rate η
    5.5 Probabilistic Identification in the Limit
    5.6 Identifying Normal-Form Expressions
    5.7 Other Models of Noise
    5.8 Appendix to Chapter 5
  6 Conclusions


Machine Learning | 1994

Discrete Sequence Prediction and Its Applications

Philip D. Laird; Ronald Saul

Learning from experience to predict sequences of discrete symbols is a fundamental problem in machine learning with many applications. We present a simple and practical algorithm (TDAG) for discrete sequence prediction. Based on a text-compression method, the TDAG algorithm limits the growth of storage by retaining the most likely prediction contexts and discarding (forgetting) less likely ones. The storage/speed tradeoffs are parameterized so that the algorithm can be used in a variety of applications. Our experiments verify its performance on data compression tasks and show how it applies to two problems: dynamically optimizing Prolog programs for good average-case behavior and maintaining a cache for a database on mass storage.
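The paper's actual TDAG structure and parameters are not reproduced here, but a minimal sketch of the general idea, counting successors of recent contexts and forgetting contexts whose share of the data drops below a threshold, might look like this; ContextTreePredictor, max_depth, and min_share are invented names.

```python
from collections import defaultdict

class ContextTreePredictor:
    """TDAG-like sketch: count what follows each recent context and
    forget contexts whose share of the data drops below a threshold."""

    def __init__(self, max_depth=3, min_share=0.01):
        self.max_depth = max_depth            # longest context retained
        self.min_share = min_share            # forget contexts rarer than this
        self.counts = defaultdict(lambda: defaultdict(int))
        self.history = []

    def update(self, symbol):
        n = len(self.history)
        for k in range(min(n, self.max_depth) + 1):
            ctx = tuple(self.history[n - k:])
            self.counts[ctx][symbol] += 1    # ctx was followed by symbol
        self.history.append(symbol)
        self._forget()

    def _forget(self):
        total = sum(self.counts[()].values()) or 1
        for ctx in [c for c in self.counts if c]:
            if sum(self.counts[ctx].values()) / total < self.min_share:
                del self.counts[ctx]          # prune an unlikely context

    def predict(self):
        n = len(self.history)
        # Use the longest retained context matching the recent history.
        for k in range(min(n, self.max_depth), -1, -1):
            dist = self.counts.get(tuple(self.history[n - k:]))
            if dist:
                return max(dist, key=dist.get)
        return None
```

For example, after feeding the characters of "abracadabra" through update(), predict() returns 'c', the only symbol ever observed after the context "bra".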


Algorithmic Learning Theory | 1993

Identifying and Using Patterns in Sequential Data

Philip D. Laird

Whereas basic machine learning research has mostly viewed input data as an unordered random sample from a population, researchers have also studied learning from data whose inputs arrive in a regular sequence. To do so requires that we regard the input data as a stream and identify regularities in the data values as they occur. In this brief survey I review three sequential-learning problems, examine some new, and not-so-new, algorithms for learning from sequences, and give applications for these methods. The three generic problems I discuss are: predicting sequences of discrete symbols generated by stochastic processes; learning streams by extrapolation from a general rule; and learning to predict time series.


Archive | 1988

The Identification Problem

Philip D. Laird

This chapter introduces the identification problem and reviews some of the ways it has been studied in the literature. We offer a formal definition for the identification problem and present a familiar, but fundamental, algorithm for identification in the limit.


World Congress on Computational Intelligence | 1994

Automated feature extraction for supervised learning

Philip D. Laird; Ronald Saul

Feature extraction has traditionally been a manual process and something of an art. Methods derived from statistics and linear systems theory have been proposed, but by general consensus effective feature extraction remains a difficult problem. Recently W. Tackett (1993) showed that genetic programming (GP) can be effective in automatically constructing features for identifying potential targets in digital images with high accuracy. From a basis set of simple arithmetic functions, he was able to construct numerical features that outperformed manually-constructed features when used as inputs to several classifiers, including a binary-tree classifier and a multi-layer perceptron trained by back-propagation. Seeking a more generic feature-construction procedure, we developed a GP-based algorithm to extract features in a variety of domains and for most classification methods, including decision trees, feed-forward neural networks, and Bayesian classifiers. We have tested the technique with success by extracting features for three different types of problems: Boolean functions with binary features, a NASA telemetry problem with multiple classes and real-valued time-series inputs, and a wine variety classification problem with real-valued features from the UCI Machine Learning repository. We formally define the feature-construction method and show in some detail how it applies to specific classification problems.
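A compressed sketch of wrapper-style GP feature construction in the spirit of the abstract, under strong simplifications: expression trees over {+, -, *} on the raw inputs, fitness measured by a one-dimensional threshold classifier, and plain truncation selection with fresh random trees instead of real crossover and mutation. Every name and parameter here is illustrative, not the paper's algorithm.

```python
import operator
import random

OPS = [operator.add, operator.sub, operator.mul]

def random_tree(n_inputs, depth, rng):
    """Grow a random arithmetic expression tree over the input features."""
    if depth == 0 or rng.random() < 0.3:
        return ('x', rng.randrange(n_inputs))          # leaf: raw input x_i
    return (rng.choice(OPS),
            random_tree(n_inputs, depth - 1, rng),
            random_tree(n_inputs, depth - 1, rng))

def evaluate(tree, x):
    if tree[0] == 'x':
        return x[tree[1]]
    op, left, right = tree
    return op(evaluate(left, x), evaluate(right, x))

def fitness(tree, X, y):
    """Wrapper fitness: accuracy of a median-threshold classifier applied
    to the constructed feature (a stand-in for the classifiers above)."""
    vals = [evaluate(tree, x) for x in X]
    thresh = sorted(vals)[len(vals) // 2]
    acc = sum((v > thresh) == t for v, t in zip(vals, y)) / len(y)
    return max(acc, 1.0 - acc)                         # accept either polarity

def evolve_feature(X, y, pop_size=40, generations=30, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    pop = [random_tree(n, 3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: fitness(t, X, y), reverse=True)
        pop = pop[:pop_size // 2] + \
              [random_tree(n, 3, rng) for _ in range(pop_size - pop_size // 2)]
    return max(pop, key=lambda t: fitness(t, X, y))
```

On, say, XOR-labeled binary vectors, this search can rediscover a product-like combination such as (x0 - x1) * (x0 - x1), which makes the classes threshold-separable even though no single raw input is.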


Conference on Learning Theory | 1993

A model of sequence extrapolation

Philip D. Laird; Ronald Saul; Peter Dunning

We study sequence extrapolation as an abstract learning problem. The task is to learn a stream, a semi-infinite sequence (s1, s2, ..., sn, ...) of values all of the same data type, from a finite initial segment (s1, s2, ..., sn). We assume that all elements of the stream are of the same type (e.g., integers, strings, etc.). In order to represent the hypotheses, we define a language for streams called elementary stream descriptions and present an algorithm that learns in the limit elementary streams over an extensive family of data types. The complexity of the algorithm depends on the type and on a stream property known as the delay. In general the complexity is exponential or worse, but for streams with bounded delay over freely generated types the algorithm runs in time polynomial in the size of the examples. Sample size analysis is difficult, but for streams of delay 2 over applicative pairs, we calculate exactly the sample size required in the worst case to identify the stream uniquely. This bound helps explain why sequence extrapolation requires so few examples compared to statistical learning.
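The paper's elementary stream descriptions are far richer than anything shown here, but learning a stream from a finite initial segment can be illustrated with a deliberately tiny hypothesis space: affine recurrences s[n+1] = a*s[n] + b over small integer coefficients. This is a hedged toy, not the paper's algorithm.

```python
def fit_affine_recurrence(segment):
    """Search small integer coefficients (a, b) such that
    s[n+1] == a*s[n] + b holds across the whole initial segment."""
    for a in range(-5, 6):
        for b in range(-20, 21):
            if all(segment[i + 1] == a * segment[i] + b
                   for i in range(len(segment) - 1)):
                return a, b
    return None

def extrapolate(segment, n_more):
    """Extend the segment by applying the learned recurrence."""
    rule = fit_affine_recurrence(segment)
    if rule is None:
        raise ValueError("no affine recurrence fits the segment")
    a, b = rule
    out = list(segment)
    for _ in range(n_more):
        out.append(a * out[-1] + b)
    return out

# extrapolate([1, 3, 7, 15], 3) recovers s[n+1] = 2*s[n] + 1 and
# yields [1, 3, 7, 15, 31, 63, 127].
```

Even this toy shows why extrapolation needs so few examples: a short segment already rules out all but one hypothesis in the space.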


Machine Learning | 1991

A “PAC” Algorithm for Making Feature Maps

Philip D. Laird; Evan Gamble

Kohonen and others have devised network algorithms for computing so-called topological feature maps. We describe a new algorithm, called the CDF-Inversion (CDFI) Algorithm, that can be used to learn feature maps and, in the process, approximate an unknown probability distribution to within any specified accuracy. The primary advantages of the algorithm over previous feature-map algorithms are that it is simple enough to analyze mathematically for correctness and efficiency, and that it distributes the points of the map evenly, in a sense that can be made rigorous. Like other vector-quantization algorithms it is potentially useful for many applications, including monitoring and statistical modeling. While not a network algorithm, the CDFI algorithm is well-suited to implementation on parallel computers.
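The paper's CDFI algorithm itself is not reproduced here, but the core mechanism suggested by the name can be sketched in one dimension: invert an empirical CDF at evenly spaced quantiles, so that each map point receives roughly equal probability mass (the even-distribution property the abstract highlights). The function name and parameters are illustrative.

```python
def cdfi_map_points(samples, k):
    """Place k map points at evenly spaced quantiles of the empirical CDF,
    so each point covers roughly equal probability mass (1-D sketch)."""
    ordered = sorted(samples)
    n = len(ordered)
    # Invert the empirical CDF at quantiles (i + 0.5)/k for i = 0..k-1.
    return [ordered[min(n - 1, int((i + 0.5) / k * n))] for i in range(k)]

# Usage sketch: 8 map points for samples from an unknown distribution.
# import random
# pts = cdfi_map_points([random.gauss(0, 1) for _ in range(10000)], 8)
```

Because each step is a sort plus independent quantile lookups, this style of computation parallelizes naturally, consistent with the abstract's closing remark.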


Archive | 1988

Identification from Noisy Examples

Philip D. Laird

When some of the training examples may be incorrect, none of the foregoing identification strategies is effective. With algorithms based on identification by enumeration (including refinement algorithms), an incorrect example may cause a correct hypothesis to be discarded; even if correct hypotheses occur infinitely often in the enumeration, errors can cause them to be discarded infinitely often, and thereby frustrate convergence. With pac-identification, the fundamental strategy of choosing a hypothesis consistent with the examples may fail, because there may be no such hypothesis.


Archive | 1988

Identification by Refinement

Philip D. Laird

We have seen that many identification procedures work by choosing a hypothesis and then generalizing or specializing it in response to counterexamples. We shall now formalize this idea. The key concept is that of a refinement operator — this term was used by Shapiro, but the concept is ubiquitous.
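As a concrete, hedged illustration of the concept (a standard textbook-style operator, not code from the book): a downward refinement operator on conjunctions of Boolean literals specializes a hypothesis by adding one literal, so each refinement covers no more instances than its parent.

```python
def downward_refinements(conjunction, n_vars):
    """All one-step specializations of a conjunction of literals.

    A conjunction is a frozenset of literals (i, val) meaning x_i == val.
    Adding a literal can only shrink the set of satisfying assignments,
    so every refinement is at most as general as its parent.
    """
    used = {i for i, _ in conjunction}
    for i in range(n_vars):
        if i not in used:
            for val in (0, 1):
                yield conjunction | {(i, val)}

# One step from the most general hypothesis over 3 variables:
# list(downward_refinements(frozenset(), 3)) yields the six conjunctions
# {x0=0}, {x0=1}, {x1=0}, {x1=1}, {x2=0}, {x2=1}.
# An identification-by-refinement loop starts from frozenset() and moves
# to a refinement whenever a counterexample shows the current hypothesis
# is too general.
```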

Collaboration


Dive into Philip D. Laird's collaborations.

Top Co-Authors


Mark D. Johnston

Space Telescope Science Institute
