Lisa Hellerstein
New York University
Publication
Featured research published by Lisa Hellerstein.
Algorithmica | 1988
Garth A. Gibson; Lisa Hellerstein; Richard M. Karp; Randy H. Katz; David A. Patterson
A crucial issue in the design of very large disk arrays is the protection of data against catastrophic disk failures. Although today single disks are highly reliable, when a disk array consists of 100 or 1000 disks, the probability that at least one disk will fail within a day or a week is high. In this paper we address the problem of designing erasure-correcting binary linear codes that protect against the loss of data caused by disk failures in large disk arrays. We describe how such codes can be used to encode data in disk arrays, and give a simple method for data reconstruction. We discuss important reliability and performance constraints of these codes, and show how these constraints relate to properties of the parity check matrices of the codes. In so doing, we transform code design problems into combinatorial problems. Using this combinatorial framework, we present codes and prove they are optimal with respect to various reliability and performance constraints.
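The reconstruction idea in the abstract above can be illustrated with a small sketch. It assumes a toy layout that is not taken from the paper: four data disks in a 2x2 grid, plus one parity disk per row and per column, so that every parity group XORs to zero. The names `build_array`, `GROUPS`, and `reconstruct` are hypothetical. The decoder only "peels" groups that have exactly one failed disk, so it is a simplification and will not recover every erasure pattern that a general linear code could correct.

```python
# Toy 2x2 layout (an assumption for illustration, not the paper's construction):
#   disks 0..3 hold data, disks 4,5 are row parities, disks 6,7 are column parities.
# Each group lists disks whose contents XOR to zero (one row of a parity check matrix).
GROUPS = [[0, 1, 4], [2, 3, 5], [0, 2, 6], [1, 3, 7]]


def build_array(data):
    """Return the contents of all 8 disks given the 4 data values."""
    d0, d1, d2, d3 = data
    return [d0, d1, d2, d3, d0 ^ d1, d2 ^ d3, d0 ^ d2, d1 ^ d3]


def reconstruct(disks, groups):
    """Recover erased disks (marked None) by repeated single-erasure repair.

    Whenever a parity group has exactly one erased disk, that disk equals
    the XOR of the surviving disks in the group.
    """
    disks = list(disks)
    progress = True
    while progress:
        progress = False
        for group in groups:
            missing = [i for i in group if disks[i] is None]
            if len(missing) == 1:
                value = 0
                for i in group:
                    if disks[i] is not None:
                        value ^= disks[i]
                disks[missing[0]] = value
                progress = True
    return disks


if __name__ == "__main__":
    full = build_array([0b1010, 0b0110, 0b1111, 0b0001])
    damaged = list(full)
    damaged[0] = None  # two failures in the same row
    damaged[1] = None
    print(reconstruct(damaged, GROUPS) == full)
```

In this layout any two failed disks can be repaired, because each of them sits in a group that the other failure does not touch. Losing all four data disks leaves every group with two unknowns, and the sketch returns with those entries still set to `None`.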
Journal of the ACM | 1993
Dana Angluin; Lisa Hellerstein; Marek Karpinski
A read-once formula is a Boolean formula in which each variable occurs at most once. Such formulas are also called μ-formulas or Boolean trees. This paper treats the problem of exactly identifying an unknown read-once formula using specific kinds of queries. The main results are a polynomial-time algorithm for exact identification of monotone read-once formulas using only membership queries, and a polynomial-time algorithm for exact identification of general read-once formulas using equivalence and membership queries (a protocol based on the notion of a minimally adequate teacher [1]). These results improve on Valiant's previous results for read-once formulas [26]. It is also shown that no polynomial-time algorithm using only membership queries or only equivalence queries can exactly identify all read-once formulas.
Journal of the ACM | 1996
Lisa Hellerstein; Krishnan Pillaipakkamnatt; Vijay Raghavan; Dawn Wilkins
We investigate the query complexity of exact learning in the membership and (proper) equivalence query model. We give a complete characterization of concept classes that are learnable with a polynomial number of polynomial-sized queries in this model. We give applications of this characterization, including results on learning a natural subclass of DNF formulas, and on learning with membership queries alone. Query complexity has previously been used to prove lower bounds on the time complexity of exact learning. We show a new relationship between query complexity and time complexity in exact learning: If any “honest” class is exactly and properly learnable with polynomial query complexity, but not learnable in polynomial time, then P ≠ NP. In particular, we show that an honest class is exactly polynomial-query learnable if and only if it is learnable using an oracle for $\Sigma_4^p$.
Journal of Computer and System Sciences | 1995
Avrim Blum; Lisa Hellerstein; Nick Littlestone
This paper addresses the problem of learning boolean functions in query and mistake-bound models in the presence of irrelevant attributes. In learning a concept, a learner may observe a great many more attributes than those that the concept depends upon, and in some sense the presence of extra, irrelevant attributes does not change the underlying concept being learned. Because of this, we are interested not only in the learnability of concept classes, but also in whether the classes can be learned by an algorithm that is attribute-efficient, in that the dependence of the mistake bound (or number of queries) on the number of irrelevant attributes is low. The results presented here apply to projection and embedding-closed (p.e.c.) concept classes. We show that if a p.e.c. class is learnable attribute-efficiently in the mistake-bound model, then it is learnable in the infinite-attribute mistake-bound model as well. We show in addition how to convert any algorithm that learns a p.e.c. class in the mistake-bound model with membership queries into an algorithm that learns the class attribute-efficiently in that model, or even in the infinite-attribute version. In the membership-query-only model we show that learnability does not always imply attribute-efficient learnability for deterministic algorithms. However, we describe a large class of functions, including the set of monotone functions, for which learnability does imply attribute-efficient learnability in this model.
European Conference on Information Retrieval | 2005
Yuval Marton; Ning Wu; Lisa Hellerstein
Compression-based text classification methods are easy to apply, requiring virtually no preprocessing of the data. Most such methods are character-based, and thus have the potential to automatically capture non-word features of a document, such as punctuation, word-stems, and features spanning more than one word. However, compression-based classification methods have drawbacks (such as slow running time), and not all such methods are equally effective. We present the results of a number of experiments designed to evaluate the effectiveness and behavior of different compression-based text classification methods on English text. Among our experiments are some specifically designed to test whether the ability to capture non-word (including super-word) features causes character-based text compression methods to achieve more accurate classification.
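As a rough illustration of the general idea behind compression-based classification, the sketch below assigns a document to the class whose training text lets it be compressed most cheaply. This is one common variant and is not claimed to be any of the specific methods evaluated in the paper. The use of `zlib` and the names `compressed_size` and `classify` are assumptions made for the example.

```python
import zlib


def compressed_size(text: str) -> int:
    """Number of bytes after zlib (DEFLATE) compression at the highest level."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def classify(document: str, training: dict) -> str:
    """Return the label whose training corpus best 'explains' the document.

    For each class, measure how many extra compressed bytes the document
    costs when appended to that class's training text; the smallest cost wins.
    No tokenization is done, so the compressor works on raw characters.
    """
    def extra_bytes(label: str) -> int:
        corpus = training[label]
        return compressed_size(corpus + " " + document) - compressed_size(corpus)

    return min(training, key=extra_bytes)


if __name__ == "__main__":
    training = {
        "pets": " ".join(["the cat sat on the mat and the dog barked at the cat"] * 5),
        "finance": " ".join(["stock prices rose sharply as markets rallied on earnings"] * 5),
    }
    print(classify("the dog barked at the cat", training))
```

Because the compressor sees characters rather than words, repeated punctuation, word stems, and multi-word phrases in the training text can all shorten the encoding of a new document. The cost is that every classification recompresses each class corpus, which is one source of the slow running time the abstract mentions.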
Foundations of Computer Science | 1994
Aditi Dhagat; Lisa Hellerstein
We consider the problem of learning in the presence of irrelevant attributes in Valiant's PAC model (1984). In the PAC model, the goal of the learner is to produce an approximately correct hypothesis from random sample data. If the number of relevant attributes in the target function is small, it may be desirable to produce a hypothesis that also depends on only a small number of variables. Haussler (1988) previously considered the problem of learning monomials of a small number of variables. He showed that the greedy set cover approximation algorithm can be used as a polynomial-time Occam algorithm for learning monomials on r of n variables. The algorithm outputs a monomial on $r(\ln q + 1)$ variables, where q is the number of negative examples in the sample. We extend this result by showing that there is a polynomial-time Occam algorithm for learning k-term DNF formulas depending on r of n variables that outputs a DNF formula depending on $O(r^k \log^k q)$ variables, where q is the number of negative examples in the sample. We also give a polynomial-time Occam algorithm for learning decision lists (sometimes called 1-decision lists) with k alternations.
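The greedy set cover approach to learning monomials, attributed to Haussler in the abstract above, can be sketched as follows. This is a minimal illustration under stated assumptions and not the paper's own code: examples are tuples of 0/1 values, a literal is a hypothetical pair `(index, value)` meaning "variable `index` equals `value`", and the sample contains at least one positive example. Each negative example must be ruled out by some chosen literal, which is exactly a set cover instance.

```python
def satisfies(example, monomial):
    """True if the example agrees with every literal (index, value) in the monomial."""
    return all(example[i] == v for i, v in monomial)


def greedy_monomial(positives, negatives):
    """Return a short monomial consistent with the labeled sample.

    Step 1: keep only literals that every positive example satisfies.
    Step 2: greedily pick the literal that rules out the most negative
            examples not yet ruled out, until none remain.
    """
    n = len(positives[0])
    candidates = [(i, v) for i in range(n) for v in (0, 1)
                  if all(p[i] == v for p in positives)]
    uncovered = set(range(len(negatives)))
    chosen = []
    while uncovered:
        def gain(lit):
            # Number of remaining negatives that this literal falsifies.
            return sum(1 for j in uncovered if negatives[j][lit[0]] != lit[1])

        best = max(candidates, key=gain, default=None)
        if best is None or gain(best) == 0:
            raise ValueError("no monomial is consistent with the sample")
        uncovered -= {j for j in uncovered if negatives[j][best[0]] != best[1]}
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Hidden target for this toy sample: x0 AND x2 over five variables.
    positives = [(1, 0, 1, 0, 1), (1, 1, 1, 1, 0), (1, 0, 1, 1, 1)]
    negatives = [(0, 1, 1, 0, 1), (1, 1, 0, 0, 0), (0, 0, 0, 1, 1)]
    print(greedy_monomial(positives, negatives))
```

The standard greedy set cover guarantee is what gives the $r(\ln q + 1)$ bound quoted in the abstract: if some monomial on r variables is consistent with the sample, the greedy choice needs at most that many literals to rule out all q negative examples.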
Foundations of Computer Science | 1992
Howard Aizenstein; Lisa Hellerstein; Leonard Pitt
A general technique is developed to obtain nonlearnability results in the model of exact learning from equivalence and membership queries. The technique is applied to show that, assuming NP ≠ co-NP, there does not exist a polynomial-time membership and equivalence query algorithm for exactly learning read-thrice DNF formulas, that is, boolean formulas in disjunctive normal form in which each variable appears at most three times. This result adds evidence to the conjecture that DNF is hard to learn in the membership and equivalence query model.
Conference on Learning Theory | 1995
Nader H. Bshouty; Thomas R. Hancock; Lisa Hellerstein
A formula is read-once if each variable appears on at most a single input. Previously, Angluin, Hellerstein, and Karpinski gave a polynomial time algorithm that uses membership and equivalence queries to exactly identify read-once boolean formulas over the basis {AND, OR, NOT}. In this paper we consider natural generalizations of this basis, and develop exact identification algorithms for more powerful classes of read-once formulas. We show that read-once formulas over the basis of arbitrary boolean functions of constant fan-in (i.e., any $f : \{0,1\}^c \to \{0,1\}$ with $1 \le c \le k$, where $k$ is a constant) are exactly identifiable in polynomial time using membership and equivalence queries. We also show that read-once formulas over the basis of arbitrary symmetric boolean functions are exactly identifiable in polynomial time in this model. Given standard cryptographic assumptions, there is no polynomial time identification algorithm for read-twice formulas over either of these bases in the model. We further show that for any basis class B meeting certain technical conditions, any polynomial time identification algorithm for read-once formulas over B can be extended to a polynomial time identification algorithm for read-once formulas over the union of B and the arbitrary functions of constant fan-in. As a result, read-once formulas over the union of arbitrary symmetric and arbitrary constant fan-in gates are also exactly identifiable in polynomial time using membership and equivalence queries.
Conference on Learning Theory | 1991
Thomas R. Hancock; Lisa Hellerstein
A formula is read-once if each variable in it occurs at most once. Angluin, Hellerstein, and Karpinski [AHK89] have shown that read-once formulas over the basis (AND, OR, NOT) are identifiable in polynomial time with membership and equivalence queries. We extend this result for boolean formulas to a larger basis including arbitrary threshold functions (generalizing AND and OR), NOT, parity, and functions computing congruence to a residue in some modulus up to a constant k. Note these functions are all symmetric, but are not all unate. We further examine arithmetic read-once formulas over multiplication and addition on an arbitrary field. We show these are identifiable in time polynomial in the number of variables using equivalence queries and the natural extension of membership queries to a non-boolean domain.
SIAM Journal on Computing | 1995
Nader H. Bshouty; Thomas R. Hancock; Lisa Hellerstein
A formula is read-once if each variable appears at most once in it. An arithmetic read-once formula is one in which the operators are addition, subtraction, multiplication, and division. We present polynomial time algorithms for exact learning of arithmetic read-once formulas over a field. We present a membership and equivalence query algorithm that identifies arithmetic read-once formulas over an arbitrary field. We present a randomized membership query algorithm (i.e., a randomized black box interpolation algorithm) that identifies such formulas over finite fields with at least