Shai Ben-David | Researchain

Publication

Featured research published by Shai Ben-David.

Very Large Data Bases | 2004

Detecting change in data streams

Daniel Kifer; Shai Ben-David; Johannes Gehrke

Detecting changes in a data stream is an important area of research with many applications. In this paper, we present a novel method for the detection and estimation of change. In addition to providing statistical guarantees on the reliability of detected changes, our method also provides meaningful descriptions and quantification of these changes. Our approach assumes that the points in the stream are independently generated, but otherwise makes no assumptions on the nature of the generating distribution. Thus our techniques work for both continuous and discrete data. In an experimental study we demonstrate the power of our techniques.
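The abstract does not spell out the algorithm. A common way to realize change detection under the independence assumption it describes is a two-window scheme: keep a reference window, slide a second window over the stream, and flag a change when a distance between the two empirical distributions is large. The sketch below assumes the Kolmogorov-Smirnov distance and a hand-picked `threshold`; `ks_distance` and `detect_changes` are hypothetical names, and unlike the paper's method this sketch attaches no statistical guarantee to the threshold.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple


def ks_distance(a: List[float], b: List[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of samples a and b (ties are handled, so discrete
    data works as well as continuous data)."""
    sa, sb = sorted(a), sorted(b)
    ia = ib = 0
    best = 0.0
    for v in sorted(set(a) | set(b)):
        # Count how many elements of each sample are <= v.
        while ia < len(sa) and sa[ia] <= v:
            ia += 1
        while ib < len(sb) and sb[ib] <= v:
            ib += 1
        best = max(best, abs(ia / len(sa) - ib / len(sb)))
    return best


def detect_changes(stream: Iterable[float], window: int,
                   threshold: float) -> List[Tuple[int, float]]:
    """Fill a reference window, then slide a second window of the same size
    over the stream. When the KS distance between the two windows exceeds
    `threshold`, record (index, distance) and start a fresh reference window."""
    reference: List[float] = []
    current: Deque[float] = deque(maxlen=window)
    alarms: List[Tuple[int, float]] = []
    for i, x in enumerate(stream):
        if len(reference) < window:
            reference.append(x)
            continue
        current.append(x)
        if len(current) == window:
            d = ks_distance(reference, list(current))
            if d > threshold:
                alarms.append((i, d))
                reference, current = [], deque(maxlen=window)
    return alarms
```

For example, `detect_changes([0.0] * 50 + [1.0] * 50, window=20, threshold=0.5)` reports a single alarm at index 60, once 11 of the 20 values in the sliding window come from the new distribution (distance of about 0.55).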

Machine Learning | 2010

A theory of learning from different domains

Shai Ben-David; John Blitzer; Koby Crammer; Alex Kulesza; Fernando Pereira; Jennifer Wortman Vaughan

Discriminative learning methods for classification perform well when training and test data are drawn from the same distribution. Often, however, we have plentiful labeled training data from a source domain but wish to learn a classifier which performs well on a target domain with a different distribution and little or no labeled training data. In this work we investigate two questions. First, under what conditions can a classifier trained from source data be expected to perform well on target data? Second, given a small amount of labeled target data, how should we combine it during training with the large amount of labeled source data to achieve the lowest target error at test time? We address the first question by bounding a classifier’s target error in terms of its source error and the divergence between the two domains. We give a classifier-induced divergence measure that can be estimated from finite, unlabeled samples from the domains. Under the assumption that there exists some hypothesis that performs well in both domains, we show that this quantity, together with the empirical source error, characterizes the target error of a source-trained classifier. We answer the second question by bounding the target error of a model which minimizes a convex combination of the empirical source and target errors. Previous theoretical work has considered minimizing just the source error, just the target error, or weighting instances from the two domains equally. We show how to choose the optimal combination of source and target error as a function of the divergence, the sample sizes of both domains, and the complexity of the hypothesis class. The resulting bound generalizes the previously studied cases and is always at least as tight as a bound which considers minimizing only the target error or an equal weighting of source and target errors.
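The second question concerns minimizing a convex combination of empirical source and target errors. A minimal sketch, assuming one-dimensional inputs, binary labels and a small hypothetical class of threshold classifiers: `alpha` is the weight on target error, so `alpha = 0` trains on source data only and `alpha = 1` on target data only. Choosing `alpha` well is what the paper's bound addresses; here it is simply a parameter, and all function names are illustrative.

```python
from typing import Callable, Optional, Sequence, Tuple

Example = Tuple[float, int]  # (feature, label in {0, 1})


def empirical_error(h: Callable[[float], int], data: Sequence[Example]) -> float:
    """Fraction of examples that h labels incorrectly."""
    return sum(h(x) != y for x, y in data) / len(data)


def alpha_error(h: Callable[[float], int], source: Sequence[Example],
                target: Sequence[Example], alpha: float) -> float:
    """Convex combination: alpha * target error + (1 - alpha) * source error."""
    return alpha * empirical_error(h, target) + (1 - alpha) * empirical_error(h, source)


def best_threshold(source: Sequence[Example], target: Sequence[Example],
                   alpha: float, thresholds: Sequence[float]
                   ) -> Tuple[Optional[float], float]:
    """Pick the classifier x -> 1[x >= t] that minimizes the alpha-weighted error."""
    best_t, best_err = None, float("inf")
    for t in thresholds:
        h = lambda x, t=t: int(x >= t)  # bind t now, not at call time
        err = alpha_error(h, source, target, alpha)
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err
```

With source labels switching at 0.5 and target labels switching at 0.7, `alpha = 0` selects the threshold 0.5, `alpha = 1` selects 0.7, and intermediate weights trade the two errors off.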

Journal of the ACM | 1997

Scale-sensitive dimensions, uniform convergence, and learnability

Noga Alon; Shai Ben-David; Nicolò Cesa-Bianchi; David Haussler

Learnability in Valiant's PAC learning model has been shown to be strongly related to the existence of uniform laws of large numbers. These laws define a distribution-free convergence property of means to expectations uniformly over classes of random variables. Classes of real-valued functions enjoying such a property are also known as uniform Glivenko-Cantelli classes. In this paper, we prove, through a generalization of Sauer's lemma that may be interesting in its own right, a new characterization of uniform Glivenko-Cantelli classes. Our characterization yields Dudley, Giné, and Zinn's previous characterization as a corollary. Furthermore, it is the first based on a simple combinatorial quantity generalizing the Vapnik-Chervonenkis dimension. We apply this result to obtain the weakest combinatorial condition known to imply PAC learnability in the statistical regression (or “agnostic”) framework. In addition, we find a characterization of learnability in the probabilistic concept model, solving an open problem posed by Kearns and Schapire. These results show that the accuracy parameter plays a crucial role in determining the effective complexity of the learner's hypothesis class.
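The combinatorial quantity generalizing the VC dimension is a scale-sensitive shattering notion. The sketch below checks the standard fat-shattering condition at scale γ: points x_1, ..., x_d are γ-shattered with witnesses r_1, ..., r_d if every 0/1 pattern is realized by some function lying at least γ above r_i where the pattern is 1 and at least γ below r_i where it is 0. It assumes a finite class given as a list of Python functions and fixed witness values (it does not search over witnesses); the paper's exact definition may differ in such details, and `is_gamma_shattered` is a hypothetical name.

```python
from itertools import product
from typing import Callable, Sequence


def is_gamma_shattered(points: Sequence, witnesses: Sequence[float], gamma: float,
                       functions: Sequence[Callable[[object], float]]) -> bool:
    """True if every 0/1 pattern b over `points` is realized by some f with
    f(x_i) >= r_i + gamma when b_i = 1 and f(x_i) <= r_i - gamma when b_i = 0."""
    for pattern in product((0, 1), repeat=len(points)):
        realized = any(
            all((f(x) >= r + gamma) if b else (f(x) <= r - gamma)
                for x, r, b in zip(points, witnesses, pattern))
            for f in functions
        )
        if not realized:
            return False  # this pattern has no witness function in the class
    return True
```

The fat-shattering dimension at scale γ is the size of the largest set that can be γ-shattered. For instance, the class of all four 0/1-valued functions on `{0, 1}` shatters both points with witnesses 0.5 at scale 0.5, but not at scale 0.6, which shows the dependence on the accuracy parameter.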

Conference on Learning Theory | 2003

Exploiting Task Relatedness for Multiple Task Learning

Shai Ben-David; Reba Schuller

The approach of learning multiple “related” tasks simultaneously has proven quite successful in practice; however, theoretical justification for this success has remained elusive. The starting point for previous work on multiple task learning has been that the tasks to be learned jointly are somehow “algorithmically related”, in the sense that the results of applying a specific learning algorithm to these tasks are assumed to be similar. We offer an alternative approach, defining relatedness of tasks on the basis of similarity between the example-generating distributions that underlie these tasks.

Symposium on the Theory of Computing | 1990

On the power of randomization in online algorithms

Shai Ben-David; Richard M. Karp; Gábor Tardos; Avi Wigderson

Against an adaptive adversary, we show that the power of randomization in on-line algorithms is severely limited! We prove the existence of an efficient “simulation” of randomized on-line algorithms by deterministic ones, which is best possible in general. The proof of the upper bound is existential. We deal with the issue of computing the efficient deterministic algorithm, and show that this is possible in very general cases.

Algorithmica | 1994

On the power of randomization in on-line algorithms

Shai Ben-David; Richard M. Karp; Gábor Tardos; Avi Wigderson

Algorithmica | 1994

A new measure for the study of on-line algorithms

Shai Ben-David

An accepted measure for the performance of an on-line algorithm is the “competitive ratio” introduced by Sleator and Tarjan. This measure is well motivated and has led to the development of a mathematical theory for on-line algorithms. We investigate the behavior of this measure with respect to memory needs and benefits of lookahead and find some counterintuitive features. We present lower bounds on the size of memory devoted to recording the past. It is also observed that the competitive ratio reflects no improvement in the performance of an on-line algorithm due to any (finite) amount of lookahead. We offer an alternative measure that exhibits a different and, in some respects, more intuitive behavior. In particular, we demonstrate the use of our new measure by analyzing the tradeoff between the amortized cost of on-line algorithms for the paging problem and the amount of lookahead available to them. We also derive on-line algorithms for the K-server problem on any bounded metric space, which, relative to the new measure, are optimal among all on-line algorithms (up to a factor of 2) and are within a factor of 2K from the optimal off-line performance.
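To make the competitive ratio concrete: an on-line algorithm's cost on a request sequence is compared with the cost of the optimal off-line algorithm, and the ratio is taken in the worst case over sequences. The sketch below assumes the paging problem with cache size `k` and compares the LRU eviction rule with Belady's optimal off-line rule (evict the page requested farthest in the future). Neither algorithm is taken from the paper, the function names are hypothetical, and the ratio on a single sequence is only a lower bound on LRU's competitive ratio.

```python
from typing import List, Sequence, Set


def lru_faults(requests: Sequence[int], k: int) -> int:
    """Page faults of the on-line LRU rule with a cache of k pages."""
    cache: List[int] = []  # least recently used page first
    faults = 0
    for p in requests:
        if p in cache:
            cache.remove(p)  # hit: move p to the most-recent end
        else:
            faults += 1
            if len(cache) == k:
                cache.pop(0)  # evict the least recently used page
        cache.append(p)
    return faults


def _next_use(requests: Sequence[int], i: int, page: int) -> float:
    """Index of the next request for `page` after position i (inf if none)."""
    for j in range(i + 1, len(requests)):
        if requests[j] == page:
            return j
    return float("inf")


def opt_faults(requests: Sequence[int], k: int) -> int:
    """Page faults of Belady's off-line rule, which sees the whole sequence."""
    cache: Set[int] = set()
    faults = 0
    for i, p in enumerate(requests):
        if p in cache:
            continue
        faults += 1
        if len(cache) == k:
            victim = max(cache, key=lambda q: _next_use(requests, i, q))
            cache.remove(victim)
        cache.add(p)
    return faults
```

On the cyclic sequence `[1, 2, 3] * 4` with `k = 2`, LRU faults on all 12 requests while the off-line rule faults 7 times, a ratio of 12/7 on this sequence.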

Conference on Learning Theory | 2007

Stability of k-means clustering

Shai Ben-David; Dávid Pál; Hans Ulrich Simon

We consider the stability of k-means clustering problems. Clustering stability is a common heuristic used to determine the number of clusters in a wide variety of clustering applications. We continue the theoretical analysis of clustering stability by establishing a complete characterization of clustering stability in terms of the number of optimal solutions to the clustering optimization problem. Our results complement earlier work of Ben-David, von Luxburg, and Pál by settling the main problem left open there. Our analysis shows that, for probability distributions with finite support, the stability of k-means clusterings depends solely on the number of optimal solutions to the underlying optimization problem for the data distribution. These results challenge the common belief and practice that view stability as an indicator of the validity, or meaningfulness, of the choice of a clustering algorithm and number of clusters.
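The characterization is stated in terms of the number of optimal solutions. For a distribution with finite support, that count can be computed on toy inputs by brute force. The sketch assumes one-dimensional points with equal weight, enumerates every assignment of points to `k` non-empty clusters (exponential in the number of points), and returns the optimal k-means cost together with the set of optimal partitions. It does not reproduce the paper's stability analysis, and the function names are hypothetical.

```python
from itertools import product
from typing import FrozenSet, List, Sequence, Set, Tuple

Partition = FrozenSet[FrozenSet[int]]  # clusters as sets of point indices


def kmeans_cost(clusters: List[List[float]]) -> float:
    """Sum of squared distances from each point to its cluster mean."""
    total = 0.0
    for c in clusters:
        mu = sum(c) / len(c)
        total += sum((x - mu) ** 2 for x in c)
    return total


def optimal_partitions(points: Sequence[float], k: int,
                       tol: float = 1e-9) -> Tuple[float, Set[Partition]]:
    """Enumerate all partitions of the points into exactly k non-empty clusters
    and return (optimal cost, set of cost-minimizing partitions)."""
    n = len(points)
    best = float("inf")
    optimal: Set[Partition] = set()
    for labels in product(range(k), repeat=n):
        if len(set(labels)) != k:
            continue  # skip assignments that leave a cluster empty
        groups = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        cost = kmeans_cost([[points[i] for i in g] for g in groups])
        partition = frozenset(frozenset(g) for g in groups)  # ignore label order
        if cost < best - tol:
            best, optimal = cost, {partition}
        elif abs(cost - best) <= tol:
            optimal.add(partition)
    return best, optimal
```

For the symmetric points `[0.0, 1.0, 2.0]` with `k = 2`, splitting off either end point costs 0.5, so there are two optimal partitions; for `[0.0, 1.0, 5.0]` the optimum `{0.0, 1.0} | {5.0}` is unique.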

Conference on Learning Theory | 2006

Learning bounds for support vector machines with learned kernels

Nathan Srebro; Shai Ben-David

Consider the problem of learning a kernel for use in SVM classification. We bound the estimation error of a large-margin classifier when the kernel, relative to which this margin is defined, is chosen from a family of kernels based on the training sample. For a kernel family with pseudodimension $d_\Phi$, we present a bound of $\sqrt{O(d_\Phi + 1/\gamma^2)/n}$ on the estimation error for SVMs with margin $\gamma$. This is the first bound in which the relation between the margin term and the family-of-kernels term is additive rather than multiplicative. The pseudodimension of families of linear combinations of base kernels is the number of base kernels. Unlike in previous (multiplicative) bounds, there is no non-negativity requirement on the coefficients of the linear combinations. We also give simple bounds on the pseudodimension for families of Gaussian kernels.
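Two pieces of the abstract can be made concrete. The family of linear combinations of base kernels, with unrestricted coefficients, has pseudodimension equal to the number of base kernels, so `d_phi = len(base)` here. The bound scales as the square root of (d_Φ + 1/γ²)/n; the helper below evaluates that expression with the constants and any logarithmic factors hidden in the O dropped, so it shows the additive shape only, not a usable numerical bound. Function names are hypothetical, and the sketch does not check that a combination is positive semidefinite.

```python
import math
from typing import Callable, List, Sequence

Kernel = Callable[[float, float], float]


def linear_combination(base: Sequence[Kernel], coeffs: Sequence[float]) -> Kernel:
    """k(x, y) = sum_i coeffs[i] * base[i](x, y); coefficients may be negative.
    The family over all coefficient vectors has pseudodimension len(base)."""
    return lambda x, y: sum(c * k(x, y) for c, k in zip(coeffs, base))


def gram(kernel: Kernel, xs: Sequence[float]) -> List[List[float]]:
    """Gram matrix of `kernel` on the sample xs."""
    return [[kernel(a, b) for b in xs] for a in xs]


def additive_bound_shape(d_phi: int, gamma: float, n: int) -> float:
    """sqrt((d_phi + 1/gamma^2) / n), with constants and log factors dropped."""
    return math.sqrt((d_phi + 1 / gamma ** 2) / n)
```

With `d_phi = 10`, margin `gamma = 0.5` and `n = 1400` samples, the expression is sqrt(14 / 1400) = 0.1: the margin contributes 1/γ² = 4 alongside the 10 from the kernel family, rather than multiplying it.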

Conference on Learning Theory | 1997

Learning distributions by their density levels: a paradigm for learning without a teacher

Shai Ben-David; Michael Lindenbaum

We propose a mathematical model for learning the high-density areas of an unknown distribution from (unlabeled) random points drawn according to this distribution. While this type of learning task has not been previously addressed in the computational learnability literature, we believe that it is a rather basic problem that appears in many practical learning scenarios. From a statistical theory standpoint, our model may be viewed as a restricted instance of the fundamental issue of inferring information about a probability distribution from the random samples it generates. From a computational learning angle, what we propose is a new framework of unsupervised concept learning. The examples provided to the learner in our model are not labeled (and are not necessarily all positive or all negative). The only information about their membership is disclosed to the student indirectly, through the sampling distribution. We investigate the basic features of the proposed model and provide lower and upper bounds on the sample complexity of such learning tasks. We prove that classes whose VC-dimension is finite are learnable in a very strong sense, while, on the other hand, ε-covering numbers of a concept class impose lower bounds on the sample size needed for learning in our models. One direction of the proof involves a reduction of density-level learnability to PAC learning with respect to fixed distributions (as well as some fundamental statistical lower bounds), while the sufficiency condition is proved through the introduction of a generic learning algorithm.
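The learning task can be illustrated with a simple plug-in estimator, which is not the paper's generic algorithm: estimate the density with a histogram and return the bins whose estimated density is at least a chosen `level`. The sketch assumes one-dimensional data on a known interval `[lo, hi)` and a fixed number of bins, so unions of bins play the role of the concept class; `density_level_set` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def density_level_set(samples: Sequence[float], lo: float, hi: float,
                      bins: int, level: float) -> List[Tuple[float, float]]:
    """Return the histogram bins [a, b) whose estimated density is >= level."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        if lo <= x < hi:
            # Clamp to guard against floating-point rounding at the upper edge.
            counts[min(int((x - lo) / width), bins - 1)] += 1
    n = len(samples)
    level_set = []
    for i, c in enumerate(counts):
        density = c / (n * width)  # fraction of samples in the bin / bin width
        if density >= level:
            level_set.append((lo + i * width, lo + (i + 1) * width))
    return level_set
```

For samples `[0.1, 0.2, 0.3, 0.4, 0.6, 1.5]` on `[0, 2)` with four bins and `level = 1.0`, only the first bin `[0.0, 0.5)` (estimated density 4/3) is returned.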
