Ohad Shamir
Weizmann Institute of Science
Publications
Featured research published by Ohad Shamir.
International Conference on Machine Learning | 2009
Ofer Dekel; Ohad Shamir
We consider a supervised machine learning scenario where labels are provided by a heterogeneous set of teachers, some of which are mediocre, incompetent, or perhaps even malicious. We present an algorithm, built on the SVM framework, that explicitly attempts to cope with low-quality and malicious teachers by decreasing their influence on the learning process. Our algorithm does not receive any prior information on the teachers, nor does it resort to repeated labeling (where each example is labeled by multiple teachers). We provide a theoretical analysis of our algorithm and demonstrate its merits empirically. Finally, we present a second algorithm with promising empirical results but without a formal analysis.
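A minimal sketch of the general idea described above, not the paper's algorithm: down-weight examples from teachers who appear unreliable. Here a first SVM is fit with equal weights, each teacher's disagreement with that fit is used as a rough noise estimate, and the model is refit with reduced per-example weights for noisy teachers (the weighting rule and thresholds are illustrative assumptions).

```python
# Minimal sketch (not the paper's algorithm): estimate each teacher's reliability
# from disagreement with an initial fit, then refit with reduced example weights.
import numpy as np
from sklearn.svm import LinearSVC

def fit_with_teacher_weights(X, y, teacher_ids):
    base = LinearSVC().fit(X, y)                     # first pass: trust every teacher equally
    disagree = (base.predict(X) != y).astype(float)  # per-example disagreement with the fit
    weights = np.ones(len(y))
    for t in np.unique(teacher_ids):
        mask = teacher_ids == t
        err = disagree[mask].mean()                  # teacher's apparent label-noise rate
        weights[mask] = max(0.05, 1.0 - 2.0 * err)   # shrink the influence of noisy teachers
    return LinearSVC().fit(X, y, sample_weight=weights)
```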
Machine Learning | 2010
Ofer Dekel; Ohad Shamir; Lin Xiao
A common assumption in supervised machine learning is that the training examples provided to the learning algorithm are statistically identical to the instances encountered later on, during the classification phase. This assumption is unrealistic in many real-world situations where machine learning techniques are used. We focus on the case where features of a binary classification problem, which were available during the training phase, are either deleted or become corrupted during the classification phase. We prepare for the worst by assuming that the subset of deleted and corrupted features is controlled by an adversary, and may vary from instance to instance. We design and analyze two novel learning algorithms that anticipate the actions of the adversary and account for them when training a classifier. Our first technique formulates the learning problem as a linear program. We discuss how the particular structure of this program can be exploited for computational efficiency and we prove statistical bounds on the risk of the resulting classifier. Our second technique addresses the robust learning problem by combining a modified version of the Perceptron algorithm with an online-to-batch conversion technique, and also comes with statistical generalization guarantees. We demonstrate the effectiveness of our approach with a set of experiments.
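A small sketch of the threat model described above, not the paper's linear-program or Perceptron-based algorithms: for a fixed linear classifier, an adversary who may delete up to k features per instance removes the features that contribute most to the correct label, and robustness is measured by the margin that survives.

```python
# Sketch of worst-case feature deletion against a fixed linear classifier w:
# the adversary zeroes the k features that help the correct label y the most.
import numpy as np

def worst_case_margin(w, x, y, k):
    contrib = y * w * x                        # per-feature contribution to the signed margin
    order = np.argsort(contrib)[::-1][:k]      # k most helpful features
    drop = order[contrib[order] > 0]           # only deletions that actually hurt the margin
    x_adv = x.copy()
    x_adv[drop] = 0.0
    return y * np.dot(w, x_adv)                # margin remaining after the deletion

w = np.array([1.0, 0.5, -0.2, 2.0])
x = np.array([0.3, 1.0, 0.8, 0.1])
print(worst_case_margin(w, x, y=1, k=1))       # margin after the single worst deletion
```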
Allerton Conference on Communication, Control, and Computing | 2014
Ohad Shamir; Nathan Srebro
We consider the problem of distributed stochastic optimization, where each of several machines has access to samples from the same source distribution, and the goal is to jointly optimize the expected objective w.r.t. the source distribution, minimizing: (1) overall runtime; (2) communication costs; (3) number of samples used. We study this problem systematically, highlighting fundamental limitations, and differences versus distributed consensus problems where each machine has a different, independent, objective. We show how the best known guarantees are obtained by an accelerated mini-batched SGD approach, and contrast the runtime and sample costs of the approach with those of other distributed optimization algorithms.
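A toy sketch of the (non-accelerated) mini-batched SGD baseline discussed above, under illustrative assumptions: each of several simulated machines draws a fresh mini-batch from the same source distribution and computes a stochastic gradient, and the averaged gradient drives one update per communication round.

```python
# Toy sketch of mini-batched distributed SGD for least squares: machines compute
# gradients on independent mini-batches; one averaged update per communication round.
import numpy as np

rng = np.random.default_rng(0)
d, machines, batch, steps, lr = 10, 4, 32, 200, 0.1
w_true = rng.normal(size=d)
w = np.zeros(d)

def sample_batch(n):                           # fresh samples from the source distribution
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    return X, y

for _ in range(steps):
    grads = []
    for _ in range(machines):                  # in a real system this loop runs in parallel
        X, y = sample_batch(batch)
        grads.append(X.T @ (X @ w - y) / batch)
    w -= lr * np.mean(grads, axis=0)           # one communication round per update

print(np.linalg.norm(w - w_true))              # small after a few hundred rounds
```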
Computer Vision and Pattern Recognition | 2013
Baoyuan Liu; Fereshteh Sadeghi; Marshall F. Tappen; Ohad Shamir; Ce Liu
Large-scale recognition problems with thousands of classes pose a particular challenge because applying the classifier requires more computation as the number of classes grows. The label tree model integrates classification with the traversal of the tree so that complexity grows logarithmically. In this paper, we show how the parameters of the label tree can be found using maximum likelihood estimation. This new probabilistic learning technique produces a label tree with significantly improved recognition accuracy.
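A small sketch of label-tree inference, not the maximum-likelihood training procedure from the paper: each internal node holds one routing vector per child and sends the example down the highest-scoring branch, so a prediction touches only O(log #classes) nodes. The tree and weights below are illustrative.

```python
# Sketch of label-tree inference: prediction follows one root-to-leaf path,
# so its cost grows with tree depth rather than with the number of classes.
import numpy as np

class Node:
    def __init__(self, children=None, label=None, weights=None):
        self.children = children or []    # child subtrees (empty for leaves)
        self.label = label                # class label stored at a leaf
        self.weights = weights            # one routing vector per child, shape (c, d)

def predict(node, x):
    while node.children:
        scores = node.weights @ x         # score each child branch
        node = node.children[int(np.argmax(scores))]
    return node.label

# Tiny hand-built tree over 4 classes in 2 dimensions (illustrative weights only).
leaves = [Node(label=c) for c in range(4)]
left = Node(children=leaves[:2], weights=np.array([[1.0, 0.0], [0.0, 1.0]]))
right = Node(children=leaves[2:], weights=np.array([[-1.0, 0.0], [0.0, -1.0]]))
root = Node(children=[left, right], weights=np.array([[1.0, 1.0], [-1.0, -1.0]]))
print(predict(root, np.array([0.5, 2.0])))    # routes left, then to class 1
```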
SIAM Journal on Computing | 2011
Shai Shalev-Shwartz; Ohad Shamir; Karthik Sridharan
We describe and analyze a new algorithm for agnostically learning kernel-based halfspaces with respect to the 0-1 loss function. Unlike most of the previous formulations, which rely on surrogate convex loss functions (e.g., hinge-loss in support vector machines (SVMs) and log-loss in logistic regression), we provide finite time/sample guarantees with respect to the more natural 0-1 loss function. The proposed algorithm can learn kernel-based halfspaces in worst-case time $\mathrm{poly}(\exp(L\log(L/\epsilon)))$, for any distribution, where $L$ is a Lipschitz constant (which can be thought of as the reciprocal of the margin), and the learned classifier is worse than the optimal halfspace by at most $\epsilon$.
Machine Learning | 2010
Ohad Shamir; Naftali Tishby
Nucleic Acids Research | 2013
Amnon Amir; Amit Zeisel; Or Zuk; Michael Elgart; Shay Stern; Ohad Shamir; Peter J. Turnbaugh; Yoav Soen; Noam Shental
IEEE Transactions on Information Theory | 2011
Nicolò Cesa-Bianchi; Shai Shalev-Shwartz; Ohad Shamir
Archive | 2012
Jonathan Rubin; Ohad Shamir; Naftali Tishby
SIAM Journal on Computing | 2017
Noga Alon; Nicolò Cesa-Bianchi; Claudio Gentile; Shie Mannor; Yishay Mansour; Ohad Shamir