Noa Avigdor-Elgrabli
Yahoo!
Publication
Featured research published by Noa Avigdor-Elgrabli.
SIAM Journal on Computing | 2012
Nir Ailon; Noa Avigdor-Elgrabli; Edo Liberty; Anke van Zuylen
In this work we study the problem of bipartite correlation clustering (BCC), a natural bipartite counterpart of the well-studied correlation clustering (CC) problem [N. Bansal, A. Blum, and S. Chawla, Machine Learning, 56 (2004), pp. 89--113], also referred to as graph editing [R. Shamir, R. Sharan, and D. Tsur, Discrete Appl. Math., 144 (2004), pp. 173--182]. Given a bipartite graph, the objective of BCC is to generate a set of vertex disjoint bicliques (clusters) that minimizes the symmetric difference to the original graph. The best-known approximation algorithm for BCC due to Amit [N. Amit, The Bicluster Graph Editing Problem, Masters Thesis, Tel Aviv University, Tel Aviv, Israel, 2004] guarantees an 11-approximation ratio. In this paper we present two algorithms. The first is a linear program based 4-approximation algorithm. Like the previous approximation algorithm, it requires solving a large convex problem, which becomes prohibitive even for modestly sized tasks. The second algorithm, and our...
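To make the BCC objective concrete, the following minimal sketch (not taken from the paper) counts the symmetric difference between an input bipartite graph, given as a set of edges, and the biclique graph induced by a proposed clustering; the edge-set representation is an assumption made for illustration.

```python
from itertools import product

def bcc_cost(edges, clusters):
    """Symmetric-difference cost of a biclique clustering.

    edges    : set of (left_vertex, right_vertex) pairs in the input graph.
    clusters : list of (left_set, right_set) pairs; they should be vertex-disjoint.
    Vertices not covered by any cluster are treated as singletons with no edges.
    """
    edges = set(edges)
    cluster_edges = set()
    for ls, rs in clusters:
        cluster_edges |= set(product(ls, rs))
    # edges deleted from the input + edges added to complete the bicliques
    return len(edges - cluster_edges) + len(cluster_edges - edges)

# Toy instance with edges (a,x), (a,y), (b,x).
edges = {("a", "x"), ("a", "y"), ("b", "x")}
print(bcc_cost(edges, [({"a", "b"}, {"x", "y"})]))             # the biclique adds (b, y)
print(bcc_cost(edges, [({"a"}, {"x", "y"}), ({"b"}, set())]))  # the edge (b, x) is deleted
```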
conference on information and knowledge management | 2016
Noa Avigdor-Elgrabli; Mark Cwalinski; Dotan Di Castro; Iftah Gamzu; Irena Grabovitch-Zuyev; Liane Lewin-Eytan; Yoelle Maarek
international colloquium on automata, languages and programming | 2015
Noa Avigdor-Elgrabli; Sungjin Im; Benjamin Moseley; Yuval Rabani
international world wide web conferences | 2015
Yael Anava; Noa Avigdor-Elgrabli; Iftah Gamzu
conference on information and knowledge management | 2018
Noa Avigdor-Elgrabli; Roei Gelbhart; Irena Grabovitch-Zuyev; Ariel Raviv
european symposium on algorithms | 2011
Nir Ailon; Noa Avigdor-Elgrabli; Edo Liberty; Anke van Zuylen
Several recent studies have presented different approaches for clustering and classifying machine-generated mail based on email headers. We propose to expand these approaches by considering email message bodies. We argue that our approach can help increase coverage and precision in several tasks, and is especially critical for mail extraction. We recall that mail extraction supports a variety of mail mining applications such as ad re-targeting, mail search, and mail summarization. We introduce new structural clustering methods that leverage the HTML structure that is common to messages generated by the same mass-sender script. We discuss how such structural clustering can be conducted at different levels of granularity, using either strict or flexible matching constraints, depending on the use case. We present large-scale experiments carried out over real Yahoo mail traffic. For our first use case of automatic mail extraction, we describe novel flexible-matching clustering methods that meet the key requirements of high intra-cluster similarity, adequate cluster size, and a relatively small overall number of clusters. We identify the precise level of flexibility that is needed in order to achieve extremely high extraction precision (close to 100%), while producing a relatively small number of clusters. For our second use case, namely mail classification, we show that strict structural matching is more adequate, achieving precision and recall rates between 85% and 90%, while converging to a stable classification after a short learning cycle. This represents an increase of 10%-20% compared to the sender-based method described in previous work, when run over the same period length. Our work has been deployed in production in the Yahoo Mail backend.
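As an illustration of the structural-clustering idea, the sketch below groups HTML message bodies by a fingerprint of their opening-tag sequence. It is only a minimal sketch: the tag-sequence fingerprint and the run-collapsing "flexible" rule are hypothetical stand-ins for the paper's strict and flexible matching constraints, not the production method.

```python
from collections import defaultdict
from html.parser import HTMLParser

class TagSequence(HTMLParser):
    """Collects the sequence of opening tags in document order."""
    def __init__(self):
        super().__init__()
        self.tags = []
    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

def fingerprint(html, flexible=False):
    parser = TagSequence()
    parser.feed(html)
    tags = parser.tags
    if flexible:
        # Collapse runs of the same tag so templates that differ only in the
        # number of repeated items (e.g. <li> rows) map to the same key.
        tags = [t for i, t in enumerate(tags) if i == 0 or t != tags[i - 1]]
    return tuple(tags)

def cluster_by_structure(messages, flexible=False):
    clusters = defaultdict(list)
    for msg_id, body in messages:
        clusters[fingerprint(body, flexible)].append(msg_id)
    return clusters

msgs = [
    ("m1", "<div><ul><li>a</li><li>b</li></ul></div>"),
    ("m2", "<div><ul><li>x</li><li>y</li><li>z</li></ul></div>"),
    ("m3", "<div><p>hello</p></div>"),
]
print(cluster_by_structure(msgs))                 # strict: m1 and m2 differ (2 vs 3 <li>)
print(cluster_by_structure(msgs, flexible=True))  # flexible: m1 and m2 share a cluster
```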
symposium on discrete algorithms | 2010
Noa Avigdor-Elgrabli; Yuval Rabani
Reordering buffer management (RBM) is an elegant theoretical model that captures the tradeoff between buffer size and switching costs for a variety of reordering/sequencing problems. In this problem, colored items arrive over time, and are placed in a buffer of size \(k\). When the buffer becomes full, an item must be removed from the buffer. A penalty cost is incurred each time the sequence of removed items switches colors. In the non-uniform cost model, there is a weight \(w_c\) associated with each color \(c\), and the cost of switching to color \(c\) is \(w_c\). The goal is to minimize the total cost of the output sequence, using the buffer to rearrange the input sequence.
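To make the cost model concrete, the small simulation below maintains a buffer of size k and, whenever the buffer is full, keeps emitting the currently open color while it remains in the buffer, otherwise switching to the cheapest buffered color and paying its weight. This greedy eviction rule is an assumption made purely for illustration; it is a naive baseline, not an algorithm from the paper.

```python
from collections import Counter

def greedy_rbm(items, weights, k):
    """Simulate reordering buffer management with a naive eviction rule.

    items   : input sequence of colors.
    weights : dict mapping color c to its switching cost w_c.
    k       : buffer size.
    Returns (output sequence, total switching cost).
    """
    buffer, output, cost = Counter(), [], 0
    current = None

    def emit_one():
        nonlocal current, cost
        if current not in buffer:
            # Must switch: pick the cheapest color currently in the buffer.
            current = min(buffer, key=lambda c: weights[c])
            cost += weights[current]
        buffer[current] -= 1
        if buffer[current] == 0:
            del buffer[current]
        output.append(current)

    for item in items:
        if sum(buffer.values()) == k:   # buffer full: some item must be removed
            emit_one()
        buffer[item] += 1
    while buffer:                        # flush the remaining buffered items
        emit_one()
    return output, cost

weights = {"r": 1, "g": 5, "b": 2}
out, cost = greedy_rbm("rgrgbbrg", weights, k=3)
print("".join(out), cost)
```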
symposium on discrete algorithms | 2013
Noa Avigdor-Elgrabli; Yuval Rabani
We study a natural generalization of the correlation clustering problem to graphs in which the pairwise relations between objects are categorical instead of binary. This problem was recently introduced by Bonchi et al. under the name of chromatic correlation clustering, and is motivated by many real-world applications in data mining and social networks, including community detection, link classification, and entity de-duplication. Our main contribution is a fast and easy-to-implement constant approximation framework for the problem, which builds on a novel reduction of the problem to that of correlation clustering. This result significantly advances the current state of knowledge for the problem, improving on a previous result that only guaranteed linear approximation in the input size. We complement the above result by developing a linear programming-based algorithm that achieves an improved approximation ratio of 4. Although this algorithm cannot be considered to be practical, it further extends our theoretical understanding of chromatic correlation clustering. We also present a fast heuristic algorithm that is motivated by real-life scenarios in which there is a ground-truth clustering that is obscured by noisy observations. We test our algorithms on both synthetic and real datasets, including social network data. Our experiments reinforce the theoretical findings by demonstrating that our algorithms generally outperform previous approaches, both in terms of solution cost and reconstruction of an underlying ground-truth clustering.
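For concreteness, here is a minimal sketch (not from the paper) of one common formulation of the chromatic objective: each edge carries a categorical label, each cluster is assigned one label, and the cost counts inter-cluster edges, missing intra-cluster edges, and intra-cluster edges whose label disagrees with the cluster's label. The data representation is an assumption made for illustration.

```python
from itertools import combinations

def chromatic_cc_cost(vertices, edges, clustering, cluster_colors):
    """Chromatic correlation clustering objective (one possible formulation).

    edges          : dict mapping frozenset({u, v}) to the edge's category label.
    clustering     : dict mapping each vertex to a cluster id.
    cluster_colors : dict mapping each cluster id to its assigned label.
    """
    cost = 0
    for u, v in combinations(vertices, 2):
        same_cluster = clustering[u] == clustering[v]
        label = edges.get(frozenset({u, v}))
        if label is None:
            cost += 1 if same_cluster else 0   # missing edge inside a cluster
        elif not same_cluster:
            cost += 1                          # edge cut between clusters
        elif label != cluster_colors[clustering[u]]:
            cost += 1                          # label disagrees with cluster color
    return cost

vertices = ["a", "b", "c", "d"]
edges = {
    frozenset({"a", "b"}): "red",
    frozenset({"b", "c"}): "red",
    frozenset({"a", "c"}): "blue",
    frozenset({"c", "d"}): "green",
}
clustering = {"a": 0, "b": 0, "c": 0, "d": 1}
print(chromatic_cc_cost(vertices, edges, clustering, {0: "red", 1: "green"}))
```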
conference on recommender systems | 2015
Michal Aharon; Oren Anava; Noa Avigdor-Elgrabli; Dana Drachsler-Cohen; Shahar Golan; Oren Somekh
With an ever-growing mailbox, it becomes essential to help the user better organize and quickly look up the content of his or her electronic life. Our work addresses this challenge by identifying related messages within a user's mailbox. We study the notion of semantic relatedness between email messages and aim to offer the user a wider context for the message he or she selects or reads. The context is represented by a small set of messages that are semantically related to the given message. We conduct experiments on a large-scale mail dataset obtained from a major Web mail service and demonstrate the effectiveness of our model on this task.
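A minimal sketch of the retrieval idea, under assumed inputs and not the system described above: represent each message body as a TF-IDF vector and return the few most cosine-similar messages as the "context" of a selected message. The use of scikit-learn and the toy message bodies are assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def related_messages(bodies, selected_idx, top_k=3):
    """Return indices of the top_k messages most similar to the selected one."""
    vectors = TfidfVectorizer(stop_words="english").fit_transform(bodies)
    sims = cosine_similarity(vectors[selected_idx], vectors).ravel()
    sims[selected_idx] = -1.0                    # do not return the message itself
    ranked = sims.argsort()[::-1][:top_k]
    return [i for i in ranked if sims[i] > 0]

bodies = [
    "Your flight to Rome is confirmed, departure 9am",
    "Hotel booking confirmation for your stay in Rome",
    "Weekly newsletter: top deals on laptops",
    "Itinerary update: your Rome flight now departs at 11am",
]
print(related_messages(bodies, selected_idx=0, top_k=2))
```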
foundations of computer science | 2013
Noa Avigdor-Elgrabli; Yuval Rabani
In this work we study the problem of Bipartite Correlation Clustering (BCC), a natural bipartite counterpart of the well-studied Correlation Clustering (CC) problem. Given a bipartite graph, the objective of BCC is to generate a set of vertex-disjoint bi-cliques (clusters) which minimizes the symmetric difference to it. The best known approximation algorithm for BCC, due to Amit (2004), guarantees an 11-approximation ratio. In this paper we present two algorithms. The first is an improved 4-approximation algorithm. However, like the previous approximation algorithm, it requires solving a large convex problem, which becomes prohibitive even for modestly sized tasks. The second algorithm, and our main contribution, is a simple randomized combinatorial algorithm. It also achieves an expected 4-approximation factor, is trivial to implement, and is highly scalable. The analysis extends a method developed by Ailon, Charikar and Newman in 2008, where a randomized pivoting algorithm was analyzed for obtaining a 3-approximation algorithm for CC. For analyzing our algorithm for BCC, considerably more sophisticated arguments are required in order to take advantage of the bipartite structure. Whether it is possible to achieve (or beat) the 4-approximation factor using a scalable and deterministic algorithm remains an open problem.
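The pivoting idea can be illustrated with a simplified sketch in the spirit of Ailon-Charikar-Newman-style pivoting, adapted naively to a bipartite input. This is not the paper's combinatorial algorithm and carries no approximation guarantee; in particular, the majority-overlap rule for joining other left vertices to the pivot's cluster is a hypothetical stand-in chosen only for illustration.

```python
import random

def naive_pivot_bcc(left, right, edges, seed=None):
    """A naive random-pivot heuristic for bipartite clustering (illustration only).

    Repeatedly pick a random uncovered left vertex as pivot, open a cluster with
    its uncovered right neighbours, and pull in any other uncovered left vertex
    whose uncovered neighbourhood has Jaccard similarity above 1/2 with the
    pivot's. Leftover right vertices become singleton clusters.
    """
    rng = random.Random(seed)
    adj = {u: set() for u in left}
    for u, v in edges:
        adj[u].add(v)

    uncovered_left, uncovered_right = set(left), set(right)
    clusters = []
    while uncovered_left:
        pivot = rng.choice(sorted(uncovered_left))
        r_side = adj[pivot] & uncovered_right
        l_side = {pivot}
        for u in sorted(uncovered_left - {pivot}):
            nbrs = adj[u] & uncovered_right
            if nbrs and len(nbrs & r_side) * 2 > len(nbrs | r_side):
                l_side.add(u)
        clusters.append((l_side, r_side))
        uncovered_left -= l_side
        uncovered_right -= r_side
    clusters.extend((set(), {v}) for v in uncovered_right)
    return clusters

edges = {("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("c", "z")}
print(naive_pivot_bcc({"a", "b", "c"}, {"x", "y", "z"}, edges, seed=0))
```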