Publication


Featured research published by Ziv Bar-Yossef.


Journal of Computer and System Sciences | 2004

An information statistics approach to data stream and communication complexity

Ziv Bar-Yossef; T. S. Jayram; Ravi Kumar; D. Sivakumar

We present a new method for proving strong lower bounds in communication complexity. This method is based on the notion of the conditional information complexity of a function, which is the minimum amount of information about the inputs that has to be revealed by a communication protocol for the function. While conditional information complexity is a lower bound on communication complexity, we show that it also admits a direct sum theorem. Direct sum decomposition reduces our task to that of proving conditional information complexity lower bounds for simple problems (such as the AND of two bits). For the latter, we develop novel techniques based on Hellinger distance and its generalizations. Our paradigm leads to two main results: (1) An improved lower bound for the multi-party set-disjointness problem in the general communication complexity model, and a nearly optimal lower bound in the one-way communication model. As a consequence, we show that for any real k > 2, approximating the kth frequency moment in the data stream model requires essentially Ω(n^(1-2/k)) space; this resolves a conjecture of Alon et al. (J. Comput. System Sci. 58(1) (1999) 137). (2) A lower bound for the Lp approximation problem in the general communication model; this solves an open problem of Saks and Sun (in: Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC), 2002, pp. 360-369). As a consequence, we show that for p > 2, approximating the Lp norm to within a factor of n^ε in the data stream model with a constant number of passes requires Ω(n^(1-4ε-2/p)) space.
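The lower-bound machinery above works with the Hellinger distance between distributions of protocol transcripts. For readers who want the basic quantity in concrete form, here is a minimal sketch of the distance for discrete distributions (an illustration only, not code or notation from the paper):

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as
    {outcome: probability} dicts."""
    support = set(p) | set(q)
    s = sum((math.sqrt(p.get(x, 0.0)) - math.sqrt(q.get(x, 0.0))) ** 2
            for x in support)
    return math.sqrt(s / 2.0)

# Two distributions over the same two outcomes; the distance lies in [0, 1].
print(hellinger({'a': 0.5, 'b': 0.5}, {'a': 0.9, 'b': 0.1}))
```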


randomization and approximation techniques in computer science | 2002

Counting Distinct Elements in a Data Stream

Ziv Bar-Yossef; T. S. Jayram; Ravi Kumar; D. Sivakumar; Luca Trevisan

We present three algorithms to count the number of distinct elements in a data stream to within a factor of 1 ± ε. Our algorithms improve upon known algorithms for this problem, and offer a spectrum of time/space tradeoffs.
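For context, a bottom-k ("k minimum values") sketch conveys the flavor of small-space distinct-element estimation; the snippet below is an illustrative sketch of that generic idea, not one of the three algorithms from the paper:

```python
import hashlib
import random

def estimate_distinct(stream, k=64):
    """Bottom-k sketch: keep the k smallest (normalized) hash values seen.
    If fewer than k distinct hashes appear, the count is exact; otherwise the
    number of distinct elements is estimated as (k - 1) / (k-th smallest hash).
    An illustrative sketch, not an algorithm from the paper."""
    smallest = []  # sorted list of the (at most k) smallest distinct hash values
    for item in stream:
        h = int(hashlib.sha1(str(item).encode()).hexdigest(), 16) / 2.0 ** 160
        if h in smallest:
            continue
        if len(smallest) < k:
            smallest.append(h)
            smallest.sort()
        elif h < smallest[-1]:
            smallest[-1] = h
            smallest.sort()
    if len(smallest) < k:
        return len(smallest)
    return int((k - 1) / smallest[-1])

# Roughly 5000 distinct values in a stream of 100,000 items.
print(estimate_distinct((random.randrange(5000) for _ in range(100000)), k=128))
```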


IEEE Transactions on Information Theory | 2011

Index Coding With Side Information

Ziv Bar-Yossef; Yitzhak Birk; T. S. Jayram; Tomer Kol

Motivated by a problem of transmitting supplemental data over broadcast channels (Birk and Kol, INFOCOM 1998), we study the following coding problem: a sender communicates with n receivers R1, ..., Rn. He holds an input x ∈ {0,1}^n and wishes to broadcast a single message so that each receiver Ri can recover the bit xi. Each Ri has prior side information about x, induced by a directed graph G on n nodes; Ri knows the bits of x in the positions {j | (i,j) is an edge of G}. G is known to the sender and to the receivers. We call encoding schemes that achieve this goal INDEX codes for {0,1}^n with side information graph G. In this paper we identify a measure on graphs, the minrank, which exactly characterizes the minimum length of linear and certain types of nonlinear INDEX codes. We show that for natural classes of side information graphs, including directed acyclic graphs, perfect graphs, odd holes, and odd anti-holes, minrank is the optimal length of arbitrary INDEX codes. For arbitrary INDEX codes and arbitrary graphs, we obtain a lower bound in terms of the size of the maximum acyclic induced subgraph. This bound holds even for randomized codes, but has been shown not to be tight.
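The minrank quantity can be brute-forced for tiny graphs, which makes the definition concrete. The sketch below (an illustration under the GF(2) definition used in the paper, not code from it) searches all matrices that "fit" the side-information graph:

```python
from itertools import product

def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        lsb = pivot & -pivot
        rows = [r ^ pivot if r & lsb else r for r in rows]
    return rank

def minrank_gf2(n, edges):
    """Brute-force minrank over GF(2) of a directed graph on nodes 0..n-1.
    A matrix A 'fits' G when A[i][i] = 1 and A[i][j] = 0 for every non-edge
    (i, j) with i != j; entries on edges are free. Exponential in |edges|,
    so toy instances only."""
    edges = list(edges)
    best = n
    for bits in product([0, 1], repeat=len(edges)):
        rows = [1 << i for i in range(n)]          # diagonal entries fixed to 1
        for (i, j), b in zip(edges, bits):
            if b:
                rows[i] |= 1 << j
        best = min(best, gf2_rank(rows))
    return best

# The bidirected 5-cycle (an odd hole) is the classic example: its minrank is 3.
pentagon = [(i, (i + 1) % 5) for i in range(5)] + [((i + 1) % 5, i) for i in range(5)]
print(minrank_gf2(5, pentagon))   # expected: 3
```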


foundations of computer science | 2006

Index Coding with Side Information

Ziv Bar-Yossef; Yitzhak Birk; T. S. Jayram; Tomer Kol

Motivated by a problem of transmitting supplemental data over broadcast channels (Birk and Kol, INFOCOM 1998), we study the following coding problem: a sender communicates with n receivers R1, ..., Rn. He holds an input x ∈ {0,1}^n and wishes to broadcast a single message so that each receiver Ri can recover the bit xi. Each Ri has prior side information about x, induced by a directed graph G on n nodes; Ri knows the bits of x in the positions {j | (i,j) is an edge of G}. G is known to the sender and to the receivers. We call encoding schemes that achieve this goal INDEX codes for {0,1}^n with side information graph G. In this paper we identify a measure on graphs, the minrank, which exactly characterizes the minimum length of linear and certain types of nonlinear INDEX codes. We show that for natural classes of side information graphs, including directed acyclic graphs, perfect graphs, odd holes, and odd anti-holes, minrank is the optimal length of arbitrary INDEX codes. For arbitrary INDEX codes and arbitrary graphs, we obtain a lower bound in terms of the size of the maximum acyclic induced subgraph. This bound holds even for randomized codes, but has been shown not to be tight.


international world wide web conferences | 2004

Sic transit gloria telae: towards an understanding of the web's decay

Ziv Bar-Yossef; Andrei Z. Broder; Ravi Kumar; Andrew Tomkins

The rapid growth of the web has been noted and tracked extensively. Recent studies have, however, documented the dual phenomenon: web pages have small half-lives, and thus the web exhibits rapid death as well. Consequently, page creators are faced with an increasingly burdensome task of keeping links up-to-date, and many are falling behind. In addition to just individual pages, collections of pages or even entire neighborhoods of the web exhibit significant decay, rendering them less effective as information resources. Such neighborhoods are identified only by frustrated searchers, seeking a way out of these stale neighborhoods, back to more up-to-date sections of the web; measuring the decay of a page purely on the basis of dead links on the page is too naive to reflect this frustration. In this paper we formalize a strong notion of a decay measure and present algorithms for computing it efficiently. We explore this measure by presenting a number of validations, and use it to identify interesting artifacts on today's web. We then describe a number of applications of such a measure to search engines, web page maintainers, ontologists, and individual users.
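To give a concrete feel for a link-based decay score (this is a toy illustration, not the paper's definition, and the `out_links` / `is_dead` callables are hypothetical stand-ins for a crawl):

```python
def decay_score(url, out_links, is_dead, damping=0.5, depth=6):
    """Toy recursive decay-style score: a dead page has decay 1; a live page's
    decay is a damped average of the decay of the pages it links to, explored
    to a fixed depth. An illustration only, not the paper's measure."""
    if is_dead(url):
        return 1.0
    if depth == 0:
        return 0.0
    links = out_links(url)
    if not links:
        return 0.0
    return damping * sum(decay_score(v, out_links, is_dead, damping, depth - 1)
                         for v in links) / len(links)

links = {'a': ['b', 'c'], 'b': ['dead1'], 'c': [], 'dead1': []}
dead = {'dead1'}
print(decay_score('a', lambda u: links.get(u, []), lambda u: u in dead))  # 0.125
```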


international world wide web conferences | 2011

Context-sensitive query auto-completion

Ziv Bar-Yossef; Naama Kraus

Query auto-completion is known to provide poor predictions of the user's query when her input prefix is very short (e.g., one or two characters). In this paper we show that context, such as the user's recent queries, can be used to improve the prediction quality considerably even for such short prefixes. We propose a context-sensitive query auto-completion algorithm, NearestCompletion, which outputs the completions of the user's input that are most similar to the context queries. To measure similarity, we represent queries and contexts as high-dimensional term-weighted vectors and resort to cosine similarity. The mapping from queries to vectors is done through a new query expansion technique that we introduce, which expands a query by traversing the query recommendation tree rooted at the query. In order to evaluate our approach, we performed extensive experimentation over the public AOL query log. We demonstrate that when the user's recent queries are relevant to the current query she is typing, then after typing a single character, NearestCompletion's MRR is 48% higher relative to the MRR of the standard MostPopularCompletion algorithm on average. When the context is irrelevant, however, NearestCompletion's MRR is essentially zero. To mitigate this problem, we propose HybridCompletion, which is a hybrid of NearestCompletion and MostPopularCompletion. HybridCompletion is shown to dominate both NearestCompletion and MostPopularCompletion, achieving a total improvement of 31.5% in MRR relative to MostPopularCompletion on average.
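A stripped-down version of the nearest-completion idea might look like the sketch below: candidates are ranked by cosine similarity to a context vector. The `vectorize` callable stands in for the paper's recommendation-tree query expansion, which is abstracted away here; this is an illustration, not the full NearestCompletion algorithm.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def nearest_completions(prefix, context_vector, completions, vectorize, k=5):
    """Rank candidate completions of `prefix` by cosine similarity to the
    context vector; `vectorize` maps a query string to a term-weighted dict."""
    candidates = [c for c in completions if c.startswith(prefix)]
    return sorted(candidates,
                  key=lambda c: cosine(vectorize(c), context_vector),
                  reverse=True)[:k]
```

A production hybrid would additionally blend this score with the completion's standalone popularity, which is the role HybridCompletion plays in the paper.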


symposium on the theory of computing | 2001

Sampling algorithms: lower bounds and applications

Ziv Bar-Yossef; Ravi Kumar; D. Sivakumar

We develop a framework to study probabilistic sampling algorithms that approximate general functions of the form f : A^n → B, where A and B are arbitrary sets. Our goal is to obtain lower bounds on the query complexity of functions, namely the number of input variables x_i that any sampling algorithm needs to query to approximate f(x_1, ..., x_n). We define two quantitative properties of functions, the block sensitivity and the minimum Hellinger distance, that give us techniques to prove lower bounds on the query complexity. These techniques are quite general, easy to use, yet powerful enough to yield tight results. Our applications include the mean and higher statistical moments, the median and other selection functions, and the frequency moments, where we obtain lower bounds that are close to the corresponding upper bounds. We also point out some connections between sampling and streaming algorithms and lossy compression schemes.
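Block sensitivity, one of the two properties named above, can be computed by brute force for tiny Boolean functions, which may help make the definition concrete (an illustrative checker, not code from the paper):

```python
from itertools import combinations

def block_sensitivity(f, x):
    """Brute-force block sensitivity of f at input x: the maximum number of
    pairwise disjoint coordinate blocks whose flipping changes f's value.
    Exponential in len(x); tiny inputs only."""
    n = len(x)
    fx = f(x)
    def flip(block):
        return tuple(1 - b if i in block else b for i, b in enumerate(x))
    sensitive = [frozenset(block)
                 for r in range(1, n + 1)
                 for block in combinations(range(n), r)
                 if f(flip(block)) != fx]
    best = 0
    def extend(used, count, start):
        nonlocal best
        best = max(best, count)
        for i in range(start, len(sensitive)):
            if not (sensitive[i] & used):
                extend(used | sensitive[i], count + 1, i + 1)
    extend(frozenset(), 0, 0)
    return best

print(block_sensitivity(lambda x: int(any(x)), (0, 0, 0)))   # OR at 000 -> 3
```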


Journal of the ACM | 2008

Random sampling from a search engine's index

Ziv Bar-Yossef; Maxim Gurevich

We revisit a problem introduced by Bharat and Broder almost a decade ago: how to sample random pages from the corpus of documents indexed by a search engine, using only the search engine's public interface? Such a primitive is particularly useful in creating objective benchmarks for search engines. The technique of Bharat and Broder suffers from a well-recorded bias: it favors long documents. In this article we introduce two novel sampling algorithms: a lexicon-based algorithm and a random walk algorithm. Our algorithms produce biased samples, but each sample is accompanied by a weight, which represents its bias. The samples, in conjunction with the weights, are then used to simulate near-uniform samples. To this end, we resort to four well-known Monte Carlo simulation methods: rejection sampling, importance sampling, the Metropolis-Hastings algorithm, and the Maximum Degree method. The limited access to search engines forces our algorithms to use bias weights that are only “approximate”. We characterize analytically the effect of approximate bias weights on Monte Carlo methods and conclude that our algorithms are guaranteed to produce near-uniform samples from the search engine's corpus. Our study of approximate Monte Carlo methods could be of independent interest. Experiments on a corpus of 2.4 million documents substantiate our analytical findings and show that our algorithms do not have significant bias towards long documents. We use our algorithms to collect comparative statistics about the corpora of the Google, MSN Search, and Yahoo! search engines.
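Of the four Monte Carlo methods listed, rejection sampling is the simplest to state. A minimal sketch, under the simplifying assumption that exact bias weights are available (the paper's contribution is precisely the analysis when they are only approximate):

```python
import random

def rejection_resample(biased_samples, weight, w_min):
    """Each sample was drawn with probability proportional to weight(x).
    Accepting x with probability w_min / weight(x), where w_min lower-bounds
    the weights, makes the accepted samples uniform over the corpus."""
    return [x for x in biased_samples if random.random() < w_min / weight(x)]
```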


mobile ad hoc networking and computing | 2006

RaWMS: random walk based lightweight membership service for wireless ad hoc networks

Ziv Bar-Yossef; Roy Friedman; Gabriel Kliot

RaWMS is a novel lightweight random membership service for ad hoc networks. The service provides each node with a partial uniformly chosen view of network nodes. Such a membership service is useful, e.g., in data dissemination algorithms, lookup and discovery services, peer sampling services, and complete membership construction. The design of RaWMS is based on a novel reverse random walk (RW) sampling technique. The paper includes a formal analysis of both the reverse RW sampling technique and RaWMS and verifies it through a detailed simulation study. In addition, RaWMS is compared with a number of other known methods such as flooding and gossip-based techniques.
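As a rough illustration of why random walks are a natural sampling primitive here (this is a Metropolis-corrected walk on a known adjacency structure, not the paper's reverse-walk protocol for ad hoc networks):

```python
import random

def metropolis_walk_sample(adj, start, steps=200):
    """Metropolis-corrected random walk on an undirected graph given as an
    adjacency dict. The degree correction makes the walk's stationary
    distribution uniform over nodes, so after enough steps the current node
    is a near-uniform sample."""
    v = start
    for _ in range(steps):
        u = random.choice(adj[v])
        # Accept the move with probability min(1, deg(v) / deg(u)).
        if random.random() < min(1.0, len(adj[v]) / len(adj[u])):
            v = u
    return v

graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
print(metropolis_walk_sample(graph, start=0))
```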


international world wide web conferences | 2006

Do not crawl in the DUST: different URLs with similar text

Uri Schonfeld; Ziv Bar-Yossef; Idit Keidar

We consider the problem of DUST: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, translates URLs to some canonical form, and dynamically generates the same page from various different URL requests. We present a novel algorithm, DustBuster, for uncovering DUST; that is, for discovering rules for transforming a given URL to others that are likely to have similar content. DustBuster is able to detect DUST effectively from previous crawl logs or web server logs, without examining page contents. Verifying these rules via sampling requires fetching only a few actual web pages. Search engines can benefit from this information to increase the effectiveness of crawling, reduce indexing overhead, and improve the quality of popularity statistics such as PageRank.
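A very rough sketch of the "likely DUST rule" idea: a substitution (a → b) is a candidate rule if applying it to many URLs in the log yields other URLs in the log. The sketch below restricts candidates to single '/'-separated path segments sharing the same surrounding context, which is far cruder than DustBuster itself:

```python
from collections import defaultdict

def likely_dust_rules(urls, min_support=3):
    """Count, for each ordered pair of path segments (a, b), the number of
    URL contexts in which both appear; pairs with enough support are candidate
    DUST rules. An illustrative sketch only."""
    envelopes = defaultdict(set)    # (prefix, suffix) -> segment values seen there
    for url in urls:
        parts = url.split('/')
        for i, seg in enumerate(parts):
            key = ('/'.join(parts[:i]), '/'.join(parts[i + 1:]))
            envelopes[key].add(seg)
    support = defaultdict(int)
    for segments in envelopes.values():
        for a in segments:
            for b in segments:
                if a != b:
                    support[(a, b)] += 1
    return sorted((r for r, c in support.items() if c >= min_support),
                  key=lambda r: -support[r])

urls = ["http://x.com/story/1", "http://x.com/news/1",
        "http://x.com/story/2", "http://x.com/news/2",
        "http://x.com/story/3", "http://x.com/news/3"]
print(likely_dust_rules(urls))   # [('story', 'news'), ('news', 'story')]
```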

Collaboration


Dive into Ziv Bar-Yossef's collaborations.

Top Co-Authors

Maxim Gurevich
Technion – Israel Institute of Technology

Robert Krauthgamer
Weizmann Institute of Science

D. Sivakumar
Vikram Sarabhai Space Centre

Avi Wigderson
Institute for Advanced Study