Maxim Gurevich

Technion – Israel Institute of Technology

Publication

Featured research published by Maxim Gurevich.

Journal of the ACM | 2008

Random sampling from a search engine's index

Ziv Bar-Yossef; Maxim Gurevich

We revisit a problem introduced by Bharat and Broder almost a decade ago: How to sample random pages from the corpus of documents indexed by a search engine, using only the search engine's public interface? Such a primitive is particularly useful in creating objective benchmarks for search engines. The technique of Bharat and Broder suffers from a well-recorded bias: it favors long documents. In this article we introduce two novel sampling algorithms: a lexicon-based algorithm and a random walk algorithm. Our algorithms produce biased samples, but each sample is accompanied by a weight, which represents its bias. The samples, in conjunction with the weights, are then used to simulate near-uniform samples. To this end, we resort to four well-known Monte Carlo simulation methods: rejection sampling, importance sampling, the Metropolis-Hastings algorithm, and the Maximum Degree method. The limited access to search engines forces our algorithms to use bias weights that are only "approximate". We characterize analytically the effect of approximate bias weights on Monte Carlo methods and conclude that our algorithms are guaranteed to produce near-uniform samples from the search engine's corpus. Our study of approximate Monte Carlo methods could be of independent interest. Experiments on a corpus of 2.4 million documents substantiate our analytical findings and show that our algorithms do not have significant bias towards long documents. We use our algorithms to collect comparative statistics about the corpora of the Google, MSN Search, and Yahoo! search engines.
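As an illustration of how weighted biased samples can be turned into near-uniform ones, here is a minimal rejection-sampling sketch. It is not the paper's algorithm: the toy corpus, the use of document length as the bias weight, and the names `rejection_sample`, `draw_biased`, and `demo` are all assumptions made for the example, and the weights here are exact rather than approximate.

```python
import random
from collections import Counter


def rejection_sample(draw_biased, weight, min_weight, rng):
    """Return one item distributed uniformly over the biased sampler's support.

    Assumes draw_biased() returns x with probability proportional to weight(x),
    and that min_weight <= weight(x) for every x.
    """
    while True:
        x = draw_biased()
        # Accept with probability min_weight / weight(x): items the sampler
        # favors are rejected more often, which cancels the bias.
        if rng.random() < min_weight / weight(x):
            return x


def demo(num_samples=20000, seed=0):
    rng = random.Random(seed)
    # Hypothetical corpus: document id -> length. The raw sampler favors long documents.
    lengths = {"a": 1, "b": 2, "c": 4, "d": 8}
    docs = list(lengths)
    doc_weights = [lengths[d] for d in docs]

    def draw_biased():
        return rng.choices(docs, weights=doc_weights, k=1)[0]

    counts = Counter(
        rejection_sample(draw_biased, lengths.__getitem__, min(doc_weights), rng)
        for _ in range(num_samples)
    )
    # Empirical frequency of each document among the accepted samples.
    return {d: counts[d] / num_samples for d in docs}
```

Calling `demo()` returns the empirical frequency of each toy document after rejection; with exact weights the four frequencies should each be close to one quarter, even though the raw sampler picks "d" eight times as often as "a".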

Computer Networks | 2009

Brahms: Byzantine resilient random membership sampling

Edward Bortnikov; Maxim Gurevich; Idit Keidar; Gabriel Kliot; Alexander Shraer

We present Brahms, an algorithm for sampling random nodes in a large dynamic system prone to malicious behavior. Brahms stores small membership views at each node, and yet overcomes Byzantine attacks by a linear portion of the system. Brahms is composed of two components. The first is an attack-resilient gossip-based membership protocol. The second component extracts independent uniformly random node samples from the stream of node ids gossiped by the first. We evaluate Brahms using rigorous analysis, backed by simulations, which show that our theoretical model captures the protocol's essentials. We study two representative attacks, and show that with high probability, an attacker cannot create a partition between correct nodes. We further prove that each node's sample converges to an independent uniform one over time. To our knowledge, no such properties were proven for gossip protocols in the past.

Very Large Data Bases | 2008

Mining search engine query logs via suggestion sampling

Ziv Bar-Yossef; Maxim Gurevich

Many search engines and other web applications suggest auto-completions as the user types in a query. The suggestions are generated from hidden underlying databases, such as query logs, directories, and lexicons. These databases consist of interesting and useful information, but they are typically not directly accessible. In this paper we describe two algorithms for sampling suggestions using only the public suggestion interface. One of the algorithms samples suggestions uniformly at random and the other samples suggestions proportionally to their popularity. These algorithms can be used to mine the hidden suggestion databases. Example applications include comparison of popularity of given keywords within a search engine's query log, estimation of the volume of commercially-oriented queries in a query log, and evaluation of the extent to which a search engine exposes its users to negative content. Our algorithms employ Monte Carlo methods in order to obtain unbiased samples from the suggestion database. Empirical analysis using a publicly available query log demonstrates that our algorithms are efficient and accurate. Results of experiments on two major suggestion services are also provided.

Principles of Distributed Computing | 2008

Brahms: Byzantine resilient random membership sampling

Edward Bortnikov; Maxim Gurevich; Idit Keidar; Gabriel Kliot; Alexander Shraer

We present Brahms, an algorithm for sampling random nodes in a large dynamic system prone to malicious behavior. Brahms stores small membership views at each node, and yet overcomes Byzantine attacks by a linear portion of the system. Brahms is composed of two components. The first one is a resilient gossip-based membership protocol. The second one uses a novel memory-efficient approach for uniform sampling from a possibly biased stream of ids that traverse the node. We evaluate Brahms using rigorous analysis, backed by simulations, which show that our theoretical model captures the protocol's essentials. We study two representative attacks, and show that with high probability, an attacker cannot create a partition between correct nodes. We further prove that each node's sample converges to a uniform one over time. To our knowledge, no such properties were proven for gossip protocols in the past.
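The abstract does not spell out how a uniform sample is drawn from a biased id stream. One known memory-efficient technique for this kind of task is min-wise sampling: keep only the id whose randomly salted hash is smallest. The sketch below illustrates that general idea under stated assumptions; the class name `MinWiseSampler`, the use of SHA-256 as a stand-in for a random hash function, and the string node ids are all hypothetical and are not taken from the paper.

```python
import hashlib
import os


class MinWiseSampler:
    """Keeps one id from a stream: the id with the smallest salted hash.

    The retained id depends only on the set of distinct ids seen, not on
    how often or in what order each id appears in the stream.
    """

    def __init__(self, salt=None):
        # A fresh random salt plays the role of choosing a random hash function.
        self.salt = salt if salt is not None else os.urandom(16)
        self.best_hash = None
        self.sample = None

    def _hash(self, node_id):
        return hashlib.sha256(self.salt + node_id.encode("utf-8")).digest()

    def next(self, node_id):
        """Feed one id from the stream; keep it if its hash is the new minimum."""
        h = self._hash(node_id)
        if self.best_hash is None or h < self.best_hash:
            self.best_hash = h
            self.sample = node_id


def sample_stream(stream, salt):
    """Run a sampler with the given salt over a stream and return the retained id."""
    sampler = MinWiseSampler(salt=salt)
    for node_id in stream:
        sampler.next(node_id)
    return sampler.sample
```

Because the minimum is taken over distinct ids, repeating an id many times in the stream (as an attacker flooding the gossip might) does not raise its chance of being retained; each sampler stores a single id and a single hash value.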

International World Wide Web Conferences | 2007

Efficient search engine measurements

Ziv Bar-Yossef; Maxim Gurevich

We address the problem of measuring global quality metrics of search engines, like corpus size, index freshness, and density of duplicates in the corpus. The recently proposed estimators for such metrics [2, 6] suffer from significant bias and/or poor performance, due to inaccurate approximation of the so-called "document degrees". We present two new estimators that are able to overcome the bias introduced by approximate degrees. Our estimators are based on a careful implementation of an approximate importance sampling procedure. Comprehensive theoretical and empirical analysis of the estimators demonstrates that they have essentially no bias even in situations where document degrees are poorly approximated. Building on an idea from [6], we discuss Rao-Blackwellization as a generic method for reducing variance in search engine estimators. We show that Rao-Blackwellizing our estimators results in significant performance improvements, while not compromising accuracy.
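To make the role of document degrees concrete, the following sketch shows plain self-normalized importance sampling on a toy corpus: documents are drawn with probability proportional to a degree, and each sample is reweighted by the inverse of that degree to estimate a corpus-wide fraction. This is only an illustration of the underlying idea with exact degrees; it does not reproduce the paper's estimators, which are designed for approximate degrees. The corpus, the `degree` function, and the "fresh" predicate are invented for the example.

```python
import random


def estimate_fraction(samples, degree, predicate):
    """Self-normalized importance sampling estimate of a corpus fraction.

    Assumes each sample x was drawn with probability proportional to degree(x),
    so weighting by 1 / degree(x) undoes the sampling bias.
    """
    numerator = sum(1.0 / degree(x) for x in samples if predicate(x))
    denominator = sum(1.0 / degree(x) for x in samples)
    return numerator / denominator


def demo(num_samples=20000, seed=1):
    rng = random.Random(seed)
    corpus = list(range(1000))  # hypothetical document ids

    def degree(doc):
        # Hypothetical degree: how strongly the sampler favors this document.
        return 1 + doc % 10

    def is_fresh(doc):
        # Hypothetical property; exactly 30% of the toy corpus satisfies it.
        return doc % 10 < 3

    samples = rng.choices(corpus, weights=[degree(d) for d in corpus], k=num_samples)
    return estimate_fraction(samples, degree, is_fresh)
```

In this toy setup the "fresh" documents are exactly the low-degree ones, so the unweighted fraction among the samples would underestimate the true 30%, while the reweighted estimate returned by `demo()` should land near it.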

Principles of Distributed Computing | 2009

Correctness of gossip-based membership under message loss

Maxim Gurevich; Idit Keidar

Due to their simplicity and effectiveness, gossip-based membership protocols have become the method of choice for maintaining partial membership in large P2P systems. A variety of gossip-based membership protocols have been proposed. Some were shown to be effective empirically, but without an analytic understanding of their properties. Others were analyzed under simplifying assumptions, such as a lossless and delay-less network. It is not clear whether the analysis results hold in dynamic networks where both nodes and network links can fail. In this paper we try to bridge this gap. We first enumerate the desirable properties of a gossip-based membership protocol, such as view uniformity, independence, and load balance. We then propose a simple Send & Forget protocol, and show that even in the presence of message loss, it achieves the desirable properties.

International World Wide Web Conferences | 2009

Estimating the ImpressionRank of web pages

Ziv Bar-Yossef; Maxim Gurevich

The ImpressionRank of a web page (or, more generally, of a web site) is the number of times users viewed the page while browsing search results. ImpressionRank captures the visibility of pages and sites in search engines and is thus an important measure, which is of interest to web site owners, competitors, market analysts, and end users. All previous approaches to estimating the ImpressionRank of a page rely on privileged access to private data sources, like the search engine's query log. In this paper we present the first external algorithm for estimating the ImpressionRank of a web page. This algorithm relies on access to three public data sources: the search engine, the query suggestion service of the search engine, and the web. In addition, the algorithm is local and uses modest resources. It can therefore be used by almost any party to estimate the ImpressionRank of any page on any search engine. En route to estimating the ImpressionRank of a page, our algorithm solves a novel variant of the keyword extraction problem: it finds the most popular search keywords that drive impressions of a page. Empirical analysis of the algorithm on the Google and Yahoo! search engines indicates that it is accurate and provides interesting insights about sites and search queries.

Journal of Functional Analysis | 2012

Subproduct systems over N×N

Maxim Gurevich

We develop the theory of subproduct systems over the monoid $\mathbb{N} \times \mathbb{N}$, and the non-self-adjoint operator algebras associated with them. These are double sequences of Hilbert spaces $\{X(m,n)\}_{m,n=0}^{\infty}$ equipped with a multiplication given by coisometries from $X(i,j) \otimes X(k,l)$ to $X(i+k, j+l)$. We find that the character space of the norm-closed algebra generated by left multiplication operators (the tensor algebra) is homeomorphic to a complex homogeneous affine algebraic variety intersected with a unit ball. Certain conditions are isolated under which subproduct systems whose tensor algebras are isomorphic must be isomorphic themselves. In the absence of these conditions, we show that two numerical invariants must agree on such subproduct systems. Additionally, we classify the subproduct systems over $\mathbb{N} \times \mathbb{N}$ by means of ideals in algebras of non-commutative polynomials.

Forum Mathematicum | 2018

On two questions concerning representations distinguished by the Galois involution

Maxim Gurevich; Jia-Jun Ma; Arnab Mitra

Let $E/F$ be a quadratic extension of non-archimedean local fields of characteristic 0. In this paper, we investigate two approaches which attempt to describe the irreducible smooth representations of $\mathrm{GL}_{n}(E)$ that are distinguished by its subgroup $\mathrm{GL}_{n}(F)$. One relates this class to representations which come as base change lifts from a quasi-split unitary group over $F$, while another deals with a certain symmetry condition. By characterizing the union of images of the base change maps, we show that these two approaches are closely related. Using this observation, we are able to prove a statement relating base change and distinction for ladder representations. We then produce a wide family of examples in which the symmetry condition does not impose $\mathrm{GL}_{n}(F)$-distinction, and thus exhibit the limitations of these two approaches.

Mathematische Zeitschrift | 2015