Jérémy Barbay | Researchain

Publication

Featured research published by Jérémy Barbay.

WEA'06: Proceedings of the 5th International Conference on Experimental Algorithms | 2006

Faster adaptive set intersections for text searching

Jérémy Barbay; Alejandro López-Ortiz; Tyler Lu

The intersection of large ordered sets is a common problem in the context of the evaluation of Boolean queries to a search engine. In this paper we engineer a better algorithm for this task, which improves over those proposed by Demaine, Munro and Lopez-Ortiz [SODA 2000/ALENEX 2001], by using a variant of interpolation search. More specifically, our contributions are threefold. First, we corroborate and complete the practical study from Demaine et al. on comparison-based intersection algorithms. Second, we show that in practice, replacing binary search and galloping (one-sided binary) search [4] by interpolation search improves the performance of each of the main intersection algorithms. Third, we introduce and test variants of interpolation search, which yield an even better intersection algorithm.
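To make the contrast between galloping and interpolation search concrete, here is a minimal sketch of both search primitives plugged into a simple two-list intersection. The function names are hypothetical, and each search is the textbook form rather than one of the engineered variants the paper introduces. Both return the first position at or after a starting finger whose value is at least the target, so the intersection loop never moves backwards.

```python
from typing import Callable, List, Sequence

def gallop_search(arr: Sequence[int], target: int, lo: int) -> int:
    """First index i >= lo with arr[i] >= target (len(arr) if none).
    Galloping (one-sided binary) search: double the step until we pass
    the target, then binary search inside the last interval."""
    n = len(arr)
    if lo >= n or arr[lo] >= target:
        return lo
    prev, step = lo, 1
    hi = lo + step
    while hi < n and arr[hi] < target:
        prev = hi
        step *= 2
        hi = lo + step
    hi = min(hi, n)
    # Invariant: arr[prev] < target, and hi == n or arr[hi] >= target.
    lo = prev + 1
    while lo < hi:
        mid = (lo + hi) // 2
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo

def interpolation_search(arr: Sequence[int], target: int, lo: int) -> int:
    """First index i >= lo with arr[i] >= target (len(arr) if none),
    guessing each probe by linear interpolation between the end values."""
    hi = len(arr) - 1
    if lo > hi or arr[hi] < target:
        return len(arr)
    while lo < hi and arr[lo] < target:
        # Here arr[lo] < target <= arr[hi], so the division is safe.
        pos = lo + (target - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        pos = min(pos, hi - 1)  # keep pos in [lo, hi-1] so the range always shrinks
        if arr[pos] < target:
            lo = pos + 1
        else:
            hi = pos
    return lo

def intersect(a: Sequence[int], b: Sequence[int],
              search: Callable[[Sequence[int], int, int], int]) -> List[int]:
    """Intersect two sorted duplicate-free lists: look up each element of the
    shorter list in the longer one, resuming from the last position found."""
    if len(a) > len(b):
        a, b = b, a
    result: List[int] = []
    pos = 0
    for x in a:
        pos = search(b, x, pos)
        if pos == len(b):
            break
        if b[pos] == x:
            result.append(x)
    return result
```

For example, `intersect([1, 3, 5, 7, 9, 11], [2, 3, 4, 7, 8, 11, 15], gallop_search)` returns `[3, 7, 11]`, and passing `interpolation_search` instead gives the same result. Interpolation search pays off when values are roughly uniformly spread over the range, which is the property the paper exploits on posting lists.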

International Symposium on Algorithms and Computation | 2010

Alphabet Partitioning for Compressed Rank/Select and Applications

Jérémy Barbay; Travis Gagie; Gonzalo Navarro; Yakov Nekrich

We present a data structure that stores a string \(s[1..n]\) over the alphabet \([1..\sigma]\) in \(nH_0(s) + o(n)(H_0(s) + 1)\) bits, where \(H_0(s)\) is the zero-order entropy of \(s\). This data structure supports the queries access and rank in time \(\mathcal{O}(\lg\lg\sigma)\), and the select query in constant time. This result improves on previously known data structures using \(nH_0(s)+o(n\lg\sigma)\) bits, where on highly compressible instances the redundancy \(o(n\lg\sigma)\) ceases to be negligible compared to the \(nH_0(s)\) bits that encode the data. The technique is based on combining previous results through an ingenious partitioning of the alphabet, and is practical enough to be implementable. It applies not only to strings, but also to several other compact data structures. For example, we achieve (i) faster search times and lower redundancy for the smallest existing full-text self-index; (ii) compressed permutations \(\pi\) with times for \(\pi()\) and \(\pi^{-1}()\) improved to log-logarithmic; and (iii) the first compressed representation of dynamic collections of disjoint sets.
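As a reference point for the bounds above, the sketch below computes the zero-order entropy \(H_0(s)\) and spells out what the three queries return, using 1-based positions as in \(s[1..n]\). These naive versions scan the string in linear time and store it uncompressed; they only define the semantics that the paper's structure answers in \(\mathcal{O}(\lg\lg\sigma)\) or constant time. The function names are illustrative.

```python
import math
from collections import Counter
from typing import Optional, Sequence

def zero_order_entropy(s: Sequence) -> float:
    """H_0(s) = sum over symbols c of (n_c / n) * lg(n / n_c), in bits per symbol."""
    n = len(s)
    if n == 0:
        return 0.0
    return sum((nc / n) * math.log2(n / nc) for nc in Counter(s).values())

# Naive reference semantics of the queries (1-based positions).
def access(s: Sequence, i: int):
    """Symbol at position i."""
    return s[i - 1]

def rank(s: Sequence, c, i: int) -> int:
    """Number of occurrences of symbol c in s[1..i]."""
    return sum(1 for x in s[:i] if x == c)

def select(s: Sequence, c, j: int) -> Optional[int]:
    """Position of the j-th occurrence of c in s, or None if there are fewer."""
    count = 0
    for pos, x in enumerate(s, start=1):
        if x == c:
            count += 1
            if count == j:
                return pos
    return None
```

For `s = "abracadabra"`, `rank(s, "a", 4)` is 2, `select(s, "a", 3)` is 6, and \(H_0(s)\) is roughly 2.04 bits per symbol, so \(nH_0(s)\) is about 22.4 bits for these 11 symbols.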

ACM Journal of Experimental Algorithms | 2009

An experimental investigation of set intersection algorithms for text searching

Jérémy Barbay; Alejandro López-Ortiz; Tyler Lu; Alejandro Salinger

The intersection of large ordered sets is a common problem in the context of the evaluation of Boolean queries to a search engine. In this article, we propose several improved algorithms for computing the intersection of sorted arrays, and in particular for searching sorted arrays in the intersection context. We perform an experimental comparison with the algorithms from the previous studies from Demaine, López-Ortiz, and Munro [ALENEX 2001] and from Baeza-Yates and Salinger [SPIRE 2005]; in addition, we implement and test the intersection algorithm from Barbay and Kenyon [SODA 2002] and its randomized variant [SAGA 2003]. We consider the random data set from Baeza-Yates and Salinger, the Google queries used by Demaine et al., a corpus provided by Google, and a larger corpus from the TREC Terabyte 2006 efficiency query stream, along with its own query log. We measure the performance both in terms of the number of comparisons and searches performed, and in terms of the CPU time on two different architectures. Our results confirm or improve the results from both previous studies in their respective contexts (comparison model on real data, and CPU measures on random data) and extend them to new contexts. In particular, we show that value-based search algorithms perform well on posting lists in terms of the number of comparisons performed.
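Since the study measures algorithms by the number of comparisons as well as CPU time, here is a sketch of the divide-and-conquer intersection usually attributed to Baeza-Yates, instrumented with a comparison counter. It follows the commonly described scheme (binary search the median of the shorter range in the longer one, split both lists there, recurse on each side); the function name is hypothetical and this is not the code used in the experiments.

```python
from typing import List, Sequence, Tuple

def baeza_yates_intersect(a: Sequence[int], b: Sequence[int]) -> Tuple[List[int], int]:
    """Intersect sorted duplicate-free lists a and b.
    Returns (intersection, number of element comparisons performed)."""
    comparisons = 0
    result: List[int] = []

    def lower_bound(arr: Sequence[int], x: int, lo: int, hi: int) -> int:
        # Binary search for the first index in [lo, hi) with arr[index] >= x.
        nonlocal comparisons
        while lo < hi:
            mid = (lo + hi) // 2
            comparisons += 1
            if arr[mid] < x:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def rec(x, x_lo, x_hi, y, y_lo, y_hi):
        nonlocal comparisons
        if x_lo >= x_hi or y_lo >= y_hi:
            return
        if x_hi - x_lo > y_hi - y_lo:  # let x be the shorter range
            x, x_lo, x_hi, y, y_lo, y_hi = y, y_lo, y_hi, x, x_lo, x_hi
        m = (x_lo + x_hi) // 2
        median = x[m]
        p = lower_bound(y, median, y_lo, y_hi)
        found = False
        if p < y_hi:
            comparisons += 1  # one equality test against y[p]
            found = y[p] == median
        rec(x, x_lo, m, y, y_lo, p)  # elements smaller than the median
        if found:
            result.append(median)     # appended in order, so the output stays sorted
        rec(x, m + 1, x_hi, y, p + 1 if found else p, y_hi)  # larger elements

    rec(a, 0, len(a), b, 0, len(b))
    return result, comparisons
```

For instance, `baeza_yates_intersect([1, 3, 5, 7, 9, 11], [2, 3, 4, 7, 8, 11, 15])[0]` is `[3, 7, 11]`. Counting comparisons this way gives a machine-independent cost, which is the first of the two measures the article reports.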

Latin American Symposium on Theoretical Informatics | 2010

Compact rich-functional binary relation representations

Jérémy Barbay; Francisco Claude; Gonzalo Navarro

Binary relations are an important abstraction arising in a number of data representation problems. Each existing data structure specializes in the few basic operations required by one single application, and takes only limited advantage of the inherent redundancy of binary relations. We show how to support more general operations efficiently, while taking better advantage of some forms of redundancy in practical instances. As a basis for a more general discussion on binary relation data structures, we list the operations of potential interest for practical applications, and give reductions between operations. We identify a set of operations that yield the support of all others. As a first contribution to the discussion, we present two data structures for binary relations, each of which achieves a distinct tradeoff between the space used to store and index the relation, the set of operations supported in sublinear time, and the time in which those operations are supported. The experimental performance of our data structures shows that they not only offer good time complexities to carry out many operations, but also take advantage of regularities that arise in practical instances to reduce space usage.
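One common way to turn binary-relation operations into string operations is to concatenate the labels of each object into a sequence S and mark where each object's labels start. "Objects related to a label" then becomes a select-style scan over S, and "which object owns this position" becomes a rank-style lookup on the boundaries. The sketch below, assuming that reduction, uses plain Python lists instead of compressed sequences with rank/select, so only the reduction itself is visible. The class and method names are hypothetical and not claimed to match either data structure in the paper.

```python
from bisect import bisect_right
from typing import Iterable, List, Tuple

class BinaryRelation:
    """Relation between objects 0..n_objects-1 and integer labels,
    stored object by object as a label sequence plus object boundaries."""

    def __init__(self, n_objects: int, pairs: Iterable[Tuple[int, int]]):
        by_obj: List[List[int]] = [[] for _ in range(n_objects)]
        for o, lab in pairs:
            by_obj[o].append(lab)
        self.labels: List[int] = []  # S: labels, concatenated object by object
        self.starts: List[int] = []  # offset in S where each object's labels begin
        for labs in by_obj:
            self.starts.append(len(self.labels))
            self.labels.extend(sorted(set(labs)))
        self.starts.append(len(self.labels))  # sentinel boundary

    def labels_of(self, o: int) -> List[int]:
        return self.labels[self.starts[o]:self.starts[o + 1]]

    def object_at(self, pos: int) -> int:
        """Object owning position pos of S (a rank query on the boundaries)."""
        return bisect_right(self.starts, pos) - 1

    def objects_with(self, label: int) -> List[int]:
        """Each occurrence of label in S (select queries), mapped to its object."""
        return [self.object_at(p) for p, lab in enumerate(self.labels) if lab == label]

    def related(self, o: int, label: int) -> bool:
        return label in self.labels_of(o)
```

With `R = BinaryRelation(3, [(0, 1), (0, 2), (2, 1), (2, 3)])`, `R.labels_of(0)` is `[1, 2]`, object 1 has no labels, and `R.objects_with(1)` is `[0, 2]`.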

ACM Transactions on Algorithms | 2008

Alternation and redundancy analysis of the intersection problem

Jérémy Barbay; Claire Kenyon

The intersection of sorted arrays problem has applications in search engines such as Google. Previous work has proposed and compared deterministic algorithms for this problem, in an adaptive analysis based on the encoding size of a certificate of the result (cost analysis). We define the alternation analysis, based on the nondeterministic complexity of an instance. In this analysis we prove that there is a deterministic algorithm asymptotically performing as well as any randomized algorithm in the comparison model. We define the redundancy analysis, based on a measure of the internal redundancy of the instance. In this analysis we prove that any algorithm optimal in the redundancy analysis is optimal in the alternation analysis, but that there is a randomized algorithm which performs strictly better than any deterministic algorithm in the comparison model. Finally, we describe how these results can be extended beyond the comparison model.
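For readers new to this line of work, the sketch below shows the general shape of an adaptive deterministic algorithm for intersecting k sorted arrays: an "eliminator" value is cycled through the arrays with galloping search until it is either found in all k arrays or replaced by a larger candidate. It is written in the spirit of the algorithms analyzed here, but the function names are hypothetical and it is not claimed to be the exact algorithm from the paper. A randomized variant would pick the next array to search at random instead of cyclically.

```python
from bisect import bisect_left
from typing import List, Sequence

def gallop(arr: Sequence[int], x: int, lo: int) -> int:
    """First index i >= lo with arr[i] >= x (len(arr) if none)."""
    n = len(arr)
    step, hi = 1, lo
    # Grow the probe distance geometrically until arr[hi] >= x or we pass the end.
    while hi < n and arr[hi] < x:
        lo = hi + 1
        hi += step
        step *= 2
    return bisect_left(arr, x, lo, min(hi, n))

def adaptive_intersection(sets: List[Sequence[int]]) -> List[int]:
    """Intersect k sorted duplicate-free lists by cycling an eliminator through them."""
    k = len(sets)
    if k == 0 or any(len(s) == 0 for s in sets):
        return []
    pos = [0] * k  # fingers: everything before pos[j] in sets[j] is < eliminator
    result: List[int] = []
    e, count, i = sets[0][0], 1, 1 % k  # count = consecutive lists known to hold e
    while True:
        s = sets[i]
        p = gallop(s, e, pos[i])
        if count == k:
            result.append(e)  # e is in every list
            p += 1            # s[p] == e here; step past it
        if p == len(s):
            return result     # no candidate left in s, so nothing more can match
        pos[i] = p
        if count < k and s[p] == e:
            count += 1        # one more list contains e
        else:
            e, count = s[p], 1  # new, larger eliminator, so far known only in s
        i = (i + 1) % k
```

For example, `adaptive_intersection([[1, 3, 5, 7, 9], [3, 4, 7, 9, 10], [2, 3, 7, 8, 9]])` returns `[3, 7, 9]`. The number of searches grows with how often the eliminator has to change, rather than with the total input size, which is the kind of instance-dependent cost that adaptive analyses measure.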

Algorithmica | 2012

Succinct Representation of Labeled Graphs

Jérémy Barbay; Luca Castelli Aleardi; Meng He; J. Ian Munro

In many applications, the properties of an object being modeled are stored as labels on vertices or edges of a graph. In this paper, we consider succinct representation of labeled graphs. Our main results are the succinct representations of labeled and multi-labeled graphs (we consider planar triangulations, planar graphs and k-page graphs) to support various label queries efficiently. The additional space cost to store the labels is essentially the information-theoretic minimum. As far as we know, our representations are the first succinct representations of labeled graphs. We also obtain two preliminary results on the way to the main contribution. First, we design a succinct representation of unlabeled planar triangulations to support the rank/select of edges in counterclockwise (ccw) order in addition to the other operations supported in previous work. Second, we design a succinct representation for a k-page graph when k is large to support various navigational operations more efficiently. In particular, we can test the adjacency of two vertices in O(lg k) time, while previous work uses O(k) time.
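To make the notion of a k-page graph concrete: a k-page embedding places the vertices in order along a line (the spine) and assigns each edge to one of k pages, so that no two edges on the same page cross. Two edges (a, b) and (c, d) with a < b and c < d cross exactly when a < c < b < d or c < a < d < b. The checker below is an illustrative sketch of that definition only, quadratic in the number of edges, and is unrelated to the succinct representation in the paper. Its names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Edge = Tuple[int, int]

def crosses(e: Edge, f: Edge) -> bool:
    """True if two edges drawn on the same page over the spine must cross."""
    a, b = sorted(e)
    c, d = sorted(f)
    return a < c < b < d or c < a < d < b

def is_valid_k_page_embedding(n: int, edges: List[Edge], page: List[int], k: int) -> bool:
    """Vertices are 0..n-1 in spine order; edges[i] is drawn on page[i] in 0..k-1."""
    if len(page) != len(edges) or any(not 0 <= p < k for p in page):
        return False
    if any(not (0 <= u < n and 0 <= v < n) for u, v in edges):
        return False
    for i, j in combinations(range(len(edges)), 2):
        if page[i] == page[j] and crosses(edges[i], edges[j]):
            return False
    return True
```

For the complete graph on four vertices in spine order 0, 1, 2, 3, the only crossing pair is (0, 2) and (1, 3), so one page is not enough, but moving (1, 3) to a second page yields a valid 2-page embedding.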

ACM Transactions on Algorithms | 2011

Succinct indexes for strings, binary relations and multilabeled trees

Jérémy Barbay; Meng He; J. Ian Munro; Srinivasa Rao Satti

We define and design succinct indexes for several abstract data types (ADTs). The concept is to design auxiliary data structures that ideally occupy asymptotically less space than the information-theoretic lower bound on the space required to encode the given data, and support an extended set of operations using the basic operators defined in the ADT. The main advantage of succinct indexes as opposed to succinct (integrated data/index) encodings is that we make assumptions only on the ADT through which the main data is accessed, rather than the way in which the data is encoded. This allows more freedom in the encoding of the main data. In this article, we present succinct indexes for various data types, namely strings, binary relations and multilabeled trees. Given the support for the interface of the ADTs of these data types, we can support various useful operations efficiently by constructing succinct indexes for them. When the operators in the ADTs are supported in constant time, our results are comparable to previous results, while allowing more flexibility in the encoding of the given data. Using our techniques, we design a succinct encoding that represents a string of length \(n\) over an alphabet of size \(\sigma\) using \(nH_k(S) + \lg\sigma \cdot o(n) + O(n\lg\sigma/\lg\lg\lg\sigma)\) bits to support access/rank/select operations in \(o((\lg\lg\sigma)^{1+\epsilon})\) time, for any fixed constant \(\epsilon > 0\). We also design a succinct text index using \(nH_0(S) + O(n\lg\sigma/\lg\lg\sigma)\) bits that supports finding all the \(occ\) occurrences of a given pattern of length \(m\) in \(O(m\lg\lg\sigma + occ\,\lg n/\lg^{\epsilon}\sigma)\) time, for any fixed constant \(0 < \epsilon < 1\). Previous results on these two problems either have a \(\lg\sigma\) factor instead of \(\lg\lg\sigma\) in the running time, or are not compressed. Finally, we present succinct encodings of binary relations and multilabeled trees that are more compact than previous structures.
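The distinction between a succinct index and an integrated encoding can be illustrated with the simplest case: a rank index over a bit vector that is reachable only through an access(i) operator. The index stores one cumulative count per block of b bits and answers rank by scanning at most b - 1 bits through the ADT, never touching the underlying encoding. This is a simplified, single-level sampling sketch under that assumption (practical succinct indexes add a second level and table lookups to reach o(n) bits and constant time); the class and parameter names are hypothetical.

```python
from typing import Callable, List

class RankIndex:
    """Auxiliary rank index for a bit vector seen only via access(i) -> 0 or 1.
    Stores one cumulative count per block of b bits, about (n / b) * lg n bits,
    while the bits themselves stay in whatever encoding the ADT uses."""

    def __init__(self, n: int, access: Callable[[int], int], b: int = 64):
        self.n, self.access, self.b = n, access, b
        self.samples: List[int] = [0]  # samples[t] = number of 1s in positions < t*b
        ones = 0
        for i in range(n):
            ones += access(i)
            if (i + 1) % b == 0:
                self.samples.append(ones)

    def rank1(self, i: int) -> int:
        """Number of 1s among positions 0..i-1, using fewer than b access calls."""
        block = i // self.b
        count = self.samples[block]
        for j in range(block * self.b, i):
            count += self.access(j)
        return count
```

Usage: `RankIndex(len(bits), lambda i: bits[i], b=3).rank1(i)` agrees with `sum(bits[:i])`. Choosing b around \(\lg^2 n\) keeps the samples within o(n) bits, at the cost of a longer scan per query; that space/time tradeoff is what multi-level constructions refine.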

Information & Computation | 2013

Compact binary relation representations with rich functionality

Jérémy Barbay; Francisco Claude; Gonzalo Navarro

Binary relations are an important abstraction arising in many data representation problems. The data structures proposed so far to represent them support just a few basic operations required to fit one particular application. We identify many of those operations arising in applications and generalize them into a wide set of desirable queries for a binary relation representation. We also identify reductions among those operations. We then introduce several novel binary relation representations, some simple and some quite sophisticated, that not only are space-efficient but also efficiently support a large subset of the desired queries.

Information Processing Letters | 2014

Maximum-Weight Planar Boxes in \(O(n^2)\) Time (and Better)

Jérémy Barbay; Timothy M. Chan; Gonzalo Navarro; Pablo Pérez-Lantero

Given a set \(P\) of \(n\) points in \(\mathbb{R}^d\), where each point \(p\) of \(P\) is associated with a weight \(w(p)\) (positive or negative), the Maximum-Weight Box problem consists of finding an axis-aligned box \(B\) maximizing \(\sum_{p \in B \cap P} w(p)\). We describe algorithms for this problem in two dimensions that run in \(O(n^2)\) time in the worst case, and much less on more specific classes of instances. In particular, these results imply similar ones for the Maximum Bichromatic Discrepancy Box problem. These improve by a factor of (log…
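A simple way to check any faster implementation of the planar problem is a brute-force reference solution: fix the x-range of the box, collapse the points inside it onto the y-axis by summing weights per y-coordinate, and solve a maximum-subarray problem (Kadane's algorithm) over y. This takes \(O(n^3)\) time and is not the algorithm of the paper. The sketch assumes that the empty box is allowed, so the answer is never negative; the function name is hypothetical.

```python
from typing import Dict, List, Tuple

def max_weight_box(points: List[Tuple[float, float, float]]) -> float:
    """Maximum of sum(w(p) for p in B) over axis-aligned boxes B, for points (x, y, w).
    Assumes the empty box is allowed (value 0). Runs in O(n^3) time."""
    xs = sorted({x for x, _, _ in points})
    ys = sorted({y for _, y, _ in points})
    y_rank = {y: r for r, y in enumerate(ys)}
    by_x: Dict[float, List[Tuple[int, float]]] = {x: [] for x in xs}
    for x, y, w in points:
        by_x[x].append((y_rank[y], w))
    best = 0.0
    for i in range(len(xs)):
        column = [0.0] * len(ys)  # total weight per y-rank inside x-range [xs[i], xs[j]]
        for j in range(i, len(xs)):
            for r, w in by_x[xs[j]]:
                column[r] += w
            # Kadane: best contiguous y-range for this x-range (empty range allowed).
            run = 0.0
            for v in column:
                run = max(0.0, run + v)
                best = max(best, run)
    return best
```

For example, with points (0, 0, 3), (1, 1, -1) and (2, 0, 3), the best box spans x from 0 to 2 but only y = 0, so it excludes the negative point and has weight 6.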

International Conference on Stochastic Algorithms: Foundations and Applications | 2003

Optimality of Randomized Algorithms for the Intersection Problem

Jérémy Barbay

The “Intersection of sorted arrays” problem has applications in indexed search engines such as Google. Previous works propose and compare deterministic algorithms for this problem, and offer lower bounds on the randomized complexity in different models (cost model, alternation model).
