Is this you? Create Your Porfile

Fabiano C. Botelho

Universidade Federal de Minas Gerais

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Fabiano C. Botelho is active.

Explore More

Publication

Featured researches published by Fabiano C. Botelho.

workshop on algorithms and data structures | 2007

Simple and space-efficient minimal perfect hash functions

Fabiano C. Botelho; Rasmus Pagh; Nivio Ziviani

A perfect hash function (PHF) h : U → [0,m - 1] for a key set S is a function that maps the keys of S to unique values. The minimum amount of space to represent a PHF for a given set S is known to be approximately 1.44n2/m bits, where n = |S|. In this paper we present new algorithms for construction and evaluation of PHFs of a given set (for m = n and m = 1.23n), with the following properties: Evaluation of a PHF requires constant time.

Lecture Notes in Computer Science | 2005

A practical minimal perfect hashing method

Fabiano C. Botelho; Yoshiharu Kohayakawa; Nivio Ziviani

We propose a novel algorithm based on random graphs to construct minimal perfect hash functions h. For a set of n keys, our algorithm outputs h in expected time O(n). The evaluation of h(x) requires two memory accesses for any key x and the description of h takes up 1.15n words. This improves the space requirement to 55% of a previous minimal perfect hashing scheme due to Czech, Havas and Majewski. A simple heuristic further reduces the space requirement to 0.93n words, at the expense of a slightly worse constant in the time complexity. Large scale experimental results are presented.

Information Systems | 2013

Practical perfect hashing in nearly optimal space

Fabiano C. Botelho; Rasmus Pagh; Nivio Ziviani

A hash function is a mapping from a key universe U to a range of integers, i.e., h:U@?{0,1,...,m-1}, where m is the ranges size. A perfect hash function for some set S@?U is a hash function that is one-to-one on S, where m>=|S|. A minimal perfect hash function for some set S@?U is a perfect hash function with a range of minimum size, i.e., m=|S|. This paper presents a construction for (minimal) perfect hash functions that combines theoretical analysis, practical performance, expected linear construction time and nearly optimal space consumption for the data structure. For n keys and m=n the space consumption ranges from 2.62n+o(n) to 3.3n+o(n) bits, and for m=1.23n it ranges from 1.95n+o(n) to 2.7n+o(n) bits. This is within a small constant factor from the theoretical lower bounds of 1.44n bits for m=n and 0.89n bits for m=1.23n. We combine several theoretical results into a practical solution that has turned perfect hashing into a very compact data structure to solve the membership problem when the key set S is static and known in advance. By taking into account the memory hierarchy we can construct (minimal) perfect hash functions for over a billion keys in 46min using a commodity PC. An open source implementation of the algorithms is available at http://cmph.sf.net under the GNU Lesser General Public License (LGPL).

Information Sciences | 2011

Minimal perfect hashing: A competitive method for indexing internal memory

Fabiano C. Botelho; Anisio Lacerda; Guilherme Vale Menezes; Nivio Ziviani

A perfect hash function (PHF) is an injective function that maps keys from a set S to unique values. Since no collisions occur, each key can be retrieved from a hash table with a single probe. A minimal perfect hash function (MPHF) is a PHF with the smallest possible range, that is, the hash table size is exactly the number of keys in S. MPHFs are widely used for memory efficient storage and fast retrieval of items from static sets. Differently from other hashing schemes, MPHFs completely avoid the problem of wasted space and wasted time to deal with collisions. Until recently, the amount of space to store an MPHF description for practical implementations found in the literature was O(logn) bits per key and therefore similar to the overhead of space of other hashing schemes. Recent results on MPHFs presented in the literature changed this scenario: an MPHF can now be described by approximately 2.6 bits per key. The objective of this paper is to show that MPHFs are, after the new recent results, a good option to index internal memory when static key sets are involved and both successful and unsuccessful searches are allowed. We have shown that MPHFs provide the best tradeoff between space usage and lookup time when compared with other open addressing and chaining hash schemes such as linear hashing, quadratic hashing, double hashing, dense hashing, cuckoo hashing, sparse hashing, hopscotch hashing, chaining with move to front heuristic and exact fit. We considered lookup time for successful and unsuccessful searches in two scenarios: (i) the MPHF description fits in the CPU cache and (ii) the MPHF description does not fit entirely in the CPU cache. Considering lookup time, the minimal perfect hashing outperforms the other hashing schemes in the two scenarios and, in the first scenario, the performance is better even when the compared methods leave more than 80% of the hash table entries free. Considering space overhead (the amount of used space other than the key-value pairs), the minimal perfect hashing is within a factor of O(logn) bits lower than the other hashing schemes for both scenarios.

scalable information systems | 2008

Distributed perfect hashing for very large key sets

Fabiano C. Botelho; Daniel Galinkin; Wagner Meira; Nivio Ziviani

A perfect hash function (PHF) h: S → [0, m -- 1] for a key set S ⊆ U of size n, where m ≥ n and U is a key universe, is an injective function that maps the keys of S to unique values. A minimal perfect hash function (MPHF) is a PHF with m = n, the smallest possible range. Minimal perfect hash functions are widely used for memory efficient storage and fast retrieval of items from static sets. In this paper we present a distributed and parallel version of a simple, highly scalable and near-space optimal perfect hashing algorithm for very large key sets, recently presented in [4]. The sequential implementation of the algorithm constructs a MPHF for a set of 1.024 billion URLs of average length 64 bytes collected from the Web in approximately 50 minutes using a commodity PC. The parallel implementation proposed here presents the following performance using 14 commodity PCs: (i) it constructs a MPHF for the same set of 1.024 billion URLs in approximately 4 minutes; (ii) it constructs a MPHF for a set of 14.336 billion 16-byte random integers in approximately 50 minutes with a performance degradation of 20%; (iii) one version of the parallel algorithm distributes the description of the MPHF among the participating machines and its evaluation is done in a distributed way, faster than the centralized function.

conference on information and knowledge management | 2007