Publication


Featured research published by Susana Ladra.


String Processing and Information Retrieval | 2009

k²-Trees for Compact Web Graph Representation

Nieves R. Brisaboa; Susana Ladra; Gonzalo Navarro

This paper presents a Web graph representation based on a compact tree structure that takes advantage of large empty areas of the adjacency matrix of the graph. Our results show that our method is competitive with the best alternatives in the literature, offering a very good compression ratio (3.3–5.3 bits per link) while permitting fast navigation on the graph to obtain direct as well as reverse neighbors (2–15 microseconds per neighbor delivered). Moreover, it allows for extended functionality not usually considered in compressed graph representations.
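
To make the idea concrete, here is a minimal sketch (not the authors' implementation) that builds the two bit sequences of a k²-tree with k = 2 over a toy 4x4 adjacency matrix: blocks containing at least one link get a 1 and are subdivided, empty blocks get a 0 and are pruned. Real k²-trees add rank structures over T to answer direct- and reverse-neighbor queries; all names below are illustrative.

// Minimal k^2-tree construction (k = 2), built level by level: each block of
// the current level is split into 2x2 quadrants and each quadrant contributes
// one bit: 1 if it contains at least one edge (and is expanded further),
// 0 if it is completely empty (and is pruned). Bits of internal levels form T;
// bits of the last level (single cells) form L.
#include <cstdio>
#include <vector>

struct Block { int row, col, size; };

// True if the size x size submatrix starting at (row, col) contains any 1.
bool nonEmpty(const std::vector<std::vector<int>>& m, int row, int col, int size) {
    for (int i = 0; i < size; ++i)
        for (int j = 0; j < size; ++j)
            if (m[row + i][col + j]) return true;
    return false;
}

int main() {
    // Toy 4x4 adjacency matrix: a 1 in cell (i, j) means there is a link i -> j.
    std::vector<std::vector<int>> m = {
        {0, 1, 0, 0},
        {0, 0, 0, 0},
        {0, 0, 1, 1},
        {0, 0, 0, 1}};

    std::vector<int> T, L;                      // internal-level bits / leaf (cell) bits
    std::vector<Block> level = {{0, 0, 4}};     // start from the whole matrix
    while (!level.empty()) {
        std::vector<Block> next;
        for (const Block& b : level) {
            int half = b.size / 2;
            for (int dr = 0; dr < 2; ++dr)      // the four quadrants, in row-major order
                for (int dc = 0; dc < 2; ++dc) {
                    int r = b.row + dr * half, c = b.col + dc * half;
                    int bit = nonEmpty(m, r, c, half) ? 1 : 0;
                    if (half == 1) {
                        L.push_back(bit);       // last level: single cells of the matrix
                    } else {
                        T.push_back(bit);
                        if (bit) next.push_back({r, c, half});  // expand only non-empty blocks
                    }
                }
        }
        level = next;
    }

    printf("T: "); for (int b : T) printf("%d", b);
    printf("\nL: "); for (int b : L) printf("%d", b);
    printf("\n");   // neighbor queries would use rank on T, omitted here
    return 0;
}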


Information Processing and Management | 2013

DACs: Bringing direct access to variable-length codes

Nieves R. Brisaboa; Susana Ladra; Gonzalo Navarro

We present a new variable-length encoding scheme for sequences of integers, Directly Addressable Codes (DACs), which enables direct access to any element of the encoded sequence without the need for any sampling method. Our proposal is a kind of implicit data structure that introduces synchronism in the encoded sequence while using asymptotically no extra space. We present experiments demonstrating that the technique is not only simple, but also competitive in time and space with existing solutions in several applications, such as the representation of LCP arrays or high-order entropy-compressed sequences.
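
A minimal sketch of the DAC idea, assuming 4-bit chunks and a naive linear-scan rank (actual DACs use compact rank structures to make each level step constant time); the struct and names below are illustrative rather than the authors' code.

// Minimal Directly Addressable Codes (DAC) sketch with b-bit chunks.
// Each value is split into b-bit chunks, least significant first. Level l
// stores the l-th chunk of every value that has one, plus a bit telling
// whether that value continues in level l + 1. Access walks the levels
// using rank on those bits (computed naively here).
#include <cstdint>
#include <cstdio>
#include <vector>

struct DAC {
    int b;                                      // chunk width in bits
    std::vector<std::vector<uint32_t>> chunks;  // chunks[l][i]: i-th chunk stored at level l
    std::vector<std::vector<uint8_t>>  more;    // more[l][i]: 1 if that value continues at level l+1

    void build(const std::vector<uint32_t>& values, int bits) {
        b = bits;
        std::vector<uint32_t> cur = values;
        while (!cur.empty()) {
            std::vector<uint32_t> next;
            chunks.emplace_back();
            more.emplace_back();
            for (uint32_t v : cur) {
                chunks.back().push_back(v & ((1u << b) - 1));
                uint32_t rest = v >> b;
                more.back().push_back(rest > 0);
                if (rest > 0) next.push_back(rest);
            }
            cur = next;
        }
    }

    // Direct access to the i-th encoded value (0-based), walking down the levels.
    uint32_t access(size_t i) const {
        uint32_t value = 0;
        int shift = 0;
        size_t pos = i;
        for (size_t l = 0; l < chunks.size(); ++l) {
            value |= chunks[l][pos] << shift;
            if (!more[l][pos]) break;
            // rank: how many values up to this one continue to the next level
            size_t r = 0;
            for (size_t j = 0; j <= pos; ++j) r += more[l][j];
            pos = r - 1;                        // position of this value inside level l+1
            shift += b;
        }
        return value;
    }
};

int main() {
    DAC dac;
    dac.build({5, 130, 7, 1000, 0, 42}, 4);     // encode with 4-bit chunks
    for (size_t i = 0; i < 6; ++i) printf("%u ", dac.access(i));
    printf("\n");                               // prints: 5 130 7 1000 0 42
    return 0;
}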


String Processing and Information Retrieval | 2009

Directly Addressable Variable-Length Codes

Nieves R. Brisaboa; Susana Ladra; Gonzalo Navarro

We introduce a symbol reordering technique that implicitly synchronizes variable-length codes, such that it is possible to directly access the i-th codeword without the need for any sampling method. The technique is practical and has many applications to the representation of ordered sets, sparse bitmaps, partial sums, and compressed data structures for suffix trees, arrays, and inverted indexes, to name just a few. We show experimentally that the technique offers a competitive alternative to other data structures that handle this problem.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2008

Reorganizing compressed text

Nieves R. Brisaboa; Antonio Fariña; Susana Ladra; Gonzalo Navarro

Recent research has demonstrated beyond doubt the benefits of compressing natural language texts using word-based statistical semistatic compression. Not only does it achieve extremely competitive compression rates, but direct search on the compressed text can also be carried out faster than on the original text; indexing based on inverted lists benefits from compression as well. Such compression methods assign a variable-length codeword to each different text word. Some coding methods (Plain Huffman and Restricted Prefix Byte Codes) do not clearly mark codeword boundaries, and hence cannot be accessed at random positions nor searched with the fastest text search algorithms. Other coding methods (Tagged Huffman, End-Tagged Dense Code, or (s, c)-Dense Code) do mark codeword boundaries, achieving a self-synchronization property that enables fast search and random access, in exchange for some loss in compression effectiveness. In this paper, we show that by just performing a simple reordering of the target symbols in the compressed text (more precisely, reorganizing the bytes into a wavelet-tree-like shape) and using little additional space, searching capabilities are greatly improved without a drastic impact on compression and decompression times. With this approach, all the codes achieve synchronism and can be searched fast and accessed at arbitrary points. Moreover, the reordered compressed text becomes an implicitly indexed representation of the text, which can be searched for words in time independent of the text length. That is, we achieve not only fast sequential search time, but indexed search time, for almost no extra space cost. We experiment with three well-known word-based compression techniques with different characteristics (Plain Huffman, End-Tagged Dense Code, and Restricted Prefix Byte Codes), and show the searching capabilities achieved by reordering the compressed representation on several corpora. We show that the reordered versions are not only much more efficient than their classical counterparts, but also more efficient than explicit inverted indexes built on the collection when using the same amount of space.
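
As a rough illustration of the reorganization, the sketch below groups the bytes of a toy set of byte codewords by codeword prefix (the root keeps the first bytes in text order) and recovers the i-th word with rank queries. The toy codes and the termination rule (a codeword ends when no deeper node exists) are assumptions made for the example; the actual byte codes studied in the paper mark codeword boundaries within the bytes themselves.

// Sketch of the wavelet-tree-like byte reorganization: codewords are byte
// sequences; the first byte of every word stays at the root in text order,
// the second byte of each word goes to the node labelled by its first byte,
// the third byte to the node labelled by its first two bytes, and so on.
// The i-th word is recovered with rank queries on the byte sequences
// (naive linear rank here; real implementations use sublinear rank structures).
#include <cstdio>
#include <map>
#include <string>
#include <vector>

using Node = std::vector<unsigned char>;

int main() {
    // Toy "compressed text": one hypothetical byte codeword per word.
    std::vector<std::string> codewords = {"\x01", "\x02\x01", "\x01", "\x02\x02", "\x03"};

    // Reorganize: each byte is stored in the node keyed by the codeword prefix before it.
    std::map<std::string, Node> nodes;           // key "" is the root
    for (const std::string& cw : codewords)
        for (size_t d = 0; d < cw.size(); ++d)
            nodes[cw.substr(0, d)].push_back((unsigned char)cw[d]);

    // Decode the i-th word by following its bytes down the tree using rank.
    auto decode = [&](size_t i) {
        std::string prefix;
        size_t pos = i;
        while (true) {
            const Node& node = nodes[prefix];
            unsigned char byte = node[pos];
            prefix.push_back((char)byte);
            if (!nodes.count(prefix)) return prefix;   // no deeper node: codeword complete
            // rank(byte, pos): occurrences of 'byte' before position pos give the
            // position of this word's next byte inside the child node
            size_t r = 0;
            for (size_t j = 0; j < pos; ++j) r += (node[j] == byte);
            pos = r;
        }
    };

    for (size_t i = 0; i < codewords.size(); ++i)
        printf("word %zu decoded correctly: %s\n", i,
               decode(i) == codewords[i] ? "yes" : "no");
    return 0;
}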


Information Retrieval | 2012

Implicit indexing of natural language text by reorganizing bytecodes

Nieves R. Brisaboa; Antonio Fariña; Susana Ladra; Gonzalo Navarro

Word-based byte-oriented compression has succeeded on large natural language text databases by providing competitive compression ratios, fast random access, and direct sequential searching. We show that by just rearranging the target symbols of the compressed text into a tree-shaped structure, and using negligible additional space, we obtain a new implicitly indexed representation of the compressed text, where search times are drastically improved. The occurrences of a word can be listed directly, without any text scanning, and in general any inverted-index-like capability, such as efficient phrase searches, can be emulated without storing any inverted-list information. We experimentally show that our proposal performs much more efficiently not only than sequential searches over compressed text, but also than explicit inverted indexes and other types of indexes, when using little extra space. Our representation is especially successful when searching for single words and short phrases.


Mining and Learning with Graphs | 2010

A compact representation of graph databases

Sandra Álvarez; Nieves R. Brisaboa; Susana Ladra; Oscar Pedreira

Graph databases have emerged as an alternative data model with applications in many complex domains. Typically, the problems to be solved in such domains involve managing and mining huge graphs. The need for efficient processing in such applications has motivated the development of methods for graph compression and indexing. However, most methods aim at an efficient representation and processing of simple graphs (without attributes in nodes or edges, or multiple edges for a given pair of nodes). In this paper we present a model for compact representation of general graph databases. It represents an attractive alternative due to the compression rates it achieves and its efficient navigation operations.


Advances in Databases and Information Systems | 2012

Exploiting SIMD instructions in current processors to improve classical string algorithms

Susana Ladra; Oscar Pedreira; José Duato; Nieves R. Brisaboa

Current processors include instruction set extensions especially designed to improve the performance of media, imaging, and 3D workloads. These instructions are rarely considered when implementing practical solutions for algorithms and compressed data structures, mostly because they are not directly generated by the compiler. In this paper, we highlight their benefits and encourage their use, as they are an underused asset included in almost all general-purpose computers. As a proof of concept, we perform an experimental evaluation by straightforwardly including some of these complex instructions in basic string algorithms used for indexing and search, obtaining significant speedups. This opens an interesting new line of research: designing new algorithms and data structures that take into account the existence of these instruction sets, in order to achieve significant speedups at no extra cost.
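
As an example of the kind of instruction involved, the sketch below (an illustrative vectorized byte search, not the authors' code) uses SSE2 intrinsics to compare 16 characters per step instead of one; it assumes an x86-64 processor and a GCC- or Clang-style compiler.

// Illustrative vectorized byte search: compare 16 characters per iteration
// with SSE2, falling back to a scalar loop for the tail of the buffer.
#include <emmintrin.h>   // SSE2 intrinsics
#include <cstdio>
#include <cstring>

// Returns the index of the first occurrence of c in s[0..n-1], or n if absent.
size_t find_byte_sse2(const char* s, size_t n, char c) {
    const __m128i needle = _mm_set1_epi8(c);          // 16 copies of the target byte
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i block = _mm_loadu_si128((const __m128i*)(s + i));
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(block, needle));
        if (mask) return i + __builtin_ctz(mask);     // lowest set bit = first match
    }
    for (; i < n; ++i)                                // scalar tail
        if (s[i] == c) return i;
    return n;
}

int main() {
    const char* text = "compressed data structures benefit from SIMD";
    size_t pos = find_byte_sse2(text, strlen(text), 'S');
    printf("first 'S' at position %zu\n", pos);
    return 0;
}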


IEEE International Conference on Fuzzy Systems | 2010

Evaluation of information loss for privacy preserving data mining through comparison of fuzzy partitions

Isaac Cano; Susana Ladra; Vicenç Torra

In this paper, we focus on the problem of preserving data confidentiality when sharing data for clustering. This problem poses new challenges for novel uses of privacy-preserving data mining (PPDM) techniques. Specifically, this paper considers synthetic data generation as a way to preserve data privacy. One of the state-of-the-art synthetic data generators is the IPSO family of methods, and it has been stated that using IPSO to generate synthetic data is appropriate when the user plans to apply clustering to the data. This paper aims to establish the same property for the FCRM synthetic data generator and, at the same time, to assess the relationship between the information loss produced when generating synthetic data with FCRM and the clustering similarity between the original and synthetic data.


Combinatorial Pattern Matching | 2010

Approximate all-pairs suffix/prefix overlaps

Niko Välimäki; Susana Ladra; Veli Mäkinen

Finding approximate overlaps is the first phase of many sequence assembly methods. Given a set of r strings of total length n and an error rate e, the goal is to find, for all pairs of strings, their suffix/prefix matches (overlaps) that are within edit distance k = ⌈el⌉, where l is the length of the overlap. We propose new solutions for this problem based on backward backtracking (Lam et al., 2008) and suffix filters (Karkkainen and Na, 2008). Our techniques use nH_k + o(n log σ) + r log r bits of space, where H_k is the k-th order entropy and σ is the alphabet size. In practice, our methods are easy to parallelize and scale up to millions of DNA reads.
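
For reference, the sketch below solves the problem for a single pair of strings with a brute-force semi-global edit-distance DP, reporting every prefix length l of B that matches some suffix of A within ⌈el⌉ errors; the paper's contribution is precisely to avoid this quadratic all-pairs computation by means of backward backtracking and suffix filters over a compressed index.

// Brute-force reference for approximate suffix/prefix overlaps of one pair (A, B):
// report every prefix length l of B that matches some suffix of A within
// edit distance ceil(e * l). Only states the problem; it is not the paper's method.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <string>
#include <vector>

// Semi-global edit-distance DP: D[i][j] = cheapest alignment of B[0..j) against
// a suffix of A[0..i) (the first column is 0, so the overlap may start anywhere in A).
std::vector<int> suffixPrefixDistances(const std::string& A, const std::string& B) {
    size_t n = A.size(), m = B.size();
    std::vector<std::vector<int>> D(n + 1, std::vector<int>(m + 1));
    for (size_t i = 0; i <= n; ++i) D[i][0] = 0;
    for (size_t j = 0; j <= m; ++j) D[0][j] = (int)j;
    for (size_t i = 1; i <= n; ++i)
        for (size_t j = 1; j <= m; ++j)
            D[i][j] = std::min({D[i - 1][j - 1] + (A[i - 1] != B[j - 1]),
                                D[i - 1][j] + 1,
                                D[i][j - 1] + 1});
    return D[n];    // D[n][j]: best edit distance for B's prefix of length j
}

int main() {
    std::string A = "ACGTACGT", B = "ACGTTTTT";
    double e = 0.25;                               // allowed error rate
    std::vector<int> dist = suffixPrefixDistances(A, B);
    // A minimum overlap length would normally be enforced; omitted for brevity.
    for (size_t l = 1; l < dist.size(); ++l)
        if (dist[l] <= (int)std::ceil(e * l))
            printf("overlap of length %zu with %d errors\n", l, dist[l]);
    return 0;
}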


International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems | 2008

On the comparison of generic information loss measures and cluster-specific ones

Susana Ladra; Vicenç Torra

Masking methods are used to protect databases prior to their public release: they mask an original data file so that the new file ensures the privacy of data respondents. Information loss measures have been developed to evaluate to what extent the masked file diverges from the corresponding original file, and to what extent the same analyses on both files lead to the same results. Generic information loss measures ignore the intended data use of the file. These are the standard measures when data has to be released (e.g., published on the web) and there is no control over what kind of analyses users will perform. In this paper we study generic information loss measures and compare them with cluster-specific ones, that is, measures specifically defined for the case in which the user will apply clustering to the original data. To do so, we define such measures and then carry out an extensive comparison of the two kinds of measures. The paper shows that the generic measures can cope with the information loss related to clustering.

Collaboration


Dive into Susana Ladra's collaborations.

Top Co-Authors

Manuel Ladra, University of Santiago de Compostela
Roberto Konow, Diego Portales University
Travis Gagie, Diego Portales University
Diego Seco, University of A Coruña