José R. Paramá | Researchain

Publication

Featured research published by José R. Paramá.

Information Retrieval | 2007

Lightweight natural language text compression

Nieves R. Brisaboa; Antonio Fariña; Gonzalo Navarro; José R. Paramá

Variants of Huffman codes where words are taken as the source symbols are currently the most attractive choices to compress natural language text databases. In particular, Tagged Huffman Code by Moura et al. offers fast direct searching on the compressed text and random access capabilities, in exchange for producing around 11% larger compressed files. This work describes End-Tagged Dense Code and (s, c)-Dense Code, two new semistatic statistical methods for compressing natural language texts. These techniques permit simpler and faster encoding and obtain better compression ratios than Tagged Huffman Code, while maintaining its fast direct search and random access capabilities. We show that Dense Codes improve Tagged Huffman Code compression ratio by about 10%, reaching only 0.6% overhead over the optimal Huffman compression ratio. Being simpler, Dense Codes are generated 45% to 60% faster than Huffman codes. This makes Dense Codes a very attractive alternative to Huffman code variants for various reasons: they are simpler to program, faster to build, of almost optimal size, and as fast and easy to search as the best Huffman variants, which are not so close to the optimal size.
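The abstract does not spell out how a dense code assigns codewords. As a rough illustration only, the sketch below follows the usual description of (s, c)-Dense Code: of the 256 byte values, s act as stoppers (they end a codeword) and c = 256 - s act as continuers, and the i-th most frequent word receives the i-th codeword in order of length. The function names, and the choice of byte values 0..s-1 as stoppers, are assumptions made for this example.

```python
def sc_encode(rank, s, c):
    """Codeword for the word at 0-based frequency rank `rank`.

    Assumed layout: bytes 0..s-1 are stoppers, bytes s..s+c-1 are continuers.
    A codeword is zero or more continuers followed by exactly one stopper.
    """
    assert s + c == 256 and s > 0 and c > 0
    code = [rank % s]              # final byte: a stopper
    x = rank // s
    while x > 0:                   # preceding bytes: continuers
        x -= 1
        code.append(s + x % c)
        x //= c
    return bytes(reversed(code))


def sc_decode(code, s, c):
    """Inverse of sc_encode: recover the rank from one codeword."""
    y = 0
    for b in code[:-1]:            # continuers, most significant first
        y = y * c + (b - s) + 1
    return y * s + code[-1]


# With s = 200 and c = 56, the 200 most frequent words fit in one byte
# and the next 200 * 56 words in two bytes.
lengths = [len(sc_encode(r, 200, 56)) for r in (0, 199, 200, 11399, 11400)]
```

Choosing s trades the number of one-byte codewords against the number of two-byte ones; s = c = 128 corresponds to the End-Tagged case.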

European Conference on Information Retrieval | 2003

An efficient compression code for text databases

Nieves R. Brisaboa; Eva Lorenzo Iglesias; Gonzalo Navarro; José R. Paramá

We present a new compression format for natural language texts, allowing both exact and approximate search without decompression. This new code, called End-Tagged Dense Code, has some advantages with respect to other compression techniques with similar features such as the Tagged Huffman Code of [Moura et al., ACM TOIS 2000]. Our compression method obtains (i) better compression ratios, (ii) a simpler vocabulary representation, and (iii) a simpler and faster encoding. At the same time, it retains the most interesting features of the method based on the Tagged Huffman Code, i.e., exact search for words and phrases directly on the compressed text using any known sequential pattern matching algorithm, efficient word-based approximate and extended searches without any decoding, and efficient decompression of arbitrary portions of the text. As a side effect, our analytical results give new upper and lower bounds for the redundancy of d-ary Huffman codes.
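A minimal sketch of how an end-tagged code can support searching without decompression, assuming the byte-oriented layout usually given for End-Tagged Dense Code: the high bit is set only on the last byte of each codeword. The helper names are hypothetical, and the search relies on Python's built-in `bytes.find` rather than any particular algorithm from the paper.

```python
def etdc_encode(rank):
    """Codeword for the word at 0-based frequency rank `rank`.

    Assumed layout: the last byte is in 128..255 (tag bit set),
    every earlier byte is in 0..127.
    """
    code = [128 + rank % 128]      # tagged final byte
    x = rank // 128
    while x > 0:
        x -= 1
        code.append(x % 128)       # untagged leading bytes
        x //= 128
    return bytes(reversed(code))


def find_codeword(compressed, codeword):
    """Offsets where `codeword` occurs as a whole codeword.

    A raw byte match is accepted only if it starts at a codeword
    boundary, i.e. at offset 0 or right after a tagged byte.
    """
    hits = []
    start = compressed.find(codeword)
    while start != -1:
        if start == 0 or compressed[start - 1] >= 128:
            hits.append(start)
        start = compressed.find(codeword, start + 1)
    return hits


# Ranks 128, 0 and 16512 encode to [0,128], [128] and [0,0,128].
text = b"".join(etdc_encode(r) for r in (128, 0, 16512))
hits = find_codeword(text, etdc_encode(128))
```

In this example the bytes of the codeword for rank 128 also appear as a suffix of the codeword for rank 16512; the boundary check discards that occurrence, so only the match at offset 0 is reported.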

ACM Transactions on Information Systems | 2010

Dynamic lightweight text compression

Nieves R. Brisaboa; Antonio Fariña; Gonzalo Navarro; José R. Paramá

We address the problem of adaptive compression of natural language text, considering the case where the receiver is much less powerful than the sender, as in mobile applications. Our techniques achieve compression ratios around 32% and require very little effort from the receiver. Furthermore, the receiver is not only lighter, but it can also search the compressed text with less work than that necessary to decompress it. This is a novelty in two senses: it breaks the usual compressor/decompressor symmetry typical of adaptive schemes, and it contradicts the long-standing assumption that only semistatic codes could be searched more efficiently than the uncompressed text. Our novel compression methods are preferable in several aspects over the existing adaptive and semistatic compressors for natural language texts.

Data Compression Conference | 2008

Word-Based Statistical Compressors as Natural Language Compression Boosters

Antonio Fariña; Gonzalo Navarro; José R. Paramá

Semistatic word-based byte-oriented compression codes are known to be attractive alternatives to compress natural language texts. With compression ratios around 30%, they allow direct pattern searching on the compressed text up to 8 times faster than on its uncompressed version. In this paper we reveal that these compressors have even more benefits. We show that most of the state-of-the-art compressors such as the block-wise bzip2, those from the Ziv-Lempel family, and the predictive ppm-based ones, can benefit from compressing not the original text, but its compressed representation obtained by a word-based byte-oriented statistical compressor. In particular, our experimental results show that using Dense-Code-based compression as a preprocessing step to classical compressors like bzip2, gzip, or ppmdi, yields several important benefits. For example, the ppm family is known for achieving the best compression ratios. With a Dense coding preprocessing, ppmdi achieves even better compression ratios (the best we know of on natural language) and much faster compression/decompression than ppmdi alone. Text indexing also profits from our preprocessing step. A compressed self-index achieves much better space and time performance when preceded by a semistatic word-based compression step. We show, for example, that the AF-FMindex coupled with Tagged Huffman coding is an attractive alternative index for natural language texts.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2005

Efficiently decodable and searchable natural language adaptive compression

Nieves R. Brisaboa; Antonio Fariña; Gonzalo Navarro; José R. Paramá

We address the problem of adaptive compression of natural language text, focusing on the case where low bandwidth is available and the receiver has little processing power, as in mobile applications. Our technique achieves compression ratios around 32% and requires very little effort from the receiver. This tradeoff, not previously achieved with alternative techniques, is obtained by breaking the usual symmetry between sender and receiver dominant in statistical adaptive compression. Moreover, we show that our technique can be adapted to avoid decompression at all in cases where the receiver only wants to detect the presence of some keywords in the document. This is useful in scenarios such as selective dissemination of information, news clipping, alert systems, text categorization, and clustering. Thanks to the asymmetry we introduce, the receiver can search the compressed text much faster than the plain text. This was previously achieved only in semistatic compression scenarios.

Online Information Review | 2007

The Galician virtual library

Ángeles S. Places; Nieves R. Brisaboa; Antonio Fariña; Miguel Rodríguez Luaces; José R. Paramá; Miguel R. Penabad

Purpose – This study aims to present the digital library Galician virtual library (BVG, for “Biblioteca Virtual Galega”) in Galician. Design/methodology/approach – The paper shows the objectives pursued by the BVG, its development, putting special emphasis on the main technological challenges, and presents some data about its usage. Findings – A digital library can be used to stimulate a lesser‐used language and to promote the culture and tourism of a region. Originality/value – The paper shows how a digital library can be used to strengthen the Galician language, which is currently categorised as a “Lesser Used Language” in the European Community and to contribute to the preservation and spreading of Galician culture and literary works, either from current authors or from previous documents. It also provides a digital publishing house for new authors and opens a communication channel between current authors and their readers. Finally, it helps to connect a scattered community like the Galician, offering a c...

Web and Wireless Geographical Information Systems | 2004

A generic framework for GIS applications

Miguel Rodríguez Luaces; Nieves R. Brisaboa; José R. Paramá; José Ramon Rios Viqueira

Geographic information systems (GIS) are becoming more common due to the improved performance of computer systems. GIS applications are being developed using the three-tier software architecture traditionally used for general-purpose information systems. Even though this architecture is suitable for GIS applications, the special nature and exclusive characteristics of geographic information pose special functional requirements on the architecture in terms of conceptual and logical models, data structures, access methods, analysis techniques, or visualization procedures. In this paper, we propose a generic architecture for GIS that provides support for the special nature of geographic information and conforms with the specifications proposed by the ISO/TC 211 and the OGC. Our strategy to achieve this goal consists of two steps: (i) we analyze the special characteristics of GIS with respect to traditional information systems, and (ii) we adapt the traditional three-tier architecture for information systems to take into account the special characteristics of GIS. Finally, we have tried to apply the architecture that we propose in the development of a complete and complex real-life GIS application using commercial tools in the analysis, design and implementation. We describe this application, and we use it to describe the limitations of current commercial GIS development tools by analyzing the differences in the architecture of the resulting system with respect to our proposal.

String Processing and Information Retrieval | 2004

Simple, Fast, and Efficient Natural Language Adaptive Compression

Nieves R. Brisaboa; Antonio Fariña; Gonzalo Navarro; José R. Paramá

One of the most successful natural language compression methods is word-based Huffman. However, such a two-pass semi-static compressor is not well suited to many interesting real-time transmission scenarios. A one-pass adaptive variant of Huffman exists, but it is character-oriented and rather complex. In this paper we implement word-based adaptive Huffman compression, showing that it obtains very competitive compression ratios. Then, we show how End-Tagged Dense Code, an alternative to word-based Huffman, can be turned into a faster and much simpler adaptive compression method which obtains almost the same compression ratios.
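The one-pass idea can be sketched as follows, under simplifying assumptions: sender and receiver each keep a vocabulary ordered by frequency, every word is transmitted as its current rank, and a rank equal to the vocabulary size announces a new word sent in plain form. The class and function names are hypothetical, and ranks are left as integers here instead of being turned into dense codewords.

```python
class FrequencyRanking:
    """Vocabulary kept in non-increasing frequency order (hypothetical helper)."""

    def __init__(self):
        self.words = []   # words[i] is the word currently at rank i
        self.freq = []    # freq[i] is the count of words[i]
        self.rank = {}    # word -> current rank

    def add(self, word):
        # A new word starts at the end of the ranking with count 1.
        self.rank[word] = len(self.words)
        self.words.append(word)
        self.freq.append(1)

    def bump(self, i):
        # Swap the word with the first entry of its frequency group,
        # then increment it; the ordering stays valid after one swap.
        f = self.freq[i]
        j = i
        while j > 0 and self.freq[j - 1] == f:
            j -= 1
        wi, wj = self.words[i], self.words[j]
        self.words[i], self.words[j] = wj, wi
        self.rank[wi], self.rank[wj] = j, i
        self.freq[j] = f + 1


def adaptive_encode(tokens):
    model, out = FrequencyRanking(), []
    for w in tokens:
        i = model.rank.get(w)
        if i is None:
            out.append((len(model.words), w))   # escape rank + literal word
            model.add(w)
        else:
            out.append((i, None))
            model.bump(i)
    return out


def adaptive_decode(stream):
    model, out = FrequencyRanking(), []
    for i, literal in stream:
        if i == len(model.words):               # escape: learn the new word
            out.append(literal)
            model.add(literal)
        else:
            out.append(model.words[i])
            model.bump(i)
    return out


tokens = "a b b b a".split()
encoded = adaptive_encode(tokens)
decoded = adaptive_decode(encoded)
```

Because both sides apply the same update after every word, no vocabulary or frequency table has to be sent ahead of the text, which is what removes the need for a first pass.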

Statistical and Scientific Database Management | 2008

An Ontology-Based Index to Retrieve Documents with Geographic Information

Miguel Rodríguez Luaces; José R. Paramá; Oscar Pedreira; Diego Seco

Both Geographic Information Systems and Information Retrieval have been very active research fields in the last decades. Lately, a new research field called Geographic Information Retrieval has appeared from the intersection of these two fields. The main goal of this field is to define index structures and techniques to efficiently store and retrieve documents using both the text and the geographic references contained within the text. We present in this paper a new index structure that combines an inverted index, a spatial index, and an ontology-based structure. This structure improves the query capabilities of other proposals. In addition, we describe the architecture of a system for geographic information retrieval that uses this new index structure. This architecture defines a workflow for the extraction of the geographic references in the document.

Database and Expert Systems Applications | 2005

Improving Accessibility of Web-Based GIS Applications

Nieves R. Brisaboa; Miguel Rodríguez Luaces; José R. Paramá; David Trillo; José Ramon Rios Viqueira

A major problem of vector active map formats such as WebCGM and scalable vector graphics (SVG) is that, in order to display them in most Web browsers, either a plug-in has to be installed or an applet has to be downloaded. In this paper, a Web service is presented whose functionality enables the transformation of vector active maps from SVG to a new DHTML (HTML+JavaScript) active map representation, thereby improving the accessibility of Web-based GIS applications. This new representation, which is also part of the present work, includes a raster representation of the map and a vector representation of its geographic objects. The former is used as a background image of the map whereas the latter enables the response to user-triggered events. An R-tree spatial index structure is used to access the geographic objects affected by each event in order to execute the relevant action.
