Senén González | Researchain

Publication

Featured research published by Senén González.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012

To index or not to index: time-space trade-offs in search engines with positional ranking functions

Diego Arroyuelo; Senén González; Mauricio Marin; Mauricio Oyarzún; Torsten Suel

Positional ranking functions, widely used in Web search engines, improve result quality by exploiting the positions of the query terms within documents. However, it is well known that positional indexes demand large amounts of extra space, typically about three times the space of a basic nonpositional index. Textual data, on the other hand, is needed to produce text snippets. In this paper, we study time-space trade-offs for search engines with positional ranking functions and text snippet generation. We consider both index-based and non-index-based alternatives for positional data. We aim to answer the question of whether one should index positional data or not. We show that there is a wide range of practical time-space trade-offs. Moreover, we show that both position and textual data can be stored using about 71% of the space used by traditional positional indexes, with a minor increase in query time. This yields considerable space savings and outperforms, both in space and time, recent alternatives from the literature. We also propose several efficient compressed text representations for snippet generation, which are able to use about half of the space of current state-of-the-art alternatives with little impact on query processing time.
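As a rough illustration of the index-based versus non-index-based alternatives for positional data, the sketch below obtains term positions either from a precomputed positional index or by scanning the stored text at query time. The function names, the whitespace tokenizer, and the toy proximity measure are assumptions made for illustration; they are not taken from the paper.

```python
from collections import defaultdict

def tokenize(text):
    # Assumption: lowercase whitespace tokenization.
    return text.lower().split()

def build_positional_index(docs):
    # Index-based alternative: term -> {doc_id -> [positions]}.
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def positions_from_text(text, term):
    # Non-index-based alternative: recover positions by scanning the text.
    return [pos for pos, tok in enumerate(tokenize(text)) if tok == term]

def min_distance(pos_a, pos_b):
    # Toy proximity signal: smallest distance between two terms' positions.
    return min(abs(a - b) for a in pos_a for b in pos_b)

docs = [
    "web search engines rank documents",
    "search for fast engines on the web",
]
index = build_positional_index(docs)

# Same proximity value for document 1, computed in both ways.
indexed = min_distance(index["search"][1], index["engines"][1])
scanned = min_distance(
    positions_from_text(docs[1], "search"),
    positions_from_text(docs[1], "engines"),
)
```

The first route pays for the positions in index space, the second pays for them in scan time on the candidate documents, which is the kind of trade-off the abstract refers to.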

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2013

Document identifier reassignment and run-length-compressed inverted indexes for improved search performance

Diego Arroyuelo; Senén González; Mauricio Oyarzún; Victor Sepulveda

Text search engines are a fundamental tool nowadays. Their efficiency relies on a popular and simple data structure: the inverted index. Currently, inverted indexes can be represented very efficiently using index compression schemes. Recent investigations also study how an optimized document ordering can be used to assign document identifiers (docIDs) to the document database. This yields important improvements in index compression and query processing time. In this paper we follow this line of research, yet from a different perspective. We propose a docID reassignment method that allows one to focus on a given subset of inverted lists to improve their performance. We then use run-length encoding to compress these lists (as many consecutive 1s are generated). We show that by using this approach, not only the performance of the particular subset of inverted lists is improved, but also that of the whole inverted index. Our experimental results indicate a reduction of about 10% in the space usage of the whole index when docID reassignment is focused on such a subset. Also, decompression speed is up to 1.22 times faster if the runs must be explicitly decompressed and up to 4.58 times faster if implicit decompression of runs is allowed. Finally, we also improve the Document-at-a-Time query processing time of AND queries (by up to 12%), WAND queries (by up to 23%) and full (non-ranked) OR queries (by up to 86%).
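To make the "many consecutive 1s" remark concrete, here is a minimal sketch of how renumbering documents can turn the gaps of one chosen inverted list into a run of 1s. The `reassign` policy shown (give the focus list's documents the smallest new docIDs) is a deliberately naive assumption for illustration, not the reassignment method proposed in the paper.

```python
def to_gaps(doc_ids):
    # Sorted docIDs -> d-gaps; the first gap is measured from 0.
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps

def reassign(focus_list, num_docs):
    # Hypothetical policy: documents of the focus list get new docIDs
    # 1..len(focus_list); all other documents follow in their old order.
    in_focus = set(focus_list)
    order = list(focus_list) + [
        d for d in range(1, num_docs + 1) if d not in in_focus
    ]
    return {old: new for new, old in enumerate(order, start=1)}

def remap(doc_ids, mapping):
    # Apply the new numbering to an inverted list and re-sort it.
    return sorted(mapping[d] for d in doc_ids)

focus = [3, 7, 8, 15, 20]   # the list we want to favor
other = [1, 7, 15, 16]      # some other list in the same index
mapping = reassign(focus, num_docs=20)

gaps_before = to_gaps(focus)                  # irregular gaps
gaps_after = to_gaps(remap(focus, mapping))   # all gaps equal 1
```

After reassignment the focus list is a single run of gaps equal to 1, which run-length encoding can store as one count; other lists such as `other` are renumbered too, so their gaps change as a side effect.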

European Conference on Parallel Processing | 2008

Scheduling Intersection Queries in Term Partitioned Inverted Files

Mauricio Marin; Carlos Gómez-Pantoja; Senén González; Veronica Gil-Costa

This paper proposes and presents a comparison of scheduling algorithms applied to the context of load balancing the query traffic on distributed inverted files. We put emphasis on queries requiring intersection of posting lists, which is a very demanding case for the term partitioned inverted file and a case in which the document partitioned inverted file used by current search engines can perform very efficiently. We show that with proper scheduling of queries the term partitioned approach can outperform the document partitioned approach.
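The difference between the two partitioning schemes can be sketched by asking which nodes a query touches. The code below is an illustrative toy under stated assumptions: the node count, the `term_to_node` table, and the function names are hypothetical, and no scheduling algorithm from the paper is reproduced.

```python
def document_partitioned_nodes(query_terms, num_nodes):
    # Each node indexes its own slice of the documents, so every node
    # holds part of every posting list and all nodes are contacted.
    return set(range(num_nodes))

def term_partitioned_nodes(query_terms, term_to_node):
    # Each term's whole posting list lives on one node, so only the
    # nodes owning the query terms are contacted.
    return {term_to_node[t] for t in query_terms}

# Hypothetical placement of terms on a 4-node cluster.
term_to_node = {"web": 0, "search": 2, "engine": 2, "index": 3}
query = ["web", "search"]

doc_nodes = document_partitioned_nodes(query, num_nodes=4)
term_nodes = term_partitioned_nodes(query, term_to_node)
```

In the term-partitioned case the lists for `web` and `search` sit on different nodes, so one list has to be brought to the other before they can be intersected. Deciding where and when that work runs is the kind of choice a query scheduler has to make.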

Information Processing and Management | 2012

Distributed search based on self-indexed compressed text

Diego Arroyuelo; Veronica Gil-Costa; Senén González; Mauricio Marin; Mauricio Oyarzún

Query response times within a fraction of a second in Web search engines are feasible due to the use of indexing and caching techniques, which are devised for large text collections partitioned and replicated into a set of distributed-memory processors. This paper proposes an alternative query processing method for this setting, which is based on a combination of self-indexed compressed text and posting lists caching. We show that a text self-index (i.e., an index that compresses the text and is able to extract arbitrary parts of it) can be competitive with an inverted index if we consider the whole query process, which includes index decompression, ranking and snippet extraction time. The advantage is that within the space of the compressed document collection, one can carry out the posting lists generation, document ranking and snippet extraction. This significantly reduces the total number of processors involved in the solution of queries. Alternatively, for the same amount of hardware, the performance of the proposed strategy is better than that of the classical approach based on treating inverted indexes and corresponding documents as two separate entities in terms of processors and memory space.

International Conference on Conceptual Structures | 2010

A vector model for routing queries in web search engines

Mauricio Oyarzún; Senén González; Marcelo Mendoza; Flavio Ferrarotti; Max Chacón; Mauricio Marin

This paper proposes a method for reducing the number of search nodes involved in the solution of queries arriving at a Web search engine. The method is applied by the query receptionist machine during sudden peaks in query traffic to reduce the load on the search nodes. The experimental evaluation, based on actual traces from users of a major search engine, shows that the proposed method outperforms alternative strategies. This is more evident for systems composed of a large number of search nodes, which indicates that the method is also more scalable than the alternative strategies.

Workshop on Logic, Language, Information and Computation | 2017

On Fragments of Higher Order Logics that on Finite Structures Collapse to Second Order

Flavio Ferrarotti; Senén González; José María Turull-Torres

We define new fragments of higher-order logics of order three and above, and investigate their expressive power over finite models. The key unifying property of these fragments is that they all admit inexpensive algorithmic translations of their formulae to equivalent second-order logic formulae. That is, within these fragments we can make use of third- and higher-order quantification without paying the extremely high complexity price associated with them. Although theoretical in nature, the results reported here are more significant from a practical perspective. It turns out that there are many examples of properties of finite models (queries from the perspective of relational databases) which can be simply and elegantly defined by formulae of the higher-order fragments studied in this work. For many of those properties, the equivalent second-order formulae can be very complicated and unintuitive, in particular when they concern properties of complex objects, such as hyper-graphs, and the equivalent second-order expressions require the encoding of those objects into plain relations.

Advances in Databases and Information Systems | 2018

Efficient SPARQL Evaluation on Stratified RDF Data with Meta-data

Flavio Ferrarotti; Senén González; Klaus-Dieter Schewe

The Resource Description Framework (RDF) is a simple, but frequently used W3C standard, which uses triplets to define relationships between resources. In this paper the evaluation of queries in the query language SPARQL on RDF data with meta-data is investigated. We first show that if the data are stratified, i.e. a particular partial order can be defined on the meta-data labels, then a nesting procedure can be applied, which induces a rewriting of the query. Based on a specification by an Abstract State Machine we show that the result of the rewritten query equals the one that would have resulted from the evaluation of the original query. We further investigate the reduction of complexity by using data and query nesting.

International Conference on Abstract State Machines, Alloy, B, TLA, VDM, and Z | 2018

Systematic Refinement of Abstract State Machines with Higher-Order Logic

Flavio Ferrarotti; Senén González; Klaus-Dieter Schewe; José María Turull-Torres

Graph algorithms that involve complex conditions on subgraphs can be specified much more easily if the specification allows expressions in higher-order logic to be used. In this paper an extension of Abstract State Machines by such expressions is introduced and its usefulness is demonstrated by examples of computations on graphs, such as graph factoring and checking self-similarity. In a naive way these high-level specifications can be refined using submachines for the evaluation of the higher-order expressions. We show that refinements can be obtained in an automatic way for well-defined fragments of higher-order logic that collapse to second-order, by means of which the naive refinement is only necessary for second-order logic expressions.

Information Processing and Management | 2018

Hybrid compression of inverted lists for reordered document collections

Diego Arroyuelo; Mauricio Oyarzún; Senén González; Victor Sepulveda

Text search engines are a fundamental tool nowadays. Their efficiency relies on a popular and simple data structure: inverted indexes. They store an inverted list per term of the vocabulary. The inverted list of a given term stores, among other things, the document identifiers (docIDs) of the documents that contain the term. Currently, inverted indexes can be stored efficiently using integer compression schemes. Previous research also studied how an optimized document ordering can be used to assign docIDs to the document database. This yields important improvements in index compression and query processing time. In this paper we show that using a hybrid compression approach on the inverted lists is more effective in this scenario, with two main contributions:
• First, we introduce a document reordering approach that aims at generating runs of consecutive docIDs in a properly-selected subset of inverted lists of the index.
• Second, we introduce hybrid compression approaches that combine gap and run-length encodings within inverted lists, in order to take advantage not only of small gaps, but also of long runs of consecutive docIDs generated by our document reordering approach.
Our experimental results indicate a reduction of about 10%–30% in the space usage of the whole index (just regarding docIDs), compared with the most efficient state-of-the-art results. Also, decompression speed is up to 1.22 times faster if the runs of consecutive docIDs must be explicitly decompressed, and up to 4.58 times faster if implicit decompression of these runs is allowed (e.g., representing the runs as intervals in the output). Finally, we also improve the query processing time of AND queries (by up to 12%), WAND queries (by up to 23%), and full (non-ranked) OR queries (by up to 86%), outperforming the best existing approaches.
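The idea of mixing gap and run-length encodings, and of decoding runs either explicitly or as intervals, can be sketched as follows. This is a simplified illustration: the token layout, the `min_run` threshold, and all function names are assumptions, and real schemes would pack the tokens into compressed integer codes rather than Python tuples.

```python
def to_gaps(doc_ids):
    # Sorted docIDs -> d-gaps; the first gap is measured from 0.
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps

def hybrid_encode(doc_ids, min_run=3):
    # ("gap", g): skip ahead by g and emit one docID.
    # ("run", n): emit the next n consecutive docIDs (n gaps equal to 1).
    gaps = to_gaps(doc_ids)
    tokens, i = [], 0
    while i < len(gaps):
        j = i
        while j < len(gaps) and gaps[j] == 1:
            j += 1
        if j - i >= min_run:
            tokens.append(("run", j - i))
            i = j
        else:
            tokens.append(("gap", gaps[i]))
            i += 1
    return tokens

def decode_explicit(tokens):
    # Explicit decompression: every docID is materialized.
    out, cur = [], 0
    for kind, value in tokens:
        if kind == "gap":
            cur += value
            out.append(cur)
        else:
            out.extend(range(cur + 1, cur + value + 1))
            cur += value
    return out

def decode_intervals(tokens):
    # Implicit decompression: each run is reported as one (first, last) pair.
    out, cur = [], 0
    for kind, value in tokens:
        if kind == "gap":
            cur += value
            out.append((cur, cur))
        else:
            out.append((cur + 1, cur + value))
            cur += value
    return out

doc_ids = [5, 6, 7, 8, 9, 20, 21, 40]
tokens = hybrid_encode(doc_ids)
```

With interval output, the run 6..9 costs one step to decode regardless of its length, which gives an intuition for why implicit decompression can be faster than materializing every docID.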

String Processing and Information Retrieval | 2010