Mauricio Marin | Researchain

Publication

Featured research published by Mauricio Marin.

Latin American Web Congress | 2004

Scheduling algorithms for Web crawling

Carlos Castillo; Mauricio Marin; M. Andrea Rodríguez; Ricardo A. Baeza-Yates

This paper presents a comparative study of strategies for Web crawling. We show that a combination of breadth-first ordering with the largest sites first is a practical alternative since it is fast, simple to implement, and able to retrieve the best ranked pages at a rate that is closer to the optimal than other alternatives. Our study was performed on a large sample of the Chilean Web, which was crawled using simulators, so that all strategies were compared under the same conditions, and using actual crawls to validate our conclusions. We also explored the effects of large-scale parallelism in the page retrieval task and of multiple-page requests in a single connection for effective amortization of latency times.
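
The ordering named in the abstract can be illustrated with a small sketch. It is not the authors' crawler: the function name `largest_sites_first` is hypothetical, the input is a fixed list of already discovered URLs (no new links are found while downloading), and "largest site" is taken to mean the site with the most pending pages.

```python
from collections import deque
from urllib.parse import urlparse


def largest_sites_first(urls):
    """Return a download order that always serves the site with the most pending pages.

    Hypothetical helper for illustration; within a site, pages keep their
    discovery (breadth-first) order.
    """
    # Group pending URLs by site, preserving discovery order inside each site.
    pending = {}
    for url in urls:
        pending.setdefault(urlparse(url).netloc, deque()).append(url)

    order = []
    while pending:
        # Pick the site with the largest pending queue; ties go to the first-seen site.
        site = max(pending, key=lambda s: len(pending[s]))
        order.append(pending[site].popleft())
        if not pending[site]:
            del pending[site]
    return order
```

A real crawler would also respect politeness delays per site and add newly discovered links to the queues; both are omitted here to keep the ordering rule visible.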

International Conference on Parallel Processing | 2011

kNN query processing in metric spaces using GPUs

Ricardo J. Barrientos; José Ignacio Gómez; Christian Tenllado; Manuel Prieto Matias; Mauricio Marin

Information retrieval from large databases is becoming crucial for many applications in different fields such as content searching in multimedia objects, text retrieval or computational biology. These databases are usually indexed off-line to enable an acceleration of on-line searches. Furthermore, the available parallelism has been exploited using clusters to improve query throughput. Recently some authors have proposed the use of Graphics Processing Units (GPUs) to accelerate brute-force searching algorithms for metric-space databases. In this work we improve existing GPU brute-force implementations and explore the viability of GPUs to accelerate indexing techniques. This exploration includes an interesting discussion about the performance of both brute-force and indexing-based algorithms that takes into account the intrinsic dimensionality of the elements of the database.
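
For readers unfamiliar with the baseline, brute-force kNN search in a metric space can be sketched on the CPU as follows. This is only an illustration of the sequential idea, not the GPU implementation from the paper; the function name `knn_brute_force` and the choice of Euclidean distance as the default metric are assumptions.

```python
import heapq
import math


def knn_brute_force(database, query, k, distance=math.dist):
    """Return the k database objects closest to `query` under `distance`.

    Every object is compared against the query, which is what makes the
    search "brute force". Any function satisfying the metric axioms can be
    passed as `distance`; Euclidean distance is only the default here.
    """
    return heapq.nsmallest(k, database, key=lambda obj: distance(obj, query))
```

Each distance evaluation is independent of the others, which is why this kind of search maps naturally onto massively parallel hardware.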

High Performance Distributed Computing | 2010

New caching techniques for web search engines

Mauricio Marin; Veronica Gil-Costa; Carlos Gómez-Pantoja

This paper proposes a cache hierarchy that enables Web search engines to efficiently process user queries. The different caches in the hierarchy are used to store pieces of data which are useful for solving frequent queries. Cached items range from specific data such as query answers to generic data such as segments of index retrieved from secondary memory. The paper also presents a comparative study based on discrete-event simulation and bulk-synchronous parallelism. The studied performance metrics include overall query throughput, single-user query latency and power consumption. In all cases, the results show that the proposed cache hierarchy leads to better performance than a baseline approach built on state-of-the-art caching techniques.
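
A minimal sketch of the general idea of layered caches follows: a cache of query answers in front of a cache of index segments (posting lists), in front of secondary memory. It is a two-level simplification with LRU eviction, not the hierarchy proposed in the paper; `LRUCache`, `answer_query` and the `read_postings_from_disk` callback are hypothetical names, and queries are answered by plain conjunctive (AND) matching.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used entry


def answer_query(query, results_cache, postings_cache, read_postings_from_disk):
    """Answer a conjunctive query, consulting the most specific cache first."""
    terms = tuple(sorted(query.split()))

    # Level 1: the complete answer for this set of terms.
    cached = results_cache.get(terms)
    if cached is not None:
        return cached

    # Level 2: posting lists per term; fall back to secondary memory on a miss.
    doc_sets = []
    for term in terms:
        postings = postings_cache.get(term)
        if postings is None:
            postings = read_postings_from_disk(term)
            postings_cache.put(term, postings)
        doc_sets.append(set(postings))

    result = sorted(set.intersection(*doc_sets)) if doc_sets else []
    results_cache.put(terms, result)
    return result
```

The answer cache only helps when the same query repeats, while the posting-list cache helps any query that shares a term with earlier ones, which is the specific-versus-generic distinction the abstract draws.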

Journal of Discrete Algorithms | 2009

Parallel query processing on distributed clustering indexes

Veronica Gil-Costa; Mauricio Marin; Nora Reyes

Similarity search has proven suitable for searching in large collections of unstructured data objects. A number of practical index data structures for this purpose have been proposed. All of them have been devised to process single queries sequentially. However, in large-scale systems such as Web search engines indexing multimedia content, it is critical to deal efficiently with streams of queries rather than with single queries. In this paper we show how to achieve efficient and scalable performance in this context. To this end we transform a sequential index based on clustering into a distributed one and devise algorithms and optimizations specifically tailored to support high-performance parallel query processing.

Conference on Information and Knowledge Management | 2007

High-performance distributed inverted files

Mauricio Marin; Veronica Gil-Costa

We present a general method of parallel query processing that allows scalable performance on distributed inverted files. The method allows the realization of a hybrid that combines the advantages of the document and term partitioned inverted files.
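
The two partitioning schemes that the abstract mentions can be sketched as follows. The hybrid method itself is not described in the abstract and is not reproduced here; the function names are hypothetical, document identifiers are assumed to be integers, and a simple byte-sum hash stands in for whatever term-to-node assignment a real system would use.

```python
def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.split()):
            index.setdefault(term, []).append(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def document_partition(docs, num_nodes):
    """Each node indexes its own subset of documents (a local index per node)."""
    parts = [{} for _ in range(num_nodes)]
    for doc_id, text in docs.items():
        parts[doc_id % num_nodes][doc_id] = text  # assumes integer doc ids
    return [build_inverted_index(part) for part in parts]


def term_partition(docs, num_nodes):
    """Each node holds the complete posting lists of its own subset of terms."""
    global_index = build_inverted_index(docs)
    parts = [{} for _ in range(num_nodes)]
    for term, postings in global_index.items():
        # Deterministic toy hash; an assumption made for reproducibility.
        parts[sum(term.encode()) % num_nodes][term] = postings
    return parts
```

Under document partitioning a query is sent to every node and the partial results are merged; under term partitioning only the nodes that own the query terms are contacted, but each holds the full posting list for its terms.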

Computer Physics Communications | 1995

An empirical assessment of priority queues in event-driven molecular dynamics simulation

Mauricio Marin; Patricio Cordero

In recent decades a number of near-optimal priority queues have been developed. Many of these priority queues are suitable for the efficient management of events generated during simulations of hard-particle systems. In this paper we compare the execution times of the fastest priority queues known today as well as some forms of binary search trees used as priority queues. We conclude that an unusual adaptation of a strictly balanced binary tree has the best performance for this class of simulations.
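
To make the role of the priority queue concrete, here is a minimal event queue built on a binary heap. A binary heap is used only because it ships with the Python standard library; it is not the balanced-tree adaptation that the paper finds fastest, and the class name `EventQueue` is hypothetical.

```python
import heapq


class EventQueue:
    """Future-event list for an event-driven simulation, backed by a binary heap."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # insertion counter, breaks ties between equal times

    def schedule(self, time, event):
        """Insert an event that is due at simulation time `time`."""
        heapq.heappush(self._heap, (time, self._counter, event))
        self._counter += 1

    def next_event(self):
        """Remove and return the earliest pending event as (time, event)."""
        time, _, event = heapq.heappop(self._heap)
        return time, event

    def __len__(self):
        return len(self._heap)
```

A simulation loop repeatedly calls `next_event()`, advances the clock to the returned time, processes the event (for example a particle collision) and schedules the new events that result from it.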

Parallel Computing | 2010

Sync/Async parallel search for the efficient design and construction of web search engines

Mauricio Marin; Veronica Gil-Costa; Carolina Bonacic; Ricardo A. Baeza-Yates; Isaac D. Scherson

A parallel query processing method is proposed for the design and construction of web search engines to efficiently deal with dynamic variations in query traffic. The method allows for the efficient use of different distributed indexing and query processing strategies in server clusters consisting of multiple computational/storage nodes. It also enables a better utilization of local and distributed hardware resources as it automatically re-organizes parallel computations to benefit from the advantages of two mixed modes of operation, namely: a newly proposed synchronous mode and the standard asynchronous computing mode. Switching between modes is facilitated by a round-robin strategy devised to grant each query a fair share of the hardware resources and properly predict query throughput. Performance is evaluated by experimental methods and two case studies serve to show how to develop efficient parallel query processing algorithms for large-scale search engines based on the proposed paradigm.

Web Information and Data Management | 2008

High-performance priority queues for parallel crawlers

Mauricio Marin; Rodrigo Paredes; Carolina Bonacic

Large-scale data centers for crawlers are able to maintain a very large number of active HTTP connections in order to download, as fast as possible, the usually huge number of web pages from given sections of the WWW. This generates a continuous stream of new URLs of documents to be downloaded, and the associated workload can only be served efficiently with proper parallel computing techniques. The incoming URLs have to be organized by a priority measure so that the most relevant documents are downloaded first. This paper addresses how to manage them efficiently, along with synchronization issues such as URLs downloaded by different processing nodes in a cluster of computers. We propose efficient and scalable strategies which consider intra-node multi-core multi-threading in an inter-node distributed-memory environment, including efficient use of secondary memory.

European Conference on Parallel Processing | 2008

A Search Engine Index for Multimedia Content

Mauricio Marin; Veronica Gil-Costa; Carolina Bonacic

We present a distributed index data structure and algorithms devised to support parallel query processing of multimedia content in search engines. We present a comparative study with a number of data structures used as indexes for metric space databases. Our optimization criteria are based on requirements for high-performance search engines. The main advantages of our proposal are efficient performance with respect to other approaches (sequentially and in parallel), suitable treatment of secondary memory, and support for OpenMP multithreading. We present experiments for the asynchronous (MPI) and bulk-synchronous (BSP) message passing models of parallel computing, showing that in both models our approach consistently outperforms the others.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012

To index or not to index: time-space trade-offs in search engines with positional ranking functions

Diego Arroyuelo; Senén González; Mauricio Marin; Mauricio Oyarzún; Torsten Suel

Positional ranking functions, widely used in Web search engines, improve result quality by exploiting the positions of the query terms within documents. However, it is well known that positional indexes demand large amounts of extra space, typically about three times the space of a basic nonpositional index. Textual data, on the other hand, is needed to produce text snippets. In this paper, we study time-space trade-offs for search engines with positional ranking functions and text snippet generation. We consider both index-based and non-index-based alternatives for positional data. We aim to answer the question of whether one should index positional data or not. We show that there is a wide range of practical time-space trade-offs. Moreover, we show that both positional and textual data can be stored using about 71% of the space used by traditional positional indexes, with a minor increase in query time. This yields considerable space savings and outperforms, both in space and time, recent alternatives from the literature. We also propose several efficient compressed text representations for snippet generation, which are able to use about half of the space of current state-of-the-art alternatives with little impact on query processing time.
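
The basic trade-off behind the question "to index or not to index" can be shown with a toy sketch: term positions are either stored in an index ahead of time or recomputed from the stored text at query time. The function names are hypothetical, tokenization is plain whitespace splitting, and no compression is applied, so the sketch says nothing about the space figures reported in the abstract.

```python
def build_positional_index(docs):
    """Index-based alternative: store every term position ahead of time.

    Returns a mapping term -> {doc_id: [positions]}.
    """
    index = {}
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.split()):
            index.setdefault(term, {}).setdefault(doc_id, []).append(pos)
    return index


def positions_from_text(text, term):
    """Non-index-based alternative: scan the stored text at query time."""
    return [pos for pos, word in enumerate(text.split()) if word == term]
```

Both functions yield the same positions for a given document and term; the first pays in space for a fast lookup, while the second reuses text that is kept anyway for snippet generation and pays in scan time.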
