Carolina Bonacic | Researchain



Publication

Featured research published by Carolina Bonacic.

parallel computing | 2010

Sync/Async parallel search for the efficient design and construction of web search engines

Mauricio Marin; Veronica Gil-Costa; Carolina Bonacic; Ricardo A. Baeza-Yates; Isaac D. Scherson

A parallel query processing method is proposed for the design and construction of web search engines to efficiently deal with dynamic variations in query traffic. The method allows for the efficient use of different distributed indexing and query processing strategies in server clusters consisting of multiple computational/storage nodes. It also enables a better utilization of local and distributed hardware resources as it automatically re-organizes parallel computations to benefit from the advantages of two mixed modes of operation, namely: a newly proposed synchronous mode and the standard asynchronous computing mode. Switching between modes is facilitated by a round-robin strategy devised to grant each query a fair share of the hardware resources and properly predict query throughput. Performance is evaluated by experimental methods and two case studies serve to show how to develop efficient parallel query processing algorithms for large-scale search engines based on the proposed paradigm.
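The round-robin idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each query has a known cost in abstract work units and that every turn grants a fixed quantum; the function name, the cost model, and the quantum size are hypothetical and are not taken from the paper.

```python
from collections import deque


def round_robin(queries, quantum):
    """Serve queries in fixed-size quanta until all are finished.

    `queries` maps a query id to its remaining work units (a hypothetical
    cost model). Returns the query ids in the order they complete.
    """
    pending = deque(queries.items())
    finished = []
    while pending:
        qid, remaining = pending.popleft()
        remaining -= quantum  # the query consumes one quantum of work
        if remaining > 0:
            pending.append((qid, remaining))  # rejoin at the back of the line
        else:
            finished.append(qid)
    return finished


if __name__ == "__main__":
    print(round_robin({"a": 3, "b": 1, "c": 2}, quantum=1))
```

Because every pending query receives one quantum per cycle, a long query cannot starve the short ones, which is the fairness property the abstract refers to.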

web information and data management | 2008

High-performance priority queues for parallel crawlers

Mauricio Marin; Rodrigo Paredes; Carolina Bonacic

Large-scale data centers for crawlers can maintain a very large number of active HTTP connections in order to download, as fast as possible, the usually huge number of web pages from given sections of the WWW. This generates a continuous stream of new URLs of documents to be downloaded, and the associated workload can only be served efficiently with proper parallel computing techniques. The incoming URLs have to be organized by a priority measure so that the most relevant documents are downloaded first. This paper addresses how to manage them efficiently, along with related synchronization issues such as URLs downloaded by different processing nodes in a cluster of computers. We propose efficient and scalable strategies that consider intra-node multi-core multithreading in an inter-node distributed-memory environment, including efficient use of secondary memory.
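As a rough illustration of ordering URLs by a priority measure, the sketch below keeps a crawl frontier in a binary heap and skips URLs it has already seen. It is single-threaded and in-memory only; the class name `UrlFrontier` and the numeric priority scores are assumptions made for this example, and none of the paper's multi-core, distributed, or secondary-memory strategies are reproduced here.

```python
import heapq
import itertools


class UrlFrontier:
    """Hypothetical in-memory crawl frontier ordered by descending priority."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def push(self, url, priority):
        """Queue a URL once; return False if it was already queued."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), url))
        return True

    def pop(self):
        """Remove and return the highest-priority URL."""
        _, _, url = heapq.heappop(self._heap)
        return url

    def __len__(self):
        return len(self._heap)
```

A crawler loop would call `pop()` to pick the next download and `push()` for each link extracted from the fetched page.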

european conference on parallel processing | 2008

A Search Engine Index for Multimedia Content

Mauricio Marin; Veronica Gil-Costa; Carolina Bonacic

We present a distributed index data structure and algorithms devised to support parallel query processing of multimedia content in search engines. We present a comparative study with a number of data structures used as indexes for metric space databases. Our optimization criteria are based on requirements for high-performance search engines. The main advantages of our proposal are efficient performance with respect to other approaches (sequentially and in parallel), suitable treatment of secondary memory, and support for OpenMP multithreading. We present experiments for the asynchronous (MPI) and bulk-synchronous (BSP) message passing models of parallel computing showing that in both models our approach outperforms others consistently.

parallel, distributed and network-based processing | 2010

Scheduling Metric-Space Queries Processing on Multi-Core Processors

Veronica Gil-Costa; Ricardo J. Barrientos; Mauricio Marin; Carolina Bonacic

This paper proposes a strategy to organize metric-space query processing in multi-core search nodes as understood in the context of search engines running on clusters of computers. The strategy is applied in each search node to process all active queries visiting the node as part of their solution which, in general, for each query is computed from the contribution of each search node. When query traffic is high enough, the proposed strategy assigns one thread to each query and lets them work in a fully asynchronous manner. When query traffic is moderate or low, some threads start to idle, so they are put to work on queries being processed by other threads. The strategy solves the associated synchronization problem among threads by switching query processing into a bulk-synchronous mode of operation. This simplifies the dynamic re-organization of threads, and overheads are very small, with the advantage that the overall workload is evenly distributed across all threads.
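The mode switch described in the abstract above might be sketched as a simple planning function. The threshold used here (at least as many active queries as threads) and the name `plan_threads` are assumptions for illustration only; the abstract does not state how the paper decides when traffic is "high enough".

```python
def plan_threads(active_queries, num_threads):
    """Return (mode, assignment), where assignment maps thread index to query ids.

    Assumed rule: with at least one query per thread, run asynchronously with
    each query owned by a single thread; otherwise let all threads cooperate
    on every query in bulk-synchronous steps.
    """
    if len(active_queries) >= num_threads:
        # High traffic: deal queries out so each one is handled by one thread.
        assignment = {t: [] for t in range(num_threads)}
        for i, query in enumerate(active_queries):
            assignment[i % num_threads].append(query)
        return "async", assignment
    # Low traffic: no thread idles, every thread contributes to every query.
    assignment = {t: list(active_queries) for t in range(num_threads)}
    return "bulk-sync", assignment
```

In the bulk-synchronous branch the threads would still need a barrier between steps to combine their partial results; that part is omitted from the sketch.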

international conference on parallel processing | 2007

A Search Engine Accepting On-Line Updates

Mauricio Marin; Carolina Bonacic; Veronica Gil Costa; Carlos Gomez

We describe and evaluate the performance of a parallel search engine that is able to cope efficiently with concurrent read/write operations. Read operations come in the usual form of queries submitted to the search engine and write ones come in the form of new documents added to the text collection in an on-line manner, namely the insertions are embedded into the main stream of user queries in an unpredictable arrival order but with query results respecting causality. The search engine is built upon distributed inverted files for which we propose generic strategies for load balance and concurrency control.

Fundamenta Informaticae | 2014

Modelling Search Engines Performance Using Coloured Petri Nets

Veronica Gil-Costa; Mauricio Marin; Alonso Inostrosa-Psijas; Jair Lobos; Carolina Bonacic

This paper proposes using Coloured Petri Nets to model the performance of vertical search engines for Web search. In such systems, queries submitted by users or client systems are handled by different components implemented as services deployed on large clusters of dedicated processors. We propose models that represent key features of the running-time cost of components at a suitable level of detail. A comprehensive evaluation study shows that the models are precise when compared against actual implementations and complex process-oriented simulators of the same search engine instances. A C++ class library is proposed to enable rapid model construction by using a hierarchical and scalable approach, and to enable transparent generation and efficient execution of the respective simulation programs either sequentially or in parallel.

high performance computing for computational science (vector and parallel processing) | 2008

Improving Search Engines Performance on Multithreading Processors

Carolina Bonacic; Carlos García; Mauricio Marin; Manuel Prieto; Francisco Tirado; Cesar Vicente

In this paper we present strategies and experiments that show how to take advantage of the multi-threading parallelism available in Chip Multithreading (CMP) processors in the context of efficient query processing for search engines. We show that scalable performance can be achieved by letting the search engine go synchronous so that batches of queries can be processed concurrently in a simple but very efficient manner. Furthermore, our results indicate that the multithreading capabilities of modern CMP systems are not fully exploited when the search engine operates in a conventional asynchronous mode, due to the moderate thread-level parallelism that can be extracted from a single query.

quantitative evaluation of systems | 2004

Approximate computation of transient results for large Markov chains

Carolina Bonacic; Antonio Fariña; Mauricio Marin; Nieves R. Brisaboa

Systems able to cope with very large text collections are making intensive use of distributed memory parallel computing platforms such as clusters of PCs. This is particularly evident in Web search engines which must resort to parallelism in order to deal efficiently with both high rates of queries per unit time and high space requirements in the form of large numbers of small documents stored in secondary memory. Those documents can be stored in compressed format to reduce memory space and communication time. This paper proposes a parallel algorithm for compressing text in such a distributed memory environment. We show efficient performance against the usual-practice alternative of compressing the whole text on a single machine.

This paper presents a new approach for the computation of transient measures in large continuous time Markov chains (CTMCs). The approach combines the randomization approach for transient analysis of CTMCs with a new representation of probability vectors as Kronecker products of small component vectors. This representation is an approximation that allows an extremely space- and time-efficient computation of transient vectors. Usually, the resulting approximation is very good and introduces errors that are comparable to those found with existing approximation techniques for stationary analysis. By increasing the space and time requirements of the approach, we can represent parts of the solution vector in detail and reduce the approximation error, yielding exact solutions in the limiting case.
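For readers unfamiliar with the randomization (uniformization) approach named in the abstract above, the following is a minimal sketch of the standard exact method on a small dense generator matrix. It does not implement the Kronecker-product approximation the abstract describes; the function name, the tolerance, and the term limit are choices made for this example.

```python
import math


def transient(Q, pi0, t, tol=1e-12, max_terms=10_000):
    """Transient distribution pi(t) of a CTMC with generator Q via uniformization.

    Computes sum_k Poisson(k; rate*t) * pi0 * P^k with P = I + Q / rate,
    stopping once the accumulated Poisson mass is within `tol` of 1.
    Intended for small rate*t; exp(-rate*t) underflows for very large values.
    """
    n = len(Q)
    rate = max(-Q[i][i] for i in range(n))  # uniformization rate
    if rate == 0:
        return list(pi0)  # no transitions ever occur
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]
    weight = math.exp(-rate * t)  # Poisson probability of k = 0 jumps
    v = list(pi0)                 # pi0 * P^k, starting at k = 0
    result = [weight * x for x in v]
    total = weight
    for k in range(1, max_terms + 1):
        if 1.0 - total <= tol:
            break
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= rate * t / k
        total += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result
```

The cost of each term is one vector-matrix product, which is what becomes prohibitive for very large state spaces and motivates compact vector representations such as the one the abstract proposes.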

principles of advanced discrete simulation | 2013

Approximate parallel simulation of web search engines

Mauricio Marin; Veronica Gil-Costa; Carolina Bonacic; Roberto Solar

Large-scale Web search engines are complex and highly optimized systems devised to operate on dedicated clusters of processors. Even a small gain in performance is beneficial to economical operation given the large amount of hardware resources deployed in the respective data centers. Performance is fully dependent on user behavior, which is characterized by unpredictable and drastic variations in trending topics and arrival rate intensity. In this context, discrete-event simulation is a powerful tool either to predict the performance of new optimizations introduced in search engine components or to evaluate different scenarios under which alternative component configurations are able to process demanding workloads. These simulators must be fast, memory efficient, and parallel to cope with the execution of millions of events in a short running time on few processors. In this paper we propose achieving this objective at the expense of performing approximate parallel simulation.

european conference on parallel processing | 2008

Exploiting Hybrid Parallelism in Web Search Engines

Carolina Bonacic; Carlos García; Mauricio Marin; Manuel Prieto; Francisco Tirado

With the emergence of multi-core CPUs (or Chip-level MultiProcessors, CMPs), it is essential to develop techniques that capitalize on CMP advantages to speed up very demanding applications of parallel computing such as Web search engines. In particular, for this application and given the huge amount of computational resources deployed at data centers, it is of paramount importance to come up with strategies able to get the best performance from hardware. This is especially critical when we consider how we organize hardware to cope with sustained periods of very high traffic of user queries. In this paper, we propose a hybrid technique based on MPI and OpenMP, which has been devised to take advantage of the multithreading facilities provided by CMP nodes for search engines under high query traffic.
