Romulo Goncalves | Researchain


Publication

Featured research published by Romulo Goncalves.

Extending Database Technology | 2009

Exploiting the power of relational databases for efficient stream processing

Erietta Liarou; Romulo Goncalves; Stratos Idreos

Stream applications have gained significant popularity in recent years, which has led to the development of specialized stream engines. These systems are designed from scratch with a different philosophy than today's database engines in order to cope with the requirements of stream applications. However, this means that they lack the power and sophisticated techniques of a full-fledged database system that exploits techniques and algorithms accumulated over many years of database research. In this paper, we take the opposite route and design a stream engine directly on top of a database kernel. Incoming tuples are stored upon arrival in a new kind of system table, called a basket. A continuous query can then be evaluated over its relevant baskets as a typical one-time query, exploiting the power of the relational engine. Once a tuple has been seen by all relevant queries/operators, it is dropped from its basket. A basket can be the input to a single query plan or to multiple similar query plans. Furthermore, a query plan can be split into multiple parts, each with its own input/output baskets, allowing for flexible load sharing and query scheduling. Contrary to traditional stream engines, which process one tuple at a time, this model allows batch processing of tuples, e.g., querying a basket only after x tuples arrive or after a time threshold has passed. Furthermore, we are not restricted to processing tuples in the order they arrive. Instead, we can selectively pick tuples from a basket based on the query requirements, exploiting a novel query component, the basket expression. We investigate the opportunities and challenges that arise with such a direction and we show that it carries significant advantages. We propose a complete architecture, the DataCell, which we implemented on top of an open-source column-oriented DBMS. A detailed analysis and experimental evaluation of the core algorithms using both micro benchmarks and the standard Linear Road benchmark demonstrate the potential of this new approach.
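The basket mechanism described in this abstract can be illustrated with a small in-memory sketch. This is not DataCell's implementation: the `Basket` class, its `readers`, `append` and `consume` names, and the count-based batch trigger are all hypothetical, and a plain Python predicate stands in for a basket expression. As a simplification, a reader marks the whole pending batch as seen and the predicate only filters what is returned.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Row = Tuple[int, float]  # hypothetical (timestamp, value) stream tuple


@dataclass
class Basket:
    """Toy stand-in for a basket table (illustrative, not DataCell's design)."""
    readers: Set[str]                              # queries that must see each row
    rows: List[Row] = field(default_factory=list)
    seen: List[Set[str]] = field(default_factory=list)

    def append(self, row: Row) -> None:
        """Store an incoming tuple on arrival."""
        self.rows.append(row)
        self.seen.append(set())

    def consume(self, reader: str, predicate: Callable[[Row], bool],
                batch_size: int) -> List[Row]:
        """Evaluate `reader` over its unseen rows once a full batch is pending."""
        pending = [i for i, s in enumerate(self.seen) if reader not in s]
        if len(pending) < batch_size:
            return []                              # wait for more tuples
        result = []
        for i in pending:
            self.seen[i].add(reader)
            if predicate(self.rows[i]):
                result.append(self.rows[i])
        # Drop every row that all registered readers have now seen.
        keep = [i for i, s in enumerate(self.seen) if s != self.readers]
        self.rows = [self.rows[i] for i in keep]
        self.seen = [self.seen[i] for i in keep]
        return result
```

With two registered readers, a row stays in the basket until both have consumed it, which mirrors the "seen by all relevant queries" rule in the abstract. A time-based trigger would be a further assumption and is left out here.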

Statistical and Scientific Database Management | 2007

MonetDB/SQL Meets SkyServer: the Challenges of a Scientific Database

Milena Ivanova; Niels Nes; Romulo Goncalves; Martin L. Kersten

This paper presents our experiences in porting the Sloan Digital Sky Survey (SDSS)/SkyServer to the state-of-the-art open source database system MonetDB/SQL. SDSS acts as a well-documented benchmark for scientific database management. We have achieved a fully functional prototype for the personal SkyServer, to be downloaded from our site. The lessons learned are: 1) the column store approach of MonetDB demonstrates great potential in the world of scientific databases; however, the application also challenged the functionality of our implementation and revealed that a fully operational SQL environment is needed, e.g., including persistent stored modules; 2) the initial performance is competitive with the reference platform, MS SQL Server 2005; and 3) the analysis of SDSS query traces hints at several techniques to boost performance by utilizing repetitive behavior and zoom-in/zoom-out access patterns that are currently not captured by the system.

SIGSPATIAL Special | 2015

Benchmarking and improving point cloud data management in MonetDB

Oscar Martinez-Rubi; Peter van Oosterom; Romulo Goncalves; T.P.M. Tijssen; Milena Ivanova; Martin L. Kersten; Foteini Alvanaki

The popularity, availability and sizes of point cloud data sets are increasing, thus raising interesting data management and processing challenges. Various software solutions are available for the management of point cloud data. A benchmark for point cloud data management systems was defined and executed for several solutions. In this paper we focus on the solutions based on the column-store MonetDB: the generic out-of-the-box approach is compared with two alternative approaches that exploit the spatial coherence of the data to improve data access and to minimize the storage requirements.

Extending Database Technology | 2010

The Data Cyclotron query processing scheme

Romulo Goncalves; Martin L. Kersten

Distributed database systems exploit static workload characteristics to steer data fragmentation and data allocation schemes. However, the grand challenge of distributed query processing is to come up with a self-organizing architecture, which exploits all resources to manage the hot data set, minimize query response time, and maximize throughput without global co-ordination. In this paper, we introduce the Data Cyclotron architecture, which addresses these challenges using turbulent data movement through a storage ring built from distributed main memory, capitalizing on modern remote-DMA facilities. Queries assigned to individual nodes interact with the Data Cyclotron by picking up data fragments continuously flowing around, i.e., the hot set. Each data fragment carries a level of interest (LOI) metric, which represents the cumulative query interest as the fragment passes around the ring multiple times. A fragment with an LOI below a given threshold, inversely proportional to the ring load, is pulled out to free up resources. This threshold is dynamically adjusted in a distributed manner based on ring characteristics and query needs. It optimizes resource utilization, keeping the average data access delay low. The proposed architecture has a modest impact on existing query execution engines. This is illustrated using an extensive validated simulation study for the Data Cyclotron protocols. The results underpin their robustness in turbulent workload scenarios as well as in the TPC-H scenario. Furthermore, we think that using state-of-the-art network technology, e.g., RDMA, could lead to even more promising results. The Data Cyclotron architecture opens a new vista for modern distributed database architectures with a plethora of research challenges barely touched upon.
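The LOI-driven eviction described in this abstract can be sketched as a single pass over the ring. This is only an illustration under stated assumptions: the `Fragment` class, the `cycle` function, and the decayed-sum LOI update are hypothetical and do not reproduce the paper's formula. The threshold is passed in by the caller, because the abstract does not give the rule that adapts it to ring load.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Fragment:
    name: str
    loi: float = 0.0  # level of interest accumulated over ring cycles


def cycle(ring: List[Fragment], hits: Dict[str, int], threshold: float,
          decay: float = 0.5) -> Tuple[List[Fragment], List[Fragment]]:
    """Simulate one full trip around the ring.

    `hits` maps a fragment name to the number of queries that picked it up
    during this trip. The update `decay * old + hits` is an assumed stand-in
    for the paper's LOI metric. Returns (kept, evicted).
    """
    kept: List[Fragment] = []
    evicted: List[Fragment] = []
    for frag in ring:
        frag.loi = decay * frag.loi + hits.get(frag.name, 0)
        # Fragments whose interest falls below the threshold leave the ring.
        (kept if frag.loi >= threshold else evicted).append(frag)
    return kept, evicted
```

Running `cycle` repeatedly with the current query hits keeps frequently requested fragments circulating, while fragments that stop attracting queries decay below the threshold and are pulled out.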

Data Management on New Hardware | 2013

Peak performance: remote memory revisited

Hannes Mühleisen; Romulo Goncalves; Martin L. Kersten

Many database systems share a need for large amounts of fast storage. However, economies of scale limit the utility of extending a single machine with an arbitrary amount of memory. The recent broad availability of the zero-copy data transfer protocol RDMA over low-latency and high-throughput network connections such as InfiniBand prompts us to revisit the long-proposed usage of memory provided by remote machines. In this paper, we present a solution to make use of remote memory without manipulation of the operating system, and investigate the impact on database performance.

ACM Transactions on Database Systems | 2011

The data cyclotron query processing scheme

Romulo Goncalves; Martin L. Kersten

A grand challenge of distributed query processing is to devise a self-organizing architecture which exploits all hardware resources optimally to manage the database hot set, minimize query response time, and maximize throughput without single-point global coordination. The Data Cyclotron architecture [Goncalves and Kersten 2010] addresses this challenge using turbulent data movement through a storage ring built from distributed main memory and capitalizing on the functionality offered by modern remote-DMA network facilities. Queries assigned to individual nodes interact with the storage ring by picking up data fragments, which are continuously flowing around, that is, the hot set. The storage ring is steered by the Level Of Interest (LOI) attached to each data fragment, which represents the cumulative query interest as it passes around the ring multiple times. A fragment with an LOI below a given threshold, inversely proportional to the ring load, is pulled out to free up resources. This threshold is dynamically adjusted in a fully distributed manner based on ring characteristics and locally observed query behavior. It optimizes resource utilization by keeping the average data access latency low. The approach is illustrated using an extensive and validated simulation study. The results underpin the robustness of the fragment hot set management in turbulent workload scenarios. A fully functional prototype of the proposed architecture has been implemented using modest extensions to MonetDB and runs within a multirack cluster equipped with InfiniBand. Extensive experimentation using both microbenchmarks and high-volume workloads based on TPC-H demonstrates its feasibility. The Data Cyclotron architecture and experiments open a new vista for modern distributed database architectures with a plethora of new research challenges.

Very Large Data Bases | 2015

GIS navigation boosted by column stores

Foteini Alvanaki; Romulo Goncalves; Milena Ivanova; Martin L. Kersten; Kostis Kyzirakos

Earth observation sciences, astronomy, and seismology have large data sets with inherently rich spatial and geospatial information. In combination with large collections of semantically rich objects that have a large number of thematic properties, they form a new source of knowledge for urban planning, smart cities and natural resource management. Modeling and storing these properties, together with the relationships between them, is best handled in a relational database. Furthermore, the scalability requirements posed by the latest 26-attribute light detection and ranging (LIDAR) data sets are a challenge for file-based solutions. In this demo we show how to query a 640 billion point data set using a column store enriched with GIS functionality. Through a lightweight and cache-conscious secondary index called Imprints, spatial query performance on flat table storage is comparable to traditional file-based solutions. All the results are visualised in real time using QGIS.

MethodsX | 2016

Voxelization Algorithms for Geospatial Applications: Computational methods for voxelating spatial datasets of 3D city models containing 3D surface, curve and point data models

Pirouz Nourian; Romulo Goncalves; Sisi Zlatanova; Ken Arroyo Ohori; Anh Vu Vo


Archive | 2017

Realistic Benchmarks for Point Cloud Data Management Systems

Peter van Oosterom; Oscar Martinez-Rubi; T.P.M. Tijssen; Romulo Goncalves

Lidar, photogrammetry, and various other survey technologies enable the collection of massive point clouds. Faced with hundreds of billions or trillions of points, the traditional solutions for handling point clouds usually under-perform even for classical loading and retrieving operations. To obtain insight into the features affecting performance, the authors carried out single-user tests with different storage models on various systems, including Oracle Spatial and Graph, PostgreSQL-PostGIS, MonetDB and LAStools (during the second half of 2014). In the summer of 2015, the tests were further extended with the latest developments of the systems, including the new version of the Point Data Abstraction Library (PDAL) with efficient compression. Web services based on point cloud data are becoming popular and they have requirements that most of the available point cloud data management systems cannot fulfil. This means that specific custom-made solutions are constructed. We identify the requirements of these web services and propose a realistic benchmark extension, including multi-user and level-of-detail queries. This helps in defining the future lines of work for more generic point cloud data management systems, supporting such increasingly demanded web services.

Advances in Geographic Information Systems | 2016

A spatial column-store to triangulate the Netherlands on the fly.

Romulo Goncalves; Tom van Tilburg; Kostis Kyzirakos; Foteini Alvanaki; Panagiotis Koutsourakis; Ben van Werkhoven; Willem Robert van Hage

3D digital city models, important for urban planning, are currently constructed from massive point clouds obtained through airborne LiDAR (Light Detection and Ranging). They are semantically enriched with information obtained from auxiliary GIS data such as cadastral data, which contains information about the boundaries of properties, road networks, rivers, lakes, etc. Technical advances in LiDAR data acquisition systems have made possible the rapid acquisition of high-resolution topographical information for an entire country. Such data sets are now reaching the trillion-point barrier. To cope with this data deluge and provide up-to-date 3D digital city models on demand, current geospatial management strategies should be rethought. This work presents a column-oriented Spatial Database Management System which provides in-situ data access, effective data skipping, efficient spatial operations, and interactive data visualization. Its efficiency and scalability are demonstrated using a dense LiDAR scan of The Netherlands consisting of 640 billion points and the latest cadastral information, and compared with PostGIS.
