Ricardo Manuel Pereira Vilaça
University of Minho
Publications
Featured research published by Ricardo Manuel Pereira Vilaça.
Network Computing and Applications | 2007
Alfrânio Correia; José Pereira; Luís E. T. Rodrigues; Nuno Carvalho; Ricardo Manuel Pereira Vilaça; Rui Carlos Mendes de Oliveira; Susana Guedes
Database replication has been a common feature in database management systems (DBMSs) for a long time. In particular, asynchronous or lazy propagation of updates provides a simple yet efficient way of increasing performance and data availability, and is widely available across the DBMS product spectrum. High-end systems additionally offer sophisticated conflict resolution and data propagation options, as well as synchronous replication based on distributed locking and two-phase commit protocols. This paper presents the GORDA architecture and programming interface (GAPI), which enables different replication strategies to be implemented once and deployed in multiple DBMSs. This is achieved by proposing a reflective interface to transaction processing instead of relying on client interfaces or ad-hoc server extensions. The proposed approach is thus cost-effective, enabling reuse of replication protocols or components across multiple DBMSs, as well as potentially efficient, as it allows close coupling with DBMS internals.
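The paper itself specifies the GAPI; as a rough illustration of what a reflective interface to transaction processing looks like, consider the following sketch, whose names and signatures are hypothetical rather than the actual API:

    // Hypothetical sketch of a reflective transaction-processing hook in the
    // spirit of GAPI; names and signatures are illustrative, not the real API.
    public interface TransactionReflector {

        // Called by the DBMS before a transaction commits, exposing its write
        // set; a replication protocol may propagate or certify the write set
        // and veto the commit by returning false.
        boolean beforeCommit(long txnId, java.util.List<WriteSetEntry> writeSet);

        // Called once the outcome is final, letting the protocol update its
        // bookkeeping (e.g., lazily propagate committed updates to replicas).
        void afterCompletion(long txnId, boolean committed);
    }

    // One captured update: target table, row key and the new row image.
    record WriteSetEntry(String table, byte[] key, byte[] newValue) {}

A replication protocol written against such an interface never touches DBMS internals directly, which is what makes the "implement once, deploy in multiple DBMSs" property possible.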
Proceedings of the Workshop on Secure and Dependable Middleware for Cloud Monitoring and Management | 2012
Leander Beernaert; Miguel Matos; Ricardo Manuel Pereira Vilaça; Rui Carlos Mendes de Oliveira
Cloud computing infrastructures are the most recent approach to the development and conception of computational systems. Cloud infrastructures are complex environments with various subsystems, each with its own challenges. Cloud systems should provide a fundamental property: elasticity. Elasticity is the ability to automatically add and remove instances according to the needs of the system, and it is a requirement for pay-per-use billing models. Various open-source software solutions allow companies and institutions to build their own Cloud infrastructures. However, in most of these the elasticity feature is quite immature. Monitoring and timely adapting the active resources of a Cloud computing infrastructure is key to providing the elasticity required by diverse, multi-tenant and pay-per-use business models. In this paper, we propose Elastack, an automated monitoring and adaptation system, generic enough to be applied to existing IaaS frameworks and intended to provide the elasticity they currently lack. Our approach offers any Cloud infrastructure the mechanisms to implement automated monitoring and adaptation, as well as the flexibility to go beyond these. We evaluate Elastack by integrating it with OpenStack, showing how easy it is to add these important features with a minimal, almost imperceptible amount of modifications to the default installation.
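At its core, this kind of monitor-and-adapt behavior is a control loop. The sketch below shows the general shape under assumed load thresholds; the IaasClient abstraction and its methods are illustrative stand-ins, not Elastack's or OpenStack's actual API:

    // Minimal sketch of a monitor-and-adapt elasticity loop; thresholds,
    // period and the IaasClient abstraction are illustrative assumptions.
    import java.util.List;

    public class ElasticityLoop {
        interface IaasClient {                   // stands in for an IaaS API
            List<String> instances();
            double avgCpuLoad();                 // aggregated from per-instance monitors
            void spawnInstance();
            void terminateInstance(String id);
        }

        static void run(IaasClient cloud) throws InterruptedException {
            while (true) {
                double load = cloud.avgCpuLoad();
                if (load > 0.80) {               // scale out under pressure
                    cloud.spawnInstance();
                } else if (load < 0.20 && cloud.instances().size() > 1) {
                    cloud.terminateInstance(cloud.instances().get(0)); // scale in
                }
                Thread.sleep(10_000);            // monitoring period
            }
        }
    }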
Distributed Applications and Interoperable Systems | 2013
Ricardo Manuel Pereira Vilaça; Francisco Cruz; José Pereira; Rui Carlos Mendes de Oliveira
NoSQL databases were initially devised to support a few concrete extreme-scale applications. Since the specificity and scale of the target systems justified the investment of manually crafting application code, their limited query and indexing capabilities were not a major impediment. However, with a considerable number of mature alternatives now available, there is an increasing willingness to use NoSQL databases in a wider and more diverse spectrum of applications and, for most of them, hand-crafted query code is not an enticing trade-off.
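To see what hand-crafted query code means in practice, consider a store with no secondary indexes: the application itself must maintain index entries on every write. The sketch below is a generic illustration; the KeyValueStore abstraction and key layout are assumptions, not any particular product's API:

    // Sketch of hand-crafted query code over a store without secondary
    // indexes: the application maintains its own index entries on each write.
    import java.util.Set;

    interface KeyValueStore {
        void put(String key, String value);
        String get(String key);
        Set<String> scanValues(String keyPrefix); // values of all keys with the prefix
    }

    class UserStore {
        private final KeyValueStore kv;
        UserStore(KeyValueStore kv) { this.kv = kv; }

        // Every write must also update the application-maintained index.
        void putUser(String userId, String city, String profileJson) {
            kv.put("user:" + userId, profileJson);
            kv.put("idx:city:" + city + ":" + userId, userId);
        }

        // "Find users by city" becomes a prefix scan over hand-built index keys.
        Set<String> userIdsInCity(String city) {
            return kv.scanValues("idx:city:" + city + ":");
        }
    }

Every new query pattern requires another such index to be designed, written and kept consistent by hand, which is exactly the trade-off the paper argues most applications will not accept.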
Distributed Applications and Interoperable Systems | 2011
Ricardo Manuel Pereira Vilaça; Rui Carlos Mendes de Oliveira; José Pereira
Key-value stores hold the unprecedented bulk of the data produced by applications such as social networks. Their scalability and availability requirements often justify sacrificing richer data and processing models, and even elementary data consistency. Moreover, existing key-value stores offer only random or order-based placement strategies. In this paper we exploit arbitrary data relations, easily expressed by the application, to foster data locality and improve the performance of complex queries common in social network read-intensive workloads. We present a novel data placement strategy, supporting dynamic tags, based on multidimensional locality-preserving mappings. We compare our data placement strategy with the ones used in existing key-value stores under the workload of a typical social network application and show that the proposed correlation-aware data placement strategy offers a major improvement in the system's overall response time and network requirements.
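A classic example of a multidimensional locality-preserving mapping is the Z-order (Morton) curve, which interleaves the bits of several dimension values into a single key, so items close in the multidimensional space tend to receive nearby keys. The 2-D sketch below illustrates the generic technique, not the paper's specific mapping:

    // Generic 2-D Z-order (Morton) mapping: interleaving the bits of two
    // dimension values yields keys that preserve multidimensional locality.
    public class ZOrder {
        // Spread the low 16 bits of v so a zero bit separates each pair.
        static long spreadBits(long v) {
            v &= 0xFFFFL;
            v = (v | (v << 8)) & 0x00FF00FFL;
            v = (v | (v << 4)) & 0x0F0F0F0FL;
            v = (v | (v << 2)) & 0x33333333L;
            v = (v | (v << 1)) & 0x55555555L;
            return v;
        }

        // Interleave x and y into one key; keys sharing high-order bits
        // correspond to items in the same region of the 2-D space.
        static long mortonKey(int x, int y) {
            return (spreadBits(y) << 1) | spreadBits(x);
        }

        public static void main(String[] args) {
            System.out.printf("key(3,5)=%d key(4,5)=%d%n",
                              mortonKey(3, 5), mortonKey(4, 5));
        }
    }

With such a mapping, items that are correlated along several dimensions land on nearby keys and hence, under order-based partitioning, on the same or neighboring nodes.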
International Conference on the Move to Meaningful Internet Systems | 2010
Ricardo Manuel Pereira Vilaça; Francisco Cruz; Rui Carlos Mendes de Oliveira
Massive-scale distributed computing is a challenge at our doorstep. The current exponential growth of data calls for massive-scale storage and processing capabilities. This is being acknowledged by several major Internet players embracing the cloud computing model and offering first-generation distributed tuple stores. Having all started from similar requirements, these systems ended up providing a similar service: a simple tuple store interface that allows applications to insert, query, and remove individual elements. Furthermore, while availability is commonly assumed to be sustained by the massive scale itself, data consistency and freshness are usually severely hindered. By doing so, these services focus on a specific narrow trade-off between consistency, availability, performance, scale, and migration cost that is much less attractive to common business needs. In this paper we introduce DataDroplets, a novel tuple store that shifts the current trade-off towards the needs of common business users, providing additional consistency guarantees and higher-level data processing primitives that smooth the migration path for existing applications. We present a detailed comparison between DataDroplets and existing systems regarding their data model, architecture and trade-offs. Preliminary results on the system's performance under a realistic workload are also presented.
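The gap the paper describes can be pictured as the difference between the two interfaces below. The method names are illustrative assumptions rather than DataDroplets' actual API; the point is the contrast between single-item operations and higher-level primitives:

    // Illustrative contrast between a first-generation tuple store interface
    // and the higher-level primitives DataDroplets-style systems add.
    import java.util.Map;
    import java.util.Set;

    interface BasicTupleStore {
        void put(String key, byte[] value);
        byte[] get(String key);
        void remove(String key);
    }

    interface EnrichedTupleStore extends BasicTupleStore {
        // Atomic multi-item write, smoothing migration of applications that
        // expect transactional behavior from an RDBMS.
        void multiPut(Map<String, byte[]> items);

        // Higher-level processing primitives: range and tag-based reads.
        Map<String, byte[]> getRange(String fromKey, String toKey);
        Map<String, byte[]> getByTags(Set<String> tags);
    }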
Proceedings of the Third Workshop on Dependable Distributed Data Management | 2009
Ricardo Manuel Pereira Vilaça; Rui Carlos Mendes de Oliveira
The current exponential growth of data calls for massive-scale storage and processing capabilities. Such large volumes of data tend to preclude centralized storage and processing, making extensive and flexible data partitioning unavoidable. This is being acknowledged by several major Internet players embracing the Cloud computing model and offering first-generation remote storage services with simple processing capabilities. In this position paper we present preliminary ideas for the architecture of a flexible, efficient and dependable, fully decentralized object store able to manage very large sets of variable-size objects and to coordinate in-place processing. Our target is large local-area computing facilities composed of tens of thousands of nodes under the same administrative domain. The system should be capable of leveraging massive replication of data to balance read scalability and fault tolerance.
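A standard building block for this kind of decentralized partitioning and replication is consistent hashing, where each object is assigned to the next N distinct nodes clockwise on a hash ring. The sketch below illustrates the generic technique only; it is not the architecture proposed in the paper:

    // Generic consistent-hashing replica placement: flexible partitioning
    // plus N-way replication without any central coordinator.
    import java.util.ArrayList;
    import java.util.List;
    import java.util.SortedMap;
    import java.util.TreeMap;

    class ConsistentHashRing {
        private final SortedMap<Integer, String> ring = new TreeMap<>();

        void addNode(String node) { ring.put(node.hashCode(), node); }

        // Walk the ring clockwise from the object's hash, collecting n nodes.
        List<String> replicasFor(String objectKey, int n) {
            List<String> replicas = new ArrayList<>();
            for (String node : ring.tailMap(objectKey.hashCode()).values())
                if (replicas.size() < n) replicas.add(node);
            for (String node : ring.values())          // wrap around the ring
                if (replicas.size() < n && !replicas.contains(node)) replicas.add(node);
            return replicas;
        }
    }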
Symposium on Reliable Distributed Systems | 2014
Pascal Felber; Marcelo Pasin; Etienne Rivière; Valerio Schiavoni; Pierre Sutra; Fábio Coelho; Rui Pedro Soares de Oliveira; Miguel Matos; Ricardo Manuel Pereira Vilaça
The ability to access and query data stored in multiple versions is an important asset for many applications, such as Web graph analysis, collaborative editing platforms, data forensics, or correlation mining. The storage and retrieval of versioned data requires a specific API and support from the storage layer. The choice of the data structures used to maintain versioned data has a fundamental impact on the performance of insertions and queries. The appropriate data structure also depends on the nature of the versioned data and the nature of the access patterns. In this paper we study the design and implementation space for providing versioning support on top of a distributed key-value store (KVS). We define an API for versioned data access supporting multiple writers and show that a plain KVS does not offer the necessary synchronization power for implementing this API. We leverage the support for listeners at the KVS level and propose a general construction for implementing arbitrary types of data structures for storing and querying versioned data. We explore the design space of versioned data storage ranging from a flat data structure to a distributed sharded index. The resulting system, ALEPH, is implemented on top of an industrial-grade open-source KVS, Infinispan. Our evaluation, based on real-world Wikipedia access logs, studies the performance of each versioning mechanism in terms of load balancing, latency and storage overhead in the context of different access scenarios.
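The general shape of such a versioned data access API can be sketched as follows; the method names are illustrative assumptions, not ALEPH's actual interface:

    // Sketch of a versioned data access API of the kind the paper defines;
    // names are illustrative, not ALEPH's actual interface.
    import java.util.List;

    interface VersionedKVS {
        // Append a new version for the key and return its version number;
        // this must be safe under multiple concurrent writers, which is
        // precisely what plain KVS put/get operations cannot guarantee.
        long putVersion(String key, byte[] value);

        // Point query: the value of the key as of a given version.
        byte[] getAt(String key, long version);

        // Range query over versions, e.g. for Web-graph analysis or forensics.
        List<byte[]> getRange(String key, long fromVersion, long toVersion);
    }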
Symposium on Reliable Distributed Systems | 2009
Ricardo Manuel Pereira Vilaça; José Pereira; Rui Carlos Mendes de Oliveira; José Enrique Armendáriz-Iñigo; José Ramón González de Mendívil
Database clusters based on shared-nothing replication techniques are currently widely accepted as a practical solution to the scalability and availability of the data tier. A key issue when planning such systems is the ability to meet service-level agreements when load spikes occur or cluster nodes fail. This translates into the ability to provision and deploy additional nodes. Many current research efforts focus on designing autonomic controllers to perform such reconfiguration, tuned to react quickly to system changes and spawn new replicas based on resource usage and performance measurements. In contrast, we are concerned with the inherent impact of deploying an additional node to an online cluster, considering both the time required to complete such an action and the impact on resource usage and performance of the cluster as a whole. If noticeable, such impact hinders the practicability of self-management techniques, since it adds an additional dimension that has to be accounted for. Our approach is to systematically benchmark a number of different reconfiguration scenarios to assess the cost of bringing a new replica online. We consider factors such as workload characteristics, incremental and parallel recovery, flow control and outdatedness of the recovering replica. As a result, we show that research should be refocused from optimizing the capture and transmission of changes to applying them, which in a realistic setting dominates the cost of the recovery operation.
Symposium on Reliable Distributed Systems | 2014
Francisco Maia; Miguel Matos; Ricardo Manuel Pereira Vilaça; José Pereira; Rui Carlos Mendes de Oliveira; Etienne Rivière
Very large scale distributed systems pose some of the most interesting research challenges while at the same time being increasingly required by today's applications. The escalation in the number of connected devices and in the amount of data being produced and exchanged demands new data management systems. Although new data stores are continuously being proposed, they are not suitable for very large scale environments. The high levels of churn and constant dynamics found in very large scale systems demand robust, proactive and unstructured approaches to data management. In this paper we propose a novel data store based solely on epidemic (or gossip-based) protocols. It leverages the capacity of these protocols to provide data persistence guarantees even in highly dynamic, massive-scale systems. We provide an open-source prototype of the data store and a corresponding evaluation.
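The core of any epidemic protocol is a periodic round in which each node exchanges state with a few randomly chosen peers, so data spreads and survives churn through probabilistic redundancy. The sketch below shows a minimal push-style round under assumed simplifications; it is a generic illustration, not the paper's protocol:

    // Minimal sketch of a push-style epidemic (gossip) round; a real
    // protocol would reconcile versions rather than blindly overwrite.
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Random;

    class GossipNode {
        final Map<String, byte[]> store = new HashMap<>();
        final List<GossipNode> peers = new ArrayList<>();
        final Random rnd = new Random();

        // One round: push our items to `fanout` randomly chosen peers, so
        // every item ends up replicated on many nodes with high probability.
        void gossipRound(int fanout) {
            for (int i = 0; i < fanout && !peers.isEmpty(); i++) {
                GossipNode peer = peers.get(rnd.nextInt(peers.size()));
                peer.store.putAll(store);
            }
        }
    }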
Symposium on Reliable Distributed Systems | 2016
Rogério Pontes; Francisco Maia; João Paulo; Ricardo Manuel Pereira Vilaça
Online applications and services are now a critical part of our everyday life. Using these services typically requires us to entrust our personal or company's information to a large number of third-party entities. These entities enforce several security measures to avoid unauthorized accesses, but data is still stored on common database systems that are designed without data privacy concerns in mind. As a result, data is vulnerable to anyone with direct access to the database, be it external attackers, malicious insiders, spies or even subpoenas. Building strong data privacy mechanisms on top of common database systems is possible, but has a significant impact on the system's resources, computational capabilities and performance. Notably, the amount of useful computation that can be done over strongly encrypted data is close to none, which defeats the purpose of offloading computation to third-party services. In this paper, we propose to shift the need to trust in the honesty and security of service providers to simply trusting that they will not collude. This is reasonable, as cloud providers, being competitors, do not share data among themselves. We focus on NoSQL databases and present SafeRegions, a novel prototype of a distributed and secure NoSQL database that is built on top of HBase and guarantees strong data privacy while still providing most of HBase's query capabilities. SafeRegions relies on secret sharing and multi-party computation techniques to provide a NoSQL database built on top of multiple, non-colluding service providers that appear as a single one to the user. Strikingly, service providers individually cannot disclose any of the user's data but, together, are able to offer data storage and processing capabilities. Additionally, we evaluate SafeRegions, exposing the performance trade-offs imposed by its security mechanisms, and provide useful insights for future research on performance optimization.
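The intuition behind the non-collusion requirement shows up already in the simplest secret-sharing scheme, additive sharing over a finite field: each provider holds one random-looking share, and only the sum of all shares reveals the value. The sketch below is a simplified illustration of that idea; SafeRegions itself builds on more general secret-sharing and multi-party computation techniques:

    // Additive secret sharing over a prime field: no single share reveals
    // anything about the secret, but all shares together reconstruct it.
    import java.security.SecureRandom;

    public class AdditiveShares {
        static final long P = (1L << 61) - 1;        // prime modulus
        static final SecureRandom RND = new SecureRandom();

        // Split the secret into one share per provider.
        static long[] share(long secret, int providers) {
            long[] shares = new long[providers];
            long sum = 0;
            for (int i = 0; i < providers - 1; i++) {
                shares[i] = Math.floorMod(RND.nextLong(), P);  // random share
                sum = (sum + shares[i]) % P;
            }
            shares[providers - 1] = Math.floorMod(secret - sum, P);
            return shares;
        }

        // Only the sum of all shares yields the secret back.
        static long reconstruct(long[] shares) {
            long sum = 0;
            for (long s : shares) sum = (sum + s) % P;
            return sum;
        }

        public static void main(String[] args) {
            long[] shares = share(42, 3);            // one share per provider
            System.out.println(reconstruct(shares)); // prints 42
        }
    }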