
Publication


Featured research published by Sérgio Esteves.


International Conference on Parallel Processing | 2012

Quality-of-service for consistency of data geo-replication in cloud computing

Sérgio Esteves; João Nuno de Oliveira e Silva; Luís Veiga

Today we are increasingly dependent on critical data stored in cloud data centers across the world. To deliver high availability and improved performance, different replication schemes are used to maintain consistency among replicas. With classical consistency models, performance is necessarily degraded, and thus most highly scalable cloud data centers sacrifice consistency to some extent in exchange for lower latencies to end-users. Moreover, those cloud systems blindly allow stale data to exist for some constant period of time and disregard the semantics and importance the data might have, which could be used to gear consistency more wisely, combining stronger and weaker levels of consistency. To tackle this inherent and well-studied trade-off between availability and consistency, we propose VFC3, a novel consistency model for data replicated across data centers, with framework and library support to enforce increasing degrees of consistency for different types of data (based on their semantics). It targets cloud tabular data stores, offering rationalization of resources (especially bandwidth) and improved QoS (performance, latency and availability) by providing strong consistency where it matters most and relaxing it on less critical classes or items of data.
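To make the per-class idea concrete, the following minimal Java sketch shows how consistency bounds attached to data classes might gate replica synchronization. The class names, bound values and method signatures are illustrative assumptions, not the actual VFC3 API.

```java
// Minimal sketch of per-class consistency bounds in the spirit of VFC3.
// All names (DataClass, ConsistencyBound, needsSynchronization) are
// illustrative assumptions, not the actual VFC3 interfaces.
import java.util.EnumMap;
import java.util.Map;

public class ConsistencyBoundsSketch {

    // Hypothetical classes of data, ordered from most to least critical.
    enum DataClass { CRITICAL, IMPORTANT, BEST_EFFORT }

    // A bound on how stale a replica may become before it must synchronize.
    record ConsistencyBound(long maxStalenessMillis, int maxPendingUpdates) {}

    private final Map<DataClass, ConsistencyBound> bounds = new EnumMap<>(DataClass.class);

    ConsistencyBoundsSketch() {
        // Stronger consistency where it matters most, weaker elsewhere.
        bounds.put(DataClass.CRITICAL,    new ConsistencyBound(0, 0));        // always fresh
        bounds.put(DataClass.IMPORTANT,   new ConsistencyBound(5_000, 10));   // small staleness window
        bounds.put(DataClass.BEST_EFFORT, new ConsistencyBound(60_000, 500)); // eventual consistency
    }

    // Returns true when the replica for this data class has exceeded its bound
    // and an inter-data-center synchronization should be scheduled.
    boolean needsSynchronization(DataClass cls, long stalenessMillis, int pendingUpdates) {
        ConsistencyBound b = bounds.get(cls);
        return stalenessMillis > b.maxStalenessMillis() || pendingUpdates > b.maxPendingUpdates();
    }

    public static void main(String[] args) {
        ConsistencyBoundsSketch sketch = new ConsistencyBoundsSketch();
        System.out.println(sketch.needsSynchronization(DataClass.CRITICAL, 100, 1));    // true
        System.out.println(sketch.needsSynchronization(DataClass.BEST_EFFORT, 100, 1)); // false
    }
}
```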


International Journal of Big Data Intelligence | 2014

Towards quality-of-service driven consistency for Big Data management

Álvaro García-Recuero; Sérgio Esteves; Luís Veiga

With the advent of Cloud Computing, Big Data management has become a fundamental challenge in the deployment and operation of distributed, highly available and fault-tolerant storage systems such as the HBase extensible record store. These systems can provide support for geo-replication, which brings with it the issue of data consistency among distributed sites. In order to offer a best-in-class service to applications, one wants to maximise performance while minimising latency; in terms of data replication, that means incurring as little latency as possible when moving data between distant data centres. Traditional consistency models introduce a significant problem for systems architects, which is especially important in cases where large amounts of data need to be replicated across wide-area networks. In such scenarios it might be suitable to use eventual consistency: even though not always convenient, latency can be partly reduced by trading it for consistency guarantees, so that data transfers do not impact performance. In contrast, this work proposes a broader range of data semantics for consistency that prioritises critical data at the cost of a minimal latency overhead on the remaining, non-critical updates. Finally, we show how these semantics can help in finding an optimal data replication strategy that achieves just the required level of data consistency under low latency and more efficient network bandwidth utilisation.
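A simple way to picture the prioritisation of replication traffic is a priority-ordered outbound queue between data centres. The sketch below is an illustration under that assumption; the record fields and method names are hypothetical and do not come from the paper or from HBase.

```java
// Illustrative sketch (not the paper's implementation) of prioritising
// which updates are shipped first between distant data centres.
import java.util.Comparator;
import java.util.PriorityQueue;

public class PrioritisedReplicationSketch {

    // An update waiting to be replicated; "priority" would come from the
    // data semantics attached to the row or column family (assumed here).
    record PendingUpdate(String rowKey, int priority, long enqueuedAtMillis) {}

    // Critical updates (lower priority value) leave the queue first; ties are
    // broken by age so non-critical updates only pay a bounded latency overhead.
    private final PriorityQueue<PendingUpdate> outbound = new PriorityQueue<>(
            Comparator.comparingInt(PendingUpdate::priority)
                      .thenComparingLong(PendingUpdate::enqueuedAtMillis));

    void enqueue(PendingUpdate u) { outbound.add(u); }

    // Drains up to `budget` updates per replication round, so that bandwidth
    // is spent on the most important data first.
    void replicateRound(int budget) {
        for (int i = 0; i < budget && !outbound.isEmpty(); i++) {
            PendingUpdate u = outbound.poll();
            System.out.println("replicating " + u.rowKey() + " (priority " + u.priority() + ")");
        }
    }

    public static void main(String[] args) {
        PrioritisedReplicationSketch r = new PrioritisedReplicationSketch();
        r.enqueue(new PendingUpdate("order:42",  0, System.currentTimeMillis())); // critical
        r.enqueue(new PendingUpdate("view:1001", 9, System.currentTimeMillis())); // best effort
        r.replicateRound(1); // ships the critical update first
    }
}
```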


IEEE International Conference on Cloud Computing Technology and Science | 2013

Quality-of-data for consistency levels in geo-replicated cloud data stores

Álvaro García-Recuero; Sérgio Esteves; Luís Veiga

Cloud computing has recently emerged as a key technology to provide individuals and companies with access to remote computing and storage infrastructures. In order to achieve highly available yet high-performing services, cloud data stores rely on data replication. However, replication brings with it the issue of consistency. Given that data are replicated in multiple geographically distributed data centers, and to meet the increasing requirements of distributed applications, many cloud data stores adopt eventual consistency and therefore allow data-intensive operations to run under low latency. This comes at the cost of data staleness. In this paper, we prioritize data replication based on a set of flexible data semantics that can best suit all types of Big Data applications, avoiding overloading both network and systems during long periods of disconnection or network partitions. We integrated these data semantics into the core architecture of a well-known NoSQL data store, HBase, leveraging a three-dimensional vector-field model (over timeliness, number of pending updates and divergence bounds) to provision data selectively, in an on-demand fashion, to applications. This enhances the former consistency model by providing the required levels of consistency to different applications, such as social networks or e-commerce sites, where the priority of updates also differs. In addition, our implementation of the model in HBase allows updates to be tagged and grouped atomically in logical batches, akin to transactions, ensuring atomic changes and the correctness of updates as they are propagated.
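The three-dimensional bound mentioned in the abstract can be illustrated with a small check over the three dimensions. The sketch below is a simplified stand-in: the record names, field names and sample values are assumptions, not the paper's or HBase's API.

```java
// Sketch of the three-dimensional bound described in the abstract
// (timeliness, number of pending updates, value divergence); names and
// numbers are assumptions for illustration only.
public class VectorFieldBoundSketch {

    // Maximum tolerated divergence per data container, one limit per dimension.
    record QodVector(long maxSecondsStale, int maxPendingUpdates, double maxValueDivergence) {}

    // Currently observed divergence of a replica, for the same three dimensions.
    record ObservedState(long secondsStale, int pendingUpdates, double valueDivergence) {}

    // A replica must be refreshed as soon as ANY dimension exceeds its bound.
    static boolean mustPropagate(QodVector bound, ObservedState s) {
        return s.secondsStale() > bound.maxSecondsStale()
            || s.pendingUpdates() > bound.maxPendingUpdates()
            || s.valueDivergence() > bound.maxValueDivergence();
    }

    public static void main(String[] args) {
        QodVector socialFeed = new QodVector(60, 100, 0.20);  // tolerant container
        QodVector inventory  = new QodVector(1, 1, 0.00);     // near-strong consistency
        ObservedState state  = new ObservedState(5, 3, 0.01);
        System.out.println(mustPropagate(socialFeed, state)); // false: still within bounds
        System.out.println(mustPropagate(inventory, state));  // true: refresh now
    }
}
```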


The Computer Journal | 2016

WaaS: Workflow-as-a-Service for the Cloud with Scheduling of Continuous and Data-Intensive Workflows

Sérgio Esteves; Luís Veiga

Data-intensive and long-lasting applications running in the form of workflows are increasingly being dispatched to cloud computing systems. Current scheduling approaches for graphs of dependencies fail to deliver high resource efficiency while keeping computation costs low, especially for continuous data processing workflows, where the scheduler does not perform any reasoning about the impact new input data may have on the workflow's final output. To face such a challenge, we introduce a new scheduling criterion, Quality-of-Data (QoD), which describes the data requirements that justify the triggering of tasks in workflows. Based on the QoD notion, we propose a novel service-oriented scheduler planner for continuous data processing workflows that is capable of enforcing QoD constraints and guiding the scheduling to attain resource efficiency, overall controlled performance and task prioritization. To contrast the advantages of our scheduling model against others, we developed WaaS (Workflow-as-a-Service), a workflow coordinator system for the Cloud where data is shared among tasks via a cloud columnar database.
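A QoD-gated trigger can be pictured as a per-task gate that only fires once enough significant new input has accumulated. The following sketch assumes hypothetical names (QodConstraint, TaskGate, shouldTrigger) and thresholds purely for illustration; it is not the WaaS scheduler.

```java
// Hedged sketch of QoD-gated task triggering in a continuous workflow:
// a task only runs once enough new input has accumulated, or after a
// maximum waiting time to bound staleness.
public class QodTriggerSketch {

    // QoD constraint for one workflow task: how much accumulated change
    // on its input data justifies (re-)executing it.
    record QodConstraint(int minNewRows, long maxWaitMillis) {}

    static final class TaskGate {
        private final QodConstraint qod;
        private int newRows = 0;
        private long lastRunMillis = System.currentTimeMillis();

        TaskGate(QodConstraint qod) { this.qod = qod; }

        // Called whenever upstream tasks write new rows into the shared store.
        void onNewRows(int count) { newRows += count; }

        // The scheduler polls this to decide whether the task should fire now.
        boolean shouldTrigger(long nowMillis) {
            boolean enoughData = newRows >= qod.minNewRows();
            boolean tooStale   = nowMillis - lastRunMillis >= qod.maxWaitMillis();
            if (enoughData || tooStale) { newRows = 0; lastRunMillis = nowMillis; return true; }
            return false;
        }
    }

    public static void main(String[] args) {
        TaskGate aggregate = new TaskGate(new QodConstraint(1000, 30_000));
        aggregate.onNewRows(200);
        System.out.println(aggregate.shouldTrigger(System.currentTimeMillis())); // false: too little change
        aggregate.onNewRows(900);
        System.out.println(aggregate.shouldTrigger(System.currentTimeMillis())); // true: threshold reached
    }
}
```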


Journal of Parallel and Distributed Computing | 2015

Incremental dataflow execution, resource efficiency and probabilistic guarantees with Fuzzy Boolean nets

Sérgio Esteves; João Nuno de Oliveira e Silva; João Paulo Carvalho; Luís Veiga

Currently, there is a strong need for organizations to analyze and process ever-increasing volumes of data in order to meet real-time processing demands. Such continuous and data-intensive processing is often achieved through the composition of complex data-intensive workflows (i.e., dataflows). Dataflow management systems typically enforce strict temporal synchronization across the various processing steps. Non-synchronous behavior often has to be explicitly programmed on an ad-hoc basis, which requires additional code and thus introduces the possibility of errors. Moreover, in a large set of scenarios for continuous and incremental processing, the output of a dataflow application at each execution can differ very little from the previous execution, and therefore resources, energy and computational power are unknowingly wasted. To face this lack of efficiency, transparency and generality, we introduce the notion of Quality-of-Data (QoD), which describes the level of changes required on a data store to trigger processing steps, so that dataflow (re-)execution is deferred until its outcome would exhibit a significant and meaningful variation, within a specified freshness limit. Based on the QoD notion, we propose a novel dataflow model, with a supporting framework (Fluxy), for orchestrating data-intensive processing steps that communicate data via a NoSQL store and whose triggering semantics is driven by dynamic QoD constraints, defined automatically for different datasets by means of Fuzzy Boolean Nets. These nets give probabilistic guarantees about the prediction of the cumulative error between consecutive dataflow executions. With Fluxy, we demonstrate how dataflows can be leveraged to respond to quality boundaries (which can be seen as SLAs) to deliver controlled and augmented performance, rationalization of resources, and task prioritization. Highlights: we offer a framework for resource-efficient continuous and data-intensive workflows; we are able to learn correlations between dataflow input and final output; we avoid re-executions when input data is predicted not to impact the output; we ensure dataflow correctness within a small error constant; and we achieve controlled performance, task prioritization and high resource efficiency.
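The skip/re-execute decision can be sketched with a much simpler stand-in for the paper's predictor: below, a crude moving-average estimate of output impact replaces the Fuzzy Boolean Net, only to show where the prediction plugs into the control flow. All names and numbers are assumptions.

```java
// Simplified illustration of the skip/re-execute decision. The paper learns
// the input-to-output impact with Fuzzy Boolean Nets; here a trivial moving
// average stands in for that predictor, purely to show the control flow.
import java.util.ArrayDeque;
import java.util.Deque;

public class IncrementalExecutionSketch {

    // Keeps the last few observed ratios of (output change / input change)
    // and uses their average as a crude impact estimate (stand-in for the net).
    static final class ImpactEstimator {
        private final Deque<Double> history = new ArrayDeque<>();

        void observe(double inputDelta, double outputDelta) {
            if (inputDelta > 0) {
                if (history.size() == 10) history.removeFirst();
                history.addLast(outputDelta / inputDelta);
            }
        }

        double predictOutputDelta(double inputDelta) {
            double avg = history.stream().mapToDouble(Double::doubleValue).average().orElse(1.0);
            return avg * inputDelta;
        }
    }

    public static void main(String[] args) {
        ImpactEstimator estimator = new ImpactEstimator();
        estimator.observe(100.0, 0.5);   // past runs: large input change, tiny output change
        estimator.observe(120.0, 0.6);

        double pendingInputDelta = 80.0;
        double freshnessLimit = 2.0;     // maximum tolerated output error before re-running

        // Re-execute the dataflow step only if the predicted output change
        // would exceed the tolerated error; otherwise skip and save resources.
        boolean rerun = estimator.predictOutputDelta(pendingInputDelta) > freshnessLimit;
        System.out.println(rerun ? "re-execute step" : "skip re-execution");
    }
}
```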


Journal of Internet Services and Applications | 2013

Fluχ: a quality-driven dataflow model for data intensive computing

Sérgio Esteves; João Nuno de Oliveira e Silva; Luís Veiga

Today, there is a growing need for organizations to continuously analyze and process large waves of incoming data from the Internet. Such data processing schemes are often governed by complex dataflow systems, which are deployed atop highly scalable infrastructures that need to manage data efficiently in order to enhance performance and reduce costs. Current workflow management systems enforce strict temporal synchronization among the various processing steps; however, this is not the most desirable behavior in a large number of scenarios. For example, for dataflows that continuously analyze data upon the insertion or update of entries in a data store, it would be wise to assess, before triggering the dataflow, the level of modification in the data that would minimize the number of executions (processing steps), reducing overhead and improving performance while keeping the dataflow processing results within certain coverage and freshness limits. Towards this end, we introduce the notion of Quality-of-Data (QoD), which describes the level of modification necessary on a data store to trigger processing steps, thereby conveying the level of performance specified through data requirements. This notion can be especially beneficial in cloud computing, where a dataflow computing service (SaaS) may provide different QoD levels for different budgets. In this article we propose Fluχ, a novel dataflow model, with framework and programming library support, for orchestrating data-based processing steps over a NoSQL data store, whose triggering is based on the evaluation and dynamic enforcement of QoD constraints that are defined (and possibly adjusted automatically) for different sets of data. With Fluχ we demonstrate how dataflows can be leveraged to respond to quality boundaries that bring controlled and augmented performance, rationalization of resources, and task prioritization.
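The budget angle mentioned in the abstract (different QoD levels for different budgets) could look like the mapping sketched below. The plan names, thresholds and method names are hypothetical and serve only to illustrate the idea.

```java
// Illustrative mapping from a customer's budget tier to the QoD level a
// dataflow service would enforce; tier names and numbers are assumptions.
import java.util.Map;

public class QodPerBudgetSketch {

    // A QoD level: how much change on the underlying data store is required
    // before the dataflow is (re-)triggered. Smaller values mean fresher results.
    record QodLevel(int minUpdatedEntries, long maxDelayMillis) {}

    private static final Map<String, QodLevel> PLANS = Map.of(
            "premium",  new QodLevel(1,     1_000),   // near-immediate triggering
            "standard", new QodLevel(500,   60_000),  // batched, moderately fresh
            "economy",  new QodLevel(5_000, 600_000)  // coarse, cheapest to run
    );

    static QodLevel levelFor(String plan) {
        return PLANS.getOrDefault(plan, PLANS.get("economy"));
    }

    public static void main(String[] args) {
        QodLevel level = levelFor("standard");
        System.out.println("trigger after " + level.minUpdatedEntries()
                + " updates or " + level.maxDelayMillis() + " ms");
    }
}
```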


International Conference on Management of Data | 2017

Empowering Stream Processing through Edge Clouds

Sérgio Esteves; Nico Janssens; Bart Theeten; Luís Veiga

CHive is a new streaming analytics platform for running distributed SQL-style queries on edge clouds. However, CHive is currently tightly coupled to a specific stream processing system (SPS), Apache Storm. In this paper we decouple the CHive query planner and optimizer from the runtime environment and extend the latter to support pluggable runtimes through a common API; as runtimes, we currently support Apache Spark and Flink streaming. The fundamental contribution of this paper is to assess the cost of employing inter-stream parallelism in SPSs. Experimental evaluation indicates that we can enable popular SPSs to be distributed on edge clouds with stable overhead in terms of throughput.
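A pluggable-runtime layer of the kind described here typically amounts to a small engine-agnostic interface plus one adapter per engine. The sketch below illustrates that shape under assumed names (StreamRuntime, StormAdapter, FlinkAdapter); it is not the CHive API.

```java
// Sketch of a pluggable-runtime abstraction: the query planner emits a
// logical plan once, and per-engine adapters turn it into jobs. Interface
// and class names are assumptions, not the CHive implementation.
public class PluggableRuntimeSketch {

    // The common contract every streaming runtime adapter must implement.
    interface StreamRuntime {
        void submit(String logicalPlan);   // deploy a planned (sub-)query
        String name();
    }

    // Adapters for concrete engines; real adapters would translate the plan
    // into Storm topologies, Spark Streaming jobs or Flink dataflows.
    static final class StormAdapter implements StreamRuntime {
        public void submit(String plan) { System.out.println("Storm topology for: " + plan); }
        public String name() { return "storm"; }
    }
    static final class FlinkAdapter implements StreamRuntime {
        public void submit(String plan) { System.out.println("Flink job for: " + plan); }
        public String name() { return "flink"; }
    }

    // The planner/optimizer stays engine-agnostic: it only sees the interface.
    static void deploy(String logicalPlan, StreamRuntime runtime) {
        System.out.println("deploying on " + runtime.name());
        runtime.submit(logicalPlan);
    }

    public static void main(String[] args) {
        String plan = "SELECT region, COUNT(*) FROM events GROUP BY region";
        deploy(plan, new StormAdapter());
        deploy(plan, new FlinkAdapter());
    }
}
```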


OTM Confederated International Conferences "On the Move to Meaningful Internet Systems" | 2017

SmartGC: Online Memory Management Prediction for PaaS Cloud Models

José Simão; Sérgio Esteves; Luís Veiga

In Platform-as-a-Service clouds (public and private), efficient resource management of several managed runtimes involves limiting the heap size of some VMs so that extra memory can be assigned to higher-priority workloads. However, this should not be done in an application-oblivious way, because performance degradation must be minimized. Also, each tenant tends to repeat the execution of applications with similar memory-usage patterns, giving the opportunity to reuse parameters known to work well for a given workload. This paper presents SmartGC, a system that determines, at runtime, the best values for critical heap management parameters of JVMs. SmartGC comprises two main phases: (1) a training phase, where it collects, under different heap resizing policies, representative execution metrics during the lifespan of a workload; and (2) an execution phase, where it matches the execution parameters of new workloads against those of already seen workloads and enforces the best heap resizing policy. Distinctly from other works, this is done without a prior analysis of unknown workloads. Using representative applications, we show that our approach can lead to memory savings, even when compared with a state-of-the-art virtual machine, OpenJDK. Furthermore, we show that we can predict the best heap policy with high accuracy, in a relatively short period of time, and with negligible runtime overhead. Although we focus on heap resizing, the same approach could also be used to adapt other parameters or even the GC algorithm.
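The execution-phase matching can be pictured as a nearest-neighbour lookup over a few runtime metrics, returning the policy that worked best for the most similar known workload. The sketch below assumes hypothetical metric names and policy labels; it does not reproduce SmartGC's actual features or matching algorithm.

```java
// Sketch of "match the new workload to a previously seen one": a nearest-
// neighbour lookup over a few execution metrics selects the heap resizing
// policy that worked best before. Metric and policy names are assumptions.
import java.util.List;

public class PolicyMatchSketch {

    // A few execution metrics sampled early in a workload's life.
    record Metrics(double allocRateMBs, double liveSetMB, double gcTimeRatio) {}

    // A workload observed during the training phase and its best policy.
    record KnownWorkload(Metrics metrics, String bestHeapPolicy) {}

    static double distance(Metrics a, Metrics b) {
        double d1 = a.allocRateMBs() - b.allocRateMBs();
        double d2 = a.liveSetMB()    - b.liveSetMB();
        double d3 = a.gcTimeRatio()  - b.gcTimeRatio();
        return Math.sqrt(d1 * d1 + d2 * d2 + d3 * d3);
    }

    // Execution phase: pick the policy of the closest known workload.
    static String choosePolicy(Metrics observed, List<KnownWorkload> training) {
        KnownWorkload best = training.get(0);
        for (KnownWorkload w : training)
            if (distance(observed, w.metrics()) < distance(observed, best.metrics())) best = w;
        return best.bestHeapPolicy();
    }

    public static void main(String[] args) {
        List<KnownWorkload> training = List.of(
                new KnownWorkload(new Metrics(50, 200, 0.02), "shrink-aggressively"),
                new KnownWorkload(new Metrics(400, 1500, 0.15), "grow-generously"));
        System.out.println(choosePolicy(new Metrics(60, 250, 0.03), training));
        // prints "shrink-aggressively": the new workload resembles the first one
    }
}
```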


The Computer Journal | 2015

C3P: A Re-Configurable Framework to Design Cycle-sharing Computing Cloud Platforms

Sérgio Esteves; Paulo Ferreira; Luís Veiga

A new era of High-Performance Computing has been emerging over the last decade. The overabundance of resources lying idle throughout the Internet, for long periods of time, calls for resource-sharing infrastructures operating in the settings of the Cluster, Grid, P2P and Cloud. Many organizations own grids, frequently underutilized, but impose several restrictions on their usage by outside users. Despite the already extensive work in the field of Grid and Cloud computing, no solution has been successful in reaching typical home users and their resource-intensive commodity applications, especially in an open environment with no cost and low access barriers (e.g. authentication, configuration). We propose C3P, a comprehensive distributed cycle-sharing framework that enables the sharing of computational resources in a decentralized and free computing cloud platform, across large-scale networks, and thus improves the performance of commonly used applications. C3P encompasses the following activities: application adaptation, job scheduling, resource discovery, reliability of job results and overlay network management. The evaluation of C3P shows that any ordinary Internet user is able to easily and effectively take advantage of remote resources, namely CPU cycles, for their own benefit, or provide spare cycles to other users, getting incentives in return, in a free yet fair and managed global infrastructure.
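The incentive aspect mentioned above can be illustrated with a toy credit ledger in which donating cycles earns credit that can later be spent on remote jobs. This is only a sketch of the general idea; the credit scheme, names and numbers are assumptions and not C3P's actual design.

```java
// Toy illustration of cycle-sharing incentives: users who donate spare CPU
// cycles earn credits they can later spend to run their own jobs remotely.
import java.util.HashMap;
import java.util.Map;

public class CycleCreditsSketch {

    private final Map<String, Long> credits = new HashMap<>();

    // Donating cycles to someone else's job earns credits proportional to CPU time.
    void recordDonation(String user, long cpuSeconds) {
        credits.merge(user, cpuSeconds, Long::sum);
    }

    // Submitting a job spends credits; users with no credit could still run,
    // but would be scheduled with lower priority (not modelled here).
    boolean trySpend(String user, long cpuSecondsNeeded) {
        long balance = credits.getOrDefault(user, 0L);
        if (balance < cpuSecondsNeeded) return false;
        credits.put(user, balance - cpuSecondsNeeded);
        return true;
    }

    public static void main(String[] args) {
        CycleCreditsSketch ledger = new CycleCreditsSketch();
        ledger.recordDonation("alice", 3_600);               // alice shared one CPU-hour
        System.out.println(ledger.trySpend("alice", 1_800)); // true: she can use half of it
        System.out.println(ledger.trySpend("bob", 10));      // false: bob has not contributed yet
    }
}
```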


International Conference on Service-Oriented Computing | 2013

Planning and Scheduling Data Processing Workflows in the Cloud with Quality-of-Data Constraints

Sérgio Esteves; Luís Veiga

Data-intensive and long-lasting applications running in the form of workflows are increasingly being dispatched to cloud computing systems. Current scheduling approaches for graphs of dependencies fail to deliver high resource efficiency while keeping computation costs low, especially for continuous data processing workflows, where the scheduler does not perform any reasoning about the impact new input data may have on the workflow's final output. To face this challenge, we introduce a new scheduling criterion, Quality-of-Data (QoD), which describes the data requirements that warrant the triggering of tasks in workflows. Based on the QoD notion, we propose a novel service-oriented scheduler planner for continuous data processing workflows that is capable of enforcing QoD constraints and guiding the scheduling to attain resource efficiency, overall controlled performance, and task prioritization. To contrast the advantages of our scheduling model against others, we developed WaaS (Workflow-as-a-Service), a workflow coordinator system for the Cloud where data is shared among tasks via a cloud columnar database.

Collaboration

Top co-authors of Sérgio Esteves:

Paulo Ferreira, Instituto Superior Técnico
Helena Galhardas, Instituto Superior Técnico
José Simão, Instituto Superior de Engenharia de Lisboa