Carlos Baquero | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Carlos Baquero is active.

Explore More

Publication

Featured researches published by Carlos Baquero.

international conference on stabilization safety and security of distributed systems | 2011

Conflict-free replicated data types

Marc Shapiro; Nuno M. Preguiça; Carlos Baquero; Marek Zawirski

Replicating data under Eventual Consistency (EC) allows any replica to accept updates without remote synchronisation. This ensures performance and scalability in large-scale distributed systems (e.g., clouds). However, published EC approaches are ad-hoc and error-prone. Under a formal Strong Eventual Consistency (SEC) model, we study sufficient conditions for convergence. A data type that satisfies these conditions is called a Conflict-free Replicated Data Type (CRDT). Replicas of any CRDT are guaranteed to converge in a self-stabilising manner, despite any number of failures. This paper formalises two popular approaches (state- and operation-based) and their relevant sufficient conditions. We study a number of useful CRDTs, such as sets with clean semantics, supporting both add and remove operations, and consider in depth the more complex Graph data type. CRDT types can be composed to develop large-scale distributed applications, and have interesting theoretical properties.

Information Processing Letters | 2007

Scalable Bloom Filters

Paulo Sérgio Almeida; Carlos Baquero; Nuno M. Preguiça; David Hutchison

Bloom filters provide space-efficient storage of sets at the cost of a probability of false positives on membership queries. The size of the filter must be defined a priori based on the number of elements to store and the desired false positive probability, being impossible to store extra elements without increasing the false positive probability. This leads typically to a conservative assumption regarding maximum set size, possibly by orders of magnitude, and a consequent space waste. This paper proposes Scalable Bloom Filters, a variant of Bloom filters that can adapt dynamically to the number of elements stored, while assuring a maximum false positive probability.

IEEE Communications Surveys and Tutorials | 2015

A Survey of Distributed Data Aggregation Algorithms

Paulo Jesus; Carlos Baquero; Paulo Sérgio Almeida

Distributed data aggregation is an important task, allowing the decentralized determination of meaningful global properties, which can then be used to direct the execution of other applications. The resulting values are derived by the distributed computation of functions like Count, Sum, and Average. Some application examples deal with the determination of the network size, total storage capacity, average load, majorities and many others. In the last decade, many different approaches have been proposed, with different trade-offs in terms of accuracy, reliability, message and time complexity. Due to the considerable amount and variety of aggregation algorithms, it can be difficult and time consuming to determine which techniques will be more appropriate to use in specific settings, justifying the existence of a survey to aid in this task. This work reviews the state of the art on distributed data aggregation algorithms, providing three main contributions. First, it formally defines the concept of aggregation, characterizing the different types of aggregation functions. Second, it succinctly describes the main aggregation techniques, organizing them in a taxonomy. Finally, it provides some guidelines toward the selection and use of the most relevant techniques, summarizing their principal characteristics.

IEEE Transactions on Parallel and Distributed Systems | 2012

Extrema Propagation: Fast Distributed Estimation of Sums and Network Sizes

Carlos Baquero; Paulo Sérgio Almeida; Raquel Menezes; Paulo Jesus

Aggregation of data values plays an important role on distributed computations, in particular, over peer-to-peer and sensor networks, as it can provide a summary of some global system property and direct the actions of self-adaptive distributed algorithms. Examples include using estimates of the network size to dimension distributed hash tables or estimates of the average system load to direct load balancing. Distributed aggregation using nonidempotent functions, like sums, is not trivial as it is not easy to prevent a given value from being accounted for multiple times; this is especially the case if no centralized algorithms or global identifiers can be used. This paper introduces Extrema Propagation, a probabilistic technique for distributed estimation of the sum of positive real numbers. The technique relies on the exchange of duplicate insensitive messages and can be applied in flood and/or epidemic settings, where multipath routing occurs; it is tolerant of message loss; it is fast, as the number of message exchange steps can be made just slightly above the theoretical minimum; and it is fully distributed, with no single point of failure and the result produced at every node.

international conference on tools with artificial intelligence | 2010

Genetic Algorithm with Local Search for Community Mining in Complex Networks

Di Jin; Dongxiao He; Dayou Liu; Carlos Baquero

Detecting communities from complex networks has triggered considerable attention in several application domains. Targeting this problem, a local search based genetic algorithm (GALS) which employs a graph-based representation (LAR) has been proposed in this work. The core of the GALS is a local search based mutation technique. Aiming to overcome the drawbacks of the existing mutation methods, a concept called marginal gene has been proposed, and then an effective and efficient mutation method, combined with a local search strategy which is based on the concept of marginal gene, has also been proposed by analyzing the modularity function. Moreover, in this paper the percolation theory on ER random graphs is employed to further clarify the effectiveness of LAR presentation; A Markov random walk based method is adopted to produce an accurate and diverse initial population; the solution space of GALS will be significantly reduced by using a graph based mechanism. The proposed GALS has been tested on both computer-generated and real-world networks, and compared with some competitive community mining algorithms. Experimental result has shown that GALS is highly effective and efficient for discovering community structure.

international conference on principles of distributed systems | 2008

Interval Tree Clocks

Paulo Sérgio Almeida; Carlos Baquero; Victor Fonte

Causality tracking mechanisms, such as vector clocks and version vectors, rely on mappings from globally unique identifiers to integer counters. In a system with a well known set of entities these ids can be preconfigured and given distinct positions in a vector or distinct names in a mapping. Id management is more problematic in dynamic systems, with large and highly variable number of entities, being worsened when network partitions occur. Present solutions for causality tracking are not appropriate to these increasingly common scenarios. In this paper we introduce Interval Tree Clocks , a novel causality tracking mechanism that can be used in scenarios with a dynamic number of entities, allowing a completely decentralized creation of processes/replicas without need for global identifiers or global coordination. The mechanism has a variable size representation that adapts automatically to the number of existing entities, growing or shrinking appropriately. The representation is so compact that the mechanism can even be considered for scenarios with a fixed number of entities, which makes it a general substitute for vector clocks and version vectors.

symposium on reliable distributed systems | 2014

SwiftCloud: Fault-Tolerant Geo-Replication Integrated all the Way to the Client Machine

Nuno M. Preguiça; Marek Zawirski; Annette Bieniusa; Sérgio Duarte; Valter Balegas; Carlos Baquero; Marc Shapiro

Client-side logic and storage are increasingly used in web and mobile applications to improve response time and availability. Current approaches tend to be ad-hoc and poorly integrated with the server-side logic. We present a principled approach to integrate client-and server-side storage. We support both mergeable and strongly consistent transactions that target either client or server replicas and provide access to causally-consistent snapshots efficiently. In the presence of infrastructure faults, a client-assisted failover solution allows client execution to resume immediately and seamlessly access consistent snapshots without waiting. We implement this approach in SwiftCloud, the first transactional system to bring geo-replication all the way to the client machine. Example applications show that our programming model is useful across a range of application areas. Our experimental evaluation shows that SwiftCloud provides better fault tolerance and at the same time can improve both latency and throughput by up to an order of magnitude, compared to classical geo-replication techniques.

distributed applications and interoperable systems | 2009

Fault-Tolerant Aggregation by Flow Updating

Paulo Jesus; Carlos Baquero; Paulo Sérgio Almeida

Data aggregation plays an important role in the design of scalable systems, allowing the determination of meaningful system-wide properties to direct the execution of distributed applications. In the particular case of wireless sensor networks, data collection is often only practicable if aggregation is performed. Several aggregation algorithms have been proposed in the last few years, exhibiting different properties in terms of accuracy, speed and communication tradeoffs. Nonetheless, existing approaches are found lacking in terms of fault tolerance. In this paper, we introduce a novel fault-tolerant averaging based data aggregation algorithm. It tolerates substantial message loss (link failures), while competing algorithms in the same class can be affected by a single lost message. The algorithm is based on manipulating flows (in the graph theoretical sense), that are updated using idempotent messages, providing it with unique robustness capabilities. Furthermore, evaluation results obtained by comparing it with other averaging approaches have revealed that it outperforms them in terms of time and message complexity.

international conference on distributed computing systems | 2002

Version stamps-decentralized version vectors

Paulo Sérgio Almeida; Carlos Baquero; Victor Fonte

Version vectors and their variants play a central role in update tracking in optimistic distributed systems. Existing mechanisms for a variable number of participants use a mapping from identities to integers, and rely on some form of global configuration or distributed naming protocol to assign unique identifiers to each participant. These approaches are incompatible with replica creation under arbitrary partitions, a typical mode of operation in mobile or poorly connected environments. We present an update tracking mechanism that overcomes this limitation; it departs from the traditional mapping and avoids the use of integer counters, while providing all the functionality of version vectors in what concerns version tracking.

international symposium on distributed computing | 2004

Bounded version vectors

Jose B. Almeida; Paulo Sérgio Almeida; Carlos Baquero

Version vectors play a central role in update tracking under optimistic distributed systems, allowing the detection of obsolete or inconsistent versions of replicated data. Version vectors do not have a bounded representation; they are based on integer counters that grow indefinitely as updates occur. Existing approaches to this problem are scarce; the mechanisms proposed are either unbounded or operate only under specific settings. This paper examines version vectors as a mechanism for data causality tracking and clarifies their role with respect to vector clocks. Then, it introduces bounded stamps and proves them to be a correct alternative to integer counters in version vectors. The resulting mechanism, bounded version vectors, represents the first bounded solution to data causality tracking between replicas subject to local updates and pairwise symmetrical synchronization.

Explore More