Andrey Brito

Federal University of Campina Grande

Publication

Featured research published by Andrey Brito.

ieee international conference on cloud computing technology and science | 2011

Scalable and Low-Latency Data Processing with Stream MapReduce

Andrey Brito; André Martin; Thomas Knauth; Stephan Creutz; Diogo Becker; Stefan Weigert; Christof Fetzer

We present StreamMapReduce, a data processing approach that combines ideas from the popular MapReduce paradigm and recent developments in Event Stream Processing. We adopted the simple and scalable programming model of MapReduce and added continuous, low-latency data processing capabilities previously found only in Event Stream Processing systems. This combination leads to a system that is efficient and scalable, but at the same time simple from the user's point of view. For latency-critical applications, our system allows a hundred-fold improvement in response time. Moreover, when throughput is considered, our system offers a ten-fold per-node throughput increase in comparison to Hadoop. As a result, we show that our approach addresses classes of applications that are not supported by any other existing system and that the MapReduce paradigm is indeed suitable for scalable processing of real-time data streams.
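As a rough illustration of the idea in this abstract, the sketch below applies a MapReduce-style mapper and reducer continuously to a stream, emitting one result per tumbling time window instead of waiting for a batch job to finish. It is not the StreamMapReduce API: the function names, the `(timestamp, payload)` event shape, the tumbling-window choice, and the assumption that events arrive in timestamp order are all hypothetical.

```python
import operator
from typing import Callable, Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp in seconds, payload).
Event = Tuple[float, str]


def stream_map_reduce(
    events: Iterable[Event],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[int, int], int],
    window_size: float,
) -> Iterator[Tuple[float, Dict[str, int]]]:
    """Yield (window_start, {key: reduced_value}) for each tumbling window.

    Assumes events arrive in timestamp order (an assumption of this sketch).
    """
    current_window = None
    state: Dict[str, int] = {}
    for timestamp, payload in events:
        window_start = (timestamp // window_size) * window_size
        if current_window is not None and window_start != current_window:
            # The previous window is complete: emit its result and start fresh.
            yield current_window, state
            state = {}
        current_window = window_start
        # Map the event, then fold each pair into the per-key state right away.
        for key, value in mapper(payload):
            state[key] = reducer(state[key], value) if key in state else value
    if current_window is not None:
        yield current_window, state


def word_mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Classic word-count mapper: one (word, 1) pair per word."""
    for word in line.split():
        yield word, 1


events = [(0.5, "a b"), (1.2, "a"), (5.1, "b b")]
results = list(stream_map_reduce(events, word_mapper, operator.add, 5.0))
```

Because the reducer is applied incrementally as each event arrives, the result for a window is ready as soon as the window closes, which is the low-latency property the abstract contrasts with batch processing.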

distributed event-based systems | 2008

Speculative out-of-order event processing with software transaction memory

Andrey Brito; Christof Fetzer; Heiko Sturzrehm; Pascal Felber

In event stream applications, events flow through a network of components that perform various types of operations, e.g., filtering, aggregation, transformation. When the operation only depends on the input events, one can trivially parallelize its processing by replicating the associated components. This is not possible, however, with stateful components or when there exist dependencies between the events. Parallel versions of a number of simple stream mining operators have been designed, but, in general, complex and user-defined operators are limited by single thread performance. In this paper, we propose leveraging the processing capabilities of multi-core processors to improve the efficiency of stateful components using optimistic parallelization techniques (as provided by transactional memory). We show that, even though some speculative event executions might need to be disregarded, the overall throughput increases noticeably in the general case and latency can be reduced by pre-processing out-of-order events. Moreover, we show how simple conflict predictors can boost the parallelism even more and reduce the amount of resources used for a given level of parallelism.
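The optimistic parallelization described above can be sketched with versioned state: each event is executed speculatively against a snapshot, and its writes are committed only if nothing it read has changed in the meantime; otherwise the speculation is discarded and the event is re-executed. This is a minimal single-threaded simulation, not the authors' software transactional memory. The class and function names are hypothetical, and the speculative phase that would run on several cores is modelled here as a plain loop.

```python
from typing import Callable, Dict, List, Tuple


class VersionedState:
    """Operator state with a version counter per key (hypothetical helper)."""

    def __init__(self) -> None:
        self.values: Dict[str, int] = {}
        self.versions: Dict[str, int] = {}


def speculate(state: VersionedState, event: str, operator_fn: Callable) -> Tuple[dict, dict]:
    """Run the operator without touching shared state; record reads and writes."""
    read_set: Dict[str, int] = {}
    write_set: Dict[str, int] = {}

    def read(key: str) -> int:
        if key in write_set:
            return write_set[key]
        read_set[key] = state.versions.get(key, 0)
        return state.values.get(key, 0)

    def write(key: str, value: int) -> None:
        write_set[key] = value

    operator_fn(event, read, write)
    return read_set, write_set


def try_commit(state: VersionedState, read_set: dict, write_set: dict) -> bool:
    """Commit buffered writes only if every key read is still at the same version."""
    if any(state.versions.get(k, 0) != v for k, v in read_set.items()):
        return False
    for key, value in write_set.items():
        state.values[key] = value
        state.versions[key] = state.versions.get(key, 0) + 1
    return True


def process_batch(state: VersionedState, events: List[str], operator_fn: Callable) -> int:
    """Speculate on all events against one snapshot, commit in order, retry conflicts."""
    speculations = [speculate(state, e, operator_fn) for e in events]
    aborted = 0
    for event, (read_set, write_set) in zip(events, speculations):
        if not try_commit(state, read_set, write_set):
            aborted += 1
            # Conflict: discard the stale speculation and re-execute on fresh state.
            read_set, write_set = speculate(state, event, operator_fn)
            try_commit(state, read_set, write_set)
    return aborted


def count_by_key(event: str, read: Callable, write: Callable) -> None:
    """Stateful operator: count occurrences of each event key."""
    write(event, read(event) + 1)


state = VersionedState()
aborted = process_batch(state, ["a", "b", "a", "c"], count_by_key)
```

In this run only the second `"a"` conflicts with an earlier commit and has to be re-executed; events touching different keys commit without interference, which is where the parallel speed-up would come from.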

international conference on distributed computing systems | 2011

Low-Overhead Fault Tolerance for High-Throughput Data Processing Systems

André Martin; Thomas Knauth; Stephan Creutz; Diogo Becker; Stefan Weigert; Christof Fetzer; Andrey Brito

The MapReduce programming paradigm proved to be a useful approach for building highly scalable data processing systems. One important reason for its success is simplicity, including the fault tolerance mechanisms. However, this simplicity comes at a price: efficiency. MapReduce's fault tolerance scheme stores too much intermediate information on disk. This inefficiency negatively affects job completion time. Furthermore, this inefficiency in particular forbids the application of MapReduce in near real-time scenarios where jobs need to produce results quickly. In this paper, we discuss an alternative fault tolerance scheme that is inspired by virtual synchrony. The key feature of our approach is a low-overhead deterministic execution. Deterministic execution reduces the amount of persistently stored information. In addition, because persisting intermediate results is no longer required for fault tolerance, we use more efficient communication techniques that considerably improve job completion time and throughput. Our contribution is twofold: (i) we enable the use of MapReduce for jobs ranging from seconds to a few tens of seconds, satisfying these deadlines even in the case of failures, and (ii) we considerably reduce the fault tolerance overhead and as such the overhead of MapReduce in general. Our modifications are transparent to the application.
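The abstract does not spell out the mechanism, so the following is only a sketch of why deterministic execution can reduce what must be persisted: if the processing order depends solely on the data, a restarted or replicated node can recompute identical intermediate results from the same inputs rather than reading them back from disk. The ordering rule (sort by timestamp, then source name) and all names below are assumptions made for this illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: per-source lists of (timestamp, value) pairs.
Batches = Dict[str, List[Tuple[int, int]]]


def deterministic_order(batches: Batches) -> List[Tuple[int, str, int]]:
    """Merge per-source batches into one order that depends only on the data.

    Arrival order and dictionary order are ignored; ties on timestamp are
    broken by source name, so every node derives the same sequence.
    """
    merged = [
        (timestamp, source, value)
        for source, batch in batches.items()
        for timestamp, value in batch
    ]
    return sorted(merged)


def running_sums(ordered: List[Tuple[int, str, int]]) -> List[int]:
    """An order-sensitive computation: the running total after each event."""
    total = 0
    sums = []
    for _, _, value in ordered:
        total += value
        sums.append(total)
    return sums


# The same inputs, received in different orders by two nodes.
first_run = {"s1": [(1, 10), (3, 30)], "s2": [(2, 20)]}
replayed_run = {"s2": [(2, 20)], "s1": [(3, 30), (1, 10)]}

out_first = running_sums(deterministic_order(first_run))
out_replayed = running_sums(deterministic_order(replayed_run))
```

Both runs produce the same sequence of running totals, so the intermediate values never need to be stored to survive a failure; only the inputs must remain available.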

symposium on reliable distributed systems | 2011

Active Replication at (Almost) No Cost

André Martin; Christof Fetzer; Andrey Brito

MapReduce has become a popular programming paradigm in the domain of batch processing systems. Its simplicity allows applications to be highly scalable and to be easily deployed on large clusters. More recently, the MapReduce approach has also been applied to Event Stream Processing (ESP) systems. This approach, which we call StreamMapReduce, enabled many novel applications that require both scalability and low latency. Another recent trend is to move distributed applications to public clouds such as Amazon EC2 rather than running and maintaining private data centers. Most cloud providers charge their customers on an hourly basis rather than on CPU cycles consumed. However, many applications, especially those that process online data, need to limit their CPU utilization to conservative levels (often as low as 50%) to be able to accommodate natural and sudden load variations without causing unacceptable deterioration in responsiveness. In this paper, we present a new fault tolerance approach based on active replication for StreamMapReduce systems. This approach is cost effective for cloud consumers as well as cloud providers. Cost effectiveness is achieved by fully utilizing the acquired computational resources without performance degradation and by reducing the need for additional nodes dedicated to fault tolerance.

symposium on reliable distributed systems | 2009

Multithreading-Enabled Active Replication for Event Stream Processing Operators

Andrey Brito; Christof Fetzer; Pascal Felber

Event Stream Processing (ESP) systems are very popular in monitoring applications. Algorithmic trading, network monitoring and sensor networks are good examples of applications that rely upon ESP systems. As these systems become larger and more widely deployed, they have to answer increasingly strong requirements that are often difficult to satisfy. Fault-tolerance is a good example of such a non-trivial requirement. Making ESP operators fault-tolerant can add considerable performance overhead to the application. In this paper, we focus on active replication as an approach to provide fault-tolerance to ESP operators. More precisely, we address the performance costs of active replication for operators in distributed ESP applications. We use a speculation mechanism based on Software Transactional Memory (STM) to achieve the following goals: (i) enable replicas to make progress using optimistic delivery; (ii) enable early forwarding of speculative computation results; (iii) enable active replication of multi-threaded operators using transactional executions. Experimental evaluation shows that, using this combination of mechanisms, one can implement highly efficient fault-tolerant ESP operators.

international conference on distributed computing systems | 2009

Minimizing Latency in Fault-Tolerant Distributed Stream Processing Systems

Andrey Brito; Christof Fetzer; Pascal Felber

Event stream processing (ESP) applications target the real-time processing of huge amounts of data. Events traverse a graph of stream processing operators where the information of interest is extracted. As these applications gain popularity, the requirements for scalability, availability, and dependability increase. In terms of dependability and availability, many applications require a precise recovery, i.e., a guarantee that the outputs during and after a recovery would be the same as if the failure that triggered recovery had never occurred. Existing solutions for precise recovery induce prohibitive latency costs, either by requiring continuous checkpointing or logging (in a passive replication approach) or perfect synchronization between replicas executing the same operations (in an active replication approach). We introduce a novel technique to guarantee precise recovery for ESP applications while minimizing the latency costs as compared to traditional approaches. The technique minimizes latencies via speculative execution in a distributed system. In terms of scalability, the key component of our approach is a modified software transactional memory that provides not only the speculation capabilities but also optimistic parallelization for costly operations.

distributed event-based systems | 2014

Predicting energy consumption with StreamMine3G

André Martin; R. R. T. Marinho; Andrey Brito; Christof Fetzer

In this paper, we present our approach to solving the DEBS Grand Challenge using StreamMine3G, a distributed, highly scalable, elastic and fault tolerant ESP system. We provide an overview of the system architecture of StreamMine3G and implementation details of an application aimed at consumption prediction and outlier detection. Using our elastic approach, we can provide an accurate prediction as we can keep a practically unbounded history able to deal with high volume, highly fluctuating workloads. Our system also provides techniques for dealing with incomplete data in the source stream, which is a common problem when processing data from a large number of sources. Finally, we provide performance measurements showing that we are able to process the dataset given as part of the 2014 DEBS Challenge (135 GB) at a throughput of up to 40 kEvents/s.

distributed event-based systems | 2014

Scalable and elastic realtime click stream analysis using StreamMine3G

André Martin; Andrey Brito; Christof Fetzer

Click stream analysis is a common approach for analyzing customer behavior during the navigation through e-commerce or social network sites. Performing such an analysis in real time opens up new business opportunities and increases revenues, as recommendations can be generated on the fly, making a previously unknown product attractive to the potential customer. As click streams are highly fluctuating and must be processed in real time, there is a high demand for Event-Stream-Processing (ESP) engines that (1) are horizontally as well as vertically scalable, (2) are elastic in order to cope with the fluctuation in the data stream, and (3) provide efficient state management mechanisms in order to drive this kind of analysis. However, most current ESP engines, such as Apache S4 or Storm, provide neither explicit state management nor techniques for elastic scaling. In this paper, we present StreamMine3G, a scalable and elastic ESP engine which provides state management out of the box, scales with the number of nodes as well as cores, and improves performance through a novel delegation mechanism that lowers contention on state as well as on network links caused by fluctuations and temporary imbalances in the data streams.

ieee international conference on cloud computing technology and science | 2012

Analysis of overhead and profitability in nested cloud environments

Josef Spillner; Andrey Brito; Francisco Vilar Brasileiro; Alexander Schill

utility and cloud computing | 2012

A Highly-Virtualising Cloud Resource Broker

Josef Spillner; Andrey Brito; Francisco Vilar Brasileiro; Alexander Schill
