Publications


Featured research published by Cristiano Pereira.


Design Automation Conference | 2004

Leakage aware dynamic voltage scaling for real-time embedded systems

Ravindra Jejurikar; Cristiano Pereira; Rajesh K. Gupta

A five-fold increase in leakage current is predicted with each technology generation. While Dynamic Voltage Scaling (DVS) is known to reduce dynamic power consumption, it also causes increased leakage energy drain by lengthening the interval over which a computation is carried out. Therefore, to minimize total energy, one needs to determine an operating point called the critical speed. We compute processor slowdown factors based on the critical speed for energy minimization. Procrastination scheduling attempts to maximize the duration of idle intervals by keeping the processor in a sleep/shutdown state even if there are pending tasks, within the constraints imposed by performance requirements. Our simulation experiments show that the critical speed slowdown results in up to 5% energy gains over leakage-oblivious dynamic voltage scaling. The procrastination scheduling scheme extends sleep intervals by up to 5 times, resulting in up to an additional 18% energy gains while meeting all timing requirements.
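
As a rough illustration of the critical-speed idea, the sketch below assumes a simple per-cycle energy model e(s) = a_dyn*s^2 + p_leak/s for a slowdown factor s (cubic dynamic power, constant leakage power); the model and constants are illustrative assumptions, not the paper's.

```python
# Illustrative sketch only: find the "critical speed" under an assumed
# per-cycle energy model e(s) = a_dyn * s**2 + p_leak / s, where s is the
# slowdown factor. This model is a simplification, not the paper's.

def critical_speed(a_dyn: float, p_leak: float, s_min: float = 0.3) -> float:
    """Slowdown factor minimizing e(s); below it, leakage dominates and total energy rises."""
    s_star = (p_leak / (2.0 * a_dyn)) ** (1.0 / 3.0)  # solve de/ds = 0
    return min(1.0, max(s_min, s_star))               # clamp to the feasible speed range

if __name__ == "__main__":
    print(critical_speed(a_dyn=1.0, p_leak=0.4))      # ~0.58: never slow down below this
```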


Symposium on Code Generation and Optimization | 2010

PinPlay: a framework for deterministic replay and reproducible analysis of parallel programs

Harish Patil; Cristiano Pereira; Mack Stallcup; Gregory Lueck; James Cownie

Analysis of parallel programs is hard mainly because their behavior changes from run to run. We present an execution capture and deterministic replay system that enables repeatable analysis of parallel programs. Our goal is to provide an easy-to-use framework for capturing, deterministically replaying, and analyzing execution of large programs with reasonable runtime and disk usage. Our system, called PinPlay, is based on the popular Pin dynamic instrumentation system and is hence very easy to use. PinPlay extends the capability of Pin-based analysis by providing a tool for capturing one execution instance of a program (as log files called pinballs) and by allowing Pin-based tools to run off the captured execution. Most Pintools can be trivially modified to work off pinballs, performing their usual analysis but with guaranteed repeatability. Furthermore, capture/replay works across operating systems (Windows to Linux), as the pinball format is independent of the operating system. We have used PinPlay to analyze and deterministically debug large parallel programs running trillions of instructions. This paper describes the design of PinPlay and its applications for analyses such as simulation point selection, tracing, and debugging.
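
For intuition only, here is a toy sketch of the record/replay principle PinPlay builds on: capture every nondeterministic value a run observes into a log, then feed the same values back on replay. The classes and the JSON "pinball" file are illustrative assumptions, not PinPlay's actual (Pin-based, binary-level) implementation.

```python
# Toy sketch of the record/replay principle: log every nondeterministic input
# during a recorded run, then replay the log to reproduce the run exactly.
# This is NOT PinPlay's implementation; names like "pinball.json" are illustrative.
import json
import random
import time

class Recorder:
    def __init__(self):
        self.log = []

    def observe(self, value):
        self.log.append(value)        # capture the nondeterministic value
        return value

    def save(self, path):
        with open(path, "w") as f:
            json.dump(self.log, f)

class Replayer:
    def __init__(self, path):
        with open(path) as f:
            self.log = iter(json.load(f))

    def observe(self, _live_value=None):
        return next(self.log)         # replay the recorded value instead

def workload(env):
    # All nondeterministic reads are routed through env.observe().
    return env.observe(random.random()) + env.observe(time.time())

if __name__ == "__main__":
    rec = Recorder()
    first = workload(rec)
    rec.save("pinball.json")
    assert workload(Replayer("pinball.json")) == first   # deterministic replay
```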


Architectural Support for Programming Languages and Operating Systems | 2006

Recording shared memory dependencies using strata

Satish Narayanasamy; Cristiano Pereira; Brad Calder

Companies spend significant time trying to reproduce and fix bugs. BugNet and FDR are recent proposals that provide architecture support for deterministic replay debugging. They focus on continuously recording information about the program's execution, which can be communicated back to the developer. Using that information, the developer can deterministically replay the program's execution to reproduce and fix the bugs. In this paper, we propose using Strata to efficiently capture shared memory dependencies. A stratum creates a time layer across all the logs for the running threads, which separates all the memory operations executed before and after the stratum. A strata log allows us to determine all the shared memory dependencies during replay and thereby supports deterministic replay debugging for multi-threaded programs.
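
A minimal data-structure sketch of the stratum idea follows, with an assumed log layout: each stratum snapshots how many memory operations every thread had completed, so any operation before a stratum is ordered before any operation after it.

```python
# Minimal sketch of a strata log: each stratum snapshots per-thread memory-op
# counts, separating everything logged before it from everything after it.
# The layout and API below are illustrative assumptions, not the hardware design.

class StrataLog:
    def __init__(self, num_threads: int):
        self.counts = [0] * num_threads   # memory ops completed so far, per thread
        self.strata = []                  # list of count snapshots

    def memory_op(self, tid: int):
        self.counts[tid] += 1

    def log_stratum(self):
        # Logged when a shared-memory dependency is observed between threads.
        self.strata.append(tuple(self.counts))

    def ordered_before(self, tid_a: int, op_a: int, tid_b: int, op_b: int) -> bool:
        # The op_a-th op of thread tid_a precedes the op_b-th op of thread tid_b
        # if some stratum was logged after op_a completed but before op_b executed.
        return any(op_a <= s[tid_a] and op_b > s[tid_b] for s in self.strata)
```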


Conference on Object-Oriented Programming Systems, Languages, and Applications | 2012

Maple: a coverage-driven testing tool for multithreaded programs

Jie Yu; Satish Narayanasamy; Cristiano Pereira; Gilles Pokam

Testing multithreaded programs is a hard problem because it is challenging to expose the rare interleavings that can trigger a concurrency bug. We propose Maple, a new coverage-driven testing tool that seeks to expose as many untested thread interleavings as possible. Maple memoizes tested interleavings and actively seeks to expose untested interleavings for a given test input in order to increase interleaving coverage. We discuss several solutions to realize this goal. First, we discuss a coverage metric based on a set of interleaving idioms. Second, we discuss an online technique to predict untested interleavings that can potentially be exposed for a given test input. Finally, the predicted untested interleavings are exposed by actively controlling the thread schedule while executing the test input. We discuss our experiences in using the tool to expose several known and unknown bugs in real-world applications such as Apache and MySQL.
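
The sketch below illustrates only the memoization step of this workflow; the Access fields and the idiom encoding (an ordered pair of inter-thread accesses to one location) are simplifying assumptions, not Maple's actual idiom set.

```python
# Simplified sketch of interleaving-coverage memoization: remember which
# inter-thread access pairs (a crude stand-in for Maple's interleaving idioms)
# have already been observed, and report which predicted pairs remain untested.
from dataclasses import dataclass

@dataclass(frozen=True)
class Access:
    tid: int     # thread id
    site: str    # static code location of the access
    addr: int    # memory address touched

tested = set()   # persisted across runs by the real tool

def idiom_key(first: Access, second: Access):
    return (first.site, second.site, first.addr)

def record_run(observed_pairs):
    for a, b in observed_pairs:
        tested.add(idiom_key(a, b))

def untested_candidates(predicted_pairs):
    # These are the interleavings an active scheduler would try to force next.
    return [(a, b) for a, b in predicted_pairs if idiom_key(a, b) not in tested]
```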


Programming Language Design and Implementation | 2014

Race detection for event-driven mobile applications

Chun-Hung Hsiao; Jie Yu; Satish Narayanasamy; Ziyun Kong; Cristiano Pereira; Gilles Pokam; Peter M. Chen; Jason Flinn

Mobile systems commonly support an event-based model of concurrent programming. This model, used in popular platforms such as Android, naturally supports mobile devices that have a rich array of sensors and user input modalities. Unfortunately, most existing tools for detecting concurrency errors in parallel programs focus on a thread-based model of concurrency. Applied directly to an event-based program, such tools work poorly because they infer false dependencies between unrelated events handled sequentially by the same thread. In this paper we present a race detection tool named CAFA for event-driven mobile systems. CAFA uses a causality model that we have developed for the Android event-driven system. A novel contribution of our model is that it accounts for the causal order due to the event queues, which is not accounted for in past data race detectors. Detecting races between low-level memory accesses leads to a large number of false positives; CAFA overcomes this problem by checking for races between high-level operations. We discuss our experience in using CAFA for finding and understanding a number of known and unknown harmful races in open-source Android applications.
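
To make the "causal order due to the event queues" point concrete, here is a small happens-before sketch under assumed APIs: a handler is ordered after the post() that enqueued its event, but two handlers that merely ran back to back on the same thread are not ordered, so their conflicting accesses can be reported as races.

```python
# Simplified happens-before sketch for event-driven code: handlers are ordered
# only through post() edges (and whatever is transitively reachable), not merely
# because they ran sequentially on the same looper thread. The graph API is an
# illustrative assumption, not CAFA's implementation.
from collections import defaultdict

class HappensBefore:
    def __init__(self):
        self.succ = defaultdict(set)

    def add_post(self, posting_handler, posted_handler):
        # The code that posts an event happens-before the handler of that event.
        self.succ[posting_handler].add(posted_handler)

    def reaches(self, src, dst, seen=None):
        if src == dst:
            return True
        seen = set() if seen is None else seen
        seen.add(src)
        return any(n not in seen and self.reaches(n, dst, seen) for n in self.succ[src])

    def may_race(self, handler_a, handler_b):
        # Conflicting accesses from two handlers race if neither is ordered first.
        return not self.reaches(handler_a, handler_b) and not self.reaches(handler_b, handler_a)
```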


Measurement and Modeling of Computer Systems | 2006

Automatic logging of operating system effects to guide application-level architecture simulation

Satish Narayanasamy; Cristiano Pereira; Harish Patil; Robert Cohn; Brad Calder

Modern architecture research relies heavily on application-level detailed pipeline simulation. A time-consuming part of building a simulator is correctly emulating operating system effects, which is required even if the goal is to simulate just the application code, in order to achieve functional correctness of the application's execution. Existing application-level simulators require manually coding the emulation of each and every possible system effect (e.g., system call, interrupt, DMA transfer) that can impact the application's execution. Developing such an emulator for a given operating system is a tedious exercise, it can be costly to maintain for newer versions of that operating system, and porting it to a completely different operating system might involve rebuilding it from scratch. In this paper, we describe a tool that can automatically log operating system effects to guide architecture simulation of application code. The benefits of our approach are: (a) we do not have to build or maintain any infrastructure for emulating operating system effects, (b) we can support simulation of more complex applications on our application-level simulator, including applications that use asynchronous interrupts, DMA transfers, etc., and (c) using the system effect logs collected by our tool, we can deterministically re-execute the application to guide architecture simulation with reproducible results.
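
The sketch below is a conceptual mock-up of the logging idea, not the tool itself: record the register and memory side effects each system event produced during the logging run, then inject them at the same point during simulation instead of emulating the OS. All field names are assumptions.

```python
# Conceptual sketch: log the side effects of OS events (register writes and
# memory written by the kernel) during a real run, then inject them during
# application-level simulation instead of emulating the OS. Field names and
# the injection point (instruction count) are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class SystemEffect:
    icount: int                                       # instruction count where it applies
    reg_writes: dict = field(default_factory=dict)    # e.g. {"eax": 0}
    mem_writes: dict = field(default_factory=dict)    # address -> value written by the kernel

class EffectLog:
    def __init__(self):
        self.effects = []

    def record(self, effect: SystemEffect):
        self.effects.append(effect)

    def inject(self, icount: int, regs: dict, memory: dict):
        # Called by the simulator at each instruction count: apply any logged
        # effects so the simulated application sees the original OS behavior.
        for e in self.effects:
            if e.icount == icount:
                regs.update(e.reg_writes)
                memory.update(e.mem_writes)
```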


IEEE Transactions on Very Large Scale Integration Systems | 2005

Energy-aware wireless systems with adaptive power-fidelity tradeoffs

Vijay Raghunathan; Cristiano Pereira; Mani B. Srivastava; Rajesh K. Gupta

Wireless networked embedded systems, such as multimedia terminals and sensor nodes, present a rich domain for making energy/performance/quality tradeoffs based on application needs, network conditions, etc. Energy awareness in these systems is the ability to trade off available battery energy against application quality requirements. In this paper, we show how operating-system-directed dynamic voltage scaling and dynamic power management can provide such a capability. We propose a real-time scheduling algorithm that uses runtime feedback about application behavior to provide adaptive power-fidelity tradeoffs. We demonstrate our approach in the context of a static priority-based preemptive task scheduler. Simulation results show that the proposed algorithm yields significant energy savings compared to state-of-the-art dynamic voltage scaling schemes, with minimal loss in system fidelity. We have implemented our scheduling algorithm in the eCos real-time operating system running on an Intel XScale-based variable-voltage platform. Experimental results obtained using this platform confirm the effectiveness of our technique.
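
A minimal sketch of the feedback idea follows, assuming a simple exponential-average predictor (not the paper's exact algorithm): run each job only as fast as its recently observed demand suggests, never dropping below a floor that preserves schedulability.

```python
# Minimal sketch of feedback-driven slowdown selection. The exponential-average
# predictor and the fixed speed floor are assumptions for illustration, not the
# paper's exact adaptive power-fidelity algorithm.

class FeedbackDVS:
    def __init__(self, wcet: float, alpha: float = 0.3, s_min: float = 0.3):
        self.wcet = wcet       # worst-case execution time at full speed
        self.demand = wcet     # running estimate of actual execution time
        self.alpha = alpha
        self.s_min = s_min

    def speed_for_next_job(self) -> float:
        # Scale speed to the expected demand; the floor limits how badly an
        # occasional worst-case job can overrun (fidelity, not correctness,
        # degrades when the prediction is wrong).
        return max(self.s_min, min(1.0, self.demand / self.wcet))

    def job_finished(self, actual_time: float):
        self.demand = self.alpha * actual_time + (1 - self.alpha) * self.demand
```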


International Symposium on Microarchitecture | 2009

Offline symbolic analysis for multi-processor execution replay

Dongyoon Lee; Mahmoud Said; Satish Narayanasamy; Zijiang Yang; Cristiano Pereira

The ability to replay a program's execution on a multi-processor system can significantly help parallel programming. To replay a shared-memory multi-threaded program, existing solutions record its program input (I/O, DMA, etc.) and the shared-memory dependencies between threads. Prior processor-based record-and-replay solutions are efficient, but they require non-trivial modifications to the coherence protocol and the memory sub-system for recording the shared-memory dependencies. In this paper, we propose a processor-based record-and-replay solution that does not require detecting and logging shared-memory dependencies to enable multi-processor execution replay. We show that a load-based checkpointing scheme, originally proposed for recording only program input, is also sufficient for replaying every thread in a multi-threaded program. Shared-memory dependencies between threads are reconstructed offline, during replay, using an algorithm based on an SMT solver. In addition to saving log space, the proposed solution significantly reduces the complexity of the hardware support required for enabling replay.
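
As a toy stand-in for the offline reconstruction step (brute force here instead of an SMT encoding, and only two threads), the sketch below searches for a global order that respects each thread's program order and makes every logged load return the most recent write. The log format is an assumption.

```python
# Toy stand-in for offline dependency reconstruction: brute-force search (in
# place of the paper's SMT-solver encoding) for a global order of two threads'
# logged operations in which every load sees the latest write. The log format
# is an illustrative assumption; memory is assumed to start zeroed.
from typing import Optional, Tuple

Op = Tuple[str, str, int]   # ("w" | "r", address, value written / value logged by the load)

def consistent(order) -> bool:
    mem = {}
    for kind, addr, val in order:
        if kind == "w":
            mem[addr] = val
        elif mem.get(addr, 0) != val:
            return False
    return True

def interleavings(t1, t2):
    # All merges of the two logs that preserve each thread's program order.
    if not t1:
        yield tuple(t2)
        return
    if not t2:
        yield tuple(t1)
        return
    for rest in interleavings(t1[1:], t2):
        yield (t1[0],) + rest
    for rest in interleavings(t1, t2[1:]):
        yield (t2[0],) + rest

def reconstruct(t1, t2) -> Optional[tuple]:
    return next((o for o in interleavings(t1, t2) if consistent(o)), None)

# Example: thread 1's load of x observed the value 2, so thread 2's write of
# x=2 must be ordered between thread 1's write and its load.
print(reconstruct([("w", "x", 1), ("r", "x", 2)], [("w", "x", 2)]))
```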


International Symposium on Microarchitecture | 2009

Architecting a chunk-based memory race recorder in modern CMPs

Gilles Pokam; Cristiano Pereira; Klaus Danne; Rolf Kassa; Ali-Reza Adl-Tabatabai

Prior work on hardware support for memory race recording piggybacks time stamps on coherence messages and logs the outcome of memory races using point-to-point or chunk-based approaches. These memory race recorder (MRR) techniques are effective, but they require modifications to the cache coherence protocol that can hurt performance. In addition, prior work has mostly focused on directory coherence and considered only CMP systems with single-level cache hierarchies. Most modern CMP systems shipped today, however, implement snoop coherence and feature multilevel cache hierarchies. To be practical, an MRR must target CMPs with multilevel caches, mitigate the coherence overhead due to piggybacking, and emphasize replay speed to broaden the applicability of deterministic replay. This paper contributes three new solutions for making chunk-based MRR practical for modern CMPs. We show that MRR interactions with a cache hierarchy can degrade performance and present a novel mechanism that mitigates this degradation. We propose new mechanisms for snoop-based caches that eliminate the coherence traffic overhead due to piggybacking. Finally, we propose new techniques for improving replay speed and introduce a novel framework for evaluating the replay speed potential of MRR designs.
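
The following sketch shows only the bookkeeping behind chunk-based recording, with an assumed log format: each core counts memory operations since its last conflict and, when a conflicting remote access ends the chunk, logs the chunk size with a global timestamp so chunks can be ordered at replay time.

```python
# Schematic sketch of chunk-based race-recording bookkeeping: a core logs the
# number of memory operations executed since its last conflict ("a chunk")
# together with a global timestamp, so chunks can be totally ordered at replay.
# The log format and trigger are illustrative assumptions, not the hardware design.

class ChunkRecorder:
    def __init__(self, core_id: int):
        self.core_id = core_id
        self.ops_in_chunk = 0
        self.log = []                 # entries: (core_id, chunk_size, global_timestamp)

    def memory_op(self):
        self.ops_in_chunk += 1

    def conflict_detected(self, global_timestamp: int):
        # A remote access conflicted with this core's read/write set: close the
        # current chunk and start a new one.
        if self.ops_in_chunk:
            self.log.append((self.core_id, self.ops_in_chunk, global_timestamp))
        self.ops_in_chunk = 0
```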


International Symposium on Software Testing and Analysis | 2013

Selective mutation testing for concurrent code

Milos Gligoric; Lingming Zhang; Cristiano Pereira; Gilles Pokam

Concurrent code is becoming increasingly important with the advent of multicore processors, but testing concurrent code is challenging. Researchers are developing new testing techniques and test suites for concurrent code, but evaluations of these techniques and test suites often use only a small number of real or manually seeded bugs. Mutation testing allows creating a large number of buggy programs to evaluate test suites. However, performing mutation testing is expensive even for sequential code, and the cost is higher for concurrent code, where each test has to be executed for many (possibly all) thread schedules. The most widely used technique to speed up mutation testing is selective mutation, which reduces the number of mutants by applying only a subset of mutation operators such that test suites that kill all mutants generated by this subset also kill (almost) all mutants generated by all mutation operators. To date, selective mutation has been used only for sequential mutation operators. This paper explores selective mutation for concurrent mutation operators. Our results identify several sets of concurrent mutation operators that can effectively reduce the number of mutants, show that operator-based selection is slightly better than random mutant selection, and show that sequential and concurrent mutation operators are independent, demonstrating the importance of studying concurrent mutation operators.
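
For intuition, here is a back-of-the-envelope sketch of operator-based selection with hypothetical operator names and an assumed greedy criterion (the paper's selection analysis is more careful): pick operators until the tests needed to kill their mutants cover nearly all tests needed for the full mutant pool.

```python
# Back-of-the-envelope sketch of operator-based selective mutation: greedily
# pick mutation operators until the tests needed to kill their mutants cover
# (almost) all tests needed for the full pool. The greedy rule and the
# operator names below are illustrative assumptions, not the paper's analysis.

def select_operators(killing_tests_by_op: dict, target: float = 0.95) -> list:
    """killing_tests_by_op maps an operator to the set of tests that kill its mutants."""
    all_tests = set().union(*killing_tests_by_op.values())
    chosen, covered = [], set()
    while len(covered) < target * len(all_tests):
        best = max(killing_tests_by_op, key=lambda op: len(killing_tests_by_op[op] - covered))
        gain = killing_tests_by_op[best] - covered
        if not gain:
            break
        chosen.append(best)
        covered |= gain
    return chosen

if __name__ == "__main__":
    print(select_operators({
        "remove_lock":             {"t1", "t2"},
        "exchange_sync_blocks":    {"t2", "t3"},
        "shrink_critical_section": {"t1"},
    }))
```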

Collaboration


Dive into Cristiano Pereira's collaborations.

Top Co-Authors

Brad Calder

University of California
