Milos Prvulovic | Researchain

Publication

Featured research published by Milos Prvulovic.

international symposium on computer architecture | 2002

ReVive: cost-effective architectural support for rollback recovery in shared-memory multiprocessors

Milos Prvulovic; Zheng Zhang; Josep Torrellas

This paper presents ReVive, a novel general-purpose rollback recovery mechanism for shared-memory multiprocessors. ReVive carefully balances the conflicting requirements of availability, performance, and hardware cost. ReVive performs checkpointing, logging, and distributed parity protection, all memory-based. It enables recovery from a wide class of errors, including the permanent loss of an entire node. To maintain high performance, ReVive includes specialized hardware that performs frequent operations in the background, such as log and parity updates. To keep the cost low, more complex checkpointing and recovery functions are performed in software, while the hardware modifications are limited to the directory controllers of the machine. Our simulation results on a 16-processor system indicate that the average error-free execution time overhead of using ReVive is only 6.3%, while the achieved availability is better than 99.999% even when the errors occur as often as once per day.
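The distributed parity protection mentioned in this abstract can be illustrated with a minimal sketch. It assumes plain XOR parity over equally sized node memories; the abstract does not describe ReVive's actual parity layout or hardware, and the function names below are hypothetical.

```python
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def compute_parity(nodes):
    """Parity block: XOR of every node's memory contents."""
    return reduce(xor_bytes, nodes)


def write_with_parity(nodes, parity, node_id, new_data):
    """Update one node in place and return the incrementally patched parity."""
    delta = xor_bytes(nodes[node_id], new_data)  # old XOR new
    nodes[node_id] = new_data
    return xor_bytes(parity, delta)


def recover_lost_node(nodes, parity, lost_id):
    """Rebuild a lost node's contents from the survivors and the parity."""
    survivors = [m for i, m in enumerate(nodes) if i != lost_id]
    return reduce(xor_bytes, survivors, parity)
```

In this sketch a write only needs the old and new contents of the written node to patch the parity, which is the kind of frequent update the abstract says is handled in the background by hardware.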

international symposium on microarchitecture | 2002

Cherry: Checkpointed early resource recycling in out-of-order microprocessors

Jose F. Martinez; Jose Renau; Michael C. Huang; Milos Prvulovic; Josep Torrellas

This paper presents checkpointed early resource recycling (Cherry), a hybrid mode of execution based on ROB and checkpointing that decouples resource recycling and instruction retirement. Resources are recycled early, resulting in a more efficient utilization. Cherry relies on state checkpointing and rollback to service exceptions for instructions whose resources have been recycled. Cherry leverages the ROB to (1) not require in-order execution as a fallback mechanism, (2) allow memory replay traps and branch mispredictions without rolling back to the Cherry checkpoint, and (3) quickly fall back to conventional out-of-order execution without rolling back to the checkpoint or flushing the pipeline. We present a Cherry implementation with early recycling at three different points of the execution engine: the load queue, the store queue, and the register file. We report average speedups of 1.06 and 1.26 in SPECint and SPECfp applications, respectively, relative to an aggressive conventional architecture. We also describe how Cherry and speculative multithreading can be combined and complement each other.

international symposium on computer architecture | 2003

ReEnact: using thread-level speculation mechanisms to debug data races in multithreaded codes

Milos Prvulovic; Josep Torrellas

While removing software bugs consumes vast amounts of human time, hardware support for debugging in modern computers remains rudimentary. Fortunately, we show that mechanisms for Thread-Level Speculation (TLS) can be reused to boost debugging productivity. Most notably, TLS's rollback capabilities can be extended to support rolling back recent buggy execution and repeating it as many times as necessary until the bug is fully characterized. These incremental re-executions are deterministic even in multithreaded codes. Importantly, this operation can be done automatically on the fly, and is compatible with production runs. As a specific implementation of a TLS-based debugging framework, we introduce ReEnact. ReEnact targets a particularly hairy class of bugs: data races in multithreaded programs. ReEnact extends the communication monitoring mechanisms in TLS to also detect data races. It extends TLS's rollback capabilities to be able to roll back and deterministically re-execute the code with races to obtain the race signature. Finally, the signature is compared to a library of race patterns and, if a match occurs, the execution may be repaired. Overall, ReEnact successfully detects, characterizes, and often repairs races automatically on the fly. Moreover, it is fully compatible with always-on use in production runs: the slowdown of race-free execution with ReEnact is on average only 5.8%.

international symposium on computer architecture | 2006

Improving Cost, Performance, and Security of Memory Encryption and Authentication

Chenyu Yan; B. Rogers; D. Englender; D. Solihin; Milos Prvulovic

Protection from hardware attacks such as snoopers and mod chips has been receiving increasing attention in computer architecture. This paper presents a new combined memory encryption/authentication scheme. Our new split counters for counter-mode encryption simultaneously eliminate counter overflow problems and reduce per-block counter size, and we also dramatically improve authentication performance and security by using the Galois/Counter Mode of operation (GCM), which leverages counter-mode encryption to reduce authentication latency and overlap it with memory accesses. Our results indicate that the split-counter scheme has a negligible overhead even with a small (32KB) counter cache and using only eight counter bits per data block. The combined encryption/authentication scheme has an IPC overhead of 5% on average across SPEC CPU 2000 benchmarks, which is a significant improvement over the 20% overhead of existing encryption/authentication schemes.
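One plausible reading of the split-counter idea is sketched below. The abstract does not spell out the counter organization, so everything here is an assumption for illustration: a large per-page major counter shared by all blocks, a small per-block minor counter, 64 blocks per page, and 7-bit minors. With a 64-bit major counter, that assumed layout averages (64 + 64 × 7) / 64 = 8 counter bits per block, which is consistent with the figure quoted in the abstract.

```python
MINOR_BITS = 7          # assumed width of each per-block minor counter
BLOCKS_PER_PAGE = 64    # assumed number of data blocks per page


class SplitCounterPage:
    """Hypothetical counter state for one page of encrypted memory."""

    def __init__(self):
        self.major = 0  # Python ints are unbounded; hardware would use a wide field
        self.minors = [0] * BLOCKS_PER_PAGE

    def seed_counter(self, block):
        """Concatenate major and minor into the counter used for the pad."""
        return (self.major << MINOR_BITS) | self.minors[block]

    def on_writeback(self, block):
        """Advance the block's counter before it is re-encrypted.

        Returns True when the minor counter overflowed, meaning the major
        counter was bumped and the whole page must be re-encrypted.
        """
        if self.minors[block] + 1 < (1 << MINOR_BITS):
            self.minors[block] += 1
            return False
        self.major += 1
        self.minors = [0] * BLOCKS_PER_PAGE
        return True
```

In this sketch the combined counter for a block never repeats, because a minor overflow moves every block of the page to a fresh major value instead of wrapping around.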

high-performance computer architecture | 2008

FlexiTaint: A programmable accelerator for dynamic taint propagation

Guru Venkataramani; Ioannis Doudalis; Yan Solihin; Milos Prvulovic

This paper presents FlexiTaint, a hardware accelerator for dynamic taint propagation. FlexiTaint is implemented as an in-order addition to the back-end of the processor pipeline, and the taints for memory locations are stored as a packed array in regular memory. The taint propagation scheme is specified via a software handler that, given the operation and the sources' taints, computes the new taint for the result. To keep performance overheads low, FlexiTaint caches recent taint propagation lookups and uses a filter to avoid lookups for simple common-case behavior. We also describe how to implement consistent taint propagation in a multi-core environment. Our experiments show that FlexiTaint incurs average performance overheads of only 1% for SPEC2000 benchmarks and 3.7% for Splash-2 benchmarks, even when simultaneously following two different taint propagation policies.
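The handler, lookup cache, and filter described in this abstract can be modeled in software roughly as follows. This is a sketch under stated assumptions, not FlexiTaint's design: the policy in `software_handler`, the rule that untainted sources always produce an untainted result, and all names are hypothetical.

```python
UNTAINTED = 0

stats = {"handler_calls": 0}   # counts how often the slow path runs
_lookup_cache = {}             # (op, taint_a, taint_b) -> result taint


def software_handler(op, taint_a, taint_b):
    """Hypothetical policy: 'mov' copies the first taint, other ops OR them."""
    stats["handler_calls"] += 1
    if op == "mov":
        return taint_a
    return taint_a | taint_b


def propagate(op, taint_a, taint_b):
    """Compute the result taint, avoiding the handler where possible."""
    # Filter: assumed common case, untainted sources give an untainted result.
    if taint_a == UNTAINTED and taint_b == UNTAINTED:
        return UNTAINTED
    key = (op, taint_a, taint_b)
    if key not in _lookup_cache:
        # Miss: ask the software handler once and remember the answer.
        _lookup_cache[key] = software_handler(op, taint_a, taint_b)
    return _lookup_cache[key]
```

Because the policy lives entirely in the handler, swapping in a different `software_handler` changes the propagation rules without touching the filter or the cache logic.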

high-performance computer architecture | 2007

MemTracker: Efficient and Programmable Support for Memory Access Monitoring and Debugging

Guru Venkataramani; Brandyn Roemer; Yan Solihin; Milos Prvulovic

Memory bugs are a broad class of bugs that is becoming increasingly common with increasing software complexity, and many of these bugs are also security vulnerabilities. Unfortunately, existing software and even hardware approaches for finding and identifying memory bugs have considerable performance overheads, target only a narrow class of bugs, are costly to implement, or use computational resources inefficiently. This paper describes MemTracker, a new hardware support mechanism that can be configured to perform different kinds of memory access monitoring tasks. MemTracker associates each word of data in memory with a few bits of state, and uses a programmable state transition table to react to different events that can affect this state. The number of state bits per word, the events to which MemTracker reacts, and the transition table are all fully programmable. MemTracker's rich set of states, events, and transitions can be used to implement different monitoring and debugging checkers with minimal performance overheads, even when frequent state updates are needed. To evaluate MemTracker, we map three different checkers onto it, as well as a checker that combines all three. For the most demanding (combined) checker, we observe performance overheads of only 2.7% on average and 4.8% worst-case on SPEC 2000 applications. Such low overheads allow continuous (always-on) use of MemTracker-enabled checkers even in production runs.
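A per-word state plus a programmable transition table can be sketched as below. The specific checker shown (three states catching accesses to unallocated memory and reads of uninitialized memory) is an illustrative assumption, as are the event names and the `Tracker` class; the abstract does not list the checkers' exact states or transitions.

```python
UNALLOC, UNINIT, INIT = 0, 1, 2   # hypothetical per-word states (2 bits)

# (current state, event) -> (next state, report_error)
HEAP_CHECKER = {
    (UNALLOC, "alloc"): (UNINIT, False),
    (UNALLOC, "free"):  (UNALLOC, True),   # free of unallocated word
    (UNALLOC, "load"):  (UNALLOC, True),   # access to unallocated word
    (UNALLOC, "store"): (UNALLOC, True),
    (UNINIT, "alloc"):  (UNINIT, True),    # double allocation
    (UNINIT, "free"):   (UNALLOC, False),
    (UNINIT, "load"):   (UNINIT, True),    # read before first write
    (UNINIT, "store"):  (INIT, False),
    (INIT, "alloc"):    (INIT, True),
    (INIT, "free"):     (UNALLOC, False),
    (INIT, "load"):     (INIT, False),
    (INIT, "store"):    (INIT, False),
}


class Tracker:
    """Software model: one state per word, driven by a swappable table."""

    def __init__(self, table, initial_state):
        self.table = table
        self.initial_state = initial_state
        self.state = {}  # word address -> current state

    def event(self, addr, kind):
        """Apply one event to a word; return True if the table flags an error."""
        current = self.state.get(addr, self.initial_state)
        next_state, is_error = self.table[(current, kind)]
        self.state[addr] = next_state
        return is_error
```

A different checker would be expressed by passing a different table to `Tracker`, leaving the lookup logic unchanged.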

international symposium on microarchitecture | 2007

Using Address Independent Seed Encryption and Bonsai Merkle Trees to Make Secure Processors OS- and Performance-Friendly

Brian Rogers; Siddhartha Chhabra; Milos Prvulovic; Yan Solihin

In today's digital world, computer security issues have become increasingly important. In particular, researchers have proposed designs for secure processors which utilize hardware-based memory encryption and integrity verification to protect the privacy and integrity of computation even from sophisticated physical attacks. However, currently proposed schemes remain hampered by problems that make them impractical for use in today's computer systems: lack of virtual memory and inter-process communication support as well as excessive storage and performance overheads. In this paper, we propose 1) address independent seed encryption (AISE), a counter-mode based memory encryption scheme using a novel seed composition, and 2) Bonsai Merkle trees (BMT), a novel Merkle tree-based memory integrity verification technique, to eliminate these system and performance issues associated with prior counter-mode memory encryption and Merkle tree integrity verification schemes. We present both a qualitative discussion and a quantitative analysis to illustrate the advantages of our techniques over previously proposed approaches in terms of complexity, feasibility, performance, and storage. Our results show that AISE+BMT reduces the overhead of prior memory encryption and integrity verification schemes from 12% to 2% on average, while eliminating critical system-level problems.
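The abstract does not detail how Bonsai Merkle trees differ from a standard tree, so the sketch below shows only the baseline technique they build on: a plain Merkle tree whose root, kept in trusted storage, verifies the contents of untrusted memory. The choice of SHA-256, the padding rule for odd levels, and the function names are assumptions made for illustration.

```python
import hashlib


def _hash(data):
    """SHA-256 digest, used here for both leaves and internal nodes."""
    return hashlib.sha256(data).digest()


def merkle_root(blocks):
    """Hash memory blocks pairwise, level by level, down to a single root."""
    if not blocks:
        raise ValueError("at least one block is required")
    level = [_hash(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pad odd levels by repeating the last hash
        level = [_hash(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


def verify_memory(blocks, trusted_root):
    """Recompute the root and compare it with the copy kept in trusted storage."""
    return merkle_root(blocks) == trusted_root
```

Only the root has to be stored where an attacker cannot reach it; any change to a block in untrusted memory changes the recomputed root and is detected by `verify_memory`.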

high-performance computer architecture | 2006

CORD: cost-effective (and nearly overhead-free) order-recording and data race detection

Milos Prvulovic

Chip-multiprocessors are becoming the dominant vehicle for general-purpose processing, and parallel software will be needed to effectively utilize them. This parallel software is notoriously prone to synchronization bugs, which are often difficult to detect and repeat for debugging. While data race detection and order-recording for deterministic replay are useful in debugging such problems, only order-recording schemes are lightweight, whereas data race detection support scales poorly and degrades performance significantly. This paper presents our CORD (cost-effective order-recording and data race detection) mechanism. It is similar in cost to prior order-recording mechanisms, but costs considerably less than prior schemes for data race detection. CORD also has a negligible performance overhead (0.4% on average) and detects most dynamic manifestations of synchronization problems (77% on average). Overall, CORD is fast enough to run always (even in performance-sensitive production runs) and provides the support programmers need to deal with the complexities of writing, debugging, and maintaining parallel software for future multi-threaded and multi-core machines.

international symposium on computer architecture | 2001

Removing architectural bottlenecks to the scalability of speculative parallelization

Milos Prvulovic; María Jesús Garzarán; Lawrence Rauchwerger; Josep Torrellas

Speculative thread-level parallelization is a promising way to speed up codes that compilers fail to parallelize. While several speculative parallelization schemes have been proposed for different machine sizes and types of codes, the results so far show that it is hard to deliver scalable speedups. Often, the problem is not true dependence violations, but sub-optimal architectural design. Consequently, we attempt to identify and eliminate major architectural bottlenecks that limit the scalability of speculative parallelization. The solutions that we propose are: low-complexity commit in constant time to eliminate the task commit bottleneck, a memory-based overflow area to eliminate stall due to speculative buffer overflow, and exploiting high-level access patterns to minimize speculation-induced traffic. To show that the resulting system is truly scalable, we perform simulations with up to 128 processors. With our optimizations, the speedups for 128 and 64 processors reach 63 and 48, respectively. The average speedup for 64 processors is 32, nearly four times higher than without our optimizations.

international conference on supercomputing | 2011

SecureME: a hardware-software approach to full system security

Siddhartha Chhabra; Brian Rogers; Yan Solihin; Milos Prvulovic

With computing becoming increasingly dispersed, relying on mobile devices, distributed computing, cloud computing, etc., there is an increasing threat from adversaries obtaining physical access to some of the computer systems through theft or security breaches. With such an untrusted computing node, a key challenge is how to provide a secure computing environment that ensures privacy and integrity for the application's data and code. We propose SecureME, a hardware-software mechanism that provides such a secure computing environment. SecureME protects an application from hardware attacks by using a secure processor substrate, and also from the Operating System (OS) through memory cloaking, permission paging, and system call protection. Memory cloaking hides data from the OS but allows the OS to perform regular virtual memory management functions, such as page initialization, copying, and swapping. Permission paging extends the OS paging mechanism to provide a secure way for two applications to establish shared pages for inter-process communication. Finally, system call protection applies spatio-temporal protection for arguments that are passed between the application and the OS. Based on our performance evaluation using microbenchmarks, single-program workloads, and multiprogrammed workloads, we found that SecureME adds only a small execution time overhead compared to a fully unprotected system. Roughly half of the overhead comes from the secure processor substrate. SecureME also incurs a negligible additional storage overhead over the secure processor substrate.
