Mark C. Jeffrey

Massachusetts Institute of Technology

Publications

Featured research published by Mark C. Jeffrey.

ACM Transactions on Reconfigurable Technology and Systems | 2011

Application-specific signatures for transactional memory in soft processors

Martin Labrecque; Mark C. Jeffrey; J. Gregory Steffan

As reconfigurable computing hardware, and in particular FPGA-based systems-on-chip, comprise an increasing number of processor and accelerator cores, supporting sharing and synchronization in a way that is scalable and easy to program becomes a challenge. Transactional Memory (TM) is a potential solution to this problem, and an FPGA-based system provides the opportunity to support TM in hardware (HTM). Although there are many proposed approaches to HTM support for ASICs, these do not necessarily map well to FPGAs. In particular, in this work we demonstrate that while signature-based conflict detection schemes (essentially bit-vectors) should intuitively be a good match to the bit parallelism of FPGAs, previous approaches result in unacceptable multicycle stalls, operating frequencies, or false-conflict rates. Capitalizing on the reconfigurable nature of FPGA-based systems, we propose an application-specific signature mechanism for HTM conflict detection. Our evaluation uses real and projected FPGA-based soft multiprocessor systems that support HTM and implement threaded, shared-memory network packet processing applications. We find that our application-specific approach: (i) maintains a reasonable operating frequency of 125 MHz, (ii) achieves a 9% to 71% increase in packet throughput relative to signatures with bit selection on a 2-thread architecture, and (iii) allows our HTM to achieve 6%, 54%, and 57% increases in packet throughput on an 8-thread architecture versus a baseline lock-based synchronization for three of four packet processing applications studied, due to reduced false synchronization.
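To make the conflict-detection problem concrete, here is a minimal Python sketch of the baseline the abstract compares against, signatures with bit selection, rather than the paper's application-specific mechanism. The 64-bit signature width, the 4-byte word size, and the helper names (`bit_select`, `make_signature`, `may_conflict`) are assumptions for illustration. It shows how two distinct addresses that share the selected bits alias to the same signature bit and produce a false conflict.

```python
# Minimal sketch of signature-based conflict detection with bit selection.
# Assumptions (not from the paper): 64-bit signatures, 4-byte words, and a
# Python int standing in for the hardware bit vector.

SIG_WIDTH_LOG2 = 6       # 2**6 = 64 signature bits
WORD_OFFSET_BITS = 2     # ignore the byte offset within a 4-byte word


def bit_select(addr: int) -> int:
    """Bit selection: a fixed slice of the word address indexes the signature."""
    return (addr >> WORD_OFFSET_BITS) & ((1 << SIG_WIDTH_LOG2) - 1)


def make_signature(addrs) -> int:
    """Summarize a read or write set as a bit vector."""
    sig = 0
    for a in addrs:
        sig |= 1 << bit_select(a)
    return sig


def may_conflict(r1: int, w1: int, r2: int, w2: int) -> bool:
    """Flag a conflict if either transaction's write signature overlaps the
    other's read or write signature. Aliasing can cause false conflicts,
    but a true conflict is never missed."""
    return ((w1 & (r2 | w2)) | (w2 & r1)) != 0


# 0x000 and 0x100 are different addresses but share the selected bits.
t1_writes = make_signature([0x100])
t2_reads = make_signature([0x000])
print(may_conflict(0, t1_writes, t2_reads, 0))  # True: a false conflict
```

In this picture, an application-specific scheme would replace the fixed bit slice in `bit_select` with a mapping tuned to the addresses an application actually touches; the abstract does not describe how the paper constructs that mapping.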

International Symposium on Microarchitecture | 2015

A scalable architecture for ordered parallelism

Mark C. Jeffrey; Suvinay Subramanian; Cong Yan; Joel S. Emer; Daniel Sanchez

We present Swarm, a novel architecture that exploits ordered irregular parallelism, which is abundant but hard to mine with current software and hardware techniques. In this architecture, programs consist of short tasks with programmer-specified timestamps. Swarm executes tasks speculatively and out of order, and efficiently speculates thousands of tasks ahead of the earliest active task to uncover ordered parallelism. Swarm builds on prior TLS and HTM schemes, and contributes several new techniques that allow it to scale to large core counts and speculation windows, including a new execution model, speculation-aware hardware task management, selective aborts, and scalable ordered commits. We evaluate Swarm on graph analytics, simulation, and database benchmarks. At 64 cores, Swarm achieves 51–122× speedups over a single-core system, and outperforms software-only parallel algorithms by 3–18×.
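The ordering rule behind ordered commits can be sketched as follows. This is a simplified, centralized Python model under stated assumptions: the `Task` class, the id tiebreaker, and `commit_ready` are hypothetical, and Swarm's actual commit protocol, which the abstract describes only as scalable, is implemented in hardware. The sketch shows why tasks can run far ahead of the earliest active task yet still commit in timestamp order.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    ts: int                 # programmer-specified timestamp
    tid: int                # unique id, used here to break timestamp ties (assumption)
    finished: bool = False  # done executing, but not yet committed

    @property
    def vt(self) -> Tuple[int, int]:
        """Total order used for commits: timestamp first, then id."""
        return (self.ts, self.tid)


def commit_ready(tasks: List[Task]) -> List[Task]:
    """Return the finished tasks that may commit, in commit order.

    A finished task can commit only once every task ordered before it has
    also finished, i.e. it is ordered before the earliest unfinished task.
    Later tasks may keep running speculatively ahead of that frontier."""
    unfinished = [t.vt for t in tasks if not t.finished]
    frontier = min(unfinished) if unfinished else None
    ready = [t for t in tasks
             if t.finished and (frontier is None or t.vt < frontier)]
    return sorted(ready, key=lambda t: t.vt)


tasks = [Task(1, 0, True), Task(2, 1), Task(3, 2, True), Task(5, 3, True)]
print([t.ts for t in commit_ready(tasks)])  # [1]: tasks 3 and 5 wait for task 2
tasks[1].finished = True
print([t.ts for t in commit_ready(tasks)])  # [1, 2, 3, 5]
```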

ACM Symposium on Parallel Algorithms and Architectures | 2011

Understanding bloom filter intersection for lazy address-set disambiguation

Mark C. Jeffrey; J. Gregory Steffan

A Bloom filter is a probabilistic bit-array-based set representation that has recently been applied to address-set disambiguation in systems that ease the burden of parallel programming. However, many of these systems intersect the Bloom filter bit-arrays to approximate address-set intersection and decide set disjointness. This is in contrast with the conventional and well-studied approach of making individual membership queries into the Bloom filter. In this paper, we present much-needed probabilistic models for the unconventional application of testing set disjointness using Bloom filters. Using these models, we demonstrate that intersecting Bloom filters requires substantially larger bit-arrays to provide the same probability of false set-overlap as querying into the bit-array. When intersection is unavoidable, we prove that partitioned Bloom filters require less space than unpartitioned ones. Finally, we show that for Bloom filters with a single hash function, surprisingly, intersection and querying share the same probability of false set-overlap.
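The two disjointness tests the abstract contrasts can be compared directly in a short Python sketch. The SHA-256-derived hash family, the filter width, and the function names are assumptions; the sketch uses unpartitioned filters and does not model the paper's probabilistic analysis.

```python
import hashlib
from typing import Iterable, List


def positions(x: int, k: int, m: int) -> List[int]:
    """k bit positions in [0, m) for element x. SHA-256 with a per-hash
    salt is an assumed hash family, chosen only for reproducibility."""
    out = []
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{x}".encode()).digest()
        out.append(int.from_bytes(digest[:8], "little") % m)
    return out


def build(elems: Iterable[int], k: int, m: int) -> int:
    """Insert every element into an m-bit Bloom filter (a Python int)."""
    bits = 0
    for x in elems:
        for p in positions(x, k, m):
            bits |= 1 << p
    return bits


def overlap_by_query(a_bits: int, b_elems: Iterable[int], k: int, m: int) -> bool:
    """Conventional test: query each element of B against A's filter."""
    return any(all((a_bits >> p) & 1 for p in positions(b, k, m)) for b in b_elems)


def overlap_by_intersection(a_bits: int, b_bits: int) -> bool:
    """Intersection test: AND the two filters; any shared bit means 'may overlap'."""
    return (a_bits & b_bits) != 0
```

If the query test reports an overlap, the intersection test must as well, because every bit of the matching element of B is set in both filters; intersection can therefore only add false set-overlaps. With a single hash function (k = 1), any shared bit is the hash of some element of B, so the two tests give identical answers, consistent with the abstract's final result.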

IEEE Micro | 2016

Unlocking Ordered Parallelism with the Swarm Architecture

Mark C. Jeffrey; Suvinay Subramanian; Cong Yan; Joel S. Emer; Daniel Sanchez

The authors present Swarm, a parallel architecture that exploits ordered parallelism, which is abundant but hard to mine with current software and hardware techniques. Swarm programs consist of short tasks, as small as tens of instructions each, with programmer-specified order constraints. Swarm executes tasks speculatively and out of order and efficiently speculates thousands of tasks ahead of the earliest active task to uncover enough parallelism. Several techniques allow Swarm to scale to large core counts and speculation windows. The authors evaluate Swarm on graph analytics, simulation, and database benchmarks. At 64 cores, Swarm outperforms sequential implementations of these algorithms by 43 to 117 times and state-of-the-art software-only parallel algorithms by 3 to 18 times. Besides achieving near-linear scalability, Swarm programs are almost as simple as their sequential counterparts, because they do not use explicit synchronization.
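As an illustration of the programming model, the following Python sketch writes single-source shortest paths as short tasks whose timestamps encode their order, with no locks or other explicit synchronization. The graph format and function name are assumptions, and a sequential priority queue stands in for Swarm's speculative hardware: the sketch shows only the order in which tasks must appear to execute, not how Swarm runs them in parallel.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[int, List[Tuple[int, int]]]  # node -> [(neighbor, weight)]


def sssp_as_tasks(graph: Graph, source: int) -> Dict[int, int]:
    """Shortest paths from `source`, written as timestamped tasks.

    Each task visits one node, and its timestamp is the node's tentative
    distance. A priority queue runs tasks in timestamp order, standing in
    for the order the hardware would enforce under speculation.
    Edge weights are assumed non-negative."""
    dist: Dict[int, int] = {}
    tasks: List[Tuple[int, int]] = [(0, source)]  # (timestamp, node)

    def visit(ts: int, node: int) -> None:
        if node in dist:  # an earlier-timestamp task already visited this node
            return
        dist[node] = ts
        for nbr, w in graph.get(node, []):
            # Child tasks never get a timestamp below their parent's.
            heapq.heappush(tasks, (ts + w, nbr))

    while tasks:
        visit(*heapq.heappop(tasks))
    return dist


g = {0: [(1, 4), (2, 1)], 2: [(1, 2), (3, 5)], 1: [(3, 1)]}
print(sssp_as_tasks(g, 0))  # {0: 0, 2: 1, 1: 3, 3: 4}
```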

International Symposium on Microarchitecture | 2016

Data-centric execution of speculative parallel programs

Mark C. Jeffrey; Suvinay Subramanian; Maleen Abeydeera; Joel S. Emer; Daniel Sanchez

Multicore systems must exploit locality to scale, scheduling tasks to minimize data movement. While locality-aware parallelism is well studied in non-speculative systems, it has received little attention in speculative systems (e.g., HTM or TLS), which hinders their scalability. We present spatial hints, a technique that leverages program knowledge to reveal and exploit locality in speculative parallel programs. A hint is an abstract integer, given when a speculative task is created, that denotes the data that the task is likely to access. We show it is easy to modify programs to convey locality through hints. We design simple hardware techniques that allow a state-of-the-art, tiled speculative architecture to exploit hints by: (i) running tasks likely to access the same data on the same tile, (ii) serializing tasks likely to conflict, and (iii) balancing tasks across tiles in a locality-aware fashion. We also show that programs can often be restructured to make hints more effective. Together, these techniques make speculative parallelism practical on large-scale systems: at 256 cores, hints achieve near-linear scalability on nine challenging applications, improving performance over hint-oblivious scheduling by 3.3× gmean and by up to 16×. Hints also make speculation far more efficient, reducing wasted work by 6.4× and traffic by 3.5× on average.
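The first two hint-based techniques in the abstract, sending same-hint tasks to the same tile and serializing tasks likely to conflict, can be sketched in Python as follows. The modulo hint-to-tile mapping, the tile count, and the function names are assumptions for illustration, and locality-aware load balancing, technique (iii), is not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

NUM_TILES = 16           # assumed system size
Task = Tuple[int, int]   # (task_id, hint)


def tile_for_hint(hint: int, num_tiles: int = NUM_TILES) -> int:
    """Send every task with the same hint to the same tile (a simple modulo
    mapping, assumed here for illustration)."""
    return hint % num_tiles


def route(tasks: List[Task]) -> Dict[int, List[Task]]:
    """Group newly created tasks into per-tile queues by hint."""
    queues: Dict[int, List[Task]] = defaultdict(list)
    for task in tasks:
        queues[tile_for_hint(task[1])].append(task)
    return queues


def dispatch(queue: List[Task], free_cores: int, running_hints: Set[int]) -> List[Task]:
    """Choose up to `free_cores` tasks from one tile's queue, skipping any task
    whose hint matches a task already running. Same-hint tasks are likely to
    conflict, so they are serialized rather than speculated in parallel."""
    chosen: List[Task] = []
    busy = set(running_hints)
    for task in queue:
        if len(chosen) == free_cores:
            break
        if task[1] in busy:
            continue
        chosen.append(task)
        busy.add(task[1])
    return chosen


queues = route([(0, 7), (1, 7), (2, 23)])  # hints 7 and 23 both map to tile 7
print(dispatch(queues[7], free_cores=2, running_hints=set()))  # [(0, 7), (2, 23)]
```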

International Symposium on Computer Architecture | 2017

Fractal: An Execution Model for Fine-Grain Nested Speculative Parallelism

Suvinay Subramanian; Mark C. Jeffrey; Maleen Abeydeera; Hyun Ryong Lee; Victor A. Ying; Joel S. Emer; Daniel Sanchez

Most systems that support speculative parallelization, like hardware transactional memory (HTM), do not support nested parallelism. This sacrifices substantial parallelism and precludes composing parallel algorithms. And the few HTMs that do support nested parallelism focus on parallelizing at the coarsest (shallowest) levels, incurring large overheads that squander most of their potential. We present FRACTAL, a new execution model that supports unordered and timestamp-ordered nested parallelism. FRACTAL lets programmers seamlessly compose speculative parallel algorithms, and lets the architecture exploit parallelism at all levels. FRACTAL can parallelize a broader range of applications than prior speculative execution models. We design a FRACTAL implementation that extends the Swarm architecture and focuses on parallelizing at the finest (deepest) levels. Our approach sidesteps the issues of nested parallel HTMs and uncovers abundant fine-grain parallelism. As a result, FRACTAL outperforms prior speculative architectures by up to 88× at 256 cores.
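One way to picture how ordered speculative algorithms compose under nesting is to give each task a tuple with one timestamp per nesting level and order tasks lexicographically. This Python sketch is a simplified model under that assumption; the abstract does not describe how FRACTAL actually encodes order across domains, and `child_vt` and `execution_order` are hypothetical names.

```python
from typing import List, Tuple

VirtualTime = Tuple[int, ...]  # one timestamp per nesting level (assumption)


def child_vt(creator: VirtualTime, ts: int) -> VirtualTime:
    """Order a task in a nested subdomain: it keeps its creator's position and
    is ordered inside the subdomain by its own timestamp."""
    return creator + (ts,)


def execution_order(vts: List[VirtualTime]) -> List[VirtualTime]:
    """Lexicographic order: a whole subdomain sits after its creator and
    before the creator's successors at the outer level."""
    return sorted(vts)


a, b = (1,), (2,)                         # two top-level ordered tasks
inner = [child_vt(a, 5), child_vt(a, 3)]  # a subdomain created by task a
print(execution_order([b, a] + inner))    # [(1,), (1, 3), (1, 5), (2,)]
```

Because Python's tuple comparison treats a prefix as smaller, every task in task `a`'s subdomain, however deeply nested, lands after `a` and before `b`. Under the same assumption, an unordered subdomain could give all its tasks one timestamp and leave their relative order free.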

International Conference on Parallel Architectures and Compilation Techniques | 2017

SAM: Optimizing Multithreaded Cores for Speculative Parallelism

Maleen Abeydeera; Suvinay Subramanian; Mark C. Jeffrey; Joel S. Emer; Daniel Sanchez

This work studies the interplay between multithreaded cores and speculative parallelism (e.g., transactional memory or thread-level speculation). These techniques are often used together, yet they have been developed independently. This disconnect causes major performance pathologies: increasing the number of threads per core adds conflicts and wasted work, and puts pressure on speculative execution resources. These pathologies often squander the benefits of multithreading. We present speculation-aware multithreading (SAM), a simple policy that addresses these pathologies. By coordinating instruction dispatch and conflict resolution priorities, SAM focuses execution resources on work that is more likely to commit, avoiding aborts and using speculation resources more efficiently. We design SAM variants for in-order and out-of-order cores. SAM is cheap to implement and makes multithreaded cores much more beneficial on speculative parallel programs. We evaluate SAM on systems with up to 64 SMT cores. With SAM, 8-threaded cores outperform single-threaded cores by 2.33× on average, while a speculation-oblivious policy yields a 1.85× speedup. SAM also reduces wasted work by 52%.
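The dispatch-priority idea can be sketched as a choice between a speculation-oblivious round-robin policy and one that favors the thread running the earliest-ordered task. The abstract does not state which priority SAM uses; treating task timestamp order as the measure of "more likely to commit" is an assumption here, as are the `HwThread` class and all function names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HwThread:
    tid: int      # thread id, equal to its index in the list (assumption)
    task_ts: int  # timestamp of the speculative task running on this thread
    ready: bool   # has an instruction ready to dispatch this cycle


def pick_round_robin(threads: List[HwThread], last: int) -> Optional[int]:
    """Speculation-oblivious baseline: first ready thread after `last`."""
    n = len(threads)
    for step in range(1, n + 1):
        t = threads[(last + step) % n]
        if t.ready:
            return t.tid
    return None


def pick_speculation_aware(threads: List[HwThread]) -> Optional[int]:
    """Prefer the ready thread whose task is earliest in order, used here as
    a proxy for the task most likely to commit."""
    ready = [t for t in threads if t.ready]
    if not ready:
        return None
    return min(ready, key=lambda t: (t.task_ts, t.tid)).tid


def conflict_loser(a: HwThread, b: HwThread) -> int:
    """Resolve a conflict with the same priority: abort the later-ordered task."""
    return max((a, b), key=lambda t: (t.task_ts, t.tid)).tid


threads = [HwThread(0, 30, True), HwThread(1, 10, True), HwThread(2, 20, False)]
print(pick_round_robin(threads, last=1))       # 0: next ready thread in rotation
print(pick_speculation_aware(threads))         # 1: runs the earliest task
print(conflict_loser(threads[0], threads[1]))  # 0: the later task aborts
```

Using one ordering for both dispatch and conflict resolution keeps the two decisions from working against each other, which is the kind of coordination the abstract attributes to SAM.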

Archive | 2009