Christopher W. Fletcher
Massachusetts Institute of Technology
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Christopher W. Fletcher.
computer and communications security | 2013
Emil Stefanov; Marten van Dijk; Elaine Shi; Christopher W. Fletcher; Ling Ren; Xiangyao Yu; Srinivas Devadas
We present Path ORAM, an extremely simple Oblivious RAM protocol with a small amount of client storage. Partly due to its simplicity, Path ORAM is the most practical ORAM scheme for small client storage known to date. We formally prove that Path ORAM requires log^2 N / log X bandwidth overhead for block size B = X log N. For block sizes bigger than Omega(log^2 N), Path ORAM is asymptotically better than the best known ORAM scheme with small client storage. Due to its practicality, Path ORAM has been adopted in the design of secure processors since its proposal.
scalable trusted computing | 2012
Christopher W. Fletcher; Marten van Dijk; Srinivas Devadas
This paper considers encrypted computation where the user specifies encrypted inputs to an untrusted program, and the server computes on those encrypted inputs. To this end we propose a secure processor architecture, called Ascend, that guarantees privacy of data when arbitrary programs use the data running in a cloud-like environment (e.g., an untrusted server running an untrusted software stack). The key idea to guarantee privacy is obfuscated instruction execution; Ascend does not disclose what instruction is being run at any given time, be it an arithmetic instruction or a memory instruction. Periodic accesses to external instruction and data memory are performed through an Oblivious RAM (ORAM) interface to prevent leakage through memory access patterns. We evaluate the processor architecture on SPEC benchmarks running on encrypted data and quantify overheads.
international symposium on performance analysis of systems and software | 2011
Mieszko Lis; Pengju Ren; Myong Hyon Cho; Keun Sup Shim; Christopher W. Fletcher; Omer Khan; Srinivas Devadas
We present HORNET, a parallel, highly configurable, cycle-level multicore simulator based on an ingress-queued worm-hole router NoC architecture. The parallel simulation engine offers cycle-accurate as well as periodic synchronization; while preserving functional accuracy, this permits tradeoffs between perfect timing accuracy and high speed with very good accuracy. When run on 6 separate physical cores on a single die, speedups can exceed a factor of over 5, and when run on a two-die 12-core system with 2-way hyperthreading, speedups exceed 11 ×. Most hardware parameters are configurable, including memory hierarchy, interconnect geometry, bandwidth, crossbar dimensions, and parameters driving power and thermal effects. A highly parametrized table-based NoC design allows a variety of routing and virtual channel allocation algorithms out of the box, ranging from simple DOR routing to complex Valiant, ROMM, or PROM schemes, BSOR, and adaptive routing. HORNET can run in network-only mode using synthetic traffic or traces, directly emulate a MIPS-based multicore, or function as the memory subsystem for native applications executed under the Pin instrumentation tool. HORNET is freely available under the open-source MIT license at http://csg.csail.mit.edu/hornet/.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2012
Pengju Ren; Mieszko Lis; Myong Hyon Cho; Keun Sup Shim; Christopher W. Fletcher; Omer Khan; Nanning Zheng; Srinivas Devadas
We present hornet, a parallel, highly configurable, cycle-level multicore simulator based on an ingress-queued wormhole router network-on-chip (NoC) architecture. The parallel simulation engine offers cycle-accurate as well as periodic synchronization; while preserving functional accuracy, this permits tradeoffs between perfect timing accuracy and high speed with very good accuracy. When run on six separate physical cores on a single die, speedups can exceed a factor of over 5, and when run on a two-die 12-core system with 2-way hyperthreading, speedups exceed 12×. Most hardware parameters are configurable, including memory hierarchy, interconnect geometry, bandwidth, crossbar dimensions, parameters driving power, and thermal effects. A highly parametrized table-based NoC design allows a variety of routing and virtual channel allocation algorithms out of the box, ranging from simple dimension-ordered routing to complex Valiant, ROMM, O1Turn or PROM schemes, BSOR, and adaptive routing. Hornet can run in network-only mode using synthetic traffic or traces, or directly emulate a MIPS-based multicore. Hornet is freely available under the open-source MIT license at http://csg.csail.mit.edu/hornet/.
theory of cryptography conference | 2016
Srinivas Devadas; Marten van Dijk; Christopher W. Fletcher; Ling Ren; Elaine Shi; Daniel Wichs
We present Onion ORAM, an Oblivious RAM (ORAM) with constant worst-case bandwidth blowup that leverages poly-logarithmic server computation to circumvent the logarithmic lower bound on ORAM bandwidth blowup. Our construction does not require fully homomorphic encryption, but employs an additively homomorphic encryption scheme such as the Damgard-Jurik cryptosystem, or alternatively a BGV-style somewhat homomorphic encryption scheme without bootstrapping. At the core of our construction is an ORAM scheme that has “shallow circuit depth” over the entire history of ORAM accesses. We also propose novel techniques to achieve security against a malicious server, without resorting to expensive and non-standard techniques such as SNARKs. To the best of our knowledge, Onion ORAM is the first concrete instantiation of a constant bandwidth blowup ORAM under standard assumptions (even for the semi-honest setting).
reconfigurable computing and fpgas | 2010
Ilia A. Lebedev; Shaoyi Cheng; Austin Doupnik; James B. Martin; Christopher W. Fletcher; Daniel Burke; Mingjie Lin; John Wawrzynek
We present a Many-core Approach to Reconfigurable Computing (MARC), enabling efficient high-performance computing for applications expressed using parallel programming models such as OpenCL. The MARC system exploits abundant special FPGA resources such as distributed block memories and DSP blocks to implement complete single-chip high efficiency many-core micro architectures. The key benefits of MARC are that it (i) allows programmers to easily express parallelism through the API defined in a high-level programming language, (ii) supports coarse-grain multithreading and dataflow-style fine-grain threading while permitting bit-level resource control, and (iii) greatly reduces the effort required to re-purpose the hardware system for different algorithms or different applications. A MARC prototype machine with 48 processing nodes was implemented using a Virtex-5 (XCV5LX155T-2) FPGA for a well known Bayesian network inference problem. We compare the runtime of the MARC machine against a manually optimized implementation. With fully synthesized application-specific processing cores, our MARC machine comes within a factor of 3 of the performance of a fully optimized FPGA solution but with a considerable reduction in development effort and a significant increase in retarget ability.
architectural support for programming languages and operating systems | 2015
Christopher W. Fletcher; Ling Ren; Albert Kwon; Marten van Dijk; Srinivas Devadas
Oblivious RAM (ORAM) is a cryptographic primitive that hides memory access patterns as seen by untrusted storage. Recently, ORAM has been architected into secure processors. A big challenge for hardware ORAM schemes is how to efficiently manage the Position Map (PosMap), a central component in modern ORAM algorithms. Implemented naively, the PosMap causes ORAM to be fundamentally unscalable in terms of on-chip area. On the other hand, a technique called Recursive ORAM fixes the area problem yet significantly increases ORAMs performance overhead. To address this challenge, we propose three new mechanisms. We propose a new ORAM structure called the PosMap Lookaside Buffer (PLB) and PosMap compression techniques to reduce the performance overhead from Recursive ORAM empirically (the latter also improves the construction asymptotically). Through simulation, we show that these techniques reduce the memory bandwidth overhead needed to support recursion by 95%, reduce overall ORAM bandwidth by 37% and improve overall SPEC benchmark performance by 1.27x. We then show how our PosMap compression techniques further facilitate an extremely efficient integrity verification scheme for ORAM which we call PosMap MAC (PMMAC). For a practical parameterization, PMMAC reduces the amount of hashing needed for integrity checking by >= 68x relative to prior schemes and introduces only 7% performance overhead. We prototype our mechanisms in hardware and report area and clock frequency for a complete ORAM design post-synthesis and post-layout using an ASIC flow in a 32~nm commercial process. With 2 DRAM channels, the design post-layout runs at 1~GHz and has a total area of .47~mm2. Depending on PLB-specific parameters, the PLB accounts for 10% to 26% area. PMMAC costs 12% of total design area. Our work is the first to prototype Recursive ORAM or ORAM with any integrity scheme in hardware.
ieee high performance extreme computing conference | 2013
Ling Ren; Christopher W. Fletcher; Xiangyao Yu; Marten van Dijk; Srinivas Devadas
Oblivious-RAMs (ORAM) are used to hide memory access patterns. Path ORAM has gained popularity due to its efficiency and simplicity. In this paper, we propose an efficient integrity verification layer for Path ORAM, which only imposes 17% latency overhead. We also show that integrity verification is vital to maintaining privacy for recursive Path ORAMs under active adversaries.
cloud computing security workshop | 2013
Xiangyao Yu; Christopher W. Fletcher; Ling Ren; Marten van Dijk; Srinivas Devadas
This paper investigates secure ways to interact with tamper-resistant hardware leaking a strictly bounded amount of information. Architectural support for the interaction mechanisms is studied and performance implications are evaluated. The interaction mechanisms are built on top of a recently-proposed secure processor Ascend[ascend-stc12]. Ascend is chosen because unlike other tamper-resistant hardware systems, Ascend completely obfuscates pin traffic through the use of Oblivious RAM (ORAM) and periodic ORAM accesses. However, the original Ascend proposal, with the exception of main memory, can only communicate with the outside world at the beginning or end of program execution; no intermediate information transfer is allowed. Our system, Stream-Ascend, is an extension of Ascend that enables intermediate interaction with the outside world. Stream-Ascend significantly improves the generality and efficiency of Ascend in supporting many applications that fit into a streaming model, while maintaining the same security level.Simulation results show that with smart scheduling algorithms, the performance overhead of Stream-Ascend relative to an insecure and idealized baseline processor is only 24.5%, 0.7%, and 3.9% for a set of streaming benchmarks in a large dataset processing application. Stream-Ascend is able to achieve a very high security level with small overheads for a large class of applications.
field programmable gate arrays | 2011
Christopher W. Fletcher; Ilia A. Lebedev; Narges Bani Asadi; Daniel Burke; John Wawrzynek
This paper compares an implementation of a Bayesian inference algorithm across several FPGAs and GPGPUs, while embracing both the execution model and high-level architecture of a GPGPU. Our study is motivated by recent work in template-based programming and architectural models for FPGA computing. The comparison we present is meant to demonstrate the FPGAs potential, while constraining the design to follow the microarchitectural template of more programmable devices such as GPGPUs. The FPGA implementation proves capable of matching the performance of a high-end Nvidia Fermi-based GPU - the most advanced GPGPU available to us at the time of this study. Further investigation shows that each FPGA core outperforms workstation GPGPU cores by a factor of ~ 3.14x, and mobile GPGPU cores by ~ 4.25x despite a ~ 4x reduction in core clock frequency. Using these observations, we discuss the efficiency gap between these two platforms, and the challenges associated with template-based programming models.