
Publication


Featured research published by Myron King.


International Symposium on Computer Architecture | 2015

BlueDBM: an appliance for big data analytics

Sang Woo Jun; Ming Liu; Sungjin Lee; Jamey Hicks; John Ankcorn; Myron King; Shuotao Xu; Arvind

Complex data queries, because of their need for random accesses, have proven to be slow unless all the data can be accommodated in DRAM. There are many domains, such as genomics, geological data, and daily Twitter feeds, where the datasets of interest are 5TB to 20TB. For such a dataset, one would need a cluster with 100 servers, each with 128GB to 256GB of DRAM, to accommodate all the data in DRAM. On the other hand, such datasets could be stored easily in the flash memory of a rack-sized cluster. Flash storage has much better random access performance than hard disks, which makes it desirable for analytics workloads. In this paper we present BlueDBM, a new system architecture which has flash-based storage with in-store processing capability and a low-latency high-throughput inter-controller network. We show that BlueDBM outperforms a flash-based system without these features by a factor of 10 for some important applications. While the performance of a RAM-cloud system falls sharply even if only 5% to 10% of the references are to the secondary storage, this sharp performance degradation is not an issue in BlueDBM. BlueDBM presents an attractive point in the cost-performance trade-off for Big Data analytics.
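The sizing argument in the abstract is simple capacity arithmetic. A quick sanity check in Python, assuming decimal units (1 TB = 1000 GB); the helper name `servers_needed` is invented for illustration:

```python
# Back-of-the-envelope check of the DRAM-capacity claim: how many servers
# are needed to hold a 20 TB dataset entirely in DRAM at 128-256 GB each?
def servers_needed(dataset_tb, dram_gb_per_server):
    """Servers required to fit the dataset in DRAM (ceiling division)."""
    return -(-dataset_tb * 1000 // dram_gb_per_server)

for dram in (128, 256):
    print(dram, "GB/server:", servers_needed(20, dram), "servers")
```

At 128 GB per server this gives 157 servers and at 256 GB it gives 79, bracketing the "100 servers" figure quoted in the abstract.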


International Conference on Formal Methods and Models for Co-Design | 2007

Hardware Acceleration of Matrix Multiplication on a Xilinx FPGA

Nirav Dave; Kermin Fleming; Myron King; Michael Pellauer; Muralidaran Vijayaraghavan

The first MEMOCODE hardware/software co-design contest posed the following problem: optimize matrix-matrix multiplication in such a way that it is split between the FPGA and PowerPC on a Xilinx Virtex-II Pro 30. In this paper we discuss our solution, which we implemented on a Xilinx XUP development board with 256 MB of DRAM. The design was done by the five authors over a span of approximately 3 weeks, though of the 15 possible man-weeks, about 9 were actually spent working on this problem. All hardware design was done using Bluespec SystemVerilog (BSV), with the exception of an imported Verilog multiplication unit, necessary only due to the limitations of the Xilinx FPGA toolflow optimizations.
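The contest problem rests on the classic tiled decomposition of matrix multiply. As an illustrative sketch of that general technique (not the authors' submission), here is a pure-Python tiled multiply where the innermost tile product is the piece one would offload to the FPGA:

```python
# Tiled (blocked) matrix multiply: the host orchestrates the outer loops
# over tiles; the innermost tile product is what an FPGA accelerator
# would compute while tiles are streamed in and out.
def matmul_tiled(A, B, n, tile=4):
    C = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                # This tile-level product is the offloadable kernel.
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        a = A[i][k]
                        for j in range(jj, min(jj + tile, n)):
                            C[i][j] += a * B[k][j]
    return C
```

Tiling keeps each working set small enough to fit in on-chip block RAM, which is the usual reason for this decomposition on FPGAs.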


Architectural Support for Programming Languages and Operating Systems | 2012

Automatic generation of hardware/software interfaces

Myron King; Nirav Dave; Arvind

Enabling new applications for mobile devices often requires the use of specialized hardware to reduce power consumption. Because of time-to-market pressure, current design methodologies for embedded applications require an early partitioning of the design, allowing the hardware and software to be developed simultaneously, each adhering to a rigid interface contract. This approach is problematic for two reasons: (1) a detailed hardware-software interface is difficult to specify until one is deep into the design process, and (2) it prevents the later migration of functionality across the interface motivated by efficiency concerns or the addition of features. We address this problem using the Bluespec Codesign Language (BCL), which permits the designer to specify the hardware-software partition in the source code, allowing the compiler to synthesize efficient software and hardware along with transactors for communication between the partitions. The movement of functionality across the hardware-software boundary is accomplished by simply specifying a new partitioning, and since the compiler automatically generates the desired interface specifications, it eliminates yet another error-prone design task. In this paper we present BCL, an extension of a commercially available hardware design language (Bluespec SystemVerilog), a new software compiling scheme, and preliminary results generated using our compiler for various hardware-software decompositions of an Ogg Vorbis audio decoder and a ray-tracing application.
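The partition-in-source idea can be caricatured in a few lines. This toy (not BCL syntax; all names are invented) tags each pipeline stage with a partition and inserts a stand-in FIFO "transactor" only where consecutive stages live in different partitions, so re-partitioning is just changing a tag:

```python
from collections import deque

def build_pipeline(stages):
    """stages: list of (name, partition, fn). Returns a run(x) function."""
    def run(x):
        for i, (name, part, fn) in enumerate(stages):
            if i > 0 and part != stages[i - 1][1]:
                # Crossing the HW/SW boundary: marshal the value through a
                # FIFO, standing in for the compiler-synthesized transactor.
                fifo = deque([x])
                x = fifo.popleft()
            x = fn(x)
        return x
    return run

decode = [("parse",  "sw", lambda v: v + 1),
          ("idct",   "hw", lambda v: v * 2),   # flip "hw" to "sw" to re-partition
          ("output", "sw", lambda v: v - 3)]
print(build_pipeline(decode)(10))
```

The point mirrored here is that moving `idct` across the boundary requires no change to the stage bodies or to any hand-written interface code.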


Field-Programmable Logic and Applications | 2013

Generating infrastructure for FPGA-accelerated applications

Myron King; Asif Khan; Abhinav Agarwal; Oriol Arcas; Arvind

Whether for use as the final target or simply a rapid prototyping platform, programming systems containing FPGAs is challenging. Some of the difficulty is due to the difference between the models used to program hardware and software, but great effort is also required to coordinate the simultaneous execution of the application running on the microprocessor with the accelerated kernel(s) running on the FPGA. In this paper we present a new methodology and programming model for introducing hardware-acceleration to an application running in software. The application is represented as a data-flow graph and the computation at each node in the graph is specified for execution either in software or on the FPGA using the programmer's language of choice. We have implemented an interface compiler which takes as its input the FIFO edges of the graph and generates code to connect all the different parts of the program, including those which communicate across the hardware/software boundary. Our methodology and compiler enable programmers to effectively exploit FPGA acceleration without ever leaving the application space.
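The interface compiler's job can be sketched as a walk over the FIFO edges of the graph; the node names, placements, and connector strings below are purely illustrative assumptions, not the tool's actual output:

```python
# Given a placement (sw or hw) per node and the FIFO edges, decide what
# kind of connector each edge needs: a plain in-partition FIFO, or bridge
# glue for edges that cross the hardware/software boundary.
placement = {"src": "sw", "filter": "hw", "sink": "sw"}
edges = [("src", "filter"), ("filter", "sink")]

def connector(edge):
    a, b = edge
    if placement[a] == placement[b]:
        return f"{a}->{b}: in-partition FIFO"
    return f"{a}->{b}: {placement[a]}-to-{placement[b]} bridge (DMA + driver stub)"

for e in edges:
    print(connector(e))
```

Because only the edges that cross the boundary get bridge glue, moving a node between partitions changes the generated connectors but not the application code at the nodes.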


International Conference on Formal Methods and Models for Co-Design | 2008

High-throughput Pipelined Mergesort

Kermin Fleming; Myron King; Man Cheuk Ng; Asif Khan; Muralidaran Vijayaraghavan

We present an implementation of a high-throughput cryptosorter, capable of sorting an encrypted database of eight megabytes in 0.15 seconds, 1102 times faster than a software implementation.
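The merge tree at the heart of a pipelined hardware mergesort can be sketched in software, with each 2-to-1 merger modeled as a function over sorted runs. This is a sketch of the general technique, not the cryptosorter itself:

```python
def merge2(xs, ys):
    """2-to-1 merger: repeatedly emit the smaller head of two sorted runs."""
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        if xs[i] <= ys[j]:
            out.append(xs[i]); i += 1
        else:
            out.append(ys[j]); j += 1
    return out + xs[i:] + ys[j:]

def merge_tree(runs):
    # Pair up sorted runs level by level, mirroring the tree of mergers
    # that a pipelined hardware sorter instantiates in silicon.
    while len(runs) > 1:
        runs = [merge2(runs[k], runs[k + 1]) if k + 1 < len(runs) else runs[k]
                for k in range(0, len(runs), 2)]
    return runs[0] if runs else []
```

In hardware every merger in the tree runs concurrently, one comparison per cycle per merger, which is where the throughput advantage over sequential software comes from.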


Formal Methods | 2011

Verification of microarchitectural refinements in rule-based systems

Nirav Dave; Michael Katelman; Myron King; Arvind; José Meseguer

Microarchitectural refinements are often required to meet performance, area, or timing constraints when designing complex digital systems. While refinements are often straightforward to implement, it is difficult to formally specify the conditions of correctness for those which change cycle-level timing. As a result, in the later stages of design only those changes are considered that do not affect timing and whose verification can be automated using tools for checking FSM equivalence. This excludes an essential class of microarchitectural changes, such as the insertion of a register in a long combinational path to meet timing. A design methodology based on guarded atomic actions, or rules, offers an opportunity to raise the notion of correctness to a more abstract level. In rule-based systems, many useful refinements can be expressed simply by breaking a single rule into smaller rules which execute the original operation in multiple steps. Since the smaller rule executions can be interleaved with other rules, the verification task is to determine that no new behaviors have been introduced. We formalize this notion of correctness and present a tool based on SMT solvers that can automatically prove that a refinement is correct, or provide concrete information as to why it is not correct. With this tool, a larger class of refinements at all stages of the design process can be verified easily. We demonstrate the use of our tool in proving the correctness of the refinement of a processor pipeline from four stages to five.
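The correctness check can be miniaturized with exhaustive state exploration in place of an SMT solver. The example below (invented states and rules, not the paper's tool) splits an atomic "x += 2" rule into two "+= 1" steps via an internal flag and verifies that no new observable behaviors appear:

```python
def reachable(init, rules, observe):
    """Explore all rule interleavings; return the set of observations."""
    seen, frontier = {init}, [init]
    while frontier:
        s = frontier.pop()
        for rule in rules:
            t = rule(s)
            if t is not None and t not in seen:
                seen.add(t); frontier.append(t)
    return {observe(s) for s in seen}

# Spec state: (x, go). One atomic rule.
spec_rules = [lambda s: (s[0] + 2, False) if s[1] else None]
# Impl state: (x, go, mid). The atomic rule broken into two smaller rules.
impl_rules = [
    lambda s: (s[0] + 1, s[1], True) if s[1] and not s[2] else None,
    lambda s: (s[0] + 1, False, False) if s[2] else None,
]

# Observe x only in quiescent states (rule sequence finished).
spec_obs = reachable((0, True), spec_rules,
                     lambda s: s[0] if not s[1] else None)
impl_obs = reachable((0, True, False), impl_rules,
                     lambda s: s[0] if not s[1] and not s[2] else None)
print(impl_obs <= spec_obs)  # refinement adds no new observable behaviors
```

The paper's tool discharges the same inclusion question symbolically with an SMT solver, which scales where this brute-force enumeration would not.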


ACM Transactions on Computer Systems | 2016

BlueDBM: Distributed Flash Storage for Big Data Analytics

Sang Woo Jun; Ming Liu; Sungjin Lee; Jamey Hicks; John Ankcorn; Myron King; Shuotao Xu; Arvind

Complex data queries, because of their need for random accesses, have proven to be slow unless all the data can be accommodated in DRAM. There are many domains, such as genomics, geological data, and daily Twitter feeds, where the datasets of interest are 5TB to 20TB. For such a dataset, one would need a cluster with 100 servers, each with 128GB to 256GB of DRAM, to accommodate all the data in DRAM. On the other hand, such datasets could be stored easily in the flash memory of a rack-sized cluster. Flash storage has much better random access performance than hard disks, which makes it desirable for analytics workloads. However, currently available off-the-shelf flash storage packaged as SSDs does not make effective use of flash storage because it incurs a great amount of additional overhead during flash device management and network access. In this article, we present BlueDBM, a new system architecture that has flash-based storage with in-store processing capability and a low-latency high-throughput intercontroller network between storage devices. We show that BlueDBM outperforms a flash-based system without these features by a factor of 10 for some important applications. While the performance of a DRAM-centric system falls sharply even if only 5% to 10% of the references are to secondary storage, this sharp performance degradation is not an issue in BlueDBM. BlueDBM presents an attractive point in the cost/performance tradeoff for Big Data analytics.


Formal Methods | 2009

Implementing a fast Cartesian-polar matrix interpolator

Abhinav Agarwal; Nirav Dave; Kermin Fleming; Asif Khan; Myron King; Man Cheuk Ng; Muralidaran Vijayaraghavan

The 2009 MEMOCODE Hardware/Software Co-Design Contest assignment was the implementation of a Cartesian-to-polar matrix interpolator. We discuss our hardware and software design submissions.
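The core per-sample operation of such an interpolator can be sketched in Python, with illustrative assumptions about grid layout and no claim to match the contest specification: map each polar output sample back to Cartesian coordinates and bilinearly interpolate the four surrounding matrix cells.

```python
import math

def sample_polar(M, r, theta, cx, cy):
    """Bilinearly sample matrix M at polar (r, theta) about center (cx, cy)."""
    x, y = cx + r * math.cos(theta), cy + r * math.sin(theta)
    # Clamp so the 2x2 neighborhood stays inside the matrix.
    x0 = min(int(math.floor(x)), len(M[0]) - 2)
    y0 = min(int(math.floor(y)), len(M) - 2)
    fx, fy = x - x0, y - y0
    # Bilinear blend of the four neighboring cells.
    return ((1 - fx) * (1 - fy) * M[y0][x0]     + fx * (1 - fy) * M[y0][x0 + 1] +
            (1 - fx) * fy       * M[y0 + 1][x0] + fx * fy       * M[y0 + 1][x0 + 1])
```

A hardware implementation pipelines exactly this dependence chain: coordinate transform, address generation for the four cells, then the multiply-accumulate blend.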


International Conference on Computer Design | 2007

Continual hashing for efficient fine-grain state inconsistency detection

Jae W. Lee; Myron King; Krste Asanovic

Transaction-level modeling (TLM) allows a designer to save functional verification effort during the modular refinement of an SoC by reusing the prior implementation of a module as a golden model for state inconsistency detection. One problem in simulation-based verification is the performance and bandwidth overhead of state dump and comparison between two models. In this paper, we propose an efficient fine-grain state inconsistency detection technique that checks the consistency of two states of arbitrary size at sub-transaction (tick) granularity using incremental hashes. At each tick, the hash generates a signature of the entire state, which can be efficiently updated and compared. We evaluate the proposed signature scheme with a FIR filter and a Vorbis decoder and show that very fine-grain state consistency checking is feasible. The hash signature checking increases the execution time of Bluespec RTL simulation by 1.2% for the FIR filter and by 2.2% for the Vorbis decoder while correctly detecting any injected state inconsistency.
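One way to realize an incrementally updatable signature, sketched here as an assumption rather than the paper's exact construction, is an XOR of per-element hashes: changing one element updates the signature in O(1), so two models can be compared every tick without a full state dump.

```python
import hashlib

def elem_hash(index, value):
    """Position-dependent 64-bit hash of one state element."""
    h = hashlib.sha256(f"{index}:{value}".encode()).digest()
    return int.from_bytes(h[:8], "big")

class Signature:
    def __init__(self, state):
        self.sig = 0
        for i, v in enumerate(state):
            self.sig ^= elem_hash(i, v)
    def update(self, index, old, new):
        # O(1): cancel the old element's hash, mix in the new one.
        self.sig ^= elem_hash(index, old) ^ elem_hash(index, new)

golden = [0, 0, 0, 0]
dut    = [0, 0, 0, 0]
gs, ds = Signature(golden), Signature(dut)
gs.update(2, 0, 7); golden[2] = 7
ds.update(2, 0, 7); dut[2] = 7
print(gs.sig == ds.sig)          # signatures agree at this tick
ds.update(3, 0, 1); dut[3] = 1   # inject an inconsistency into the DUT
print(gs.sig == ds.sig)          # mismatch is detected
```

XOR makes updates commutative and self-inverse, which is what allows the signature to track a state of arbitrary size with constant-time maintenance per write.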


Field-Programmable Gate Arrays | 2015

Software-Driven Hardware Development

Myron King; Jamey Hicks; John Ankcorn

Collaboration


Dive into Myron King's collaborations.

Top Co-Authors

Arvind, Massachusetts Institute of Technology
Nirav Dave, Massachusetts Institute of Technology
Asif Khan, Massachusetts Institute of Technology
Jamey Hicks, Massachusetts Institute of Technology
Muralidaran Vijayaraghavan, Massachusetts Institute of Technology
Abhinav Agarwal, Massachusetts Institute of Technology
Man Cheuk Ng, Massachusetts Institute of Technology
Ming Liu, Massachusetts Institute of Technology
Sang Woo Jun, Massachusetts Institute of Technology