Muralidaran Vijayaraghavan

Publication

Featured research published by Muralidaran Vijayaraghavan.

international symposium on performance analysis of systems and software | 2008

Quick Performance Models Quickly: Closely-Coupled Partitioned Simulation on FPGAs

Michael Pellauer; Muralidaran Vijayaraghavan; Michael Adler; Arvind; Joel S. Emer

In this paper we explore microprocessor performance models implemented on FPGAs. While FPGAs can help with simulation speed, the increased implementation complexity can degrade model development time. We assess whether a simulator split into closely-coupled timing and functional partitions can address this by easing the development of timing models while retaining fine-grained parallelism. We give the semantics of our simulator partitioning, and discuss the architecture of its implementation on an FPGA. We describe how three timing models of vastly different target processors can use the same functional partition, and assess their performance.

international conference on formal methods and models for co-design | 2007

Hardware Acceleration of Matrix Multiplication on a Xilinx FPGA

Nirav Dave; Kermin Fleming; Myron King; Michael Pellauer; Muralidaran Vijayaraghavan

The first MEMOCODE hardware/software co-design contest posed the following problem: optimize matrix-matrix multiplication so that the work is split between the FPGA and the PowerPC on a Xilinx Virtex-II Pro 30. In this paper we discuss our solution, which we implemented on a Xilinx XUP development board with 256 MB of DRAM. The five authors completed the design over approximately three weeks; of the 15 possible man-weeks, about 9 were actually spent on this problem. All hardware was designed in Bluespec SystemVerilog (BSV), except for an imported Verilog multiplication unit, which was needed only because of limitations in the Xilinx FPGA toolflow's optimizations.
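The abstract does not say how the work was divided between the FPGA and the PowerPC. One common way to split a matrix product between a host processor and an accelerator is blocking (tiling): the host walks over tiles and hands each tile product to a kernel. The Python sketch below shows that generic technique only; `tile_multiply_accumulate` stands in for a hypothetical hardware kernel, and the tile size is an arbitrary assumption, not the contest entry's actual design.

```python
def tile_multiply_accumulate(a, b, c, i0, j0, k0, tile):
    """C[i0.., j0..] += A[i0.., k0..] * B[k0.., j0..] for one tile.
    Hypothetical stand-in for the work an accelerator kernel would do."""
    n = len(a)
    for i in range(i0, min(i0 + tile, n)):
        for j in range(j0, min(j0 + tile, n)):
            acc = c[i][j]
            for k in range(k0, min(k0 + tile, n)):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc


def matmul_blocked(a, b, tile=2):
    """Multiply square matrices (lists of lists) by scheduling one tile product at a time."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    # Host-side loop: each (i0, j0, k0) triple is one unit of work for the kernel.
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                tile_multiply_accumulate(a, b, c, i0, j0, k0, tile)
    return c


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
print(matmul_blocked(a, b, tile=2))  # [[30, 24, 18], [84, 69, 54], [138, 114, 90]]
```

The tile size trades on-chip buffer space against how often the host must hand off work; the result does not depend on it.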

field programmable gate arrays | 2008

A-Ports: an efficient abstraction for cycle-accurate performance models on FPGAs

Michael Pellauer; Muralidaran Vijayaraghavan; Michael Adler; Arvind; Joel S. Emer

Recently there has been interest in using FPGAs as a platform for cycle-accurate performance models. We discuss how the properties of FPGAs make them a good platform for achieving a performance improvement over software models. We develop metrics to gain insight into the strengths and weaknesses of different simulation methodologies. This paper introduces A-Ports, a distributed, efficient simulation scheme for creating cycle-accurate performance models on FPGAs. Finally, we quantitatively demonstrate an average performance improvement of 19% using A-Ports over other FPGA-based simulation schemes.
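The abstract does not spell out the mechanism, so the following is a minimal Python sketch of the general idea as we understand it, under stated assumptions: each port between two model modules is a FIFO pre-filled with `latency` empty tokens, and a module advances one model cycle only when it can take one token from every input and put one token on every output. The names `APort`, `Producer`, `Consumer`, and `NO_MESSAGE`, and the port capacity of `latency + 1`, are hypothetical choices for illustration, not taken from the paper.

```python
from collections import deque

NO_MESSAGE = None  # hypothetical token meaning "nothing was sent this model cycle"


class APort:
    """Hypothetical software model of a latency-annotated port between two modules."""

    def __init__(self, latency, capacity=None):
        self.latency = latency
        # Assumption: room for latency + 1 tokens lets the producer run one cycle ahead.
        self.capacity = capacity if capacity is not None else latency + 1
        # Pre-filling with `latency` empty tokens means a message sent in
        # model cycle t is received in model cycle t + latency.
        self.fifo = deque([NO_MESSAGE] * latency)

    def can_send(self):
        return len(self.fifo) < self.capacity

    def can_receive(self):
        return len(self.fifo) > 0

    def send(self, msg):
        assert self.can_send()
        self.fifo.append(msg)

    def receive(self):
        assert self.can_receive()
        return self.fifo.popleft()


class Producer:
    def __init__(self, out):
        self.out, self.cycle = out, 0

    def try_step(self):
        """Simulate one model cycle if the output port has room."""
        if not self.out.can_send():
            return False
        self.out.send(f"msg{self.cycle}")
        self.cycle += 1
        return True


class Consumer:
    def __init__(self, inp):
        self.inp, self.cycle, self.log = inp, 0, []

    def try_step(self):
        """Simulate one model cycle if the input port holds a token."""
        if not self.inp.can_receive():
            return False
        msg = self.inp.receive()
        if msg is not NO_MESSAGE:
            self.log.append((self.cycle, msg))
        self.cycle += 1
        return True


port = APort(latency=2)
prod, cons = Producer(port), Consumer(port)
# Host loop: the producer only gets every other host step, so the two modules
# drift apart in host time, yet model-cycle timing is preserved.
for host_step in range(20):
    cons.try_step()
    if host_step % 2 == 0:
        prod.try_step()
print(cons.log[:3])  # [(2, 'msg0'), (3, 'msg1'), (4, 'msg2')]
```

Because each module only checks its own ports, modules can be scheduled independently (on an FPGA, in parallel) without a global model clock.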

formal methods | 2009

Bounded dataflow networks and latency-insensitive circuits

Muralidaran Vijayaraghavan; Arvind

We present a theory for modular refinement of Synchronous Sequential Machines (SSMs) using Bounded Dataflow Networks (BDNs). We provide a procedure for implementing any SSM as an LI-BDN, a special class of BDNs with useful compositional properties. We show that the Latency-Insensitive property of LI-BDNs is preserved under parallel and iterative composition of LI-BDNs. Our theory permits one to make arbitrary cuts in an SSM and turn each of the parts into LI-BDNs without affecting the overall functionality. We can further refine each constituent LI-BDN into another LI-BDN that may take a different number of cycles to compute. If the constituent LI-BDN is refined correctly, we guarantee that the overall behavior will be cycle-accurate with respect to the original SSM. Thus one can replace, say, a 3-ported register file in an SSM with a one-ported register file without affecting the correctness of the SSM. We give several examples to show how our theory supports a generalization of previous techniques for Latency-Insensitive refinements of SSMs.
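The register-file example above can be illustrated with a small Python sketch. It is not the paper's formal LI-BDN construction; it assumes a simplified token model in which the refined module consumes exactly one input token and emits exactly one output token per model cycle, however many host cycles that takes. `reference_regfile` and `OnePortRegFile` are hypothetical names, and a two-read-port file is used instead of three ports to keep the sketch short.

```python
from collections import deque


def reference_regfile(trace, nregs=4):
    """Reference SSM: a 2-read-port register file; both reads and the write happen in one cycle."""
    regs = [0] * nregs
    outputs = []
    for r1, r2, (wa, wd) in trace:
        outputs.append((regs[r1], regs[r2]))  # reads see the state before this cycle's write
        if wa is not None:
            regs[wa] = wd
    return outputs


class OnePortRegFile:
    """Hypothetical refinement with a single read port: one model cycle takes two host cycles,
    but it still consumes one input token and emits one output token per model cycle."""

    def __init__(self, nregs=4):
        self.regs = [0] * nregs
        self.inq, self.outq = deque(), deque()
        self.first = None  # first read result of the model cycle in progress

    def host_cycle(self):
        if not self.inq:
            return  # no input token yet: stall without advancing model time
        r1, r2, (wa, wd) = self.inq[0]
        if self.first is None:
            self.first = self.regs[r1]  # host cycle 1: read port serves r1
        else:
            second = self.regs[r2]  # host cycle 2: read port serves r2
            self.outq.append((self.first, second))
            if wa is not None:
                self.regs[wa] = wd  # commit the write only after both reads
            self.inq.popleft()
            self.first = None


# Each token is (read address 1, read address 2, (write address or None, write data)).
trace = [(0, 1, (0, 5)), (0, 1, (1, 7)), (0, 1, (None, 0)), (1, 1, (1, 9)), (1, 0, (None, 0))]
rf = OnePortRegFile()
for token in trace:
    rf.inq.append(token)
    for _ in range(3):  # give the refined module more host cycles than it needs
        rf.host_cycle()
print(list(rf.outq) == reference_regfile(trace))  # True
```

Feeding all tokens at once instead of one at a time yields the same output sequence, which is the latency-insensitive property in miniature.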

international conference on formal methods and models for co-design | 2007

From WiFi to WiMAX: Techniques for High-Level IP Reuse across Different OFDM Protocols

Man Cheuk Ng; Muralidaran Vijayaraghavan; Nirav Dave; Arvind; Gopal Raghavan; Jamey Hicks

Orthogonal frequency-division multiplexing (OFDM) has become the preferred modulation scheme for both broadband and high-bitrate digital wireless protocols because of its spectral efficiency and robustness against multipath interference. Although the components and overall structure of different OFDM protocols are functionally similar, the characteristics of the environment for which a wireless protocol is designed often result in different instantiations of various components. In this paper, we describe how we can instantiate the baseband processing of two different wireless protocols, namely 802.11a and 802.16, in Bluespec from highly parameterized code for a generic OFDM protocol. Our approach results in highly reusable IP blocks that can dramatically reduce the time-to-market of new OFDM protocols. One advantage of Bluespec over SystemC is that our code is synthesizable into high-quality hardware, which we demonstrate via synthesis results. Using a Viterbi decoder, we also demonstrate how parameterization can be used to study the area-performance tradeoff in the implementation of a module. Furthermore, parameterized modules and modular composition can facilitate implementation-grounded algorithmic exploration in the design of new protocols.
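As a rough illustration of instantiating two protocols from one parameterized description, the Python sketch below builds an OFDM symbol modulator (inverse DFT followed by a cyclic prefix) from two parameters. It is a stand-in for the paper's Bluespec parameterization, not its code: `make_ofdm_modulator` is a hypothetical name, the naive DFT is used only to stay dependency-free, and every subcarrier is filled, unlike real 802.11a, which leaves guard and DC subcarriers empty. The parameter values follow the standards: 802.11a uses a 64-point FFT with a 16-sample (0.8 µs) guard interval, and the 802.16 OFDM PHY uses a 256-point FFT with a guard ratio of 1/4, 1/8, 1/16, or 1/32 (1/8 is picked here).

```python
import cmath


def idft(symbols):
    """Naive O(N^2) inverse DFT, kept dependency-free for the sketch."""
    n = len(symbols)
    return [sum(symbols[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def make_ofdm_modulator(fft_size, cp_len):
    """Return a modulator specialized by two parameters (hypothetical API)."""
    def modulate(subcarriers):
        assert len(subcarriers) == fft_size
        time = idft(subcarriers)
        # Cyclic prefix: copy the last cp_len samples to the front of the symbol.
        return time[len(time) - cp_len:] + time
    return modulate


wifi = make_ofdm_modulator(fft_size=64, cp_len=16)     # 802.11a parameters
wimax = make_ofdm_modulator(fft_size=256, cp_len=32)   # 802.16 OFDM PHY, guard ratio 1/8

bpsk = [1 if k % 2 else -1 for k in range(64)]  # one BPSK value on every subcarrier
print(len(wifi(bpsk)), len(wimax([1] * 256)))  # 80 288
```

Only the two parameters differ between the instances; the modulator body is shared, which is the reuse argument of the paper in miniature.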

ACM Transactions on Reconfigurable Technology and Systems | 2009

A-Port Networks: Preserving the Timed Behavior of Synchronous Systems for Modeling on FPGAs

Michael Pellauer; Muralidaran Vijayaraghavan; Michael Adler; Arvind; Joel S. Emer

Computer architects need to run cycle-accurate performance models of processors orders of magnitude faster. We discuss why the speedup on traditional multicores is limited, and why FPGAs represent a good vehicle to achieve a dramatic performance improvement over software models. This article introduces A-Port Networks, a simulation scheme designed to expose the fine-grained parallelism inherent in performance models and to exploit it efficiently using FPGAs.

international symposium on performance analysis of systems and software | 2012

Fast and cycle-accurate modeling of a multicore processor

Asif Khan; Muralidaran Vijayaraghavan; Silas Boyd-Wickizer; Arvind

An ideal simulator allows an architect to swiftly explore design alternatives and accurately determine their impact on performance. Design exploration requires simulators to be easily modifiable, and accurate performance estimates require detailed models. Unfortunately, detailed modeling makes a simulator both harder to modify and slower to execute, so fidelity is often traded for simulation speed. Although FPGA-based simulators are dramatically faster than software simulators, sacrificing fidelity is still common. In this paper we present Arete, an FPGA-based processor simulator that offers high performance along with accuracy and modifiability. We begin with a cycle-level specification of a multicore architecture that includes realistic in-order cores and detailed models of shared, coherent memory and the on-chip network. We then describe how this specification is implemented faithfully and efficiently on FPGAs. Arete delivers up to 11 MIPS per core. We run a subset of the PARSEC benchmark suite on top of off-the-shelf SMP Linux and achieve an average performance of 55 MIPS for an 8-core model. We also describe two significant architectural explorations: one involving three different branch predictors and the other requiring major modifications to the cache-coherence protocol.

computer aided verification | 2015

Modular Deductive Verification of Multiprocessor Hardware Designs

Muralidaran Vijayaraghavan; Adam Chlipala; Arvind; Nirav Dave

We present a new framework for modular verification of hardware designs in the style of the Bluespec language. That is, we formalize the idea of components in a hardware design, with well-defined input and output channels; and we show how to specify and verify components individually, with machine-checked proofs in the Coq proof assistant. As a demonstration, we verify a fairly realistic implementation of a multicore shared-memory system with two types of components: memory system and processor. Both components include nontrivial optimizations, with the memory system employing an arbitrary hierarchy of cache nodes that communicate with each other concurrently, and with the processor doing speculative execution of many concurrent read operations. Nonetheless, we prove that the combined system implements sequential consistency. To our knowledge, our memory-system proof is the first machine verification of a cache-coherence protocol parameterized over an arbitrary cache hierarchy, and our full-system proof is the first machine verification of sequential consistency for a multicore hardware design that includes caches and speculative processors.

formal methods | 2010

Design contest overview: Combined architecture for network stream categorization and intrusion detection (CANSCID)

Michael Pellauer; Abhinav Agarwal; Asif Khan; Man Cheuk Ng; Muralidaran Vijayaraghavan; Forrest Brewer; Joel S. Emer

This year we received eight submissions for our Deep Packet Inspection problem: six used FPGAs and two used GP-GPUs. The organizers find it significant that no team submitted a software-only solution, an indication that software alone could not meet the required line rate. This year the contest ended in a tie. Congratulations to the joint winners, Team Sasao Lab and Team Limenators, each of which implemented 140 patterns while maintaining line rate. Team Sasao Lab was also the only team to use an NFA approach rather than DFAs for matching the regular expressions. Full results are given in Table II. The organizers verified the performance of the two winners using undisclosed test inputs; the performance of the other teams is self-reported.

international conference on formal methods and models for co-design | 2008

High-throughput Pipelined Mergesort

Kermin Fleming; Myron King; Man Cheuk Ng; Asif Khan; Muralidaran Vijayaraghavan

We present an implementation of a high-throughput cryptosorter, capable of sorting an encrypted database of eight megabytes in 0.15 seconds, 1,102 times faster than a software implementation.
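The abstract gives no design details, so the following Python sketch shows only the generic structure of a pipelined mergesort, as an assumption about the kind of design the title refers to: each stage merges adjacent pairs of sorted runs, doubling run length, so n elements need about log2(n) stages. In hardware each stage would be its own pipeline stage working concurrently on different data; here the stages run one after another, and the encryption and decryption of the database are not modeled. All function names are hypothetical.

```python
def merge(left, right):
    """Merge two sorted lists, emitting one element per step as a streaming merger would."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_stage(runs):
    """One stage: merge adjacent pairs of sorted runs; an odd run out passes through."""
    return [merge(runs[k], runs[k + 1]) if k + 1 < len(runs) else runs[k]
            for k in range(0, len(runs), 2)]


def pipelined_mergesort(data):
    """Return (sorted list, number of merge stages used)."""
    runs = [[x] for x in data]  # stage 0: every element is a sorted run of length 1
    stages = 0
    while len(runs) > 1:
        runs = merge_stage(runs)
        stages += 1
    return (runs[0] if runs else []), stages


print(pipelined_mergesort([5, 3, 8, 1, 9, 2, 7]))  # ([1, 2, 3, 5, 7, 8, 9], 3)
```

Throughput in such a design comes from every stage being busy at once, so the stage count sets latency rather than throughput.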