Murugan Sankaradass | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Murugan Sankaradass is active.

Explore More

Publication

Featured researches published by Murugan Sankaradass.

design automation conference | 2002

System design methodologies for a wireless security processing platform

Srivaths Ravi; Anand Raghunathan; Nachiketh Potlapally; Murugan Sankaradass

Security protocols are critical to enabling the growth of a wide range of wireless data services and applications. However, they impose a high computational burden that is mismatched with the modest processing capabilities and battery resources available on wireless clients. Bridging the security processing gap, while retaining sufficient programmability in order to support a wide range of current and future security protocol standards, requires the use of novel system architectures and design methodologies.We present the system-level design methodology used to design a programmable security processor platform for next-generation wireless handsets. The platform architecture is based on (i) a configurable and extensible processor that is customized for efficient domain-specific processing, and (ii) layered software libraries implementing cryptographic algorithms that are optimized to the hardware platform. Our system-level design methodology enables the efficient co design of optimal cryptographic algorithms and an optimized system architecture. It includes novel techniques for algorithmic exploration and tuning, performance characterization and macro-modeling of software libraries, and architecture refinement based on selection of instruction extensions to accelerate performance-critical, computation-intensive operations. We have designed a programmable security processor platform to support both public-key and private key operations using the proposed methodology, and have evaluated its performance through extensive system simulations as well as hardware prototyping. Our experiments demonstrate large performance improvements (e.g., 31.0X for DES, 33.9X for 3DES, 17.4X for AES, and upto 66.4X for RSA) compared to well-optimized software implementations on a state-of-the-art embedded processor.

field-programmable custom computing machines | 2009

A Massively Parallel FPGA-Based Coprocessor for Support Vector Machines

Srihari Cadambi; Igor Durdanovic; Venkata Jakkula; Murugan Sankaradass; Eric Cosatto; Srimat T. Chakradhar; Hans Peter Graf

We present a massively parallel FPGA-based coprocessor for Support Vector Machines (SVMs), a machine learning algorithm whose applications include recognition tasks such as learning scenes, situations and concepts, and reasoning tasks such as analyzing the recognized scenes and semantics. The coprocessor architecture, targeted at both SVM training and classification, is based on clusters of vector processing elements (VPEs) operating in single-instruction multiple data (SIMD) mode to take advantage of large amounts of data parallelism in the application. We use the FPGA’s DSP elements as parallel multiply-accumulators (MACs), a core computation in SVMs. A key feature of the architecture is that it is customized to low precision arithmetic which permits one DSP unit to perform two or more MACs in parallel. Low precision also reduces the required number of parallel off-chip memory accesses by packing multiple data words on the FPGA-memory bus. We have built a prototype using an off-the-shelf PCI-based FPGA card with a Xilinx Virtex 5 FPGA and 1GB DDR2 memory. For SVM training, we observe application-level end-to-end computation speeds of over 9 billion multiply-accumulates per second (GMACs). For SVM classification, using data packing, the application speed increases to 14 GMACs. The FPGA-based system is about 20x faster than a dual Opteron 2.2 GHz processor CPU, and dissipates around 10W of power.

design automation conference | 2003

CoCo: a hardware/software platform for rapid prototyping of code compression technologies

Haris Lekatsas; Jörg Henkel; Srimat T. Chakradhar; Venkata Jakkula; Murugan Sankaradass

In recent years, instruction code compression/decompression technologies have emerged as an efficient way to: a) reduce the memory usage of an embedded system, b) to improve performance through effective higher bandwidths and/or to c) reduce the overall power consumption of a system processing compressed code. We have presented efficient code compression/decompression techniques and architectures in the past. For the commercialization phase, we designed a novel hardware/software code compression/decompression platform (CoCo). It consists of a software platform that prepares, optimizes, compresses and compiles instruction code and a generic, parameterizable FPGA-based hardware architecture in form of a hardware platform that allows to rapidly evaluate prototypes of diverse compression/decompression technologies. We show the flexibility of CoCo, its ability to achieve code compression ratios (parameterizable) of up to 50% with a slight system performance gain and its ability to apply compression in a real-world compiled code without any limitations where others have made implicit software-restrictive assumptions.

high performance distributed computing | 2013

COSMIC: middleware for high performance and reliable multiprocessing on xeon phi coprocessors

Srihari Cadambi; Giuseppe Coviello; Cheng-Hong Li; Rajat Phull; Kunal Rao; Murugan Sankaradass; Srimat T. Chakradhar

It is remarkably easy to offload processing to Intels newest manycore coprocessor, the Xeon-Phi: it supports a popular ISA (x86-based), a popular OS (Linux) and a popular programming model (OpenMP). Unfortunately, easy portability does not automatically ensure high performance. Additional programmer effort is necessary to leverage the new performance-oriented hardware features. But programmer optimizations alone are insufficient. Multiprocessing is also necessary to improve hardware utilization, and Linux makes it easy for processes to share the manycore coprocessor. However multiprocessing inefficiencies can easily offset gains made by the programmer. Our experiments on a production, high-performance Xeon server with multiple Xeon Phi coprocessors show that multiprocessing on coprocessors not only slows down the processes but also introduces unreliability (some processes crash unexpectedly). We propose a new, user-level middleware called COSMIC that improves performance and reliability of multiprocessing on coprocessors like the Xeon Phi. COSMIC seamlessly fits in the existing Xeon Phi software stack and is transparent to programmers. It manages Xeon Phi processes that execute parallel regions offloaded to the coprocessors. Offloads typically have programmer-driven performance directives like thread and affinity requirements. Unlike the existing Xeon Phi software stack, COSMIC does fair scheduling of both processes and offloads, and takes into account conflicting requirements of offloads belonging to different processes. By doing so, COSMIC has two clear benefits. First, it improves multiprocessing performance by preventing thread and memory oversubscription, by avoiding inter-offload interference and by reducing load imbalance on coprocessors and cores. Second, it increases multiprocessing reliability by exploiting programmer-specified per-process coprocessor memory requirements to completely avoid memory oversubscription and crashes. Our experiments on several representative Xeon Phi workloads show that, in a multiprocessing environment, COSMIC improves average core utilization by up to 3 times, reduces make-span by up to 52%, reduces average process latency (turn-around-time) by 70%, and completely eliminates process crashes.

design automation conference | 2006

Software architecture exploration for high-performance security processing on a multiprocessor mobile SoC

Divya Arora; Anand Raghunathan; Srivaths Ravi; Murugan Sankaradass; Niraj K. Jha; Srimat T. Chakradhar

We present a systematic methodology for exploring the security processing software architecture for a commercial heterogeneous multiprocessor system-on-chip (SoC) for mobile devices. The SoC contains multiple host processors executing applications and a dedicated programmable security processing engine. We developed an exploration methodology to map the code and data of security software libraries onto the platform, with the objective of maximizing the overall application-visible performance. The salient features of the methodology include (i) the use of real performance measurements from a prototyping board that contains the target platform to drive the exploration, (ii) a new data structure access profiling framework that allows us to accurately model the communication overheads involved in offloading a given set; of functions to the security processor, and (iii) an exact branch-and-bound based design space exploration algorithm that determines the best mapping of security library functions and data structures to the host and security processors. We used the proposed framework to map a commercial security library to the target mobile application SoC. The resulting optimized software architecture outperformed several manually-designed software architectures, resulting in up to 12.5times speedup for individual cryptographic operations (encryption, hashing) and 2.2times-6.2times speedup for applications such as a digital rights management (DRM) agent and secure sockets layer (SSL) client. We also demonstrate the applicability of our framework to software architecture exploration in other multiprocessor scenarios

IEEE Transactions on Very Large Scale Integration Systems | 2007

Exploring Software Partitions for Fast Security Processing on a Multiprocessor Mobile SoC

Divya Arora; Anand Raghunathan; Srivaths Ravi; Murugan Sankaradass; Niraj K. Jha; Srimat T. Chakradhar

The functionality of mobile devices, such as cell phones and personal digital assistants (PDAs), has evolved to include various applications where security is a critical concern (secure web transactions, mobile commerce, download and playback of protected audio/video content, connection to corporate private networks, etc.). Security mechanisms (e.g., secure communication protocols) involve cryptographic algorithms, and are often quite computationally intensive, challenging the constrained processing and battery resources of mobile devices. Extensive design effort and aggressive hardware and software optimizations are required to address this challenge. Previous work has addressed the design of hardware architectures (custom accelerators, domain-specific processors, etc.) to accelerate security processing, and many emerging systems-on-chip (SoCs) feature some form of hardware support for security. In this paper, we address the complementary problem of mapping a complex security software library to an SoC platform with security hardware enhancements. We present a systematic methodology for exploring the software architecture for security processing for a commercial heterogeneous multiprocessor SoC for mobile devices. The SoC contains multiple host processors executing applications and a dedicated programmable security processing engine. We developed an exploration methodology to map the code and data of security software libraries onto the platform, with the objective of maximizing the overall application-visible performance. The salient features of the methodology include: 1) the use of real performance measurements from a prototyping board, which contains the target platform, to drive the exploration; 2) a new data structure access profiling framework that allows us to accurately model the communication overheads involved in off loading a given set of functions to the security processor; and 3) an exact branch-and-bound-based design space exploration algorithm that determines the best mapping of security library functions and data structures to the host and security processors. We used the proposed framework to map a commercial security library to the target mobile application SoC. The resulting optimized software architecture outperformed several manually designed software architectures, resulting in up to 12.5 times speed-up for individual cryptographic operations (encryption, hashing) and 2.2-6.2 times speed-up for applications such as a digital rights management (DRM) agent and secure sockets layer (SSL) client. We also demonstrate the applicability of our framework to software architecture exploration in other multiprocessor scenarios.

neural information processing systems | 2008