Robert Karam | Researchain

Publication

Featured research published by Robert Karam.

Proceedings of the IEEE | 2015

Emerging Trends in Design and Applications of Memory-Based Computing and Content-Addressable Memories

Robert Karam; Ruchir Puri; Swaroop Ghosh; Swarup Bhunia

Content-addressable memory (CAM) and associative memory (AM) are types of storage structures that allow searching by content as opposed to searching by address. Such memory structures are used in diverse applications ranging from branch prediction in a processor to complex pattern recognition. In this paper, we review the emerging challenges and opportunities in implementing different varieties of CAM/AM structures. Beyond-CMOS silicon and nonsilicon memory technologies hold significant promise in implementing dense, fast, and energy-efficient CAM/AM structures. We describe circuit/architecture level implementations of CAM/AM using these technologies, as well as novel applications in different domains, including informatics, text analytics, data mining, and reconfigurable computing platforms.
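
To make the distinction between searching by content and searching by address concrete, the following is a minimal software sketch. It is not taken from the paper: the class name `ToyCAM` and its methods are hypothetical, and the sequential loop only models the parallel comparison that a hardware CAM performs across all cells.

```python
from typing import List, Optional


class ToyCAM:
    """Hypothetical software model of a content-addressable memory."""

    def __init__(self, size: int) -> None:
        # Each cell holds one stored word, or None if it is empty.
        self.cells: List[Optional[int]] = [None] * size

    def write(self, address: int, word: int) -> None:
        # Conventional write: the caller chooses the address.
        self.cells[address] = word

    def read(self, address: int) -> Optional[int]:
        # Conventional address-based read, shown for contrast with search().
        return self.cells[address]

    def search(self, key: int) -> List[int]:
        # Content-based search: return every address whose stored word
        # equals the key. Hardware compares all cells at once; this loop
        # models that comparison one cell at a time.
        return [addr for addr, word in enumerate(self.cells) if word == key]


cam = ToyCAM(4)
cam.write(0, 0xA5)
cam.write(2, 0x3C)
cam.write(3, 0xA5)
matches = cam.search(0xA5)  # addresses of all cells holding 0xA5
```

In this sketch a search returns the matching addresses, whereas `read` goes the other way, from an address to the stored word.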

IEEE Transactions on Very Large Scale Integration Systems | 2015

MAHA: An Energy-Efficient Malleable Hardware Accelerator for Data-Intensive Applications

Somnath Paul; Aswin Raghav Krishna; Wenchao Qian; Robert Karam; Swarup Bhunia

For data-intensive applications, energy expended in on-chip computation constitutes only a small fraction of the total energy consumption. The primary contribution comes from transporting data between off-chip memory and on-chip computing elements, a limitation referred to as the von Neumann bottleneck. In such a scenario, improving the compute energy through parallel processing or on-chip hardware acceleration brings minor improvements to the total energy requirement of the system. We note that an effective solution to mitigate the von Neumann bottleneck is to develop a framework that enables computing in off-chip nonvolatile memory arrays, where the data reside permanently. In this paper, we present a malleable hardware (MAHA) reconfigurable framework that modifies a nonvolatile CMOS-compatible flash memory array for on-demand reconfigurable computing. MAHA is a spatio-temporal mixed-granular hardware reconfigurable framework, which utilizes the memory for storage as well as lookup table-based computation (hence malleable) and uses a low-overhead hierarchical interconnect fabric for communication between processing elements. A detailed design of the malleable hardware together with a comprehensive application mapping flow is presented. Design overheads carefully estimated at the 45-nm technology node indicate that for a set of common kernels, MAHA achieves a 91X improvement in energy efficiency over a software-only solution with negligible impact on memory performance in normal mode. The proposed design changes incur only 6% memory area overhead.

IEEE Transactions on Biomedical Engineering | 2016

Real-Time Classification of Bladder Events for Effective Diagnosis and Treatment of Urinary Incontinence

Robert Karam; Dennis J. Bourbeau; Steve Majerus; Iryna Makovey; Howard B. Goldman; Margot S. Damaser; Swarup Bhunia

Diagnosis of lower urinary tract dysfunction with urodynamics has historically relied on data acquired from multiple sensors using nonphysiologically fast cystometric filling. In addition, state-of-the-art neuromodulation approaches to restore bladder function could benefit from a bladder sensor for closed-loop control, but a practical sensor and automated data analysis are not available. We have developed an algorithm for real-time bladder event detection based on a single in situ sensor, making it attractive for both extended ambulatory bladder monitoring and closed-loop control of stimulation systems for diagnosis and treatment of bladder overactivity. Using bladder pressure data acquired from 14 human subjects with neurogenic bladder, we developed context-aware thresholding, a novel, parameterized, user-tunable algorithmic framework capable of real-time classification of bladder events, such as detrusor contractions, from single-sensor bladder pressure data. We compare six event detection algorithms with both single-sensor and two-sensor systems using a metric termed Conditional Stimulation Score, which ranks algorithms based on projected stimulation efficacy and efficiency. We demonstrate that adaptive methods are more robust against day-to-day variations than static thresholding, improving sensitivity and specificity without parameter modifications. Relative to other methods, context-aware thresholding is fast, robust, highly accurate, noise-tolerant, and amenable to energy-efficient hardware implementation, which is important for mapping to an implant device.
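
The contrast between static and adaptive thresholding can be illustrated with a short sketch. This is not the context-aware thresholding algorithm from the paper, whose details the abstract does not give. It is a generic baseline-tracking detector, and the function names, the `margin` and `alpha` parameters, and the sample values are all assumptions made for illustration.

```python
from typing import List, Sequence


def static_threshold(samples: Sequence[float], threshold: float) -> List[bool]:
    # Flag every sample that exceeds one fixed pressure threshold.
    return [s > threshold for s in samples]


def adaptive_threshold(
    samples: Sequence[float], margin: float, alpha: float = 0.1
) -> List[bool]:
    # Flag samples that exceed a slowly tracked baseline by `margin`.
    flags: List[bool] = []
    baseline = samples[0] if samples else 0.0
    for s in samples:
        is_event = s > baseline + margin
        flags.append(is_event)
        if not is_event:
            # Update the baseline from non-event samples only, so that a
            # detected event does not inflate the baseline.
            baseline = (1 - alpha) * baseline + alpha * s
    return flags


# Made-up pressure traces: the second has a higher resting baseline.
day1 = [10.0, 10.0, 10.0, 30.0, 10.0]
day2 = [25.0, 25.0, 25.0, 45.0, 25.0]

static_day2 = static_threshold(day2, threshold=20.0)   # every sample flagged
adaptive_day2 = adaptive_threshold(day2, margin=10.0)  # only the peak flagged
```

With these made-up traces, the fixed threshold tuned for `day1` flags every sample of `day2`, while the baseline-tracking version flags only the peak in both traces without any parameter change.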

Design, Automation, and Test in Europe | 2014

Energy-efficient hardware acceleration through computing in the memory

Somnath Paul; Robert Karam; Swarup Bhunia; Ruchir Puri

Energy efficiency has emerged as a major barrier to performance scalability for modern processors. We note that a significant part of a processor's energy requirement is contributed by processor-memory communication. To address the energy issue in processors, we propose a novel hardware accelerator framework that transforms a high-density memory array into a configurable computing resource to accelerate a variety of tasks, both compute- and data-intensive. It exploits the block-based architecture of nanoscale memory to create a spatially connected array of lightweight processors, each of which uses a memory block as its local memory. The proposed framework provides some unique advantages for hardware acceleration compared to conventional accelerators: 1) the memory array provides a large set of parallel resources with high bandwidth, which can be configured to perform computing in a spatio-temporal manner, leading to a dramatic reduction in processor-memory traffic; 2) it brings the computing engine close to the data, thus drastically reducing the von Neumann bottleneck; 3) it exploits advances in memory technologies and integration approaches, e.g., 3D integration, to achieve better technology scalability compared to alternative reconfigurable accelerator platforms. Simulation results for several data-intensive applications show that the proposed computing approach provides significant improvement in energy efficiency compared to software while achieving significantly lower hardware overhead.

IEEE Transactions on Very Large Scale Integration Systems | 2016

Energy-Efficient Adaptive Hardware Accelerator for Text Mining Application Kernels

Robert Karam; Ruchir Puri; Swarup Bhunia

Text mining is a growing field of applications, which enables the analysis of large text data sets using statistical methods. In recent years, exponential increase in the size of these data sets has strained existing systems, requiring more computing power, server hardware, networking interconnects, and power consumption. For practical reasons, this trend cannot continue in the future. Instead, we propose a reconfigurable hardware accelerator designed for text analytics systems, which can simultaneously improve performance and reduce power consumption. Situated near the last level of memory, it mitigates the need for high-bandwidth processor-to-memory connections, instead capitalizing on close data proximity, massively parallel operation, and analytic-inspired functional units to maximize energy efficiency, while remaining flexible to easily map common text analytic kernels. A field-programmable gate array-based emulation framework demonstrates the functional correctness of the system, and a full eight-core accelerator is synthesized for power, area, and delay estimates. The accelerator can achieve two to three orders of magnitude improvement in energy efficiency versus CPU and general-purpose graphics processing unit (GPU) for various text mining kernels. As a case study, we demonstrate how indexing performance of Lucene, a popular text search and analytics platform, can be improved by an average of 70% over CPU and GPU while significantly reducing data transfer energy and latency.

International Midwest Symposium on Circuits and Systems | 2015

Energy-efficient reconfigurable computing using Spintronic memory

Robert Karam; Kai Yang; Swarup Bhunia

Reconfigurable computing platforms enable rapid prototyping of arbitrary logic, but purely spatial fabrics suffer from issues with scalability and power consumption. Novel reconfigurable frameworks are being developed which similarly allow arbitrary function mapping, but do so with a mixture of spatial and temporal computing, improving scalability and energy efficiency over purely spatial fabrics. Embedded memories within these frameworks enable rapid function evaluation through lookup table operations, making the memory read/write behavior and power consumption critical design considerations. Emerging nonvolatile nanoscale memories demonstrate enhanced cell density, reliability, and read access performance over modern memory devices, promising vast improvements in energy efficiency for memory-based reconfigurable hardware platforms. Using spintronic memory, an average 5% improvement in energy-delay product (EDP) over FPGA can be achieved in a memory-based framework, and tailoring the mapping to exploit features of spintronic memory can further improve EDP by an average of 1.6%.

Reconfigurable Computing and FPGAs | 2016

Robust bitstream protection in FPGA-based systems through low-overhead obfuscation

Robert Karam; Tamzidul Hoque; Sandip Ray; Mark Tehranipoor; Swarup Bhunia

Reconfigurable hardware, such as Field Programmable Gate Arrays (FPGAs), is being increasingly deployed in diverse application areas including automotive systems, critical infrastructures, and the emerging Internet of Things (IoT), to implement customized designs. However, securing FPGA-based designs against piracy, reverse engineering, and tampering is challenging, especially for systems that require remote upgrade. In many cases, existing solutions based on bitstream encryption may not provide sufficient protection against these attacks. In this paper, we present a novel obfuscation approach for provably robust protection of FPGA bitstreams at low overhead that goes well beyond the protection offered by bitstream encryption. The approach works with existing FPGA architectures and synthesis flows, and can be used with encryption techniques, or by itself for power and area-constrained systems. It leverages “FPGA dark silicon” (unused resources within the configurable logic blocks) to efficiently obfuscate the true functionality. We provide a detailed threat model and security analysis for the approach. We have developed a complete application mapping framework that integrates with the Altera Quartus II software. Using this CAD framework, we achieve provably strong security against all major attacks on FPGA bitstreams with an average 13% latency and 2% total power overhead for a set of benchmark circuits, as well as several large-scale open-source IP blocks on commercial FPGA.

Great Lakes Symposium on VLSI | 2016

Security Primitive Design with Nanoscale Devices: A Case Study with Resistive RAM

Robert Karam; Rui Liu; Pai Yu Chen; Shimeng Yu; Swarup Bhunia

Inherent stochastic physical mechanisms in emerging nonvolatile memories (NVMs), such as resistive random-access-memory (RRAM), have recently been explored for hardware security applications. Unlike the conventional silicon Physical Unclonable Functions (PUFs) that are solely based on manufacturing process variation, RRAM has some intrinsic randomness in its physical mechanisms that can be utilized as entropy sources; for instance, resistance variation, random telegraph noise, and probabilistic switching behaviors. This paper reviews the challenges and opportunities in building security primitives with emerging devices. In particular, it presents research progress of RRAM-based hardware security primitives, including PUF and True Random Number Generator (TRNG).

IEEE Transactions on Multi-Scale Computing Systems | 2016

Design and Validation for FPGA Trust under Hardware Trojan Attacks

Sanchita Mal-Sarkar; Robert Karam; Seetharam Narasimhan; Anandaroop Ghosh; Aswin Raghav Krishna; Swarup Bhunia

Field programmable gate arrays (FPGAs) are being increasingly used in a wide range of critical applications, including industrial, automotive, medical, and military systems. Since FPGA vendors are typically fabless, it is more economical to outsource device production to off-shore facilities. This introduces many opportunities for the insertion of malicious alterations of FPGA devices in the foundry, referred to as hardware Trojan attacks, that can cause logical and physical malfunctions during field operation. The vulnerability of these devices to hardware attacks raises serious security concerns regarding hardware and design assurance. In this paper, we present a taxonomy of FPGA-specific hardware Trojan attacks based on activation and payload characteristics along with Trojan models that can be inserted by an attacker. We also present an efficient Trojan detection method for FPGA based on a combined approach of logic testing and side-channel analysis. Finally, we propose a novel design approach, referred to as Adapted Triple Modular Redundancy (ATMR), to reliably protect against Trojan circuits of varying forms in FPGA devices. We compare ATMR with the conventional TMR approach. The results demonstrate the advantages of ATMR over TMR with respect to power overhead, while maintaining the same or higher level of security and performance as TMR. Further improvement in overhead associated with ATMR is achieved by exploiting reconfiguration and time-sharing of resources.
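
For readers unfamiliar with the conventional TMR baseline that ATMR is compared against, the voting step can be sketched as follows. This shows only standard triple modular redundancy with a bitwise majority voter. It does not model ATMR, whose details are not given in the abstract, and the function names and the corrupted-replica example are hypothetical.

```python
from typing import Callable


def majority_vote(a: int, b: int, c: int) -> int:
    # Bitwise majority: each output bit takes the value that at least
    # two of the three inputs agree on.
    return (a & b) | (b & c) | (a & c)


def tmr(
    replica_a: Callable[[int], int],
    replica_b: Callable[[int], int],
    replica_c: Callable[[int], int],
    x: int,
) -> int:
    # Run three replicas of the same function and vote on their outputs.
    return majority_vote(replica_a(x), replica_b(x), replica_c(x))


def good(x: int) -> int:
    # Intended function: increment modulo 256.
    return (x + 1) & 0xFF


def corrupted(x: int) -> int:
    # Hypothetical faulty replica whose output has its low bit flipped.
    return good(x) ^ 0x01


voted = tmr(good, corrupted, good, 41)
```

A single corrupted replica is outvoted by the two correct ones. If two replicas were corrupted in the same way, this plain voter would pass the wrong value through.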

Great Lakes Symposium on VLSI | 2014

Trade-off between energy and quality of service through dynamic operand truncation and fusion

Wenchao Qian; Robert Karam; Swarup Bhunia

Energy efficiency has emerged as a major design concern for embedded and portable electronics. Conventional approaches typically impact performance and often require significant design-time modifications. In this paper, we propose a novel approach for improving energy efficiency through judicious fusion of operations. The proposed approach has two major distinctions: (1) the fusion is enabled by operand truncation, which allows representing multiple operations in a reasonably sized lookup table (LUT); and (2) it works for a large variety of functions. Most applications in the domain of digital signal processing (DSP) and graphics can tolerate some computation error without large degradation in output quality. Our approach improves energy efficiency with graceful degradation in quality. The proposed fusion approach can be applied to trade off energy efficiency with quality at run time and requires virtually no circuit or architecture level modifications in a processor. Using our software tool for automatic fusion and truncation, the effectiveness of the approach is studied for four common applications. Simulation results show promising improvements (19-90%) in energy delay product with minimal impact on quality.
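
The idea of fusing several operations into one lookup table by truncating operands can be sketched in a few lines. This is an illustrative assumption, not the authors' software tool: the fused function (a sum of squares), the 8-bit operand width, the number of bits kept, and all function names are chosen here only to show how truncation bounds the table size at the cost of some error.

```python
from typing import Callable, List


def truncate(x: int, total_bits: int = 8, kept_bits: int = 4) -> int:
    # Keep only the most significant `kept_bits` of a `total_bits` operand.
    return x >> (total_bits - kept_bits)


def build_fused_lut(
    func: Callable[[int, int], int], total_bits: int = 8, kept_bits: int = 4
) -> List[List[int]]:
    # Precompute func for every pair of truncated operands. The table has
    # 2**kept_bits rows and columns instead of 2**total_bits.
    shift = total_bits - kept_bits
    size = 1 << kept_bits
    return [
        [func(a << shift, b << shift) for b in range(size)]
        for a in range(size)
    ]


def fused_lookup(
    lut: List[List[int]], a: int, b: int, total_bits: int = 8, kept_bits: int = 4
) -> int:
    # One table read replaces the multiplications and the addition.
    row = truncate(a, total_bits, kept_bits)
    col = truncate(b, total_bits, kept_bits)
    return lut[row][col]


def sum_of_squares(a: int, b: int) -> int:
    # Example fused function: two multiplications and one addition.
    return a * a + b * b


lut4 = build_fused_lut(sum_of_squares, total_bits=8, kept_bits=4)
approx = fused_lookup(lut4, 200, 100)  # table read on truncated operands
exact = sum_of_squares(200, 100)       # full-precision reference
```

Keeping 4 of 8 bits shrinks the table from 65,536 entries to 256, and the result for operands such as 200 and 100 is an approximation of the exact value. Raising `kept_bits` reduces the error and enlarges the table, which is the kind of energy-versus-quality trade-off the abstract describes.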
