Suman Mamidi
University of Wisconsin-Madison
Publication
Featured research published by Suman Mamidi.
signal processing systems | 2006
Michael J. Schulte; John Glossner; Sanjay Jinturkar; Mayan Moudgill; Suman Mamidi; Stamatis Vassiliadis
Embedded digital signal processors for software defined radio have stringent design constraints including high computational bandwidth, low power consumption, and low interrupt latency. Furthermore, due to rapidly evolving communication standards with increasing code complexity, these processors must be compiler-friendly, so that code for them can quickly be developed in a high-level language. In this paper, we present the design of the Sandblaster Processor, a low-power multithreaded digital signal processor for software defined radio. The processor uses a unique combination of token triggered threading, powerful compound instructions, and SIMD vector operations to provide real-time baseband processing capabilities with very low power consumption. We describe the processor’s architecture and microarchitecture, along with various techniques for achieving high performance and low power dissipation. We also describe the processor’s programming environment and the SB3010 platform, a complete system-on-chip solution for software defined radio. Using a super-computer class vectorizing compiler, the SB3010 achieves real-time performance in software on a variety of communication protocols including 802.11b, GPS, AM/FM radio, Bluetooth, GPRS, and WCDMA. In addition to providing a programmable platform for SDR, the processor also provides efficient support for a wide variety of digital signal processing and multimedia applications.
international conference / workshop on embedded computer systems: architectures, modeling and simulation | 2004
Michael J. Schulte; C. John Glossner; Suman Mamidi; Mayan Moudgill; Stamatis Vassiliadis
Embedded digital signal processors for baseband communication systems have stringent design constraints including high computational bandwidth, low power consumption, and low interrupt latency. Furthermore, these processors should be compiler-friendly, so that code for them can quickly be developed in a high-level language. This paper presents the design of a high-performance, low-power digital signal processor for baseband communication systems. The processor uses token triggered threading, SIMD vector processing, and powerful compound instructions to provide real-time baseband processing capabilities with very low power consumption. Using a supercomputer-class vectorizing compiler, the processor achieves real-time performance on a 2 Mbps WCDMA transmission system.
compilers, architecture, and synthesis for embedded systems | 2005
Suman Mamidi; Emily R. Blem; Michael J. Schulte; C. John Glossner; Daniel Iancu; Andrei Iancu; Mayan Moudgill; Sanjay Jinturkar
Software defined radios, which provide a programmable solution for implementing the physical layer processing of multiple communication standards, are widely recognized as one of the most important new technologies for wireless communication systems. Emerging communication standards, however, require tremendous processing capabilities to perform high-bandwidth physical-layer processing in real time. In this paper, we present instruction set extensions for several important communication algorithms including convolutional encoding, Viterbi decoding, turbo decoding, and Reed-Solomon encoding and decoding. The performance benefits of these extensions are evaluated using a supercomputer class vectorizing compiler and the Sandblaster low-power multithreaded processor for software defined radio. The proposed instruction set extensions provide significant performance improvements, while maintaining a high degree of programmability.
application-specific systems, architectures, and processors | 2005
Suman Mamidi; Michael J. Schulte; Daniel Iancu; A. Iancu; John Glossner
Reed-Solomon codes are an important class of error correcting codes used in many applications related to communications and digital storage. The fundamental operations in Reed-Solomon encoding and decoding involve Galois field arithmetic, which is not directly supported in general purpose processors. On the other hand, pure hardware implementations of Reed-Solomon coders are not programmable. In this paper, we present a novel algorithm to perform Reed-Solomon encoding. We also propose four new instructions for Galois field arithmetic. We show that by using the instructions, we can speed up Reed-Solomon decoding by a factor of 12 compared to a pure software approach, while still maintaining programmability.
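To illustrate why Galois field arithmetic is costly in pure software, here is a minimal sketch of GF(2^8) multiplication using shift-and-XOR steps. The field size and the primitive polynomial 0x11D are assumptions (a common choice for Reed-Solomon codes), and the names `gf_mul`, `gf_pow`, and `gf_inv` are hypothetical; the abstract does not specify the field or the four proposed instructions.

```python
# Assumption: GF(2^8) with primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D).
# The abstract does not state which field or polynomial the paper uses.
PRIM_POLY = 0x11D


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements bit by bit (carry-less multiply + reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^m) is XOR
        a <<= 1
        if a & 0x100:            # degree reached 8: reduce modulo the polynomial
            a ^= PRIM_POLY
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    """Raise a field element to a non-negative integer power by squaring."""
    result = 1
    while n:
        if n & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        n >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse of a nonzero element, using a^254 = a^-1."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return gf_pow(a, 254)
```

Each software multiply above takes up to eight loop iterations of shifts, tests, and XORs, which is the kind of overhead that dedicated Galois field instructions are meant to remove.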
Microprocessors and Microsystems | 2009
Suman Mamidi; Emily R. Blem; Michael J. Schulte; John Glossner; Daniel Iancu; Andrei Iancu; Mayan Moudgill; Sanjay Jinturkar
Software defined radios provide programmable solutions for implementing the physical layer processing of multiple communication standards. Mobile devices implementing these standards require high-performance processors to perform high-bandwidth physical layer processing in real time. In this paper, we present instruction set extensions for several important communication algorithms including cyclic redundancy checking, convolutional encoding, Viterbi decoding, turbo decoding, and Reed-Solomon encoding and decoding. We also present hardware designs for implementing these extensions, along with estimates of their area, critical path delay, and power consumption. The performance benefits of these extensions are evaluated using a supercomputer-class vectorizing compiler and the Sandblaster low-power multithreaded processor for software defined radio. The proposed instruction set extensions provide significant performance improvements at relatively low cost, while maintaining a high degree of programmability.
application-specific systems, architectures, and processors | 2004
Michael J. Schulte; Kai Chirca; John Glossner; Haoran Wang; Suman Mamidi; Pablo Balzola; Stamatis Vassiliadis
We present the design of a carry skip adder that achieves low power dissipation and high-performance operation. The carry skip adder's delay and power dissipation are reduced by dividing the adder into variable-sized blocks that balance the delay of inputs to the carry chain. This grouping reduces active power by minimizing extraneous glitches and transitions. Each block also uses highly optimized complementing carry look-ahead logic to reduce delay. Compared to previous designs, the adder architecture decreases power consumption by reducing the number of transistors, logic levels, and glitches. A 32-bit carry skip adder design that uses our approach has been implemented in 130 nm CMOS technology. At 1.2 V and 25 °C, the 32-bit adder has a critical path delay of 921 ps and average power dissipation normalized to 600 MHz operation of 0.786 mW. We also present a technique to quickly perform saturating addition, which is useful in a variety of digital signal processing and multimedia applications. Our technique for fast saturation is based on techniques for carry select addition and works particularly well when the input and output operands can have different formats. A 40-bit carry skip adder that uses our technique for fast saturation has critical path delays of 1149 ps in 130 nm technology at 1.2 V and 25 °C and 560 ps in 90 nm technology at 1.0 V and 25 °C. The 40-bit adder's average power dissipation normalized to 600 MHz operation is 0.928 mW in 130 nm technology and 0.335 mW in 90 nm technology.
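For readers unfamiliar with saturating addition, the following sketch models only its arithmetic result: the exact sum is computed, and the output is selected among the sum and the two range limits. It is a behavioral model under assumed two's complement formats, not the carry-select-based circuit the paper describes; the function name `saturating_add` and the `out_bits` parameter are hypothetical.

```python
def saturating_add(a: int, b: int, out_bits: int = 32) -> int:
    """Add two signed integers and clamp the result to a signed out_bits-wide range.

    Behavioral model only: inputs may be wider than the output format,
    e.g. 40-bit operands saturated to a 32-bit result.
    """
    lowest = -(1 << (out_bits - 1))       # most negative representable value
    highest = (1 << (out_bits - 1)) - 1   # most positive representable value
    total = a + b                         # exact sum (Python ints do not wrap)
    if total > highest:
        return highest                    # positive overflow: select the maximum
    if total < lowest:
        return lowest                     # negative overflow: select the minimum
    return total                          # in range: select the plain sum
```

For example, `saturating_add(2**31 - 1, 1)` stays at `2**31 - 1` instead of wrapping to a negative value, which is why saturation is preferred in signal processing code.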
Joint IST Workshop on Mobile Future, 2006 and the Symposium on Trends in Communications. SympoTIC '06. | 2006
Daniel Iancu; Hua Ye; John Glossner; Michael J. Schulte; Suman Mamidi; Jarmo Takala
In this paper, we describe an iterative algorithm for concatenated convolutional Reed-Solomon decoders that improves the spectral efficiency of the communication system by increasing the error correction capability and, as a consequence, lowering the retransmission rate. In our method, the decoding process starts by assuming the received code word has at most t errors, where 2t+1 is the Reed-Solomon code's minimum distance. If, after the decoding process, all the syndromes are zero, the decoding is successful; otherwise, more than t errors were encountered. At this point, the decoder assumes s erasure positions based on the erasure information coming from the convolutional decoder. If the error locator polynomial has degree equal to r = (2t-s)/2, then most likely the error positions are in the current Galois field and a second decoding algorithm is performed. Otherwise, s = s+2 erasures are assumed and the degree of the error locator polynomial is checked again. This continues until the maximum number of erasures, 2t, is reached. The Reed-Solomon decoder is executed entirely in software on the Sandbridge processor, which features special instructions for single instruction multiple data (SIMD) Galois field (GF) multiplication and other SIMD operations. Multiple decoder algorithms with different degrees of complexity are stored in external memory, such that for a particular RS data packet the one with the least computational complexity can be employed, depending on the error/erasure information. By using our method, the packet retransmission rate is decreased, resulting in improved spectral efficiency. The improved spectral efficiency is reflected by a total link budget improvement of up to one dB.
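The control flow described in this abstract can be sketched roughly as follows. This is only an outline of the escalation loop: the three decoder callbacks, the convention that `None` signals a failed decode, the ranking of erasure candidates, and the starting value of s are all assumptions introduced for illustration, since the abstract does not specify them.

```python
from typing import Callable, List, Optional, Sequence

Word = Sequence[int]


def iterative_rs_decode(
    received: Word,
    t: int,
    ranked_erasures: Sequence[int],
    decode_errors_only: Callable[[Word], Optional[List[int]]],
    locator_degree: Callable[[Word, Sequence[int]], int],
    decode_with_erasures: Callable[[Word, Sequence[int]], Optional[List[int]]],
    initial_erasures: int = 2,  # assumption: the abstract gives no starting s
) -> Optional[List[int]]:
    """Escalate the number of assumed erasures until the locator degree fits.

    All callbacks are hypothetical placeholders for real decoder stages;
    ranked_erasures lists positions flagged by the convolutional decoder,
    least reliable first.
    """
    # First pass: assume at most t errors and no erasures.
    decoded = decode_errors_only(received)
    if decoded is not None:      # assumed convention: None means nonzero syndromes
        return decoded

    s = initial_erasures
    while s <= 2 * t:            # 2t is the maximum number of erasures
        erasures = ranked_erasures[:s]
        r = (2 * t - s) // 2     # expected error locator degree for s erasures
        if locator_degree(received, erasures) == r:
            # Degree matches: run the second (errors-and-erasures) decoder.
            return decode_with_erasures(received, erasures)
        s += 2                   # otherwise assume two more erasures and retry
    return None                  # decoding failed; a retransmission is needed
```

Passing the decoder stages as callbacks keeps the sketch independent of any particular Reed-Solomon implementation, so only the loop structure from the abstract is modeled.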
asilomar conference on signals, systems and computers | 2007
Christopher D. Jenkins; Suman Mamidi; Michael J. Schulte; John Glossner
Software-defined radio (SDR) is an emerging technology that facilitates having multiple wireless communication protocols on one device. Previous work has shown that current W-CDMA, GPS, GSM, and WiMAX applications can run on this class of device while consuming significant processing power. Next generation wireless networks require speeds in excess of 50 Mbps. Some of the fastest AES software implementations only achieve 20 Mbps single-threaded performance on our reference platform. In order to have secure software-defined radio, the security processing gap must be addressed. This paper presents instruction set architecture (ISA) extensions for the Sandblaster DSP for AES processing.
application specific systems architectures and processors | 2007
Suman Mamidi; Michael J. Schulte; Daniel Iancu; C. John Glossner
Explicitly multithreaded processors and reconfigurable hardware have individually proven to be useful in the design of wireless communication systems. However, new techniques are needed to satisfy the processing requirements of emerging wireless communication standards, which have high throughput requirements for a wide variety of algorithms. This paper presents an efficient technique for adding reconfigurable functional units, called Polymorphic Hardware Accelerators (PHAs), to multicore, multithreaded Digital Signal Processors (DSPs). This paper discusses architectural support to facilitate management and sharing of PHAs on a multithreaded system. The proposed technique shows an average speedup of 6.8 on important wireless communication algorithms in the EEMBC Telecom Benchmark Suite and the Department of Defense's Joint Tactical Radio System Software Communication Architecture (JTRS SCA) Hardware Supplement.
asilomar conference on signals, systems and computers | 2006
Suman Mamidi; Michael J. Schulte; Zaipeng Xie; Mihai Sima; Daniel Iancu; John Glossner
Software defined radios provide programmable solutions for implementing the physical layer processing of multiple communication standards. Mobile devices implementing these standards require high performance processors with specialized arithmetic units to perform high bandwidth physical-layer processing in real time. In this paper, we present arithmetic units that improve the performance of several important communication algorithms. The benefits of these arithmetic units and their corresponding instruction set extensions are evaluated using a vectorizing compiler and the Sandblaster low-power multithreaded processor for software defined radio. Instruction set extensions that use these specialized arithmetic units provide significant performance improvements at relatively low cost.