Sanjive Agarwala
Texas Instruments
Publication
Featured research published by Sanjive Agarwala.
international solid-state circuits conference | 2002
Sanjive Agarwala; P. Koeppen; Timothy D. Anderson; Anthony M. Hill; M. Ales; Raguram Damodaran; Lewis Nardini; P. Wiley; Steven Mullinnix; J. Leach; Anthony J. Lell; Manzur Gill; J. Golston; D. Hoyle; Arjun Rajagopal; Abhijeet Ashok Chachad; M. Agarwala; R. Castille; N. Common; John Apostol; H. Mahmood; Manjeri Krishnan; Duc Quang Bui; Quang-Dieu An; Peter Groves; Luong Nguyen; N.S. Nagaraj; R. Simar
A 600 MHz VLIW DSP that implements the C64x VelociTI.2™ architecture delivers 4800 MIPS and 2400 (16 b) or 4800 (8 b) million multiply-accumulates per second at 0.3 mW/MMAC (16 b). The chip has 64 M transistors and dissipates 718 mW at 600 MHz and 1.2 V, and 200 mW at 300 MHz and 0.9 V. It has an 8-way VLIW DSP core, a 2-level memory system, and 2.4 GB/s I/O bandwidth. The DSP chip is implemented in 0.13 μm CMOS technology with 6-layer copper metallization.
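The headline figures in this abstract can be cross-checked with simple arithmetic. The sketch below is only a sanity check, not anything taken from the paper: it assumes peak MIPS is the clock rate times the issue width, and that power scales as MMACs times the quoted mW/MMAC figure. The function names are hypothetical.

```python
# Sanity check of the quoted C64x figures (600 MHz part).
# Assumption: peak MIPS = clock (MHz) x VLIW issue width.

def peak_mips(clock_mhz: float, issue_width: int) -> float:
    """Peak instruction rate if every issue slot is filled each cycle."""
    return clock_mhz * issue_width

def macs_per_cycle(mmacs: float, clock_mhz: float) -> float:
    """Multiply-accumulates completed per clock cycle."""
    return mmacs / clock_mhz

def power_from_efficiency_mw(mmacs: float, mw_per_mmac: float) -> float:
    """Power implied by a throughput and an efficiency figure."""
    return mmacs * mw_per_mmac

mips = peak_mips(600, 8)                              # 8-way VLIW at 600 MHz
macs16 = macs_per_cycle(2400, 600)                    # 16-bit MACs per cycle
macs8 = macs_per_cycle(4800, 600)                     # 8-bit MACs per cycle
implied_mw = power_from_efficiency_mw(2400, 0.3)      # compare with 718 mW

print(mips, macs16, macs8, round(implied_mw))
```

Under these assumptions the 4800 MIPS figure matches 8 issue slots at 600 MHz, and the 0.3 mW/MMAC figure is consistent with the 718 mW dissipation to within rounding.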
international conference on computer design | 2000
Timothy D. Anderson; Sanjive Agarwala
The increasing level of system-level integration, coupled with the higher clock frequency of today's processors, is increasing the power consumption of VLSI integrated circuits more rapidly than improvements in IC manufacturing can reduce it. This paper presents a method for reducing the power consumption of DSP processors through the introduction of a two-way decoded loop-cache. By retaining decoded instruction information from two loops, the method has been shown to eliminate an average of 83% of instruction fetches and 84% of instruction decode activity.
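To illustrate why retaining two decoded loops removes most fetch activity, here is a toy model. It is a minimal sketch and not the paper's design: the class `TwoWayLoopCache`, the LRU replacement policy, and the assumption that every loop body fits in one way are all hypothetical choices made for illustration.

```python
from collections import OrderedDict

class TwoWayLoopCache:
    """Toy model: keeps decoded bodies of the two most recently used loops."""

    def __init__(self, ways: int = 2):
        self.ways = ways
        self.loops = OrderedDict()  # loop start address -> body length (LRU order)
        self.fetches = 0            # instructions fetched and decoded normally
        self.hits = 0               # instructions served from the loop cache

    def run_straight(self, n_instructions: int) -> None:
        """Straight-line code always goes through fetch and decode."""
        self.fetches += n_instructions

    def run_loop(self, start_addr: int, body_len: int, iterations: int) -> None:
        """Execute a loop of `body_len` instructions `iterations` (>= 1) times."""
        if start_addr in self.loops:
            # Loop already cached: every iteration is served from the cache.
            self.loops.move_to_end(start_addr)
            self.hits += body_len * iterations
            return
        # First iteration is fetched and decoded, which fills the cache.
        self.fetches += body_len
        self.hits += body_len * (iterations - 1)
        if len(self.loops) >= self.ways:
            self.loops.popitem(last=False)  # evict least recently used loop
        self.loops[start_addr] = body_len

    def fetch_savings(self) -> float:
        """Fraction of executed instructions that skipped fetch and decode."""
        total = self.fetches + self.hits
        return self.hits / total if total else 0.0

cache = TwoWayLoopCache()
cache.run_straight(10)
cache.run_loop(0x100, body_len=8, iterations=10)
cache.run_loop(0x200, body_len=4, iterations=5)
cache.run_loop(0x100, body_len=8, iterations=10)  # still cached in the other way
print(f"fetches avoided: {cache.fetch_savings():.1%}")
```

With two ways, an inner loop and a second loop that alternates with it can both stay resident; a single-entry cache would refill on every switch between them.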
international solid-state circuits conference | 2007
Sanjive Agarwala; Arjun Rajagopal; Anthony M. Hill; M. Joshi; Steven Mullinnix; Timothy D. Anderson; Raguram Damodaran; Lewis Nardini; P. Wiley; P. Groves; John Apostol; Manzur Gill; J. Flores; Abhijeet Ashok Chachad; A. Hales; K. Chirca; K. Panda; R. Venkatasubramanian; P. Eyres; R. Veiamuri; A. Rajaram; Manjeri Krishnan; J. Nelson; J. Frade; M. Rahman; N. Mahmood; U. Narasimha; S. Sinha; S. Krishnan; W. Webster
The combined processing power of three 1+GHz DSP cores and 65nm 7M CMOS integration delivers a WCDMA macro base-station on a single chip. The 300M-transistor IC performs up to 24000 MIPS and 8000 16b MMACs per second, is coupled with symbol-rate and chip-rate acceleration, and dissipates less than 6W.
international conference on computer design | 2000
Sanjive Agarwala; Charles L. Fuoco; Timothy D. Anderson; Dave Comisky; Christopher L. Mobley
With the explosion of Digital Signal Processor (DSP) applications, there is a constant requirement for increased processing capability. This in turn requires rapid performance scaling in both operations per cycle and cycles per second, both of which result in increased MIPS/MMACS/MFLOPs. The memory system has to sustain the increased frequency and bandwidth demands in order to meet the data requirements of the DSP. Traditionally, DSP system architectures have on-chip addressable RAM, which is accessible by both the central processing unit (CPU) and the direct memory access (DMA). However, RAM frequencies are not scaling along with CPU clock rates, and as a result only relatively small RAM sizes are able to meet the frequency goals. This is in direct contrast to the increasing program size requirements seen by DSP applications, which in turn require even more on-chip RAM. This paper proposes a solution which has caches and RAMs coexisting in a homogeneous environment and working seamlessly together allowing high frequencies while still maintaining the DSP goals of low cost and low power. This multi-level memory system architecture has been implemented on the Texas Instruments (TI) TMS320C6211 C6x DSP.
international conference on vlsi design | 2012
Raguram Damodaran; Timothy D. Anderson; Sanjive Agarwala; Rama Venkatasubramanian; Michael Gill; Dhileep Gopalakrishnan; Anthony M. Hill; Abhijeet Ashok Chachad; Dheera Balasubramanian; Naveen Bhoria; Jonathan (Son) Hung Tran; Duc Quang Bui; Mujibur Rahman; Shriram D. Moharil; Matthew D. Pierson; Steven Mullinnix; Hung Ong; David Thompson; Krishna Chaithanya Gurram; Oluleye Olorode; Nuruddin Mahmood; Jose Luis Flores; Arjun Rajagopal; Soujanya Narnur; Daniel Wu; Alan Hales; Kyle Peavy; Robert Sussman
This paper presents the next-generation C66x DSP, an integrated fixed- and floating-point DSP implemented in a TSMC 40nm process. The DSP core runs at 1.25GHz at 0.9V and has a standby power consumption of 800mW. The core transistor count is 21.5 million. The DSP core features an 8-way VLIW floating-point data path and a two-level memory system, and delivers 40 GMACS or 10 GFLOPS of floating-point MAC performance at 1.25GHz.
international conference on vlsi design | 2004
Sanjive Agarwala; Paul Wiley; Arjun Rajagopal; Anthony M. Hill; Raguram Damodaran; Lewis Nardini; Timothy D. Anderson; Steven Mullinnix; Jose Luis Flores; Heping Yue; Abhijeet Ashok Chachad; John Apostol; Kyle Castille; Usha Narasimha; Tod D. Wolf; N. S. Nagaraj; Manjeri Krishnan; Luong Nguyen; Todd Kroeger; Michael Gill; Peter Groves; Bill Webster; Joel J. Graber; Christine Karlovich
The 800MHz System-on-Chip implements the C64x VLIW DSP VelociTI.2™ architecture and delivers 6400 MIPS, 3200 16-bit MMACs, and 6400 8-bit MMACs at 0.17 mW/MMAC (8 bit). The chip is implemented in state-of-the-art 90 nm CMOS technology with 7-layer copper metallization. The core dissipates 1080 mW at 800 MHz, 1.2V. The system-on-chip is targeted at high-performance wireless infrastructure applications. It has an 8-way VLIW DSP core, a 2-level memory system, and an I/O bandwidth of 3.2GB/s.
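The efficiency figure quoted here appears to follow from the core power and the 8-bit MAC rate. The short check below assumes, without confirmation from the paper, that mW/MMAC is simply core power divided by peak MMACs; `mw_per_mmac` is a hypothetical helper.

```python
# Assumption: efficiency = core power (mW) / peak MMAC rate.

def mw_per_mmac(core_power_mw: float, mmacs: float) -> float:
    """Power cost per million multiply-accumulates per second."""
    return core_power_mw / mmacs

CORE_POWER_MW = 1080   # at 800 MHz, 1.2 V (from the abstract)

eff_8bit = mw_per_mmac(CORE_POWER_MW, 6400)    # 8-bit MAC rate
eff_16bit = mw_per_mmac(CORE_POWER_MW, 3200)   # 16-bit MAC rate

print(f"8-bit:  {eff_8bit:.2f} mW/MMAC")
print(f"16-bit: {eff_16bit:.2f} mW/MMAC")
```

On this reading, the 8-bit rate reproduces the quoted 0.17 mW/MMAC, while the 16-bit rate would cost roughly twice as much per MMAC.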
international symposium on vlsi technology systems and applications | 1991
Hau-Yung Chen; Sanjive Agarwala; Santanu Dutta; Doug Matzke; Patrick W. Bosshart; Steve Lusky; Paul Kollaritsch
Circuit-level optimization techniques such as buffer insertion, gate input reordering and transistor sizing are commonly practiced by circuit designers to deliver high performance circuits. All these techniques are indispensable in shortening the design cycle and in improving the delay performance. The authors describe the algorithms implementing these circuit optimization techniques which allow push-button, high-performance synthesis within the DROID Auto-Full-Custom design system.
international symposium on circuits and systems | 1994
Sanjive Agarwala; Patrick W. Bosshart
This paper describes an algorithm for performing timing-directed circuit optimizations in linear time. An optimal gate input reordering scheme for performance improvement and a gate sizing scheme for both performance improvement and area reduction are presented. The algorithm uses a combination and extent of partial validity of timing delays in obtaining linear runtime performance. The algorithm has been applied for gate input reordering and BiCMOS deselection in designs targeted for a BiCMOS gate array. On average, and in linear time, a 50% reduction in BiCMOS site utilization and a 5% gain in design performance through reordering have been achieved.
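The general idea behind gate input reordering can be sketched for a single gate. This is not the paper's algorithm, which the abstract does not detail. It is the textbook pairing rule, assuming a symmetric gate whose inputs are logically interchangeable and whose pins have different pin-to-output delays. The function name, signal names, and delay values are hypothetical.

```python
def reorder_inputs(arrival_ns: dict, pin_delay_ns: dict):
    """Assign signals to pins so the latest signal uses the fastest pin.

    arrival_ns:   signal name -> arrival time at the gate inputs
    pin_delay_ns: pin name    -> pin-to-output delay
    Returns (signal -> pin assignment, resulting output arrival time).
    """
    if len(arrival_ns) != len(pin_delay_ns):
        raise ValueError("need exactly one pin per signal")
    # Latest-arriving signals first, fastest pins first.
    signals = sorted(arrival_ns, key=arrival_ns.get, reverse=True)
    pins = sorted(pin_delay_ns, key=pin_delay_ns.get)
    assignment = dict(zip(signals, pins))
    # Output arrival is set by the slowest signal-plus-pin path.
    output_arrival = max(arrival_ns[s] + pin_delay_ns[p]
                         for s, p in assignment.items())
    return assignment, output_arrival

# Illustrative values only.
arrivals = {"a": 1.0, "b": 3.0, "c": 2.0}
pin_delays = {"p0": 0.5, "p1": 0.7, "p2": 0.9}
assignment, t_out = reorder_inputs(arrivals, pin_delays)
print(assignment, t_out)
```

Pairing the largest arrival time with the smallest pin delay minimizes the worst-case sum for one gate. A full timing-directed flow would also have to account for how each change shifts arrival times downstream.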
Archive | 2000
Charles L. Fuoco; David A. Comisky; Sanjive Agarwala; Raguram Damodaran
Archive | 2000
Charles L. Fuoco; Sanjive Agarwala; David A. Comisky; Christopher L. Mobley