Vason P. Srini

University of California, Berkeley

Publications

Featured research published by Vason P. Srini.

Design Automation Conference | 1998

A multiprocessor DSP system using PADDI-2

Roy A. Sutton; Vason P. Srini; Jan M. Rabaey

We have integrated an image processing system built around PADDI-2, a custom 48-node MIMD parallel DSP. The system includes image processing algorithms, a graphical SFG tool, a simulator, routing tools, compilers, hardware configuration and debugging tools, application development libraries, and software implementations for hardware verification. The system board, connected to a SPARCstation via a custom SBus controller, contains 384 processors in 8 VLSI chips. The software environment supports a multiprocessor system under development (VGI-1). The software tools and libraries are modular, with implementation dependencies isolated in layered encapsulations.

IEEE Computer | 2006

A vision for supporting autonomous navigation in urban environments

Vason P. Srini

Autonomous navigation systems (ANS), such as autonomous ground vehicles (AGVs), unmanned aerial vehicles (UAVs), and unmanned submersible vehicles (USVs), as well as modern vehicles with actuators, sensors, and computer control, perform three basic functions: context gathering using sensors, processing, and action. Most researchers have placed all three functions in the ANS or the robot itself to overcome occlusions and handle the environment's dynamics. However, this makes ANS and robotic systems bulky and expensive. It also impedes the introduction of vehicles with ANS into urban environments, where they must coexist with existing cars and highways. The approach presented here distributes the context-gathering and processing functions using sensor networks and wireless communication technologies to reduce costs and make ANS widespread. The system uses sensors mounted on moving vehicles and on stationary objects such as lampposts, traffic lights, toll plazas, and buildings to gather information at different levels.

International Symposium on Microarchitecture | 1985

Compiling Prolog into microcode: a case study using the NCR/32-000

Barry S. Fagin; Yale N. Patt; Vason P. Srini; Alvin M. Despain

A proven method of obtaining high performance for Prolog programs is to first translate them into the instruction set of Warren's Abstract Machine, or W-code [1]. From that point, several models of execution are available. This paper describes one of them: the compilation of W-code directly into the vertical microcode of a general-purpose host processor, the NCR/32-000. The result is the fastest functioning Prolog system known to the authors. We describe the implementation, provide benchmark measurements, and analyze our results.

International Conference on Supercomputing | 1988

A two-tier memory architecture for high-performance multiprocessor systems

Tam M. Nguyen; Vason P. Srini; Alvin M. Despain

Performance of high-speed multiprocessor systems is limited by the available bandwidth to memory and the need to synchronize write-shareable data. This paper presents a new memory system that separates synchronization-related data from other data. The memory system has two tiers: synchronization memory and high-bandwidth (HB) memory. The synchronization memory consists of snooping caches connected to a bus and is used to store synchronization variables such as locks and semaphores. The HB memory stores the bulk of the application program code and data. It contains caches and a high-bandwidth interconnection network to memory, such as a crossbar, but does not have full snooping among caches. The two-tier memory system has been evaluated by analyzing the memory behavior of the simulated parallel execution of Prolog programs. Initial results indicate that the two-tier memory system potentially reduces memory interference and speeds up synchronization. Three different schemes for the caches on the HB memory have been studied, and the results are presented. The two-tier memory system has potential applications in areas where synchronization is light to medium and local data is accessed often.
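The routing decision at the heart of the two-tier design can be sketched in a few lines of Python. This is a toy, single-threaded model under stated assumptions: the `TwoTierMemory` class, the explicit `sync_addresses` set used to tag locks and semaphores, and the `test_and_set` helper are all hypothetical, and no caches, bus snooping, or crossbar timing are modeled. It only shows the separation described above: synchronization variables go to the bus-based tier, and everything else goes to the HB tier.

```python
from dataclasses import dataclass, field


@dataclass
class TwoTierMemory:
    """Toy model of the two-tier split; caches, snooping, and timing are not modeled."""

    sync_addresses: set                             # hypothetical tags for locks/semaphores
    sync_tier: dict = field(default_factory=dict)   # stands in for the snooping-bus tier
    hb_tier: dict = field(default_factory=dict)     # stands in for the HB tier (e.g. crossbar)
    sync_accesses: int = 0
    hb_accesses: int = 0

    def _tier(self, addr: int) -> dict:
        # Route each access by data class: synchronization variables vs. everything else.
        if addr in self.sync_addresses:
            self.sync_accesses += 1
            return self.sync_tier
        self.hb_accesses += 1
        return self.hb_tier

    def read(self, addr: int) -> int:
        return self._tier(addr).get(addr, 0)

    def write(self, addr: int, value: int) -> None:
        self._tier(addr)[addr] = value

    def test_and_set(self, addr: int) -> int:
        """Lock-acquire attempt; atomicity is trivial in this single-threaded toy."""
        if addr not in self.sync_addresses:
            raise ValueError("atomic operations are restricted to synchronization variables")
        old = self.read(addr)
        self.write(addr, 1)
        return old
```

In this sketch, bulk data such as `mem.write(0x2000, 42)` never touches the synchronization tier, while repeated `test_and_set` calls on a lock address stay on the bus tier, which mirrors the claim that separating the two reduces interference.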

International Conference on Advanced Communication Technology | 2006

Data fusion applied on autonomous ground vehicles

Emanuel Taropa; Vason P. Srini; Wong-Jong Lee; Tack-Don Han

Accurate environment sensing has become crucial in domains ranging from remote control and remote collaboration to navigation in autonomous vehicles and disaster prevention systems. These applications share common characteristics: diverse sensors operating at different rates, which makes the data sources heterogeneous; constant processing-time constraints; and real-time decision-making based on the collected information. A reliable data fusion model must address each of these requirements and must allow the user to constantly monitor the flow of data in the framework. This paper proposes a hierarchical model for data fusion using real-time objects and an application programming interface (API).
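The abstract does not specify how the hierarchical model combines readings, so the following is only a sketch of one common way to structure such a hierarchy. It assumes inverse-variance weighting (a swapped-in technique, not necessarily the paper's) and a staleness cutoff to cope with sensors that report at different rates. The names `SensorNode`, `FusionNode`, and `Estimate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Estimate:
    value: float
    variance: float
    timestamp: float


class SensorNode:
    """Leaf of the hierarchy: keeps only the latest reading of one sensor."""

    def __init__(self, name: str, variance: float):
        self.name = name
        self.variance = variance  # assumed known, fixed measurement noise
        self.latest: Optional[Estimate] = None

    def update(self, value: float, timestamp: float) -> None:
        # Each sensor calls this at its own rate; no common clock is assumed.
        self.latest = Estimate(value, self.variance, timestamp)

    def estimate(self, now: float, max_age: float) -> Optional[Estimate]:
        if self.latest is None or now - self.latest.timestamp > max_age:
            return None  # missing or stale readings are left out of the fusion
        return self.latest


class FusionNode:
    """Inner node: fuses its children's estimates by inverse-variance weighting."""

    def __init__(self, children):
        self.children = children

    def estimate(self, now: float, max_age: float) -> Optional[Estimate]:
        parts = [c.estimate(now, max_age) for c in self.children]
        parts = [p for p in parts if p is not None]
        if not parts:
            return None
        weights = [1.0 / p.variance for p in parts]
        total = sum(weights)
        value = sum(w * p.value for w, p in zip(weights, parts)) / total
        # The fused variance shrinks as more independent estimates contribute.
        return Estimate(value, 1.0 / total, max(p.timestamp for p in parts))
```

Because inner nodes expose the same `estimate` method as leaves, levels can be stacked arbitrarily, and each level can be inspected on its own, which loosely matches the abstract's requirement that users can monitor the data flow through the framework.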

SPIE's International Symposium on Optical Science, Engineering, and Instrumentation | 1998

Parallel DSP with memory and I/O processors

Vason P. Srini; John Thendean; Sain-Zee Ueng; Jan M. Rabaey

The design and implementation of a parallel digital signal processing system on a chip containing 64 computational processors, 16 memory processors, and 16 I/O processors is described. The processors are interconnected by two levels of segmented buses. Each computational processor has a 16-bit data path and a control unit. The instruction set of the 16-bit processor supports computations on streams of data present in video, graphics, image processing, and digital communication applications. Two's complement arithmetic, saturation arithmetic, and packed instructions are supported. Higher data precision, such as 32-bit and 64-bit, can be achieved by cascading processors. The instruction memory of each computational processor has sixteen 40-bit words. Data streaming through the processor is manipulated by the instructions in the instruction memory. Multiple operations can be performed in a single cycle in a processor. A handshake protocol is used for synchronization between the sending and receiving processors. Six programmable registers are available in each computational processor for storing data. Each memory processor has a 256 × 16 storage unit for storing additional data. The memory processors can be statically configured as a delay line, FIFO, lookup table, or random access memory. For each memory processor there are four FSMs supporting the four configurations. The I/O processors are provided for external communication. Multiple parallel processing chips, digital output from sensors, and SRAM chips can be interconnected using the I/O processors. The VLSI chip implementing the processors is organized as 16 clusters interconnected by a statically programmable hierarchical bus structure. The buses are segmented by programming the switches on the bus. Each cluster has six 16-bit data buses and four 2-bit control buses for supporting communication between four computational processors, one memory processor, and one I/O processor. In addition, adjacent processors can communicate using a bypass bus. The clusters are interconnected by sixteen 16-bit data buses and eight 2-bit control buses. Each cluster has 60 programmable switches to control the communication between the intracluster and intercluster buses. Each processor has 17 programmable switches to control the connections to the intracluster buses.
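Two of the arithmetic features mentioned above, saturation arithmetic and cascading 16-bit processors for higher precision, can be illustrated with a short Python sketch. It models generic two's complement behavior only. The function names are hypothetical, and the abstract does not describe how carries pass between cascaded processors, so feeding the low 16-bit stage's carry into the high stage is an assumption.

```python
INT16_MIN, INT16_MAX = -0x8000, 0x7FFF
MASK16 = 0xFFFF


def wrap16(x: int) -> int:
    """Ordinary two's complement result: keep the low 16 bits, reinterpret as signed."""
    x &= MASK16
    return x - 0x10000 if x & 0x8000 else x


def sat_add16(a: int, b: int) -> int:
    """Saturating add of two signed 16-bit values: clamp on overflow instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


def add16_with_carry(a: int, b: int, carry_in: int):
    """One 16-bit stage on raw bit patterns; returns (16-bit sum, carry_out)."""
    total = (a & MASK16) + (b & MASK16) + carry_in
    return total & MASK16, total >> 16


def cascaded_add32(x: int, y: int) -> int:
    """32-bit add from two 16-bit stages; the low stage's carry feeds the high stage."""
    lo, carry = add16_with_carry(x & MASK16, y & MASK16, 0)
    hi, _ = add16_with_carry((x >> 16) & MASK16, (y >> 16) & MASK16, carry)
    return (hi << 16) | lo  # raw 32-bit pattern
```

For example, adding 30000 and 10000 wraps to -25536 with `wrap16`, whereas `sat_add16` pins the result at 32767; for video and image data, clipping to the nearest representable value is usually far less visible than a sign flip.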

International Conference on Systems | 1990

The Aquarius-IIU system

Darren R. Busing; Vason P. Srini; Georges E. Smine; Michael J. Carlton; Alvin M. Despain

A description is given of Aquarius IIU, a complex system integrating a high-performance symbolic microprocessor, an instruction prefetcher, snooping data and instruction caches, a VME bus interface, and a set of controllers. Aquarius IIU is based on the high-performance VLSI-PLM chip, which runs the Warren Abstract Machine instruction set. Many of these nodes have been connected using a shared bus to form a multiprocessor, which has its own shared memory and snooping caches and is used as a back-end Prolog engine for the host (SUN3/160). On every node, two controllers per data and instruction cache cooperate to support Berkeley's snooping cache-lock state protocol, which minimizes the bus traffic associated with locking blocks. The nodes share memory using the signals of the VME bus; page faults and memory management are handled by the host. A top-down method was used in the design of the Aquarius IIU node, while a bottom-up method was used in the simulations. The procedure followed was found to be advantageous for designing and simulating complex systems such as the Aquarius IIU.

Microprocessors and Microsystems | 2007

Dynamic power management of DRAM using accessed physical addresses

Jung-Hi Min; Hojung Cha; Vason P. Srini

Power management is an important part of handheld systems such as PDAs, smartphones, and other battery-operated digital devices. A handheld system can transition the nodes of a DRAM to a low-power state to reduce energy consumption. We propose an efficient method for dynamic power management (DPM) of DRAM based on accessed physical addresses. The proposed method also reduces the number of resynchronizations. Unlike conventional page clustering mechanisms, which focus on virtual memory (VM), it does not need to collect scattered pages. Simulation results show that the proposed method reduces the energy-delay product by as much as 75% compared to DRAM with no DPM.
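To make the energy-delay trade-off concrete, here is a toy simulation of timeout-based DRAM power-down driven by physical addresses. Everything specific in it is an assumption for illustration, not the paper's method: the 32 MiB node size, mapping a physical address to a node by integer division, the idle threshold, the power values, and the fixed resynchronization penalty. It charges a resynchronization delay only when a powered-down node is accessed again and reports the energy-delay product (EDP) as energy × delay.

```python
from dataclasses import dataclass

NODE_SIZE = 32 * 2**20  # hypothetical: 32 MiB of physical address space per DRAM node


@dataclass
class PowerParams:
    # All values are hypothetical, chosen only to illustrate the trade-off.
    active_power: float = 1.0  # energy per cycle while a node stays active
    low_power: float = 0.1     # energy per cycle in the low-power state
    idle_threshold: int = 100  # idle cycles before a node is powered down
    resync_delay: int = 10     # stall cycles to resynchronize a node on wake-up


def node_of(paddr: int) -> int:
    """Map a physical address to the DRAM node holding it (assumed linear layout)."""
    return paddr // NODE_SIZE


def simulate(trace, num_nodes, p, dpm=True):
    """trace: list of (cycle, physical_address). Returns (energy, delay, resyncs)."""
    trace = sorted(trace)
    end = trace[-1][0]
    accesses = {n: [0] for n in range(num_nodes)}  # every node starts active at cycle 0
    for cycle, paddr in trace:
        accesses[node_of(paddr)].append(cycle)

    energy, resyncs = 0.0, 0
    for times in accesses.values():
        bounds = times + [end]  # close each node's last idle period at the end of the trace
        for i in range(len(times)):
            gap = bounds[i + 1] - bounds[i]
            if dpm and gap > p.idle_threshold:
                # Stay active for the threshold, then sit in the low-power state.
                energy += p.idle_threshold * p.active_power
                energy += (gap - p.idle_threshold) * p.low_power
                if i + 1 < len(times):  # a real access ends this gap, so the node wakes up
                    resyncs += 1
            else:
                energy += gap * p.active_power
    delay = end + resyncs * p.resync_delay  # resync stalls lengthen the run
    return energy, delay, resyncs


def edp(energy: float, delay: float) -> float:
    return energy * delay
```

With a trace in which one node is touched only once, that node spends most of the run in the low-power state without ever paying a wake-up penalty; fewer resynchronizations is exactly the kind of saving the abstract attributes to its address-based method, though the numbers here depend entirely on the made-up parameters.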

Parallel and Distributed Methods for Image Processing (Conference) | 1997

Architecture for web-based image processing

Vason P. Srini; Matt D. Armstrong; Sayf Alalusi; John Thendean; Sain-Zee Ueng; David P. Bushong; Erek S. Borowski; Elaine Chao; Jan M. Rabaey

A computer systems architecture is proposed for processing medical images and other data arriving over the Web. The architecture comprises a Java engine, which communicates images over the Internet, stores data in local memory, and performs floating-point calculations, and a coprocessor MIMD parallel DSP for the fine-grained operations found in video, graphics, and image processing applications. The local memory is shared between the Java engine and the parallel DSP, and data arriving from the Web is stored there. This approach avoids the frequent movement of image data between a host processor's memory and an image processor's memory found in many image processing systems. A low-power, high-performance parallel DSP architecture containing many processors interconnected by a segmented hierarchical network has been developed. The instruction set of the 16-bit processor supports video, graphics, and image processing calculations. Two's complement arithmetic, saturation arithmetic, and packed instructions are supported. Higher data precision, such as 32-bit and 64-bit, can be achieved by cascading processors. A VLSI chip implementation of the architecture, containing 64 processors organized in 16 clusters and interconnected by a statically programmable hierarchical bus, is in progress. The buses are segmentable by programming switches on the bus. The instruction memory of each processor has sixteen 40-bit words. Data streaming through the processor is manipulated by the instructions. Multiple operations can be performed in a single cycle in a processor. A low-power handshake protocol is used for synchronization between the sender and the receiver of data. Temporary storage for data and filter coefficients is provided in each chip: a 256 × 16 memory unit is included in each of the 16 clusters. The memory unit can be used as a delay line, FIFO, lookup table, or random access memory. The architecture is scalable with technology. Portable multimedia terminals like U.C. Berkeley's InfoPad can be developed using the proposed parallel DSP architecture, a color display, a pen interface, and wireless network communication, for use in clinics, hospitals, homes, offices, and factories.
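The memory unit's four configurations can be viewed as four address-generation policies over the same 256 × 16 storage array. The Python sketch below is a hypothetical model (the `MemoryUnit` class, the `Mode` enum, and the method names are invented for illustration): a circular pointer gives the delay line, a head pointer plus a count gives the FIFO, and direct indexing gives the lookup table and RAM. It does not model the handshake protocol or any timing.

```python
from enum import Enum

WORDS = 256      # 256 x 16 storage, as stated in the abstract
MASK16 = 0xFFFF  # each word holds 16 bits


class Mode(Enum):
    DELAY_LINE = 1
    FIFO = 2
    LUT = 3
    RAM = 4


class MemoryUnit:
    """Hypothetical model: one storage array, four statically chosen access policies."""

    def __init__(self, mode: Mode, delay: int = 1, table=()):
        if not 1 <= delay <= WORDS or len(table) > WORDS:
            raise ValueError("configuration does not fit in 256 words")
        self.mode, self.delay = mode, delay
        self.store = [0] * WORDS
        self.head = 0   # pointer used by the delay-line and FIFO modes
        self.count = 0  # number of words held in FIFO mode
        for i, v in enumerate(table):  # preload contents, e.g. a lookup table
            self.store[i] = v & MASK16

    def _check(self, mode: Mode) -> None:
        if self.mode is not mode:
            raise RuntimeError(f"unit is configured as {self.mode.name}")

    def shift(self, x: int) -> int:
        """Delay line: store x and return the sample written `delay` calls earlier."""
        self._check(Mode.DELAY_LINE)
        out = self.store[self.head]
        self.store[self.head] = x & MASK16
        self.head = (self.head + 1) % self.delay
        return out

    def push(self, x: int) -> None:
        self._check(Mode.FIFO)
        if self.count == WORDS:
            raise OverflowError("FIFO full")
        self.store[(self.head + self.count) % WORDS] = x & MASK16
        self.count += 1

    def pop(self) -> int:
        self._check(Mode.FIFO)
        if self.count == 0:
            raise IndexError("FIFO empty")
        x = self.store[self.head]
        self.head = (self.head + 1) % WORDS
        self.count -= 1
        return x

    def lookup(self, index: int) -> int:
        self._check(Mode.LUT)
        return self.store[index % WORDS]

    def read(self, addr: int) -> int:
        self._check(Mode.RAM)
        return self.store[addr % WORDS]

    def write(self, addr: int, x: int) -> None:
        self._check(Mode.RAM)
        self.store[addr % WORDS] = x & MASK16
```

Fixing the mode at construction time reflects the abstract's statement that the configuration is static; only the pointer logic differs between modes, while the storage array is shared.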

IEEE Transactions on Microwave Theory and Techniques | 2017

A Wideband All-Digital CMOS RF Transmitter on HDI Interposers With High Power and Efficiency

Nai-Chung Kuo; Bonjern Yang; Angie Wang; Lingkai Kong; Charles Wu; Vason P. Srini; Elad Alon; Borivoje Nikolic; Ali M. Niknejad

This paper demonstrates a wideband CMOS all-digital polar transmitter with flip-chip connections to three high-density-interconnect (HDI) PCB interposers. The interposers are designed to extract power from a CMOS open-drain inverse Class-D power amplifier core. Over a wide frequency range from 0.7 to 3.5 GHz, continuous-wave output power higher than 25.5 dBm and drain efficiency (DE) above 40% are demonstrated. The low-band package achieves a peak power of 29.2 dBm at 1.1 GHz with a DE of 60%, the mid-band package outputs 28.8 dBm at 1.5 GHz with a DE of 56%, and the high-band package generates 26 dBm at 3 GHz with a DE of 49%. Amplitude modulation (AM) is achieved by digitally modulating the switch conductance of the inverse Class-D core, and on-chip phase modulation is achieved by digitally weighting the in-phase and quadrature bias currents in the IQ mixer. Detailed modulation tests involving 64-QAM (quadrature amplitude modulation) and 20-MHz WLAN and LTE signals exhibit excellent power and efficiency at 0.6, 1.2, 1.8, 2.4, 3, and 3.6 GHz. The associated spectral-mask and error-vector-magnitude specifications are satisfied.
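The split between digital amplitude modulation and I/Q-weighted phase modulation can be sketched as a conversion from a baseband sample to an amplitude code and a pair of signed I/Q weights. The abstract does not give the resolutions, so `amp_bits`, `phase_bits`, the function names, and the 64-QAM normalization below are hypothetical; the sketch only illustrates the polar decomposition, not the transmitter's calibration or circuit behavior.

```python
import cmath
import math


def qam64(index: int) -> complex:
    """Map 0..63 to a 64-QAM point (odd levels -7..7), scaled so corners have magnitude 1."""
    levels = [-7, -5, -3, -1, 1, 3, 5, 7]
    return complex(levels[index % 8], levels[index // 8]) / (7 * math.sqrt(2))


def polar_codes(symbol: complex, amp_bits: int = 8, phase_bits: int = 6,
                full_scale: float = 1.0):
    """Split one baseband sample into an amplitude code and signed I/Q weights."""
    # Amplitude path: a digital code, e.g. how much switch conductance to enable.
    amplitude = min(abs(symbol) / full_scale, 1.0)
    amp_code = round(amplitude * (2**amp_bits - 1))
    # Phase path: weights for the in-phase and quadrature currents; their ratio sets the phase.
    phase = cmath.phase(symbol)
    w_max = 2**(phase_bits - 1) - 1
    i_weight = round(math.cos(phase) * w_max)
    q_weight = round(math.sin(phase) * w_max)
    return amp_code, i_weight, q_weight
```

For a corner point of the constellation, such as `qam64(63)`, the amplitude code is at full scale and the I and Q weights are equal in magnitude, reflecting a 45° phase; the phase information is carried entirely by the weight ratio, independent of the amplitude code.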
