Viktor K. Prasanna | Researchain

Publication

Featured research published by Viktor K. Prasanna.

field-programmable custom computing machines | 2001

Fast Regular Expression Matching Using FPGAs

Reetinder P. S. Sidhu; Viktor K. Prasanna

This paper presents an efficient method for finding matches to a given regular expression in given text using FPGAs. To match a regular expression of length n, a serial machine requires O(2^n) memory and takes O(1) time per text character. The proposed approach requires only O(n^2) space and still processes a text character in O(1) time (one clock cycle). The improvement is due to the Nondeterministic Finite Automaton (NFA) used to perform the matching. As far as the authors are aware, this is the first practical use of a nondeterministic state machine on programmable logic. Furthermore, the paper presents a simple, fast algorithm that quickly constructs the NFA for the given regular expression. Fast NFA construction is crucial because the NFA structure depends on the regular expression, which is known only at runtime. Implementations of the algorithm for conventional FPGAs and the Self-Reconfigurable Gate Array (SRGA) are described. To evaluate performance, the NFA logic was mapped onto the Virtex XCV100 FPGA and the SRGA. Also, the performance of GNU grep for matching regular expressions was evaluated on an 800 MHz Pentium III machine. The proposed approach was faster than best-case grep performance in most cases. It was orders of magnitude faster than worst-case grep performance. Logic for the largest NFA considered fit in fewer than 1000 CLBs, while DFA storage for grep in the worst case consumed a few hundred megabytes.
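The space argument in this abstract can be illustrated in software. The following is a minimal Python sketch, not the paper's FPGA circuit or its NFA construction algorithm: it hand-builds an NFA for an assumed example language (strings over {a, b} whose k-th symbol from the end is 'a') and tracks the set of active states as a bitmask. The function names and the example language are hypothetical and chosen only for illustration. An equivalent DFA for this language needs on the order of 2^k states, while the NFA needs k + 1. In hardware all states would update in a single clock cycle; here the update is an ordinary sequential loop.

```python
def build_kth_from_end_nfa(k):
    """Build an NFA (illustrative example) over {'a', 'b'} that accepts
    strings whose k-th symbol from the end is 'a'. States are 0..k."""
    # trans[state][symbol] is a bitmask of successor states.
    trans = [dict() for _ in range(k + 1)]
    trans[0]['a'] = (1 << 0) | (1 << 1)   # stay in 0, or guess "this is the a"
    trans[0]['b'] = 1 << 0                # stay in 0
    for s in range(1, k):
        trans[s]['a'] = 1 << (s + 1)      # count off the remaining symbols
        trans[s]['b'] = 1 << (s + 1)
    start_mask = 1 << 0
    accept_mask = 1 << k                  # state k has no outgoing edges
    return trans, start_mask, accept_mask


def nfa_accepts(trans, start_mask, accept_mask, text):
    """Simulate the NFA on text, keeping all active states in one bitmask."""
    active = start_mask
    for ch in text:
        nxt = 0
        state = 0
        remaining = active
        while remaining:
            if remaining & 1:             # this state is currently active
                nxt |= trans[state].get(ch, 0)
            remaining >>= 1
            state += 1
        active = nxt
    return bool(active & accept_mask)


if __name__ == "__main__":
    nfa = build_kth_from_end_nfa(3)
    for s in ["abb", "aabb", "abbb", "ab"]:
        print(s, nfa_accepts(*nfa, s))
```

The bitmask plays the role of one flip-flop per NFA state: memory grows with the number of states rather than with the number of state subsets.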

international conference on computer communications | 2004

Energy-latency tradeoffs for data gathering in wireless sensor networks

Yang Yu; Bhaskar Krishnamachari; Viktor K. Prasanna

We study the problem of scheduling packet transmissions for data gathering in wireless sensor networks. The focus is to explore the energy-latency tradeoffs in wireless communication using techniques such as modulation scaling. The data aggregation tree, a multiple-source single-sink communication paradigm, is employed for abstracting the packet flow. We consider a real-time scenario where the data gathering must be performed within a specified latency constraint. We present algorithms to minimize the overall energy dissipation of the sensor nodes in the aggregation tree subject to the latency constraint. For the off-line problem, we propose (a) a numerical algorithm for the optimal solution, and (b) a pseudo-polynomial time approximation algorithm based on dynamic programming. We also discuss techniques for handling interference among the sensor nodes. Simulations have been conducted for both long-range communication and short-range communication. The simulation results show that, compared with the classic shutdown technique, between 20% and 90% energy savings can be achieved by our techniques, under different settings of several key system parameters. We also develop an on-line distributed protocol that relies only on the local information available at each sensor node within the aggregation tree. Simulation results show that between 15% and 90% energy conservation can be achieved by the on-line protocol. The adaptability of the protocol with respect to variations in the packet size and latency constraint is also demonstrated through several run-time scenarios.

IEEE Computer | 1993

Heterogeneous computing: challenges and opportunities

Ashfaq A. Khokhar; Viktor K. Prasanna; Muhammad Shaaban; Cho Li Wang

The issues and problems posed by heterogeneous computing are discussed. They include design of algorithms for applications, partitioning and mapping of application tasks, interconnection requirements, and the design of programming environments. The use of heterogeneous computing in image understanding is reviewed. An example vision task is presented, and the different types of parallelism used in the example are identified.

IEEE Network | 2004

Issues in designing middleware for wireless sensor networks

Yang Yu; Bhaskar Krishnamachari; Viktor K. Prasanna

Wireless sensor networks are being developed for a variety of applications. With the continuing advances in network and application design, appropriate middleware is needed to provide both standardized and portable system abstractions, and the capability to support and coordinate concurrent applications on sensor networks. In this article, we first identify several design principles for such middleware. These principles motivate a cluster-based lightweight middleware framework that separates application semantics from the underlying hardware, operating system, and network infrastructure. We propose a layered architecture for each cluster that consists of a cluster control layer and a resource management layer. Key design issues and related challenges within this framework that deserve further investigation are outlined. Finally, we discuss a technique for energy-efficient resource allocation in a single-hop cluster, which serves as a basic primitive for the development of the resource management layer.

field programmable gate arrays | 2005

Sparse Matrix-Vector multiplication on FPGAs

Ling Zhuo; Viktor K. Prasanna

Floating-point Sparse Matrix-Vector Multiplication (SpMXV) is a key computational kernel in scientific and engineering applications. The poor data locality of sparse matrices significantly reduces the performance of SpMXV on general-purpose processors, which rely heavily on the cache hierarchy to achieve high performance. The abundant hardware resources on current FPGAs provide new opportunities to improve the performance of SpMXV. In this paper, we propose an FPGA-based design for SpMXV. Our design accepts sparse matrices in Compressed Row Storage format, and makes no assumptions about the sparsity structure of the input matrix. The design employs IEEE-754 format double-precision floating-point multipliers/adders, and performs multiple floating-point operations as well as I/O operations in parallel. The performance of our design for SpMXV is evaluated using various sparse matrices from the scientific computing community, with the Xilinx Virtex-II Pro XC2VP70 as the target device. The MFLOPS performance increases with the hardware resources on the device as well as the available memory bandwidth. For example, when the memory bandwidth is 8 GB/s, our design achieves over 350 MFLOPS for all the test matrices. It demonstrates significant speedup over general-purpose processors, particularly for matrices with very irregular sparsity structure. Besides solving the SpMXV problem, our design provides a parameterized and flexible tree-based design for floating-point applications on FPGAs.
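For readers unfamiliar with the Compressed Row Storage format mentioned in this abstract, the following is a minimal sequential Python sketch of the kernel being accelerated. It is a plain software reference, not the paper's FPGA design; the function name `spmv_crs`, the array names, and the 3x3 example matrix are assumptions made for illustration.

```python
def spmv_crs(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A stored in Compressed Row
    Storage: values holds the nonzeros row by row, col_idx holds the column
    of each nonzero, and row_ptr[i]:row_ptr[i + 1] spans row i."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # The access x[col_idx[k]] is indirect, which is the source of
            # the poor data locality on cache-based processors.
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    # Illustrative matrix:
    # [[4, 0, 0],
    #  [0, 0, 2],
    #  [1, 3, 0]]
    values = [4.0, 2.0, 1.0, 3.0]
    col_idx = [0, 2, 0, 1]
    row_ptr = [0, 1, 2, 4]
    print(spmv_crs(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))
```

Each row's dot product is independent of the others, so the multiplications within a row can be issued together and their products summed by a tree of adders, which is one way to read the abstract's reference to a tree-based design.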

Proceedings of the IEEE | 2002

Reconfigurable computing systems

Kiran Bondalapati; Viktor K. Prasanna

Reconfigurable computing is emerging as the new paradigm for satisfying the simultaneous demand for application performance and flexibility. The ability to customize the architecture to match the computation and the data flow of the application has demonstrated significant performance benefits compared to general purpose architectures. Computer vision applications are one class of applications that have significant heterogeneity in their computation and communication structures. At the low level, vision algorithms have regular repetitive computations operating on large sets of image data with predictable data dependencies. At the higher level, the computations have irregular dependencies. Computer vision application characteristics have significant overlap with the advantages of reconfigurable architectures. The main focus of the paper is on outlining the methodologies required to realize the potential of reconfigurable architectures for vision applications. After giving a broad introduction to reconfigurable computing, the advantages of utilizing reconfigurable architectures for vision applications are outlined and illustrated using example computations. The paper discusses the development of fundamental configurable computing models that abstract the underlying hardware for high-level application mapping. The Hybrid System Architecture Model and algorithms utilizing the model are illustrated to demonstrate a formal framework. The paper also outlines ongoing research and provides a comprehensive list of references for further reading.

Mobile Networks and Applications | 2005

Energy-balanced task allocation for collaborative processing in wireless sensor networks

Yang Yu; Viktor K. Prasanna

We propose an energy-balanced allocation of a real-time application onto a single-hop cluster of homogeneous sensor nodes connected with multiple wireless channels. An epoch-based application consisting of a set of communicating tasks is considered. Each sensor node is equipped with discrete dynamic voltage scaling (DVS). The time and energy costs of both computation and communication activities are considered. We propose both an Integer Linear Programming (ILP) formulation and a polynomial time 3-phase heuristic. Our simulation results show that for small scale problems (with ≤10 tasks), up to 5x lifetime improvement is achieved by the ILP-based approach, compared with the baseline where no DVS is used. Also, the 3-phase heuristic achieves up to 63% of the system lifetime obtained by the ILP-based approach. For large scale problems (with 60–100 tasks), up to 3.5x lifetime improvement can be achieved by the 3-phase heuristic. We also incorporate techniques for exploring the energy-latency tradeoffs of communication activities (such as modulation scaling), which leads to 10x lifetime improvement in our simulations. Simulations were further conducted for two real world problems – LU factorization and Fast Fourier Transformation (FFT). Compared with the baseline where neither DVS nor modulation scaling is used, we observed up to 8x lifetime improvement for the LU factorization algorithm and up to 9x improvement for FFT.

field programmable gate arrays | 2004

Time and area efficient pattern matching on FPGAs

Zachary K. Baker; Viktor K. Prasanna

Pattern matching for network security and intrusion detection demands exceptionally high performance. Much work has been done in this field, and yet there is still significant room for improvement in efficiency, flexibility, and throughput. We develop a novel linear-array string matching architecture using a buffered, two-comparator variation on the Knuth-Morris-Pratt (KMP) algorithm. For small (16 or fewer characters) patterns, it competes favorably with the state-of-the-art while providing better scalability and reconfiguration, and more efficient hardware utilization. The area efficiency compared to other approaches improves further still as the pattern size increases because only the tables increase in size. KMP is a well-known, efficient string matching technique using a single comparator and a precomputed transition table. We add a second comparator and an input buffer, allowing the system to accept at least one character in each cycle and terminate after a number of clock cycles at most equal to the length of the input string plus the size of the buffer. The system also provides a clean, modular route to reconfiguring the patterns on-the-fly and scaling the system to support more units, using several rows of linear array elements. In this paper, we prove the bound on the buffer size and running time, and provide performance comparisons against other approaches.
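As background for this abstract, the following is a minimal Python sketch of the classic single-comparator KMP algorithm with its precomputed table. It is the textbook software version, not the buffered two-comparator architecture the paper proposes; the function names are chosen here for illustration.

```python
def build_failure_table(pattern):
    """fail[i] is the length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    fail = build_failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        # On a mismatch the same text character is compared again against
        # a shorter prefix, so one character may need several comparisons.
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]
    return matches


if __name__ == "__main__":
    print(build_failure_table("abab"))
    print(kmp_search("abababca", "abab"))
```

The inner `while` loop is where a single comparator can spend more than one step on the same input character, which is the behavior the abstract's second comparator and input buffer are meant to absorb.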

Computing in Science and Engineering | 2013

Cloud-Based Software Platform for Big Data Analytics in Smart Grids

Yogesh Simmhan; Saima Aman; Alok Gautam Kumbhare; Rongyang Liu; Sam Stevens; Qunzhi Zhou; Viktor K. Prasanna

This article focuses on a scalable software platform for the Smart Grid cyber-physical system using cloud technologies. Dynamic Demand Response (D2R) is a challenge-application to perform intelligent demand-side management and relieve peak load in Smart Power Grids. The platform offers an adaptive information integration pipeline for ingesting dynamic data; a secure repository for researchers to share knowledge; scalable machine-learning models trained over massive datasets for agile demand forecasting; and a portal for visualizing consumption patterns. The platform is validated at the University of Southern California's campus microgrid. The article examines the role of clouds and their tradeoffs for use in the Smart Grid Cyber-Physical System.

international parallel and distributed processing symposium | 2004

Analysis of high-performance floating-point arithmetic on FPGAs

Gokul Govindu; Ling Zhuo; Seonil Choi; Viktor K. Prasanna

FPGAs are increasingly being used in the high performance and scientific computing community to implement floating-point based hardware accelerators. We analyze the floating-point multiplier and adder/subtractor units by considering the number of pipeline stages of the units as a parameter and use throughput/area as the metric. We achieve throughput rates of more than 240 MHz (200 MHz) for single (double) precision operations by deeply pipelining the units. To illustrate the impact of the floating-point units on a kernel, we implement a matrix multiplication kernel based on our floating-point units and show that a state-of-the-art FPGA device is capable of achieving about 15 GFLOPS (8 GFLOPS) for the single (double) precision floating-point based matrix multiplication. We also show that FPGAs are capable of achieving up to 6x improvement (for single precision) in terms of the GFLOPS/W (performance per unit power) metric over that of general purpose processors. We then discuss the impact of floating-point units on the design of an energy efficient architecture for the matrix multiply kernel.
