Kenneth L. Rice | Researchain

Publication

Featured research published by Kenneth L. Rice.

Reconfigurable Computing and FPGAs | 2009

FPGA Implementation of Izhikevich Spiking Neural Networks for Character Recognition

Kenneth L. Rice; Mohammad Ashraf Bhuiyan; Tarek M. Taha; Christopher N. Vutsinas; Melissa C. Smith

There has been a strong push recently to examine biological scale simulations of neuromorphic algorithms to achieve stronger inference capabilities than current computing algorithms. The recent Izhikevich spiking neuron model is ideally suited for such large scale cortical simulations due to its efficiency and biological accuracy. In this paper we explore the feasibility of using FPGAs for large scale simulations of the Izhikevich model. We developed a modularized processing element to evaluate a large number of Izhikevich spiking neurons in a pipelined manner. This approach allows for easy scalability of the model to larger FPGAs. We utilized a character recognition algorithm based on the Izhikevich model for this study and scaled up the algorithm to use over 9000 neurons. The FPGA implementation of the algorithm on a Xilinx Virtex 4 provided a speedup of approximately 8.5 times an equivalent software implementation on a 2.2 GHz AMD Opteron core. Our results indicate that FPGAs are suitable for large scale cortical simulations utilizing the Izhikevich spiking neuron model.
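
For readers unfamiliar with the neuron model named in this abstract, the following is a minimal software sketch of a single Izhikevich neuron. The two update equations and the regular-spiking parameter values (a=0.02, b=0.2, c=-65, d=8) come from Izhikevich's published model, not from the abstract itself. The forward-Euler step, the 0.5 ms step size, and the names `izhikevich_step` and `simulate` are assumptions made for illustration. It does not represent the pipelined FPGA processing element the authors built.

```python
def izhikevich_step(v, u, current, a=0.02, b=0.2, c=-65.0, d=8.0, dt=0.5):
    """Advance one neuron by dt milliseconds; return (v, u, fired).

    v is the membrane potential (mV) and u the recovery variable.
    Defaults are the regular-spiking parameters of the published model.
    """
    fired = v >= 30.0
    if fired:
        # Spike reset: v returns to c, u is incremented by d.
        v, u = c, u + d
    dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
    du = a * (b * v - u)
    # Forward-Euler integration (an assumption of this sketch).
    return v + dt * dv, u + dt * du, fired


def simulate(current, steps, dt=0.5):
    """Drive one neuron with a constant input current; return spike times (ms)."""
    v, u = -65.0, -13.0  # start with u = b * v
    spikes = []
    for n in range(steps):
        v, u, fired = izhikevich_step(v, u, current, dt=dt)
        if fired:
            spikes.append(n * dt)
    return spikes


if __name__ == "__main__":
    print(len(simulate(10.0, 2000)), "spikes in 1000 ms at constant input 10")
```

Each neuron needs only a handful of multiplies and adds per time step and no data from other neurons within the step, which is consistent with the abstract's point that many neurons can be evaluated in a pipelined manner.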

Optical Engineering | 2009

Design and acceleration of phase-only filter-based optical pattern recognition for fingerprint identification

Kenneth L. Rice; Tarek M. Taha; Arshad Chowdhury; Abdul A. S. Awwal; Damon L. Woodard

We present the use of phase-only filter-based correlation for fingerprint pattern identification. The main advantage of this approach is that it is distortion tolerant and can be realized in optical or electronic parallel hardware. Given that real-world fingerprints are almost never perfect, distortion tolerance can prove to be very important for this application. Our results indicate that the algorithm can identify prints with 58% of the data missing on average. With large fingerprint databases, identification can be a computationally challenging task. The high parallelism in the phase-only correlation filter makes it ideally suited to field programmable gate array (FPGA)-based hardware acceleration. We examine the FPGA-based acceleration of the fingerprint algorithm. On a Xilinx Virtex II Pro FPGA, we achieve speedups of about 47 times over an optimized C implementation of the algorithm on a 2.2-GHz AMD Opteron processor. Our FPGA implementation is optimized to allow efficient processing of large databases.
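
The phase-only filter correlation mentioned in this abstract can be illustrated with a small one-dimensional sketch. A phase-only filter keeps the conjugate phase of the reference spectrum and discards its magnitude; multiplying it with the spectrum of an input and transforming back gives a correlation output whose peak marks the match position. The naive DFT, the 1-D signals, and the names `dft`, `phase_only_filter`, `correlate`, and `peak_index` are assumptions for illustration only. The paper works on fingerprint images and an FPGA implementation, neither of which is reproduced here.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for a short sketch)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def phase_only_filter(reference, eps=1e-12):
    """Conjugate spectrum of the reference, normalised to unit magnitude."""
    return [z.conjugate() / max(abs(z), eps) for z in dft(reference)]


def correlate(signal, pof):
    """Correlation magnitude of a signal against a phase-only filter."""
    product = [s * h for s, h in zip(dft(signal), pof)]
    return [abs(z) for z in dft(product, inverse=True)]


def peak_index(values):
    """Index of the largest correlation value."""
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    reference = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 2.0, 0.0]
    shifted = reference[-3:] + reference[:-3]  # circular shift by 3 samples
    print(peak_index(correlate(shifted, phase_only_filter(reference))))
```

Every frequency bin is handled independently in the multiply step, which is the kind of parallelism the abstract identifies as well suited to FPGA acceleration.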

Applied Optics | 2009

Hardware accelerated optical alignment of lasers using beam-specific matched filters

Abdul A. S. Awwal; Kenneth L. Rice; Tarek M. Taha

Accurate automated alignment of laser beams in the National Ignition Facility (NIF) is essential for achieving extreme temperature and pressure required for inertial confinement fusion. The alignment achieved by the integrated control systems relies on algorithms processing video images to determine the position of the laser beam images in real time. Alignment images that exhibit wide variations in beam quality require a matched-filter algorithm for position detection. One challenge in designing a matched-filter-based algorithm is to construct a filter template that is resilient to variations in imaging conditions while guaranteeing accurate position determination. A second challenge is to process images for thousands of templates in under a second, as may be required in future high-energy laser systems. This paper describes the development of a new analytical template that captures key recurring features present in the beam image to accurately estimate the beam position under good image quality conditions. Depending on the features present in a particular beam, the analytical template allows us to create a highly tailored template containing only those selected features. The second objective is achieved by exploiting the parallelism inherent in the algorithm to accelerate processing using parallel hardware that provides significant performance improvement over conventional processors. In particular, a Xilinx Virtex II Pro field programmable gate array (FPGA) hardware implementation processing 32 templates provided a speed increase of about 253 times over an optimized software implementation running on a 2.2 GHz AMD Opteron core.

International Parallel and Distributed Processing Symposium | 2008

A neocortex model implementation on reconfigurable logic with streaming memory

Christopher N. Vutsinas; Tarek M. Taha; Kenneth L. Rice

In this paper we study the acceleration of a new class of cognitive processing applications based on the structure of the neocortex. Our focus is on a model of the visual cortex used for image recognition developed by George and Hawkins. We propose techniques to accelerate the algorithm using reconfigurable logic, specifically a streaming memory architecture utilizing available off-chip memory. We discuss the design of a streaming memory access unit enabling a large number of processing elements to be placed on a single FPGA thus increasing throughput. We present an implementation of our approach on a Cray XD1 and discuss possible extension to further increase throughput. Our results indicate that using a two FPGA design with streaming memory gives a speedup of 71.9 times over a purely software implementation.

Microprocessors and Microsystems | 2009

A context switching streaming memory architecture to accelerate a neocortex model

Christopher N. Vutsinas; Tarek M. Taha; Kenneth L. Rice

A novel architecture to accelerate a neocortex inspired cognitive model is presented. The architecture utilizes a collection of context switchable processing elements (PEs). This enables time multiplexing of nodes in the model onto available PEs. A streaming memory system is designed to enable high-throughput computation and efficient use of memory resources. Several scheduling algorithms were examined to efficiently assign network nodes to the PEs. Multiple parallel FPGA-accelerated implementations were evaluated on a Cray XD1. Networks of varying complexity were tested and indicate that hardware acceleration can provide an average throughput gain of 184 times over equivalent parallel software implementations.

Proceedings of SPIE | 2008

Higher accuracy template for corner cube reflected image

Abdul A. S. Awwal; Kenneth L. Rice; Richard R. Leach; Tarek M. Taha

Video images of laser beams are analyzed to determine the position of the laser beams for alignment purposes in the National Ignition Facility (NIF). Algorithms process beam images to facilitate automated laser alignment. One such beam image, known as the corner cube reflected pinhole image, exhibits wide beam quality variations that are processed by a matched-filter-based algorithm. The challenge is to design a representative template that captures these variations while at the same time assuring accurate position determination. This paper describes the development of a new analytical template to accurately estimate the center of a beam with good image quality. The templates are constructed to exploit several key recurring features observed in the beam images. When the beam image quality is low, the algorithm chooses a template that contains fewer features. The algorithm was implemented on a Xilinx Virtex II Pro FPGA, providing a speedup of about 6.4 times over a baseline 3 GHz Pentium 4 processor.

International Symposium on Neural Networks | 2011

GPGPU acceleration of Cellular Simultaneous Recurrent Networks adapted for maze traversals

Kenneth L. Rice; Tarek M. Taha; Khan M. Iftekharuddin; Keith Anderson; Teddy Salan

At present, a major initiative in the research community is investigating new ways of processing data that capture the efficiency of the human brain in hardware and software. This has resulted in increased interest and development of bio-inspired computing approaches in software and hardware. One such bio-inspired approach is Cellular Simultaneous Recurrent Networks (CSRNs). CSRNs have been demonstrated to be very useful in solving state transition type problems, such as maze traversals. Although powerful in image processing capabilities, CSRNs have high computational demands with increasing input problem size. In this work, we revisit the maze traversal problem to gain an understanding of the general processing of CSRNs. We use a 2.67 GHz Intel Xeon X5550 processor coupled with an NVIDIA Tesla C2050 general purpose graphical processing unit (GPGPU) to create several novel accelerated CSRN implementations as a means of overcoming the high computational cost. Additionally, we explore the use of decoupled extended Kalman filters in the CSRN training phase and find a significant reduction in runtime with negligible change in accuracy. We find in our results that we can achieve average speedups of 21.73 and 3.55 times for the training and testing phases respectively when compared to optimized C implementations. The main bottleneck in training performance was a matrix inversion computation. Therefore, we utilize several methods to reduce the effects of the matrix inversion computation.

Infotech@Aerospace 2011 | 2011

Accelerating CSRN based face recognition on an NVIDIA GPGPU

Kenneth L. Rice; Tarek M. Taha; Ronald Miller; Khan M. Iftekharuddin; Keith Anderson; Teddy Salan

Unmanned aerial vehicles (UAVs) are being equipped with high definition cameras to survey a wide range of low-contrast and diverse environments. From data captured by UAVs, image analysts can determine adversarial threats proficiently. However, there is simply too much data and not enough analysts to do this processing efficiently. Enabling computing systems to mimic the processes in the human brain to process such data would be of significant benefit. CSRNs (cellular simultaneous recurrent networks) are capable of solving several spatial processing tasks that are carried out by humans. In particular, they have been shown to be capable of pose invariant face recognition. Given the highly recurrent nature of CSRNs (a property also seen in the human cortex), the computational demands of these algorithms grow with input size. Therefore the acceleration of CSRNs would be highly beneficial. In this paper we examine the acceleration of CSRNs applied to face recognition. We develop optimized implementations of the algorithm on an Intel Xeon 2.67 GHz processor and an NVIDIA Tesla C2050 GPGPU (general purpose graphical processing unit). Our results show that the GPGPU is 22.9 times faster than the CPU implementation.

The Journal of Supercomputing | 2009