Tarek M. Taha
Clemson University
Publication
Featured research published by Tarek M. Taha.
IEEE Transactions on Computers | 2008
Tarek M. Taha; D. Scott Wills
Advances in semiconductor technology enable larger processor design spaces, leading to increasingly complex systems. At an early stage, designers must evaluate many architecture design points to arrive at a suitable design. Currently, most architecture exploration is performed using cycle-accurate simulators. Although accurate, these tools are slow, limiting a comprehensive design search. The vast design space of today's complex processors and time-to-market pressures motivate the need for faster architectural evaluation methods. This paper presents a superscalar processor performance model that enables rapid exploration of the architecture design space for superscalar processors. It supplements current design tools by quickly identifying promising areas for more thorough and time-consuming exploration with traditional tools. The model estimates the instruction throughput of a superscalar processor based on early architectural design parameters and application properties. It has been validated against the SimpleScalar out-of-order simulator. The core of the model, which executes 1.6 million times faster, produces instruction throughput estimates that are within 5.5 percent of the corresponding SimpleScalar values.
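As a rough illustration of how an early analytical model of this kind works, the sketch below combines a few high-level parameters into an instruction-throughput estimate. The combining formula and parameter names are assumptions for illustration only; the paper's actual model is not reproduced here.

```python
# Toy analytical IPC estimate in the spirit of early design-space models.
# All parameters and the combining formula are illustrative assumptions,
# not the model from the paper.

def estimate_ipc(issue_width, avg_ilp, mispredict_rate,
                 mispredict_penalty, miss_rate, miss_penalty):
    """Estimate sustained instructions per cycle (IPC)."""
    # Throughput is capped by both the machine's issue width and the
    # application's inherent instruction-level parallelism.
    base_ipc = min(issue_width, avg_ilp)
    # Average stall cycles charged to each instruction by branch
    # mispredictions and cache misses.
    stall_cpi = (mispredict_rate * mispredict_penalty
                 + miss_rate * miss_penalty)
    return 1.0 / (1.0 / base_ipc + stall_cpi)

print(f"{estimate_ipc(4, 2.5, 0.01, 12, 0.02, 20):.2f} IPC")
```

Evaluating a closed-form expression like this takes microseconds per design point, which is what makes a broad sweep of the design space feasible before committing to cycle-accurate simulation.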
2009 IEEE Symposium on Computational Intelligence for Multimedia Signal and Vision Processing | 2009
Mohammad Ashraf Bhuiyan; Rommel Jalasutram; Tarek M. Taha
This paper presents the use of the Izhikevich and Hodgkin-Huxley neuron models for image recognition. The former is more biologically accurate than the commonly used integrate-and-fire neuron model but has similarly low computational requirements. Brain-scale cortex models tend to use the more biological neuron models. The results of this work show that the Izhikevich model can be used for image recognition and would be a good candidate for a large-scale visual cortex model. Neural networks based on these models are developed and applied to character recognition. They were able to identify 48 images of 24×24 pixels and their noisy versions. The networks were accelerated using modern multicore processors and showed significant speedups. Such processors are likely to be used for developing high-performance, large-scale implementations of these image recognition networks.
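For reference, the Izhikevich model reduces to two coupled equations with a reset rule, which is why its cost per update is close to that of an integrate-and-fire neuron. A minimal sketch follows; the input current, step size, and regular-spiking parameters are standard textbook choices, not values taken from the paper.

```python
# Izhikevich neuron (Izhikevich, 2003):
#   v' = 0.04*v^2 + 5*v + 140 - u + I
#   u' = a*(b*v - u)
#   if v >= 30 mV: v <- c, u <- u + d
# Parameters below give the "regular spiking" pattern.
a, b, c, d = 0.02, 0.2, -65.0, 8.0
dt = 0.5            # ms per Euler step
steps = 2000        # 1000 ms of simulated time
I = 10.0            # constant input current (illustrative)

v = -65.0           # membrane potential (mV)
u = b * v           # recovery variable
spike_times = []
for t in range(steps):
    v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
    u += dt * (a * (b * v - u))
    if v >= 30.0:   # spike peak reached: record and reset
        spike_times.append(t * dt)
        v = c
        u += d
print(f"{len(spike_times)} spikes in {steps * dt:.0f} ms")
```

Each update is a handful of multiply-adds, which maps well onto the SIMD units of the multicore processors used in the paper.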
International Symposium on Neural Networks | 2009
Tarek M. Taha; Pavan Yalamanchili; Mohammad Ashraf Bhuiyan; Rommel Jalasutram; Sumod K. Mohan
There is significant interest in the research community in developing large-scale, high-performance implementations of cortical models. These have the potential to provide significantly stronger information processing capabilities than current computing algorithms. At present we are investigating the implementation of six neuromorphic computational models on a large Sony PlayStation 3 cluster at the Air Force Research Lab. These six models span two classes of neuromorphic algorithms: hierarchical Bayesian and spiking neural networks. In this paper we present the performance gains of these six neuromorphic computational models when implemented on the Cell multicore processor in the PlayStation 3. We show that the Cell multicore architecture can provide significant performance gains for these models. Comparing the performance gains of the two classes, we see that the hierarchical Bayesian class generally provides higher speedups than the spiking network class. This is primarily due to the higher computational load per node in the former class. Our results indicate that the Cell-based PlayStation 3 would provide a good platform for large-scale implementations of the classes of neuromorphic models examined.
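The mapping pattern common to both model classes is straightforward data parallelism: partition the network's nodes across cores and update each partition independently. The sketch below illustrates that pattern with a stand-in update function and Python's multiprocessing pool; it is not either of the paper's models, and the Cell's SPE offload model differs from a general-purpose process pool.

```python
from multiprocessing import Pool

import numpy as np

def update_partition(states):
    # Stand-in per-node update; heavier work per node amortizes the
    # parallel overhead better, which is the effect the paper observes
    # for the hierarchical Bayesian models.
    return np.tanh(states) + 0.01

if __name__ == "__main__":
    nodes = np.random.randn(1_000_000).astype(np.float32)
    parts = np.array_split(nodes, 8)          # one chunk per core
    with Pool(processes=8) as pool:
        nodes = np.concatenate(pool.map(update_partition, parts))
    print(nodes.shape)
```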
ACM Southeast Regional Conference | 2005
Kiruthika Selvamani; Tarek M. Taha
Applications of different categories inherently contain varying levels of data-, instruction-, and thread-level parallelism. New processing architectures increasingly employ varying forms of these parallel execution mechanisms to boost performance. However, existing code assets are limited to sequential expressions of what should be highly parallel algorithms. These applications cannot take advantage of new parallel execution mechanisms unless their code is retargeted to the new platforms. Automated retargeting compilers do exist, but are not efficient when the new architecture platforms are significantly different. On the other hand, rewriting applications manually is an expensive process, particularly for undocumented legacy applications. This paper presents a lightweight dynamic analysis technique for characterizing the types of parallelism inherent within the critical portions of a given program to estimate the potential benefit of retargeting. Current parallelism extraction methods do not analyze the various forms of parallelism and treat the whole code homogeneously. Since applications in general contain certain critical code regions that contribute heavily to the overall execution time, analyzing their parallelism gives clearer insight into the potential of the whole application. The technique is validated on the SPEC95 and MediaBench benchmarks, which are widely used to evaluate processor performance.
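One way to see what such a dynamic analysis measures: replay an instruction trace, track each value's producer, and compute the length of the dataflow critical path; the ratio of trace length to critical-path length bounds the instruction-level parallelism available in that region. The toy below assumes a trivial (destination, sources) trace format purely for illustration.

```python
# Toy dynamic-trace ILP estimate: the dataflow critical path limits how
# much of the trace could execute in parallel. The trace format is an
# illustrative assumption.
trace = [
    ("r1", []),            # r1 = load A
    ("r2", []),            # r2 = load B
    ("r3", ["r1", "r2"]),  # r3 = r1 + r2
    ("r4", ["r3"]),        # r4 = r3 * 2
    ("r5", []),            # independent work
]

depth = {}           # register -> dataflow depth of its last producer
critical_path = 0
for dest, srcs in trace:
    d = 1 + max((depth.get(s, 0) for s in srcs), default=0)
    depth[dest] = d
    critical_path = max(critical_path, d)

print(f"ILP estimate: {len(trace) / critical_path:.2f}")  # 5/3 here
```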
Workshop on Computer Architecture Education | 2006
Clint W. Smullen; Tarek M. Taha
Two of the most important design issues for modern processors are power and performance. It is important for students in computer organization classes to understand the tradeoff between these two issues. This paper presents PSATSim, a graphical simulator that allows students to configure the design of a speculative out-of-order execution superscalar processor and see the effect of the design on both power and performance. The simulator explicitly shows the relationship between instructions within a processor by visually tagging instructions. The use of a graphical simulator makes it simple for instructors to demonstrate the execution of instructions within these architectures and the interactions among processor components.
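The power side of that tradeoff is dominated by the standard dynamic-power relation P = αCV²f (activity factor, switched capacitance, supply voltage, clock frequency), which is why configuration choices in a simulator like this move power and performance in opposite directions. The numbers below are illustrative, not values from PSATSim.

```python
# Dynamic power P = a * C * V^2 * f. Lowering voltage and frequency
# together cuts power superlinearly while costing performance only
# linearly in frequency. All values are illustrative.
def dynamic_power(activity, capacitance, vdd, freq):
    return activity * capacitance * vdd ** 2 * freq

baseline = dynamic_power(0.2, 1e-9, 1.2, 2.0e9)  # ~0.58 W
scaled = dynamic_power(0.2, 1e-9, 1.0, 1.5e9)    # ~0.30 W
print(f"baseline {baseline:.2f} W, scaled {scaled:.2f} W "
      f"({100 * (1 - scaled / baseline):.0f}% lower)")
```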
International Parallel and Distributed Processing Symposium | 2008
Christopher N. Vutsinas; Tarek M. Taha; Kenneth L. Rice
In this paper we study the acceleration of a new class of cognitive processing applications based on the structure of the neocortex. Our focus is on a model of the visual cortex used for image recognition developed by George and Hawkins. We propose techniques to accelerate the algorithm using reconfigurable logic, specifically a streaming memory architecture utilizing available off-chip memory. We discuss the design of a streaming memory access unit that enables a large number of processing elements to be placed on a single FPGA, thus increasing throughput. We present an implementation of our approach on a Cray XD1 and discuss possible extensions to further increase throughput. Our results indicate that a two-FPGA design with streaming memory gives a speedup of 71.9 times over a purely software implementation.
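The core idea behind a streaming memory unit is to keep the processing elements fed by overlapping off-chip fetches with computation, as in classic double buffering. The sketch below shows the software analogue of that access pattern; the block size and the processing step are illustrative assumptions, and serial Python stands in for what the hardware does concurrently.

```python
import numpy as np

def stream_blocks(memory, block):
    """Yield contiguous blocks, emulating a streaming fetch unit."""
    for i in range(0, len(memory), block):
        yield memory[i:i + block]

memory = np.arange(1_000_000, dtype=np.float64)
stream = stream_blocks(memory, block=4096)
current = next(stream)
total = 0.0
for nxt in stream:
    # In hardware, fetching `nxt` overlaps this computation, hiding
    # off-chip memory latency from the processing elements.
    total += float(current.sum())
    current = nxt
total += float(current.sum())
print(total)
```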
Microprocessors and Microsystems | 2009
Christopher N. Vutsinas; Tarek M. Taha; Kenneth L. Rice
A novel architecture to accelerate a neocortex-inspired cognitive model is presented. The architecture utilizes a collection of context-switchable processing elements (PEs). This enables time multiplexing of nodes in the model onto the available PEs. A streaming memory system is designed to enable high-throughput computation and efficient use of memory resources. Several scheduling algorithms were examined to efficiently assign network nodes to the PEs. Multiple parallel FPGA-accelerated implementations were evaluated on a Cray XD1. Networks of varying complexity were tested; the results indicate that hardware acceleration can provide an average throughput gain of 184 times over equivalent parallel software implementations.
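A simple baseline for this node-to-PE scheduling problem is the longest-processing-time greedy heuristic: sort nodes by cost and repeatedly assign each to the least-loaded PE. The sketch below shows that heuristic; the node costs and PE count are illustrative, and the paper's own schedulers are not reproduced here.

```python
import heapq

def schedule(node_costs, num_pes):
    """Greedy LPT assignment: heaviest node to the least-loaded PE."""
    loads = [(0.0, pe) for pe in range(num_pes)]  # (load, PE id) min-heap
    heapq.heapify(loads)
    assignment = {}
    for node, cost in sorted(node_costs.items(), key=lambda kv: -kv[1]):
        load, pe = heapq.heappop(loads)
        assignment[node] = pe
        heapq.heappush(loads, (load + cost, pe))
    return assignment

costs = {f"n{i}": (i % 7) + 1.0 for i in range(20)}  # illustrative costs
print(schedule(costs, num_pes=4))
```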
Proceedings of SPIE | 2008
Abdul A. S. Awwal; Kenneth L. Rice; Richard R. Leach; Tarek M. Taha
Video images of laser beams are analyzed to determine the position of the laser beams for alignment purposes in the National Ignition Facility (NIF). Algorithms process beam images to facilitate automated laser alignment. One such beam image, known as the corner cube reflected pinhole image, exhibits wide beam quality variations that are processed by a matched-filter-based algorithm. The challenge is to design a representative template that captures these variations while at the same time assuring accurate position determination. This paper describes the development of a new analytical template to accurately estimate the center of a beam with good image quality. The templates are constructed to exploit several key recurring features observed in the beam images. When the beam image quality is low, the algorithm chooses a template that contains fewer features. The algorithm was implemented on a Xilinx Virtex-II Pro FPGA, providing a speedup of about 6.4 times over a baseline 3 GHz Pentium 4 processor.
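At its core, matched-filter position estimation correlates the image with the template and takes the correlation peak as the beam position. The sketch below shows the FFT-based version of that step, with a Gaussian spot standing in for the paper's analytical pinhole template; the shapes and the synthetic image are illustrative.

```python
import numpy as np

def gaussian_template(size, sigma):
    """Symmetric Gaussian spot as a stand-in matched-filter template."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))

def find_center(image, template):
    """Peak of the circular cross-correlation, computed via the FFT."""
    f = np.fft.rfft2(image)
    t = np.fft.rfft2(template, s=image.shape)
    corr = np.fft.irfft2(f * np.conj(t), s=image.shape)
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    half = template.shape[0] // 2  # the peak locates the template origin
    return (dy + half) % image.shape[0], (dx + half) % image.shape[1]

img = np.zeros((128, 128))
img[40:48, 70:78] = 1.0            # synthetic "beam" centered near (43, 73)
print(find_center(img, gaussian_template(15, sigma=3.0)))
```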
ACM Southeast Regional Conference | 2009
Pavan Yalamanchili; Sumod K. Mohan; Tarek M. Taha
Recent scientific studies of the brain have led to new models of information processing. Some of these models are based on hierarchical Bayesian networks and have several benefits over traditional neural networks. Large-scale implementations of brain models have the potential for strong inference capabilities, and hierarchical Bayesian models lend themselves well to large scales. Multicore processors are currently the standard architectural approach for high-performance computing platforms. In this paper we examine the parallelization and optimization of Dean's hierarchical Bayesian model on two multicore architectures: the nine-core IBM Cell and the quad-core Intel Xeon processors. This is the first study of the parallelization of this class of models onto multicore processors. We evaluate two parallelization strategies and examine the performance of the model as it is scaled. Our results indicate that the Cell processor can provide speedups of up to 108 times over a serial implementation of the model for the network sizes examined. The quad-core Intel Xeon processor provided a speedup of 36 times for the same model configuration.
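The per-node kernel in models of this kind is typically a belief update: multiply an evidence vector by a conditional-probability table and renormalize. That dense linear algebra is what gives the class its high computational load per node and makes it SIMD-friendly. The shapes and update rule below are generic illustrations, not Dean's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)
num_states = 64
# Row-stochastic conditional probability table (illustrative).
cpt = rng.random((num_states, num_states))
cpt /= cpt.sum(axis=1, keepdims=True)

def update_belief(evidence):
    """Propagate evidence through the CPT and renormalize."""
    belief = cpt.T @ evidence
    return belief / belief.sum()

evidence = rng.random(num_states)
print(update_belief(evidence)[:5])
```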
Optics and Photonics for Information Processing XI | 2017
Abdul A. S. Awwal; Tarek M. Taha; Roger Lowe-Webb; Md. Zahangir Alom
Deep-learning methods are gaining popularity because of their state-of-the-art performance in image classification tasks. In this paper, we explore classification of laser-beam images from the National Ignition Facility (NIF) using a novel deep-learning approach. NIF is the world's largest, most energetic laser. It has nearly 40,000 optics that precisely guide, reflect, amplify, and focus 192 laser beams onto a fusion target. NIF utilizes four petawatt lasers called the Advanced Radiographic Capability (ARC) to produce backlighting X-ray illumination to capture the implosion dynamics of NIF experiments with picosecond temporal resolution. In the current operational configuration, four independent short-pulse ARC beams are created and combined in a split-beam configuration in each of two NIF apertures at the entry of the pre-amplifier. The sub-aperture beams then propagate through the NIF beampath up to the ARC compressor. Each ARC beamlet is separately compressed with a dedicated set of four gratings and recombined as sub-apertures for transport to the parabola vessel, where the beams are focused using parabolic mirrors and pointed to the target. Small angular errors in the compressor gratings can cause the sub-aperture beams to diverge from one another and prevent accurate alignment through the transport section between the compressor and parabolic mirrors. This is an off-normal condition that must be detected and corrected. The goal of the off-normal check is to determine whether the ARC beamlets are sufficiently overlapped into a merged single spot or have diverged into two distinct spots. Thus, the objective of the current work is three-fold: developing a simple algorithm to perform off-normal classification, exploring the use of a Convolutional Neural Network (CNN) for the same task, and understanding the inter-relationship of the two approaches. The CNN recognition results are compared with other machine-learning approaches, such as a Deep Neural Network (DNN) and a Support Vector Machine (SVM). The experimental results show around 96% classification accuracy using the CNN; the CNN approach also provides recognition results comparable to the present feature-based off-normal detection. The feature-based solution was developed to capture the expertise of a human expert in classifying the images. The misclassified results are further studied to explain the differences and discover any discrepancies or inconsistencies in the current classification.
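For a sense of scale, a binary off-normal classifier of this kind can be quite small. The sketch below is a minimal Keras CNN for the merged-versus-diverged decision; the input resolution, layer sizes, and training setup are illustrative assumptions, not the network from the paper.

```python
import tensorflow as tf

# Minimal binary CNN: merged single spot (0) vs. diverged spots (1).
# All hyperparameters are illustrative assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(images, labels, epochs=10)  # images: (N, 64, 64, 1) in [0, 1]
```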