Michael DeBole
Pennsylvania State University
Publication
Featured research published by Michael DeBole.
Field Programmable Custom Computing Machines | 2008
Kevin M. Irick; Michael DeBole; Vijaykrishnan Narayanan; Aman Gayasen
In real-time video mining applications it is desirable to extract information about human subjects, such as gender, ethnicity, and age, from grayscale frontal face images. Many algorithms have been developed in the machine learning, statistical data mining, and pattern classification communities that perform such tasks with remarkable accuracy. Many of these algorithms, however, suffer from poor frame rates when implemented in software, due to the amount and complexity of the computation involved. This paper presents an FPGA-friendly implementation of a Gaussian Radial Basis SVM well suited to classification of grayscale images. We identify a novel optimization of the SVM formulation that dramatically reduces the computational inefficiency of the algorithm. The implementation achieves 88.6% detection accuracy in gender classification, which is on par with the accuracy of software implementations using the same classification mechanism.
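For readers unfamiliar with the classifier named in this abstract, the following is a minimal sketch of the standard Gaussian RBF SVM decision function. It is not the paper's optimized formulation, which is not described here. The function names, the toy support vectors, the coefficients, and the value of `gamma` are all assumptions chosen for illustration.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian RBF kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Standard SVM decision value: sum_i (alpha_i * y_i) * K(sv_i, x) + b."""
    return sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    ) + bias


def classify(x, support_vectors, dual_coefs, bias, gamma):
    """Map the decision value to a class label in {-1, +1}."""
    value = svm_decision(x, support_vectors, dual_coefs, bias, gamma)
    return 1 if value >= 0 else -1


# Hypothetical toy model: two support vectors, one per class.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [-1.0, 1.0]  # alpha_i * y_i for each support vector
bias = 0.0
gamma = 0.5

label_a = classify([0.9, 1.0], support_vectors, dual_coefs, bias, gamma)
label_b = classify([0.1, 0.0], support_vectors, dual_coefs, bias, gamma)
```

The cost of each classification grows with the number of support vectors and the input dimension, since one kernel evaluation is needed per support vector. That is the kind of per-frame workload a hardware implementation has to absorb.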
Design Automation Conference | 2012
Ahmed Al Maashri; Michael DeBole; Matthew Cotter; Nandhini Chandramoorthy; Yang Xiao; Vijaykrishnan Narayanan; Chaitali Chakrabarti
Video analytics introduce new levels of intelligence to automated scene understanding. Neuromorphic algorithms, such as HMAX, are proposed as robust and accurate algorithms that mimic the processing in the visual cortex of the brain. HMAX, for instance, is a versatile algorithm that can be repurposed to target several visual recognition applications. This paper presents the design and evaluation of hardware accelerators for extracting visual features for universal recognition. The recognition applications include object recognition, face identification, facial expression recognition, and action recognition. These accelerators were validated on a multi-FPGA platform and significant performance enhancement and power efficiencies were demonstrated when compared to CMP and GPU platforms. Results demonstrate as much as 7.6X speedup and 12.8X more power-efficient performance when compared to those platforms.
Asia and South Pacific Design Automation Conference | 2008
David Atienza; G. De Micheli; Luca Benini; José L. Ayala; P.G. Del Valle; Michael DeBole; Vijaykrishnan Narayanan
Continuous transistor scaling due to improvements in CMOS devices and manufacturing technologies is increasing processor power densities and temperatures, thus creating challenges in maintaining manufacturing yield rates and keeping devices reliable over their expected lifetimes at the latest nanometer-scale dimensions. In fact, new system and processor microarchitectures require new reliability-aware design methods and exploration tools that can face these challenges without significantly increasing manufacturing cost, reducing system performance, or imposing large area overheads due to redundancy. In this paper we overview the latest approaches in reliability modeling and variability-tolerant design for the latest technology nodes, and advocate the need for reliability-aware design in forthcoming consumer electronics. Moreover, we illustrate with a case study of an embedded processor that effective reliability-aware design can be achieved in nanometer-scale devices through integral design approaches that cover modeling and exploration of reliability effects, and hardware-software architectural techniques to provide reliability-enhanced solutions at both the microarchitectural and system levels.
International Journal of Parallel Programming | 2009
Michael DeBole; Ramakrishnan Krishnan; Varsha Balakrishnan; Wenping Wang; Hong Luo; Yu Wang; Yuan Xie; Yu Cao; Narayanan Vijaykrishnan
Degradation of device parameters over the lifetime of a system is emerging as a significant threat to system reliability. Among the aging mechanisms, wearout resulting from Negative Bias Temperature Instability (NBTI) is of particular concern in deep submicron technology generations. While there has been significant effort at the device and circuit level to model and characterize the impact of NBTI, the analysis of NBTI's impact at the architectural level is still in its infancy. To facilitate architectural level aging analysis, a tool has been developed that evaluates NBTI vulnerabilities, in the form of timing degradation, early in the design cycle. The tool includes workload-based temperature and performance degradation analysis across a variety of technologies and operating conditions, revealing a complex interplay between factors influencing NBTI timing degradation.
Asia and South Pacific Design Automation Conference | 2009
Michael DeBole; Krishnan Ramakrishnan; Varsha Balakrishnan; Wenping Wang; Hong Luo; Yu Wang; Yuan Xie; Yu Cao; Narayanan Vijaykrishnan
Degradation of device parameters over the lifetime of a system is emerging as a significant threat to system reliability. Among the aging mechanisms, wearout resulting from NBTI is of particular concern in deep submicron technology generations. To facilitate architectural level aging analysis, a tool capable of evaluating NBTI vulnerabilities early in the design cycle has been developed. The tool includes workload-based temperature and performance degradation analysis across a variety of technologies and operating conditions, revealing a complex interplay between factors influencing NBTI timing degradation.
IPSJ Transactions on System LSI Design Methodology | 2012
Sungho Park; Ahmed Al Maashri; Kevin M. Irick; Aarti Chandrashekhar; Matthew Cotter; Nandhini Chandramoorthy; Michael DeBole; Vijaykrishnan Narayanan
Neuromorphic vision algorithms are biologically inspired computational models of the primate visual pathway. They promise robustness, high accuracy, and high energy efficiency in advanced image processing applications. Despite these potential benefits, realizations of neuromorphic algorithms typically exhibit low performance even when executed on multi-core CPU and GPU platforms. This is due to the disparity between the computational modalities prominent in these algorithms and those most exploited in contemporary computer architectures. In essence, acceleration of neuromorphic algorithms requires adherence to specific computational and communicational requirements. This paper discusses these requirements and proposes a framework for mapping neuromorphic vision applications onto a System-on-Chip (SoC). A neuromorphic object detection and recognition system on a multi-FPGA platform is presented, with performance and power efficiency comparisons to CMP and GPU implementations.
Field-Programmable Logic and Applications | 2007
Kevin M. Irick; Michael DeBole; Vijaykrishnan Narayanan; Rajeev Sharma; Hankyu Moon; Satish Mummareddy
Systems that can process information about their users in real time are an integral part of interactive computing environments. In many cases it is desirable not only to recognize a human user but also to extract as much information about the user as possible, such as gender, ethnicity, and age. In this paper we present an FPGA implementation of a neural network configured specifically for performing face detection and gender classification in real-time video streams. Our streaming architecture performs the face detection and gender classification tasks at 30 frames per second on a small Virtex-4 FPGA, at accuracy comparable to that of a leading commercial software implementation.
Signal Processing Systems | 2011
Ahmed Al Maashri; Michael DeBole; Chi-Li Yu; Vijaykrishnan Narayanan; Chaitali Chakrabarti
Neuromorphic vision algorithms are biologically inspired algorithms that follow the processing that takes place in the visual cortex. These algorithms have been shown to match classical computer vision algorithms in classification performance and even to outperform them in some instances. However, neuromorphic algorithms suffer from high complexity, leading to poor execution times when running on general purpose processors and making them less attractive for real-time applications. FPGAs, on the other hand, have become true signal processing platforms due to their light weight, low power consumption, and massively parallel computational resources. This paper describes an FPGA-based hardware architecture that accelerates an object classification cortical model, HMAX. Compared to a CPU implementation, this hardware accelerator offers 23X (89X) speedup when mapped to a single-FPGA (multi-FPGA) platform, while maintaining a classification accuracy of 92.5%.
International Conference on Computer Aided Design | 2011
Michael DeBole; Ahmed Al Maashri; Matthew Cotter; Chi-Li Yu; Chaitali Chakrabarti; Vijaykrishnan Narayanan
Neuromorphic algorithms are traditionally implemented on platforms which consume significant power, falling short of their biological underpinnings. Recent improvements in FPGA technology have led to FPGAs becoming a platform on which these rapidly evolving algorithms can be implemented. Unfortunately, implementing designs on FPGAs still proves challenging for nonexperts, limiting their use in the neuroscience domain. In this paper, an FPGA framework is presented which enables neuroscientists to compose multi-FPGA systems for a cortical object classification model. This is demonstrated by mapping this algorithm onto two distinct platforms, providing speedups of up to ∼28X over a reference CPU implementation.
Asia and South Pacific Design Automation Conference | 2009
Srinath Sridharan; Michael DeBole; Guangyu Sun; Yuan Xie; Vijaykrishnan Narayanan
As technology scales, interconnect delays begin to dominate the performance of modern microprocessors. The ability to reduce the length of global wires has become an important design constraint; however, only a subset of those global wires is critical for determining performance. The introduction of three-dimensional (3D) ICs has created the opportunity to reduce global wiring lengths and shorten interconnect delays through the intelligent placement of functional blocks. In this paper, a floorplanner for 3D chips is proposed that organizes functional blocks according to critical microarchitectural communication paths. The floorplanner identifies the potential triggers, in the form of feedback delays, which are responsible for the largest communication costs and places the contributing functional blocks in such a way that those costs are minimized. With our criticality-driven 3D placement there is an average IPC improvement of 22% over 2D placement. Over criticality-unaware 3D placement, criticality-driven 3D placement shows an average IPC improvement of 8%.
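The idea of weighting placement by communication criticality can be illustrated with a small sketch. This is not the paper's floorplanner: the cost function (criticality times Manhattan distance, plus a per-layer hop cost), the block names, the coordinates, and the weights are all hypothetical and chosen only to show why stacking a critical pair of blocks can lower the weighted cost.

```python
def placement_cost(positions, links, layer_cost=1.0):
    """Sum of criticality * distance over all communicating block pairs.

    positions: dict mapping block name -> (x, y, layer)
    links: list of (block_a, block_b, criticality) tuples
    layer_cost: assumed cost of crossing one layer in the 3D stack
    """
    total = 0.0
    for a, b, criticality in links:
        xa, ya, la = positions[a]
        xb, yb, lb = positions[b]
        # Manhattan distance in the plane plus a vertical hop cost.
        distance = abs(xa - xb) + abs(ya - yb) + layer_cost * abs(la - lb)
        total += criticality * distance
    return total


# Hypothetical communication links; the first is far more critical.
links = [
    ("alu", "regfile", 5.0),
    ("alu", "icache", 1.0),
]

# All blocks on one layer (2D placement).
flat = {"alu": (0, 0, 0), "regfile": (4, 0, 0), "icache": (0, 3, 0)}

# Critical partner stacked directly above the ALU (3D placement).
stacked = {"alu": (0, 0, 0), "regfile": (0, 0, 1), "icache": (0, 3, 0)}

cost_2d = placement_cost(flat, links)
cost_3d = placement_cost(stacked, links)
```

In this toy case the weighted cost drops from 23.0 to 8.0 because the most critical link becomes a single vertical hop. A real floorplanner would also have to respect area, thermal, and overlap constraints, none of which are modeled here.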