
Publications


Featured research published by Atif Hashmi.


IEEE International Symposium on Workload Characterization | 2012

BenchNN: On the broad potential application scope of hardware neural network accelerators

Tianshi Chen; Yunji Chen; Marc Duranton; Qi Guo; Atif Hashmi; Mikko H. Lipasti; Andrew Nere; Shi Qiu; Michèle Sebag; Olivier Temam

Recent technology trends have indicated that, although device sizes will continue to scale as they have in the past, supply voltage scaling has ended. As a result, future chips can no longer rely on simply increasing the operational core count to improve performance without surpassing a reasonable power budget. Alternatively, allocating die area towards accelerators targeting an application, or an application domain, appears quite promising, and this paper makes an argument for a neural network hardware accelerator. After being hyped in the 1990s and then fading away for almost two decades, hardware neural networks are seeing a surge of interest because of their energy and fault-tolerance properties. At the same time, the emergence of high-performance applications like Recognition, Mining, and Synthesis (RMS) suggests that the potential application scope of a hardware neural network accelerator would be broad. In this paper, we want to highlight that a hardware neural network accelerator is indeed compatible with many of the emerging high-performance workloads, currently accepted as benchmarks for high-performance microarchitectures. For that purpose, we develop and evaluate software neural network implementations of 5 (out of 12) RMS applications from the PARSEC Benchmark Suite. Our results show that neural network implementations can achieve competitive results, with respect to application-specific quality metrics, on these 5 RMS applications.
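As a rough illustration of the approach described above (not code from the paper), the following sketch trains a small multilayer perceptron to approximate a stand-in application kernel and then scores it with an application-specific quality metric, here simply the mean absolute error against the exact kernel.

```python
# Illustrative sketch only: approximate a hypothetical application kernel with a
# tiny neural network and evaluate it with a task-level quality metric.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "kernel" we want to replace with a neural network.
def kernel(x):
    return np.sin(3 * x[:, 0]) * np.cos(2 * x[:, 1])

# Tiny 2-16-1 MLP trained with full-batch gradient descent.
X = rng.uniform(-1, 1, size=(2000, 2))
y = kernel(X).reshape(-1, 1)
W1 = rng.normal(0, 0.5, (2, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 1)); b2 = np.zeros(1)
lr = 0.05
for _ in range(2000):
    h = np.tanh(X @ W1 + b1)          # hidden layer
    out = h @ W2 + b2                 # linear output
    err = out - y
    # Backpropagate the mean-squared error.
    gW2 = h.T @ err / len(X); gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# "Quality metric": mean absolute deviation from the exact kernel on held-out inputs.
Xt = rng.uniform(-1, 1, size=(500, 2))
pred = np.tanh(Xt @ W1 + b1) @ W2 + b2
print("mean abs error vs. exact kernel:", np.abs(pred.ravel() - kernel(Xt)).mean())
```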


Frontiers in Neurology | 2013

Sleep-dependent synaptic down-selection (I): modeling the benefits of sleep on memory consolidation and integration.

Andrew Nere; Atif Hashmi; Chiara Cirelli; Giulio Tononi

Sleep can favor the consolidation of both procedural and declarative memories, promote gist extraction, help the integration of new with old memories, and desaturate the ability to learn. It is often assumed that such beneficial effects are due to the reactivation of neural circuits in sleep to further strengthen the synapses modified during wake or transfer memories to different parts of the brain. A different possibility is that sleep may benefit memory not by further strengthening synapses, but rather by renormalizing synaptic strength to restore cellular homeostasis after net synaptic potentiation in wake. In this way, the sleep-dependent reactivation of neural circuits could result in the competitive down-selection of synapses that are activated infrequently and fit less well with the overall organization of memories. By using computer simulations, we show here that synaptic down-selection is in principle sufficient to explain the beneficial effects of sleep on the consolidation of procedural and declarative memories, on gist extraction, and on the integration of new with old memories, thereby addressing the plasticity-stability dilemma.
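A toy simulation can make the wake/sleep asymmetry concrete. The sketch below is an illustration of the general idea, not the authors' model: Hebbian potentiation during "wake" strengthens both memory-related and spurious synapses, and an off-line "sleep" phase renormalizes total synaptic strength while depressing the least-used synapses most.

```python
# Toy sketch (not the paper's code): potentiation in wake, down-selection in sleep.
import numpy as np

rng = np.random.default_rng(1)
n_inputs, n_patterns = 50, 5
patterns = (rng.random((n_patterns, n_inputs)) < 0.2).astype(float)  # sparse memories
w = np.full(n_inputs, 0.1)              # synaptic weights of one postsynaptic neuron
baseline = w.sum()                      # homeostatic set point for total strength

# Wake: repeated pattern presentations potentiate co-active synapses (plus noise).
for _ in range(200):
    p = patterns[rng.integers(n_patterns)]
    noise = (rng.random(n_inputs) < 0.05).astype(float)   # spurious coincidences
    w += 0.01 * (p + noise)

# Sleep: depress synapses in proportion to how little they participate in stored
# memories, until total synaptic strength returns to its baseline.
usage = patterns.sum(axis=0)            # how often each synapse supports a memory
while w.sum() > baseline:
    w -= 0.001 * (w > 0) / (1.0 + usage)   # least-used synapses shrink fastest
    w = np.clip(w, 0.0, None)

print("mean weight on memory synapses  :", w[usage > 0].mean().round(3))
print("mean weight on spurious synapses:", w[usage == 0].mean().round(3))
```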


International Symposium on Computer Architecture | 2011

Automatic abstraction and fault tolerance in cortical microarchitectures

Atif Hashmi; Hugues Berry; Olivier Temam; Mikko H. Lipasti

Recent advances in the neuroscientific understanding of the brain are bringing about a tantalizing opportunity for building synthetic machines that perform computation in ways that differ radically from traditional Von Neumann machines. These brain-like architectures, which are premised on our understanding of how the human neocortex computes, are highly fault-tolerant, averaging results over large numbers of potentially faulty components, yet manage to solve very difficult problems more reliably than traditional algorithms. A key principle of operation for these architectures is that of automatic abstraction: independent features are extracted from highly disordered inputs and are used to create abstract invariant representations of the external entities. This feature extraction is applied hierarchically, leading to increasing levels of abstraction at higher levels in the hierarchy. This paper describes and evaluates a biologically plausible computational model for this process, and highlights the inherent fault tolerance of the biologically inspired algorithm. We introduce a stuck-at fault model for such cortical networks, and describe how this model maps to hardware faults that can occur on commodity GPGPU cores used to realize the model in software. We show experimentally that the model's software implementation can intrinsically preserve its functionality in the presence of faulty hardware, without requiring any reprogramming or recompilation. This model is a first step towards developing a comprehensive and biologically plausible understanding of the computational algorithms and microarchitecture of computing systems that mimic the human cortex, and towards applying them to the robust implementation of tasks on future computing systems built of faulty components.
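The fault-tolerance claim can be illustrated with a much simpler stand-in than the paper's cortical model: if a response is computed by averaging over a large population of redundant units, a stuck-at fault model applied to a fraction of those units shifts the average only slightly. The sketch below is purely illustrative.

```python
# Minimal illustration of the stuck-at fault idea: a population of redundant
# detectors votes on whether a feature is present, and the averaged vote
# tolerates a fraction of units stuck at 0 or 1.
import numpy as np

rng = np.random.default_rng(2)
n_units = 256                      # redundant detectors in one "column"

def population_response(feature_present, stuck_mask, stuck_value):
    # Each healthy unit fires with high probability when the feature is present
    # and low probability otherwise; faulty units ignore the input entirely.
    p = 0.8 if feature_present else 0.2
    spikes = (rng.random(n_units) < p).astype(float)
    spikes[stuck_mask] = stuck_value[stuck_mask]     # apply stuck-at faults
    return spikes.mean()                             # population vote

for fault_rate in [0.0, 0.1, 0.3]:
    stuck_mask = rng.random(n_units) < fault_rate
    stuck_value = (rng.random(n_units) < 0.5).astype(float)  # stuck-at-0 or stuck-at-1
    correct = 0
    for _ in range(1000):
        present = rng.random() < 0.5
        vote = population_response(present, stuck_mask, stuck_value)
        correct += int((vote > 0.5) == present)
    print(f"fault rate {fault_rate:.0%}: accuracy {correct / 1000:.2%}")
```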


Architectural Support for Programming Languages and Operating Systems | 2011

A case for neuromorphic ISAs

Atif Hashmi; Andrew Nere; James Jamal Thomas; Mikko H. Lipasti

The desire to create novel computing systems, paired with recent advances in neuroscientific understanding of the brain, has led researchers to develop neuromorphic architectures that emulate the brain. To date, such models are developed, trained, and deployed on the same substrate. However, excessive co-dependence between the substrate and the algorithm prevents portability, or at the very least requires reconstructing and retraining the model whenever the substrate changes. This paper proposes a well-defined abstraction layer -- the neuromorphic instruction set architecture, or NISA -- that separates a neural application's algorithmic specification from the underlying execution substrate, and describes the Aivo framework, which demonstrates the concrete advantages of such an abstraction layer. Aivo consists of a NISA implementation for a rate-encoded neuromorphic system based on the cortical column abstraction, a state-of-the-art integrated development and runtime environment (IDE), and various profile-based optimization tools. Aivo's IDE generates code for emulating cortical networks on the host CPU, on multiple GPGPUs, or as Boolean functions. Its runtime system can deploy and adaptively optimize cortical networks in a manner similar to conventional just-in-time compilers in managed runtime systems (e.g., Java, C#). We demonstrate the abilities of the NISA abstraction by constructing a cortical network model of the mammalian visual cortex, deploying it on multiple execution substrates, and utilizing the various optimization tools we have created. For this hierarchical configuration, Aivo's profiling-based network optimization tools reduce the memory footprint by 50% and improve the execution time by a factor of 3x on the host CPU. Deploying the same network on a single GPGPU results in a 30x speedup. We further demonstrate that a speedup of 480x can be achieved by deploying a massively scaled cortical network across three GPGPUs. Finally, converting a trained hierarchical network to C/C++ Boolean constructs on the host CPU results in a 44x speedup.
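To make the abstraction-layer idea concrete, here is a hypothetical sketch in Python (class and method names such as NetworkSpec, NumpyBackend, and deploy are assumptions for illustration, not Aivo's NISA): the network is specified once, independently of any substrate, and interchangeable backends instantiate and execute it.

```python
# Hypothetical sketch of a substrate-independent specification with pluggable backends.
import numpy as np

class NetworkSpec:
    """Substrate-independent description of a layered rate-coded network."""
    def __init__(self, layer_sizes):
        self.layer_sizes = layer_sizes      # e.g. [64, 32, 10]

class NumpyBackend:
    """Reference "CPU" substrate: dense matrix-vector products."""
    def deploy(self, spec, rng):
        self.weights = [rng.normal(0, 0.1, (m, n))
                        for n, m in zip(spec.layer_sizes, spec.layer_sizes[1:])]
        return self
    def run(self, x):
        for w in self.weights:
            x = np.maximum(w @ x, 0.0)      # rate-coded, rectified activations
        return x

class QuantizedBackend(NumpyBackend):
    """Stand-in for a constrained substrate: 8-bit-style weights, same specification."""
    def deploy(self, spec, rng):
        super().deploy(spec, rng)
        self.weights = [np.round(w * 127) / 127 for w in self.weights]
        return self

spec = NetworkSpec([64, 32, 10])
x = np.random.default_rng(4).random(64)
for backend in (NumpyBackend(), QuantizedBackend()):
    y = backend.deploy(spec, np.random.default_rng(3)).run(x)   # same spec, same seed
    print(type(backend).__name__, "output argmax:", int(np.argmax(y)))
```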


International Parallel and Distributed Processing Symposium | 2011

Profiling Heterogeneous Multi-GPU Systems to Accelerate Cortically Inspired Learning Algorithms

Andrew Nere; Atif Hashmi; Mikko H. Lipasti

Recent advances in neuroscientific understanding make parallel computing devices modeled after the human neocortex a plausible, attractive, fault-tolerant, and energy-efficient possibility. Such attributes have once again sparked an interest in creating learning algorithms that aspire to reverse-engineer many of the abilities of the brain. In this paper we describe a GPGPU-accelerated extension to an intelligent learning model inspired by the structural and functional properties of the mammalian neocortex. Our cortical network, like the brain, exhibits massive amounts of processing parallelism, making today's GPGPUs a highly attractive and readily-available hardware accelerator for such a model. Furthermore, we consider two inefficiencies inherent to our initial design: multiple kernel-launch overhead and poor utilization of GPGPU resources. We propose optimizations such as a software work-queue structure and pipelining the hierarchical layers of the cortical network to mitigate such problems. Our analysis provides important insight into GPU architecture details, including the number of cores, the memory system, and the global thread scheduler. Additionally, we create a runtime profiling tool for our parallel learning algorithm which proportionally distributes the cortical network across the host CPU as well as multiple GPUs, whether homogeneous or heterogeneous, that may be available to the system. Using the profiling tool with these optimizations on Nvidia's CUDA framework, we achieve up to a 60x speedup over a single-threaded CPU implementation of the model.
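The proportional-distribution step can be sketched in a few lines (an illustration of the idea, not the actual profiling tool): measure each device's throughput on a probe run, then assign cortical columns in proportion to those measurements. The device names and throughput numbers below are made up.

```python
# Illustrative proportional partitioning of cortical columns across devices.
def partition_columns(n_columns, throughputs):
    """throughputs: dict of device name -> measured columns/second on a probe run."""
    total = sum(throughputs.values())
    shares = {dev: int(round(n_columns * t / total)) for dev, t in throughputs.items()}
    # Fix rounding drift so the shares sum exactly to n_columns.
    drift = n_columns - sum(shares.values())
    fastest = max(throughputs, key=throughputs.get)
    shares[fastest] += drift
    return shares

# Hypothetical measurements for a heterogeneous CPU + dual-GPU system.
measured = {"cpu": 120.0, "gpu0": 1400.0, "gpu1": 900.0}
print(partition_columns(10_000, measured))
# -> {'cpu': 496, 'gpu0': 5785, 'gpu1': 3719}
```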


Frontiers in Neurology | 2013

Sleep-Dependent Synaptic Down-Selection (II): Single-Neuron Level Benefits for Matching, Selectivity, and Specificity.

Atif Hashmi; Andrew Nere; Giulio Tononi

In a companion paper (1), we used computer simulations to show that a strategy of activity-dependent, on-line net synaptic potentiation during wake, followed by off-line synaptic depression during sleep, can provide a parsimonious account for several memory benefits of sleep at the systems level, including the consolidation of procedural and declarative memories, gist extraction, and integration of new with old memories. In this paper, we consider the theoretical benefits of this two-step process at the single-neuron level and employ the theoretical notion of Matching between brain and environment to measure how this process increases the ability of the neuron to capture regularities in the environment and model them internally. We show that down-selection during sleep is beneficial for increasing or restoring Matching after learning, after integrating new with old memories, and after forgetting irrelevant material. By contrast, alternative schemes, such as additional potentiation in wake, potentiation in sleep, or synaptic renormalization in wake, decrease Matching. We also argue that, by selecting appropriate loops through the brain that tie feedforward synapses with feedback ones in the same dendritic domain, different subsets of neurons can learn to specialize for different contingencies and form sequences of nested perception-action loops. By potentiating such loops when interacting with the environment in wake, and depressing them when disconnected from the environment in sleep, neurons can learn to match the long-term statistical structure of the environment while avoiding spurious modes of functioning and catastrophic interference. Finally, such a two-step process has the additional benefit of desaturating the neuron’s ability to learn and of maintaining cellular homeostasis. Thus, sleep-dependent synaptic renormalization offers a parsimonious account for both cellular and systems level effects of sleep on learning and memory.


IEEE Symposium on Computational Intelligence for Multimedia Signal and Vision Processing | 2009

Cortical columns: Building blocks for intelligent systems

Atif Hashmi; Mikko H. Lipasti

The neocortex appears to be a very efficient, uniformly structured, and hierarchical computational system [25], [23], [24]. Researchers have made significant efforts to model intelligent systems that mimic these neocortical properties to perform a broad variety of pattern recognition and learning tasks. Unfortunately, many of these systems have drifted away from their cortical origins and incorporate or rely on attributes and algorithms that are not biologically plausible. In contrast, this paper describes a model for an intelligent system that is motivated by the properties of cortical columns, which can be viewed as the basic functional unit of the neocortex [35], [16]. Our model extends predictability minimization [30] to mimic the behavior of cortical columns and incorporates neocortical properties such as hierarchy, structural uniformity, and plasticity, and enables adaptive, hierarchical independent feature detection. Initial results for an unsupervised learning task (identifying independent features in image data) are quite promising, both in a single-level and a hierarchical organization modeled after the visual cortex. The model is also able to forget learned patterns that no longer appear in the dataset, demonstrating its adaptivity, resilience, and stability under changing input conditions.
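The forgetting behavior described above can be illustrated with a generic competitive-learning stand-in (winner-take-all Hebbian updates with slow weight decay); this is not the paper's predictability-minimization model, just a sketch of how a column of units can pick up independent features and let unused ones fade.

```python
# Generic stand-in, not the paper's algorithm: a "column" of competing units
# learns distinct input features, and slow decay forgets features that vanish.
import numpy as np

rng = np.random.default_rng(5)
dim, n_units = 16, 4
features = np.eye(4).repeat(4, axis=1)           # four non-overlapping toy features
W = 0.1 * rng.random((n_units, dim))             # one weight row per column unit

def present(feature_ids, steps=2000, lr=0.05, decay=1e-3):
    for _ in range(steps):
        x = features[rng.choice(feature_ids)] + 0.05 * rng.random(dim)
        winner = int(np.argmax(W @ x))           # unit that responds most strongly
        W[winner] += lr * (x - W[winner])        # move the winner toward the input
        W *= (1 - decay)                         # slow decay: unused weights fade

present([0, 1, 2, 3])                            # all four features occur in the data
print("strongest response to feature 0:", round(float((W @ features[0]).max()), 2))
present([1, 2, 3])                               # feature 0 stops appearing
print("after it disappears from the data:", round(float((W @ features[0]).max()), 2))
```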


International Symposium on High-Performance Computer Architecture | 2013

Bridging the semantic gap: Emulating biological neuronal behaviors with simple digital neurons

Andrew Nere; Atif Hashmi; Mikko H. Lipasti; Giulio Tononi

The advent of non-von Neumann computational models, specifically neuromorphic architectures, has engendered a new class of challenges for computer architects. On the one hand, each neuron-like computational element must consume minimal power and area to enable scaling up to biological scales of billions of neurons; this rules out direct support for complex and expensive features like floating point and transcendental functions. On the other hand, to fully benefit from cortical properties and operations, neuromorphic architectures must support complex non-linear neuronal behaviors. This semantic gap between the simple and power-efficient processing elements and complex neuronal behaviors has rekindled a RISC vs. CISC-like debate within the neuromorphic hardware design community. In this paper, we address the aforementioned semantic gap for a recently-described digital neuromorphic architecture that uses simple Linear-Leak Integrate-and-Fire (LLIF) spiking neurons as processing primitives. We show that despite the simplicity of LLIF primitives, a broad class of complex neuronal behaviors can be emulated by composing assemblies of such primitives with low area and power overheads. Furthermore, we demonstrate that for LLIF primitives without built-in mechanisms for synaptic plasticity, two well-known neural learning rules, spike-timing-dependent plasticity and Hebbian learning, can be emulated via assemblies of LLIF primitives. By bridging the semantic gap for one such system, we enable neuromorphic system developers, in general, to keep their hardware design simple and power-efficient and at the same time enjoy the benefits of complex neuronal behaviors essential for robust and accurate cortical simulation.
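For readers unfamiliar with the primitive, a minimal discrete-time LLIF neuron can be sketched as follows (an illustrative model, not the specific semantics of the hardware discussed in the paper): each tick, the membrane potential accumulates weighted input spikes, loses a fixed linear leak, and the neuron spikes and resets when the potential reaches threshold.

```python
# Minimal sketch of a Linear-Leak Integrate-and-Fire (LLIF) neuron.
import numpy as np

def llif_step(v, in_spikes, weights, leak=1, threshold=20, reset=0):
    """One discrete time step of an LLIF neuron. Integer arithmetic throughout,
    mirroring the fixed-point style typical of digital neuromorphic cores."""
    v = v + int(np.dot(in_spikes, weights)) - leak
    if v >= threshold:
        return reset, 1          # spike and reset
    return max(v, 0), 0          # clamp below at zero, no spike

rng = np.random.default_rng(6)
weights = np.array([4, 6, -3])   # two excitatory inputs, one inhibitory
v, spikes = 0, []
for t in range(50):
    in_spikes = (rng.random(3) < 0.4).astype(int)   # Bernoulli input spike trains
    v, s = llif_step(v, in_spikes, weights)
    spikes.append(s)
print("output spikes over 50 ticks:", sum(spikes))
```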


International Conference on Computer Design | 2008

Accelerating search and recognition with a TCAM functional unit

Atif Hashmi; Mikko H. Lipasti

World data is increasing rapidly, doubling almost every three years [1], [2]. To comprehend and use this data effectively, search and recognition (SR) applications will demand more computational power in the future. The inherent speedups that these applications get from frequency scaling will no longer exist as processor vendors move away from frequency scaling and towards multi-core architectures. Thus, modifications to both the structure of SR applications and current processor architectures are required to meet the computational needs of these workloads. This paper describes a novel hardware acceleration scheme to improve the performance of SR applications. The hardware accelerator relies on Ternary Content-Addressable Memory (TCAM) and some straightforward ISA extensions to deliver a promising speedup of 3.0x to 4.0x for SR workloads like Template Matching, BLAST, and multi-threaded applications using Software Transactional Memory (STM).
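A ternary CAM can be modeled in software to show what the functional unit computes (this sketch only models the matching semantics, not the paper's ISA extensions or timing): each entry stores a value plus a care-mask, and a key matches an entry when they agree on every care bit; the hardware evaluates all entries in parallel, which is the source of the speedup.

```python
# Software model of ternary CAM matching semantics (illustrative only).
class TCAM:
    def __init__(self):
        self.entries = []                      # list of (value, care_mask) pairs
    def add(self, pattern):
        """pattern: string over {'0', '1', 'x'}, e.g. '10x1'; 'x' is don't-care."""
        value = int(pattern.replace('x', '0'), 2)
        care = int(''.join('0' if c == 'x' else '1' for c in pattern), 2)
        self.entries.append((value, care))
    def search(self, key):
        """Return indices of all entries matching the integer key."""
        return [i for i, (value, care) in enumerate(self.entries)
                if (key ^ value) & care == 0]

t = TCAM()
t.add('10x1')    # matches 1001 and 1011
t.add('0xx0')    # matches 0000, 0010, 0100, 0110
t.add('1111')
print(t.search(0b1011))   # -> [0]
print(t.search(0b0110))   # -> [1]
```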


IJCCI (Selected Papers) | 2012

A Cortically Inspired Learning Model

Atif Hashmi; Mikko H. Lipasti

We describe a biologically plausible learning model inspired by the structural and functional properties of the cortical columns present in the mammalian neocortex. The strength and robustness of our model are ascribed to its biologically plausible, uniformly structured, and hierarchically distributed processing units with their localized learning rules. By modeling cortical columns rather than individual neurons as our fundamental processing units, we get hierarchical learning networks that are computationally less demanding and better suited for studying higher cortical properties like independent feature detection, plasticity, etc. Another interesting attribute of our model is the use of feedback processing paths to generate invariant representations that robustly recognize variations of the same pattern and to determine the set of features sufficient for recognizing different patterns in the input dataset. We train and test our hierarchical networks using synthetic digit images as well as a subset of handwritten digit images obtained from the MNIST database. Our results show that our cortical networks use unsupervised feedforward processing as well as supervised feedback processing to robustly recognize handwritten digits.

Collaboration


Dive into Atif Hashmi's collaborations.

Top Co-Authors

Mikko H. Lipasti, University of Wisconsin-Madison
Andrew Nere, University of Wisconsin-Madison
Giulio Tononi, University of Wisconsin-Madison
Chiara Cirelli, University of Wisconsin-Madison
James Jamal Thomas, University of Wisconsin-Madison
Sean Franey, University of Wisconsin-Madison
Tianshi Chen, Chinese Academy of Sciences
Yunji Chen, Chinese Academy of Sciences
Michèle Sebag, Centre national de la recherche scientifique