Publication


Featured research published by Andrew Nere.


IEEE International Symposium on Workload Characterization | 2012

BenchNN: On the broad potential application scope of hardware neural network accelerators

Tianshi Chen; Yunji Chen; Marc Duranton; Qi Guo; Atif Hashmi; Mikko H. Lipasti; Andrew Nere; Shi Qiu; Michèle Sebag; Olivier Temam

Recent technology trends have indicated that, although device sizes will continue to scale as they have in the past, supply voltage scaling has ended. As a result, future chips can no longer rely on simply increasing the operational core count to improve performance without surpassing a reasonable power budget. Alternatively, allocating die area towards accelerators targeting an application, or an application domain, appears quite promising, and this paper makes an argument for a neural network hardware accelerator. After being hyped in the 1990s and then fading away for almost two decades, hardware neural networks are seeing a surge of interest because of their energy and fault-tolerance properties. At the same time, the emergence of high-performance applications like Recognition, Mining, and Synthesis (RMS) suggests that the potential application scope of a hardware neural network accelerator would be broad. In this paper, we want to highlight that a hardware neural network accelerator is indeed compatible with many of the emerging high-performance workloads, currently accepted as benchmarks for high-performance micro-architectures. For that purpose, we develop and evaluate software neural network implementations of 5 (out of 12) RMS applications from the PARSEC Benchmark Suite. Our results show that neural network implementations can achieve competitive results, with respect to application-specific quality metrics, on these 5 RMS applications.
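
The central idea here is that an application kernel can be replaced by a learned neural stand-in and judged by an application-specific quality metric rather than bit-exact output. A minimal sketch of that idea, not taken from the paper itself (the target function, network size, and metric below are illustrative assumptions):

```python
# Illustrative sketch (not the paper's code): approximate a numeric kernel
# with a small MLP and score it with an application-level quality metric.
import numpy as np

rng = np.random.default_rng(0)

def kernel(x):
    # Hypothetical stand-in for an RMS-style numeric kernel.
    return np.sin(3 * x) + 0.5 * x

# Training data sampled from the kernel's input domain.
X = rng.uniform(-1, 1, size=(2000, 1))
Y = kernel(X)

# One-hidden-layer MLP trained with plain full-batch gradient descent.
W1 = rng.normal(0, 0.5, (1, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.5, (32, 1)); b2 = np.zeros(1)
lr = 0.05
for _ in range(5000):
    H = np.tanh(X @ W1 + b1)           # forward pass
    P = H @ W2 + b2
    dP = 2 * (P - Y) / len(X)          # gradient of mean squared error
    dW2 = H.T @ dP; db2 = dP.sum(0)
    dH = dP @ W2.T * (1 - H ** 2)
    dW1 = X.T @ dH; db1 = dH.sum(0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

# "Quality metric": mean relative error of the neural stand-in vs. the exact kernel.
Xt = rng.uniform(-1, 1, size=(500, 1))
pred = np.tanh(Xt @ W1 + b1) @ W2 + b2
rel_err = np.mean(np.abs(pred - kernel(Xt)) / (np.abs(kernel(Xt)) + 1e-6))
print(f"mean relative error: {rel_err:.3f}")
```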


Frontiers in Neurology | 2013

Sleep-dependent synaptic down-selection (I): modeling the benefits of sleep on memory consolidation and integration.

Andrew Nere; Atif Hashmi; Chiara Cirelli; Giulio Tononi

Sleep can favor the consolidation of both procedural and declarative memories, promote gist extraction, help the integration of new with old memories, and desaturate the ability to learn. It is often assumed that such beneficial effects are due to the reactivation of neural circuits in sleep to further strengthen the synapses modified during wake or transfer memories to different parts of the brain. A different possibility is that sleep may benefit memory not by further strengthening synapses, but rather by renormalizing synaptic strength to restore cellular homeostasis after net synaptic potentiation in wake. In this way, the sleep-dependent reactivation of neural circuits could result in the competitive down-selection of synapses that are activated infrequently and fit less well with the overall organization of memories. By using computer simulations, we show here that synaptic down-selection is in principle sufficient to explain the beneficial effects of sleep on the consolidation of procedural and declarative memories, on gist extraction, and on the integration of new with old memories, thereby addressing the plasticity-stability dilemma.
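
A toy illustration of the wake-potentiation / sleep-down-selection scheme described above (my own sketch with arbitrary parameters, not the authors' simulation): synapses driven by a learned pattern are potentiated during "wake", and during "sleep" weakly reactivated synapses are competitively depressed while total synaptic strength is renormalized.

```python
# Toy illustration (not the published model): net potentiation in wake,
# competitive down-selection and renormalization in sleep.
import numpy as np

rng = np.random.default_rng(1)
n_syn = 100
w = np.full(n_syn, 0.5)                       # synaptic strengths
memory = rng.random(n_syn) < 0.2              # synapses belonging to a learned pattern

# Wake: inputs overlapping the memory pattern drive net potentiation,
# but background activity also strengthens unrelated synapses a little.
for _ in range(50):
    active = memory | (rng.random(n_syn) < 0.1)    # signal plus noise
    w[active] += 0.02
w = np.clip(w, 0, 1)

# Sleep: reactivation favors strong, pattern-consistent synapses; the rest
# are down-selected, and total strength is renormalized to baseline.
reactivation = w * (memory + 0.2)             # pattern synapses replay more
w = np.where(reactivation > np.median(reactivation), w, w * 0.6)
w *= (0.5 * n_syn) / w.sum()                  # homeostatic renormalization

print("mean weight (pattern synapses):   ", w[memory].mean().round(3))
print("mean weight (background synapses):", w[~memory].mean().round(3))
```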


Architectural Support for Programming Languages and Operating Systems | 2011

A case for neuromorphic ISAs

Atif Hashmi; Andrew Nere; James Jamal Thomas; Mikko H. Lipasti

The desire to create novel computing systems, paired with recent advances in neuroscientific understanding of the brain, has led researchers to develop neuromorphic architectures that emulate the brain. To date, such models are developed, trained, and deployed on the same substrate. However, excessive co-dependence between the substrate and the algorithm prevents portability, or at the very least requires reconstructing and retraining the model whenever the substrate changes. This paper proposes a well-defined abstraction layer, the neuromorphic instruction set architecture (NISA), that separates a neural application's algorithmic specification from the underlying execution substrate, and describes the Aivo framework, which demonstrates the concrete advantages of such an abstraction layer. Aivo consists of a NISA implementation for a rate-encoded neuromorphic system based on the cortical column abstraction, a state-of-the-art integrated development and runtime environment (IDE), and various profile-based optimization tools. Aivo's IDE generates code for emulating cortical networks on the host CPU, multiple GPGPUs, or as Boolean functions. Its runtime system can deploy and adaptively optimize cortical networks in a manner similar to conventional just-in-time compilers in managed runtime systems (e.g. Java, C#). We demonstrate the abilities of the NISA abstraction by constructing a cortical network model of the mammalian visual cortex, deploying it on multiple execution substrates, and utilizing the various optimization tools we have created. For this hierarchical configuration, Aivo's profiling-based network optimization tools reduce the memory footprint by 50% and improve the execution time by a factor of 3x on the host CPU. Deploying the same network on a single GPGPU results in a 30x speedup. We further demonstrate that a speedup of 480x can be achieved by deploying a massively scaled cortical network across three GPGPUs. Finally, converting a trained hierarchical network to C/C++ Boolean constructs on the host CPU results in a 44x speedup.
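
The key claim is that a neuromorphic ISA decouples the network specification from the execution substrate. A rough sketch of what such a separation can look like (the class names and interfaces below are invented for illustration and are not Aivo's actual API): the same column-level specification is handed to interchangeable backends.

```python
# Hypothetical sketch of an abstraction layer in the spirit of a neuromorphic
# ISA: one network specification, multiple interchangeable execution backends.
from dataclasses import dataclass, field
from typing import Protocol
import numpy as np

@dataclass
class ColumnSpec:
    """Substrate-independent description of one rate-coded cortical column."""
    name: str
    n_units: int
    targets: list = field(default_factory=list)   # names of downstream columns

class Backend(Protocol):
    def deploy(self, specs: list) -> None: ...
    def step(self, inputs: dict) -> dict: ...

class CpuBackend:
    """Reference backend: emulate each column as a dense rate-coded layer."""
    def deploy(self, specs):
        rng = np.random.default_rng(0)
        self.specs = {s.name: s for s in specs}
        self.weights = {s.name: rng.normal(0, 0.1, (s.n_units, s.n_units)) for s in specs}

    def step(self, inputs):
        out = {}
        for name, x in inputs.items():
            # Simple saturating rate response; a real backend would be far richer.
            out[name] = np.tanh(self.weights[name] @ x)
        return out

# The same specification could equally be handed to a GPU or Boolean-function backend.
specs = [ColumnSpec("V1", 64, targets=["V2"]), ColumnSpec("V2", 64)]
backend: Backend = CpuBackend()
backend.deploy(specs)
activity = backend.step({"V1": np.ones(64), "V2": np.zeros(64)})
print({k: v.shape for k, v in activity.items()})
```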


International Parallel and Distributed Processing Symposium | 2011

Profiling Heterogeneous Multi-GPU Systems to Accelerate Cortically Inspired Learning Algorithms

Andrew Nere; Atif Hashmi; Mikko H. Lipasti

Recent advances in neuroscientific understanding make parallel computing devices modeled after the human neocortex a plausible, attractive, fault-tolerant, and energy-efficient possibility. Such attributes have once again sparked an interest in creating learning algorithms that aspire to reverse-engineer many of the abilities of the brain. In this paper we describe a GPGPU-accelerated extension to an intelligent learning model inspired by the structural and functional properties of the mammalian neocortex. Our cortical network, like the brain, exhibits massive amounts of processing parallelism, making today's GPGPUs a highly attractive and readily-available hardware accelerator for such a model. Furthermore, we consider two inefficiencies inherent to our initial design: multiple kernel-launch overhead and poor utilization of GPGPU resources. We propose optimizations such as a software work-queue structure and pipelining the hierarchical layers of the cortical network to mitigate such problems. Our analysis provides important insight into the GPU architecture details including the number of cores, the memory system, and the global thread scheduler. Additionally, we create a runtime profiling tool for our parallel learning algorithm which proportionally distributes the cortical network across the host CPU as well as multiple GPUs, whether homogeneous or heterogeneous, that may be available to the system. Using the profiling tool with these optimizations on Nvidia's CUDA framework, we achieve up to 60x speedup over a single-threaded CPU implementation of the model.
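
The profiling tool described above distributes the cortical network across the CPU and GPUs in proportion to measured throughput. A hedged sketch of just the proportional-partitioning step (the device names and throughput numbers are made up; the real tool profiles actual CUDA devices):

```python
# Illustrative sketch: split cortical columns across devices in proportion
# to profiled throughput (columns processed per second per device).
def partition_columns(n_columns, throughput):
    total = sum(throughput.values())
    shares = {dev: int(n_columns * t / total) for dev, t in throughput.items()}
    # Hand any rounding remainder to the fastest device.
    shares[max(throughput, key=throughput.get)] += n_columns - sum(shares.values())
    return shares

# Hypothetical profiling results for a heterogeneous CPU + two-GPU system.
profiled = {"cpu": 120.0, "gpu0": 2400.0, "gpu1": 1500.0}
print(partition_columns(10_000, profiled))
```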


PLOS ONE | 2012

A Neuromorphic Architecture for Object Recognition and Motion Anticipation Using Burst-STDP

Andrew Nere; Umberto Olcese; David Balduzzi; Giulio Tononi

In this work we investigate the possibilities offered by a minimal framework of artificial spiking neurons to be deployed in silico. Here we introduce a hierarchical network architecture of spiking neurons which learns to recognize moving objects in a visual environment and determine the correct motor output for each object. These tasks are learned through both supervised and unsupervised spike timing dependent plasticity (STDP). STDP is responsible for the strengthening (or weakening) of synapses in relation to pre- and post-synaptic spike times and has been described as a Hebbian paradigm taking place both in vitro and in vivo. We utilize a variation of STDP learning, called burst-STDP, which is based on the notion that, since spikes are expensive in terms of energy consumption, strong bursting activity carries more information than single (sparse) spikes. Furthermore, this learning algorithm takes advantage of homeostatic renormalization, which has been hypothesized to promote memory consolidation during NREM sleep. Using this learning rule, we design a spiking neural network architecture capable of object recognition, motion detection, attention towards important objects, and motor control outputs. We demonstrate the abilities of our design in a simple environment with distractor objects, multiple objects moving concurrently, and in the presence of noise. Most importantly, we show how this neural network is capable of performing these tasks using a simple leaky-integrate-and-fire (LIF) neuron model with binary synapses, making it fully compatible with state-of-the-art digital neuromorphic hardware designs. As such, the building blocks and learning rules presented in this paper appear promising for scalable fully neuromorphic systems to be implemented in hardware chips.
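
The key ingredients are LIF neurons with binary synapses and a burst-gated plasticity rule. A toy sketch of how such a rule might be written (the thresholds, burst criterion, and rates below are my assumptions, not the published parameters):

```python
# Toy sketch: leaky integrate-and-fire neuron with binary synapses and a
# burst-gated STDP-like update (parameters are illustrative, not the paper's).
import numpy as np

rng = np.random.default_rng(2)
n_in, steps = 50, 200
w = rng.integers(0, 2, n_in).astype(float)     # binary synapses {0, 1}
v, v_thresh, leak = 0.0, 8.0, 0.9
recent_out = []                                # postsynaptic spike history

for t in range(steps):
    pre = (rng.random(n_in) < 0.15).astype(float)   # presynaptic spikes
    v = leak * v + w @ pre                          # leaky integration
    spike = v >= v_thresh
    if spike:
        v = 0.0
    recent_out.append(spike)

    # Burst-gated plasticity (sketch): only a postsynaptic burst (>=3 spikes in
    # the last 5 steps) triggers learning; recently active inputs are set to 1,
    # silent inputs are occasionally depressed back to 0.
    if sum(recent_out[-5:]) >= 3:
        w = np.where(pre > 0, 1.0, np.where(rng.random(n_in) < 0.05, 0.0, w))

print("active synapses:", int(w.sum()), "of", n_in)
```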


Frontiers in Neurology | 2013

Sleep-Dependent Synaptic Down-Selection (II): Single-Neuron Level Benefits for Matching, Selectivity, and Specificity.

Atif Hashmi; Andrew Nere; Giulio Tononi

In a companion paper (1), we used computer simulations to show that a strategy of activity-dependent, on-line net synaptic potentiation during wake, followed by off-line synaptic depression during sleep, can provide a parsimonious account for several memory benefits of sleep at the systems level, including the consolidation of procedural and declarative memories, gist extraction, and integration of new with old memories. In this paper, we consider the theoretical benefits of this two-step process at the single-neuron level and employ the theoretical notion of Matching between brain and environment to measure how this process increases the ability of the neuron to capture regularities in the environment and model them internally. We show that down-selection during sleep is beneficial for increasing or restoring Matching after learning, after integrating new with old memories, and after forgetting irrelevant material. By contrast, alternative schemes, such as additional potentiation in wake, potentiation in sleep, or synaptic renormalization in wake, decrease Matching. We also argue that, by selecting appropriate loops through the brain that tie feedforward synapses with feedback ones in the same dendritic domain, different subsets of neurons can learn to specialize for different contingencies and form sequences of nested perception-action loops. By potentiating such loops when interacting with the environment in wake, and depressing them when disconnected from the environment in sleep, neurons can learn to match the long-term statistical structure of the environment while avoiding spurious modes of functioning and catastrophic interference. Finally, such a two-step process has the additional benefit of desaturating the neuron’s ability to learn and of maintaining cellular homeostasis. Thus, sleep-dependent synaptic renormalization offers a parsimonious account for both cellular and systems level effects of sleep on learning and memory.
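
The paper's Matching measure is information-theoretic and not reproduced here; as a crude stand-in only, the sketch below (my own illustration) shows the related single-neuron intuition that pruning weak synapses after wake-time potentiation increases a unit's selectivity for an environmental regularity relative to random inputs.

```python
# Crude stand-in (not the paper's Matching measure): selectivity of a single
# unit for a learned input pattern, before and after sleep-like pruning.
import numpy as np

rng = np.random.default_rng(3)
n_in = 200
pattern = rng.random(n_in) < 0.2                  # environmental regularity
w = 0.2 + 0.6 * pattern + 0.2 * rng.random(n_in)  # wake: pattern synapses potentiated, plus noise

def selectivity(weights):
    on_pattern = weights @ pattern
    on_noise = np.mean([weights @ (rng.random(n_in) < 0.2) for _ in range(200)])
    return on_pattern - on_noise                  # higher = more specific to the regularity

before = selectivity(w)
w_sleep = np.where(w > np.percentile(w, 60), w, 0.0)   # down-select weakest synapses
after = selectivity(w_sleep)
print(f"selectivity before sleep: {before:.1f}, after: {after:.1f}")
```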


High-Performance Computer Architecture | 2013

Bridging the semantic gap: Emulating biological neuronal behaviors with simple digital neurons

Andrew Nere; Atif Hashmi; Mikko H. Lipasti; Giulio Tononi

The advent of non-von Neumann computational models, specifically neuromorphic architectures, has engendered a new class of challenges for computer architects. On the one hand, each neuron-like computational element must consume minimal power and area to enable scaling up to biological scales of billions of neurons; this rules out direct support for complex and expensive features like floating point and transcendental functions. On the other hand, to fully benefit from cortical properties and operations, neuromorphic architectures must support complex non-linear neuronal behaviors. This semantic gap between the simple and power-efficient processing elements and complex neuronal behaviors has rekindled a RISC vs. CISC-like debate within the neuromorphic hardware design community. In this paper, we address the aforementioned semantic gap for a recently-described digital neuromorphic architecture that uses simple Linear-Leak Integrate-and-Fire (LLIF) spiking neurons as processing primitives. We show that despite the simplicity of LLIF primitives, a broad class of complex neuronal behaviors can be emulated by composing assemblies of such primitives with low area and power overheads. Furthermore, we demonstrate that for the LLIF primitives without built-in mechanisms for synaptic plasticity, two well-known neural learning rules, spike-timing-dependent plasticity and Hebbian learning, can be emulated via assemblies of LLIF primitives. By bridging the semantic gap for one such system we enable neuromorphic system developers, in general, to keep their hardware design simple and power-efficient and at the same time enjoy the benefits of complex neuronal behaviors essential for robust and accurate cortical simulation.
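
A linear-leak neuron differs from the usual exponential-leak LIF model in that the membrane potential decays by a fixed amount per tick, which keeps the primitive integer-only. A minimal sketch of that primitive (the constants are illustrative and not the referenced hardware's exact neuron equation):

```python
# Minimal sketch of a linear-leak integrate-and-fire (LLIF) primitive:
# integer membrane potential, fixed subtractive leak per tick, no floating point.
def llif_step(v, weighted_input, leak=1, threshold=10):
    v = v + weighted_input - leak
    v = max(v, 0)                 # potential does not go negative
    if v >= threshold:
        return 0, True            # fire and reset
    return v, False

# Drive one LLIF unit with a simple input train.
v, spikes = 0, []
for inp in [3, 3, 3, 0, 0, 4, 4, 4, 0, 0]:
    v, fired = llif_step(v, inp)
    spikes.append(fired)
print(spikes)
```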


Architectural Support for Programming Languages and Operating Systems | 2010

Cortical architectures on a GPGPU

Andrew Nere; Mikko H. Lipasti

As the number of devices available per chip continues to increase, the computational potential of future computer architectures grows likewise. While this is a clear benefit for future computing devices, future chips will also likely suffer from more faulty devices and increased power consumption. It is also likely that these chips will be difficult to program if the current trend of adding more parallel cores continues in the future. However, recent advances in neuroscientific understanding make parallel computing devices modeled after the human neocortex a plausible, attractive, fault-tolerant, and energy-efficient possibility. In this paper we describe a GPGPU extension to an intelligent model based on the mammalian neocortex. The GPGPU is a readily-available architecture that fits well with the parallel cortical architecture inspired by the basic building blocks of the human brain. Using NVIDIA's CUDA framework, we have achieved up to 273x speedup over our unoptimized C++ serial implementation. We also consider two inefficiencies inherent to our initial design: multiple kernel-launch overhead and poor utilization of GPGPU resources. We propose using a software work-queue structure to solve the former, and pipelining the cortical architecture during the training phase for the latter. Additionally, from our success in extending our model to the GPU, we speculate on the necessary hardware requirements for simulating the computational abilities of mammalian brains.
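
A back-of-the-envelope sketch of why the software work queue helps with kernel-launch overhead (the timing constants below are illustrative assumptions, not measurements from the paper): launch cost is paid per launch, so draining many column updates from a queue within a single launch amortizes it.

```python
# Illustrative arithmetic only: amortizing kernel-launch overhead with a work queue.
launch_overhead_us = 10.0       # assumed fixed cost per kernel launch
work_per_column_us = 2.0        # assumed useful work per cortical column
n_columns = 4096

# Naive scheme: one kernel launch per column update.
naive = n_columns * (launch_overhead_us + work_per_column_us)

# Work-queue scheme: columns enqueued and drained by a single launch.
queued = launch_overhead_us + n_columns * work_per_column_us

print(f"naive: {naive/1e3:.1f} ms, work queue: {queued/1e3:.1f} ms, "
      f"speedup ~{naive/queued:.1f}x")
```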


Journal of Parallel and Distributed Computing | 2013

Simulating cortical networks on heterogeneous multi-GPU systems

Andrew Nere; Sean Franey; Atif Hashmi; Mikko H. Lipasti


Archive | 2015

Learn-by-example systems and methods

Andrew Nere; Mikko H. Lipasti; Atif Hashmi; John F. Wakerly

Collaboration


Dive into Andrew Nere's collaboration.

Top Co-Authors

Atif Hashmi, University of Wisconsin-Madison
Mikko H. Lipasti, Wisconsin Alumni Research Foundation
Giulio Tononi, University of Wisconsin-Madison
Chiara Cirelli, University of Wisconsin-Madison
James Jamal Thomas, University of Wisconsin-Madison
Sean Franey, University of Wisconsin-Madison
Umberto Olcese, Istituto Italiano di Tecnologia
Tianshi Chen, Chinese Academy of Sciences
Yunji Chen, Chinese Academy of Sciences