
Publication


Featured research published by Greg Snider.


Nanotechnology | 2007

Self-organized computation with unreliable, memristive nanodevices

Greg Snider

Nanodevices have terrible properties for building Boolean logic systems: high defect rates, high variability, high death rates, drift, and (for the most part) only two terminals. Economical assembly requires that they be dynamical. We argue that strategies aimed at mitigating these limitations, such as defect avoidance/reconfiguration, or applying coding theory to circuit design, present severe scalability and reliability challenges. We instead propose to mitigate device shortcomings and exploit their dynamical character by building self-organizing, self-healing networks that implement massively parallel computations. The key idea is to exploit memristive nanodevice behavior to cheaply implement adaptive, recurrent networks, useful for complex pattern recognition problems. Pulse-based communication allows the designer to make trade-offs between power consumption and processing speed. Self-organization sidesteps the scalability issues of characterization, compilation and configuration. Network dynamics supplies a graceful response to device death. We present simulation results of such a network—a self-organized spatial filter array—that demonstrate its performance as a function of defects and device variation.
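As a rough illustration of the pulse-driven adaptation the paper relies on, here is a toy memristive-synapse model: conductance nudged up or down by fixed-amplitude pulses and clipped to the device's physical range. The linear update, step size, and limits are assumptions for illustration, not the paper's device model.

```python
def apply_pulses(g, n_pulses, step=0.05, g_min=0.0, g_max=1.0):
    """Return the conductance after n_pulses are applied.

    Positive pulses potentiate (raise conductance), negative pulses
    depress it; the value saturates at the device limits. This is a
    toy linear model, not the paper's spatial-filter network.
    """
    g = g + n_pulses * step
    return max(g_min, min(g_max, g))
```

For example, ten potentiating pulses drive a half-on device (`g = 0.9`) into saturation at `g_max = 1.0`; the clipping is what gives the adaptation a graceful, bounded response.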


Nanotechnology | 2004

CMOS-like logic in defective, nanoscale crossbars

Greg Snider; Philip J. Kuekes; R. Stanley Williams

We present an approach to building defect-tolerant, nanoscale compute fabrics out of assemblies of defective crossbars of configurable FETs and switches. The simplest structure, the complementary/symmetry array, can implement AND-OR-INVERT functions, which are powerful enough to implement general computation. These arrays can be combined to create logic blocks capable of implementing sum-of-product functions, and still larger computations, such as state machines, can be obtained by adding additional routing blocks. We demonstrate the defect tolerance of such structures through experimental studies of the compilation of a small microprocessor onto a crossbar fabric with varying defect rates and compiler mapping parameters.
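To make the AND-OR-INVERT primitive concrete, here is a minimal functional sketch: the inverted OR of ANDed input subsets. Representing product terms as tuples of input indices is purely illustrative, not the paper's crossbar encoding.

```python
def aoi(input_bits, product_terms):
    """Evaluate an AND-OR-INVERT function: NOT(OR of ANDs).

    Each tuple in product_terms selects inputs to AND together;
    the term results are ORed and the final output is inverted.
    """
    ored = any(all(input_bits[i] for i in term) for term in product_terms)
    return int(not ored)
```

A two-input NAND is the AOI with a single product term, `aoi([a, b], [(0, 1)])`; since NAND is universal, this shows why AOI arrays suffice for general computation.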


Field-Programmable Custom Computing Machines | 1997

Defect tolerance on the Teramac custom computer

W. Culbertson; Rick Amerson; Richard J. Carter; Philip J. Kuekes; Greg Snider

Teramac is a large custom computer which works correctly despite the fact that three quarters of its FPGAs contain defects. This is accomplished through unprecedented use of defect tolerance, which substantially reduces Teramac's cost and permits it to have an unusually complex interconnection network. Teramac tolerates defective resources, like gates and wires, that are introduced during the manufacture of its FPGAs and other components, and during assembly of the system. We have developed methods to precisely locate defects. User designs are mapped onto the system by a completely automated process that avoids the defects and hides the defect tolerance from the user. Defective components are not physically removed from the system.
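The defect-avoiding mapping idea can be sketched, greatly simplified, as a placement loop that skips known-bad resources. The names and data structures here are hypothetical; Teramac's real tools also locate defects and route wires.

```python
def map_design(gates, resources, defects):
    """Toy defect-avoiding mapper: assign each logical gate to the
    first physical resource that is neither defective nor already
    taken. Defective resources stay in place but are never used."""
    placement, used = {}, set(defects)
    for gate in gates:
        for r in resources:
            if r not in used:
                placement[gate] = r
                used.add(r)
                break
        else:
            raise RuntimeError(f"no defect-free resource left for {gate}")
    return placement
```

The key point the sketch captures is that defect tolerance is entirely a mapping-time concern: the user's design never mentions which resources are bad.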


Field-Programmable Gate Arrays | 1995

Teramac: configurable custom computing

Rick Amerson; Richard J. Carter; W. Culbertson; Philip J. Kuekes; Greg Snider

Prototypes are invaluable for studying special purpose parallel architectures and custom computing. We have built a configurable custom computing engine, based on field programmable gate arrays, to enable experiments on an interesting scale. The Teramac configurable hardware system can execute synchronous logic designs of up to one million gates at rates up to one megahertz. Search and retrieval of nontext data from very large databases can be greatly accelerated using special purpose parallel hardware. We are using Teramac to conduct experiments with special purpose processors involving search of nontext databases.


Application-Specific Systems, Architectures, and Processors | 2000

High-level synthesis of nonprogrammable hardware accelerators

Robert Schreiber; Shail Aditya; B. Ramakrishna Rau; Vinod Kathail; Scott A. Mahlke; Santosh G. Abraham; Greg Snider

The PICO-N system automatically synthesizes embedded nonprogrammable accelerators to be used as co-processors for functions expressed as loop nests in C. The output is synthesizable VHDL that defines the accelerator at the register transfer level (RTL). The system generates a synchronous array of customized VLIW (very-long instruction word) processors, their controller, local memory, and interfaces. The system also modifies the user's application software to make use of the generated accelerator. The user indicates the throughput to be achieved by specifying the number of processors and their initiation interval. In experimental comparisons, PICO-N designs are slightly more costly than hand-designed accelerators with the same performance.


Field-Programmable Gate Arrays | 2002

Performance-constrained pipelining of software loops onto reconfigurable hardware

Greg Snider

Retiming and slowdown are algorithms that can be used to pipeline synchronous circuits. Iterative modulo scheduling is an algorithm for software pipelining in the presence of resource constraints. Integrating the best features of both yields a pipelining algorithm, retimed modulo scheduling, that can more effectively exploit the idiosyncrasies of reconfigurable hardware. It also fits naturally into a design space exploration process to trade-off speed for power, energy or area.
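For readers unfamiliar with the modulo-scheduling half of the algorithm, the standard resource-constrained lower bound on the initiation interval (ResMII) is easy to sketch. This is a textbook bound, not the paper's retimed modulo scheduling itself; the resource-class names are illustrative.

```python
import math

def resource_mii(op_counts, unit_counts):
    """ResMII: for each resource class, ceil(#ops needing it / #units
    available), maximized over classes. No modulo schedule can start
    loop iterations more often than once every ResMII cycles."""
    return max(math.ceil(op_counts[r] / unit_counts[r]) for r in op_counts)
```

For example, a loop body with 5 multiplies on 2 multipliers and 3 adds on 2 adders has ResMII = 3: the multipliers, not the adders, bound the achievable pipeline rate, which is exactly the kind of speed/area trade-off a design space exploration navigates.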


Genetic Programming and Evolvable Machines | 2001

A High-Performance, Pipelined, FPGA-Based Genetic Algorithm Machine

Barry Shackleford; Greg Snider; Richard J. Carter; Etsuko Okushi; Mitsuhiro Yasuda; Katsuhiko Seo; Hiroto Yasuura

Accelerating a genetic algorithm (GA) by implementing it in a reconfigurable field programmable gate array (FPGA) is described. The implemented GA features: random parent selection, which conserves selection circuitry; a steady-state memory model, which conserves chip area; survival of fitter child chromosomes over their less-fit parent chromosomes, which promotes evolution. A net child chromosome generation rate of one per clock cycle is obtained by pipelining the parent selection, crossover, mutation, and fitness evaluation functions. Complex fitness functions can be further pipelined to maintain a high-speed clock cycle. Fitness functions with a pipeline initiation interval of greater than one can be plurally implemented to maintain a net evaluated-chromosome throughput of one per clock cycle. Two prototypes are described: The first prototype (c. 1996 technology) is a multiple-FPGA chip implementation, running at a 1 MHz clock rate, that solves a 94-row × 520-column set covering problem 2,200× faster than a 100 MHz workstation running the same algorithm in C. The second prototype (Xilinx XCV300) is a single-FPGA chip implementation, running at a 66 MHz clock rate, that solves a 36-residue protein folding problem in a 2-d lattice 320× faster than a 366 MHz Pentium II. The current largest FPGA (Xilinx XCV3200E) has circuitry available for the implementation of 30 fitness function units which would yield an acceleration of 9,600× for the 36-residue protein folding problem.
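The steady-state loop described above can be sketched in software, one child per call, mirroring one tick of the hardware pipeline. The hypothetical `ga_step` below is a simplified serial model; it ignores the FPGA's pipelined fitness units and plural fitness instances.

```python
import random

def ga_step(population, fitness, rng):
    """One steady-state GA step: random parent selection, single-point
    crossover, one-bit mutation, and survival of the child only over
    its less-fit parent (the fitter parent is never displaced)."""
    i, j = rng.randrange(len(population)), rng.randrange(len(population))
    a, b = population[i], population[j]
    cut = rng.randrange(1, len(a))
    child = a[:cut] + b[cut:]                    # single-point crossover
    flip = rng.randrange(len(child))             # one-bit mutation
    child = child[:flip] + [1 - child[flip]] + child[flip + 1:]
    worst = i if fitness(a) <= fitness(b) else j
    if fitness(child) > fitness(population[worst]):
        population[worst] = child                # child replaces less-fit parent
    return population
```

Because a child can only displace the less fit of its two parents, the best fitness in the population never decreases, which is the "promotes evolution" property the abstract notes.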


Field-Programmable Gate Arrays | 1996

Plasma: An FPGA for Million Gate Systems

Rick Amerson; Richard J. Carter; W. Culbertson; Philip J. Kuekes; Greg Snider; Lyle Albertson

Prototypes are invaluable for studying special purpose parallel architectures and custom computing. This paper describes a new FPGA, called Plasma, the heart of a configurable custom computing engine (Teramac) that can execute synchronous logic designs of up to one million gates at rates up to one megahertz. Plasma FPGAs using 0.8 micron CMOS are packaged in large multichip modules (MCMs). A large custom circuit may be mapped onto the hardware in approximately two hours, without user intervention. Plasma introduces some innovative architecture concepts including hardware support for large multiported register files.


IEEE Computer | 2011

From Synapses to Circuitry: Using Memristive Memory to Explore the Electronic Brain

Greg Snider; Rick Amerson; Dick Carter; Hisham Abdalla; Muhammad Shakeel Qureshi; Jasmin Léveillé; Massimiliano Versace; Heather Ames; Sean Patrick; Benjamin Chandler; Anatoli Gorchetchnikov; Ennio Mingolla

In a synchronous digital platform for building large cognitive models, memristive nanodevices form dense, resistive memories that can be placed close to conventional processing circuitry. Through adaptive transformations, the devices can interact with the world in real time.


Field-Programmable Gate Arrays | 2002

FPGA implementation of neighborhood-of-four cellular automata random number generators

Barry Shackleford; Motoo Tanaka; Richard J. Carter; Greg Snider

Random number generators (RNGs) based upon neighborhood-of-four cellular automata (CA) with asymmetrical, non-local connections are explored. A number of RNGs that pass Marsaglia's rigorous Diehard suite of random number tests have been discovered. A neighborhood size of four allows a single CA cell to be implemented with a four-input lookup table and a one-bit register, which are common building blocks in popular field programmable gate arrays (FPGAs). The investigated networks all had periodic (wrap around) boundary conditions with either 1-d, 2-d, or 3-d interconnection topologies. Trial designs of 64-bit networks using a Xilinx XCV1000-6 FPGA predict a maximum clock rate of 214 MHz to 230 MHz depending upon interconnection topology.
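A software model of one update of such a CA makes the hardware mapping clear: each cell is a 16-entry lookup table (the four-input LUT) applied to four neighbors, with periodic wraparound. The rule and offsets in the example are illustrative choices, not the generators discovered in the paper.

```python
def ca_step(state, rule, offsets):
    """One synchronous update of a 1-d neighborhood-of-four cellular
    automaton with periodic boundaries. `rule` is a 16-entry lookup
    table (one FPGA 4-input LUT); `offsets` are the four, possibly
    non-local and asymmetric, neighbor positions of each cell."""
    n = len(state)
    out = []
    for i in range(n):
        idx = 0
        for off in offsets:
            idx = (idx << 1) | state[(i + off) % n]  # pack 4 neighbor bits
        out.append(rule[idx])
    return out
```

For instance, with the 4-input parity rule `rule = [bin(k).count("1") & 1 for k in range(16)]` and offsets `(-1, 0, 1, 2)`, the seed `[1, 0, 0, 0]` maps to `[1, 1, 1, 1]` in one step; in hardware each `out` bit would be latched in a cell's one-bit register.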

