
Publication


Featured research published by Richard J. Carter.


field programmable custom computing machines | 1997

Defect tolerance on the Teramac custom computer

W. Culbertson; Rick Amerson; Richard J. Carter; Philip J. Kuekes; Greg Snider

Teramac is a large custom computer which works correctly despite the fact that three quarters of its FPGAs contain defects. This is accomplished through unprecedented use of defect tolerance, which substantially reduces Teramac's cost and permits it to have an unusually complex interconnection network. Teramac tolerates defective resources, like gates and wires, that are introduced during the manufacture of its FPGAs and other components, and during assembly of the system. We have developed methods to precisely locate defects. User designs are mapped onto the system by a completely automated process that avoids the defects and hides the defect tolerance from the user. Defective components are not physically removed from the system.


field programmable gate arrays | 1995

Teramac-configurable custom computing

Rick Amerson; Richard J. Carter; W. Culbertson; Philip J. Kuekes; Greg Snider

Prototypes are invaluable for studying special purpose parallel architectures and custom computing. We have built a configurable custom computing engine, based on field programmable gate arrays, to enable experiments on an interesting scale. The Teramac configurable hardware system can execute synchronous logic designs of up to one million gates at rates up to one megahertz. Search and retrieval of nontext data from very large databases can be greatly accelerated using special purpose parallel hardware. We are using Teramac to conduct experiments with special purpose processors involving search of nontext databases.


Genetic Programming and Evolvable Machines | 2001

A High-Performance, Pipelined, FPGA-Based Genetic Algorithm Machine

Barry Shackleford; Greg Snider; Richard J. Carter; Etsuko Okushi; Mitsuhiro Yasuda; Katsuhiko Seo; Hiroto Yasuura

Accelerating a genetic algorithm (GA) by implementing it in a reconfigurable field programmable gate array (FPGA) is described. The implemented GA features: random parent selection, which conserves selection circuitry; a steady-state memory model, which conserves chip area; survival of fitter child chromosomes over their less-fit parent chromosomes, which promotes evolution. A net child chromosome generation rate of one per clock cycle is obtained by pipelining the parent selection, crossover, mutation, and fitness evaluation functions. Complex fitness functions can be further pipelined to maintain a high-speed clock cycle. Fitness functions with a pipeline initiation interval of greater than one can be plurally implemented to maintain a net evaluated-chromosome throughput of one per clock cycle. Two prototypes are described: The first prototype (c. 1996 technology) is a multiple-FPGA chip implementation, running at a 1 MHz clock rate, that solves a 94-row × 520-column set covering problem 2,200× faster than a 100 MHz workstation running the same algorithm in C. The second prototype (Xilinx XCV300) is a single-FPGA chip implementation, running at a 66 MHz clock rate, that solves a 36-residue protein folding problem in a 2-d lattice 320× faster than a 366 MHz Pentium II. The current largest FPGA (Xilinx XCV3200E) has circuitry available for the implementation of 30 fitness function units which would yield an acceleration of 9,600× for the 36-residue protein folding problem.
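
The selection and replacement policy named in this abstract (random parent selection, a steady-state population, and a child surviving only if it is fitter than its less-fit parent) can be illustrated in software. The following is a minimal Python sketch under stated assumptions, not the authors' hardware design: the single-point crossover, the per-bit mutation probability, the OneMax fitness function, and all function and parameter names are hypothetical choices made for illustration, and the loop is sequential rather than pipelined.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[int]


def steady_state_ga(
    fitness: Callable[[Chromosome], int],
    n_bits: int,
    pop_size: int,
    steps: int,
    p_mut: float,
    rng: random.Random,
) -> Tuple[List[Chromosome], List[int]]:
    """Steady-state GA: one child per step, population size stays fixed."""
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    fit = [fitness(c) for c in pop]
    for _ in range(steps):
        # Random parent selection: no ranking or tournament logic needed.
        a = rng.randrange(pop_size)
        b = rng.randrange(pop_size)
        # Single-point crossover (assumed operator, not from the abstract).
        cut = rng.randrange(1, n_bits)
        child = pop[a][:cut] + pop[b][cut:]
        # Per-bit mutation with probability p_mut (assumed operator).
        child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
        f_child = fitness(child)
        # The child overwrites the less-fit parent only if it is fitter.
        worse = a if fit[a] <= fit[b] else b
        if f_child > fit[worse]:
            pop[worse] = child
            fit[worse] = f_child
    return pop, fit


def one_max(c: Chromosome) -> int:
    """Stand-in fitness function: count of 1 bits (hypothetical)."""
    return sum(c)


if __name__ == "__main__":
    pop, fit = steady_state_ga(one_max, 32, 64, 5000, 0.02, random.Random(1))
    print(max(fit))
```

Because a child is written back in place of a parent, no second population buffer is needed, which matches the abstract's point that the steady-state model conserves memory. The 9,600× projection quoted in the abstract is numerically consistent with 30 fitness units times the measured 320× speedup.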


field programmable gate arrays | 2002

FPGA implementation of neighborhood-of-four cellular automata random number generators

Barry Shackleford; Motoo Tanaka; Richard J. Carter; Greg Snider

Random number generators (RNGs) based upon neighborhood-of-four cellular automata (CA) with asymmetrical, non-local connections are explored. A number of RNGs that pass Marsaglia's rigorous Diehard suite of random number tests have been discovered. A neighborhood size of four allows a single CA cell to be implemented with a four-input lookup table and a one-bit register which are common building blocks in popular field programmable gate arrays (FPGAs). The investigated networks all had periodic (wrap around) boundary conditions with either 1-d, 2-d, or 3-d interconnection topologies. Trial designs of 64-bit networks using a Xilinx XCV1000-6 FPGA predict a maximum clock rate of 214 MHz to 230 MHz depending upon interconnection topology.
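
The cell structure described in this abstract (a four-input lookup table feeding a one-bit register, with periodic boundaries) maps directly onto a short software model. The sketch below assumes a 1-d topology and a uniform 16-bit truth table shared by every cell. The neighbor offsets and the rule value in the example are hypothetical placeholders chosen for illustration; they are not taken from the paper and are not claimed to pass Diehard.

```python
from typing import List, Sequence


def ca_step(state: List[int], rule: int, offsets: Sequence[int]) -> List[int]:
    """Advance a 1-d neighborhood-of-four CA by one clock.

    state   : list of 0/1 cell values (the one-bit registers)
    rule    : 16-bit truth table, i.e. the contents of a 4-input LUT
    offsets : four neighbor positions relative to each cell
    """
    n = len(state)
    nxt = []
    for i in range(n):
        idx = 0
        for off in offsets:
            # Periodic (wrap-around) boundary via modulo indexing.
            idx = (idx << 1) | state[(i + off) % n]
        # Look up the next cell value in the shared truth table.
        nxt.append((rule >> idx) & 1)
    return nxt


def ca_rng(seed: int, rule: int, offsets: Sequence[int],
           width: int = 64, count: int = 4) -> List[int]:
    """Return `count` successive CA states packed into integers."""
    state = [(seed >> i) & 1 for i in range(width)]
    words = []
    for _ in range(count):
        state = ca_step(state, rule, offsets)
        words.append(sum(bit << i for i, bit in enumerate(state)))
    return words


if __name__ == "__main__":
    # Hypothetical asymmetric, non-local offsets and rule value.
    for word in ca_rng(seed=0x1234_5678_9ABC_DEF0, rule=0x5A96,
                       offsets=(-7, 0, 11, 17)):
        print(f"{word:016x}")
```

With a uniform truth table there are 2^16 = 65,536 candidate rules per topology, so an exhaustive sweep over `rule` is feasible in this kind of model before any statistical testing.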


Nanotechnology | 2011

Lognormal switching times for titanium dioxide bipolar memristors: origin and resolution

Gilberto Medeiros-Ribeiro; Frederick A. Perner; Richard J. Carter; Hisham Abdalla; Matthew D. Pickett; R. Stanley Williams

We measured the switching time statistics for a TiO₂ memristor and found that they followed a lognormal distribution, which is a potentially serious problem for computer memory and data storage applications. We examined the underlying physical phenomena that determine the switching statistics and proposed a simple analytical model for the distribution based on the drift/diffusion equation and previously measured nonlinear drift behavior. We designed a closed-loop switching protocol that dramatically narrows the time distribution, which can significantly improve memory circuit performance and reliability.


conference on high performance computing (supercomputing) | 2007

High-performance ethernet-based communications for future multi-core processors

Michael S. Schlansker; Nagabhushan Chitlur; Erwin Oertli; Paul M. Stillwell; Linda J. Rankin; Dennis R. Bradford; Richard J. Carter; Jayaram Mudigonda; Nathan L. Binkert; Norman P. Jouppi

Data centers and HPC clusters often incorporate specialized networking fabrics to satisfy system requirements. However, Ethernet's low cost and high performance are causing a shift from specialized fabrics toward standard Ethernet. Although Ethernet's low-level performance approaches that of specialized fabrics, the features that these fabrics provide such as reliable in-order delivery and flow control are implemented, in the case of Ethernet, by endpoint hardware and software. Unfortunately, current Ethernet endpoints are either slow (commodity NICs with generic TCP/IP stacks) or costly (offload engines). To address these issues, the JNIC project developed a novel Ethernet endpoint. JNIC's hardware and software were specifically designed for the requirements of high-performance communications within future data-centers and compute clusters. The architecture combines capabilities already seen in advanced network architectures with new innovations to create a comprehensive solution for scalable and high-performance Ethernet. We envision a JNIC architecture that is suitable for most in-data-center communication needs.


nasa dod conference on evolvable hardware | 2002

High-performance cellular automata random number generators for embedded probabilistic computing systems

Barry Shackleford; Motoo Tanaka; Richard J. Carter; Greg Snider

High-performance random number generators (RNGs) can be economically implemented in popular field programmable gate arrays without the need for arithmetic circuitry by employing cellular automata (CA) with a neighborhood size of four and an asymmetrical, non-local neighborhood connection scheme. Each cell (i.e., RNG bit) requires only a single 4-input lookup table and a single flip-flop. From each of various 1-d, 2-d, and 3-d networks with periodic boundary conditions, the 1000 highest entropy CA RNGs were selected from the set of 65,536 possible uniform (all CA truth tables the same) implementations. Each set of 1000 high-entropy CA was then submitted to Marsaglia's DIEHARD suite of random number tests. A number of 64-bit, neighbor-of-four CA-based RNGs have been discovered that pass all tests in DIEHARD without resorting to either site spacing or time spacing to improve the RNG quality.


field programmable logic and applications | 1995

The Teramac Configurable Computer Engine

Greg Snider; Philip J. Kuekes; W. Bruce Culbertson; Richard J. Carter; Arnold S. Berger; Rick Amerson

The difficulty in creating a configurable machine lies in providing enough wires that placement and routing can be done with no human intervention. Several researchers have previously used tens of FPGAs to create configurable custom machines [8–11]; Teramac allows experiments using many hundreds of FPGAs by providing a routing-rich environment for implementing user designs by using custom FPGAs, MCMs and PC boards.


Computers & Graphics | 1997

Implementations of Cube-4 on the Teramac custom computing machine

Urs Kanus; Michael Meißner; Wolfgang Straßer; Hanspeter Pfister; Arie E. Kaufman; Rick Amerson; Richard J. Carter; W. Bruce Culbertson; Philip J. Kuekes; Greg Snider

We present two implementations of the Cube-4 volume rendering architecture, developed at SUNY Stony Brook, on the Teramac custom computing machine. Cube-4 uses a slice-parallel ray-casting algorithm that allows for a parallel and pipelined implementation of ray-casting. Tri-linear interpolation, surface normal estimation from interpolated samples, shading, classification, and compositing are part of the rendering pipeline. Using the partitioning schemes introduced in this paper, Cube-4 is capable of rendering in real-time large datasets (e.g., 1024³) with a limited number of rendering pipelines. Teramac is a hardware simulator developed at Hewlett-Packard Research Laboratories. Teramac belongs to the new class of custom computing machines, which combine the speed of special-purpose hardware with the flexibility of general-purpose computers. Using Teramac as a development tool, we implemented two working Cube-4 prototypes capable of rendering 128³ datasets in 0.65 s at a very low 0.96 MHz processing frequency. The results from these implementations indicate scalable performance with the number of rendering pipelines and real-time frame-rates for high-resolution datasets.


custom integrated circuits conference | 1996

An FPGA for multi-chip reconfigurable logic

Rick Amerson; Richard J. Carter; W. Culbertson; Philip J. Kuekes; Greg Snider; Lyle Albertson

The Plasma chip, designed specifically to address issues important to custom computing machines (CCM), completes a 100% fully automatic place and route in approximately three seconds. Plasma FPGAs using 0.8 micron CMOS are packaged in large multichip modules (MCMs). Plasma introduces some innovative architecture concepts including hardware support for large multiported register files.
