Zachary K. Baker
Los Alamos National Laboratory
Publication
Featured research published by Zachary K. Baker.
field-programmable custom computing machines | 2007
Zachary K. Baker; Maya Gokhale; Justin L. Tripp
The matched filter is an important kernel in the processing of hyperspectral data. The filter enables researchers to sift useful data from instruments that span large frequency bands and can produce gigabytes of data in seconds. In this work, we evaluate the performance of a matched filter implementation on an FPGA-accelerated co-processor (Cray XD-1), the IBM Cell microprocessor, and the NVIDIA GeForce 7900 GTX GPU. We provide extensive discussion of the challenges and opportunities afforded by each platform. In particular, we explore how to partition the filter most efficiently between the host CPU and the co-processor. Using our results, we derive several performance metrics that identify the optimal solution for a variety of application scenarios.
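The abstract does not give the filter's formulation. As an illustration only, here is a minimal sketch of the textbook spectral matched filter, assuming the common form w = Σ⁻¹(t − μ) / ((t − μ)ᵀ Σ⁻¹ (t − μ)), where μ and Σ are the background mean and covariance and t is the target spectrum. The names `matched_filter_weights` and `score` are hypothetical. One plausible host/co-processor split (an assumption, not the paper's reported partition) is to compute μ, Σ and the solve once on the host and stream the cheap per-pixel dot product through the accelerator.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean(pixels: Sequence[Sequence[float]]) -> Vector:
    """Per-band mean over all background pixels."""
    n, bands = len(pixels), len(pixels[0])
    return [sum(p[b] for p in pixels) / n for b in range(bands)]


def covariance(pixels: Sequence[Sequence[float]], mu: Vector) -> Matrix:
    """Sample covariance matrix (bands x bands)."""
    n, bands = len(pixels), len(mu)
    cov = [[0.0] * bands for _ in range(bands)]
    for p in pixels:
        d = [p[b] - mu[b] for b in range(bands)]
        for i in range(bands):
            for j in range(bands):
                cov[i][j] += d[i] * d[j] / (n - 1)
    return cov


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def matched_filter_weights(pixels: Sequence[Sequence[float]],
                           target: Vector) -> Tuple[Vector, Vector]:
    """Host-side setup: background mean and normalized filter weights."""
    mu = mean(pixels)
    d = [t - u for t, u in zip(target, mu)]
    q = solve(covariance(pixels, mu), d)        # q = inv(cov) @ (t - mu)
    norm = sum(di * qi for di, qi in zip(d, q))  # (t - mu)^T inv(cov) (t - mu)
    return mu, [qi / norm for qi in q]


def score(pixel: Sequence[float], mu: Vector, w: Vector) -> float:
    """Per-pixel work: one dot product; equals 1.0 for the target spectrum."""
    return sum((x - u) * wi for x, u, wi in zip(pixel, mu, w))
```

Because the weights are normalized, a pixel identical to the target scores exactly 1 and the background mean scores 0, which makes thresholding the per-pixel output straightforward.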
radiation effects data workshop | 2012
Heather Quinn; Paul S. Graham; Keith Morgan; Zachary K. Baker; Michael P. Caffrey; David A. Smith; Randy Bell
This paper provides information regarding the use of the Xilinx Virtex-4 field-programmable gate array in a spacecraft deployed to low-earth orbit. The results are compared to pre-deployment accelerated and fault-injection testing.
IEEE Transactions on Nuclear Science | 2013
Heather Quinn; Paul S. Graham; Keith Morgan; Zachary K. Baker; Michael P. Caffrey; David A. Smith; Mike Wirthlin; Randy Bell
This paper provides information regarding the use of the Xilinx Virtex-4 field-programmable gate array (FPGA) in a spacecraft deployed to low-earth orbit. The results are compared to pre-deployment accelerated single-event effects (SEEs) and fault-injection testing.
field-programmable custom computing machines | 2007
Zachary K. Baker; Maya Gokhale
Shortest-path algorithms are key elements of many graph problems. They are used in applications such as online direction finding and navigation, and traffic modeling for large-scale simulations of major metropolitan areas. Because shortest-path algorithms are execution bottlenecks, it is beneficial to move their execution to parallel hardware such as field-programmable gate arrays (FPGAs). One of the innovations of this approach is the use of a small bubble sort core to implement the extract-min function. While bubble sort is not usually considered appropriate for any non-trivial use, it suits this case because it can produce a single minimum from the list in O(n) cycles, where n is the number of elements in the vertex list. The cost of this min operation does not affect the running time of the architecture, because the queue depth for fetching the next set of edges from memory is roughly equal to the number of cores in the system. Additionally, this work provides a collection of simulation results that model the behavior of the node queue in hardware. The results show that a hardware queue implementing a small bubble-type minimum function needs only on the order of 16 elements to provide both correct and optimal paths. With support for a large DRAM graph store with SRAM-based caching on a Cray XD-1 FPGA-accelerated system, the design achieves a speedup of roughly 50x over the CPU-based implementation.
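The bubble-type extract-min idea can be modeled in software. This is a minimal sketch, not the authors' hardware design: a single backward bubble pass moves the smallest entry to the front in n − 1 compare-and-swap steps (O(n), no full sort), and that pass drives an otherwise ordinary Dijkstra loop. The unbounded Python list, the lazy skipping of stale entries, and the function names `bubble_extract_min` and `shortest_paths` are assumptions for illustration; the paper's hardware queue is fixed-size (on the order of 16 entries per the abstract).

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def bubble_extract_min(queue: List[Tuple[float, str]]) -> Tuple[float, str]:
    """One bubble pass from the back of the list toward the front.

    After len(queue) - 1 compare-and-swap steps the smallest (distance, node)
    pair sits at index 0, so extracting one minimum costs O(n) comparisons.
    """
    for i in range(len(queue) - 1, 0, -1):
        if queue[i] < queue[i - 1]:
            queue[i], queue[i - 1] = queue[i - 1], queue[i]
    return queue.pop(0)


def shortest_paths(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm using the bubble-pass extract-min above."""
    dist = {source: 0.0}
    queue: List[Tuple[float, str]] = [(0.0, source)]
    settled = set()
    while queue:
        d, node = bubble_extract_min(queue)
        if node in settled:
            continue  # stale entry: a shorter path was already settled
        settled.add(node)
        for nbr, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                queue.append((nd, nbr))
    return dist
```

For example, with edges A→B (4), A→C (1), C→B (2), C→D (5) and B→D (1), `shortest_paths(graph, "A")` settles C at 1, B at 3 and D at 4.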
reconfigurable computing and fpgas | 2010
Zachary K. Baker; Mark E. Dunham; Keith Morgan; Michael Pigue; M. Stettler; Paul S. Graham; Eric N. Schmierer; J. Power
Los Alamos has recently completed the latest in a series of Reconfigurable Software Radios, which incorporates several key innovations in both hardware design and algorithms. Because of our focus on satellite applications, each design must extract the best size, weight, and power performance possible from the ensemble of Commodity Off-the-Shelf (COTS) parts available at the time of design. A large component of our work lies in determining whether a given part will survive in space and how it will fail under various space radiation conditions. Using two Xilinx Virtex-4 FPGAs, we have achieved 1 TeraOps/second signal processing on a 1920 Megabit/second datastream. This processing capability enables very advanced algorithms, such as our wideband RF compression scheme, to operate at the source, allowing bandwidth-constrained applications to deliver previously unattainable performance. This paper discusses the design of the payload, making electronics survivable in the radiation of space, and techniques for debugging.
IEEE Transactions on Nuclear Science | 2015
Heather Quinn; Zachary K. Baker; Tom Fairbanks; Justin L. Tripp; George Duran
Commercially available microprocessors could be useful to the space community for noncritical computations. There are many possible components that are smaller, lower-power, and less expensive than traditional radiation-hardened microprocessors. Many commercial microprocessors have issues with single-event effects (SEEs), such as single-event upsets (SEUs) and single-event transients (SETs), that can cause the microprocessor to calculate an incorrect result or crash. In this paper we present the Trikaya technique for masking SEUs and SETs through software mitigation techniques. Test results show that this technique can be very effective at masking errors, making it possible to fly these microprocessors for a variety of missions.
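The abstract does not describe how Trikaya works, so the following does not reproduce it. As a generic illustration of software SEU/SET masking, here is a minimal sketch of software triple modular redundancy (TMR), a different, well-known technique: the same computation runs three times and a majority vote masks a single corrupted result. The names `tmr_execute` and `UncorrectableError` are hypothetical.

```python
from collections import Counter
from typing import Callable, TypeVar

T = TypeVar("T")


class UncorrectableError(RuntimeError):
    """Raised when no two of the three redundant results agree."""


def tmr_execute(compute: Callable[[], T]) -> T:
    """Generic software TMR (not Trikaya): run `compute` three times and vote.

    A single corrupted run (e.g. a bit flipped by an SEU) is outvoted by the
    two agreeing runs. If all three disagree, the error is detected but cannot
    be masked. Results must be hashable so Counter can tally them.
    """
    results = [compute() for _ in range(3)]
    value, votes = Counter(results).most_common(1)[0]
    if votes >= 2:
        return value
    raise UncorrectableError(f"no majority among {results}")
```

Voting on results only masks faults that corrupt one of the three runs independently; faults that crash the processor or corrupt shared state would need additional measures, which is part of what a full mitigation scheme has to address.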
field-programmable custom computing machines | 2009
Zachary K. Baker; Joshua S. Monson
Often the behavior of a circuit changes unpredictably when a chassis cover is attached or when the system is not easily accessible. For instance, in a deployed environment such as space, hardware can malfunction in unpredictable ways. What can a designer do to ascertain the cause of the problem? Register interrogations only go so far: sometimes the problem being debugged lies in the register transactions themselves, or in the FPGA programming. This work provides a solution, namely the ability to drive a JTAG chain from an on-board microcontroller and to support a read/write register interface running a logic analyzer core. This is achieved without a JTAG cable or any external interface. We have demonstrated the functionality of the prototype system using a Xilinx Spartan-3E FPGA and a Microchip PIC18F2550 microcontroller. This paper discusses the implementation details and presents case studies describing how the tools have aided satellite hardware development at Los Alamos National Laboratory.
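Driving a JTAG chain from a microcontroller means sequencing the TMS line through the standard IEEE 1149.1 TAP controller states. The sketch below models only that state machine in Python; it does not toggle real pins, and the helper names `walk` and `dr_scan_tms` are hypothetical. On the actual hardware, firmware would presumably set TMS/TDI and pulse TCK once per list entry.

```python
from typing import Dict, List, Tuple

# IEEE 1149.1 TAP controller: state -> (next state on TMS=0, next state on TMS=1)
TAP_NEXT: Dict[str, Tuple[str, str]] = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}

RESET = [1] * 5  # five TCK cycles with TMS=1 reach Test-Logic-Reset from any state


def walk(state: str, tms_bits: List[int]) -> Tuple[str, int]:
    """Clock the TAP once per TMS bit.

    Returns the final state and the number of clocks spent in Shift-DR
    (each such clock shifts one bit from TDI in and one bit out on TDO).
    """
    shifted = 0
    for tms in tms_bits:
        if state == "Shift-DR":
            shifted += 1
        state = TAP_NEXT[state][tms]
    return state, shifted


def dr_scan_tms(n_bits: int) -> List[int]:
    """TMS sequence for an n-bit data-register scan from and back to Run-Test/Idle."""
    enter = [1, 0, 0]                   # RTI -> Select-DR-Scan -> Capture-DR -> Shift-DR
    shift = [0] * (n_bits - 1) + [1]    # last data bit leaves Shift-DR for Exit1-DR
    leave = [1, 0]                      # Exit1-DR -> Update-DR -> Run-Test/Idle
    return enter + shift + leave
```

An 8-bit register access, for example, takes `len(dr_scan_tms(8)) == 13` TCK cycles from Run-Test/Idle and back.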
international parallel and distributed processing symposium | 2008
Zachary K. Baker; Reid B. Porter
Vector and data-flow processors are particularly strong at dense, regular computation. Sparse, irregular data layouts cause problems because their unpredictable data access patterns prevent computational pipelines from filling effectively. A number of proposed image processing algorithms are not dense and instead apply local neighborhood operations to a sparse, irregular set of points. Sparse and irregular data transfer is difficult for modern processors because they have more processing power than memory bandwidth. However, if the computation can be expanded without increasing the bandwidth, modern processors can be used more efficiently. The application targeted in this paper is patch matching over large scenes. Given two sequential frames of video data, corresponding points between the two frames are found. Correspondences are determined by comparing small image patches around each point. By rotating and comparing image patches over a range of angles, it is possible to match them more accurately through the scene. Rotation and interpolation are required to produce an appropriate image to compare against. Results for CPU, FPGA, and GPU are presented, with the FPGA far outperforming the GPU and CPU because of its potential for high levels of hardware parallelism as the total volume of computation increases.
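The rotate-interpolate-compare loop can be sketched as follows. The abstract does not name the interpolation or the comparison metric, so bilinear interpolation and the sum of squared differences (SSD) are assumptions here, as are the function names `rotate`, `ssd` and `best_angle`. Each angle and each output pixel is computed independently, which presumably is the parallelism that the abstract credits to the FPGA.

```python
import math
from typing import List, Sequence

Patch = List[List[float]]


def sample(img: Sequence[Sequence[float]], x: float, y: float) -> float:
    """Bilinear interpolation at (x, y); pixels outside the patch read as 0."""
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    h, w = len(img), len(img[0])

    def px(cx: int, cy: int) -> float:
        return img[cy][cx] if 0 <= cx < w and 0 <= cy < h else 0.0

    top = px(x0, y0) * (1 - fx) + px(x0 + 1, y0) * fx
    bottom = px(x0, y0 + 1) * (1 - fx) + px(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy


def rotate(img: Sequence[Sequence[float]], degrees: float) -> Patch:
    """Rotate a square patch about its center by inverse-mapping output pixels."""
    n = len(img)
    c = (n - 1) / 2
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = []
    for y in range(n):
        row = []
        for x in range(n):
            dx, dy = x - c, y - c
            # Inverse rotation: find where this output pixel comes from in the source.
            sx = c + cos_t * dx + sin_t * dy
            sy = c - sin_t * dx + cos_t * dy
            row.append(sample(img, sx, sy))
        out.append(row)
    return out


def ssd(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Sum of squared differences between two equally sized patches."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def best_angle(patch: Patch, target: Patch, angles: Sequence[float]) -> float:
    """Angle whose rotated patch best matches the target patch."""
    return min(angles, key=lambda a: ssd(rotate(patch, a), target))
```

For instance, if `target = rotate(patch, 90)` for an asymmetric 5×5 patch, then `best_angle(patch, target, range(0, 360, 30))` returns 90.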
dependable systems and networks | 2012
Ashwin A. Mendon; Ron Sass; Zachary K. Baker; Justin L. Tripp
A fast hardware-based checkpoint-restart mechanism is proposed in this paper. A circuit was developed and implemented on an FPGA as a proof of concept. Further, the size and performance of this circuit were analyzed by instrumenting the cores and taking measurements with a commercial SATA2 solid-state drive. The same tests were run on a modern Linux server with a conventional PCIe SATA2 host bus adapter. The results suggest that the circuit would occupy a tiny fraction (less than 2%) of a modern CMOS chip while providing a significant performance advantage over a software-only solution.
reconfigurable computing and fpgas | 2009
Mark E. Dunham; Zachary K. Baker; M. Stettler; Michael Pigue; Paul S. Graham; Eric N. Schmierer; J. Power
Los Alamos has recently completed the latest in a series of Reconfigurable Software Radios, which incorporates several key innovations in both hardware design and algorithms. Because of our focus on satellite applications, each design must extract the best size, weight, and power performance possible from the ensemble of Commodity Off-the-Shelf (COTS) parts available at the time of design. In this case we have achieved 1 TeraOps/second signal processing on a 1920 Megabit/second datastream while using only 53 W of mains power, 5.5 kg of mass, and 3 liters of volume. This processing capability enables very advanced algorithms, such as our wideband RF compression scheme, to operate remotely, allowing network-bandwidth-constrained applications to deliver previously unattainable performance.
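As a rough figure of merit derived only from the numbers quoted above, and not one reported by the authors, the processing rate can be normalized by the input data rate and by the power budget. This assumes decimal prefixes (1 TeraOps = 10^12 operations, 1 Megabit = 10^6 bits) and that the full 1 TeraOps/second is sustained within the 53 W figure:

```latex
\frac{10^{12}\ \text{ops/s}}{1.92 \times 10^{9}\ \text{bit/s}} \approx 521\ \text{ops per input bit},
\qquad
\frac{10^{12}\ \text{ops/s}}{53\ \text{W}} \approx 1.89 \times 10^{10}\ \text{ops/J} \approx 18.9\ \text{GOps/W}
```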