Publication


Featured research published by John R. Humphrey.


IEEE Antennas and Wireless Propagation Letters | 2003

Hardware implementation of a three-dimensional finite-difference time-domain algorithm

James P. Durbano; Fernando E. Ortiz; John R. Humphrey; Mark S. Mirotznik; Dennis W. Prather

In order to take advantage of the significant benefits afforded by computational electromagnetic techniques, such as the finite-difference time-domain (FDTD) method, solvers capable of analyzing realistic problems in a reasonable time frame are required. Although software-based solvers are frequently used, they are often too slow to be of practical use. To speed up computations, hardware-based implementations of the FDTD method have recently been proposed. Although these designs are functionally correct, to date, they have not provided a practical and scalable solution. To this end, we have developed an architecture that not only overcomes the limitations of previous accelerators, but also represents the first three-dimensional FDTD accelerator implemented in physical hardware. We present a high-level view of the system architecture and describe the basic functionality of each module involved in the computational flow. We then present our implementation results and compare them with current PC-based FDTD solutions. These results indicate that hardware solutions will, in the near future, surpass existing PC throughputs, and will ultimately rival the performance of PC clusters.


Proceedings of SPIE, the International Society for Optical Engineering | 2007

GPU-based accelerated 2D and 3D FDTD solvers

Daniel K. Price; John R. Humphrey; Eric J. Kelmelis

Our group has employed modern graphics processing units (GPUs) to accelerate finite-difference-based computational electromagnetics (CEM) codes. In particular, we accelerated the well-known Finite-Difference Time-Domain (FDTD) method, which is commonly used for the analysis of electromagnetic phenomena. This algorithm uses difference-based approximations of Maxwell's equations to simulate the propagation of electromagnetic fields through space and materials. The method is very general and is applicable to a wide array of problems, but runtimes can be very long, so acceleration is highly desired. In this paper we present GPU-based accelerated solvers for the FDTD method in both its 2D and 3D embodiments.
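
At its core, each FDTD time step applies a small stencil update at every cell of the grid, which maps naturally onto one GPU thread per cell. Below is a minimal CUDA sketch of the Ez update for the 2D (TMz) case, assuming row-major nx-by-ny field arrays and precomputed coefficients ce_x = dt/(eps*dx) and ce_y = dt/(eps*dy); the kernel name, array layout, and coefficients are illustrative assumptions, not the solver described in the paper.

    // Minimal 2D TMz FDTD Ez update (one time step), illustrative only.
    __global__ void update_ez(float* ez, const float* hx, const float* hy,
                              int nx, int ny, float ce_x, float ce_y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;    // grid index along x
        int j = blockIdx.y * blockDim.y + threadIdx.y;    // grid index along y
        if (i < 1 || i >= nx || j < 1 || j >= ny) return; // interior cells only
        int idx = j * nx + i;
        // Ez is advanced by the discrete curl of H at this cell.
        ez[idx] += ce_x * (hy[idx] - hy[idx - 1])         // dHy/dx
                 - ce_y * (hx[idx] - hx[idx - nx]);       // dHx/dy
    }

Companion kernels advance Hx and Hy in the same manner; one launch of each per time step forms the solver's inner loop.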


Field-Programmable Custom Computing Machines | 2006

Floating-Point Accumulation Circuit for Matrix Applications

Michael R. Bodnar; John R. Humphrey; Petersen F. Curt; Dennis W. Prather

Many scientific algorithms require floating-point reduction operations, or accumulations, including matrix-vector multiply (MVM), vector dot products, and the discrete cosine transform (DCT). Because FPGA implementations of each of these algorithms are desirable, it is clear that a high-performance, floating-point accumulation unit is necessary. However, this type of circuit is difficult to design in an FPGA environment due to the deep pipelining of the floating-point arithmetic units, which is needed in order to attain high-performance designs (Durbano et al., 2004; Leeser and Wang, 2004). A deep pipeline requires special handling in feedback circuits because of the long delay, which is further complicated by a continuous input data stream. Accumulator architectures that overcome such performance bottlenecks are described in Zhuo et al. (2005) and Zhuo and Prasanna (2005). This paper presents a floating-point accumulation circuit that is a natural evolution of this work. The system can handle streams of arbitrary length, requires modest area, and can handle interrupted data inputs. In contrast to the designs proposed by Zhuo et al., the proposed architecture maintains buffers for partial-result storage that use significantly less embedded memory, while maintaining fixed size and speed characteristics regardless of stream length. Both the single- and double-precision accumulation architectures were verified in a Virtex-II 8000-4 part clocked at more than 150 MHz, and the power of this design was demonstrated in a computationally intense matrix-matrix-multiply application.
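
The pipelining problem described above has a simple software analogy: a single running sum cannot accept a new operand every cycle from a deeply pipelined adder, because each addition waits on the previous result, so one common remedy is to keep several independent partial sums and reduce them once the stream ends. The sketch below illustrates only that idea; the DEPTH constant and function name are assumptions, not the circuit presented in the paper.

    // Latency-hiding accumulation analogy: DEPTH independent partial sums,
    // reduced after the input stream is exhausted. Host-side C++, illustrative only.
    constexpr int DEPTH = 8;   // assumed adder pipeline depth

    float accumulate(const float* x, int n)
    {
        float partial[DEPTH] = {0.0f};
        for (int i = 0; i < n; ++i)
            partial[i % DEPTH] += x[i];   // independent chains, so one operand can issue per cycle
        float total = 0.0f;
        for (int k = 0; k < DEPTH; ++k)   // final reduction of the DEPTH partials
            total += partial[k];
        return total;
    }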


IEEE Antennas and Propagation Society International Symposium | 2004

Hardware acceleration of the 3D finite-difference time-domain method

James P. Durbano; John R. Humphrey; Fernando E. Ortiz; Petersen F. Curt; Dennis W. Prather; Mark S. Mirotznik

Although the importance of fast, accurate computational electromagnetic (CEM) solvers is readily apparent, how to construct them is not. By nature, CEM algorithms are both computationally and memory intensive. Furthermore, the serial nature of most software-based implementations does not take advantage of the inherent parallelism found in many CEM algorithms. In an attempt to exploit parallelism, supercomputers and computer clusters are employed. However, these solutions can be prohibitively expensive and frequently impractical. Thus, a CEM accelerator or CEM co-processor would provide the community with much-needed processing power. This would enable iterative designs and designs that would otherwise be impractical to analyze. To this end, we are developing a full-3D, hardware-based accelerator for the finite-difference time-domain (FDTD) method (K.S. Yee, IEEE Trans. Antennas and Propag., vol. 14, pp. 302-307, 1966). This accelerator provides speedups of up to three orders of magnitude over single-PC solutions and will surpass the throughputs of PC clusters. In this paper, we briefly summarize previous work in this area, where it has fallen short, and how our work fills the void. We then describe the current status of this project, summarizing our achievements to date and the work that remains. We conclude with the projected results of our accelerator.


Proceedings of SPIE | 2006

Modeling and simulation of nanoscale devices with a desktop supercomputer

Eric J. Kelmelis; James P. Durbano; John R. Humphrey; Fernando E. Ortiz; Petersen F. Curt

Designing nanoscale devices presents a number of unique challenges. As device features shrink, the computational demands of the simulations necessary to accurately model them increase significantly. This is a result of not only the increasing level of detail in the device design itself, but also the need to use more accurate models. The approximations that are generally made when dealing with larger devices break down as feature sizes decrease. This can be seen in the optics field when contrasting the complexity of physical optics models with those requiring a rigorous solution to Maxwell's equations. This added complexity leads to more demanding calculations, stressing computational resources and driving research to overcome these limitations. There are traditionally two means of improving simulation times as model complexity grows beyond available computational resources: modifying the underlying algorithms to maintain sufficient precision while reducing overall computations, and increasing the power of the computational system. In this paper, we explore the latter. Recent advances in commodity hardware technologies, particularly field-programmable gate arrays (FPGAs) and graphics processing units (GPUs), have allowed the creation of desktop-style devices capable of outperforming PC clusters. We will describe the key hardware technologies required to build such a device and then discuss their application to the modeling and simulation of nanophotonic devices. We have found that FPGAs and GPUs can be used to significantly reduce simulation times and allow for the solution of much larger problems.


Field-Programmable Custom Computing Machines | 2003

Implementation of three-dimensional FPGA-based FDTD solvers: an architectural overview

James P. Durbano; Fernando E. Ortiz; John R. Humphrey; Dennis W. Prather; Mark S. Mirotznik

Maxwell's equations, which govern electromagnetic propagation, are a system of coupled differential equations. As such, they can be represented in difference form, thus allowing their numerical solution. By implementing both the temporal and spatial derivatives of Maxwell's equations in difference form, we arrive at one of the most common computational electromagnetic algorithms, the Finite-Difference Time-Domain (FDTD) method (Yee, 1966). In this technique, the region of interest is sampled to generate a grid of points, hereafter referred to as a mesh. The discretized form of Maxwell's equations is then solved at each point in the mesh to determine the associated electromagnetic fields. In this extended abstract, we present an architecture that overcomes the previous limitations. We begin with a high-level description of the computational flow of this architecture.
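
For reference, the one-dimensional version of this difference form is the standard Yee leapfrog update (written here from the textbook formulation rather than quoted from the paper):

    H_y^{\,n+1/2}\!\left(k+\tfrac{1}{2}\right) = H_y^{\,n-1/2}\!\left(k+\tfrac{1}{2}\right) - \frac{\Delta t}{\mu\,\Delta z}\left[E_x^{\,n}(k+1) - E_x^{\,n}(k)\right]

    E_x^{\,n+1}(k) = E_x^{\,n}(k) - \frac{\Delta t}{\varepsilon\,\Delta z}\left[H_y^{\,n+1/2}\!\left(k+\tfrac{1}{2}\right) - H_y^{\,n+1/2}\!\left(k-\tfrac{1}{2}\right)\right]

where n indexes time steps and k indexes mesh points along the propagation direction.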


Proceedings of SPIE | 2010

Analyzing the Impact of Data Movement on GPU Computations

Daniel K. Price; John R. Humphrey; Kyle E. Spagnoli; Aaron Paolini

Recently, GPU computing has taken the scientific computing landscape by storm, fueled by the attractive nature of the massively parallel arithmetic hardware. When porting their code, researchers rely on a set of best practices that have been developed over the few years that general-purpose GPU computing has been employed. This paper challenges the widely held belief that transfers to and from the GPU device must be minimized to achieve the best speedups over existing codes, presenting a case study on CULA, our library for dense linear algebra computation on GPUs. Topics discussed include the relationship between computation and transfer time for both synchronous and asynchronous transfers, as well as the impact that data allocations have on memory performance and overall solution time.
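
The synchronous and asynchronous transfers compared in the paper correspond to two mechanisms in the CUDA runtime: blocking copies from ordinary pageable memory versus stream-ordered copies from pinned (page-locked) allocations, which can overlap with kernel execution. A minimal sketch of the asynchronous path is shown below, with a placeholder kernel and illustrative sizes; none of this is CULA's API.

    // Asynchronous transfer sketch: pinned host buffer + one CUDA stream.
    #include <cuda_runtime.h>

    __global__ void scale(float* d, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] *= 2.0f;               // stand-in for real work
    }

    int main() {
        const int n = 1 << 20;
        float *h, *d;
        cudaHostAlloc((void**)&h, n * sizeof(float), cudaHostAllocDefault); // pinned: required for true async copies
        cudaMalloc((void**)&d, n * sizeof(float));
        for (int i = 0; i < n; ++i) h[i] = 1.0f;

        cudaStream_t s;
        cudaStreamCreate(&s);
        // Enqueue copy, kernel, and copy-back in one stream; the host is free
        // to do other work (or feed other streams) until the synchronize.
        cudaMemcpyAsync(d, h, n * sizeof(float), cudaMemcpyHostToDevice, s);
        scale<<<(n + 255) / 256, 256, 0, s>>>(d, n);
        cudaMemcpyAsync(h, d, n * sizeof(float), cudaMemcpyDeviceToHost, s);
        cudaStreamSynchronize(s);

        cudaStreamDestroy(s);
        cudaFree(d);
        cudaFreeHost(h);
        return 0;
    }

With two streams and double-buffered transfers, the same pattern lets the copy for one block of data overlap with computation on another, which is one way to hide transfer time behind computation.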


Proceedings of SPIE | 2006

Integrated optical chemical sensor using a dispersion-guided photonic crystal structure

Richard K. Martin; Ahmed Sharkawy; John R. Humphrey; Eric J. Kelmelis; Dennis W. Prather

There is a growing need for miniature, low-cost chemical sensors for use in monitoring environmental conditions. Applications range from environmental pollution monitoring, industrial process control, and homeland security threat detection to biomedical diagnostics. Integrated opto-chemical sensors can provide the required functionality by monitoring chemistry-induced changes in the refractive, absorptive, or luminescent properties of materials. Mach-Zehnder (MZ) interferometers, using the phase shift induced by a chemically reactive film, have shown success for such applications but are typically limited to one chemical analysis per sensor. In this paper we present an MZ-like sensor using the dispersion properties of a photonic crystal lattice. Properly engineered dispersion guiding enables the creation of multiple parallel MZ-like sensors monitoring different chemical reactions in a device much smaller than a typical MZ sensor. The phase shift induced in one arm of the photonic crystal structure by the chemical reaction of a special film induces a change in the sensor output. The use of a dispersion-guiding photonic crystal structure enables the use of lower-refractive-index materials because the creation of a bandgap is not necessary. This in turn increases coupling efficiency into the device. Other advantages of this type of structure include the ability to guide both TE and TM modes as well as reduced sensitivity to fabrication tolerances. Two-dimensional FDTD analysis is used to optimize and model the effectiveness of the structure.
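
For context, the readout mechanism follows the standard Mach-Zehnder transfer function (a textbook relation, not a formula taken from the paper): a chemically induced change in effective index over an interaction length L shifts the relative phase between the arms and modulates the output intensity,

    I_{\mathrm{out}} = \frac{I_{\mathrm{in}}}{2}\left[1 + \cos\Delta\phi\right], \qquad \Delta\phi = \frac{2\pi}{\lambda}\,\Delta n_{\mathrm{eff}}\,L

so small index changes in the reactive film appear directly as intensity changes at the detector.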


Proceedings of SPIE, the International Society for Optical Engineering | 2006

Accelerated modeling and simulation with a desktop supercomputer

Eric J. Kelmelis; John R. Humphrey; James P. Durbano; Fernando E. Ortiz

The performance of modeling and simulation tools is inherently tied to the platform on which they are implemented. In most cases, this platform is a microprocessor, either in a desktop PC, a PC cluster, or a supercomputer. Microprocessors are used because of their familiarity to developers, not necessarily their applicability to the problems of interest. We have developed the underlying techniques and technologies to produce supercomputer performance from a standard desktop workstation for modeling and simulation applications. This is accomplished through the combined use of graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and standard microprocessors. Each of these platforms has unique strengths and weaknesses but, when used in concert, they can rival the computational power of a high-performance computer (HPC). By adding a powerful GPU and our custom-designed FPGA card to a commodity desktop PC, we have created simulation tools capable of replacing massive computer clusters with a single workstation. We present this work in its initial embodiment: simulators for electromagnetic wave propagation and interaction. We discuss the trade-offs of each independent technology (GPUs, FPGAs, and microprocessors) and how we efficiently partition algorithms to take advantage of the strengths of each while masking their weaknesses. We conclude by discussing how the computational performance of the underlying desktop supercomputer can be enhanced and how it can be extended to other application areas.


Proceedings of SPIE | 2013

Advances in computational fluid dynamics solvers for modern computing environments

Daniel Hertenstein; John R. Humphrey; Aaron Paolini; Eric J. Kelmelis

EM Photonics has been investigating the application of massively multicore processors to a key problem area: Computational Fluid Dynamics (CFD). While the capabilities of CFD solvers have continually increased and improved to support features such as moving bodies and adjoint-based mesh adaptation, the software architecture has often lagged behind. This has led to poor scaling as core counts reach the tens of thousands. In the modern High Performance Computing (HPC) world, clusters with hundreds of thousands of cores are becoming the standard. In addition, accelerator devices such as NVIDIA GPUs and the Intel Xeon Phi are being installed in many new systems. It is important for CFD solvers to take advantage of the new hardware, as the computations involved are well suited to the massively multicore architecture. In our work, we demonstrate that new features in NVIDIA GPUs are able to empower existing CFD solvers, using as an example AVUS, a CFD solver developed by the Air Force Research Laboratory (AFRL) and the Volcanic Ash Advisory Center (VAAC). The effort has resulted in increased performance and scalability without sacrificing accuracy. There are many well-known codes in the CFD space that can benefit from this work, such as FUN3D, OVERFLOW, and TetrUSS. Such codes are widely used in the commercial, government, and defense sectors.

Collaboration


Dive into John R. Humphrey's collaborations.

Top Co-Authors

Lyle N. Long

Pennsylvania State University

Richard K. Martin

Air Force Institute of Technology
