Andrew G. Schmidt
University of North Carolina at Charlotte
Publications
Featured research published by Andrew G. Schmidt.
field-programmable custom computing machines | 2007
Ron Sass; William V. Kritikos; Andrew G. Schmidt; Srinivas Beeravolu; Parag Beeraka
While medium- and large-sized computing centers have increasingly relied on clusters of commodity PC hardware to provide cost-effective capacity and capability, it is not clear that this technology will scale to the PetaFLOP range. Semiconductor technology is expected to continue its exponential advance over the next fifteen years; however, new issues are rapidly emerging, and the relative importance of current performance metrics is shifting. Future PetaFLOP architectures will require system designers to solve computer architecture problems ranging from how to house, power, and cool the machine, all while remaining sensitive to cost. The Reconfigurable Computing Cluster (RCC) project is a multi-institution, multi-disciplinary project investigating the use of Platform FPGAs to build cost-effective petascale computers. This paper describes the nascent project's objectives and a 64-node prototype cluster. Specifically, the aim is to provide a detailed motivation for the project, describe the design principles guiding development, and present a preliminary performance assessment. Microbenchmark results are reported to answer several pragmatic questions about key subsystems, including the system software, network performance, memory bandwidth, and the power consumption of nodes in the cluster. Results suggest that the approach is sound.
field-programmable logic and applications | 2008
Yamuna Rajasekhar; William V. Kritikos; Andrew G. Schmidt; Ron Sass
This short paper describes a remote laboratory facility for platform FPGA education. With the addition of an inexpensive piece of hardware, many commercial off-the-shelf FPGA development boards can be made suitable for use in a remote laboratory. The hardware and software required to implement a remote laboratory have been developed, and a remote laboratory facility has been deployed at the University of North Carolina at Charlotte. Advantages, concerns, and actual costs are reported. The experience of using this facility in a senior/first-year graduate-level platform FPGA course is also described. Although these data are preliminary, survey results and first-hand experience with the laboratory were very encouraging and suggest that further studies on student learning are warranted.
field-programmable custom computing machines | 2011
Andrew G. Schmidt; Bin Huang; Ron Sass; Matthew French
As FPGA resources continue to increase, FPGAs present attractive features to the High Performance Computing community. These include power-efficient computation and application-specific acceleration, as well as tighter integration between compute and I/O resources. This paper considers the ability of an FPGA to address another, increasingly important, feature -- resiliency. Specifically, a minimally-invasive monitoring infrastructure operating over a sideband network is presented. This includes a multi-chip protocol, IP cores that implement the protocol, and a tool to instrument existing hardware accelerator FPGA designs. To demonstrate the functionality, the system has been implemented on a cluster of FPGA devices running off-the-shelf MPI and Linux. We demonstrate the ability to do integrated software and hardware accelerator checkpointing with restart under a variety of injected faults.
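The checkpoint-with-restart idea the abstract describes can be illustrated with a toy software model: computation state is periodically snapshotted, and after an injected fault the run resumes from the last snapshot rather than from scratch. This is a minimal sketch with hypothetical names; the paper's system checkpoints hardware accelerator state over a sideband network, which this sketch does not model.

```python
# Toy model of application-level checkpoint/restart: state is
# snapshotted every `interval` steps, and a single injected fault
# rolls execution back to the most recent snapshot.

def run_with_checkpoints(total_steps, interval, fault_at=None):
    """Return (final_state, steps_executed) with at most one fault."""
    state, step = 0, 0
    snapshot = (state, step)
    executed = 0
    faulted = False
    while step < total_steps:
        if fault_at is not None and step == fault_at and not faulted:
            faulted = True
            state, step = snapshot        # restart from last checkpoint
            continue
        state += step                     # the "work" done at this step
        step += 1
        executed += 1
        if step % interval == 0:
            snapshot = (state, step)      # take a checkpoint
    return state, executed
```

The fault costs only the steps re-executed since the last checkpoint, not the whole run: with `total_steps=10`, `interval=5`, and a fault at step 7, only the two steps after the checkpoint at step 5 are repeated.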
field-programmable custom computing machines | 2009
Andrew G. Schmidt; William V. Kritikos; Rahul R. Sharma; Ron Sass
The Reconfigurable Computing Cluster Project at the University of North Carolina at Charlotte is investigating the feasibility of using FPGAs as compute nodes to scale to PetaFLOP computing. To date the Spirit cluster, consisting of 64 FPGAs, has been assembled for the initial analysis. One important question is how to efficiently communicate among compute cores on-chip as well as between nodes. Tight integration between both the on-chip and off-chip networks is crucial in obtaining high performance and parallelism while minimizing communication overhead. This paper introduces AIREN (Architecture Independent REconfigurable Network), the integration of an on-chip crossbar switched network with an off-chip k-ary d-cube network, and presents the results of integrating AIREN with the Spirit cluster.
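The off-chip topology named above, a k-ary d-cube, is a d-dimensional torus with k nodes per dimension. As a minimal sketch (function names are illustrative, not from AIREN), each node's id maps to radix-k coordinates, and every dimension contributes two wraparound neighbors:

```python
# Neighbor computation in a k-ary d-cube: a node id is read as d
# radix-k digits, and each dimension yields two neighbors via +/-1
# moves with wraparound (they coincide when k == 2).

def to_coords(node: int, k: int, d: int):
    """Radix-k digits of `node`, least-significant dimension first."""
    coords = []
    for _ in range(d):
        coords.append(node % k)
        node //= k
    return coords

def neighbors(node: int, k: int, d: int):
    """Sorted torus neighbors of `node` in a k-ary d-cube."""
    coords = to_coords(node, k, d)
    result = set()
    for dim in range(d):
        for step in (1, -1):
            c = coords.copy()
            c[dim] = (c[dim] + step) % k
            result.add(sum(digit * k**i for i, digit in enumerate(c)))
    return sorted(result)
```

For example, the 64-node Spirit cluster could be wired as a 4-ary 3-cube, giving every node six direct neighbors (the exact arity/dimension split used on Spirit is not stated in this abstract).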
field-programmable logic and applications | 2009
Shanyuan Gao; Andrew G. Schmidt; Ron Sass
Message passing is the dominant programming model for distributed-memory parallel computers, and the Message-Passing Interface (MPI) is the standard. Along with point-to-point send and receive message primitives, MPI includes a set of collective communication operations that are used to synchronize and coordinate groups of tasks. MPI_Barrier, one of the most important collective procedures, has been extensively studied on a variety of architectures over the last twenty years. However, a cluster of Platform FPGAs is a new architecture and offers interesting, resource-efficient options for implementing the barrier operation. This paper describes an FPGA implementation of MPI_Barrier. The premise is that barrier (and other collective communication operations) are very sensitive to latency as the number of nodes scales to the tens of thousands, and the relatively slow processors found on FPGAs would significantly cap the performance of a software implementation. The FPGA hardware design implements a tree-based algorithm and is tightly integrated with the custom high-speed on-chip/off-chip network. MPI access is available through a specially designed kernel module. This effectively offloads the work from the CPU and OS into hardware. The evaluation of this design shows significant performance gains compared with a conventional software implementation on both an FPGA cluster and a commodity cluster. Further, it suggests that moving other MPI collective operations into hardware would be beneficial.
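The tree-based barrier mentioned above has a simple structure: arrival messages flow up a tree to the root, then a release message flows back down, so latency grows logarithmically rather than linearly in the node count. A minimal software sketch of that pattern, assuming a binary tree (the paper does not specify the tree's arity, and the names here are illustrative):

```python
# Sketch of a binary-tree barrier's communication pattern: an
# up-sweep of arrival messages followed by a down-sweep of release
# messages, each ceil(log2(n)) communication steps deep.

import math

def barrier_steps(n: int) -> int:
    """Latency of a binary-tree barrier, in communication steps."""
    if n <= 1:
        return 0
    return 2 * math.ceil(math.log2(n))

def tree_links(rank: int, n: int):
    """Parent and children of `rank` in a 0-rooted binary tree of n nodes."""
    parent = None if rank == 0 else (rank - 1) // 2
    children = [c for c in (2 * rank + 1, 2 * rank + 2) if c < n]
    return parent, children
```

Under this model a 64-node barrier completes in 12 steps versus 63 sequential arrivals for a naive all-to-root scheme, which is the latency argument for pushing the operation into hardware at scale.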
reconfigurable computing and fpgas | 2012
William V. Kritikos; Andrew G. Schmidt; Ron Sass; Erik K. Anderson; Matthew French
The reconfigurable data-stream hardware/software architecture (Redsharc) is a programming model and network-on-a-chip solution designed to scale to meet the performance needs of multi-core systems on a programmable chip (MCSoPC). Redsharc uses an abstract API that allows programmers to develop systems of simultaneously executing kernels, in software and/or hardware, that communicate over a seamless interface. Redsharc incorporates two on-chip networks that directly implement the API to support high-performance systems with numerous hardware kernels. This paper documents the API, describes the common infrastructure, and quantifies the performance of a complete implementation. Furthermore, the overhead in terms of resource utilization is reported, and the ability to integrate hard and soft processor cores with purely hardware kernels is demonstrated.
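The kernels-over-streams model the abstract describes can be sketched in plain software: independently executing kernels exchange data over blocking FIFO streams, so a kernel's code is the same whether its peer is software or hardware. All names below are hypothetical; this does not reproduce the actual Redsharc API.

```python
# Illustrative stream-style kernel model: kernels run concurrently
# and communicate only through blocking Stream FIFOs, with None as
# an end-of-stream marker.

import queue
import threading

class Stream:
    """A blocking FIFO connecting two kernels."""
    def __init__(self):
        self._q = queue.Queue()
    def push(self, item):
        self._q.put(item)
    def pop(self):
        return self._q.get()

def producer(out: Stream, n: int):
    """Source kernel: emit 0..n-1, then end-of-stream."""
    for i in range(n):
        out.push(i)
    out.push(None)

def doubler(inp: Stream, out: Stream):
    """Compute kernel: double each item, forwarding end-of-stream."""
    while (item := inp.pop()) is not None:
        out.push(2 * item)
    out.push(None)
```

Wiring `producer -> doubler` with two streams and running each kernel in its own thread mimics simultaneously executing kernels; in Redsharc, the same abstract interface would instead be backed by the on-chip networks.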
reconfigurable computing and fpgas | 2012
Andrew G. Schmidt; Neil Steiner; Matthew French; Ron Sass
Designing hardware cores for FPGAs can quickly become a complicated task, difficult even for experienced engineers. With the addition of more sophisticated development tools and maturing high-level language-to-gates techniques, designs can be rapidly assembled; however, when the design is evaluated on the FPGA, the performance may not be what was expected. Therefore, an engineer may need to augment the design to include performance monitors to better understand the bottlenecks in the system or to aid in debugging the design. Unfortunately, identifying what to monitor and adding the infrastructure to retrieve the monitored data can be a challenging and time-consuming task. Our work alleviates this effort. We present the Hardware Performance Monitoring Infrastructure (HwPMI), which includes a collection of software tools and hardware cores that can be used to profile the current design, recommend and insert performance monitors directly into the HDL or netlist, and retrieve the monitored data with minimal invasiveness to the design. Three applications are used to demonstrate and evaluate HwPMI's capabilities. The results are highly encouraging, as the infrastructure adds numerous capabilities while requiring minimal effort by the designer and low resource overhead to the existing design.
parallel computing | 2012
Andrew G. Schmidt; Siddhartha Datta; Ashwin A. Mendon; Ron Sass
The Reconfigurable Computing Cluster project is exploring novel parallel computing architectures in high performance computing with FPGA devices. Although there are no discrete microprocessors in the system, highly-integrated FPGAs (with embedded processors) are capable of hosting Linux-based systems and can run arbitrary MPI applications. This work presents an investigation into accelerating I/O-bound streaming applications through the coupling of custom computing cores, a hardware filesystem, and an integrated on-chip and off-chip network on the all-FPGA node cluster. Such an infrastructure enables productivity by minimizing hardware design effort while maintaining high performance. A hardware implementation of the BLASTn algorithm is used to demonstrate the performance gains and scalability of the custom computing cores across the Spirit cluster. Results show linear speedup across multiple nodes while supporting productivity by eliminating modifications to the original hardware core when scaling up to 512 parallel cores on the cluster.
field-programmable technology | 2010
Shanyuan Gao; Andrew G. Schmidt; Ron Sass
This paper demonstrates the benefits and pitfalls of implementing the collective communication operation reduce in the reconfigurable resources of an FPGA device across a cluster of all-FPGA compute nodes. Specifically, the communication and computation semantics of the MPI_Reduce call from the de facto standard Message-Passing Interface have been implemented. Using a synthetic benchmark, a cluster of 32 FPGA nodes with 300 MHz PowerPC processors, a custom high-speed network, and the reduce core is compared against a conventional commodity cluster with 3.2 GHz Xeon processors and Gigabit Ethernet. The design is customized to support performing many reduce operations on small datasets while minimizing the amount of on-chip resources used, an increasingly common demand from domain scientists. Speedups of ≈2x to ≈800x over the commodity cluster are reported for small datasets, which provides significant motivation to continue investigating support for additional collective communication operations directly in hardware.
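The communication/computation semantics of a reduce can be modeled as a reduction tree: values are combined pairwise at each level until one result remains at the root. A minimal software sketch under the assumption of a binary tree (the abstract does not state the hardware core's internal topology):

```python
# Software model of a tree-based reduce: values flow up a binary
# tree and are combined pairwise at each internal node, so n inputs
# need only ceil(log2(n)) combining levels.

def tree_reduce(values, op):
    """Combine `values` level by level, as a reduction tree would."""
    level = list(values)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(op(level[i], level[i + 1]))  # pairwise combine
        if len(level) % 2:
            nxt.append(level[-1])                   # odd node passes through
        level = nxt
    return level[0]
```

Any associative operator fits this pattern (sum, max, etc.), which is why the same tree structure serves the different reduction operators MPI_Reduce supports.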
international workshop on high performance reconfigurable computing technology and applications | 2010
Bin Huang; Andrew G. Schmidt; Ashwin A. Mendon; Ron Sass
As researchers push toward Exascale computing, one of the emerging challenges is system resilience. Whereas fault tolerance corrects errors, recent reports suggest that resilient systems will need to continue making progress on an application despite faults. A first step in developing a resilient system is to have robust, scalable system monitoring. The work described here presents a novel, minimally-invasive system monitor that operates over a separate network. We analytically characterize the performance for an arbitrary set of nodes and demonstrate a working implementation of the design. We argue that the hardware approach is inherently superior to the ad hoc software techniques currently employed in practice.