
Publication


Featured research published by Andrew G. Schmidt.


field-programmable custom computing machines | 2007

Reconfigurable Computing Cluster (RCC) Project: Investigating the Feasibility of FPGA-Based Petascale Computing

Ron Sass; William V. Kritikos; Andrew G. Schmidt; Srinivas Beeravolu; Parag Beeraka

While medium- and large-sized computing centers have increasingly relied on clusters of commodity PC hardware to provide cost-effective capacity and capability, it is not clear that this technology will scale to the PetaFLOP range. It is expected that semiconductor technology will continue its exponential advancements over the next fifteen years; however, new issues are rapidly emerging and the relative importance of current performance metrics is shifting. Future PetaFLOP architectures will require system designers to solve computer architecture problems ranging from how to house, power, and cool the machine, all while remaining sensitive to cost. The Reconfigurable Computing Cluster (RCC) project is a multi-institution, multi-disciplinary project investigating the use of Platform FPGAs to build cost-effective petascale computers. This paper describes the nascent project's objectives and a 64-node prototype cluster. Specifically, the aim is to provide a detailed motivation for the project, describe the design principles guiding development, and present a preliminary performance assessment. Microbenchmark results are reported to answer several pragmatic questions about key subsystems, including the system software, network performance, memory bandwidth, and power consumption of nodes in the cluster. Results suggest that the approach is sound.


field-programmable logic and applications | 2008

Teaching FPGA system design via a remote laboratory facility

Yamuna Rajasekhar; William V. Kritikos; Andrew G. Schmidt; Ron Sass

This short paper describes a remote laboratory facility for platform FPGA education. With the addition of an inexpensive piece of hardware, many commercial off-the-shelf FPGA development boards can be made suitable for use in a remote laboratory. The hardware and software required to implement a remote laboratory have been developed and a remote laboratory facility deployed at the University of North Carolina at Charlotte. Advantages, concerns, and actual costs are reported. The experience of using this facility in a senior/first-year graduate-level platform FPGA course is also described. Although these data are preliminary, survey results and first-hand experience with the laboratory were very encouraging and suggest that further studies on student learning are warranted.


field-programmable custom computing machines | 2011

Checkpoint/Restart and Beyond: Resilient High Performance Computing with FPGAs

Andrew G. Schmidt; Bin Huang; Ron Sass; Matthew French

As FPGA resources continue to increase, FPGAs present attractive features to the High Performance Computing community. These include power-efficient computation and application-specific acceleration benefits, as well as tighter integration between compute and I/O resources. This paper considers the ability of an FPGA to address another, increasingly important, feature -- resiliency. Specifically, a minimally-invasive monitoring infrastructure operating over a sideband network is presented. This includes a multi-chip protocol, IP cores that implement the protocol, and a tool to instrument existing hardware accelerator FPGA designs. To demonstrate the functionality, the system has been implemented on a cluster of FPGA devices running off-the-shelf MPI and Linux. We demonstrate the ability to do integrated software and hardware accelerator checkpointing with restart under a variety of injected faults.
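The checkpoint/restart pattern described above can be illustrated with a minimal software-only sketch. Note this is an assumption-laden illustration: the paper's actual mechanism snapshots both software and FPGA accelerator state over a sideband network, which is not modeled here, and `run_with_checkpoint` and its JSON state file are hypothetical names, not from the paper.

```python
import json
import os

def run_with_checkpoint(state_file, total_steps):
    """Illustrative resume-from-checkpoint loop (not the paper's
    actual mechanism): load saved state if present, do one unit
    of work per step, and persist state after every step so a
    crash loses at most one step of progress."""
    # Restart path: resume from the last saved state, if any.
    if os.path.exists(state_file):
        with open(state_file) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "acc": 0}
    while state["step"] < total_steps:
        state["acc"] += state["step"]      # the "application" work
        state["step"] += 1
        with open(state_file, "w") as f:   # checkpoint after each step
            json.dump(state, f)
    return state["acc"]
```

Re-invoking the function with the same state file simply resumes (or, if the run already finished, returns the completed result), which is the property a restart-after-fault scheme relies on.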


field-programmable custom computing machines | 2009

AIREN: A Novel Integration of On-Chip and Off-Chip FPGA Networks

Andrew G. Schmidt; William V. Kritikos; Rahul R. Sharma; Ron Sass

The Reconfigurable Computing Cluster Project at the University of North Carolina at Charlotte is investigating the feasibility of using FPGAs as compute nodes to scale to PetaFLOP computing. To date the Spirit cluster, consisting of 64 FPGAs, has been assembled for the initial analysis. One important question is how to efficiently communicate among compute cores on-chip as well as between nodes. Tight integration between both the on-chip and off-chip networks is crucial in obtaining high performance and parallelism while minimizing communication overhead. This paper introduces AIREN — Architecture Independent REconfigurable Network — the integration of an on-chip crossbar switched network with an off-chip k-ary d-cube network and presents the results of integrating AIREN with the Spirit cluster.
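The off-chip side of AIREN is a k-ary d-cube, i.e. a d-dimensional torus with k nodes along each dimension. As a hypothetical illustration of such addressing (for example, viewing a 64-node cluster as a 4-ary 3-cube; the paper does not specify these parameters, and all function names here are made up for the sketch), each node's torus neighbors can be computed from its mixed-radix coordinates:

```python
def coords(node, k, d):
    """Mixed-radix coordinates of a node id in a k-ary d-cube."""
    c = []
    for _ in range(d):
        c.append(node % k)
        node //= k
    return c

def node_id(c, k):
    """Inverse of coords(): pack coordinates back into a node id."""
    nid = 0
    for dim in reversed(range(len(c))):
        nid = nid * k + c[dim]
    return nid

def neighbors(node, k, d):
    """Torus neighbors: +/-1 (mod k) along each of the d dimensions,
    so every node has 2*d links when k > 2."""
    c = coords(node, k, d)
    out = set()
    for dim in range(d):
        for delta in (1, -1):
            c2 = c.copy()
            c2[dim] = (c2[dim] + delta) % k
            out.add(node_id(c2, k))
    return sorted(out)
```

For a 4-ary 3-cube, every one of the 64 nodes has exactly six neighbors, which is the kind of fixed, low-degree wiring that makes direct FPGA-to-FPGA links practical.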


field-programmable logic and applications | 2009

Hardware implementation of MPI_Barrier on an FPGA cluster

Shanyuan Gao; Andrew G. Schmidt; Ron Sass

Message-Passing is the dominant programming model for distributed memory parallel computers, and the Message-Passing Interface (MPI) is the standard. Along with point-to-point send and receive message primitives, MPI includes a set of collective communication operations that are used to synchronize and coordinate groups of tasks. MPI_Barrier, one of the most important collective procedures, has been extensively studied on a variety of architectures over the last twenty years. However, a cluster of Platform FPGAs is a new architecture and offers interesting, resource-efficient options for implementing the barrier operation. This paper describes an FPGA implementation of MPI_Barrier. The premise is that barrier (and other collective communication operations) are very sensitive to latency as the number of nodes scales to the tens of thousands, and the relatively slow processors found on FPGAs would significantly cap performance. The FPGA hardware design implements a tree-based algorithm and is tightly integrated with the custom high-speed on-chip/off-chip network. MPI access is available through a specially designed kernel module. This effectively offloads the work from the CPU and OS into hardware. The evaluation of this design shows significant performance gains compared with a conventional software implementation on both an FPGA cluster and a commodity cluster. Further, it suggests that moving other MPI collective operations into hardware would be beneficial.
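The latency argument behind a tree-based barrier can be sketched with a round count. This is a simplified model, not the paper's design: it assumes a binomial tree and counts one message delay per round, whereas the actual implementation runs in hardware over the custom network.

```python
import math

def barrier_rounds(n):
    """Communication rounds for a binomial-tree barrier:
    ceil(log2(n)) rounds to gather arrivals at the root plus
    the same again to broadcast the release, so latency grows
    logarithmically in the node count."""
    if n <= 1:
        return 0
    up = math.ceil(math.log2(n))
    return 2 * up

def flat_rounds(n):
    """Worst-case rounds for a naive flat barrier in which the
    root handles n-1 arrival messages and n-1 releases serially."""
    return 2 * (n - 1) if n > 1 else 0
```

At 64 nodes the tree needs 12 message delays versus 126 for the serialized flat scheme, and the gap widens as node counts head toward the tens of thousands, which is why shaving per-hop latency in hardware pays off.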


reconfigurable computing and fpgas | 2012

Redsharc: a programming model and on-chip network for multi-core systems on a programmable chip

William V. Kritikos; Andrew G. Schmidt; Ron Sass; Erik K. Anderson; Matthew French

The reconfigurable data-stream hardware/software architecture (Redsharc) is a programming model and network-on-a-chip solution designed to scale to meet the performance needs of multi-core systems on a programmable chip (MCSoPC). Redsharc uses an abstract API that allows programmers to develop systems of simultaneously executing kernels, in software and/or hardware, that communicate over a seamless interface. Redsharc incorporates two on-chip networks that directly implement the API to support high-performance systems with numerous hardware kernels. This paper documents the API, describes the common infrastructure, and quantifies the performance of a complete implementation. Furthermore, the overhead, in terms of resource utilization, is reported, and the ability to integrate hard and soft processor cores with purely hardware kernels is demonstrated.
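The kernels-communicating-over-streams model can be sketched in software with threads and FIFO queues. To be clear, the `Stream`, `push`, and `pop` names below are hypothetical stand-ins, not Redsharc's actual API, and a software queue only approximates what the on-chip networks provide in hardware.

```python
from queue import Queue
from threading import Thread

class Stream:
    """FIFO channel between kernels (hypothetical sketch of a
    stream abstraction; not Redsharc's real interface)."""
    def __init__(self):
        self._q = Queue()
    def push(self, item):
        self._q.put(item)
    def pop(self):
        return self._q.get()

def producer(out, data):
    """Source kernel: emit each item, then an end-of-stream marker."""
    for x in data:
        out.push(x)
    out.push(None)

def doubler(inp, out):
    """Worker kernel: transform items until end-of-stream."""
    while (x := inp.pop()) is not None:
        out.push(2 * x)
    out.push(None)

def run_pipeline(data):
    """Wire producer -> doubler -> consumer and collect the output."""
    a, b = Stream(), Stream()
    t1 = Thread(target=producer, args=(a, data))
    t2 = Thread(target=doubler, args=(a, b))
    t1.start(); t2.start()
    result = []
    while (x := b.pop()) is not None:
        result.append(x)
    t1.join(); t2.join()
    return result
```

Because each kernel only sees its stream endpoints, either stage could in principle be swapped for a hardware implementation behind the same interface, which is the portability point the abstract API is after.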


reconfigurable computing and fpgas | 2012

HwPMI: an extensible performance monitoring infrastructure for improving hardware design and productivity on FPGAs

Andrew G. Schmidt; Neil Steiner; Matthew French; Ron Sass

Designing hardware cores for FPGAs can quickly become a complicated task, difficult even for experienced engineers. With the addition of more sophisticated development tools and maturing high-level language-to-gates techniques, designs can be rapidly assembled; however, when the design is evaluated on the FPGA, the performance may not be what was expected. Therefore, an engineer may need to augment the design to include performance monitors to better understand the bottlenecks in the system or to aid in the debugging of the design. Unfortunately, identifying what to monitor and adding the infrastructure to retrieve the monitored data can be a challenging and time-consuming task. Our work alleviates this burden. We present the Hardware Performance Monitoring Infrastructure (HwPMI), which includes a collection of software tools and hardware cores that can be used to profile the current design, recommend and insert performance monitors directly into the HDL or netlist, and retrieve the monitored data with minimal invasiveness to the design. Three applications are used to demonstrate and evaluate HwPMI's capabilities. The results are highly encouraging as the infrastructure adds numerous capabilities while requiring minimal effort by the designer and low resource overhead to the existing design.


parallel computing | 2012

Investigation into scaling I/O bound streaming applications productively with an all-FPGA cluster

Andrew G. Schmidt; Siddhartha Datta; Ashwin A. Mendon; Ron Sass

The Reconfigurable Computing Cluster project is exploring novel parallel computing architectures in high performance computing with FPGA devices. Although there are no discrete microprocessors in the system, highly-integrated FPGAs (with embedded processors) are capable of hosting Linux-based systems and can run arbitrary MPI applications. This work presents an investigation into accelerating I/O bound streaming applications through the coupling of custom computing cores, a hardware filesystem, and an integrated on-chip and off-chip network on the all-FPGA node cluster. Such an infrastructure enables productivity by minimizing hardware design while maintaining high performance. A hardware implementation of the BLASTn algorithm is used to demonstrate the performance gains and scalability of the custom computing cores across the Spirit cluster. Results show linear speedup across multiple nodes while supporting productivity by eliminating modifications to the original hardware core when scaling up to 512 parallel cores on the cluster.


field-programmable technology | 2010

Impact of reconfigurable hardware on accelerating MPI_Reduce

Shanyuan Gao; Andrew G. Schmidt; Ron Sass

This paper demonstrates the benefits and pitfalls of implementing the collective communication operation reduce in the reconfigurable resources of an FPGA device across a cluster of all-FPGA compute nodes. Specifically, the communication and computation semantics of the MPI_Reduce call from the de facto Message-Passing Interface have been implemented. Using a synthetic benchmark, a cluster of 32 FPGA nodes with a 300 MHz PowerPC processor, custom high-speed network, and reduce core is compared against a conventional commodity cluster with 3.2 GHz Xeon processors and Gigabit Ethernet. The design is customized to support performing many reduce operations on small datasets while minimizing the amount of on-chip resources used, which is an increasingly common demand from domain scientists. Speedups of ≈2x to ≈800x are reported over that of a commodity cluster for small datasets, which provides significant motivation to continue the investigation into supporting additional collective communication operations directly in hardware.
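The data movement behind a reduce collective can be sketched as a binomial-tree combine. This is a minimal software model of the general pattern, assuming a binomial tree; the paper does not describe its hardware reduce core at this level, and `tree_reduce` is an illustrative name.

```python
def tree_reduce(values, op):
    """Binomial-tree reduction: in round r (step = 2**r), each
    node at a rank that is a multiple of 2*step combines its
    partial result with the one held at rank+step, so after
    ceil(log2(n)) rounds rank 0 holds the full reduction."""
    vals = list(values)
    n = len(vals)
    step = 1
    while step < n:
        for rank in range(0, n, 2 * step):
            if rank + step < n:
                vals[rank] = op(vals[rank], vals[rank + step])
        step *= 2
    return vals[0]
```

Each round's combines are independent and can proceed in parallel, so the operation count on the critical path is logarithmic in the node count; in the hardware version, the combine happens in the reduce core as messages pass through, rather than on the slow embedded processor.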


international workshop on high performance reconfigurable computing technology and applications | 2010

Investigating resilient high performance reconfigurable computing with minimally-invasive system monitoring

Bin Huang; Andrew G. Schmidt; Ashwin A. Mendon; Ron Sass

As researchers push for Exascale computing, one of the emerging challenges is system resilience. Unlike fault tolerance, which corrects errors, recent reports suggest that resilient systems will need to continue to make progress on an application despite faults. A first step in developing a resilient system is to have robust, scalable system monitoring. The work described here presents a novel, minimally-invasive system monitor that operates over a separate network. We analytically characterize the performance for an arbitrary set of nodes and demonstrate a working implementation of the design. We argue that the hardware approach is inherently superior to the ad hoc software techniques currently employed in practice.

Collaboration


Dive into Andrew G. Schmidt's collaborations.

Top Co-Authors

Ron Sass (University of North Carolina at Charlotte)
Matthew French (University of Southern California)
William V. Kritikos (University of North Carolina at Charlotte)
Sam Skalicky (Rochester Institute of Technology)
Ashwin A. Mendon (University of North Carolina at Charlotte)
Shanyuan Gao (University of North Carolina at Charlotte)
Yamuna Rajasekhar (University of North Carolina at Charlotte)
Bin Huang (University of North Carolina at Charlotte)
Erik K. Anderson (University of Southern California)
Gabriel Weisz (Carnegie Mellon University)