Randy Renfu Huang
Intel
Publications
Featured research published by Randy Renfu Huang.
international symposium on low power electronics and design | 2016
Herman Schmit; Randy Renfu Huang
Intel's Xeon roadmap includes package-integrated FPGAs in every new generation. In this talk, we will dissect why this is such a powerful combination at this time of great change in datacenter workloads. We will show how power savings within the CPU complex are a significant multiplier for power savings in the datacenter as a whole. Focusing on the domain of machine learning, we will present the recent evolution of data types and operators, and make the case that FPGAs are the path to facilitating this continued evolution. Finally, we will discuss the criticality of close coupling between the CPU and the FPGA. This coupling provides the high-bandwidth, low-latency communication required for the development, debugging, and deployment of heterogeneous applications.
ieee high performance extreme computing conference | 2017
Philip Colangelo; Enno Luebbers; Randy Renfu Huang; Martin Margala; Kevin Nealis
Intel's Xeon® processor with integrated FPGA is a new research platform that provides all the capabilities of a Broadwell Xeon processor with the added functionality of an Arria 10 FPGA in the same package. In this paper, we present an implementation on this platform to showcase the ability and effectiveness of utilizing both hardware architectures to accelerate a convolutional neural network (CNN). We choose a network topology that uses binary weights and low-precision activation data to take advantage of the customizable fabric provided by the FPGA. Compared to standard multiply-accumulate CNNs, binary-weighted networks (BWNs) reduce the amount of computation by eliminating the need for multiplication, with little to no degradation in classification accuracy. Coupling Intel's Open Programmable Acceleration Engine (OPAE) with Caffe provides a robust framework that we used as the foundation for our application. Because the convolution primitives account for most of the computation in our network, we offload the feature and weight data to a customized binary convolution accelerator loaded into the FPGA. Employing the low-latency QuickPath Interconnect (QPI) that bridges the Broadwell Xeon processor and the Arria 10 FPGA, we can carry out fine-grained offloads while avoiding bandwidth bottlenecks. An initial proof-of-concept design that uses only a portion of the FPGA core logic shows that using the Xeon processor and FPGA together improves throughput by 2× on some layers and by 1.3× overall.
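The abstract's key computational point, that constraining weights to ±1 removes multiplications from the convolution inner loop, can be illustrated with a minimal sketch. This is an illustrative example only, not code from the paper: the function names and the NumPy formulation are assumptions, and the actual accelerator performs this arithmetic in FPGA fabric rather than in software.

```python
# Minimal sketch (assumption, not from the paper): contrast a standard
# multiply-accumulate dot product with a binary-weight version, where weights
# are restricted to {-1, +1} so each "multiplication" is just an add or subtract.
import numpy as np

def mac_dot(activations, weights):
    # Standard convolution inner loop: full multiply-accumulate.
    return float(np.sum(activations * weights))

def binary_weight_dot(activations, binary_weights):
    # Binary-weight inner loop: the product is the activation itself,
    # with its sign flipped wherever the weight is -1. No multiplies needed.
    assert set(np.unique(binary_weights)) <= {-1, 1}
    return float(np.sum(np.where(binary_weights > 0, activations, -activations)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.standard_normal(16).astype(np.float32)  # low-precision activations in the paper
    w = rng.standard_normal(16).astype(np.float32)
    bw = np.sign(w).astype(np.int8)                    # binarize weights to {-1, +1}
    # Both paths produce the same result; the binary path avoids multiplication.
    print(mac_dot(acts, np.sign(w)), binary_weight_dot(acts, bw))
```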
Archive | 2007
Herman Schmit; Steven Teig; Brad Hutchings; Randy Renfu Huang
Archive | 2005
Herman Schmit; Steven Teig; Brad Hutchings; Randy Renfu Huang; Jason Redgrave
Archive | 2007
Herman Schmit; Steven Teig; Brad Hutchings; Randy Renfu Huang
field programmable gate arrays | 2017
Eriko Nurvitadhi; Ganesh Venkatesh; Jaewoong Sim; Debbie Marr; Randy Renfu Huang; Jason Ong Gee Hock; Yeong Tat Liew; Krishnan Srivatsan; Duncan Moss; Suchit Subhaschandra; Guy Boudoukh
Archive | 2005
Herman Schmit; Steven Teig; Brad Hutchings; Randy Renfu Huang; Jason Redgrave
Archive | 2010
Jason Redgrave; Herman Schmit; Steven Teig; Brad Hutchings; Randy Renfu Huang
Archive | 2012
Randy Renfu Huang; Martin L. Voogel; Jingcao Hu; Steven Teig
Archive | 2007
Herman Schmit; Steven Teig; Brad Hutchings; Randy Renfu Huang; Jason Redgrave