Randy Renfu Huang
Intel
Publications
Featured research published by Randy Renfu Huang.
international symposium on low power electronics and design | 2016
Herman Schmit; Randy Renfu Huang
Intel's Xeon roadmap includes package-integrated FPGAs in every new generation. In this talk, we will dissect why this is such a powerful combination at this time of great change in datacenter workloads. We will show how power savings within the CPU complex are a significant multiplier for power savings in the datacenter as a whole. Focusing on the domain of machine learning, we will present the recent evolution of data types and operators, and make the case that FPGAs are the path to facilitating this continued evolution. Finally, we will discuss the criticality of close coupling between the CPU and the FPGA. This coupling provides the high-bandwidth, low-latency communication required for the development, debugging, and deployment of heterogeneous applications.
ieee high performance extreme computing conference | 2017
Philip Colangelo; Enno Luebbers; Randy Renfu Huang; Martin Margala; Kevin Nealis
Intel's Xeon® processor with integrated FPGA is a new research platform that provides all the capabilities of a Broadwell Xeon processor with the added functionality of an Arria 10 FPGA in the same package. In this paper, we present an implementation on this platform to showcase the ability and effectiveness of utilizing both hardware architectures to accelerate a convolutional neural network (CNN). We choose a network topology that uses binary weights and low-precision activation data to take advantage of the customizable fabric provided by the FPGA. Compared to standard multiply-accumulate CNNs, binary-weighted networks (BWNs) reduce the amount of computation by eliminating the need for multiplication, with little to no degradation in classification accuracy. Coupling Intel's Open Programmable Acceleration Engine (OPAE) with Caffe provides a robust framework that we used as the foundation for our application. Because the convolution primitives account for most of the computation in our network, we offload the feature and weight data to a customized binary convolution accelerator loaded into the FPGA. Employing the low-latency QuickPath Interconnect (QPI) that bridges the Broadwell Xeon processor and the Arria 10 FPGA, we can carry out fine-grained offloads while avoiding bandwidth bottlenecks. An initial proof-of-concept design that uses only a portion of the FPGA core logic shows that using the Xeon processor and FPGA together improves throughput by 2× on some layers and by 1.3× overall.
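The abstract's key computational point, that constraining weights to ±1 removes multiplications from the convolution inner loop, can be illustrated with a minimal sketch. This is an illustrative example only, not code from the paper: the function names and the NumPy formulation are assumptions, and the actual accelerator performs this arithmetic in FPGA fabric rather than in software.

```python
# Minimal sketch (assumption, not from the paper): contrast a standard
# multiply-accumulate dot product with a binary-weight version, where weights
# are restricted to {-1, +1} so each "multiplication" is just an add or subtract.
import numpy as np

def mac_dot(activations, weights):
    # Standard convolution inner loop: full multiply-accumulate.
    return float(np.sum(activations * weights))

def binary_weight_dot(activations, binary_weights):
    # Binary-weight inner loop: the product is the activation itself,
    # with its sign flipped wherever the weight is -1. No multiplies needed.
    assert set(np.unique(binary_weights)) <= {-1, 1}
    return float(np.sum(np.where(binary_weights > 0, activations, -activations)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.standard_normal(16).astype(np.float32)  # low-precision activations in the paper
    w = rng.standard_normal(16).astype(np.float32)
    bw = np.sign(w).astype(np.int8)                    # binarize weights to {-1, +1}
    # Both paths produce the same result; the binary path avoids multiplication.
    print(mac_dot(acts, np.sign(w)), binary_weight_dot(acts, bw))
```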
Archive | 2007
Herman Schmit; Steven Teig; Brad Hutchings; Randy Renfu Huang
Archive | 2005
Herman Schmit; Steven Teig; Brad Hutchings; Randy Renfu Huang; Jason Redgrave
Archive | 2007
Herman Schmit; Steven Teig; Brad Hutchings; Randy Renfu Huang
field programmable gate arrays | 2017
Eriko Nurvitadhi; Ganesh Venkatesh; Jaewoong Sim; Debbie Marr; Randy Renfu Huang; Jason Ong Gee Hock; Yeong Tat Liew; Krishnan Srivatsan; Duncan Moss; Suchit Subhaschandra; Guy Boudoukh
Archive | 2005
Herman Schmit; Steven Teig; Brad Hutchings; Randy Renfu Huang; Jason Redgrave
Archive | 2010
Jason Redgrave; Herman Schmit; Steven Teig; Brad Hutchings; Randy Renfu Huang
Archive | 2012
Randy Renfu Huang; Martin L. Voogel; Jingcao Hu; Steven Teig
Archive | 2007
Herman Schmit; Steven Teig; Brad Hutchings; Randy Renfu Huang; Jason Redgrave