Andy Ye | Researchain

Publication

Featured research published by Andy Ye.

field programmable gate arrays | 2009

VPR 5.0: FPGA CAD and architecture exploration tools with single-driver routing, heterogeneity and process scaling

Jason Luu; Ian Kuon; Peter Jamieson; Ted Campbell; Andy Ye; Wei Mark Fang; Jonathan Rose

The VPR toolset [6, 7] has been widely used to perform FPGA architecture and CAD research, but has not evolved over the past decade to include many architectural features now present in modern FPGAs. This paper describes a new version of the toolset that includes four significant features: first, it now supports a broad range of single-driver routing architectures [29, 4, 16]. Single-driver routing has significantly different architectural and electrical properties from the multi-driver approach previously modelled, and is now employed in the majority of FPGAs sold. Second, the new release can now model a heterogeneous selection of hard logic blocks, which could include the hard memory and multipliers that are now ubiquitous in FPGAs. Third, we provide optimized electrical models of a wide range of architectures in different process technologies, including a range of area-delay tradeoffs for each single architecture. Prior releases of VPR did not publish even one architecture file with accurate resistance and capacitance parameters. Finally, to maintain robustness and to support future development the release includes a set of regression tests to check functionality and quality of result of the output of the tools. To illustrate the use of the new features, we present a new look at the FPGA area vs. logic block LUT size question that shows that small LUT sizes, with the use of carefully optimized electrical design and single-driver architectures, have better area (relative to 4-LUTs) than previously thought. Another experiment shows that several of the previous architectural results are invariant in moving from multi-driver to single-driver routing architecture and across a range of process technologies.

IEEE Transactions on Very Large Scale Integration Systems | 2006

Using bus-based connections to improve field-programmable gate-array density for implementing datapath circuits

Andy Ye; Jonathan Rose

As the logic capacity of field-programmable gate arrays (FPGAs) increases, they are increasingly being used to implement large arithmetic-intensive applications, which often contain a large proportion of datapath circuits. Since datapath circuits usually consist of regularly structured components (called bit-slices) which are connected together by regularly structured signals (called buses), it is possible to utilize datapath regularity in order to achieve significant area savings through FPGA architectural innovations. This paper describes such an FPGA routing architecture, called the multibit routing architecture, which employs bus-based connections in order to exploit datapath regularity. It is experimentally shown that, compared to conventional FPGA routing architectures, the multibit routing architecture can achieve 14% routing area reduction for implementing datapath circuits, which represents an overall FPGA area savings of 10%. This paper also empirically determines the best values of several important architectural parameters for the new routing architecture including the most area efficient granularity values and the most area efficient proportion of bus-based connections.
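The two area figures in this abstract can be related by simple arithmetic. The sketch below assumes, purely for illustration, that only routing area changes and that logic area is unchanged; under that assumption the ratio of the two percentages gives the share of total FPGA area that routing would have to occupy. The function name `implied_routing_share` is hypothetical, and the resulting share is an inference, not a figure reported by the paper.

```python
def implied_routing_share(routing_reduction: float, overall_savings: float) -> float:
    """Share of total area taken by routing, ASSUMING logic area is unchanged.

    If routing occupies a fraction f of total area and shrinks by
    routing_reduction, total area shrinks by f * routing_reduction.
    Solving f * routing_reduction = overall_savings gives f.
    """
    if not 0 < routing_reduction <= 1:
        raise ValueError("routing_reduction must be in (0, 1]")
    return overall_savings / routing_reduction


# Figures quoted in the abstract: 14% routing area reduction, 10% overall savings.
share = implied_routing_share(routing_reduction=0.14, overall_savings=0.10)
print(f"Implied routing share of total area: {share:.1%}")
```

Under this assumption, routing would account for roughly 70% of the total area in the baseline architecture.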

ACM Transactions on Reconfigurable Technology and Systems | 2011

VPR 5.0: FPGA CAD and architecture exploration tools with single-driver routing, heterogeneity and process scaling

Jason Luu; Ian Kuon; Peter Jamieson; Ted Campbell; Andy Ye; Wei Mark Fang; Kenneth B. Kent; Jonathan Rose

The VPR toolset has been widely used in FPGA architecture and CAD research, but has not evolved over the past decade. This article describes and illustrates the use of a new version of the toolset that includes four new features: first, it supports a broad range of single-driver routing architectures, which have superior architectural and electrical properties over the prior multidriver approach (and which are now employed in the majority of FPGAs sold). Second, it can now model, for placement and routing, a heterogeneous selection of hard logic blocks. This is a key (but not final) step toward the inclusion of blocks such as memory and multipliers. Third, we provide optimized electrical models for a wide range of architectures in different process technologies, including a range of area-delay trade-offs for each single architecture. Finally, to maintain robustness and support future development, the release includes a set of regression tests for the software. To illustrate the use of the new features, we explore several architectural issues: the FPGA area efficiency versus logic block granularity, the effect of single-driver routing, and a simple use of the heterogeneity to explore the impact of hard multipliers on wiring track count.

field programmable gate arrays | 1999

Procedural texture mapping on FPGAs

Andy Ye; David Lewis

Procedural textures can be effectively used to enhance the visual realism of computer rendered images. Procedural textures can provide higher realism for 3-D objects than traditional hardware texture mapping methods which use memory to store 2-D texture images. This paper proposes a new method of hardware texture mapping in which texture images are synthesized using FPGAs. This method is very efficient for texture mapping procedural textures of more than two input variables. By synthesizing these textures on the fly, the large amount of memory required to store their multidimensional texture images is eliminated, making texture mapping of 3-D textures and parameterized textures feasible in hardware. This paper shows that using FPGAs, procedural textures can be synthesized at high speed, with a small hardware cost. Data on the performance and the hardware cost of synthesizing procedural textures in FPGAs are presented. This paper also presents the FPGA implementations of two Perlin noise based 3-D procedural textures.

field programmable gate arrays | 2005

Using bus-based connections to improve field-programmable gate array density for implementing datapath circuits

Andy Ye; Jonathan Rose

As the logic capacity of Field-Programmable Gate Arrays (FPGAs) increases, they are being increasingly used to implement large arithmetic-intensive applications, which often contain a large proportion of datapath circuits. Since datapath circuits usually consist of regularly structured components (called bit-slices) which are connected together by regularly structured signals (called buses), it is possible to utilize datapath regularity in order to achieve significant area savings through FPGA architectural innovations. This paper describes such an FPGA routing architecture, called the multi-bit routing architecture, which employs bus-based connections in order to exploit datapath regularity. It is experimentally shown that, compared to conventional FPGA routing architectures, the multi-bit routing architecture can achieve 14% routing area reduction for implementing datapath circuits, which represents an overall FPGA area savings of 10%. This paper also empirically determines the best values of several important architectural parameters for the new routing architecture including the most area efficient granularity values and the most area efficient proportion of bus-based connections.

custom integrated circuits conference | 2003

Architecture of datapath-oriented coarse-grain logic and routing for FPGAs

Andy Ye; Jonathan Rose; David Lewis

In this paper, we propose a new datapath-oriented FPGA architecture that utilizes coarse-grain logic and routing resources to increase the area efficiency of datapath circuits. Using a set of custom-built datapath-oriented CAD tools and a set of datapath benchmarks, we investigated several variants of our proposed architecture. We found that the architecture achieves the highest area efficiency when 40% to 50% of the total routing tracks are coarse-grain. Furthermore, compared to conventional FPGA architectures, our datapath-oriented architecture uses about 10% less area to implement the same circuits.

field-programmable technology | 2004

Using multi-bit logic blocks and automated packing to improve field-programmable gate array density for implementing datapath circuits

Andy Ye; Jonathan Rose

As the logic capacity of field-programmable gate arrays (FPGAs) increases, they are being increasingly used to implement large arithmetic-intensive applications, which often contain a large proportion of datapath circuits. Since datapath circuits usually consist of regularly structured components, called bit-slices, it is possible to utilize datapath regularity in order to achieve significant area savings through FPGA architectural innovations. This work describes such an FPGA logic block architecture, called a multi-bit logic block, which employs configuration memory sharing to exploit datapath regularity. It is experimentally shown that, compared to conventional FPGA logic blocks, the multi-bit logic blocks can achieve 18% to 26% logic block area reduction for implementing datapath circuits, which represents an overall FPGA area saving of 5% to 13%. A packing algorithm for the multi-bit logic block architecture is also proposed in this paper, and it is used to empirically find the best values for several important architectural parameters of the new architecture, including the most area efficient granularity values and the most area efficient amount of configuration memory sharing.

field-programmable technology | 2002

Synthesizing datapath circuits for FPGAs with emphasis on area minimization

Andy Ye; Jonathan Rose; David Lewis

Large circuits, whether they are arithmetic, digital signal processing, switching, or processors, typically contain a greater portion of highly regular datapath logic. Datapath synthesis algorithms preserve these regular structures, so they can be exploited by packing, placement, and routing tools for speed or density. Typical datapath synthesis algorithms, however, sacrifice area to gain regularity. Current algorithms can have as much as 30% to 40% area inflation when compared with traditional flat synthesis algorithms. This paper describes a datapath synthesis algorithm with very low area overhead, which is an enhancement to the module compaction algorithm. We propose two word-level optimizations: multiplexer tree collapsing and operation reordering. They reduce the area inflation to 3%-8% as compared with flat synthesis. Our synthesis results also retain a significant amount of regularity from the original designs.

field-programmable logic and applications | 2008

A scalable computing and memory architecture for variable block size motion estimation on Field-Programmable Gate Arrays

Theepan Moorthy; Andy Ye

In this paper, we investigate the use of field-programmable gate arrays (FPGAs) in the design of a highly scalable variable block size motion estimation architecture for the H.264/AVC video encoding standard. The scalability of the architecture allows one to incorporate the system into low cost single FPGA solutions for low-resolution video encoding applications as well as into high performance multi-FPGA solutions targeting high-resolution applications. To overcome the performance gap between FPGAs and application specific integrated circuits, our algorithm intelligently increases its parallelism as the design scales while minimizing the use of memory bandwidth. The core computing unit of the architecture is implemented on FPGAs and its performance is reported. It is shown that the computing unit is able to achieve 28 frames per second (fps) performance for 640x480 resolution VGA video while incurring only 4% device utilization on a Xilinx XC5VLX330 FPGA. With 8 computing units at 37% device utilization, the architecture is able to achieve 31 fps performance for encoding full 1920x1088 progressive HDTV video.
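The throughput and utilization figures quoted in this abstract can be cross-checked with a short calculation. The sketch below uses only the numbers stated above; the names `pixel_rate` and `per_unit_rate`, and the idea of comparing per-unit pixel rates as a rough scaling measure, are illustrative assumptions rather than metrics defined by the paper.

```python
# Figures quoted in the abstract: (width, height, frames per second).
VGA = (640, 480, 28)      # one computing unit, 4% device utilization
HDTV = (1920, 1088, 31)   # eight computing units, 37% device utilization


def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels processed per second at the given resolution and frame rate."""
    return width * height * fps


def per_unit_rate(width: int, height: int, fps: int, units: int) -> float:
    """Pixel rate divided evenly across the computing units."""
    return pixel_rate(width, height, fps) / units


single = per_unit_rate(*VGA, units=1)
scaled = per_unit_rate(*HDTV, units=8)

# Ratio of per-unit throughput at 8 units to throughput of a single unit.
scaling_efficiency = scaled / single

# Utilization beyond 8 copies of the 4% unit, in percentage points.
utilization_overhead = 37 - 8 * 4

print(f"Single unit: {single:,.0f} pixels/s")
print(f"Per unit at 8 units: {scaled:,.0f} pixels/s")
print(f"Scaling efficiency: {scaling_efficiency:.1%}")
print(f"Utilization overhead: {utilization_overhead} percentage points")
```

By this rough measure, each of the eight units sustains about 94% of the single-unit pixel rate, and the eight-unit design uses about 5 percentage points more of the device than eight standalone units would.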

IEEE Transactions on Very Large Scale Integration Systems | 2011

The Effect of Multi-Bit Correlation on the Design of Field-Programmable Gate Array Routing Resources

Phoebe Ping Chen; Andy Ye

As the logic capacity of field-programmable gate arrays (FPGAs) increases, they are being increasingly used to implement large arithmetic-intensive applications. Large arithmetic-intensive applications often contain a large proportion of datapath circuits. Since datapath circuits are designed to process multiple-bit-wide data, FPGAs implementing these circuits often have to transport a large amount of multiple-bit-wide signals from one computing element (such as a logic block, a DSP block, or a multi-bit addressable memory cell) to another. In this work, we investigate the area efficiency of FPGA routing resources for transporting multiple-bit-wide signals. It is shown that, for datapath circuits, the switch patterns used by the conventional routing architecture, which uniformly distribute routing switches across the routing tracks, are inefficient for connecting the computing elements to their tracks. The more efficient multi-bit aware patterns, which contain a densely populated single-bit region and a sparsely populated multi-bit region, can be effectively used to reduce the routing area of FPGAs for implementing arithmetic-intensive applications by 6%-10%. It is also shown that the further sharing of configuration memory among the switches within the multi-bit aware patterns does not significantly increase their area efficiency, since datapath circuits typically contain a mixture of multi-bit and single-bit signals: while configuration memory sharing can substantially increase the area efficiency of routing resources for transporting multi-bit signals, it also significantly reduces their ability to transport single-bit signals. More importantly, configuration memory sharing can significantly reduce the effectiveness of the enhanced multi-bit aware patterns, that is, patterns that incorporate both multi-bit aware and single-bit oriented switches within a single region in order to increase its ability to transport both single-bit and multi-bit signals.