Xiaoyong Tang | Researchain

Publication

Featured research published by Xiaoyong Tang.

design automation conference | 2004

Automatic translation of software binaries onto FPGAs

Gaurav Mittal; David Zaretsky; Xiaoyong Tang; Prithviraj Banerjee

The introduction of advanced FPGA architectures with built-in DSP support has given DSP designers a new hardware alternative. By exploiting their inherent parallelism, FPGAs are expected to outperform DSP processors. This paper describes the process and considerations for automatically translating binaries targeted for general DSP processors into Register Transfer Level (RTL) VHDL or Verilog code to be mapped onto commercial FPGAs. The Texas Instruments C6000 DSP processor architecture is chosen as the DSP processor platform, and the Xilinx Virtex II as the target FPGA. Various optimizations are discussed, including data dependency analysis, procedure extraction, induction variable analysis, memory optimizations, and scheduling. Experimental results on resource usage and performance are shown for ten software binary benchmarks. Results show performance gains of 3-20X for the FPGA designs over the DSP processors, measured as reductions in execution cycles.

great lakes symposium on vlsi | 2004

Macro-models for high level area and power estimation on FPGAs

Tianyi Jiang; Xiaoyong Tang; Prithviraj Banerjee

As more and more complex applications are implemented on FPGAs, high-level design tools are needed to reduce the design time. A good high-level synthesis tool usually has an automated design space exploration pass to determine the effects of various compiler optimizations on the area and power of the synthesized hardware. Such a pass needs early estimation of area and power. Towards this end, we have developed high-level equation-based area and power macro-models for various RTL operators such as adders, multipliers, and logical operators. The area model is parameterized with the bit width of the device, and the power model takes into account input switching activity and input spatial correlation as well as input bit width. These models are derived by actual synthesis of these RTL operators using back-end logic synthesis and place-and-route tools. Compared with other approaches, our method generates a uniform macro-model for each operator with fewer coefficients and sometimes lower degrees. It also makes the power sensitivity to different parameters easier to analyze. Experimental results show that these area and power models are accurate and efficient.
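The idea of an equation-based macro-model can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the sample data points, the linear form of the area model, the multiplicative form of the power model, and the coefficient values are invented and are not taken from the paper. The sketch fits an area model to hypothetical synthesis results by least squares and evaluates a hypothetical power model that depends on bit width, switching activity, and spatial correlation.

```python
# Illustrative macro-model sketch. All data, model forms, and coefficients
# are hypothetical; none of them come from the paper.

def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented "synthesis results" for an adder: bit width -> LUT count.
widths = [4, 8, 16, 32]
luts = [5, 9, 17, 33]

slope, intercept = fit_linear(widths, luts)

def estimate_area(bit_width):
    """Area macro-model parameterized only by operand bit width."""
    return slope * bit_width + intercept

def estimate_power(bit_width, switching_activity, spatial_correlation,
                   coeffs=(0.1, 0.5, -0.2)):
    """Hypothetical power macro-model: scales with bit width and depends
    linearly on input switching activity and input spatial correlation."""
    c0, c1, c2 = coeffs
    return bit_width * (c0 + c1 * switching_activity + c2 * spatial_correlation)

print(estimate_area(24))
print(estimate_power(8, 0.5, 0.0))
```

In a design space exploration pass, functions like these would be evaluated per operator instead of running synthesis and place-and-route for every candidate design.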

compilers, architecture, and synthesis for embedded systems | 2002

PACT HDL: a C compiler targeting ASICs and FPGAs with power and performance optimizations

Debabrata Bagchi; Satrajit Pal; Xiaoyong Tang; Alok N. Choudhary; Prithviraj Banerjee

Chip fabrication technology continues to plunge deeper into sub-micron levels requiring hardware designers to utilize ever-increasing amounts of logic and shorten design time. Toward that end, high-level languages such as C/C++ are becoming popular for hardware description and synthesis in order to more quickly leverage complex algorithms. Similarly, as logic density increases due to technology, power dissipation becomes a progressively more important metric of hardware design. PACT HDL, a C to HDL compiler, merges automated hardware synthesis of high-level algorithms with power and performance optimizations and targets arbitrary hardware architectures, particularly in a System on a Chip (SoC) setting that incorporates reprogrammable and application-specific hardware. PACT HDL is intended for applications well suited to custom hardware implementation such as image and signal processing codes. By making the compiler modular and flexible, optimizations may be executed in any order and at different levels in the compilation process. PACT HDL generates industry standard HDL codes, such as RTL Verilog and VHDL, which may be synthesized and profiled for power using commercial tools. This is the first paper on the PACT compiler project in a series. The compiler framework and introductory optimizations are presented. Later papers will focus on these and other optimizations in detail.

International Journal of Simulation and Process Modelling | 2006

Macro-models for high-level area and power estimation on FPGAs

Tianyi Jiang; Xiaoyong Tang; Prith Banerjee

This paper presents high-level equation-based area and power macro-models for various RTL operators on FPGAs. The area model is parameterised with the bit width of the device, and the power model takes into account input switching activity and input spatial correlation as well as input bit width. These models are derived by actual synthesis of these RTL operators using back-end logic synthesis and place-and-route tools. Compared with other approaches, our method generates a uniform macro-model for each operator with fewer coefficients and sometimes lower degrees. It also makes the power sensitivity to different parameters easier to analyse.

field-programmable custom computing machines | 2004

Overview of the FREEDOM compiler for mapping DSP software to FPGAs

David Zaretsky; Gaurav Mittal; Xiaoyong Tang; Prithviraj Banerjee

Applications that require digital signal processing (DSP) functions are typically mapped onto general purpose DSP processors. With the introduction of advanced FPGA architectures with built-in DSP support, a new hardware alternative is available for DSP designers. By exploiting their inherent parallelism, FPGAs are expected to outperform DSP processors. However, the migration of assembly code to hardware is typically a very arduous process. This paper describes the process and considerations for automatically translating software assembly and binary codes targeted for general DSP processors into register transfer level (RTL) VHDL or Verilog code to be mapped onto commercial FPGAs. The Texas Instruments C6000 DSP processor architecture has been used as the DSP processor platform, and the Xilinx Virtex II as the target FPGA. Various optimizations are discussed, including loop unrolling, induction variable analysis, memory and register optimizations, scheduling and resource binding. Experimental results on resource usage and performance are shown for ten software binary benchmarks in the signal processing and image processing domains. Results show performance gains for the FPGA designs over the DSP processors of 3-20x in terms of reductions in execution cycles and 1.3-5x in terms of reductions in execution times.

international conference on vlsi design | 2005

Behavioral synthesis of data-dominated circuits for minimal energy implementation

Xiaoyong Tang; Tianyi Jiang; Prithviraj Banerjee

This paper presents a power estimation and optimization approach in the early stage of behavioral synthesis for unscheduled data-dominated circuits. A methodology for estimating the power consumption of every module in the system is developed using an automatic construction of a novel switching table and the power table. An integer linear programming model is presented to reduce the energy consumption of the circuit through concurrent module selection, binding, and scheduling for a non-scheduled data path. Experimental results on six data-dominated benchmarks show that our technique achieves an average of 29.8% energy savings compared to a traditional area-optimal synthesis algorithm where energy is not considered. Additionally, this approach consumes on average 24.0% and 20.3% less energy than two other power-oriented optimization strategies, respectively.

IEEE International SOC Conference | 2003

Compiler optimizations in the PACT HDL behavioral synthesis tool for ASICs and FPGAs

Xiaoyong Tang; Tianyi Jiang; Prithviraj Banerjee

This paper describes the PACT HDL compiler, which allows users to develop algorithms in C and synthesize hardware designs onto FPGAs and ASICs. It also explicitly addresses low power issues during the high-level synthesis stages. Several power-saving compiler optimizations are discussed.

great lakes symposium on vlsi | 2004

Evaluation of scheduling and allocation algorithms while mapping assembly code onto FPGAs

David Zaretsky; Gaurav Mittal; Xiaoyong Tang; Prithviraj Banerjee

Migration of software from older general purpose embedded processors onto newer mixed hardware/software Systems-On-Chip (SOC) platforms is becoming an increasingly important topic. Automatic translation of general purpose software binaries and assembly code onto hardware implementations using FPGAs requires sophisticated scheduling and allocation algorithms to maximize the resource utilization of such hardware devices. This paper describes the effects of scheduling and chaining of node operations in a CDFG onto an FPGA. The effects of register allocation on scheduled nodes are also discussed. The Texas Instruments C6000 DSP processor architecture was chosen as the DSP processor platform for the assembly code, and the Xilinx Virtex II XC2V250 was chosen as the target FPGA. Results are reported on ten benchmarks, which show that scheduling with chaining operations produces the best results on FPGAs, while the addition of register allocation in fact generates poorer designs in terms of area and frequency.
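To make the notion of scheduling with chaining concrete, here is a minimal sketch of an as-soon-as-possible (ASAP) scheduler over a small dependency graph. It is not the algorithm from the paper: the function name `schedule`, the operation names, the delays, and the clock period are all hypothetical, and the sketch assumes that operations are listed in topological order and that no single operation is slower than one clock period. With chaining enabled, dependent operations share a clock cycle when their combined delay fits; with chaining disabled, every result is registered at a clock boundary before it can be consumed.

```python
# Illustrative ASAP scheduling sketch with optional operator chaining.
# Operation names, delays (in ns), and the clock period are hypothetical.

def schedule(ops, deps, delay, clock_period, chaining=True):
    """Return {op: cycle}. `ops` must be in topological order and each
    delay must not exceed `clock_period` (integers, same time unit)."""
    finish = {}  # absolute finish time of each operation
    cycle = {}   # clock cycle in which each operation starts
    for op in ops:
        # Earliest time at which all inputs are available.
        ready = max((finish[p] for p in deps.get(op, [])), default=0)
        if not chaining:
            # Inputs are registered: round up to the next clock boundary.
            ready = -(-ready // clock_period) * clock_period
        start_cycle = ready // clock_period
        # If the operation would cross the cycle boundary, defer it.
        if ready + delay[op] > (start_cycle + 1) * clock_period:
            start_cycle += 1
            ready = start_cycle * clock_period
        cycle[op] = start_cycle
        finish[op] = ready + delay[op]
    return cycle

ops = ["a", "b", "c"]            # a -> b -> c dependency chain
deps = {"b": ["a"], "c": ["b"]}
delay = {"a": 3, "b": 3, "c": 6}  # e.g. two adds and a multiply

chained = schedule(ops, deps, delay, clock_period=10, chaining=True)
unchained = schedule(ops, deps, delay, clock_period=10, chaining=False)
print(chained)
print(unchained)
```

In this toy case, chaining lets the two short operations share the first cycle, so the chain takes two cycles instead of three.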

field programmable gate arrays | 2004

High level area, delay and power estimation for FPGAs

Tianyi Jiang; Xiaoyong Tang; Prithviraj Banerjee

This paper describes an approach for high-level estimation of area, delay and power for FPGA synthesis. This approach has been integrated within the PACT compiler framework which has an automated design space exploration pass that determines the effects of various compiler optimizations on the synthesized hardware. Such a pass needs early estimation of area, delay and power. Towards this end, we have developed area and delay models for various RTL level operators such as adders, multipliers, and logical operators, which are parameterized with the bit widths of the devices. We have also derived high-level equation based power macro-models which take into account input switching activities, input spatial correlation and input bit width. These models are derived by actual synthesis of the RTL operators using back-end logic synthesis and place-and-route tools. Experimental results show that these area, delay and power models are accurate and efficient.

Archive | 2004