Publication


Featured research published by Yiping Fan.


field programmable gate arrays | 2004

Application-specific instruction generation for configurable processor architectures

Jason Cong; Yiping Fan; Guoling Han; Zhiru Zhang

Designing an application-specific embedded system in nanometer technologies has become more difficult than ever due to the rapid increase in design complexity and manufacturing cost. Efficiency and flexibility must be carefully balanced to meet different application requirements. The recently emerged configurable and extensible processor architectures offer a favorable tradeoff between efficiency and flexibility, and a promising way to minimize certain important metrics (e.g., execution time and code size) of embedded processors. This paper addresses the problem of generating application-specific instructions to improve the execution speed of configurable processors. A set of algorithms, including pattern generation, pattern selection, and application mapping, is proposed to efficiently utilize the instruction set extensibility of the target configurable processor. Applications of our approach to several real-life benchmarks on the Altera Nios processor show encouraging performance speedup (2.75X on average and up to 3.73X in some cases).


Archive | 2008

AutoPilot: A Platform-Based ESL Synthesis System

Zhiru Zhang; Yiping Fan; Wei Jiang; Guoling Han; Changqi Yang; Jason Cong

The rapid increase of complexity in System-on-a-Chip design urges the design community to raise the level of abstraction beyond RTL. Automated behavior-level and system-level synthesis are naturally identified as next steps to replace RTL synthesis and will greatly boost the adoption of electronic system-level (ESL) design. High-level executable specifications, such as C, C++, or SystemC, are also preferred for system-level verification and hardware/software co-design.


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2004

Architecture and synthesis for on-chip multicycle communication

Jason Cong; Yiping Fan; Guoling Han; Xun Yang; Zhiru Zhang

For multigigahertz designs in nanometer technologies, data transfers on global interconnects take multiple clock cycles. In this paper, we propose a regular distributed register (RDR) microarchitecture, which offers high regularity and direct support of multicycle on-chip communication. The RDR microarchitecture divides the entire chip into an array of islands so that all local computation and communication within an island can be performed in a single clock cycle. Each island contains a cluster of computational elements, local registers, and a local controller. On top of the RDR microarchitecture, novel layout-driven architectural synthesis algorithms have been developed for multicycle communication, including scheduling-driven placement, placement-driven simultaneous scheduling with rebinding, and distributed control generation. Experiments on a number of real-life examples demonstrate promising results. For data flow intensive examples, we obtain a 44% improvement on average in terms of the clock period and a 37% improvement on average in terms of the final latency, over the traditional flow. For designs with control flow, our approach achieves a 28% clock-period reduction and a 23% latency reduction on average.
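The island idea in the abstract above can be illustrated with a small sketch. It is not taken from the paper: the `Island` type, the `transfer_cycles` function, and the cost model (one cycle inside an island, plus one cycle per island boundary crossed along a Manhattan route) are assumptions made only to show why distant transfers become multicycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Island:
    """Position of an island in the chip-wide array (hypothetical type)."""
    row: int
    col: int


def transfer_cycles(src: Island, dst: Island) -> int:
    """Estimate the clock cycles needed to move a value from src to dst.

    ASSUMED cost model, not the paper's: a transfer that stays inside one
    island takes a single cycle, and every island boundary crossed along a
    Manhattan route adds one more cycle.
    """
    hops = abs(src.row - dst.row) + abs(src.col - dst.col)
    return 1 + hops


if __name__ == "__main__":
    local = transfer_cycles(Island(0, 0), Island(0, 0))
    far = transfer_cycles(Island(0, 0), Island(2, 3))
    print(local, far)
```

Under this toy model, a scheduler would have to reserve several control steps for a transfer between far-apart islands, which is the situation the placement-driven scheduling steps in the abstract are meant to handle.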


international soc conference | 2006

Platform-Based Behavior-Level and System-Level Synthesis

Jason Cong; Yiping Fan; Guoling Han; Wei Jiang; Zhiru Zhang

With the rapid increase of complexity in system-on-a-chip (SoC) design, the electronic design automation (EDA) community is moving from RTL (Register Transfer Level) synthesis to behavioral-level and system-level synthesis. System-level verification and software/hardware co-design also favor behavior-level executable specifications, such as C or SystemC. In this paper we present the platform-based synthesis system, named xPilot, being developed at UCLA. The first objective of xPilot is to provide novel behavioral synthesis capability for automatically generating efficient RTL code from a C or SystemC description for a given system platform and optimizing the logic, interconnects, performance, and power simultaneously. The second objective of xPilot is to provide a platform-based system-level synthesis capability, including synthesis for both application-specific configurable processors and heterogeneous multi-core systems. Preliminary experiments on FPGAs demonstrate the efficacy of our approach on a wide range of applications and its value in exploring various design tradeoffs.


field programmable gate arrays | 2005

Instruction set extension with shadow registers for configurable processors

Jason Cong; Yiping Fan; Guoling Han; Ashok Jagannathan; Glenn Reinman; Zhiru Zhang

Configurable processors are becoming increasingly popular for modern embedded systems (especially for the field-programmable system-on-a-chip). While steady progress has been made in the tools and methodologies of automatic instruction set extension for configurable processors, the limited data bandwidth available in the core processor (e.g., the number of simultaneous accesses to the register file) becomes a potential performance bottleneck. In this paper we first present a quantitative analysis of the data bandwidth limitation in configurable processors, and then propose a novel low-cost architectural extension and associated compilation techniques to address the problem. The application of our approach results in a promising performance improvement.


asia and south pacific design automation conference | 2007

High-Level Power Estimation and Low-Power Design Space Exploration for FPGAs

Deming Chen; Jason Cong; Yiping Fan; Zhiru Zhang

In this paper, we present a simultaneous resource allocation and binding algorithm for FPGA power minimization. To fully validate our methodology and results, our work targets a real FPGA architecture, the Altera Stratix FPGA, which includes generic logic elements, DSP cores, and memories. We design a high-level power estimator for this architecture and evaluate its estimation accuracy against a commercial gate-level power estimator, the Quartus II PowerPlay Analyzer. During the synthesis stage, we pay special attention to interconnections and multiplexers. We concentrate on resource allocation and binding tasks because they are the key steps to determine the interconnections. We use a novel approach to explore the design space. Experimental results show that our high-level power estimator deviates from PowerPlay Analyzer by 8.7%. Meanwhile, we are able to achieve a significant amount of power reduction (32%) with better circuit speed (16%) compared to a traditional resource allocation and binding algorithm.


asia and south pacific design automation conference | 2005

Bitwidth-aware scheduling and binding in high-level synthesis

Jason Cong; Yiping Fan; Guoling Han; Yizhou Lin; Junjuan Xu; Zhiru Zhang; Xu Cheng

Many high-level description languages, such as C/C++ or Java, lack the capability to specify the bitwidth information for variables and operations. Synthesis from these specifications without bitwidth analysis may introduce wasted resources. Furthermore, conventional high-level synthesis techniques usually focus on uniform-width resources, thus they cannot obtain the full resource savings even with bitwidth information. This work develops a bitwidth-aware synthesis flow, including bitwidth analysis, scheduling and binding, and register allocation and binding, to exploit the multi-bitwidth nature of operations and variables for area-efficient designs. We also develop lower bound estimation to evaluate the efficiency of our proposed solutions for register allocation and binding. The flow is implemented in the MCAS synthesis system (Cong et al., 2004). Experimental results show that our proposed bitwidth-aware synthesis flow reduces area by 36% and wire-length by 52% on average compared to the uniform-width MCAS flow, while achieving the same performance.
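A toy example may help show why uniform-width resources waste area. The sketch below is not the MCAS flow: the function names, the input format (one list of operation bitwidths per control step), the greedy "i-th widest operation goes to unit i" binding rule, and the assumption that unit area grows linearly with bitwidth are all illustrative choices.

```python
def unit_widths_uniform(steps):
    """Size every functional unit for the widest operation in the design."""
    num_units = max(len(step) for step in steps)
    widest = max(width for step in steps for width in step)
    return [widest] * num_units


def unit_widths_bitwidth_aware(steps):
    """Bind the i-th widest operation of each control step to unit i.

    Each unit is then only as wide as the widest operation it ever executes.
    This greedy rule is an ASSUMED stand-in for a real binding algorithm.
    """
    num_units = max(len(step) for step in steps)
    widths = [0] * num_units
    for step in steps:
        for unit, width in enumerate(sorted(step, reverse=True)):
            widths[unit] = max(widths[unit], width)
    return widths


if __name__ == "__main__":
    # Hypothetical schedule: bitwidths of the additions in each control step.
    schedule = [[32, 8], [16, 4], [8, 8]]
    uniform = unit_widths_uniform(schedule)
    aware = unit_widths_bitwidth_aware(schedule)
    # With area ASSUMED proportional to bitwidth, compare total widths.
    print(uniform, sum(uniform))
    print(aware, sum(aware))
```

In this made-up schedule the second adder never sees an operand wider than 8 bits, so sizing it at 32 bits, as a uniform-width flow would, is pure overhead.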


design automation conference | 2004

Architecture-level synthesis for automatic interconnect pipelining

Jason Cong; Yiping Fan; Zhiru Zhang

For multi-gigahertz synchronous designs in nanometer technologies, multiple clock cycles are needed to cross the global interconnects, thus making it necessary to have pipelined global interconnects. In this paper we present an architecture-level synthesis solution to support automatic pipelining of on-chip interconnects. Specifically, we extend the recently proposed Regular Distributed Register (RDR) micro-architecture to support interconnect pipelining. We formulate a novel global interconnect sharing problem for global wiring minimization and show that it is polynomial time solvable by transformation to a special case of the real-time scheduling problem. Experimental results show that our approach matches or exceeds the RDR-based approach in performance, with a significant wiring reduction of 15% to 21%.
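To give a feel for interconnect sharing, the sketch below uses a deliberately simpler technique than the one in the abstract: classic interval partitioning, in place of the paper's transformation to a real-time scheduling problem. It assumes each transfer occupies a wire for a half-open cycle interval `(start, end)` and that two transfers may share a wire whenever their intervals do not overlap. The function name `assign_wires` and the input format are hypothetical.

```python
import heapq


def assign_wires(transfers):
    """Greedily assign each transfer to a wire, reusing wires when possible.

    transfers: list of (start, end) half-open cycle intervals.
    Returns a list giving the wire index of each transfer, in input order.
    Processing transfers by start time and reusing the wire that frees up
    earliest yields the minimum number of wires for this interval model.
    """
    order = sorted(range(len(transfers)), key=lambda i: transfers[i][0])
    busy = []  # min-heap of (cycle at which the wire becomes free, wire index)
    assignment = [None] * len(transfers)
    wires_used = 0
    for i in order:
        start, end = transfers[i]
        if busy and busy[0][0] <= start:
            # The earliest-free wire is idle by the time this transfer starts.
            _, wire = heapq.heappop(busy)
        else:
            wire = wires_used
            wires_used += 1
        assignment[i] = wire
        heapq.heappush(busy, (end, wire))
    return assignment


if __name__ == "__main__":
    # Hypothetical global transfers, each lasting two cycles.
    print(assign_wires([(0, 2), (1, 3), (2, 4), (3, 5)]))
```

Four transfers end up on two wires here, because at most two of them are in flight during any cycle.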


IEEE Transactions on Very Large Scale Integration Systems | 2010

LOPASS: A Low-Power Architectural Synthesis System for FPGAs With Interconnect Estimation and Optimization

Deming Chen; Jason Cong; Yiping Fan; Lu Wan

In this paper, we present a low-power architectural synthesis system (LOPASS) for field-programmable gate-array (FPGA) designs with interconnect power estimation and optimization. LOPASS includes three major components: 1) a flexible high-level power estimator for FPGAs considering the power consumption of various FPGA logic components and interconnects; 2) a simulated-annealing optimization engine that carries out resource selection and allocation, scheduling, functional unit binding, register binding, and interconnection estimation simultaneously to reduce power effectively; and 3) a k-cofamily-based register binding algorithm and an efficient port assignment algorithm that reduce interconnections in the data path through multiplexer optimization. The experimental results show that LOPASS produces promising results on latency optimization compared to an academic high-level synthesis tool SPARK. Compared to an early commercial high-level synthesis tool, namely, Synopsys Behavioral Compiler, LOPASS is 61.6% better on power consumption and 10.6% better on clock period on average. Compared to a current commercial tool, namely, Impulse C, LOPASS is 31.1% better on power reduction with an 11.8% penalty on clock period.


international conference on computer aided design | 2006

Platform-based resource binding using a distributed register-file microarchitecture

Jason Cong; Yiping Fan; Wei Jiang

Behavior synthesis and optimization beyond the register transfer level require an efficient utilization of the underlying platform features. This paper presents a platform-based resource-binding approach using a distributed register-file microarchitecture (DRFM) that makes efficient use of distributed embedded memory blocks as register files in modern FPGAs. A DRFM contains multiple islands, each having a local register file, a functional unit pool and data-routing logic. Compared with the traditional discrete-register counterpart, a DRFM allows use of the platform-featured on-chip memory or register-file IP blocks to implement its local register files, and this results in substantial saving of multiplexing logic and global interconnects. DRFM provides a useful architectural template and a direct optimization objective for minimizing inter-island connections for synthesis algorithms. Based on DRFM, we propose a novel binding algorithm focusing on the minimization of the inter-island connections. By applying our approach, significant reductions in multiplexers and global interconnections are observed. On the Xilinx Virtex II FPGA platform, our experimental results show a 2× logic area reduction and a 7.8% performance improvement, compared with the traditional discrete-register-based approach.

Collaboration

Top Co-Authors

Jason Cong, University of California

Guoling Han, University of California

Wei Jiang, University of California

Junjuan Xu, University of California

Xun Yang, University of California

Glenn Reinman, University of California