Yangdong Deng | Researchain

Publication

Featured research published by Yangdong Deng.

International Symposium on Physical Design | 2001

Interconnect characteristics of 2.5-D system integration scheme

Yangdong Deng; Wojciech Maly

The growing number of excessively long on-chip wires in modern monolithic ICs is a byproduct of growing chip size. To address this problem, instead of placing all system components in one layer (i.e., in 2-D space), one can use a stack of single-layer monolithic ICs (called here a 2.5-D integrated IC). To assess the potential benefits of such a 2.5-D integration scheme, this paper compares wire length distributions obtained for 2-D and 2.5-D implementations of benchmark circuits. In the assessment, two newly developed floorplanning and placement tools were used. Significant reductions in both total wirelength and worst-case wirelength were observed for the systems implemented as 2.5-D ICs.

International Conference on Computer Aided Design | 2009

Taming irregular EDA applications on GPUs

Yangdong Deng; Bo David Wang; Shuai Mu

Recently, general-purpose computing on graphics processing units (GPUs) has been rising as an exciting new trend in high-performance computing. Thus it is appealing to study the potential of GPUs for Electronic Design Automation (EDA) applications. However, EDA generally involves irregular data structures, such as sparse matrices and graphs, which pose significant challenges for efficient GPU implementations. In this paper, we propose high-performance GPU implementations for two important irregular EDA computing patterns, Sparse-Matrix Vector Product (SMVP) and graph traversal. On a wide range of EDA problem instances, our SMVP implementations outperform all published work and achieve a speedup of one order of magnitude over the CPU baseline. On this basis, both timing analysis and linear system solution can be considerably accelerated. We also introduce an SMVP-based formulation for Breadth-First Search and observe considerable speedup on GPU implementations. Our results suggest that the power of GPU computing can be successfully unleashed through designing GPU-friendly algorithms and/or re-organizing computing structures of current algorithms.
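The idea of expressing Breadth-First Search as repeated sparse matrix-vector products can be illustrated with a minimal CPU sketch. This is not the paper's GPU implementation: the CSR layout, the function names `csr_spmv` and `bfs_levels`, and the convention that row `v` of the matrix lists the in-neighbours of `v` are all assumptions made for illustration.

```python
def csr_spmv(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    n_rows = len(indptr) - 1
    y = [0] * n_rows
    for row in range(n_rows):
        acc = 0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def bfs_levels(indptr, indices, source):
    """Return BFS levels from `source`, or -1 for unreachable vertices.

    Assumption: the CSR arrays hold the transposed adjacency matrix,
    so row v lists the in-neighbours of vertex v.
    """
    n = len(indptr) - 1
    data = [1] * len(indices)      # unweighted graph: every stored entry is 1
    level = [-1] * n
    frontier = [0] * n             # indicator vector of the current frontier
    frontier[source] = 1
    level[source] = 0
    depth = 0
    while any(frontier):
        depth += 1
        # One SpMV marks every vertex with at least one in-neighbour in the frontier.
        reached = csr_spmv(indptr, indices, data, frontier)
        # Keep only vertices not visited before; they form the next frontier.
        frontier = [1 if reached[v] and level[v] < 0 else 0 for v in range(n)]
        for v in range(n):
            if frontier[v]:
                level[v] = depth
    return level


# Example graph: edges 0->1, 0->2, 1->3, 2->3; vertex 4 is isolated.
example_indptr = [0, 0, 1, 2, 4, 4]
example_indices = [0, 0, 1, 2]
example_levels = bfs_levels(example_indptr, example_indices, 0)
```

Each BFS step is a single product over the whole matrix, which trades extra arithmetic for a regular access pattern; that regularity is what makes the formulation attractive for data-parallel hardware.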

IEEE Transactions on Biomedical Engineering | 2012

A Two-Hop Wireless Power Transfer System With an Efficiency-Enhanced Power Receiver for Motion-Free Capsule Endoscopy Inspection

Tianjia Sun; Xiang Xie; Guolin Li; Yingke Gu; Yangdong Deng; Zhihua Wang

This paper presents a wireless power transfer system for motion-free capsule endoscopy inspection. Conventionally, a wireless power transmitter in a specifically designed jacket has to be connected to a strong power source with a long cable. To avoid the power cable and allow patients to walk freely in a room, this paper proposes a two-hop wireless power transfer system. First, power is transferred from a floor to a power relay in the patient's jacket via strong coupling. Next, power is delivered from the power relay to the capsule via loose coupling. Besides making patients much more comfortable, the proposed techniques eliminate the reliability issues arising from the moving cable and connectors. In the capsule, it is critical to enhance the power conversion efficiency. This paper develops a switch-mode rectifier (rectifying efficiency of 93.6%) and a power combination circuit (enhances combining efficiency by 18%). Thanks to the two-hop transfer mechanism and the novel circuit techniques, this system is able to transfer an average power of 24 mW and a peak power of 90 mW from the floor to a 13 mm × 27 mm capsule over a distance of 1 m with a maximum dc-to-dc power efficiency of 3.04%.

Asia and South Pacific Design Automation Conference | 2002

System-level point-to-point communication synthesis using floorplanning information [SoC]

Jingcao Hu; Yangdong Deng; Radu Marculescu

In this paper, we present a point-to-point (P2P) communication synthesis methodology for system-on-chip (SOC) design. We consider real-time systems where IP selection, mapping and task scheduling are already fixed. Our algorithm takes the communication task graph (CTG) and IP sizes as inputs and automatically synthesizes a P2P communication network that satisfies the specified deadlines of the application. As the main contribution, we first formulate the problem of automatic bitwidth synthesis, which minimizes total wirelength, and then propose an efficient heuristic to solve it. A key element in our approach is a communication-driven floorplanner that considers the communication energy consumption in the objective function. Experimental results show that, compared to a standard shared-bus architecture, significant power savings can be achieved by using the P2P scheme and communication-driven floorplanning. For instance, for an H.263 encoder we estimate 21.6% savings in energy and 15.1% in terms of wiring resources with an area overhead of only 4%.

IEEE Transactions on Very Large Scale Integration Systems | 2005

2.5-dimensional VLSI system integration

Yangdong Deng; Wojciech Maly

The excessive interconnection delay and fast increasing development cost, as well as complexity of the single-chip integration of different technologies, are likely to become the major stumbling blocks for the success of monolithic system-on-chips. To address the above problems, this paper investigates a new VLSI integration paradigm, the so-called 2.5-dimensional (2.5-D) integration scheme. Using this scheme, a VLSI system is implemented as a three-dimensional stacking of monolithic chips. A cost analysis framework was developed to justify the 2.5-D integration scheme from an economic point of view. Enabling technologies for the new integration scheme are also reviewed.

International Conference on Computer Design | 2003

Physical design of the "2.5D" stacked system

Yangdong Deng; Wojciech Maly

Excessive on-chip wire length and fast-increasing fabrication cost have been the main factors impairing the effectiveness of monolithic system-on-chip. We investigate a die-stacking-based system integration strategy (2.5D system integration) to address these problems. The new scheme is design-tools-enabled rather than technology-driven. We developed a layout design framework that is able to floorplan, place and route a VLSI design into stacked chips. Our results show that this new scheme has the potential to outperform its monolithic equivalent.

International Conference on Computer Communications | 2011

Exploiting graphics processors for high-performance IP lookup in software routers

Jin Zhao; Xinya Zhang; Xin Wang; Yangdong Deng; Xiaoming Fu

As physical link speeds grow and the size of routing tables continues to increase, IP address lookup has become a challenging problem at routers. There is growing demand for achieving high-performance IP lookup cost-effectively. Existing approaches typically resort to specialized hardware, such as TCAM. While these approaches can take advantage of hardware parallelism to achieve high-performance IP lookup, they also have the disadvantage of high cost. This paper investigates a new way to build a cost-effective IP lookup scheme using graphics processing units (GPUs). Our contribution here is to design a practical architecture for a high-performance IP lookup engine with GPUs, and to develop efficient algorithms for routing prefix update operations such as deletion, insertion, and modification. Leveraging the GPU's many-core parallelism, the proposed schemes address the challenges in designing IP lookup at GPU-based software routers. Our experimental results on real-world route traces show promising gains in IP lookup and update operations.
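For readers unfamiliar with the problem, the lookup and update operations mentioned above can be sketched with a plain binary trie doing longest-prefix matching. This is a generic CPU illustration and not the data structure or GPU algorithm from the paper; the class name `PrefixTrie`, its methods, and the IPv4-only restriction are assumptions made for this sketch.

```python
import ipaddress


class PrefixTrie:
    """Binary trie for IPv4 longest-prefix matching (illustrative only)."""

    def __init__(self):
        # Each node is a dict with optional keys "0", "1" (children) and "hop".
        self.root = {}

    @staticmethod
    def _bits(prefix):
        """Return the significant leading bits of an IPv4 prefix as a string."""
        net = ipaddress.ip_network(prefix)
        full = format(int(net.network_address), "032b")
        return full[:net.prefixlen]

    def insert(self, prefix, next_hop):
        """Insert a prefix, or modify its next hop if it already exists."""
        node = self.root
        for bit in self._bits(prefix):
            node = node.setdefault(bit, {})
        node["hop"] = next_hop

    def delete(self, prefix):
        """Remove the next hop of a prefix; do nothing if it is absent."""
        node = self.root
        for bit in self._bits(prefix):
            node = node.get(bit)
            if node is None:
                return
        node.pop("hop", None)

    def lookup(self, address):
        """Return the next hop of the longest matching prefix, or None."""
        bits = format(int(ipaddress.ip_address(address)), "032b")
        node = self.root
        best = node.get("hop")
        for bit in bits:
            node = node.get(bit)
            if node is None:
                break
            best = node.get("hop", best)  # remember the deepest match so far
        return best


table = PrefixTrie()
table.insert("10.0.0.0/8", "A")
table.insert("10.1.0.0/16", "B")
```

With this table, `table.lookup("10.1.2.3")` follows the more specific /16 entry, while an address such as `10.2.0.1` falls back to the /8 entry. Lookups for different packets are independent of one another, which is the property a many-core processor can exploit.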

ACM Transactions on Design Automation of Electronic Systems | 2011

Massively Parallel Logic Simulation with GPUs

Yuhao Zhu; Bo D. Wang; Yangdong Deng

In this article, we developed a massively parallel gate-level logic simulator to address the ever-increasing computing demand for VLSI verification. To the best of the authors’ knowledge, this work is the first to leverage the power of modern GPUs to successfully unleash the massive parallelism of a conservative discrete event-driven algorithm, the CMB algorithm. A novel data-parallel strategy is proposed to manipulate the fine-grain message passing mechanism required by the CMB protocol. To support robust and complete simulation for real VLSI designs, we establish both a memory paging mechanism and an adaptive issuing strategy to efficiently utilize the limited capacity of GPU memory. A set of GPU architecture-specific optimizations are performed to further enhance the overall simulation performance. On average, our simulator outperforms a CPU baseline event-driven simulator by a factor of 47.4X. This work shows that the CMB algorithm can be efficiently and effectively deployed on modern GPUs without the performance overhead that had hindered its successful application on previous parallel architectures.

Design Automation Conference | 2010

Distributed time, conservative parallel logic simulation on GPUs

Bo D. Wang; Yuhao Zhu; Yangdong Deng

Logic simulation is the primary method to verify the correctness of IC designs. However, today's complex VLSI designs place ever-higher demands on the throughput of logic simulators. In this work, a parallel logic simulator was developed by leveraging the computing power of modern graphics processing units (GPUs). To expose more parallelism, we implemented a conservative parallel simulation approach, the CMB algorithm, on NVIDIA GPUs. The simulation processing is mapped to GPU hardware at the finest granularity. With carefully designed data structures and data flow organizations, our GPU-based simulator could overcome many problems that hindered efficient implementations of the CMB algorithm on traditional parallel computers. In order to efficiently use the relatively limited capacity of GPU memory, a novel memory management mechanism was proposed to dynamically allocate and recycle GPU memory during simulation. We also introduced a CPU/GPU co-processing strategy for the best usage of computing resources. Experimental results showed that our GPU-based simulator could outperform a CPU baseline event-driven simulator by a factor of 29.2.

International Conference on Computer Aided Design | 2009

Layout-dependent STI stress analysis and stress-aware RF/analog circuit design optimization

Jiying Xue; Zuochang Ye; Yangdong Deng; Hongrui Wang; Liu Yang; Zhiping Yu

With the continuous shrinking of feature size, various effects due to shallow-trench-isolation (STI) stress are becoming more and more significant. The resulting nonuniform distribution of stress affects the MOSFET characteristics and hence changes the circuit behavior. This paper proposes a complete flow to characterize the influence of STI stress on the performance of RF/analog circuits based on layout design and process information. An accurate and efficient FEM-based stress simulator has been developed to handle the layout dependence. A comprehensive MOSFET model is also proposed to capture the effects of STI stress on mobility, threshold voltage, and leakage current. The influence of layout-dependent STI stress on circuit performance is further studied, and the corresponding optimization strategies for circuit design are discussed. A realistic PLL design realized in 90 nm CMOS technology is used as a test case for the proposed approach.
