Charlie Chung-Ping Chen
National Taiwan University
Publication
Featured research published by Charlie Chung-Ping Chen.
Design Automation Conference | 2001
Tsung-Hao Chen; Charlie Chung-Ping Chen
In this paper, we propose preconditioned Krylov-subspace iterative methods to perform efficient DC and transient simulations for large-scale linear circuits with an emphasis on power delivery circuits. We also prove that a circuit with inductors can be simplified from modified nodal analysis (MNA) to nodal analysis (NA) format, and that the resulting matrix is symmetric positive definite (s.p.d.). This property makes it suitable for the conjugate gradient method with incomplete Cholesky decomposition as the preconditioner, which is faster than other direct and iterative methods. Extensive experimental results on large-scale industrial power grid circuits show that our method is over 200 times faster for DC analysis and around 10 times faster for transient simulation compared to SPICE3. Furthermore, our algorithm uses over 75% less memory than SPICE3 without compromising accuracy.
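The solver idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the mesh builder `grid_conductance_matrix`, the pad conductance `g_pad`, the dense list-of-lists storage, and the 5 x 5 test mesh are all assumptions chosen to keep the example small. It builds an s.p.d. nodal conductance matrix for a resistive mesh, computes a zero-fill incomplete Cholesky factor, and uses it as the preconditioner in conjugate gradient for a DC solve.

```python
import math

def grid_conductance_matrix(n, g=1.0, g_pad=0.1):
    """Nodal conductance matrix of an n x n resistive mesh (hypothetical test case).
    Every node has a small conductance g_pad to ground, which makes the matrix s.p.d."""
    size = n * n
    A = [[0.0] * size for _ in range(size)]
    for r in range(n):
        for c in range(n):
            i = r * n + c
            A[i][i] += g_pad
            for dr, dc in ((0, 1), (1, 0)):  # stamp right and down neighbours once
                rr, cc = r + dr, c + dc
                if rr < n and cc < n:
                    j = rr * n + cc
                    A[i][i] += g
                    A[j][j] += g
                    A[i][j] -= g
                    A[j][i] -= g
    return A

def incomplete_cholesky(A):
    """IC(0): Cholesky factor restricted to the nonzero pattern of A (no fill-in)."""
    size = len(A)
    L = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            if A[i][j] == 0.0:
                continue  # position is zero in A, so it stays zero in L
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_llt(L, r):
    """Apply the preconditioner: solve (L L^T) z = r by forward/back substitution."""
    size = len(L)
    y = [0.0] * size
    for i in range(size):
        y[i] = (r[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    z = [0.0] * size
    for i in reversed(range(size)):
        z[i] = (y[i] - sum(L[k][i] * z[k] for k in range(i + 1, size))) / L[i][i]
    return z

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, L, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient for A x = b; returns (x, iterations used)."""
    x = [0.0] * len(b)
    r = b[:]
    z = solve_llt(L, r)
    p = z[:]
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) < tol:
            return x, it
        z = solve_llt(L, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# DC solve: inject 1 A at the centre node of a 5 x 5 mesh.
A = grid_conductance_matrix(5)
b = [0.0] * 25
b[12] = 1.0
L = incomplete_cholesky(A)
v, iters = pcg(A, b, L)
```

A production solver would use sparse storage so that both the factorization and each iteration scale with the number of nonzeros; the dense lists here are only for readability.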
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2002
Ting-Yuan Wang; Charlie Chung-Ping Chen
Recent studies show that nonuniform thermal distribution affects not only the substrate but also the interconnects. Hence, three-dimensional (3-D) thermal analysis is crucial for analyzing these effects. In this paper, the authors present an efficient 3-D transient thermal simulator based on the alternating direction implicit (ADI) method for temperature estimation in a 3-D environment. Their simulator, 3D Thermal-ADI, not only has linear runtime and memory requirements, but is also unconditionally stable. A detailed analysis of the 3-D nonhomogeneous cases and boundary conditions for on-chip VLSI applications is presented. Extensive experimental results show that the algorithm is not only orders of magnitude faster than traditional thermal simulation algorithms but also highly accurate and memory efficient. The steady-state temperature profile can also be reached in several iterations. This software will be released via the web for public usage.
International Conference on Computer Aided Design | 2005
Jeng-Liang Tsai; Lizheng Zhang; Charlie Chung-Ping Chen
Process variations cause significant timing uncertainty and yield degradation in deep sub-micron technologies. A solution to counter timing uncertainty is post-silicon clock tuning. Existing design approaches for post-silicon-tunable (PST) clock-tree synthesis usually insert a PST clock buffer for each flip-flop or put PST clock buffers across an entire level of a clock-tree. This can cause significant over-design and long tuning time. In this paper, we propose to insert PST clock buffers at both internal and leaf nodes of a clock-tree and use a bottom-up algorithm to reduce the number of candidate PST clock buffer locations. We then provide two statistical-timing-driven optimization algorithms to reduce the hardware cost of a PST clock-tree. Experimental results on ISCAS89 benchmark circuits show that our algorithms achieve up to a 90% reduction in area or in the number of tunable clock buffers compared to existing design methods.
International Conference on Computer Aided Design | 2004
Jeng-Liang Tsai; Dong Hyun Baik; Charlie Chung-Ping Chen; Kewal K. Saluja
In deep sub-micron technologies, process variations can cause significant path delay and clock skew uncertainties, thereby leading to timing failure and yield loss. In this paper, we propose a comprehensive clock scheduling methodology that improves timing and yield through both pre-silicon clock scheduling and post-silicon clock tuning. First, an optimal clock scheduling algorithm is developed to allocate the slack for each path according to its timing uncertainty. To balance the skew that can be caused by process variations, programmable delay elements are inserted at the clock inputs of a small set of flip-flops on the timing-critical paths. A delay-fault testing scheme combined with linear programming is used to identify and eliminate timing violations in the manufactured chips. Experimental results show that our methodology achieves substantial yield improvement over a traditional clock scheduling algorithm in many of the ISCAS89 benchmark circuits, and obtains an average yield improvement of 13.6%.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2005
Yu-Min Lee; Yahong Cao; Tsung Hao Chen; Janet Meiling Wang; Charlie Chung-Ping Chen
This paper proposes a general hierarchical analysis methodology, HiPRIME, to efficiently analyze RLKC power delivery systems. After partitioning the circuits into blocks, we develop and apply the IEKS (Improved Extended Krylov Subspace) method to build the multiport Norton equivalent circuits, which transform all the internal sources to Norton current sources at ports. Since there are no active elements inside the Norton circuits, passive or realizable model order reduction techniques such as PRIMA can be applied. We observe a significant speed improvement: 700 times faster than Spice with less than 0.2% error, and 7 times faster than a state-of-the-art solver, InductWise. To further reduce the top-level hierarchy runtime, we develop a second-level model reduction algorithm and prove its passivity.
IEEE Transactions on Very Large Scale Integration Systems | 2003
Ting-Yuan Wang; Charlie Chung-Ping Chen
Due to the dramatic increase of clock frequency and integration density, power density and on-chip temperature in high-end very large scale integration (VLSI) circuits rise significantly. To ensure the timing correctness and the reliability of high-end VLSI design, efficient and accurate chip-level transient thermal simulations are of crucial importance. In this paper, we develop and present an efficient transient thermal-simulation algorithm based on the alternating-direction-implicit (ADI) method. Our algorithm, thermal-ADI, not only has a linear run time and memory requirement, but is also unconditionally stable, which ensures that the time step is not limited by any stability requirement. Extensive experimental results show that our algorithm is not only orders of magnitude faster than the traditional thermal-simulation algorithms, but also highly accurate and efficient in memory usage.
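To make the ADI idea concrete, here is a minimal two-dimensional sketch in the Peaceman-Rachford form. It is an illustration under stated assumptions, not the thermal-ADI code: the homogeneous material, the fixed ambient boundary (temperature rise 0 outside the grid), and the names `thomas`, `adi_step`, `r`, and `src` are all hypothetical. Each half-step is implicit in one direction only, so it reduces to independent tridiagonal solves that cost time linear in the number of grid points.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system in O(n).
    a: sub-diagonal, b: diagonal, c: super-diagonal, d: right-hand side."""
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def adi_step(T, r, src):
    """One full ADI time step on an n x n interior grid.
    r   = alpha * dt / (2 * h**2)  (assumed uniform diffusivity alpha, spacing h)
    src = (dt / 2) * heat source term at each grid point."""
    n = len(T)
    a = [-r] * n
    b = [1.0 + 2.0 * r] * n
    c = [-r] * n

    def at(U, i, j):
        # Points outside the grid sit at ambient temperature (rise of 0).
        return U[i][j] if 0 <= i < n and 0 <= j < n else 0.0

    # Half-step 1: implicit along each row (index j), explicit across rows (index i).
    half = []
    for i in range(n):
        rhs = [T[i][j] + r * (at(T, i - 1, j) - 2.0 * T[i][j] + at(T, i + 1, j))
               + src[i][j] for j in range(n)]
        half.append(thomas(a, b, c, rhs))

    # Half-step 2: implicit along each column (index i), explicit along rows (index j).
    new = [[0.0] * n for _ in range(n)]
    for j in range(n):
        rhs = [half[i][j] + r * (at(half, i, j - 1) - 2.0 * half[i][j] + at(half, i, j + 1))
               + src[i][j] for i in range(n)]
        col = thomas(a, b, c, rhs)
        for i in range(n):
            new[i][j] = col[i]
    return new

def energy(U):
    return sum(v * v for row in U for v in row)

# A hot spot relaxing with no heat source, using a time step far beyond
# the explicit-scheme stability limit (which would require 2 * r <= 1/4).
n = 8
T0 = [[0.0] * n for _ in range(n)]
T0[3][3] = 1.0
zero_src = [[0.0] * n for _ in range(n)]
T = T0
for _ in range(20):
    T = adi_step(T, 50.0, zero_src)
```

With no source, the sum of squared temperature rises shrinks at every step regardless of how large `r` is, which is the practical meaning of unconditional stability in this toy setting.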
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2004
Jeng-Liang Tsai; Tsung-Hao Chen; Charlie Chung-Ping Chen
Clock distribution is crucial for timing and design convergence in high-performance very large scale integration designs. Minimum-delay/power zero skew buffer insertion/sizing and wire-sizing problems have long been considered intractable. In this paper, we present ClockTune, a simultaneous buffer insertion/sizing and wire-sizing algorithm which guarantees zero skew and minimizes delay and power in polynomial time. Extensive experimental results show that our algorithm executes very efficiently. For example, ClockTune achieves 45× delay improvement for buffering and sizing an industrial clock tree with 3101 sink nodes on a 1.2-GHz Pentium IV PC in 16 min, compared with the initial routing. Our algorithm can also be used to achieve useful clock skew to facilitate timing convergence and to incrementally adjust the clock tree for design convergence and explore delay-power tradeoffs during design cycles. ClockTune is available on the web (http://vlsi.ece.wisc.edu/Tools.htm).
International Conference on Computer Aided Design | 2002
Tsung-Hao Chen; Clement Luk; Hyungsuk Kim; Charlie Chung-Ping Chen
We develop a robust, efficient, and accurate tool, called INDUCTWISE, which integrates inductance extraction and simulation. This paper advances the state-of-the-art inductance extraction and simulation techniques and contains two major parts. In the first part, the INDUCTWISE extractor, we show that the recently proposed inductance matrix sparsification algorithm, the K-method [1], despite its great efficiency benefits, has a major stability flaw. We provide both a counterexample and a remedy for it. A window section algorithm is also presented to preserve the accuracy of the sparsification method. The second part, the INDUCTWISE simulator, demonstrates the great efficiency of integrating the nodal analysis formulation with the improved K-method. Experimental results show that INDUCTWISE has over 250x speedup compared to SPICE3. The proposed sparsification algorithm accelerates the simulator by another 175x and speeds up the extractor by 23.4x within 0.1% of error. INDUCTWISE can extract and simulate a 118K-conductor RKC circuit within 18 minutes. It has been well tested and released on the web for public usage (http://vlsi.ece.wisc.edu/Inductwise.htm).
Design Automation Conference | 2002
Yahong Cao; Yu-Min Lee; Tsung-Hao Chen; Charlie Chung-Ping Chen
This paper proposes a general hierarchical analysis methodology, HiPRIME, to efficiently analyze RLKC power delivery systems. After partitioning the circuits into blocks, we develop and apply the IEKS (Improved Extended Krylov Subspace) method to build the multiport Norton equivalent circuits, which transform all the internal sources to Norton current sources at ports. Since there are no active elements inside the Norton circuits, passive or realizable model order reduction techniques such as PRIMA can be applied. We observe a significant speed improvement: 700 times faster than Spice with less than 0.2% error, and 7 times faster than a state-of-the-art solver, InductWise. To further reduce the top-level hierarchy runtime, we develop a second-level model reduction algorithm and prove its passivity.
Asia and South Pacific Design Automation Conference | 2005
Hsinwei Chou; Yu-Hao Wane; Charlie Chung-Ping Chen
Simultaneous gate-sizing with multiple V_t assignment for delay and power optimization is a complicated task in modern custom designs. In this work, we make the key contribution of a novel gate-sizing and multi-V_t assignment technique based on generalized Lagrangian relaxation. Experimental results show that our technique exhibits linear runtime and memory usage, and can effectively tune circuits with over 15,000 variables and 8,000 constraints in under 8 minutes (250× faster than state-of-the-art optimization solvers).
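As a rough illustration of Lagrangian relaxation applied to sizing, the toy sketch below minimizes a power proxy subject to one delay constraint. It is not the paper's algorithm and leaves out V_t assignment entirely. The models are assumptions made for the example: gate i has continuous size x_i, contributes c_i * x_i to power and a_i / x_i to path delay, and the multiplier is updated by a simple multiplicative rule. The function name `size_gates` and all numeric values are hypothetical.

```python
import math

def size_gates(a, c, d_max, lam=0.25, iters=60):
    """Toy Lagrangian relaxation for: minimize sum(c_i * x_i)
    subject to sum(a_i / x_i) <= d_max, with continuous sizes x_i > 0.

    For a fixed multiplier lam, the Lagrangian
        sum(c_i * x_i) + lam * (sum(a_i / x_i) - d_max)
    separates per gate, and setting its derivative to zero gives
        x_i = sqrt(lam * a_i / c_i).
    """
    def sizes(multiplier):
        return [math.sqrt(multiplier * ai / ci) for ai, ci in zip(a, c)]

    def path_delay(x):
        return sum(ai / xi for ai, xi in zip(a, x))

    for _ in range(iters):
        x = sizes(lam)
        # Assumed update rule: raise lam when the delay constraint is violated,
        # lower it when there is slack.
        lam *= path_delay(x) / d_max
    x = sizes(lam)
    return x, lam, path_delay(x)

# Three gates on one path with a delay budget of 6 (arbitrary units).
x_opt, lam_opt, delay = size_gates(a=[4.0, 1.0, 9.0], c=[1.0, 1.0, 1.0], d_max=6.0)
```

Because the inner minimization is a closed-form expression per gate, each outer iteration costs time linear in the number of gates, which is one reason relaxation-based sizing scales well. A subgradient step on the multiplier is a common alternative to the multiplicative update used here.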