Publication


Featured research published by Huazhong Yang.


IEEE Transactions on Circuits and Systems II: Express Briefs | 2011

An EScheduler-Based Data Dependence Analysis and Task Scheduling for Parallel Circuit Simulation

Xiaoming Chen; Wei Wu; Yu Wang; Hao Yu; Huazhong Yang

The sparse matrix solver has become the bottleneck in Simulation Program with Integrated Circuit Emphasis (SPICE) circuit simulators. The solver is difficult to parallelize because of the high data dependence during numerical LU factorization. In this brief, a parallel LU factorization algorithm based on KLU is developed for shared-memory computers with multicore central processing units. An Elimination Scheduler (EScheduler) is proposed to represent the data dependence during the LU factorization. Based on the EScheduler, the parallel tasks are scheduled in two modes, cluster mode and pipeline mode, to achieve a high level of concurrency. The experimental results on 26 circuit matrices show that the developed algorithm achieves speedups of 1.18× to 4.55× (geometric mean) over KLU with 1 to 8 threads. The analysis indicates that different parallel strategies should be dynamically selected for different data dependence to obtain optimal performance.
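
The dependence idea in this abstract can be illustrated with a generic level-scheduling sketch. It is not the EScheduler itself, whose details are not given here. It assumes a column-based (left-looking) factorization in which column k must wait for column j whenever U(j, k) is nonzero; the function name `column_levels` and the input layout are hypothetical.

```python
from collections import defaultdict


def column_levels(upper_pattern, n):
    """Group columns into dependence levels.

    upper_pattern[k] is the set of indices j < k with U[j, k] != 0
    (an assumed input layout). Column k depends on every such column j.
    Columns that share a level do not depend on each other.
    """
    level = [0] * n
    for k in range(n):
        deps = upper_pattern.get(k, ())
        # Dependencies always point to earlier columns, so their levels
        # are already final when column k is visited.
        level[k] = 1 + max((level[j] for j in deps), default=-1)

    groups = defaultdict(list)
    for k, lv in enumerate(level):
        groups[lv].append(k)
    return [groups[lv] for lv in sorted(groups)]


# Hypothetical 5-column pattern: column 2 needs columns 0 and 1,
# column 3 needs column 2, column 4 needs column 0.
print(column_levels({2: {0, 1}, 3: {2}, 4: {0}}, 5))
```

Columns in the same level could be factorized concurrently, while a long chain of single-column levels leaves little parallelism at that granularity.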


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2013

NICSLU: An Adaptive Sparse Matrix Solver for Parallel Circuit Simulation

Xiaoming Chen; Yu Wang; Huazhong Yang

The sparse matrix solver has become a bottleneck in Simulation Program with Integrated Circuit Emphasis (SPICE)-like circuit simulators. It is difficult to parallelize the solver because of the high data dependency during the numeric LU factorization and the irregular structure of circuit matrices. This paper proposes an adaptive sparse matrix solver called NICSLU, which uses a multithreaded parallel LU factorization algorithm on shared-memory computers with multicore/multisocket central processing units to accelerate circuit simulation. The solver can be used in any SPICE-like circuit simulator. A simple method is proposed to predict whether a matrix is suitable for parallel factorization, so that each matrix can achieve optimal performance. The experimental results on 35 matrices show that NICSLU achieves speedups of 2.08× to 8.57× (geometric mean) over KLU with 1 to 12 threads, for the matrices that are suitable for the parallel algorithm. NICSLU can be downloaded from http://nicslu.weebly.com.
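
Several of these abstracts report speedups as a geometric mean over a set of matrices. A minimal sketch of that aggregation follows; the helper names and the timing values in the usage lines are hypothetical and are not data from any of the papers.

```python
import math


def speedups(baseline_times, solver_times):
    """Per-matrix speedup: baseline runtime divided by solver runtime."""
    return [b / s for b, s in zip(baseline_times, solver_times)]


def geometric_mean(values):
    """Geometric mean of positive values, computed through logarithms."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one positive value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical runtimes in seconds for three matrices (illustration only).
baseline = [4.0, 9.0, 1.0]
parallel = [2.0, 1.5, 1.0]
print(geometric_mean(speedups(baseline, parallel)))
```

The geometric mean is commonly used for ratios such as speedups because one unusually large ratio shifts it less than it shifts the arithmetic mean.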


IEEE Transactions on Parallel and Distributed Systems | 2015

GPU-Accelerated Sparse LU Factorization for Circuit Simulation with Performance Modeling

Xiaoming Chen; Ling Ren; Yu Wang; Huazhong Yang

The sparse matrix solver based on LU factorization is a serious bottleneck in Simulation Program with Integrated Circuit Emphasis (SPICE)-based circuit simulators. State-of-the-art graphics processing units (GPUs) have numerous cores sharing the same memory, provide attractive memory bandwidth and compute capability, and support massive thread-level parallelism, so GPUs can potentially accelerate the sparse solver in circuit simulators. In this paper, an efficient GPU-based sparse solver for circuit problems is proposed. We develop a hybrid parallel LU factorization approach combining task-level and data-level parallelism on GPUs. Work partitioning, the number of active thread groups, and memory access patterns are optimized based on the GPU architecture. Experiments show that the proposed LU factorization approach on an NVIDIA GTX580 attains an average speedup of 7.02× (geometric mean) compared with sequential PARDISO, and 1.55× compared with 16-threaded PARDISO. We also investigate the bottlenecks of the proposed approach with a parametric performance model. The performance of the sparse LU factorization on GPUs is constrained by the global memory bandwidth, so the performance can be further improved by future GPUs with larger memory bandwidth.


Design Automation Conference | 2012

Sparse LU factorization for parallel circuit simulation on GPU

Ling Ren; Xiaoming Chen; Yu Wang; Chenxi Zhang; Huazhong Yang

The sparse solver has become the bottleneck of SPICE simulators. There has been little work on GPU-based sparse solvers because of the high data dependency. The strong data dependency means that parallel sparse LU factorization runs efficiently on shared-memory computing devices, but the number of CPU cores sharing the same memory is often limited. State-of-the-art graphics processing units (GPUs) naturally have numerous cores sharing the device memory and provide a possible solution to the problem. In this paper, we propose a GPU-based sparse LU solver for circuit simulation. We optimize the work partitioning, the number of active thread groups, and the memory access pattern based on the GPU architecture. On matrices whose factorization involves many floating-point operations, our GPU-based sparse LU factorization achieves a 7.90× speedup over a 1-core CPU and a 1.49× speedup over an 8-core CPU. We also analyze the scalability of parallel sparse LU factorization and investigate the specifications of CPUs and GPUs that most influence the performance.


Asia and South Pacific Design Automation Conference | 2012

An adaptive LU factorization algorithm for parallel circuit simulation

Xiaoming Chen; Yu Wang; Huazhong Yang

The sparse matrix solver has become the bottleneck in SPICE simulators. It is difficult to parallelize the solver because of the high data dependency during the numerical LU factorization. This paper proposes a parallel LU factorization algorithm (with partial pivoting) on shared-memory computers with multi-core CPUs to accelerate circuit simulation. Since not every matrix is suitable for the parallel algorithm, a predictive method is proposed to decide whether a matrix should use the parallel or the sequential algorithm. The experimental results on 35 circuit matrices show that the developed algorithm achieves speedups of 2.11× to 8.38× (geometric mean) over KLU with 1 to 8 threads, on the matrices that are suitable for the parallel algorithm. Our solver can be downloaded from http://nicslu.weebly.com.


Applied Reconfigurable Computing | 2011

FPGA accelerated parallel sparse matrix factorization for circuit simulations

Wei Wu; Yi Shan; Xiaoming Chen; Yu Wang; Huazhong Yang

Sparse matrix factorization is a critical step in circuit simulation, since it is time consuming and is computed repeatedly in the simulation flow. To accelerate the factorization of sparse matrices, a parallel CPU+FPGA-based architecture is proposed in this paper. While the preprocessing of the matrix is implemented on the CPU, the parallelism of numeric factorization is exploited by processing several columns of the sparse matrix simultaneously on a set of processing elements (PEs) in the FPGA. To meet the requirements of circuit simulation, we also modified the Gilbert/Peierls (G/P) algorithm and considered the scalability of our architecture. Experimental results on circuit matrices from the University of Florida Sparse Matrix Collection show that our architecture achieves speedups of 0.5× to 5.36× compared with KLU on the CPU.
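
For orientation, the column-by-column (left-looking) order that the Gilbert/Peierls algorithm follows can be sketched as below. This is a simplified dense-storage illustration with partial pivoting, not the modified G/P algorithm of the paper: it omits the symbolic step that predicts each column's nonzero pattern, and the function name `left_looking_lu` is hypothetical.

```python
def left_looking_lu(a):
    """Left-looking LU with partial pivoting on a dense list-of-lists matrix.

    Returns (L, U, perm) such that L times U equals the rows of `a`
    taken in the order given by perm. L has a unit diagonal.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    perm = list(range(n))  # perm[i] is the original row at position i

    for k in range(n):
        # Column k of A in the current row order.
        x = [float(a[perm[i]][k]) for i in range(n)]

        # Triangular solve using the finished columns 0..k-1 of L.
        for j in range(k):
            if x[j] != 0.0:
                for i in range(j + 1, n):
                    x[i] -= L[i][j] * x[j]

        # Partial pivoting among the not-yet-pivoted rows k..n-1.
        p = max(range(k, n), key=lambda i: abs(x[i]))
        if x[p] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            x[k], x[p] = x[p], x[k]
            perm[k], perm[p] = perm[p], perm[k]
            for j in range(k):  # keep earlier L columns in the same row order
                L[k][j], L[p][j] = L[p][j], L[k][j]

        # Store column k of U (rows 0..k) and column k of L (rows k..n-1).
        for i in range(k + 1):
            U[i][k] = x[i]
        L[k][k] = 1.0
        for i in range(k + 1, n):
            L[i][k] = x[i] / x[k]

    return L, U, perm
```

Each column only reads columns of L that are already finished, which is why independent columns can be assigned to separate processing elements.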


Design, Automation, and Test in Europe | 2015

A fast parallel sparse solver for SPICE-based circuit simulators

Xiaoming Chen; Yu Wang; Huazhong Yang

The sparse solver is a serious bottleneck in SPICE-based circuit simulators. Although several existing studies have proposed parallel solvers oriented toward circuit simulation, there is still room to improve the speed and scalability of these solvers. This paper proposes a fast parallel sparse solver based on a pivoting-reduction technique that takes full advantage of the features of circuit simulation. Experimental results show that on average, the proposed solver is up to 50% faster than the state-of-the-art solver NICSLU, and up to 3.3× faster than KLU. Real DC simulation shows that our solver is faster than NICSLU, PARDISO, and commercial solvers.


Irregular Applications: Architectures and Algorithms | 2013

Nonzero pattern analysis and memory access optimization in GPU-based sparse LU factorization for circuit simulation

Xiaoming Chen; Du Su; Yu Wang; Huazhong Yang

The sparse matrix solver is a critical component in circuit simulators. Some studies have developed GPU-based LU factorization approaches to accelerate the sparse solver, but the performance of these solvers is constrained by the irregularities of sparse matrices. This work investigates the nonzero patterns and memory access patterns in sparse LU factorization, and explores their common features to give guidelines for improving GPU solvers. We further propose a crisscross blocked implementation on GPUs. The proposed method attains average speedups of 1.68× compared with the unblocked method and 2.2× compared with 4-threaded PARDISO, for circuit matrices.


Design, Automation, and Test in Europe | 2016

Sparsity-oriented sparse solver design for circuit simulation

Xiaoming Chen; Lixue Xia; Yu Wang; Huazhong Yang

The sparse solver is a critical component in circuit simulators. The widely used solver KLU is based on a pure column-level algorithm. In this paper, we show by experiments that KLU is not always the best algorithm for circuit matrices. We also demonstrate that the optimal algorithm strongly depends on the sparsity of the matrix. Two sparse LU factorization algorithms are proposed, one for extremely sparse matrices and one for dense matrices. A simple but effective strategy is proposed to select the optimal algorithm according to the sparsity. By combining the two new algorithms with the selection method, the proposed solver achieves much higher performance than both KLU and PARDISO.


International Parallel and Distributed Processing Symposium | 2012

Parallel Circuit Simulation on Multi/Many-core Systems

Xiaoming Chen; Yu Wang; Huazhong Yang

SPICE is widely used for transistor-level circuit simulation. However, with the growing complexity of VLSI at the nanoscale, the traditional SPICE simulator has become too inefficient to provide accurate verification. This thesis aims to accelerate transistor-level simulation on multi/many-core systems by addressing three problems: 1) developing a parallel sparse LU factorization algorithm for circuit simulation, 2) implementing the matrix solver on GPU to further accelerate the solver, and 3) developing a circuit-partitioning-based parallel simulation approach on distributed machines to obtain better scalability. The experimental results show that the proposed parallel LU factorization algorithm effectively accelerates the matrix solver for circuit simulation on both CPU and GPU.

Collaboration


Dive into Huazhong Yang's collaboration.

Top Co-Authors

Wei Wu

University of California

Ling Ren

Massachusetts Institute of Technology

Du Su

Tsinghua University

Hao Yu

Nanyang Technological University
