Jianlei Yang
Beihang University
Publications
Featured research published by Jianlei Yang.
International Conference on Computer-Aided Design | 2012
Jianlei Yang; Zuowei Li; Yici Cai; Qiang Zhou
Transient analysis is the most practical and effective approach for power grid validation, but it is very challenging for large-scale VLSI chips because it is time-consuming and requires large memory resources. In this paper, we propose a parallel transient simulation approach for efficient power grid analysis. First, we adopt a symmetric formulation of the NA equations of the RLC power grid to reduce memory usage; this also allows a fast Cholesky factorization solver to be used to improve simulation efficiency. Second, we perform partition-based parallel transient simulation on naturally independent subnets without loss of accuracy. Third, we propose a composite simulation flow for efficient and practical transient analysis of industrial power grids. Finally, our approach is evaluated on several industrial power grid benchmarks, achieving highly accurate transient simulation with extremely low memory consumption.
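A minimal sketch of why a symmetric formulation pays off in fixed-step transient simulation, assuming an RC-only grid (no inductors), diagonal node capacitances, and backward-Euler integration. With a fixed step h, the system matrix G + C/h is symmetric positive definite and constant, so a single Cholesky factorization can be reused at every time step. The function names and the two-node example are illustrative, not the paper's flow, which also handles RLC grids, partitioning, and parallelism.

```python
import math


def cholesky(a):
    """Dense Cholesky factorization A = L L^T of a symmetric positive definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][i] = math.sqrt(s)
            else:
                l[i][j] = s / l[j][j]
    return l


def cholesky_solve(l, b):
    """Solve L L^T x = b: forward substitution with L, then back substitution with L^T."""
    n = len(l)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def transient_backward_euler(g, c_diag, source, v0, h, steps):
    """Fixed-step backward-Euler transient of G v + C dv/dt = b(t).

    Each step solves (G + C/h) v_new = (C/h) v_old + b(t_new). Because h is
    fixed, the SPD matrix G + C/h is factored once and reused at every step.
    C is assumed diagonal (node-to-ground capacitors only).
    """
    n = len(g)
    a = [[g[i][j] + (c_diag[i] / h if i == j else 0.0) for j in range(n)] for i in range(n)]
    l = cholesky(a)  # one factorization for the whole run
    v = list(v0)
    history = [list(v)]
    for k in range(1, steps + 1):
        b = source(k * h)
        rhs = [c_diag[i] / h * v[i] + b[i] for i in range(n)]
        v = cholesky_solve(l, rhs)
        history.append(list(v))
    return history


# Two-node example: node 0 ties to a 1 V pad through 1 S, node 1 hangs off
# node 0 through 1 S and draws a constant 0.1 A load.
G = [[2.0, -1.0], [-1.0, 1.0]]
trace = transient_backward_euler(G, [1e-3, 1e-3], lambda t: [1.0, -0.1], [1.0, 1.0], 1e-3, 200)
print(trace[-1])  # settles to the DC solution, approximately [0.9, 0.8]
```

The per-step cost after the one-time factorization is just two triangular solves, which is the efficiency argument for keeping the matrix symmetric and the step size fixed.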
International Conference on Computer-Aided Design | 2011
Jianlei Yang; Zuowei Li; Yici Cai; Qiang Zhou
As power grids grow in size, IR drop analysis has become more computationally challenging in both runtime and memory consumption. In this paper, we propose a linear-complexity simulator named PowerRush, which consists of an efficient SPICE parser, a robust circuit builder, and a linear solver. The proposed solver is a purely algebraic method that can provide optimal convergence without geometric information. It is implemented as an Algebraic Multigrid Preconditioned Conjugate Gradient method, in which an aggregation-based algebraic multigrid with K-cycle acceleration is adopted as a preconditioner to improve the robustness of the conjugate gradient iteration. In the multigrid scheme, a double pairwise aggregation technique is applied to the matrix graph during coarsening to keep setup cost and memory requirements low. Further, the K-cycle multigrid scheme provides Krylov subspace acceleration at each level to guarantee optimal or near-optimal convergence. Experimental results on real power grids show that PowerRush has linear complexity in both runtime and memory consumption. The DC analysis of a 60-million-node power grid can be solved by PowerRush to 0.01 mV accuracy in 170 seconds using 21.89 GB of memory.
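The outer loop of the solver described above is preconditioned conjugate gradient. The sketch below shows that loop with a pluggable preconditioner; a simple Jacobi (diagonal) preconditioner is swapped in where PowerRush uses an aggregation-based AMG K-cycle, which is not implemented here. Dense lists keep the example self-contained; a real power grid solver would use sparse storage.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(a, x):
    return [dot(row, x) for row in a]


def pcg(a, b, apply_precond, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite matrix.

    apply_precond(r) must return z = M^{-1} r for an SPD preconditioner M.
    Returns (x, iterations).
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual b - A x for the initial guess x = 0
    b_norm = math.sqrt(dot(b, b)) or 1.0
    if math.sqrt(dot(r, r)) <= tol * b_norm:
        return x, 0
    z = apply_precond(r)
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [xk + alpha * pk for xk, pk in zip(x, p)]
        r = [rk - alpha * apk for rk, apk in zip(r, ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = apply_precond(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zk + beta * pk for zk, pk in zip(z, p)]
        rz = rz_new
    return x, max_iter


def jacobi(a):
    """Diagonal (Jacobi) preconditioner: a stand-in for an AMG cycle."""
    diag = [a[i][i] for i in range(len(a))]
    return lambda r: [rk / d for rk, d in zip(r, diag)]


# 5-node resistor chain with a pad at each end: conductance matrix tridiag(-1, 2, -1).
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(5)] for i in range(5)]
x, iters = pcg(A, [1.0, 0.0, 0.0, 0.0, 1.0], jacobi(A))
```

Replacing `jacobi(A)` with a function that applies one multigrid cycle to `r` is the change that would turn this into an AMG-preconditioned CG; the CG loop itself stays the same.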
International Conference on Computer-Aided Design | 2011
Jianlei Yang; Yici Cai; Qiang Zhou; Jin Shi
Robust and efficient algorithms for power grid analysis are crucial for both VLSI design and optimization. Due to the increasing size of power grids, IR drop analysis has become more computationally challenging in both runtime and memory consumption. This work presents a fast Poisson solver preconditioned method for unstructured power grids with non-ideal boundary conditions. By taking advantage of the analytical formulation of power grids, this analytical preconditioner can be viewed as a sparse approximate inverse technique. By combining this analytical preconditioner with the robust conjugate gradient method, we demonstrate that this approach is robust for extremely large-scale power grid simulations. Experimental results show that the number of iterations of the proposed method hardly increases with grid size once the pad density and the range of metal resistance values have been fixed. We demonstrate that this approach solves an unstructured power grid with 2.56M nodes in only one-third of the iterations required by the classical ICCG solver, and achieves an almost 20X runtime speedup over the classical ICCG solver.
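The core of a fast Poisson preconditioner is an analytical solve on a uniform grid with ideal boundaries. As a one-dimensional sketch (an assumption; the paper works on unstructured grids with non-ideal boundaries), a uniform resistor chain of n nodes with pads just beyond both ends has the conductance matrix g * tridiag(-1, 2, -1), which the discrete sine transform diagonalizes. `make_preconditioner` is a hypothetical helper showing how the exact uniform-chain solve could serve as an approximate inverse for a chain with non-uniform conductances inside CG.

```python
import math


def poisson_1d_solve(b, g=1.0):
    """Solve g * tridiag(-1, 2, -1) x = b exactly via the discrete sine transform.

    The eigenvectors are v_k[j] = sqrt(2/(n+1)) sin(j k pi/(n+1)) with
    eigenvalues lambda_k = 2 g (1 - cos(k pi/(n+1))), so
    x = sum_k (v_k . b / lambda_k) v_k. These direct sums cost O(n^2);
    an FFT-based sine transform would bring this to O(n log n).
    """
    n = len(b)
    scale = math.sqrt(2.0 / (n + 1))
    theta = math.pi / (n + 1)
    coeffs = []
    for k in range(1, n + 1):
        proj = scale * sum(b[j - 1] * math.sin(j * k * theta) for j in range(1, n + 1))
        lam = 2.0 * g * (1.0 - math.cos(k * theta))
        coeffs.append(proj / lam)
    return [
        scale * sum(coeffs[k - 1] * math.sin(j * k * theta) for k in range(1, n + 1))
        for j in range(1, n + 1)
    ]


def make_preconditioner(g_avg):
    """Hypothetical helper: approximate A^{-1} r for a non-uniform chain by the
    exact uniform-chain solve using an average conductance g_avg."""
    return lambda r: poisson_1d_solve(r, g_avg)


print(poisson_1d_solve([1.0, 0.0, 0.0, 0.0, 1.0]))  # approximately [1.0] * 5
```

The better the uniform model matches the real grid (pad density, spread of metal resistances), the closer this solve is to the true inverse, which is consistent with the iteration-count behavior the abstract reports.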
IEEE Computer Society Annual Symposium on VLSI | 2016
Chenchen Liu; Qing Yang; Bonan Yan; Jianlei Yang; Xiaocong Du; Weijie Zhu; Hao Jiang; Qing Wu; Mark Barnell; Hai Li
Matrix-vector multiplication, as a key computing operation, is widely used in many applications and hence greatly affects execution efficiency. A common technique to enhance the performance of matrix-vector multiplication is to increase execution parallelism, which results in higher design cost. In recent years, new devices and structures have been widely investigated as alternative solutions. Among them, the memristor crossbar demonstrates great potential owing to its intrinsic support for matrix-vector multiplication, high integration density, and built-in parallel execution. However, the computation accuracy and speed of such designs are constrained by the characteristics of the crossbar array and peripheral circuitry. In this work, we propose a new memristor crossbar based computing engine design that leverages a current sensing scheme. High operation parallelism, and therefore fast computation, is achieved by simultaneously supplying analog voltages to a memristor crossbar and directly detecting the weighted currents through current amplifiers. The performance and effectiveness of the proposed design were examined by implementing a neural network for pattern recognition on the MNIST database. Compared to a previously reported design, ours increases the recognition accuracy by 8.1% (to 94.6%).
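A functional sketch of the operation a current-sensing crossbar performs, assuming ideal devices, zero wire resistance, and column amplifiers that hold each column at virtual ground: each column current is a conductance-weighted sum of the row voltages, and an ideal transimpedance amplifier converts it back to a voltage. The conductance and feedback-resistor values are illustrative, not device data from the paper.

```python
def crossbar_mvm(voltages, conductances):
    """Ideal crossbar read: column current I_j = sum_i V_i * G_ij (Ohm's and Kirchhoff's laws).

    Assumes zero wire resistance and columns held at virtual ground by the
    sensing amplifiers, so every cell sees exactly its row voltage.
    """
    if len(voltages) != len(conductances):
        raise ValueError("one input voltage per crossbar row is required")
    n_cols = len(conductances[0])
    return [sum(v * row[j] for v, row in zip(voltages, conductances)) for j in range(n_cols)]


def tia_output(currents, r_feedback):
    """Ideal inverting transimpedance amplifier: V_out = -R_f * I."""
    return [-r_feedback * i for i in currents]


G = [[1e-6, 2e-6], [3e-6, 4e-6]]  # siemens, illustrative values only
I = crossbar_mvm([0.1, 0.2], G)   # amperes, approximately [7e-7, 1e-6]
V = tia_output(I, 1e5)            # volts, approximately [-0.07, -0.1]
```

All columns are computed in one read in hardware; the Python loop only models the arithmetic, not the parallelism.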
Great Lakes Symposium on VLSI | 2015
Bonan Yan; Zheng Li; Yaojun Zhang; Jianlei Yang; Hai Li; Weisheng Zhao; Pierre Chor-Fung Chia
As manufacturing processes scale down rapidly, designing ternary content-addressable memory (TCAM) that meets the demands for high storage density, fast access speed, and low power consumption becomes very challenging. In recent years, many novel TCAM designs have been inspired by research on emerging nonvolatile memory technologies, such as the magnetic tunneling junction (MTJ), phase change memory (PCM), and the memristor. These designs store data as the resistance state of a nonvolatile device, which usually results in a limited sensing margin and therefore severely constrains the search speed of the TCAM architecture. To further enhance the performance and robustness of TCAMs, we propose two novel cell designs that use MTJs as data storage units: the symmetrical dual-N structure and the asymmetrical P-N scheme. In both designs, a body bias feedback circuit is integrated to enlarge the sensing margin. Compared with an existing MTJ-based TCAM structure, the symmetrical dual-N (asymmetrical P-N) scheme improves tolerance to gate voltage variation by 59.5% (21.2%). The latency and dynamic energy consumption of one search operation at a word length of 256 bits are only 590.35 ps (97.89 ps) and 65.05 fJ/bit (36.85 fJ/bit), respectively; moreover, the use of nonvolatile MTJ devices avoids unnecessary leakage power consumption.
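Independently of the MTJ cell circuits, the search operation a TCAM performs can be stated functionally: every stored word is compared against the key in parallel, an 'X' (don't care) position matches either bit, and a priority encoder reports the first matching row. The sketch below assumes that the lowest index means the highest priority, a common convention that the abstract does not specify.

```python
def tcam_search(entries, key):
    """Functional model of a TCAM lookup.

    entries: stored words over {'0', '1', 'X'}, where 'X' matches either bit.
    key: binary search word. Hardware compares all words in parallel; here
    each comparison is a loop. Returns the match-line vector and the index
    of the first (highest-priority) match, or None if nothing matches.
    """
    for e in entries:
        if len(e) != len(key):
            raise ValueError("entry and key widths differ")
    match_lines = [all(s == "X" or s == k for s, k in zip(e, key)) for e in entries]
    first = next((i for i, hit in enumerate(match_lines) if hit), None)
    return match_lines, first


table = ["10X1", "1XXX", "0000"]
print(tcam_search(table, "1011"))  # ([True, True, False], 0)
print(tcam_search(table, "1100"))  # ([False, True, False], 1)
```

The sensing margin discussed in the abstract determines how reliably, and how fast, each cell's contribution to its match line can be resolved as "match" or "mismatch" in this comparison.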
Great Lakes Symposium on VLSI | 2011
Feifei Niu; Qiang Zhou; Hailong Yao; Yici Cai; Jianlei Yang; Chin Ngai Sze
Buffered clock tree synthesis (CTS) is increasingly critical as VLSI technology continues to scale down. Much research has been devoted to this topic given its importance, but current approaches either lack obstacle-avoiding functionality or lead to large clock latency and/or skew. This paper presents a new obstacle-avoiding CTS approach with separate clock tree construction and buffer insertion stages, based on an integrated view that explores the global optimization space. Aiming at skew optimization under slew and obstacle constraints, our CTS approach features a clock tree construction stage with an obstacle-aware topology generation algorithm called OBB, balanced insertion of candidate buffer positions, and a fast heuristic buffer insertion algorithm. Experimental results show the effectiveness of our CTS approach, with skew and latency improved by 46% and 63% on average compared with [6], and a 15.3% reduction in skew compared with [5]. Our OBB heuristic obtains a 36% improvement in skew over the classic balanced bipartition (BB) algorithm in [10].
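To make "topology generation" and "skew" concrete, the sketch below builds a clock tree topology by recursive median bipartition of the sink locations (cutting alternately in x and y) and reports skew as the spread of source-to-sink rectilinear path lengths. This is an assumption-laden proxy: it ignores obstacles, buffers, slew, and Elmore delay, and it is neither the BB algorithm of [10] nor the paper's OBB heuristic.

```python
def build_topology(sinks, axis=0):
    """Recursive median bipartition of sink locations, alternating x and y cuts.

    Each internal node is placed at the centroid of the sinks beneath it.
    This is a simple means-and-medians style topology, not obstacle-aware.
    """
    if len(sinks) == 1:
        return {"point": sinks[0], "children": []}
    ordered = sorted(sinks, key=lambda p: p[axis])
    mid = len(ordered) // 2
    cx = sum(p[0] for p in sinks) / len(sinks)
    cy = sum(p[1] for p in sinks) / len(sinks)
    children = [build_topology(ordered[:mid], 1 - axis), build_topology(ordered[mid:], 1 - axis)]
    return {"point": (cx, cy), "children": children}


def sink_path_lengths(node, acc=0.0):
    """Rectilinear wirelength from the tree root to each sink."""
    if not node["children"]:
        return [acc]
    out = []
    for child in node["children"]:
        (x0, y0), (x1, y1) = node["point"], child["point"]
        out.extend(sink_path_lengths(child, acc + abs(x1 - x0) + abs(y1 - y0)))
    return out


def path_length_skew(sinks):
    """Skew proxy: longest minus shortest root-to-sink path length."""
    lengths = sink_path_lengths(build_topology(sinks))
    return max(lengths) - min(lengths)


print(path_length_skew([(0, 0), (0, 2), (2, 0), (2, 2)]))  # 0.0
```

A symmetric sink placement yields zero path-length skew, while an unbalanced placement does not; closing that gap (and routing around obstacles) is what dedicated topology and buffer insertion algorithms address.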
Great Lakes Symposium on VLSI | 2016
Xueyan Wang; Xiaotao Jia; Qiang Zhou; Yici Cai; Jianlei Yang; Mingze Gao; Gang Qu
Circuit obfuscation techniques have been proposed to conceal circuit functionality in order to thwart reverse engineering (RE) attacks on integrated circuits (ICs). We believe that a good obfuscation method should have low design complexity and low performance overhead while imposing high RE attack complexity. However, existing obfuscation techniques do not meet all of these requirements. In this paper, we propose a polynomial obfuscation scheme that leverages specially designed multiplexers (MUXs) to replace judiciously selected logic gates. Candidate logic gates to be obfuscated are selected by a novel gate classification method that uses IC topological structure information. We show that this scheme is resilient to all known attacks and is therefore secure. Experiments are conducted on the ISCAS 85/89 and MCNC benchmark suites to evaluate the performance overhead due to obfuscation.
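A minimal sketch of why a multiplexer can hide gate functionality: a 4-to-1 MUX whose select lines are driven by a gate's two inputs, and whose data pins hold that gate's truth table, reproduces any 2-input gate. As long as the data-pin values cannot be recovered, the MUX structure alone does not reveal which function it implements. The paper's specially designed MUXs and its gate classification method are not reproduced here; `TRUTH_TABLES` and `obfuscated_gate` are hypothetical names.

```python
def mux4(data, s1, s0):
    """4-to-1 multiplexer: returns data[2*s1 + s0]."""
    return data[(s1 << 1) | s0]


# Truth tables listed for inputs (a, b) = 00, 01, 10, 11.
TRUTH_TABLES = {
    "AND": (0, 0, 0, 1),
    "OR": (0, 1, 1, 1),
    "NAND": (1, 1, 1, 0),
    "XOR": (0, 1, 1, 0),
}


def obfuscated_gate(secret_table, a, b):
    """A gate replaced by a MUX: the gate inputs drive the selects and the data
    pins hold the (hidden) truth table, so the structure is the same for every
    gate type."""
    return mux4(secret_table, a, b)


print(obfuscated_gate(TRUTH_TABLES["XOR"], 1, 0))  # 1
```

The trade-off the abstract describes shows up even here: the MUX is larger and slower than the gate it replaces, which is why only judiciously selected gates are obfuscated.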
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2016
Jianlei Yang; Peiyuan Wang; Yaojun Zhang; Yuanqing Cheng; Weisheng Zhao; Yiran Chen; Hai Helen Li
Spin-transfer torque magnetic random access memory (STT-MRAM) is a promising emerging memory technology due to advantageous features such as scalability, nonvolatility, density, endurance, and fast speed. However, the reliability of STT-MRAM is severely impacted by environmental disturbances, because radiation strikes on the access transistor can cause write and read failures in 1T1MTJ cells. In this paper, a comprehensive approach is proposed to evaluate radiation-induced soft errors, spanning from device modeling to circuit-level analysis. A simulation based on 3-D metal-oxide-semiconductor transistor modeling is first performed to capture the radiation-induced transient current pulse. Then a compact switching model of the magnetic tunneling junction (MTJ) is developed to analyze the various mechanisms of STT-MRAM write failures. The failure probability of a 1T1MTJ cell is characterized and stored in look-up tables. This approach enables designers to consider the effects of different factors, such as radiation strength and write current magnitude and duration, on the soft error rate of STT-MRAM memory arrays. Meanwhile, write and sense circuits are comprehensively evaluated for bit error rate analysis under random radiation effects and transistor process variation, which is critical for performance optimization of practical STT-MRAM read and sense circuits.
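Once a per-cell failure probability has been characterized (for example, read from a look-up table for a given radiation strength and write pulse), a first-order array-level estimate follows from treating bit failures as independent, an assumption that ignores spatially correlated strikes and any error correction. For an n-bit word with per-bit failure probability p, the probability that the word contains at least one error is 1 - (1 - p)^n. The function names below are illustrative.

```python
import math


def word_failure_probability(p_bit, n_bits):
    """P(at least one of n independent bits fails) = 1 - (1 - p)^n.

    Computed as -expm1(n * log1p(-p)) so that tiny per-bit probabilities
    do not lose precision to cancellation.
    """
    if not 0.0 <= p_bit <= 1.0:
        raise ValueError("p_bit must be a probability")
    if p_bit == 1.0:
        return 1.0 if n_bits > 0 else 0.0
    return -math.expm1(n_bits * math.log1p(-p_bit))


def expected_failing_bits(p_bit, n_bits):
    """Expected number of failing bits in an n-bit word: n * p."""
    return n_bits * p_bit


print(word_failure_probability(0.5, 2))  # 0.75
```

For very small p the result is close to n * p, which is why naive evaluation of 1 - (1 - p)**n in floating point can round to zero while the stable form does not.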
IFIP/IEEE International Conference on Very Large Scale Integration | 2015
Zheng Li; Chenchen Liu; Yandan Wang; Bonan Yan; Chaofei Yang; Jianlei Yang; Hai Li
As technology advances, artificial intelligence is becoming pervasive in society and ubiquitous in our lives, which stimulates demand for an embedded-everywhere, human-centric intelligent computing paradigm. However, conventional instruction-based computer architecture was designed for algorithmic and exact calculations. It is not well suited to machine learning and neural network applications, which usually involve large sets of noisy and incomplete natural data. Instead, neuromorphic systems inspired by the working mechanisms of the human brain offer promising potential. Neuromorphic systems possess a massively parallel architecture with closely coupled memory and computing. Moreover, through sparse utilization of hardware resources in time and space, extremely high power efficiency can be achieved. In recent years, the use of memristor technology in neuromorphic systems has attracted growing attention for its distinctive properties, such as nonvolatility, reconfigurability, and analog processing capability. In this paper, we summarize research efforts in the development of memristor crossbar based neuromorphic design from the perspectives of device modeling, circuit, architecture, and design automation.
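One recurring design-automation step in crossbar-based neuromorphic systems is mapping signed, trained weights onto devices whose conductance is always positive. A common approach, sketched here under the assumption of ideal, linearly programmable devices with conductance range [g_min, g_max], uses a differential pair of crossbars and subtracts their column currents. The function names and conductance values are illustrative, not taken from the surveyed designs.

```python
def map_to_differential_pair(weights, g_min, g_max):
    """Map a signed weight matrix onto two crossbars (G+, G-) within [g_min, g_max].

    Positive weights program G+, negative weights program G-, and the unused
    device of each pair stays at g_min, so W = scale * (G+ - G-).
    """
    w_max = max(abs(w) for row in weights for w in row) or 1.0
    span = g_max - g_min
    g_pos = [[g_min + max(w, 0.0) / w_max * span for w in row] for row in weights]
    g_neg = [[g_min + max(-w, 0.0) / w_max * span for w in row] for row in weights]
    return g_pos, g_neg, w_max / span


def differential_mvm(x, g_pos, g_neg, scale):
    """y_j = scale * sum_i x_i (G+_ij - G-_ij): the difference of two column currents."""
    n_cols = len(g_pos[0])
    return [scale * sum(xi * (gp[j] - gn[j]) for xi, gp, gn in zip(x, g_pos, g_neg))
            for j in range(n_cols)]


W = [[1.0, -2.0], [0.5, 0.0]]
gp, gn, s = map_to_differential_pair(W, 1e-6, 1e-4)
print(differential_mvm([1.0, 2.0], gp, gn, s))  # approximately [2.0, -2.0]
```

Because g_min cancels in the subtraction, the mapping tolerates a nonzero minimum conductance; real devices add nonlinearity, limited resolution, and variation that this sketch ignores.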
Asia and South Pacific Design Automation Conference | 2014
Wei Zhao; Yici Cai; Jianlei Yang
Power grid integrity verification is critical for reliable chip design. Vectorless power grid verification provides a promising approach to evaluating worst-case voltage fluctuations without detailed information about circuit activities. Vectorless verification usually requires solving numerous linear programming problems to obtain the worst-case voltage fluctuation throughout the grid, which is extremely time-consuming for large-scale verification. In this paper, a maximum voltage drop location estimation approach is proposed for efficient vectorless verification. The power grid nodes are grouped into disjoint subsets, and an estimation strategy is used to roughly locate the nodes with the worst-case voltage drop in each group. Consequently, the verification problem size can be significantly reduced compared with exact verification. Experimental results show that the proposed approach achieves remarkable speedups with acceptable accuracy loss.
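The linear programs in vectorless verification have the form: for a node k, maximize the voltage drop sum_j R_kj i_j, where R = G^{-1}, over all source currents i that satisfy the current constraints. Under the simplifying assumption that the only constraints are a per-source upper bound and one global budget (real constraint sets are usually hierarchical and need a general LP solver), the LP has a greedy closed-form solution, sketched below. The paper's location-estimation step, which reduces how many such LPs must be solved, is not shown.

```python
def worst_case_drop(r_row, local_max, global_max):
    """Maximize sum_j r_row[j] * i_j  s.t. 0 <= i_j <= local_max[j], sum_j i_j <= global_max.

    r_row is one row of the inverse conductance matrix (drop at the target node
    per unit current drawn at source j; non-negative for a power grid). With only
    per-source bounds and a single global budget, this LP is a fractional
    knapsack with unit weights, so filling the largest r_row[j] first is optimal.
    """
    order = sorted(range(len(r_row)), key=lambda j: r_row[j], reverse=True)
    currents = [0.0] * len(r_row)
    budget = global_max
    drop = 0.0
    for j in order:
        if budget <= 0.0 or r_row[j] <= 0.0:
            break
        take = min(local_max[j], budget)
        currents[j] = take
        drop += r_row[j] * take
        budget -= take
    return drop, currents


print(worst_case_drop([0.5, 2.0, 1.0], [1.0, 1.0, 1.0], 1.5))  # (2.5, [0.0, 1.0, 0.5])
```

Each node needs its own row of R and its own LP, so the total cost scales with the number of nodes checked; that is the cost the location-estimation approach aims to cut.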