Zhang Shengbing
Northwestern University
Publication
Featured research published by Zhang Shengbing.
international conference on communications circuits and systems | 2002
Sun Huajin; Gao Deyuan; Zhang Shengbing; Wang Danghui
As a classical scheduling algorithm, the round robin scheduling algorithm is as widely used at present as it was in the past. A new FPGA-based implementation method is presented in this paper. After considering the FPGA structural characteristics and requirements of the system, a method using a pipelined priority encoder (PPE) and a barrel shifter (BS) is implemented effectively in an FPGA, and the performance of the PPE and BS is evaluated. The test results of the system show that the algorithm implementation is successful and fulfills the system requirements. At the same time, the method is also useful in other cases in which the round robin is applied.
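The combination of a barrel shifter and a priority encoder described above can be modelled in software. The following is a minimal behavioural sketch in Python, not the authors' FPGA design: it ignores pipelining entirely, and the function names and the convention that bit i of the request vector belongs to requester i are assumptions made for illustration.

```python
def rotate_right(bits: int, shift: int, width: int) -> int:
    """Barrel shifter model: rotate a width-bit vector right by `shift`."""
    shift %= width
    mask = (1 << width) - 1
    return ((bits >> shift) | (bits << (width - shift))) & mask


def priority_encode(bits: int) -> int:
    """Priority encoder model: index of the lowest set bit, or -1 if none."""
    if bits == 0:
        return -1
    return (bits & -bits).bit_length() - 1


def round_robin_grant(requests: int, last_grant: int, width: int) -> int:
    """Grant the first requester after `last_grant`, wrapping around.

    Assumed convention: bit i of `requests` is set when requester i
    is asking for service. Returns -1 when nobody is requesting.
    """
    start = (last_grant + 1) % width
    # Rotate so that the requester after the last winner sits at bit 0,
    # which makes it the highest-priority input of the encoder.
    rotated = rotate_right(requests, start, width)
    pos = priority_encode(rotated)
    if pos < 0:
        return -1
    # Undo the rotation to recover the real requester index.
    return (pos + start) % width
```

Rotating the request vector lets a single fixed-priority encoder serve every starting position, which is the usual reason for pairing the two blocks.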
international forum on information technology and applications | 2010
Lili Zhao; Zhang Shengbing; Zhang Meng; Zhang Yi
The Fast Fourier Transform (FFT), which is memory-access-intensive and follows a divide-and-conquer strategy, is one of the most important and heavily used kernels in scientific computing. The newest generation of Graphics Processing Units (GPUs) implements a stream architecture in addition to acting as a powerful massively parallel coprocessor. Furthermore, the introduction of APIs for general-purpose computation on GPUs makes GPUs an attractive choice for high-performance numerical and scientific computing. In this work we deal with the implementation of the FFT on a novel NVIDIA GPU, using the CUDA programming model. By optimizing the organization of signal data, exploiting the memory hierarchy, and associating streams with different operations, we efficiently overlap kernel execution and data transfer. Our results indicate a significant performance improvement over GPU-based and CPU-based FFT algorithms. On average, the speedup is 18 percent higher than that of the original GPU-based implementation.
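The divide-and-conquer strategy mentioned in this abstract can be illustrated with a textbook recursive radix-2 FFT. This sketch is plain Python on the CPU; it is not the CUDA implementation the paper describes, and it shows neither the data layout nor the stream overlap. The function name and the power-of-two length restriction are assumptions of the sketch.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT (length must be 2**k)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    if n == 1:
        return [complex(x[0])]
    # Divide: transform the even- and odd-indexed samples separately.
    even = fft(x[0::2])
    odd = fft(x[1::2])
    # Conquer: combine the two half-size results with twiddle factors.
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

The strided reads `x[0::2]` and `x[1::2]` hint at why the FFT is memory-access-intensive: each level of the recursion touches every sample with a different stride.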
Journal of Semiconductors | 2013
Wang Shaoxi; Wang Mingxin; Fan Xiaoya; Zhang Shengbing; Han Ru
After analyzing the multivariate Cpm method (Chan et al. 1991), this paper presents a spatial multivariate process capability index (PCI) method, which can handle the multivariate off-centered case and may provide references for assuring and improving the process quality level while achieving an overall evaluation of process quality. Examples of calculating the multivariate PCI are given, and the experimental results show that the systematic method presented is effective and practical.
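As background for the off-centered case, the standard univariate Cpm index is Cpm = (USL - LSL) / (6 * sqrt(sigma^2 + (mu - T)^2)), where T is the target value. The sketch below computes only this univariate index; it is not the spatial multivariate method the paper proposes, and the function and parameter names are chosen for illustration.

```python
import math


def cpm(usl: float, lsl: float, mean: float, std: float, target: float) -> float:
    """Univariate Cpm: penalises both spread and distance from target."""
    if usl <= lsl:
        raise ValueError("usl must be greater than lsl")
    # The (mean - target) term grows as the process drifts off-center,
    # which lowers the index even when the spread is unchanged.
    return (usl - lsl) / (6.0 * math.sqrt(std ** 2 + (mean - target) ** 2))
```

With the same spread, a centered process scores higher than a shifted one: for limits 4 and 16, target 10, and standard deviation 1, a mean of 10 gives 2.0, while a mean of 11 gives about 1.41.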
international conference on information science and control engineering | 2015
Liu Fang; Zhang Shengbing; Zhao Lei; Zhang Meng
A system-level architecture simulator can be used as a virtual target machine that provides functional and detailed simulation of a system composed of a processor, system memory, caches, and external devices. Prefetching technology can improve the overall performance of the system by reducing pipeline stalls according to temporal and spatial locality. Based on the characteristics of multi-core processors, this article studies the design method of architectural simulators. We use an architecture simulator to evaluate the memory system requirements of different architectures, implement two prefetchers, and test the prefetchers' impact on bandwidth.
international computer science and engineering conference | 2013
Liu Fang; Zhang Shengbing; Ren Meng; Zhang Meng
Memory latency has always been an important bottleneck to improving the performance of computer systems. Management of the memory system, and especially of the last-level cache (LLC) as an important means of addressing the “Memory Wall” problem, has become a key factor influencing processor performance. Prefetching technology can improve the overall performance of the system by reducing pipeline stalls according to temporal and spatial locality. Based on the characteristics of different workloads, this article studies the performance of state-of-the-art LLC management policies with prediction technology. We implement the Bimodal Insertion Policy (BIP), which can adapt to changes in the working set. To further reduce the cache miss rate, we use the Set Dueling mechanism to dynamically choose the better replacement policy between the Static Re-Reference Interval Policy (SRRIP) and the Bimodal Re-Reference Interval Policy (BRRIP) based on historical information [13]. We use SPLASH-2 as the benchmark to test the performance of these replacement policies. Finally, we summarize the characteristics of the different kinds of policies.
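The interplay of SRRIP, BRRIP, and Set Dueling can be sketched with a small cache model. This is an illustrative Python sketch, not the simulator used in the paper. Several details are assumptions: 2-bit re-reference prediction values, a deterministic 1-in-32 counter standing in for BRRIP's low-probability insertion, a 10-bit selector counter, and a simple modulo rule for picking leader sets.

```python
class RRIPSet:
    """One cache set managed with re-reference prediction values (RRPVs)."""

    def __init__(self, ways: int, rrpv_bits: int = 2):
        self.max_rrpv = (1 << rrpv_bits) - 1
        self.tags = [None] * ways
        self.rrpv = [self.max_rrpv] * ways  # empty ways are evicted first

    def lookup(self, tag) -> bool:
        """Return True on a hit and promote the line to RRPV 0."""
        if tag in self.tags:
            self.rrpv[self.tags.index(tag)] = 0
            return True
        return False

    def fill(self, tag, insert_rrpv: int) -> None:
        """Evict a line with the maximum RRPV and insert `tag`."""
        while self.max_rrpv not in self.rrpv:
            self.rrpv = [v + 1 for v in self.rrpv]  # age every line
        victim = self.rrpv.index(self.max_rrpv)
        self.tags[victim] = tag
        self.rrpv[victim] = insert_rrpv


class SetDuelingCache:
    """Chooses between SRRIP and BRRIP for follower sets via a PSEL counter."""

    def __init__(self, num_sets=64, ways=4, psel_bits=10, brrip_period=32):
        self.num_sets = num_sets
        self.sets = [RRIPSet(ways) for _ in range(num_sets)]
        self.max_rrpv = self.sets[0].max_rrpv
        self.psel_max = (1 << psel_bits) - 1
        self.psel = self.psel_max // 2
        self.brrip_period = brrip_period
        self.brrip_count = 0

    def _leader(self, idx: int):
        # Assumed leader-set rule: out of every 32 sets, the first is an
        # SRRIP leader and the second is a BRRIP leader.
        r = idx % 32
        return "srrip" if r == 0 else "brrip" if r == 1 else None

    def _insert_rrpv(self, policy: str) -> int:
        if policy == "srrip":
            return self.max_rrpv - 1  # "long" re-reference interval
        # BRRIP: mostly "distant" insertion, occasionally "long".
        self.brrip_count = (self.brrip_count + 1) % self.brrip_period
        return self.max_rrpv - 1 if self.brrip_count == 0 else self.max_rrpv

    def access(self, addr: int) -> bool:
        idx, tag = addr % self.num_sets, addr // self.num_sets
        if self.sets[idx].lookup(tag):
            return True
        leader = self._leader(idx)
        # A miss in a leader set is a vote against that leader's policy.
        if leader == "srrip":
            self.psel = min(self.psel + 1, self.psel_max)
        elif leader == "brrip":
            self.psel = max(self.psel - 1, 0)
        follower = "brrip" if self.psel > self.psel_max // 2 else "srrip"
        self.sets[idx].fill(tag, self._insert_rrpv(leader or follower))
        return False
```

Leader sets always use their own policy, so their miss counts keep measuring both options while follower sets switch to whichever policy is currently missing less.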
international conference on digital manufacturing & automation | 2012
Wang Mingxin; Wang Shaoxi; Zhang Shengbing; Fan Xiaoya
Process capability ultimately decides the process quality level. Analyzing the process capability index (PCI) allows process capability to be effectively assured. For multivariate manufacturing processes, tremendous difficulties are often encountered when one attempts to measure process capability by directly extending the univariate approach. The paper presents a modified spatial multivariate PCI method, which can handle the multivariate off-centered case and may provide references for assuring and improving the process quality level while achieving an overall evaluation of process quality. Finally, examples of calculating the multivariate PCI are given, and the experimental results show that the systematic method presented is effective and practical.
international symposium on industrial embedded systems | 2007
Wang Jing; Zhang Shengbing; Zhang Meng
This paper describes the testable design and fault coverage analysis of a 32-bit high-performance embedded microprocessor that is compatible with the PowerPC750 ISA. In the test structure, the TAP controller performs the on-chip standard boundary scan test, the full scan test for core logic, and the MBIST test for embedded memories. In the full scan test, all scan registers are organized into 32 scan chains; the area overhead is less than 3%, and the critical path is less than 4 ns, which meets the timing requirement. The fault coverage analysis and chip test results demonstrate that the test strategy and test structure are effective for our embedded processor.
multimedia and ubiquitous engineering | 2008
Tian Hangpei; Gao Deyuan; Wang Deli; Zhu Yian; Zhang Shengbing; Wang Jing
Archive | 2017
Ma Yanzhao; Wang Danghui; Zhang Shengbing; Fan Xiao-ya