Gaoming Du
Hefei University of Technology
Publications
Featured research published by Gaoming Du.
international conference on solid state and integrated circuits technology | 2006
Yi Xu; Li Li; Ming-lun Gao; Bing Zhang; Zhao-yu Jiang; Gaoming Du; Wei Zhang
As technology scales toward deep submicron, integrating more than one processor on a chip is becoming possible. The communication architecture is becoming the bottleneck for multi-processor SoCs, and an efficient arbiter can resolve the contention caused by simultaneous requests in shared-bus systems and so prevent system performance degradation. This paper presents a simple adaptive dynamic arbiter that automatically adjusts the bandwidth proportion assigned to each processor to avoid the starvation problem in a multi-processor SoC environment. The simulation results show that the proposed arbiter reduces task execution time by 68%, decreases the bus request latency of a processor by 78%, and provides better control of the communication bandwidth allocated to each processor than conventional arbiters.
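The abstract does not state the arbitration rule, so the following is only a minimal sketch of one starvation-free policy (aging credits), not the paper's arbiter. The class name `AgingArbiter` and its `grant` method are hypothetical names chosen for illustration. Each requesting master gains one credit per cycle, the master with the most credit wins the bus, and the winner's credit is reset, so a master that keeps requesting cannot be passed over indefinitely.

```python
class AgingArbiter:
    """Illustrative shared-bus arbiter: a waiting master ages until it wins."""

    def __init__(self, num_masters):
        # One credit counter per bus master (hypothetical bookkeeping).
        self.credit = [0] * num_masters

    def grant(self, requests):
        """requests[i] is True if master i wants the bus this cycle.

        Returns the index of the granted master, or None if nobody requests.
        """
        pending = [i for i, wants in enumerate(requests) if wants]
        if not pending:
            return None
        # Every requesting master ages by one credit.
        for i in pending:
            self.credit[i] += 1
        # Highest credit wins; ties go to the lowest index.
        winner = max(pending, key=lambda i: (self.credit[i], -i))
        self.credit[winner] = 0
        return winner


if __name__ == "__main__":
    arb = AgingArbiter(3)
    # With all three masters requesting every cycle, grants rotate evenly.
    print([arb.grant([True, True, True]) for _ in range(6)])
```

Under constant contention this policy degenerates to round-robin; a master that requests only occasionally keeps its accumulated credit and is served ahead of masters that were granted recently.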
international conference on asic | 2007
Wen-Ting Zhang; Luo-Feng Geng; Duo-Li Zhang; Gaoming Du; Ming-Lun Gao; Wei Zhang; Ning Hou; Yi-hua Tang
To achieve a balance between high performance and energy efficiency, embedded systems often use heterogeneous multiprocessor platforms that are tuned for a well-defined application domain. Meanwhile, FPGAs are known for providing designers with several benefits in system design, the most important being high programmability and low risk. In this paper we demonstrate the design of an FPGA-based heterogeneous multiprocessor system integrating 4 Nios II soft cores and 1 ARM core. The ARM core is the central controller of the whole system, and the 4 Nios II cores serve as slaves, which are commanded by the ARM core and are responsible for processing regular, bulk data. The ARM core and the Nios II cores cooperate and work in parallel to accomplish each task. The current implementation uses 13% of the FPGA, requiring 19,593 ALUTs on an Altera Stratix II EP2S180.
international conference on anti-counterfeiting, security, and identification | 2009
Fu-ming Xiao; Dong-sheng Li; Gaoming Du; Yukun Song; Duoli Zhang; Ming-Lun Gao
While computational cores are becoming faster and faster, the communication efficiency between processors has become a bottleneck that limits the performance of multiprocessor systems-on-chip (MPSoCs). This paper focuses on the design and implementation of an MPSoC architecture based on the AXI bus protocol. First, RTL models of 4 Nios II processors using an AXI communication architecture are developed. Then the MPSoC is implemented on an Altera Stratix II EP2S180 FPGA. Finally, the performance is evaluated using a matrix operation benchmark and compared with previous in-house architectures. Experiments show that the proposed prototype runs at 100 MHz, requires 8963 adaptive look-up tables (ALUTs), and reaches a maximum speedup ratio of 3.81, performing better than the traditional bus (AHB) and the 2-D mesh NoC architecture.
international conference on anti-counterfeiting, security, and identification | 2008
Fuhui Du; Gaoming Du; Yukun Song; Duoli Zhang; Ming-Lun Gao
MPEG-2 AAC is a widely used audio standard and is becoming more popular for commercial use. In AAC, the filterbank tool, which is composed of IMDCT, windowing and overlap-add, has the highest computational complexity. Of these three steps, the IMDCT is the key component. Hence, most published filterbank algorithms focus mainly on the implementation of the IMDCT but overlook the relationship between the steps. This paper proposes a novel architecture for the filterbank tool and its hardware implementation. A fast IMDCT algorithm consisting of pre-IFFT, N/4-point IFFT and post-IFFT stages is employed. To improve the efficiency of memory access, the windowing and overlap-add operations are combined with the post-IFFT stage, so no storage elements are required between them and the results of the post-IFFT stage are windowed and overlap-added directly. The proposed architecture contains three hardware modules, and further improvements are made to each module as well. In total, 4 multipliers are time-shared among the modules. Each module reads data continuously from RAM, much like a pipelined operation. As a result, the new architecture improves memory access efficiency, with a speedup of 75% in computation time over the unoptimized one.
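The three filterbank steps named above (IMDCT, windowing, overlap-add) can be sketched in a few lines. This sketch uses the direct O(M^2) IMDCT definition rather than the paper's pre-IFFT / N/4-point IFFT / post-IFFT fast algorithm, and it assumes a plain sine window rather than the window shapes defined by the AAC standard. The names `sine_window`, `mdct`, `imdct` and `overlap_add_decode` are illustrative, and `M` denotes the hop size (half the block length). The decoder applies the window inside the accumulation loop, which mirrors the idea of windowing and overlap-adding each IMDCT output directly with no intermediate buffer.

```python
import math


def sine_window(M):
    """Sine window of length 2M; satisfies w[n]^2 + w[n+M]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]


def mdct(block):
    """Direct MDCT: 2M windowed input samples -> M coefficients."""
    M = len(block) // 2
    n0 = 0.5 + M / 2
    return [
        sum(block[n] * math.cos(math.pi / M * (n + n0) * (k + 0.5))
            for n in range(2 * M))
        for k in range(M)
    ]


def imdct(coeffs):
    """Direct IMDCT: M coefficients -> 2M time-aliased samples.

    The 2/M scale matches the windowed analysis/synthesis pair used here.
    """
    M = len(coeffs)
    n0 = 0.5 + M / 2
    return [
        (2.0 / M) * sum(coeffs[k] * math.cos(math.pi / M * (n + n0) * (k + 0.5))
                        for k in range(M))
        for n in range(2 * M)
    ]


def overlap_add_decode(spectra):
    """IMDCT each block, then window and overlap-add in a single pass."""
    M = len(spectra[0])
    w = sine_window(M)
    out = [0.0] * (M * (len(spectra) + 1))
    for b, coeffs in enumerate(spectra):
        y = imdct(coeffs)
        for n in range(2 * M):
            # Windowing fused with accumulation: no buffer between the steps.
            out[b * M + n] += w[n] * y[n]
    return out


if __name__ == "__main__":
    M = 8
    signal = [math.sin(0.3 * i) for i in range(4 * M)]
    w = sine_window(M)
    # Encoder side: window each 50%-overlapped block, then MDCT.
    spectra = [mdct([w[n] * signal[b * M + n] for n in range(2 * M)])
               for b in range(3)]
    decoded = overlap_add_decode(spectra)
    # Samples covered by two blocks are recovered; the edges are not.
    err = max(abs(decoded[i] - signal[i]) for i in range(M, 3 * M))
    print("max reconstruction error in overlapped region:", err)
```

Only the samples covered by two overlapping blocks are reconstructed, because the time-domain aliasing of one block is cancelled by its neighbour; the first and last `M` samples lack a partner block.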
international conference on anti-counterfeiting, security, and identification | 2009
Luo-Feng Geng; Duoli Zhang; Ming-Lun Gao; Ying-Chun Chen; Gaoming Du
The Multiprocessor System-on-Chip (MPSoC) is a promising solution for future complex computer and embedded systems, and the Network-on-Chip (NoC) has been proposed as the future on-chip interconnect. However, NoCs bring more challenges for parallel programming and for synchronization among processor cores. This paper proposes a new cluster-based homogeneous MPSoC architecture that adopts a hybrid interconnect combining bus-based and NoC architectures. The architecture has been implemented as a prototype on an FPGA device, integrating 17 processor cores. The performance of the prototype is evaluated with two real applications, matrix chain multiplication and JPEG picture decoding. The speedup ratio of the prototype is up to 15.850.
international conference on anti-counterfeiting, security, and identification | 2009
Ning Hou; Duoli Zhang; Gaoming Du; Yukun Song; Haihua Wen
Recent trends point to multiprocessor systems-on-chip (MPSoCs) as a promising solution for high-performance embedded systems, and the key challenge is how to improve communication efficiency. Network on Chip (NoC) has been considered a new paradigm for next-generation communication architectures because of its extensibility and power efficiency. The router is the fundamental unit of a NoC. In this paper, a NoC prototype consisting of 6 ARM-compatible cores and a router-based on-chip network is designed and implemented on an FPGA device. Unlike the prototypes we designed previously, this prototype comprises more cores and uses virtual-channel routers instead of basic routers. In particular, to evaluate the network performance, we present a run-time network monitor system, which monitors the on-chip network by calculating performance parameters such as average latency and throughput. The experimental results show that this prototype, with 2×3 virtual-channel routers, has lower average latency than the former basic-router prototype and improves throughput by up to 62%. Furthermore, a JPEG decoding application runs on this prototype, which works stably at 50 MHz, and decoding is accelerated by its 2 decoding lanes.
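The two monitored parameters can be computed from per-packet timestamps. The sketch below assumes a common pair of definitions (average latency as mean cycles from injection to ejection, throughput as delivered flits per cycle per node); the abstract does not say which definitions the monitor uses, and `PacketRecord`, `average_latency` and `throughput` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PacketRecord:
    inject_cycle: int  # cycle at which the packet entered the network
    eject_cycle: int   # cycle at which it was delivered to its destination
    flits: int         # packet length in flits


def average_latency(records):
    """Mean injection-to-ejection delay in cycles."""
    return sum(r.eject_cycle - r.inject_cycle for r in records) / len(records)


def throughput(records, window_cycles, num_nodes):
    """Delivered flits per cycle per node over the observation window."""
    return sum(r.flits for r in records) / (window_cycles * num_nodes)


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    log = [PacketRecord(0, 10, 4), PacketRecord(5, 25, 4)]
    print(average_latency(log))                          # cycles
    print(throughput(log, window_cycles=100, num_nodes=6))
```

A hardware monitor would typically accumulate the same sums in counters and divide at read-out time; the arithmetic is identical.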
international conference on asic | 2009
Junqiao Huang; Gaoming Du; Duoli Zhang; Yukun Song; Luo-Feng Geng; Ming-Lun Gao
A VLSI design of the complex Quadrature Mirror Filterbank (QMF) for the MPEG-4 High Efficiency Advanced Audio Coding (MPEG-4 HE-AAC) decoder using a resource-sharing technique is proposed. An algorithm that uses the conventional type-IV discrete cosine transform (DCT-IV) to optimize the complex QMF is derived in this paper. With the proposed algorithm, the VLSI design of the complex-valued analysis quadrature mirror filterbank (complex-AQMF) and synthesis quadrature mirror filterbank (complex-SQMF) uses resources more efficiently by sharing the same DCT module. Experimental results show that the computational complexity of the complex-QMF can be reduced by up to 8.59%, and that the VLSI architecture of the proposed algorithm saves about 53% of area and 50% of memory thanks to the shared DCT-IV resources.
international conference on solid state and integrated circuits technology | 2006
Wei Zhang; Gaoming Du; Yi Xu; Ming-lun Gao; Luo-Feng Geng; Bing Zhang; Zhao-yu Jiang; Ning Hou; Yi-hua Tang
The increasing system resources available on field-programmable gate arrays (FPGAs) enable the integration of a complex system on one programmable chip. This paper focuses on the design and implementation of a hierarchical-bus-based multi-processor system-on-chip (MPSoC) integrating 4 ARM processors on an FPGA. Experimental results were obtained with the system running at 60 MHz, occupying 34% of the adaptive look-up tables (ALUTs) of an Altera Stratix II EP2S180 and achieving a maximum performance speedup of 3.2.
international conference on anti-counterfeiting, security, and identification | 2010
Chunhua Chen; Gaoming Du; Duoli Zhang; Yukun Song; Ning Hou
Inter-processor communication synchronization in a multi-processor system-on-chip (MPSoC) is one of the key factors in whole-chip performance. It not only affects the efficiency of task-level parallelism but also depends heavily on the MPSoC hardware architecture. Two synchronization mechanisms, mailbox and packet switching, are studied and analyzed in a Network-on-Chip-based MPSoC. First, the two schemes are implemented and verified in stand-alone mode and analyzed in terms of communication latency, communication bandwidth and resource utilization. Then the two schemes are analyzed in an MPSoC prototype environment that runs real-time fade-in fade-out video processing. Experimental results show that the mailbox-based synchronization scheme has low latency and low resource overhead, but it is not feasible for a large number of clusters because of physical limitations. Although the packet-based scheme has higher latency, it is more scalable and feasible.
pacific-asia workshop on computational intelligence and industrial application | 2008
Gaoming Du; Duoli Zhang; Yukun Song; Ming-Lun Gao; Luo-Feng Geng; Ning Hou
With the development of IC technology and the increasing demand for processing power, more and more processing cores are being integrated into a single chip. One of the key problems is the communication efficiency between the processing cores, and the network on chip (NoC) has been proposed as a promising architecture. In this paper, the scalability of the 2-D mesh based NoC is analyzed. First, a mesh-based NoC router using the XY routing algorithm is designed and implemented in an FPGA prototype. Second, 2×2 and 3×3 NoCs are constructed from this router module, with each router connected to a processing core via the resource network interface (RNI). Finally, pipelined matrix multiplications and FFTs are executed to evaluate the performance of the 2-D mesh based NoC, together with the router area overhead as the number of processing nodes increases. Experiments show that the 2-D mesh based NoC architecture scales easily to more processing nodes with small resource overhead.
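XY routing, the algorithm used by the router above, is dimension-ordered: a packet first travels along the X axis until it reaches the destination column, then along the Y axis. A minimal software sketch follows; the function name `xy_route` and the `(x, y)` coordinate convention are assumptions for illustration and do not describe the paper's RTL.

```python
def xy_route(src, dst):
    """Return the list of (x, y) routers visited from src to dst.

    The X coordinate is corrected first, then the Y coordinate.
    """
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    while x != dst_x:          # phase 1: move along X
        x += 1 if dst_x > x else -1
        path.append((x, y))
    while y != dst_y:          # phase 2: move along Y
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    # Route across a 3x3 mesh from corner (0, 0) to router (2, 1).
    print(xy_route((0, 0), (2, 1)))
```

Because every packet finishes its X movement before turning into Y, the route is deterministic and minimal, and the hop count equals the Manhattan distance between source and destination.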