Ming-Lun Gao | Researchain

Publication

Featured research published by Ming-Lun Gao.

international conference on asic | 2007

Design of heterogeneous MPSoC on FPGA

Wen-Ting Zhang; Luo-Feng Geng; Duo-Li Zhang; Gaoming Du; Ming-Lun Gao; Wei Zhang; Ning Hou; Yi-hua Tang

To achieve a balance between high performance and energy efficiency, embedded systems often use heterogeneous multiprocessor platforms that are tuned for a well-defined application domain. Meanwhile, FPGAs offer designers several benefits in system design, the most important being high programmability and low risk. In this paper we demonstrate the design of an FPGA-based heterogeneous multiprocessor system integrating four Nios II soft cores and one ARM core. The ARM core is the central controller of the whole system, and the four Nios II cores serve as slaves, which are commanded by the ARM core and are responsible for processing regular, high-volume data. The ARM core and the Nios II cores cooperate and work in parallel to accomplish each task. The current implementation uses 13% of the FPGA, requiring 19,593 ALUTs on an Altera Stratix II EP2S180.

international conference on anti-counterfeiting, security, and identification | 2009

Design of AXI bus based MPSoC on FPGA

Fu-ming Xiao; Dong-sheng Li; Gaoming Du; Yukun Song; Duoli Zhang; Ming-Lun Gao

While computational cores are becoming faster and faster, the communication efficiency between processors has become a bottleneck that limits the performance of multiprocessor systems-on-chip (MPSoCs). This paper focuses on the design and implementation of an MPSoC architecture based on the AXI bus protocol. First, RTL models of four Nios II processors using an AXI communication architecture are developed. Then the MPSoC is implemented on an Altera Stratix II EP2S180 FPGA. Lastly, performance is evaluated using a matrix operation benchmark and compared with a previously designed in-house architecture. Experiments showed that the proposed prototype can run at 100 MHz, requires 8963 adaptive look-up tables (ALUTs), achieves a maximum speedup ratio of up to 3.81, and performs better than the traditional bus (AHB) and 2-D mesh NoC architectures.

international conference on anti-counterfeiting, security, and identification | 2008

An implementation of filterbank for MPEG-2 AAC on FPGA

Fuhui Du; Gaoming Du; Yukun Song; Duoli Zhang; Ming-Lun Gao

MPEG-2 AAC is a widely used audio standard that is becoming more popular for commercial use. In AAC, the filterbank tool, which is composed of IMDCT, windowing and overlap-add, has the highest computational complexity. Of these three steps, IMDCT is the key component. Hence, most published filterbank algorithms focus mainly on the implementation of IMDCT but overlook the relationship between the steps. This paper proposes a novel architecture for the filterbank tool and its hardware implementation. A fast algorithm for IMDCT, consisting of pre-IFFT, N/4-point IFFT and post-IFFT stages, is employed. To improve the efficiency of memory access, the windowing and overlap-add operations are combined with post-IFFT, so no storage elements are required between them and the results of post-IFFT are windowed and overlap-added directly. The proposed architecture contains three hardware modules, and further improvements are made to each module as well. In total, four multipliers are time-shared among the modules. Each module reads data continuously from RAM, much like a pipelined operation. As a result, the new architecture improves memory access efficiency, with a speedup of 75% in computation time over the unoptimized one.
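
The three filterbank steps named in the abstract above (IMDCT, windowing, overlap-add) can be illustrated with a small software reference. This is only a sketch under stated assumptions: it evaluates the IMDCT directly in O(N^2) rather than with the paper's pre-IFFT / N/4-point IFFT / post-IFFT fast algorithm, it uses a plain sine window, and all function names are hypothetical. The 2/N scaling is one common convention that gives exact reconstruction when the same window is applied at analysis and synthesis.

```python
import math

def sine_window(length):
    """Sine window of even length; w[n]^2 + w[n + length/2]^2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def _basis(n, t, k):
    # Shared cosine kernel of the MDCT / IMDCT pair (n = number of coefficients).
    return math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))

def mdct(block):
    """Forward MDCT: 2N windowed samples -> N coefficients (direct form)."""
    n = len(block) // 2
    return [sum(block[t] * _basis(n, t, k) for t in range(2 * n))
            for k in range(n)]

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples (direct form)."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * _basis(n, t, k) for k in range(n))
            for t in range(2 * n)]

def analyze(signal, n):
    """Split a signal into 50%-overlapping windowed blocks and transform each."""
    window = sine_window(2 * n)
    frames = []
    for start in range(0, len(signal) - 2 * n + 1, n):
        block = [window[t] * signal[start + t] for t in range(2 * n)]
        frames.append(mdct(block))
    return frames

def synthesize(frames):
    """IMDCT each frame, apply the window again, and overlap-add with hop N."""
    n = len(frames[0])
    window = sine_window(2 * n)
    out = [0.0] * (n * (len(frames) + 1))
    for i, coeffs in enumerate(frames):
        block = imdct(coeffs)
        for t in range(2 * n):
            out[i * n + t] += window[t] * block[t]
    return out
```

In this sketch the window multiplication and the overlap-add accumulation happen in the same inner loop of `synthesize`, which loosely mirrors the abstract's point that no intermediate storage is needed between those steps. The first and last N output samples are covered by only one block and are therefore not reconstructed.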

international conference on anti-counterfeiting, security, and identification | 2009

Prototype design of cluster-based homogeneous Multiprocessor System-on-Chip

Luo-Feng Geng; Duoli Zhang; Ming-Lun Gao; Ying-Chun Chen; Gaoming Du

The Multiprocessor System-on-Chip (MPSoC) is a promising solution for future complex computer and embedded systems, and the Network-on-Chip (NoC) has been proposed as the future on-chip interconnect. However, NoCs make parallel programming and synchronization across processor cores more challenging. This paper proposes a new cluster-based homogeneous MPSoC architecture that adopts a hybrid interconnect composed of both bus-based and NoC architectures. The architecture has been implemented as a prototype on an FPGA device, integrating 17 processor cores. The performance of the prototype is evaluated with two real applications: matrix chain multiplication and JPEG picture decoding. The speedup ratio of the prototype is up to 15.850.

international conference on asic | 2009

VLSI design of resource shared complex-QMF bank for HE-AAC decoder

Junqiao Huang; Gaoming Du; Duoli Zhang; Yukun Song; Luo-Feng Geng; Ming-Lun Gao

A VLSI design of a complex Quadrature Mirror Filterbank (QMF) for the MPEG-4 High Efficiency Advanced Audio Coding (MPEG-4 HE-AAC) decoder using a resource-sharing technique is proposed. An algorithm that uses the conventional discrete cosine transform of type IV (DCT-IV) to optimize the complex-QMF is derived in this paper. Using the proposed algorithm, the VLSI design of the complex-valued analysis quadrature mirror filterbank (complex-AQMF) and synthesis quadrature mirror filterbank (complex-SQMF) uses resources more efficiently by sharing the same DCT module. Experimental results show that the computational complexity of the complex-QMF can be reduced up to 8.59%, and that the VLSI architecture of the proposed algorithm saves about 53% of area and 50% of memory thanks to the shared DCT-IV resources.
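
For readers unfamiliar with the DCT-IV mentioned in the abstract above, the following is a minimal direct-form sketch, not the paper's hardware algorithm. The function name is hypothetical and the transform is left unnormalised. It shows one well-known property: applying the DCT-IV twice returns the input scaled by N/2, so the same kernel serves as its own inverse. How the paper maps the complex-QMF analysis and synthesis banks onto this kernel is not reproduced here.

```python
import math

def dct_iv(x):
    """Unnormalised DCT-IV: X[k] = sum_n x[n] * cos(pi/N * (n + 1/2) * (k + 1/2))."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi / n * (j + 0.5) * (k + 0.5))
                for j in range(n))
            for k in range(n)]

# Applying the transform twice gives (N / 2) * x, so one kernel can be
# used in both the forward and the inverse direction.
samples = [1.0, 2.0, 3.0, 4.0]
round_trip = [2.0 / len(samples) * v for v in dct_iv(dct_iv(samples))]
```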

international conference on anti-counterfeiting, security, and identification | 2008

Current switch driver and current source designs for high-speed current-steering DAC

Fang-Jie Luo; Yongsheng Yin; Shang-Quan Liang; Ming-Lun Gao

Based on an analysis of the influence of the current switch driver on the dynamic performance of a high-speed current-steering DAC, several key points for designing the current switch driver are proposed. The low cross-point method, a synchronous flip-latch and a limited switch-driver swing are introduced, and a current switch driver circuit is proposed. To further improve the dynamic performance of the DAC, this paper presents a high-output-impedance current source circuit. A gain stage is used in the biasing circuit of the current source, and the output impedance of the proposed current source reaches 108 Ω, which is important for meeting the performance requirements of the DAC. A 14-bit high-speed DAC is designed using the current switch driver and current source in a 0.35 μm CMOS process. When the frequency of the full-range input signal is 24.6 MHz and the sampling frequency is 140 MSPS, the SFDR of the DAC reaches 78.2 dB, and the settling time is about 10 ns.

international conference on image analysis and signal processing | 2011

Dual-ADC based digital calibration of timing skew for a time-interleaved ADC

Rui Zhang; Yongsheng Yin; Jun Yang; Ming-Lun Gao

The performance of time-interleaved analog-to-digital converters (TIADCs) is seriously restricted by timing-skew mismatch between ADC channels. The concept of dual-ADC based calibration is to set two ADCs to sample the same input signal synchronously and to use the differences between the two outputs in the calibration algorithm to estimate and compensate the errors in each ADC. Based on this method, we propose a novel 7-channel TIADC with digital background calibration, which focuses on calibrating the timing skew of each ADC channel. The calibration algorithm is based on least-mean-square (LMS) iteration. MATLAB simulation of the designed 14-bit 7-channel TIADC shows that, with ±0.02Ts timing skew and a normalized input frequency fin/fs = 0.05, the signal-to-noise-and-distortion ratio (SNDR) and spurious-free dynamic range (SFDR) of the TIADC output after calibration reach 85.9 dBc and 103 dBc, improvements of 28 dBc and 43 dBc, respectively, over the uncalibrated output.
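
The idea of estimating timing skew from the difference between two ADCs that sample the same input can be illustrated with a toy LMS loop. This is a sketch under assumptions that are not taken from the paper: the skewed channel is modelled to first order as ref + d * slope, the slope is approximated by a central difference of the reference samples, and the function name and step size `mu` are hypothetical. The estimate is expressed in units of the sample period Ts.

```python
import math

def estimate_skew(ref, skewed, mu=0.5):
    """Toy LMS estimate of the timing skew (in sample periods) of `skewed`
    relative to `ref`, assuming skewed[n] ~= ref[n] + d * slope[n]."""
    d_hat = 0.0
    for n in range(1, len(ref) - 1):
        # Central-difference approximation of the signal slope per sample.
        slope = (ref[n + 1] - ref[n - 1]) / 2.0
        # Error between the observed difference and the current model.
        err = (skewed[n] - ref[n]) - d_hat * slope
        # Standard LMS update: step along the gradient of the squared error.
        d_hat += mu * err * slope
    return d_hat

# Test signal matching the abstract's operating point: fin/fs = 0.05,
# with a skew of 0.02 Ts applied to the second channel.
omega = 2.0 * math.pi * 0.05
true_skew = 0.02
ref = [math.sin(omega * n) for n in range(2000)]
skewed = [math.sin(omega * (n + true_skew)) for n in range(2000)]
estimate = estimate_skew(ref, skewed)
```

The central difference slightly underestimates the true slope, so the estimate carries a small bias; a real calibration loop would also need to correct the samples, which this sketch does not attempt.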

pacific-asia workshop on computational intelligence and industrial application | 2008

Scalability Study on Mesh Based Network on Chip

Gaoming Du; Duoli Zhang; Yukun Song; Ming-Lun Gao; Luo-Feng Geng; Ning Hou

With the development of IC technology and increasing demand for processing power, more and more processing cores are being integrated into a single chip. One of the key problems is the communication efficiency between the processing cores, and the network on chip (NoC) has been proposed as a promising architecture. In this paper, the scalability of 2-D mesh based NoCs is analyzed. First, a mesh based NoC router using the XY routing algorithm is designed and implemented in an FPGA prototype. Second, 2×2 and 3×3 NoCs are constructed using this router module, with each router connected to a processing core via the resource network interface (RNI). Finally, pipelined matrix multiplications and FFT are executed to evaluate the performance of the 2-D mesh based NoC, together with the router area overhead as the number of processing nodes increases. Experiments showed that the 2-D mesh based NoC architecture scales easily to more processing nodes with small resource overhead.
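
The XY routing algorithm named in the abstract above is dimension-ordered: a packet first travels along the X axis until its column matches the destination, then along the Y axis. A minimal software sketch follows; the coordinate convention and function names are assumptions for illustration and say nothing about the paper's router RTL.

```python
def xy_next_hop(current, dest):
    """Return the next node on the XY route from `current` towards `dest`.
    Nodes are (x, y) tuples in a 2-D mesh."""
    cx, cy = current
    dx, dy = dest
    if cx != dx:
        # Resolve the X dimension first.
        return (cx + (1 if dx > cx else -1), cy)
    if cy != dy:
        # Only once X matches, move along Y.
        return (cx, cy + (1 if dy > cy else -1))
    return current  # Already at the destination.

def xy_route(src, dest):
    """Full list of nodes visited from `src` to `dest`, inclusive."""
    path = [src]
    while path[-1] != dest:
        path.append(xy_next_hop(path[-1], dest))
    return path
```

For example, in a 3×3 mesh `xy_route((0, 0), (2, 1))` visits `(0, 0)`, `(1, 0)`, `(2, 0)` and then `(2, 1)`. The hop count always equals the Manhattan distance between source and destination.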

international conference on anti-counterfeiting, security, and identification | 2008

Fractional-pel motion compensation interpolation architecture based on parallel FIR systolic arrays for H.264/AVC

Liang Ma; Gaoming Du; Duoli Zhang; Yukun Song; Luo-Feng Geng; Ming-Lun Gao

A new architecture based on parallel FIR systolic arrays for motion compensation interpolation in H.264/AVC is presented in this paper. Unlike other interpolation architectures based on a traditional adder tree or a single systolic FIR, this design combines the pipelining of the systolic FIR filter with a high degree of parallelism. It has the following characteristics. First, it uses several strategies to reduce the number of memory accesses: the design fully exploits the recursive relation between the fractional-pel samples, adopts appropriate interpolation orders for different situations, and provides two buffers for storing intermediate values. Second, it can increase the system clock frequency by using the systolic FIR filter in place of the traditional adder tree. Third, it can enhance interpolation throughput by generating four fractional-pel samples in parallel. Fourth, it doesn't need high memory bandwidth and can work with different bus widths by changing the number of systolic FIR filters. The design is synthesized with Synopsys Design Compiler using TSMC 0.18 μm standard-cell CMOS technology. The synthesis result shows that this architecture can achieve 230 MHz and meets the interpolation needs of an H.264 decoder for SDTV or HDTV.
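
As background for the abstract above, H.264/AVC derives half-pel luma samples with a 6-tap FIR filter with taps (1, -5, 20, 20, -5, 1), rounds with (sum + 16) >> 5 and clips to 8 bits, then derives quarter-pel samples by averaging two neighbouring samples with rounding. The sketch below is a sequential one-dimensional software reference with hypothetical function names, not the paper's parallel systolic architecture.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # H.264/AVC 6-tap luma interpolation filter

def clip255(value):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, value))

def half_pel(samples, i):
    """Half-pel sample between samples[i] and samples[i + 1].
    Requires samples[i - 2] .. samples[i + 3] to exist."""
    acc = sum(tap * samples[i - 2 + j] for j, tap in enumerate(TAPS))
    return clip255((acc + 16) >> 5)

def quarter_pel(a, b):
    """Quarter-pel sample: rounded average of two neighbouring samples,
    one of which is typically a half-pel sample."""
    return (a + b + 1) >> 1
```

The dependence of `quarter_pel` on already computed half-pel values is one instance of the relation between fractional-pel samples that the abstract says the design exploits to save memory accesses.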

international conference on anti-counterfeiting, security, and identification | 2009

Design of an on-line configurable traffic generator for NoC

Haihua Wen; Gaoming Du; Duoli Zhang; Luo-Feng Geng; Ming-Lun Gao; Ying-Chun Chen

Performance evaluation for Network-on-Chip (NoC) is still a challenging problem. This paper presents the design of an on-line configurable traffic generator (OCTG) that provides a fast and effective traffic generation environment for evaluating NoC communication performance. The novelty of the proposed OCTG architecture is that, rather than merely exposing some configurable parameters for flexibility as conventional designs do, it supports on-line configuration. Parameters are transferred to the configuration engine through a JTAG interface, and the configuration engine then generates configuration signals for the OCTGs to perform on-line configuration. The OCTG provides two traffic modes: broadcast transmission (BT) and node-to-node transmission (NTNT). The OCTG can restart communication immediately after configuration completes, without any other operations, even when a communication transaction is running. Some communication traffic modes can be emulated exactly by the OCTG, so the NoC communication architecture can be evaluated in different traffic modes, or NoC performance can be compared across architectures. Experiments showed that, for the same architecture, NoC performance in NTNT (node (i, j) to node (j, i)) is better than in BT. XY routing performs better than the odd_even router in BT when the injection rate is more than 0.2, but when the injection rate is less than 0.2, the latter is better than the former only in average packet latency.
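
The two traffic modes in the abstract above can be modelled in a few lines of software. This is only an illustrative sketch: the function names are hypothetical, the Bernoulli injection model is an assumption (the abstract does not say how the OCTG realises a given injection rate), and nothing here reflects the JTAG-driven hardware configuration path.

```python
import random

def ntnt_destination(node):
    """Node-to-node pattern used in the reported experiment:
    node (i, j) sends to node (j, i). Diagonal nodes map to themselves."""
    i, j = node
    return (j, i)

def broadcast_destinations(node, size):
    """Broadcast pattern: every other node of a size x size mesh."""
    return [(x, y) for x in range(size) for y in range(size) if (x, y) != node]

def injection_schedule(rate, cycles, seed=0):
    """Assumed Bernoulli injection: in each cycle a packet is injected
    with probability `rate`. Returns one boolean per cycle."""
    rng = random.Random(seed)
    return [rng.random() < rate for _ in range(cycles)]
```

With an injection rate of 0.2, the threshold discussed in the abstract, roughly one cycle in five carries a new packet under this assumed model.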
