Der Wei Yang | Researchain

Publication

Featured research published by Der Wei Yang.

Asia Pacific Conference on Circuits and Systems | 2012

Face detection architecture design using hybrid skin color detection and cascade of classifiers

Der Wei Yang; Chun Wei Chen; Che Hao Chang; Yun Chen Chang; Ming-Der Shieh; Jonas Wang; Chia Cheng Lo

This paper presents an efficient face detection algorithm that leads to a low-cost, high-performance hardware implementation. Using the proposed hybrid scheme of skin color detection and a cascade of classifiers, we overcome (i) the poor false positive rate of skin-color-based face detection and (ii) the high computational complexity of the cascade of classifiers. A skin color detection method that tolerates lighting variation and a scan-line-based dynamic down-sampling architecture are also proposed to enhance performance. Experimental results show that the proposed design achieves high detection accuracy at low hardware cost compared with related work.
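
As a rough illustration of the hybrid idea described in this abstract, the sketch below uses a cheap skin-color test to reject windows before an expensive classifier runs. It is not the authors' design: the Cb/Cr thresholds are commonly quoted values assumed here for illustration, `min_skin_ratio` is a hypothetical parameter, and `classifier` stands in for any cascade of classifiers.

```python
def rgb_to_cbcr(r, g, b):
    """Convert 8-bit RGB to (Cb, Cr) with the full-range BT.601 (JPEG) formulas."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin(r, g, b, cb_range=(77, 127), cr_range=(133, 173)):
    """Threshold test in the CbCr plane; the ranges are assumed, not from the paper."""
    cb, cr = rgb_to_cbcr(r, g, b)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]


def skin_ratio(window):
    """Fraction of pixels in a window (list of rows of RGB tuples) that pass is_skin."""
    pixels = [p for row in window for p in row]
    return sum(is_skin(*p) for p in pixels) / len(pixels)


def detect(window, classifier, min_skin_ratio=0.4):
    """Run the costly classifier only on windows that contain enough skin pixels."""
    if skin_ratio(window) < min_skin_ratio:
        return False  # rejected cheaply; the classifier is never evaluated
    return classifier(window)
```

The skin test bounds the work handed to the classifier, while the classifier in turn removes the false positives that a color test alone would produce.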

International Symposium on Circuits and Systems | 2016

Fast model searching and combining for example learning-based super-resolution

Chun Wei Chen; Fang Kai Hsu; Der Wei Yang; Jonas Wang; Ming-Der Shieh

Single-image super-resolution is an important technique for applications related to high-resolution displays. Example learning-based approaches can provide plenty of image detail by using a trained dataset. Regression-based methods reduce memory storage by training mapping functions rather than using a huge dictionary. However, searching for the nearest cluster to select the desired mapping function remains the bottleneck of the system, and the problem becomes critical as the number of mapping functions increases. This work presents an operator, the local multi-gradient level pattern, that quickly yet effectively describes the local patch geometry for a cluster of patches. The corresponding cluster can then be identified quickly with a simple lookup table. Furthermore, the potential cluster misclassification caused by the simplified clustering feature is mitigated by the proposed model combining scheme. Simulation results show that the proposed method achieves about an 8-times speedup with even higher SSIM than the related k-means-based method.

International Symposium on VLSI Design, Automation and Test | 2015

Efficient highly-parallel turbo decoder for 3GPP LTE-Advanced

Jing Shiun Lin; Ming-Der Shieh; Chung Yen Liu; Der Wei Yang

Turbo codes have been widely adopted in recent wireless communication systems due to their excellent error correction capability. 3GPP LTE-Advanced systems must support a peak data rate of up to 1 Gbps. To meet this throughput requirement, several turbo decoding algorithms aimed at highly parallel architectures have been investigated. However, the hardware cost of turbo decoders increases considerably with increasing parallelism. This paper presents a modified parallel-window decoding algorithm that reduces the warm-up computation ratio of each decoding window. In addition, a dual-mode computing schedule is proposed to support various code rates and block lengths. Experimental results show that the proposed design, implemented in the TSMC 90-nm CMOS process, can achieve the highest throughput rate of 1.45 Gbps and improve the normalized area efficiency by about 24.53% compared to existing 3GPP LTE-Advanced turbo decoders.

IEEE Transactions on Circuits and Systems for Video Technology | 2015

Depth-Reliability-Based Stereo-Matching Algorithm and Its VLSI Architecture Design

Der Wei Yang; Li Chia Chu; Chun Wei Chen; Jonas Wang; Ming-Der Shieh

A low-complexity depth-reliability-based stereo-matching algorithm and an efficient scanline memory-merging implementation scheme are proposed in this paper. The developed algorithm analyzes the accuracy of disparity results by using simple local window-based methods and preserves only reliable information. A bidirectional depth propagation flow is then adopted to fill the unreliable segments by using reliable information. Moreover, a set of predefined function-specific reliability variables is extracted to further improve depth quality in occluded and smooth regions, which reduces by 39% the bad pixels produced by basic 7 × 7 window-based matching. The proposed scanline memory-merging scheme, along with data prefetching, saves 32.7% of the scanline memory area and relaxes the requirements on external frame buffer size and bandwidth. Experimental results show that the implemented stereo-matching hardware has a gate count of 223 k including the scanline memory, and can achieve up to 70 frames/s at 480 × 540 resolution (2 × 2 downsampling of the FullHD side-by-side 3-D format) with 56 disparity levels.
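
The fill step described in this abstract can be sketched on a single scanline as follows. This is only an assumed illustration, not the published algorithm: the `reliable` mask is taken as given, and choosing the smaller of the two neighbouring disparities (the farther surface) is a common occlusion-filling heuristic adopted here as an assumption.

```python
def fill_unreliable(disp, reliable):
    """Fill unreliable disparities on one scanline from reliable neighbours.

    disp     -- list of disparity values for the scanline
    reliable -- list of booleans, True where the disparity is trusted
    """
    n = len(disp)

    # Forward pass: nearest reliable value to the left of each pixel.
    left, last = [None] * n, None
    for i in range(n):
        if reliable[i]:
            last = disp[i]
        left[i] = last

    # Backward pass: nearest reliable value to the right of each pixel.
    right, last = [None] * n, None
    for i in range(n - 1, -1, -1):
        if reliable[i]:
            last = disp[i]
        right[i] = last

    out = []
    for i in range(n):
        if reliable[i]:
            out.append(disp[i])
        else:
            candidates = [v for v in (left[i], right[i]) if v is not None]
            # Assumed rule: take the smaller (background) disparity; 0 if none exists.
            out.append(min(candidates) if candidates else 0)
    return out
```

Both passes touch each pixel once, so the fill needs only scanline-sized buffers, which matches the scanline-oriented processing the abstract emphasizes.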

Asia Pacific Conference on Circuits and Systems | 2016

Effective model construction for enhanced prediction in example-based super-resolution

Chun Wei Chen; Fang Kai Hsu; Der Wei Yang; Jonas Wang; Ming-Der Shieh

Single-image super-resolution is widely adopted for applications related to high-resolution displays. Example learning-based approaches can provide plenty of image detail by using a trained dataset. Regression-based methods reduce memory storage by training mapping functions instead of using a huge dictionary. The reconstructed image quality can be further enhanced by combining various prediction results. This work presents an effective model reconstruction method for enhanced predictions. The desired model can be constructed offline when the local multi-gradient level pattern is used as the clustering feature. Applying the proposed schemes further improves the quality of the reconstructed high-resolution image while retaining almost the same time complexity as the original solution. Experimental results show that the quality of images reconstructed with the proposed schemes is very close to that of Yang's work, while the proposed method operates much faster. Moreover, the space needed to store the mapping functions can be dramatically reduced by using the proposed model combining method.

International Symposium on Circuits and Systems | 2015

High-quality texture compression using adaptive color grouping and selection algorithm

Chun Wei Chen; Ching Heng Su; Der Wei Yang; Jonas Wang; Chia Cheng Lo; Ming-Der Shieh

Texture compression is an important technique for reducing memory storage and increasing rendering speed in graphics processing units (GPUs). To ensure fast decoding and retain random memory access, the industry-standard DXTC compression finds an approximating line segment in color space for each 4×4 block. The idea has been extended to multi-line solutions, which segment the 4×4 block into several partitions, and has been adopted by the newer BPTC format. BPTC provides higher quality than other texture compression standards. However, the limited set of partition combinations used in BPTC can smooth out texture detail in some cases. This work introduces an adaptive color grouping and selection algorithm to mitigate the texture smoothing issue. Simulation results show that the proposed algorithm can improve the average PSNR by 0.78 dB for those blocks.
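
To make the "line segment in color space" idea concrete, the sketch below decodes a block in the style of the four-color opaque mode of DXT1: two endpoint colors define the segment, two interpolated colors lie on it, and every texel stores a 2-bit index into that palette. It is a simplified illustration that assumes endpoints already expanded to 8-bit RGB and uses floor division for rounding; real decoders work from RGB565 endpoints and may round differently.

```python
def dxt1_palette(c0, c1):
    """Four colours on the segment between endpoints c0 and c1 (RGB tuples)."""
    one_third = tuple((2 * a + b) // 3 for a, b in zip(c0, c1))   # closer to c0
    two_thirds = tuple((a + 2 * b) // 3 for a, b in zip(c0, c1))  # closer to c1
    return [c0, c1, one_third, two_thirds]


def decode_block(c0, c1, indices):
    """Map each 2-bit texel index (0..3) to its palette colour."""
    palette = dxt1_palette(c0, c1)
    return [palette[i] for i in indices]
```

Because every texel in the block must fall on one segment, blocks containing several unrelated colors are reproduced poorly, which is the limitation that multi-line formats such as BPTC address by partitioning the block.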

International Symposium on VLSI Design, Automation and Test | 2014

Low complexity stereo matching algorithm using adaptive sized square window

Der Wei Yang; Li Chia Chu; Chun Wei Chen; Jia Ming Gan; Jonas Wang; Ming-Der Shieh

This paper presents an efficient stereo matching algorithm for hardware implementation. A low-complexity rule for deciding the local window size is proposed in our processing scheme, so that a suitable window size is chosen according to the local content characteristics. The matching computation cost is almost as low as that of the conventional fixed-window-size method. An efficient hardware architecture comprising unified window-size processing elements is also developed. Experimental results show that the proposed design achieves better quality than the related fixed-window-size method while keeping computation cost low and the processing flow realizable in hardware.

International Symposium on Circuits and Systems | 2012

Efficient scissoring scheme for scanline-based rendering of 2D vector graphics

Wen Ching Lin; Jheng Hao Ye; Der Wei Yang; Si Yu Huang; Ming-Der Shieh; Jonas Wang

This work presents a look-up table-based (LUT-based) scissoring algorithm for scanline-based rendering of OpenVG. The proposed method can handle an arbitrary number of scissoring rectangles. Rasterization and scissoring in the proposed architecture can be performed concurrently to reduce rendering time. The scanline-size buffers used as scissoring LUTs result in low area overhead. Moreover, a linked-list structure of scissoring rectangles is proposed so that only the rectangles that intersect the scanline being processed are accessed, which increases bus efficiency and reduces power consumption. Implementation results based on TSMC 0.13-μm CMOS technology show that the proposed rasterization design with LUT-based scissoring can operate at 200 MHz with a gate count of 77K. The proposed design can render 16.8 tiger images of 392×483 resolution per second, assuming ideal bus latency. Compared to existing works, the proposed design achieves a smaller area and more functionality for higher display resolutions with comparable throughput.
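
A software analogue of a scanline-sized scissoring LUT might look like the sketch below. It is an assumption-laden illustration rather than the paper's hardware: rectangles are hypothetical `(x, y, width, height)` tuples, the LUT is a list of booleans with one entry per pixel on the scanline, and the linked-list ordering of rectangles is replaced by a simple intersection test.

```python
def scanline_lut(rects, y, width):
    """Build a per-pixel mask for scanline y from scissoring rectangles.

    rects -- iterable of (x, y, w, h) tuples (hypothetical layout)
    width -- number of pixels on the scanline
    """
    lut = [False] * width
    for rx, ry, rw, rh in rects:
        if ry <= y < ry + rh:  # skip rectangles that do not touch this scanline
            for x in range(max(rx, 0), min(rx + rw, width)):
                lut[x] = True
    return lut


def scissor(scanline_pixels, lut, background=0):
    """Keep rasterized pixels inside the scissoring region, clear the rest."""
    return [p if keep else background for p, keep in zip(scanline_pixels, lut)]
```

The mask costs one entry per pixel of a single scanline regardless of how many rectangles are active, which is why a scanline-sized buffer keeps the area overhead low.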

Symposium on Cloud Computing | 2011

VLSI design of area-efficient memory access architectures for quasi-cyclic LDPC codes

Ming-Der Shieh; Shih Hao Fang; Shing Chung Tang; Der Wei Yang

This paper proposes an area-efficient memory access architecture that merges small memory blocks into memory groups to relax the effect of peripherals in small memory blocks. An efficient algorithm is also presented to handle the additional delay elements. The proposed LDPC decoder has the lowest area complexity among related studies.

Asia Pacific Conference on Circuits and Systems | 2010

Efficient protocol converter generation for system integration

Der Wei Yang; Ming-Der Shieh; Wen Hsuen Kuo; Jonas Wang

Integrating intellectual properties (IPs) designed for different protocols is always a troublesome task for system integrators. In this paper, we explore efficient methods for generating protocol converters automatically while taking system performance into account. To handle frequency/phase mismatch, we propose a modified asynchronous FIFO for use together with our protocol converter. The generated results are verified in the Synopsys Verification IP (VIP) environment. The performance and cost of the resulting converter are comparable to those of the manually designed ARM PrimeCell.
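
The abstract does not describe how the asynchronous FIFO is modified. As background only, conventional asynchronous FIFOs usually pass their read and write pointers between clock domains in Gray code, so that only one bit changes per increment and a pointer sampled mid-transition is off by at most one position. The sketch below shows that standard encoding; it is not the authors' modification.

```python
def bin_to_gray(n):
    """Encode a non-negative integer pointer as a reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_bin(g):
    """Decode a reflected Gray code back to the binary pointer value."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a, b):
    """Number of bit positions in which two pointer encodings differ."""
    return bin(a ^ b).count("1")
```

For a FIFO whose depth is a power of two, the single-bit property also holds when the pointer wraps from the last address back to zero.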
