Tay-Jyi Lin
National Chung Cheng University
Publication
Featured research published by Tay-Jyi Lin.
international symposium on vlsi design, automation and test | 2008
Tay-Jyi Lin; Chun-Nan Liu; Shau-Yin Tseng; Yuan-Hua Chu; An-Yeu Wu
The Industrial Technology Research Institute (ITRI) PAC (parallel architecture core) project was initiated in 2003. The target is to develop a low-power and high-performance programmable SoC platform for multimedia applications. In the first PAC project phase (2004-2006), a 5-way VLIW DSP (PACDSP) processor was developed with our patented distributed & ping-pong register file and variable-length VLIW encoding techniques. A dual-core PAC SoC, which is composed of a PACDSP core and an ARM9 core, has also been designed and fabricated in the TSMC 0.13 μm technology to demonstrate its outstanding performance and energy efficiency for multimedia processing such as real-time H.264 codec. This paper summarizes the technical contents of PACDSP, the DVFS (dynamic voltage and frequency scaling)-enabled PAC SoC, and the energy-aware multimedia codec. The research directions of our second-phase PAC project (PAC II), including multicore architectures, ESL (electronic system-level) technology, and low-power multimedia framework, are also addressed in this paper.
IEEE Transactions on Circuits and Systems | 2010
Yu-Ting Kuo; Tay-Jyi Lin; Yueh-Tai Li; Chih-Wei Liu
Because it matches the frequency characteristics of the human ear well, the ANSI S1.11 1/3-octave filter bank is popular in acoustic applications such as acoustic analyzers and equalizers. It is also desirable in hearing aids because the well-known hearing aid prescription formula, NAL-NL1, prescribes its gains at ANSI 1/3-octave frequencies. However, its high computational complexity limits its use in hearing aids, where power consumption is a critical concern. To address this issue, a low-power design and implementation of the ANSI S1.11 filter bank for digital hearing aids is presented. We first develop a complexity-effective multirate FIR filter bank algorithm. A systematic coefficient design flow is then elaborated for the proposed filter bank to minimize the order of its FIR filters. In an 18-band digital hearing aid with a 24-kHz sampling rate, the proposed algorithm saves about 96% of the multiplications and additions compared with a straightforward FIR filter bank. Moreover, various low-power VLSI design techniques are investigated in detail and applied to our design. The proposed complexity-effective ANSI S1.11 FIR filter bank has been implemented in the TSMC 0.13-μm CMOS technology with an area-efficient architecture. The test chip consumes only 87 μW, which is 30%-79% of that of the others available in the literature. The proposed low-power ANSI 1/3-octave filter bank can precisely apply the gains prescribed by the NAL-NL1 formula for hearing-impaired people.
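For orientation, the 1/3-octave band frequencies mentioned in this abstract can be computed from the base-ten definition used by ANSI S1.11, in which band 30 is centered at 1 kHz. The Python sketch below is illustrative only: the function names are hypothetical, and the choice of bands 22 to 39 as the 18 bands is an assumption, not something stated in the abstract.

```python
# Sketch: ANSI S1.11 base-ten 1/3-octave midband frequencies and band edges.
# Assumption: the 18 bands are numbered 22..39 (not stated in the abstract).

F_REF_HZ = 1000.0      # reference frequency; band 30 is centered here
SAMPLE_RATE_HZ = 24000.0


def third_octave_center(band):
    """Exact midband frequency: f_ref * G**((band - 30) / 3), G = 10**(3/10)."""
    return F_REF_HZ * 10.0 ** ((band - 30) / 10.0)


def third_octave_edges(band):
    """Lower and upper band-edge frequencies: center * G**(-1/6), center * G**(1/6)."""
    center = third_octave_center(band)
    half_band = 10.0 ** (1.0 / 20.0)  # G**(1/6) with G = 10**(3/10)
    return center / half_band, center * half_band


bands = list(range(22, 40))                      # assumed band numbers
centers = [third_octave_center(b) for b in bands]
edges = [third_octave_edges(b) for b in bands]

if __name__ == "__main__":
    for b, fc, (lo, hi) in zip(bands, centers, edges):
        print(f"band {b}: {lo:8.1f} Hz  {fc:8.1f} Hz  {hi:8.1f} Hz")
```

Under these assumptions the centers run from about 158 Hz to about 7.9 kHz, and every band edge lies below the 12-kHz Nyquist frequency of a 24-kHz sampling rate.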
signal processing systems | 2011
David Chih-Wei Chang; Tay-Jyi Lin; Chung-Ju Wu; Jenq Kuen Lee; Yuan-Hua Chu; An-Yeu Wu
In order to develop a low-power and high-performance SoC platform for multimedia applications, the Parallel Architecture Core (PAC) project was initiated in Taiwan in 2003. A VLIW digital signal processor (PACDSP) has been developed from a proprietary instruction set with multimedia-rich instructions, a complexity-effective microarchitecture with an innovative distributed & ping-pong register organization and variable-length VLIW encoding, to a highly-configurable soft IP with several successful silicon implementations. A complete toolchain with an optimizing C compiler has also been developed for PACDSP. A dual-core PAC SoC has been designed and fabricated, which consists of a PACDSP core, an ARM9 core, scratchpad memories, and various on-chip peripherals, to demonstrate the outstanding performance and energy efficiency for multimedia processing such as the real-time H.264 codec. The first part of the two introductory papers of PAC describes the hardware architecture of the PACDSP core, its software development tools, and the PAC SoC with dynamic voltage and frequency scaling (DVFS).
international conference on computer design | 2003
Tay-Jyi Lin; Chin-Chi Chang; Chen-Chia Lee; Chein-Wei Jen
VLIW processors with static instruction scheduling, and thus deterministic execution times, are very suitable for high-performance real-time DSP applications. But two major weaknesses in VLIW processors prevent the integration of more functional units (FU) for a higher instruction issuing rate: the dramatically growing complexity in the register file (RF), and the poor code density. We propose a novel ring-structure RF, which partitions the centralized RF into 2N subblocks with an explicit N-by-N switch network for N FU. Each subblock only requires access ports for a single FU. We also propose the hierarchical VLIW encoding with variable-length RISC-like instructions and NOP removal. The ring-structure RF saves 91.88% of the silicon area of the centralized RF and reduces its access time by 77.35%. Our simulation results show that the proposed instruction set architecture with the exposed ring-structure RF has comparable performance with the state-of-the-art DSP processors. Moreover, the hierarchical VLIW encoding can save 32% to 50% of the code size.
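The idea of NOP removal can be illustrated with a generic scheme in which each VLIW bundle is stored as a slot mask followed by only its non-NOP operations. The Python sketch below is a minimal illustration of that general technique, not the hierarchical encoding proposed in the paper; the mask-based format and all names are assumptions.

```python
# Sketch: generic mask-based NOP removal for VLIW bundles.
# Assumption: this is NOT the paper's encoding, only the general idea.

NOP = None  # an empty issue slot


def compress_bundle(bundle):
    """Return (mask, ops): bit i of mask is set when slot i holds a real op."""
    mask = 0
    ops = []
    for slot, op in enumerate(bundle):
        if op is not NOP:
            mask |= 1 << slot
            ops.append(op)
    return mask, ops


def expand_bundle(mask, ops, width):
    """Rebuild the fixed-width bundle, reinserting NOPs in empty slots."""
    remaining = iter(ops)
    return [next(remaining) if (mask >> slot) & 1 else NOP
            for slot in range(width)]


bundle = ["add r1, r2, r3", NOP, NOP, "ld r4, [r5]"]
mask, ops = compress_bundle(bundle)
restored = expand_bundle(mask, ops, len(bundle))
```

In this toy format a four-slot bundle with two NOPs stores two operations plus a 4-bit mask instead of four full operations, which is the kind of saving that motivates removing NOPs from sparsely filled bundles.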
great lakes symposium on vlsi | 2005
Tay-Jyi Lin; Chie-Min Chao; Chia-Hsien Liu; Pi-Chen Hsiao; Shin-Kai Chen; Li-Chun Lin; Chih-Wei Liu; Chein-Wei Jen
This paper presents a unified processor core with two operation modes. The processor core works as a compiler-friendly MIPS-like core in the RISC mode, and it is a 4-way VLIW in its DSP mode, which has distributed and ping-pong register organization optimized for stream processing. To minimize hardware, the DSP mode has no control construct for program flow, while the data manipulation RISC instructions are executed in the DSP datapath. Moreover, the two operation modes can be changed instruction by instruction within a single program stream via the hierarchical instruction encoding, which also helps to reduce the VLIW code sizes significantly. The processor has been implemented in the UMC 0.18 μm CMOS technology, and its core size is 3.23 mm × 3.23 mm including the 32 KB on-chip memory. It can operate at 208 MHz while consuming 380.6 mW average power.
international soc design conference | 2009
Ching-Hsiang Chuang; Chiu-Ling Chen; Pi-Cheng Hsiao; Tay-Jyi Lin
Programming multicore application processors is a daunting task, and component-based software development has already demonstrated its effectiveness in simplifying it. However, component compositions are cumbersome, time-consuming and error-prone. This paper presents a graphical tool to mitigate the problem, which efficiently visualizes the design capture, simulation and debugging processes for the TI DaVinci multicore platform based on Codec Engine and the xDAIS framework components. The experimental result shows that our proposed tool introduces only 1% run-time overhead, which is negligible for practical applications.
international soc design conference | 2008
Tien-Wei Hsieh; Pi-Chen Hsiao; Che-Yu Liao; Hsien-Ching Hsieh; Huang-Lun Lin; Tay-Jyi Lin; Yuan-Hua Chu; An-Yeu Wu
This paper describes the power/energy optimizations of an embedded VLIW digital signal processor (DSP) - PACDSP from the Industrial Technology Research Institute (ITRI). First, a configurable cache/scratchpad memory subsystem has been implemented, which saves the energy for tag matching in deeply embedded applications. An energy-effective cell-based design has been produced by analyzing the relationships between the energy efficiency and the synthesis constraints. Finally, dynamic voltage & frequency scaling (DVFS) and power gating have been applied to reduce both switching and leakage power dissipations with a novel common power format (CPF) flow. The test-chip will be fabricated in the TSMC 90 nm CMOS technology, of which the estimated power dissipations are 156.62 mW for 350 MHz @1 V and 46.60 mW for 230 MHz @0.7 V respectively.
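The two estimated operating points quoted in this abstract can be converted to average energy per clock cycle, which makes the benefit of voltage scaling easier to see. The Python sketch below is simple arithmetic on the reported figures; the function name is hypothetical, and the comparison against an ideal V² trend is an added back-of-the-envelope check, not a result from the paper.

```python
# Sketch: energy per cycle at the two estimated operating points.
# Only the power, frequency and voltage figures come from the abstract.


def energy_per_cycle_pj(power_mw, freq_mhz):
    """Average energy per clock cycle in picojoules (1 mW / 1 MHz = 1 nJ)."""
    return power_mw / freq_mhz * 1000.0


high = energy_per_cycle_pj(156.62, 350.0)  # 350 MHz at 1 V
low = energy_per_cycle_pj(46.60, 230.0)    # 230 MHz at 0.7 V

ratio = low / high              # measured energy-per-cycle ratio
v2_ratio = (0.7 / 1.0) ** 2     # ideal switching-energy ratio if E scales as V^2

if __name__ == "__main__":
    print(f"1.0 V: {high:.1f} pJ/cycle")
    print(f"0.7 V: {low:.1f} pJ/cycle")
    print(f"ratio: {ratio:.3f} (V^2 trend: {v2_ratio:.2f})")
```

The figures work out to roughly 447 pJ and 203 pJ per cycle, a ratio of about 0.45, which is close to the 0.49 that a pure V² scaling of switching energy would suggest. The totals include leakage, so the comparison is only indicative.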
signal processing systems | 2009
Shin-Kai Chen; Tay-Jyi Lin; Chih-Wei Liu
Object detection is an important function for intelligent multimedia processing, but its computational complexity has prevented its pervasive use in consumer electronics. Cost-effective and energy-efficient computations are now available with various innovative multicore architectures proposed for embedded systems. However, extensive software optimizations are needed to unravel the inherent parallelism in object detection for multicore processing. This paper presents interleaved reordering and splitting of parallel tasks in object detection. Overall performance improvements of 10% and 19%, respectively, have been measured for the proposed methods on a face detection prototype implemented on the Sony PlayStation 3.
international symposium on vlsi design, automation and test | 2005
Tay-Jyi Lin; Chen-Chia Lee; Chih-Wei Liu; Chein-Wei Jen
This paper presents a novel register organization for VLIW DSPs. The simulation results show that the performance of a DSP with the proposed register file is comparable with that of state-of-the-art DSPs. Moreover, the proposed register file saves 89.7% of the area of a conventional centralized one, while reducing its access time by 68.6%.
IEEE Transactions on Circuits and Systems II: Express Briefs | 2012
Pei-Yao Chang; Tay-Jyi Lin; Jinn-Shyan Wang; Yen-Hsiang Yu
This brief presents a 4R/2W register file design for two-issue microprocessors with ultra-wide dynamic voltage scaling. A full-N separated read port has been proposed to save ~19% area and to improve performance by 4.5%-10.4% over state-of-the-art 1P3N designs for subthreshold operations. In addition, a reconfigurable write scheme has been proposed to utilize the unused write port in the energy-efficient mode with single-issue execution for ~18% write noise margin improvement. A test chip has been designed and fabricated using the TSMC 65-nm GP process, of which a minimum operating voltage of 148 mV has been measured.