Publication


Featured research published by Tay-Jyi Lin.


International Symposium on VLSI Design, Automation and Test | 2008

Overview of ITRI PAC project - from VLIW DSP processor to multicore computing platform

Tay-Jyi Lin; Chun-Nan Liu; Shau-Yin Tseng; Yuan-Hua Chu; An-Yeu Wu

The Industrial Technology Research Institute (ITRI) PAC (parallel architecture core) project was initiated in 2003. The target is to develop a low-power and high-performance programmable SoC platform for multimedia applications. In the first PAC project phase (2004-2006), a 5-way VLIW DSP (PACDSP) processor was developed with our patented distributed & ping-pong register file and variable-length VLIW encoding techniques. A dual-core PAC SoC, which is composed of a PACDSP core and an ARM9 core, has also been designed and fabricated in the TSMC 0.13 μm technology to demonstrate its outstanding performance and energy efficiency for multimedia processing such as a real-time H.264 codec. This paper summarizes the technical contents of PACDSP, the DVFS (dynamic voltage and frequency scaling)-enabled PAC SoC, and the energy-aware multimedia codec. The research directions of our second-phase PAC project (PAC II), including multicore architectures, ESL (electronic system-level) technology, and a low-power multimedia framework, are also addressed in this paper.


IEEE Transactions on Circuits and Systems | 2010

Design and Implementation of Low-Power ANSI S1.11 Filter Bank for Digital Hearing Aids

Yu-Ting Kuo; Tay-Jyi Lin; Yueh-Tai Li; Chih-Wei Liu

Because it matches the frequency characteristics of the human ear well, the ANSI S1.11 1/3-octave filter bank is popular in acoustic applications, such as acoustic analyzers and equalizers. It is also desirable in hearing aids because the well-known hearing aid prescription formula, NAL-NL1, prescribes its gains at ANSI 1/3-octave frequencies. However, its high computational complexity limits its use in hearing aids, where power consumption is a critical concern. To address this issue, a low-power design and implementation of an ANSI S1.11 filter bank for digital hearing aids is presented. We first develop a complexity-effective multirate FIR filter bank algorithm. A systematic coefficient design flow is then elaborated for the proposed filter bank to minimize the order of its FIR filters. In an 18-band digital hearing aid with a 24-kHz sampling rate, the proposed algorithm saves about 96% of the multiplications and additions compared with a straightforward FIR filter bank. Moreover, various low-power VLSI design techniques are investigated in detail and applied to our design. The proposed complexity-effective ANSI S1.11 FIR filter bank has been implemented in the TSMC 0.13-μm CMOS technology with an area-efficient architecture. The test chip consumes only 87 μW, which is 30%-79% of that of the other designs available in the literature. The proposed low-power ANSI 1/3-octave filter bank can precisely apply the gains prescribed by the NAL-NL1 formula for hearing-impaired people.
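As a rough illustration of the band layout mentioned in this abstract, the sketch below computes nominal 1/3-octave band frequencies with the base-10 convention commonly associated with ANSI S1.11 (midband frequency 1000 · G^((x − 30)/3) with G = 10^(3/10)). The band numbers 22 to 39 are an assumption chosen to give 18 bands below the Nyquist frequency of a 24-kHz system; the abstract does not state which bands the paper uses, and the function name is hypothetical.

```python
# Nominal 1/3-octave band frequencies, base-10 convention:
#   G = 10**(3/10),  f_mid(x) = 1000 * G**((x - 30) / 3)
# Band edges lie a sixth of an octave ratio below and above the midband.
G = 10 ** 0.3
F_REF = 1000.0   # reference frequency in Hz (band number 30)
FS = 24_000.0    # sampling rate quoted in the abstract


def third_octave_band(x: int) -> tuple[float, float, float]:
    """Return (lower edge, midband, upper edge) in Hz for band number x."""
    mid = F_REF * G ** ((x - 30) / 3)
    lower = mid * G ** (-1 / 6)
    upper = mid * G ** (1 / 6)
    return lower, mid, upper


# ASSUMPTION: band numbers 22..39 (18 bands); not taken from the abstract.
BANDS = range(22, 40)
table = [third_octave_band(x) for x in BANDS]

for x, (lower, mid, upper) in zip(BANDS, table):
    print(f"band {x}: {lower:8.1f} Hz  {mid:8.1f} Hz  {upper:8.1f} Hz")
```

Under these assumptions every band edge stays below FS / 2, and adjacent bands share an edge, which is what makes the 1/3-octave grid a natural fit for per-band gain prescription.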


Signal Processing Systems | 2011

Parallel Architecture Core (PAC)--the First Multicore Application Processor SoC in Taiwan Part I: Hardware Architecture & Software Development Tools

David Chih-Wei Chang; Tay-Jyi Lin; Chung-Ju Wu; Jenq Kuen Lee; Yuan-Hua Chu; An-Yeu Wu

In order to develop a low-power and high-performance SoC platform for multimedia applications, the Parallel Architecture Core (PAC) project was initiated in Taiwan in 2003. A VLIW digital signal processor (PACDSP) has been developed from a proprietary instruction set with multimedia-rich instructions, a complexity-effective microarchitecture with an innovative distributed & ping-pong register organization and variable-length VLIW encoding, to a highly-configurable soft IP with several successful silicon implementations. A complete toolchain with an optimizing C compiler has also been developed for PACDSP. A dual-core PAC SoC has been designed and fabricated, which consists of a PACDSP core, an ARM9 core, scratchpad memories, and various on-chip peripherals, to demonstrate the outstanding performance and energy efficiency for multimedia processing such as the real-time H.264 codec. The first part of the two introductory papers of PAC describes the hardware architecture of the PACDSP core, its software development tools, and the PAC SoC with dynamic voltage and frequency scaling (DVFS).


International Conference on Computer Design | 2003

An efficient VLIW DSP architecture for baseband processing

Tay-Jyi Lin; Chin-Chi Chang; Chen-Chia Lee; Chein-Wei Jen

VLIW processors, with static instruction scheduling and thus deterministic execution times, are very suitable for high-performance real-time DSP applications. But two major weaknesses of VLIW processors prevent the integration of more functional units (FU) for a higher instruction issue rate: the dramatically growing complexity of the register file (RF) and the poor code density. We propose a novel ring-structure RF, which partitions the centralized RF into 2N subblocks with an explicit N-by-N switch network for N FUs. Each subblock only requires access ports for a single FU. We also propose a hierarchical VLIW encoding with variable-length RISC-like instructions and NOP removal. The ring-structure RF saves 91.88% of the silicon area and reduces the access time by 77.35% compared with the centralized RF. Our simulation results show that the proposed instruction set architecture with the exposed ring-structure RF has performance comparable to state-of-the-art DSP processors. Moreover, the hierarchical VLIW encoding can reduce code size by 32% to 50%.
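To make the idea of NOP removal concrete, here is a minimal sketch of one generic way to do it: each bundle stores a slot mask plus only its non-NOP operations. This is not the paper's hierarchical encoding, whose format the abstract does not describe; the names `compress`, `expand`, and the toy program are hypothetical.

```python
# Generic mask-based NOP removal for fixed-width VLIW bundles.
# ASSUMPTION: this illustrates the general technique only, not the
# hierarchical encoding proposed in the paper.
NOP = "nop"


def compress(bundle):
    """Return (mask, ops): bit i of mask is set when slot i holds a real op."""
    mask = 0
    ops = []
    for i, op in enumerate(bundle):
        if op != NOP:
            mask |= 1 << i
            ops.append(op)
    return mask, ops


def expand(mask, ops, width):
    """Rebuild the full-width bundle, refilling empty slots with NOPs."""
    remaining = iter(ops)
    return [next(remaining) if (mask >> i) & 1 else NOP for i in range(width)]


# Toy 4-way program with many empty issue slots.
program = [
    ["add", NOP, NOP, "ld"],
    [NOP, NOP, NOP, NOP],
    ["mul", "sub", NOP, "st"],
]
packed = [compress(bundle) for bundle in program]
stored_ops = sum(len(ops) for _, ops in packed)
total_slots = sum(len(bundle) for bundle in program)
print(f"stored {stored_ops} of {total_slots} slots")
```

The saving in a real encoding depends on how the mask and variable-length instructions are laid out in memory, which this sketch does not model.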


Great Lakes Symposium on VLSI | 2005

A unified processor architecture for RISC & VLIW DSP

Tay-Jyi Lin; Chie-Min Chao; Chia-Hsien Liu; Pi-Chen Hsiao; Shin-Kai Chen; Li-Chun Lin; Chih-Wei Liu; Chein-Wei Jen

This paper presents a unified processor core with two operation modes. The processor core works as a compiler-friendly MIPS-like core in the RISC mode, and it is a 4-way VLIW in its DSP mode, which has a distributed and ping-pong register organization optimized for stream processing. To minimize hardware, the DSP mode has no control construct for program flow, while the data manipulation RISC instructions are executed in the DSP datapath. Moreover, the two operation modes can be changed instruction by instruction within a single program stream via the hierarchical instruction encoding, which also helps to reduce the VLIW code sizes significantly. The processor has been implemented in the UMC 0.18 μm CMOS technology, and its core size is 3.23 mm × 3.23 mm including the 32 KB on-chip memory. It can operate at 208 MHz while consuming 380.6 mW average power.


International SoC Design Conference | 2009

Software development tools for streaming DSP applications

Ching-Hsiang Chuang; Chiu-Ling Chen; Pi-Cheng Hsiao; Tay-Jyi Lin

Programming multicore application processors is a daunting task, and component-based software development has already proven effective in simplifying it. However, component compositions are cumbersome, time-consuming and error-prone. This paper presents a graphical tool to mitigate the problem, which efficiently visualizes the design capture, simulation and debugging processes for the TI DaVinci multicore platform based on Codec Engine and the xDAIS framework components. The experimental result shows that our proposed tool introduces only 1% run-time overhead, which is negligible for practical applications.


International SoC Design Conference | 2008

Energy-effective design & implementation of an embedded VLIW DSP

Tien-Wei Hsieh; Pi-Chen Hsiao; Che-Yu Liao; Hsien-Ching Hsieh; Huang-Lun Lin; Tay-Jyi Lin; Yuan-Hua Chu; An-Yeu Wu

This paper describes the power/energy optimizations of an embedded VLIW digital signal processor (DSP), PACDSP, from the Industrial Technology Research Institute (ITRI). First, a configurable cache/scratchpad memory subsystem has been implemented, which saves the energy for tag matching in deeply embedded applications. Second, an energy-effective cell-based design has been produced by analyzing the relationships between the energy efficiency and the synthesis constraints. Finally, dynamic voltage & frequency scaling (DVFS) and power gating have been applied to reduce both switching and leakage power dissipation with a novel common power format (CPF) flow. The test chip will be fabricated in the TSMC 90 nm CMOS technology; its estimated power dissipation is 156.62 mW at 350 MHz and 1 V, and 46.60 mW at 230 MHz and 0.7 V.
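The two estimated operating points quoted in this abstract can be compared on an energy-per-cycle basis, E = P / f. The sketch below does only that arithmetic; the function name is hypothetical, and it assumes the same number of cycles is needed in both modes, which the abstract does not state.

```python
# Energy per clock cycle at the two estimated operating points.
# E_cycle = P / f, and mW / MHz conveniently yields nanojoules.
# ASSUMPTION: cycle counts are identical at both operating points.


def energy_per_cycle_nj(power_mw: float, freq_mhz: float) -> float:
    """Return energy per cycle in nJ for a power in mW and clock in MHz."""
    return power_mw / freq_mhz


high = energy_per_cycle_nj(156.62, 350.0)  # 350 MHz at 1 V
low = energy_per_cycle_nj(46.60, 230.0)    # 230 MHz at 0.7 V
saving = 1.0 - low / high

print(f"{high:.3f} nJ/cycle at 1 V, {low:.3f} nJ/cycle at 0.7 V")
print(f"energy per cycle reduced by {saving:.1%}")
```

With the quoted figures this works out to roughly 0.45 nJ versus 0.20 nJ per cycle, so the low-voltage point trades about a third of the clock rate for a little over half of the energy per cycle.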


Signal Processing Systems | 2009

Parallel object detection on multicore platforms

Shin-Kai Chen; Tay-Jyi Lin; Chih-Wei Liu

Object detection is an important function for intelligent multimedia processing, but its computational complexity has prevented its pervasive use in consumer electronics. Cost-effective and energy-efficient computation is now available with various innovative multicore architectures proposed for embedded systems. However, extensive software optimizations are needed to expose the inherent parallelism in object detection for multicore processing. This paper presents interleaved reordering and splitting of parallel tasks in object detection. Overall performance improvements of 10% and 19%, respectively, have been measured for the proposed methods on a face detection prototype implemented on the Sony PlayStation 3.


International Symposium on VLSI Design, Automation and Test | 2005

A novel register organization for VLIW digital signal processors

Tay-Jyi Lin; Chen-Chia Lee; Chih-Wei Liu; Chein-Wei Jen

This paper presents a novel register organization for VLIW DSPs. The simulation results show that the performance of a DSP with the proposed register file is comparable with that of state-of-the-art DSPs. Moreover, the proposed register file saves 89.7% of the area of a conventional centralized one, while reducing its access time by 68.6%.


IEEE Transactions on Circuits and Systems II: Express Briefs | 2012

A 4R/2W Register File Design for UDVS Microprocessors in 65-nm CMOS

Pei-Yao Chang; Tay-Jyi Lin; Jinn-Shyan Wang; Yen-Hsiang Yu

This brief presents a 4R/2W register file design for two-issue microprocessors with ultra-wide dynamic voltage scaling. A full-N separated read port has been proposed to save about 19% area and to improve performance by 4.5% to 10.4% over state-of-the-art 1P3N designs for subthreshold operation. In addition, a reconfigurable write scheme has been proposed to utilize the unused write port in the energy-efficient mode with single-issue execution, which improves the write noise margin by about 18%. A test chip has been designed and fabricated using the TSMC 65-nm GP process, for which a minimum operating voltage of 148 mV has been measured.

Collaboration


Dive into Tay-Jyi Lin's collaborations.

Top Co-Authors

Chih-Wei Liu, National Chiao Tung University
Chein-Wei Jen, National Chiao Tung University
Jinn-Shyan Wang, National Chung Cheng University
Yu-Ting Kuo, National Chiao Tung University
Tien-Fu Chen, National Chiao Tung University
Chie-Min Chao, National Chiao Tung University
Chingwei Yeh, National Chung Cheng University
Pi-Chen Hsiao, National Chiao Tung University
Shih-Hao Ou, National Chiao Tung University
Yuan-Hua Chu, Industrial Technology Research Institute