Publication


Featured research published by Lin-Shan Lee.


IEEE Signal Processing Magazine | 2005

Spoken document understanding and organization

Lin-Shan Lee; Berlin Chen

Spoken documents (or associated multimedia content) are in fact better understood and reorganized in a way that retrieval/browsing can be performed easily. For example, they are now in the form of short paragraphs, properly organized in some hierarchical visual presentation with titles/summaries/topic labels as references for retrieval and browsing. The retrieval can be performed based on the full content, the summaries/titles/topic labels, or both. In this article, this is referred to as spoken document understanding and organization for efficient retrieval/browsing applications. The purpose of this article is to present a concise, comprehensive, and integrated overview of related areas in a unified context of spoken document understanding and organization for efficient retrieval/browsing applications. In addition, we present an initial prototype system we developed at National Taiwan University as a new example of integrating the various technologies and functionalities.


International Conference on Acoustics, Speech, and Signal Processing | 1990

A real-time Mandarin dictation machine for Chinese language with unlimited texts and very large vocabulary

Lin-Shan Lee; Chiu-yu Tseng; Hung-yan Gu; Fu-hua Liu; C.H. Chang; Sung-Hsien Hsieh; Chia-ping Chen

A successfully implemented real-time Mandarin dictation machine which recognizes Mandarin speech with unlimited texts and very large vocabulary for the input of Chinese characters to computers is described. Isolated syllables including the tones are first recognized using specially trained hidden Markov models with special feature parameters. The exact characters are then identified from the syllables using a Markov Chinese language model. The real-time implementation is on an IBM PC/AT, connected to a set of special hardware boards on which ten TMS 320C25 chips operate in parallel. It takes only 0.45 s to dictate a character.


International Conference on Acoustics, Speech, and Signal Processing | 1993

A new framework for recognition of Mandarin syllables with tones using sub-syllabic units

Chih-Heng Lin; Lin-Shan Lee; Pei-Yih Ting

Three classes of subsyllabic units for Mandarin syllables are defined, i.e., the initials, the finals, and the transitions. A new structure for Mandarin syllable recognition is developed, in which the tones and base syllables are recognized jointly and a total of 574 subsyllabic unit models will be enough to provide improved recognition performance. This approach has the potential to be extended to continuous Mandarin speech recognition, although the preliminary experiments demonstrated here are based on isolated syllables only.


International Conference on Speech Image Processing and Neural Networks | 1994

Golden Mandarin (II)-an intelligent Mandarin dictation machine for Chinese character input with adaptation/learning functions

Lin-Shan Lee; Keh-Jiann Chen; Chiu-yu Tseng; Ren-Yuan Lyu; Lee-Feng Chien; Hsin-Min Wang; Jia-Lin Shen; Sung-Chien Lin; Yen-Ju Yang; Bo-Ren Bai; Chi-ping Nee; Chun-Yi Liao; Shueh-Sheng Lin; Chung-Shu Yang; I-Jung Hung; Ming-Yu Lee; Rei-Chang Wang; Bo-Shen Lin; Yuan-Cheng Chang; Rung-Chiung Yang; Yung-Chi Huang; Chen-Yuan Lou; Tung-Sheng Lin

Golden Mandarin (II) is an intelligent single-chip based real-time Mandarin dictation machine for the Chinese language with a very large vocabulary for the input of unlimited Chinese texts into computers using voice. This dictation machine can be installed on any personal computer, in which only a single-chip Motorola DSP 96002D is used, with a preliminary character correct rate around 95% at a speed of 0.6 sec per character. Various adaptation/learning functions have been developed for this machine, including fast adaptation to new speakers and on-line learning of the voice characteristics, task domains, word patterns, and noise environments of the users, so the machine can be easily personalized for each user. These adaptation/learning functions are the major subjects of the paper.


International Conference on Acoustics, Speech, and Signal Processing | 1993

Golden Mandarin (II)-an improved single-chip real-time Mandarin dictation machine for Chinese language with very large vocabulary

Lin-Shan Lee; Chiu-yu Tseng; Keh-Jiann Chen; I-Jung Hung; Ming-Yu Lee; Lee-Feng Chien; Yumin Lee; Ren-Yuan Lyu; Hsin-Min Wang; Yung-Chuan Wu; Tung-Sheng Lin; Hung-yan Gu; Chi-ping Nee; Chun-Yi Liao; Yeng-Ju Yang; Yuan-Cheng Chang; Rung-Chiung Yang

Golden Mandarin (II) is an improved single-chip real-time Mandarin dictation machine with a very large vocabulary for the input of unlimited Chinese sentences into computers using voice. In this dictation machine only a single-chip Motorola DSP 96002D on an Ariel DSP-96 card is used, with a preliminary character correct rate of around 95% in speaker-dependent mode at a speed of 0.36 s per character. This is achieved by many new techniques, primarily a segmental probability modeling technique for syllable recognition especially considering the characteristics of Mandarin syllables, and a word-lattice-based Chinese character bigram for character identification especially considering the structure of the Chinese language.


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1989

The synthesis rules in a Chinese text-to-speech system

Lin-Shan Lee; Chiu-yu Tseng; Ming Ouhyoung

The synthesis rules developed for a successfully implemented Chinese text-to-speech system are described in detail. The design approach is based on a syllable concatenation that is rooted in the special characteristics of the Chinese language. Special attention is given to the lexical tones and other prosodic rules, such as concatenation rules, sandhi rules, stress rules, intonation patterns, syllable duration rules, pause insertion rules, and energy modification rules. The rules are derived from the acoustic properties of Mandarin Chinese and therefore are useful not only in designing other Chinese text-to-speech systems, but also in understanding the characteristics of Mandarin sentences and processing Mandarin speech signals for other purposes such as segmentation or recognition.


Multimedia Signal Processing | 1997

Voice dictation of Mandarin Chinese

Lin-Shan Lee

The Chinese language is not alphabetic, and input of Chinese characters into computers remains a difficult problem even after decades of efforts made to overcome the problem. Voice dictation of Mandarin Chinese with a very large vocabulary is believed to be the perfect solution, but this is a highly challenging speech recognition problem with many technical issues yet unsolved. The characteristics of Mandarin Chinese, significantly different from those of most alphabetic western languages, lead to the fact that many special measures and unique approaches that consider the feature structure of the language are believed to be the key to providing better solutions to the problem. Such special measures and unique approaches are the primary focus of this article. We analyze the characteristic structure of Mandarin Chinese and discuss related issues. The primary focus is then on the key technology regarding the problem, including the basic architecture for Mandarin dictation, acoustic modeling/processing, and linguistic modeling/processing. Some typical prototype systems, other related applications, and initial industrial efforts and products are presented to indicate the feasibility of the key technology discussed.


IEEE Transactions on Speech and Audio Processing | 1993

Golden Mandarin (I)-A real-time Mandarin speech dictation machine for Chinese language with very large vocabulary

Lin-Shan Lee; Chiu-yu Tseng; Hung-yan Gu; Fu-hua Liu; Chen-hao Chang; Yueh-hong Lin; Yumin Lee; Shih-Lung Tu; Shew-Heng Hsieh; Chian-hung Chen

The first successfully implemented real-time Mandarin dictation machine, which recognizes Mandarin speech with very large vocabulary and almost unlimited texts for the input of Chinese characters into computers, is described. The machine is speaker-dependent, and the input speech is in the form of sequences of isolated syllables. The machine can be decomposed into two subsystems. The first subsystem recognizes the syllables using hidden Markov models. Because every syllable can represent many different homonym characters and form different multisyllabic words with syllables on its right or left, the second subsystem is needed to identify the exact characters from the syllables and correct the errors in syllable recognition. The real-time implementation is on an IBM PC/AT, connected to three sets of specially designed hardware boards on which seven TMS 320C25 chips operate in parallel. The preliminary test results indicate that it takes only about 0.45 s to dictate a syllable (or character) with an accuracy on the order of 90%.
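
The homonym problem described in this abstract can be illustrated with a small sketch. It is not the system's actual second subsystem: it simply assumes a character bigram model and picks, for each recognized syllable, the homophone character that maximizes the total bigram log-probability. The function name `decode_characters`, the `floor` value for unseen bigrams, and the toy candidate lists and probabilities are all hypothetical and chosen only for illustration.

```python
import math

def decode_characters(syllables, candidates, bigram_logp, floor=-10.0):
    """Pick one character per syllable, maximizing the sum of bigram log-probs.

    candidates: dict mapping a syllable to its list of homophone characters.
    bigram_logp: dict mapping (prev_char, char) to a log-probability;
                 "<s>" marks the start of the sequence.
    floor: log-probability assumed for unseen bigrams (illustrative choice).
    """
    # scores maps each character hypothesis to (best score so far, best path)
    scores = {"<s>": (0.0, [])}
    for syl in syllables:
        new_scores = {}
        for ch in candidates[syl]:
            # Viterbi step: best predecessor for this character
            best = max(
                (s + bigram_logp.get((prev, ch), floor), path)
                for prev, (s, path) in scores.items()
            )
            new_scores[ch] = (best[0], best[1] + [ch])
        scores = new_scores
    return max(scores.values())[1]

# Toy data (made up for this example, not taken from the paper)
CANDIDATES = {"zhong1": ["中", "鐘"], "guo2": ["國", "幗"]}
BIGRAMS = {
    ("<s>", "中"): math.log(0.6), ("<s>", "鐘"): math.log(0.4),
    ("中", "國"): math.log(0.9), ("鐘", "國"): math.log(0.05),
}
result = "".join(decode_characters(["zhong1", "guo2"], CANDIDATES, BIGRAMS))
```

With these toy scores the decoder prefers the character pair with the strongest bigram, which is the general idea behind using a language model to resolve homophones.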


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Optimization of temporal filters for constructing robust features in speech recognition

Jeih-weih Hung; Lin-Shan Lee

Linear discriminant analysis (LDA) has long been used to derive data-driven temporal filters in order to improve the robustness of speech features used in speech recognition. In this paper, we propose principal component analysis (PCA) and minimum classification error (MCE) as new optimization criteria for constructing the temporal filters. A detailed comparative performance analysis of the features obtained using the three optimization criteria, LDA, PCA, and MCE, with various types of noise and a wide range of SNR values is presented. It was found that the new criteria lead to superior performance over the original MFCC features, just as LDA-derived filters can. In addition, the newly proposed MCE-derived filters can often do better than the LDA-derived filters. Also, it is shown that further performance improvements are achievable if any of these LDA/PCA/MCE-derived filters are integrated with the conventional approach of cepstral mean and variance normalization (CMVN). The performance improvements obtained in recognition experiments are further supported by analyses conducted using two different distance measures.
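
As a rough illustration of the cepstral mean and variance normalization (CMVN) step mentioned in this abstract, the sketch below normalizes each feature dimension of one utterance to zero mean and unit variance. The function name `cmvn`, the list-of-frames layout, and the `eps` guard are assumptions made for this example and are not taken from the paper; the temporal filters themselves are not shown.

```python
from math import sqrt

def cmvn(frames, eps=1e-8):
    """Per-utterance cepstral mean and variance normalization.

    frames: list of equal-length feature vectors, one per time frame
            (for example, MFCC vectors).
    Returns a new list of frames in which every feature dimension has
    zero mean and approximately unit variance over the utterance.
    """
    n = len(frames)
    dims = len(frames[0])
    # Mean of each feature dimension over time
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    # Population standard deviation of each dimension over time
    stds = [
        sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    # eps guards against division by zero for constant dimensions
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dims)]
        for f in frames
    ]
```

Normalizing per utterance is one common choice; other schemes (for example, per speaker or over a sliding window) follow the same arithmetic with different statistics.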


IEEE Transactions on Signal Processing | 1991

Isolated-utterance speech recognition using hidden Markov models with bounded state durations

Hung-yan Gu; Chiu-yu Tseng; Lin-Shan Lee

Hidden Markov models (HMMs) with bounded state durations (HMM/BSD) are proposed to explicitly model the state durations of HMMs and more accurately consider the temporal structures existing in speech signals in a simple, direct, but effective way. A series of experiments have been conducted for speaker dependent applications using 408 highly confusing first-tone Mandarin syllables as the example vocabulary. It was found that in the discrete case the recognition rate of HMM/BSD (78.5%) is 9.0%, 6.3%, and 1.9% higher than the conventional HMMs and HMMs with Poisson and gamma distribution state durations, respectively. In the continuous case (partitioned Gaussian mixture modeling), the recognition rates of HMM/BSD (88.3% with 1 mixture, 88.8% with 3 mixtures, and 89.4% with 5 mixtures) are 6.3%, 5.0%, and 5.5% higher than those of the conventional HMMs, and 5.9% (with 1 mixture), 3.9% (with 3 mixtures) and 3.1% (with 1 mixture), 1.8% (with 3 mixtures) higher than HMMs with Poisson and gamma distributed state durations, respectively.
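
The idea of bounding state durations can be sketched as a constrained alignment. The code below is a simplified illustration, not the paper's formulation: it assumes a strictly left-to-right model, ignores transition probabilities, and only requires that each state be occupied for a number of frames between a minimum and a maximum. The function name `bounded_duration_viterbi` and its argument layout are hypothetical.

```python
import math

def bounded_duration_viterbi(log_emis, min_dur, max_dur):
    """Best left-to-right state alignment with bounded state durations.

    log_emis[t][j]: log-likelihood of frame t under state j.
    min_dur[j], max_dur[j]: allowed number of consecutive frames in state j.
    Returns (best_log_score, durations), or (-inf, None) if no alignment
    satisfies the bounds.
    """
    T, N = len(log_emis), len(min_dur)
    NEG = -math.inf
    # prefix[j][t] = sum of log_emis[0..t-1][j], for O(1) segment scores
    prefix = [[0.0] * (T + 1) for _ in range(N)]
    for j in range(N):
        for t in range(T):
            prefix[j][t + 1] = prefix[j][t] + log_emis[t][j]
    # best[j][t]: best score with the first j states covering frames 0..t-1
    best = [[NEG] * (T + 1) for _ in range(N + 1)]
    back = [[0] * (T + 1) for _ in range(N + 1)]
    best[0][0] = 0.0
    for j in range(1, N + 1):
        s = j - 1  # index of the state that ends at frame t-1
        for t in range(1, T + 1):
            for d in range(min_dur[s], min(max_dur[s], t) + 1):
                prev = best[j - 1][t - d]
                if prev == NEG:
                    continue
                score = prev + prefix[s][t] - prefix[s][t - d]
                if score > best[j][t]:
                    best[j][t] = score
                    back[j][t] = d
    if best[N][T] == NEG:
        return NEG, None
    # Trace back the chosen duration of each state
    durations, t = [], T
    for j in range(N, 0, -1):
        durations.append(back[j][t])
        t -= back[j][t]
    return best[N][T], durations[::-1]
```

Tightening `max_dur` for a state forces the alignment to hand frames to neighbouring states even when the unconstrained path would linger, which is the kind of temporal constraint that duration modeling is meant to impose.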

Collaboration


Dive into Lin-Shan Lee's collaborations.

Top Co-Authors

Hung-yi Lee

National Taiwan University

Jia-Lin Shen

National Taiwan University

Berlin Chen

National Taiwan Normal University

Yi-cheng Pan

National Taiwan University

Cheng-Tao Chung

National Taiwan University

Jeih-weih Hung

National Chi Nan University
