
Publication


Featured research published by Hsin-Min Wang.


International Conference on Speech, Image Processing and Neural Networks | 1994

Golden Mandarin (II): An intelligent Mandarin dictation machine for Chinese character input with adaptation/learning functions

Lin-Shan Lee; Keh-Jiann Chen; Chiu-yu Tseng; Ren-Yuan Lyu; Lee-Feng Chien; Hsin-Min Wang; Jia-Lin Shen; Sung-Chien Lin; Yen-Ju Yang; Bo-Ren Bai; Chi-ping Nee; Chun-Yi Liao; Shueh-Sheng Lin; Chung-Shu Yang; I-Jung Hung; Ming-Yu Lee; Rei-Chang Wang; Bo-Shen Lin; Yuan-Cheng Chang; Rung-Chiung Yang; Yung-Chi Huang; Chen-Yuan Lou; Tung-Sheng Lin

Golden Mandarin (II) is an intelligent single-chip-based real-time Mandarin dictation machine for the Chinese language with a very large vocabulary, for the input of unlimited Chinese text into computers by voice. The dictation machine can be installed on any personal computer and uses only a single Motorola DSP 96002D chip, achieving a preliminary character correct rate of around 95% at a speed of 0.6 s per character. Various adaptation/learning functions have been developed for the machine, including fast adaptation to new speakers and on-line learning of each user's voice characteristics, task domains, word patterns, and noise environments, so that the machine can easily be personalized. These adaptation/learning functions are the main subject of the paper.


Speech Communication | 2005

Fluent speech prosody: Framework and modeling

Chiu-yu Tseng; ShaoHuang Pin; Yehlin Lee; Hsin-Min Wang; Yong-cheng Chen

The prosody of fluent connected speech is much more complicated than concatenating individual sentence intonations into strings. We analyzed speech corpora of read Mandarin Chinese discourses from a top-down perspective on perceived units and boundaries, and consistently identified speech paragraphs of multiple phrases that reflected discourse rather than sentence effects in fluent speech. Subsequent cross-speaker and cross-speaking-rate acoustic analyses of the identified speech paragraphs revealed systematic cross-phrase prosodic patterns in every acoustic parameter, namely F0 contours, duration adjustment, intensity patterns, and, in addition, boundary breaks. We therefore argue for a higher prosodic node that governs, constrains, and groups phrases to derive speech paragraphs. A hierarchical multi-phrase framework is constructed to account for the governing effect, with complementary production and perceptual evidence. We show how cross-phrase F0 and syllable duration pattern templates are derived to account for the tune and rhythm characteristic of fluent speech prosody, and argue for a prosody framework that specifies phrasal intonations as subjacent sister constituents subject to higher terms. Output fluent speech prosody is thus the cumulative result of contributions from every prosodic layer. To test our framework, we further construct a modular prosody model of multiple-phrase grouping with four corresponding acoustic modules, and begin testing the model with speech synthesis. To conclude, we argue that any prosody framework of fluent speech should include prosodic contributions above individual sentences in production, with consideration of their perceptual effects on on-line processing; the development of unlimited TTS could benefit most appreciably from capturing and including cross-phrase relationships in prosody modeling.
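The layered view above, in which output prosody is the cumulative result of contributions from every prosodic layer, can be illustrated with a toy additive model. This is only a sketch of the idea, not the paper's model; all layer values below are hypothetical per-syllable F0 offsets in semitones.

```python
# Toy additive prosody model: the output F0 contour is the sum of
# contributions from nested prosodic layers (all values hypothetical).
tone_layer      = [2.0, -1.0, 0.5, 1.5]   # lexical tone shapes
phrase_layer    = [1.0, 0.5, 0.0, -0.5]   # phrase intonation (declining)
paragraph_layer = [0.5, 0.5, 0.25, 0.25]  # discourse-level downtrend

# Per-syllable output F0 offset is the sum across layers.
f0 = [t + p + g for t, p, g in zip(tone_layer, phrase_layer, paragraph_layer)]
print(f0)  # [3.5, 0.0, 0.75, 1.25]
```

The point of the sketch is that no single layer explains the output contour; the paragraph-level trend shifts every phrase, just as the framework's higher prosodic node constrains its sister constituents.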


International Conference on Acoustics, Speech, and Signal Processing | 1993

Golden Mandarin (II): An improved single-chip real-time Mandarin dictation machine for Chinese language with very large vocabulary

Lin-Shan Lee; Chiu-yu Tseng; Keh-Jiann Chen; I-Jung Hung; Ming-Yu Lee; Lee-Feng Chien; Yumin Lee; Ren-Yuan Lyu; Hsin-Min Wang; Yung-Chuan Wu; Tung-Sheng Lin; Hung-yan Gu; Chi-ping Nee; Chun-Yi Liao; Yeng-Ju Yang; Yuan-Cheng Chang; Rung-Chiung Yang

Golden Mandarin (II) is an improved single-chip real-time Mandarin dictation machine with a very large vocabulary for the input of unlimited Chinese sentences into computers by voice. The dictation machine uses only a single Motorola DSP 96002D chip on an Ariel DSP-96 card, with a preliminary character correct rate of around 95% in speaker-dependent mode at a speed of 0.36 s per character. This is achieved by many new techniques, primarily a segmental probability modeling technique for syllable recognition that exploits the characteristics of Mandarin syllables, and a word-lattice-based Chinese character bigram for character identification that exploits the structure of the Chinese language.
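The bigram-based character identification step described above can be sketched as a beam search over candidate characters for each recognized syllable, combining acoustic syllable scores with a character bigram. This is a toy illustration of the general technique, not the authors' implementation; all syllables, characters, and probabilities below are hypothetical.

```python
import math

# Candidate syllables per position with recognizer scores (hypothetical).
candidates = [
    {"ma1": 0.7, "ma3": 0.3},       # position 0
    {"shang4": 0.9, "shan4": 0.1},  # position 1
]
syllable_to_chars = {
    "ma1": ["媽"], "ma3": ["馬"],
    "shang4": ["上"], "shan4": ["善"],
}
# Character bigram probabilities P(c2 | c1) (hypothetical).
bigram = {("馬", "上"): 0.6, ("媽", "上"): 0.2}

def decode(candidates):
    """Beam search over the character lattice with a bigram language model."""
    beams = [(0.0, "")]  # (log score, partial character string)
    for pos in candidates:
        new_beams = []
        for score, prefix in beams:
            for syl, p_syl in pos.items():
                for ch in syllable_to_chars[syl]:
                    # Back off to a small floor probability for unseen bigrams.
                    lm = bigram.get((prefix[-1:], ch), 0.01) if prefix else 1.0
                    new_beams.append((score + math.log(p_syl) + math.log(lm),
                                      prefix + ch))
        beams = sorted(new_beams, reverse=True)[:5]  # keep top 5 hypotheses
    return max(beams)[1]

print(decode(candidates))  # → 馬上
```

Note how the language model overrides the acoustics: "ma1" scores higher acoustically, but the bigram favors the character string 馬上, which is the behavior such lattice rescoring is meant to produce.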


IEEE Transactions on Speech and Audio Processing | 2002

Discriminating capabilities of syllable-based features and approaches of utilizing them for voice retrieval of speech information in Mandarin Chinese

Berlin Chen; Hsin-Min Wang; Lin-Shan Lee

With the rapidly growing use of audio and multimedia information over the Internet, the technology for retrieving speech information using voice queries is becoming more and more important. In this paper, considering the monosyllabic structure of the Chinese language, a whole class of syllable-based indexing features, including overlapping segments of syllables and syllable pairs separated by a few syllables, is extensively investigated on a Mandarin broadcast news database. The strong discriminating capabilities of such syllable-based features were verified by comparison with word- and character-based features. Approaches for better exploiting these capabilities, including fusion with word- and character-level information and improved methods for deriving syllable-based features and query expressions, were also extensively investigated. Very encouraging experimental results were obtained.
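The two feature families named above, overlapping syllable segments and syllable pairs separated by a few syllables, can be extracted with a few lines of code. A minimal sketch, with a hypothetical syllable transcription; the paper's actual feature definitions and weighting are richer than this.

```python
def syllable_segments(syllables, n):
    """Overlapping segments of n consecutive syllables."""
    return [tuple(syllables[i:i + n]) for i in range(len(syllables) - n + 1)]

def syllable_pairs(syllables, k):
    """Pairs of syllables separated by exactly k intervening syllables."""
    gap = k + 1
    return [(syllables[i], syllables[i + gap])
            for i in range(len(syllables) - gap)]

doc = ["yu", "yin", "jian", "suo"]  # hypothetical syllable transcription
print(syllable_segments(doc, 2))  # [('yu', 'yin'), ('yin', 'jian'), ('jian', 'suo')]
print(syllable_pairs(doc, 1))     # [('yu', 'jian'), ('yin', 'suo')]
```

Indexing documents and queries by such units sidesteps Chinese word segmentation entirely, which is what makes syllable-level features attractive for Mandarin spoken document retrieval.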


IEEE Transactions on Speech and Audio Processing | 1997

Complete recognition of continuous Mandarin speech for Chinese language with very large vocabulary using limited training data

Hsin-Min Wang; Tai-Hsuan Ho; Rung-Chiung Yang; Jia-Lin Shen; Bo-Ren Bai; Jenn-Chau Hong; Wei-Peng Chen; Tong-Lo Yu; Lin-Shan Lee

This correspondence presents the first known results of complete recognition of continuous Mandarin speech for the Chinese language with very large vocabulary but very limited training data. Various acoustic and linguistic processing techniques were developed, and a prototype system of a continuous speech Mandarin dictation machine has been successfully implemented. The best recognition accuracy achieved is 92.2% for finally decoded Chinese characters.


IEEE Transactions on Multimedia | 2011

Cost-Sensitive Multi-Label Learning for Audio Tag Annotation and Retrieval

Hung-Yi Lo; Ju-Chiang Wang; Hsin-Min Wang; Shou-De Lin

Audio tags correspond to keywords that people use to describe different aspects of a music clip. With the explosive growth of digital music available on the Web, automatic audio tagging, which can be used to annotate unknown music or retrieve desirable music, is becoming increasingly important. This can be achieved by training a binary classifier for each tag based on the labeled music data. Our method, which won the MIREX 2009 audio tagging competition, is one such method. However, since social tags are usually assigned by people with different levels of musical knowledge, they inevitably contain noisy information. By treating the tag counts as costs, we can model the audio tagging problem as a cost-sensitive classification problem. In addition, tag correlation information is useful for automatic audio tagging since some tags often co-occur. By considering the co-occurrences of tags, we can model the audio tagging problem as a multi-label classification problem. To exploit the tag count and correlation information jointly, we formulate the audio tagging task as a novel cost-sensitive multi-label (CSML) learning problem and propose two solutions to solve it. The experimental results demonstrate that the new approach outperforms our MIREX 2009 winning method.
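The "tag counts as costs" idea above can be sketched with a per-tag binary classifier whose updates are scaled by each example's tag count, so that confidently tagged clips influence the model more. This is a toy illustration with a weighted perceptron standing in for the paper's actual learners; the features, labels, and counts are hypothetical.

```python
def train_weighted_perceptron(X, y, costs, epochs=20, lr=0.1):
    """Perceptron whose mistake-driven updates are scaled by per-example costs."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label, c in zip(X, y, costs):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if pred != label:
                # Cost-sensitive update: scale the step by the tag count.
                w = [wi + lr * c * label * xi for wi, xi in zip(w, x)]
                b += lr * c * label
    return w, b

# Each row: audio features (hypothetical); label +1 means the clip has the tag.
X = [[1.0, 0.2], [0.9, 0.1], [0.2, 1.0], [0.1, 0.9]]
y = [1, 1, -1, -1]
counts = [5, 1, 4, 2]  # how many users applied the tag (hypothetical)
w, b = train_weighted_perceptron(X, y, counts)
print(1 if sum(wi * xi for wi, xi in zip(w, [0.95, 0.15])) + b > 0 else -1)  # → 1
```

Training one such classifier per tag gives the baseline the paper starts from; the CSML formulation additionally couples the per-tag problems through tag co-occurrence.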


IEEE Transactions on Multimedia | 2008

A Query-by-Singing System for Retrieving Karaoke Music

Hung-Ming Yu; Wei-Ho Tsai; Hsin-Min Wang

This paper investigates the problem of retrieving karaoke music using query-by-singing techniques. Unlike regular CD music, where the stereo sound involves two audio channels that usually sound the same, karaoke music encompasses two distinct channels in each track: one is a mixture of the lead vocals and background accompaniment, and the other consists of accompaniment only. Although the two audio channels are distinct, the accompaniments in the two channels often resemble each other. We exploit this characteristic to: i) infer the background accompaniment for the lead vocals from the accompaniment-only channel, so that the main melody underlying the lead vocals can be extracted more effectively, and ii) detect phrase onsets based on the Bayesian information criterion (BIC) to predict the onset points of a song where a user's sung query may begin, so that the similarity between the melodies of the query and the song can be examined more efficiently. To further refine extraction of the main melody, we propose correcting potential errors in the estimated sung notes by exploiting a composition characteristic of popular songs whereby the sung notes within a verse or chorus section usually vary by no more than two octaves. In addition, to facilitate an efficient and accurate search of a large music database, we employ multiple-pass dynamic time warping (DTW) combined with multiple-level data abstraction (MLDA) to compare the similarities of melodies. The results of experiments conducted on a karaoke database comprising 1071 popular songs demonstrate the feasibility of query-by-singing retrieval for karaoke music.
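At the core of the matching stage is DTW, which aligns a sung query's note sequence against a song's melody while tolerating timing differences such as held notes. A minimal single-pass sketch follows (the system above uses a multiple-pass, multi-level variant); the note sequences are hypothetical MIDI note numbers.

```python
def dtw(query, melody):
    """Classic DTW distance between two pitch sequences."""
    inf = float("inf")
    n, m = len(query), len(melody)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - melody[j - 1])
            # Allow match, insertion, or deletion at each step.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

query  = [60, 62, 64, 62]       # sung query, MIDI note numbers (hypothetical)
song_a = [60, 62, 62, 64, 62]   # candidate melody with a held note
song_b = [55, 57, 55, 53, 52]
print(dtw(query, song_a) < dtw(query, song_b))  # → True
```

Note that song_a matches the query perfectly despite the extra held note, which is exactly the timing elasticity DTW provides; BIC-detected phrase onsets then limit where in each song such alignments need to be attempted.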


Computer Music Journal | 2004

Blind Clustering of Popular Music Recordings Based on Singer Voice Characteristics

Wei-Ho Tsai; Dwight Rodgers; Hsin-Min Wang

This paper presents an effective technique for automatically clustering undocumented music recordings based on their associated singer. This serves as an indispensable step towards indexing and content-based information retrieval of music by singer. The proposed clustering system operates in an unsupervised manner, in which no prior information is available regarding the characteristics of the singers' voices or the number of singers. Methods are presented to separate vocal from non-vocal regions, to isolate the singers' vocal characteristics from the background music, to compare the similarity between singers' voices, and to determine the total number of unique singers in a collection of songs. Experimental evaluations conducted on a 200-track pop music database confirm the validity of the proposed system.
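The unsupervised setting above, where both the cluster assignments and the number of singers are unknown, is commonly handled with agglomerative clustering that stops merging once the closest clusters are too dissimilar. A toy sketch of that general technique, not the authors' method: each song is reduced to a hypothetical voice-feature vector, and the stopping threshold also yields the singer-count estimate.

```python
def cluster_songs(features, threshold):
    """Single-link agglomerative clustering with a distance-based stop."""
    clusters = [[i] for i in range(len(features))]

    def dist(a, b):  # single-link (minimum) distance between two clusters
        return min(sum((features[i][k] - features[j][k]) ** 2
                       for k in range(len(features[i]))) ** 0.5
                   for i in a for j in b)

    while len(clusters) > 1:
        pairs = [(dist(a, b), x, y) for x, a in enumerate(clusters)
                 for y, b in enumerate(clusters) if x < y]
        d, x, y = min(pairs)
        if d > threshold:
            break  # nothing similar enough left to merge
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

# Hypothetical 2-D voice-feature vectors for four songs by two singers.
songs = [[0.1, 0.2], [0.12, 0.19], [0.9, 0.8], [0.88, 0.82]]
print(len(cluster_songs(songs, 0.3)))  # estimated number of singers → 2
```

In practice the features would come from vocal segments with the accompaniment suppressed, which is precisely what the paper's vocal/non-vocal separation and background-compensation steps provide.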


International Conference on Acoustics, Speech, and Signal Processing | 2000

Retrieval of broadcast news speech in Mandarin Chinese collected in Taiwan using syllable-level statistical characteristics

Berlin Chen; Hsin-Min Wang; Lin-Shan Lee

Spoken document retrieval has been extensively studied over the years because of its high potential in various applications in the near future. Considering the monosyllabic structure of the Chinese language, a whole class of indexing features for retrieval of spoken documents in Mandarin Chinese using syllable-level statistical characteristics has been studied, and very encouraging experimental results on retrieval of broadcast news speech collected in Taiwan were obtained. This paper reports some interesting initial results and findings obtained in this research.


International Conference on Acoustics, Speech, and Signal Processing | 1995

Complete recognition of continuous Mandarin speech for Chinese language with very large vocabulary but limited training data

Hsin-Min Wang; Jia-Lin Shen; Yen-Ju Yang; Chiu-yu Tseng; Lin-Shan Lee

This paper presents the first known results for complete recognition of continuous Mandarin speech for the Chinese language with a very large vocabulary but very limited training data. Although some isolated-syllable-based or isolated-word-based large-vocabulary Mandarin speech recognition systems have been successfully developed, a continuous-speech-based system of this kind has never been reported before. For successful development of this system, several important techniques have been used, including acoustic modeling with a set of sub-syllabic models for base syllable recognition and another set of context-dependent models for tone recognition, a multiple-candidate searching technique based on a concatenated syllable matching algorithm to synchronize base syllable and tone recognition, and a word-class-based Chinese language model for linguistic decoding. The best recognition accuracy achieved is 88.69% for finally decoded Chinese characters, with 88.69%, 91.57%, and 81.37% accuracy for base syllables, tones, and tonal syllables, respectively.

Collaboration


Dive into Hsin-Min Wang's collaborations.

Top Co-Authors

Berlin Chen (National Taiwan Normal University)
Wei-Ho Tsai (National Taipei University of Technology)
Lin-Shan Lee (National Taiwan University)
Yu Tsao (Center for Information Technology)
Shyh-Kang Jeng (National Taiwan University)