Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Wei-Ho Tsai is active.

Publication


Featured research published by Wei-Ho Tsai.


IEEE Transactions on Multimedia | 2008

A Query-by-Singing System for Retrieving Karaoke Music

Hung-Ming Yu; Wei-Ho Tsai; Hsin-Min Wang

This paper investigates the problem of retrieving karaoke music using query-by-singing techniques. Unlike regular CD music, where the stereo sound involves two audio channels that usually sound the same, karaoke music encompasses two distinct channels in each track: one is a mixture of the lead vocals and background accompaniment, and the other consists of accompaniment only. Although the two audio channels are distinct, the accompaniments in the two channels often resemble each other. We exploit this characteristic to: i) infer the background accompaniment for the lead vocals from the accompaniment-only channel, so that the main melody underlying the lead vocals can be extracted more effectively, and ii) detect phrase onsets based on the Bayesian information criterion (BIC) to predict the onset points of a song where a user's sung query may begin, so that the similarity between the melodies of the query and the song can be examined more efficiently. To further refine extraction of the main melody, we propose correcting potential errors in the estimated sung notes by exploiting a composition characteristic of popular songs whereby the sung notes within a verse or chorus section usually vary by no more than two octaves. In addition, to facilitate an efficient and accurate search of a large music database, we employ multiple-pass dynamic time warping (DTW) combined with multiple-level data abstraction (MLDA) to compare the similarities of melodies. The results of experiments conducted on a karaoke database comprising 1071 popular songs demonstrate the feasibility of query-by-singing retrieval for karaoke music.
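The melody comparison step rests on dynamic time warping. As an illustration of the core idea only (a single-pass DTW, not the paper's multiple-pass DTW with multiple-level data abstraction), the sketch below aligns two sung-note sequences given as hypothetical MIDI note numbers:

```python
# Minimal single-pass DTW over two note sequences (illustrative sketch).
def dtw_distance(query, song):
    """Return the DTW alignment cost between two note sequences."""
    n, m = len(query), len(song)
    INF = float("inf")
    # cost[i][j] = best cost aligning query[:i] with song[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(query[i - 1] - song[j - 1])    # local note distance
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]
```

In a retrieval setting, candidate songs would be ranked by ascending `dtw_distance` against the sung query; the paper's multi-pass variant prunes this search with coarser melody abstractions first.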


Computer Music Journal | 2004

Blind Clustering of Popular Music Recordings Based on Singer Voice Characteristics

Wei-Ho Tsai; Dwight Rodgers; Hsin-Min Wang

This paper presents an effective technique for automatically clustering undocumented music recordings based on their associated singer. This serves as an indispensable step towards indexing and content-based information retrieval of music by singer. The proposed clustering system operates in an unsupervised manner, in which no prior information is available regarding either the characteristics of the singers' voices or the number of singers. Methods are presented to separate vocal from non-vocal regions, to isolate the singers' vocal characteristics from the background music, to compare the similarity between singers' voices, and to determine the total number of unique singers in a collection of songs. Experimental evaluations conducted on a 200-track pop music database confirm the validity of the proposed system.


Journal of Information Science and Engineering | 2008

Using the Similarity of Main Melodies to Identify Cover Versions of Popular Songs for Music Document Retrieval

Wei-Ho Tsai; Hung-Ming Yu; Hsin-Min Wang

Automatic extraction of information from music data is an important and challenging issue in the field of content-based music retrieval. As part of the research effort, this study presents a technique that automatically identifies cover versions of songs specified by users. The technique enables users to search for songs with an identical tune, but performed by different singers, in different languages, genres, and so on. The proposed system takes an excerpt of the song specified by the user as input, and returns a ranked list of songs similar to the input excerpt in terms of the main melody. To handle likely discrepancies, e.g., in tempo, transposition, and accompaniment, between cover versions and the original song, methods are presented to remove the non-vocal portions of the song, extract the sung notes from the accompanied vocals, and compare the similarities between the sung note sequences. Our experiments on a database of 594 cross-lingual popular songs show the feasibility of identifying cover versions of songs for music retrieval.


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Automatic Evaluation of Karaoke Singing Based on Pitch, Volume, and Rhythm Features

Wei-Ho Tsai; Hsin-Chieh Lee

This study aims to develop an automatic singing evaluation system for Karaoke performances. Many Karaoke systems on the market today come with a scoring function, a feature that enhances the entertainment appeal of the system given the competitive nature of humans. Automatic Karaoke scoring mechanisms to date, however, remain rudimentary, often giving results inconsistent with scores assigned by human raters. One source of error is that only the singing volume is often used as the evaluation criterion. To improve the singing evaluation capabilities of Karaoke machines, this study exploits various acoustic features, including pitch, volume, and rhythm, to assess a singing performance. We invited a number of singers with different levels of singing ability to record solo Karaoke vocal samples. The performances were rated independently by four musicians and then used, in conjunction with additional Karaoke Video Compact Disc music, to train our proposed system. Our experiments show that the results of automatic singing evaluation are close to the human ratings, with a Pearson product-moment correlation coefficient of 0.82 between them.
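The reported agreement is measured with the Pearson product-moment correlation coefficient, which can be computed as below; the score lists in the test are hypothetical examples, not the paper's data:

```python
# Pearson product-moment correlation between two score lists
# (e.g. machine scores vs. human ratings). Illustrative sketch.
import math

def pearson(x, y):
    """Return the Pearson correlation coefficient of paired samples x, y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

A value near 1.0, as in the paper's 0.82, indicates that the machine's ranking of performances closely tracks the human raters'.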


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Automatic Speaker Clustering Using a Voice Characteristic Reference Space and Maximum Purity Estimation

Wei-Ho Tsai; Shih-Sian Cheng; Hsin-Min Wang

This paper investigates the problem of automatically grouping unknown speech utterances based on their associated speakers. To determine which utterances should be grouped together, it is necessary to measure the voice similarities between utterances. Since most existing methods measure inter-utterance similarities based directly on spectrum-based features, the resulting clusters may correspond not to speakers but to various acoustic classes instead. This study remedies this shortcoming by projecting utterances onto a reference space trained to cover the generic voice characteristics underlying the whole utterance collection. The resulting projection vectors naturally reflect the voice similarities among all the utterances, and hence are more robust against interference from non-speaker factors. A clustering method based on maximum purity estimation is then proposed, with the aim of maximizing the similarities between utterances within all the clusters. This method employs a genetic algorithm to determine the cluster to which each utterance should be assigned, thereby overcoming the limitation of conventional hierarchical clustering that the final result can only reach a local optimum. In addition, the proposed clustering method adapts the Bayesian information criterion (BIC) to determine how many clusters should be created.
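The Bayesian information criterion used here to choose the number of clusters trades model fit against complexity: BIC = -2 log L + p log n, so adding clusters (and thus parameters p) must buy enough likelihood to pay its penalty. A minimal sketch of the criterion itself; in practice the log-likelihoods would come from the fitted cluster models:

```python
# Bayesian information criterion: lower is better. Illustrative sketch.
import math

def bic(log_likelihood, num_params, num_samples):
    """BIC = -2*logL + p*log(n); penalizes extra parameters."""
    return -2.0 * log_likelihood + num_params * math.log(num_samples)
```

To pick the cluster count, one would evaluate `bic` for each candidate number of clusters and keep the minimum.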


International Conference on Acoustics, Speech, and Signal Processing | 2005

Clustering speech utterances by speaker using Eigenvoice-motivated vector space models

Wei-Ho Tsai; Shih-Sian Cheng; Yi-Hsiang Chao; Hsin-Min Wang

The paper investigates the problem of automatically grouping unknown speech utterances based on their associated speakers. The proposed method utilizes the vector space model, which was originally developed in document-retrieval research, to characterize each utterance as a tf-idf-based vector of acoustic terms, thereby deriving a reliable measurement of similarity between utterances. To define the required acoustic terms that are most representative in terms of voice characteristics, the Eigenvoice approach is applied to the utterances to be clustered, which creates a set of eigenvector-based terms. To further improve speaker-clustering performance, the proposed method encompasses a mechanism of blind relevance feedback for refining the inter-utterance similarity measure.
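The tf-idf weighting and similarity measure borrowed from document retrieval can be sketched as follows, with each utterance reduced to a list of hypothetical acoustic-term IDs rather than real eigenvector-based terms:

```python
# tf-idf vectors over "acoustic terms" and cosine similarity between
# utterances. Illustrative sketch; term IDs here are hypothetical.
import math
from collections import Counter

def tfidf_vectors(utterances):
    """utterances: list of term-ID lists; returns one {term: weight} per utterance."""
    n = len(utterances)
    df = Counter()                      # document frequency of each term
    for u in utterances:
        df.update(set(u))
    vecs = []
    for u in utterances:
        tf = Counter(u)                 # term frequency in this utterance
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse tf-idf vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Utterances from the same speaker would then score high under `cosine`, giving the reliable inter-utterance similarity the clustering step needs.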


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Background Music Removal Based on Cepstrum Transformation for Popular Singer Identification

Wei-Ho Tsai; Hao-Ping Lin

One major challenge in identifying singers in popular music recordings lies in reducing the interference of background accompaniment when characterizing the singer's voice. Although a number of studies on automatic Singer IDentification (SID) from acoustic features have been reported, most systems to date do not explicitly deal with the background accompaniment. This study proposes a background accompaniment removal approach for SID that exploits the underlying relationships between solo singing voices and their accompanied versions in the cepstrum. The relationships are characterized by a transformation estimated using a large set of accompanied singing generated by manually mixing solo singing with accompaniments extracted from Karaoke VCDs. Such a transformation reflects the cepstrum variations of a singing voice before and after accompaniment is added. When an unknown accompanied voice is presented to our system, the transformation is applied to convert the cepstrum of the accompanied voice into a solo-voice-like one. Our experiments show that this background removal approach improves SID accuracy significantly, even when a test music recording involves a sung language not covered in the data used to estimate the transformation.
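As a rough illustration of learning a cepstrum-domain mapping from paired accompanied/solo data, the sketch below fits a simple per-dimension affine map by least squares. This is a stand-in under strong simplifying assumptions, not the paper's actual transformation, and the paired values are synthetic:

```python
# Fit y ≈ a*x + b per cepstral dimension from paired (accompanied, solo)
# frames, then apply the maps to new frames. Illustrative sketch only.

def fit_affine(xs, ys):
    """Least-squares fit of y ≈ a*x + b for paired scalar samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    return a, my - a * mx

def transform_frame(frame, maps):
    """Apply one fitted (a, b) map per cepstral coefficient of a frame."""
    return [a * c + b for c, (a, b) in zip(frame, maps)]
```

The idea mirrors the paper's pipeline: estimate the mapping offline from artificially mixed solo-plus-accompaniment data, then convert unseen accompanied-voice cepstra toward solo-voice-like ones before singer modeling.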


ACM/IEEE Joint Conference on Digital Libraries | 2005

On the extraction of vocal-related information to facilitate the management of popular music collections

Wei-Ho Tsai; Hsin-Min Wang

With the explosive growth of networked collections of musical material, there is a need to establish a mechanism, such as a digital library, to manage music data. This paper presents a content-based processing paradigm for popular song collections to facilitate the realization of a music digital library. The paradigm is built on the automatic extraction of information of interest from music audio signals. Because the vocal part is often the heart of a popular song, we focus on developing techniques to exploit the solo vocal signals underlying an accompanied performance. This supports the necessary functions of a music digital library, namely, music data organization, music information retrieval/recommendation, and copyright protection.


Asia Information Retrieval Symposium | 2005

A query-by-singing technique for retrieving polyphonic objects of popular music

Hung-Ming Yu; Wei-Ho Tsai; Hsin-Min Wang

This paper investigates the problem of retrieving popular music by singing. In contrast to the retrieval of MIDI music, where the main melody can be acquired easily by selecting the appropriate symbolic tracks, retrieving polyphonic objects in CD or MP3 format requires extracting the main melody directly from the accompanied singing signals, which is difficult to handle well using conventional pitch estimation alone. To reduce the interference of background accompaniments during main melody extraction, methods are proposed to estimate the underlying sung notes in a music recording by taking into account the characteristic structure of popular songs. In addition, to accommodate users' unprofessional or personal singing styles, methods are proposed to handle the inaccuracies in tempo, pauses, transposition, and off-key singing that inevitably exist in queries. The proposed system has been evaluated on a music database consisting of 2613 phrases extracted manually from 100 Mandarin pop songs. The experimental results indicate the feasibility of retrieving pop songs by singing.
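One standard way to obtain the transposition invariance such systems need is to compare interval sequences (differences between consecutive sung notes) rather than absolute notes, so a query sung in a different key still matches. This is illustrative of the general technique, not necessarily the paper's exact method:

```python
# Transposition-invariant melody representation via note intervals.
# Note numbers below are hypothetical MIDI semitones.

def to_intervals(notes):
    """Convert absolute notes to consecutive-note intervals."""
    return [b - a for a, b in zip(notes, notes[1:])]

melody = [60, 62, 64, 62]
transposed = [n + 3 for n in melody]   # same tune sung 3 semitones higher
```

Because `to_intervals(melody)` and `to_intervals(transposed)` are identical, an interval-based matcher is unaffected by the key the user happens to sing in.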


International Conference on Multimedia and Expo | 2004

A query-by-example framework to retrieve music documents by singer

Wei-Ho Tsai; Hsin-Min Wang

We present a framework for music document retrieval that allows users to retrieve a specified singer's music recordings from an unlabeled database by submitting a fragment of music as a query to the system. Such a framework can be of great use to those wishing to know more about a particular singer without knowing the singer's name. To ensure that the retrieved documents are relevant to the query, methods are proposed to compare the similarity between a document and the query based on the automatic extraction of a singer's voice characteristics from a music recording.

Collaboration


Dive into Wei-Ho Tsai's collaborations.

Top Co-Authors

Ruei-Chuan Chang

National Chiao Tung University

Hsin-Chieh Lee

National Taipei University of Technology

Hao-Ping Lin

National Taipei University of Technology