Hsuan-Huei Shih
University of Southern California
Publications
Featured research published by Hsuan-Huei Shih.
international conference on multimedia and expo | 2002
Hsuan-Huei Shih; Shrikanth Narayanan; C.-C.J. Kuo
Providing natural and efficient access to fast-growing multimedia information, while accommodating a variety of user skills and preferences, is a critical aspect of content-based information mining. Query by humming provides a natural means for content-based retrieval from music databases. A statistical pattern recognition approach for recognizing hummed or sung melodies is reported in this paper. Being data-driven, the proposed system aims at providing a robust front-end, especially for dealing with variability in users' productions. The segment of a note in the humming waveform is modeled by a hidden Markov model (HMM), while data features such as pitch measures are modeled by Gaussian mixture models (GMMs). Preliminary real-time recognition experiments are carried out based on humming data obtained from eight users, and an overall correct recognition rate of around 80% is demonstrated.
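As a rough illustration of the GMM scoring step the abstract describes, pitch classification by maximum mixture likelihood can be sketched as below. All names, component counts, and parameter values here are invented for the example; they are not the paper's trained models.

```python
import math

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def gmm_likelihood(x, components):
    """Mixture likelihood; components is a list of (weight, mean, variance) tuples."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)

def classify_pitch(pitch_hz, note_models):
    """Pick the note whose GMM assigns the observed pitch the highest likelihood."""
    return max(note_models, key=lambda note: gmm_likelihood(pitch_hz, note_models[note]))

# Toy single-component "GMMs" for two notes (parameters are illustrative only).
note_models = {
    "A4": [(1.0, 440.0, 25.0)],
    "C5": [(1.0, 523.3, 25.0)],
}
print(classify_pitch(442.0, note_models))  # -> A4
```

In the paper's system this scoring would sit behind an HMM that segments the waveform into notes; the sketch covers only the per-note pitch decision.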
international conference on acoustics, speech, and signal processing | 2003
Hsuan-Huei Shih; Shrikanth Narayanan; C.-C.J. Kuo
A new statistical pattern recognition approach applied to human humming transcription is proposed. A musical note has two important attributes, i.e. pitch and duration. The proposed algorithm generates multidimensional humming transcriptions, which contain both pitch and duration information. Query by humming provides a natural means for content-based retrieval from music databases, and this research provides a robust front-end for such an application. The segment of a note in the humming waveform is modeled by a hidden Markov model (HMM), while the pitch of the note is modeled by a pitch model using a Gaussian mixture model. Preliminary real-time recognition experiments are carried out with models trained by data obtained from eight human subjects, and an overall correct recognition rate of around 80% is demonstrated.
international symposium on intelligent multimedia video and speech processing | 2001
Hsuan-Huei Shih; Shrikanth Narayanan; C.-C.J. Kuo
A dictionary-based approach for extracting repetitive patterns in music, aimed at music feature extraction and indexing for audio database management, is proposed. Segmentation is achieved based on the tempo information, and a music score is decomposed into bars. Each bar is indexed and a bar index table is built. Then, an adaptive dictionary-based compression algorithm known as Lempel Ziv 78 (LZ78) is applied to the bar-represented music scores to extract repetitive patterns (J. Ziv and A. Lempel, 1978). Finally, pruning is performed on this dictionary to remove non-repeating patterns and to combine shorter repeating patterns into longer ones. The LZ78 algorithm is slightly modified to achieve better results in the current context. Experiments are performed on MIDI files, and the proposed algorithm demonstrates excellent performance.
international conference on multimedia and expo | 2003
Hsuan-Huei Shih; Shrikanth Narayanan; C.-C.J. Kuo
A new phone level hidden Markov model approach applied to human humming transcription is proposed in this research. A musical note has two important attributes, i.e. pitch and duration. The proposed system generates multidimensional humming transcriptions, which contain both pitch and duration information. Query by humming provides a natural means for content-based retrieval from music databases, and this research provides a robust front-end for such an application. The segment of a note in the humming waveform is modeled by phone level hidden Markov models (HMMs). The duration of the note segment is then labeled by a duration model. The pitch of the note is modeled by a pitch model using a Gaussian mixture model. Preliminary real-time recognition experiments are carried out with models trained by data obtained from eight human subjects, and an overall correct recognition rate of around 84% is demonstrated.
Multimedia Systems | 2005
Erdem Unal; Shrikanth Narayanan; Hsuan-Huei Shih; Elaine Chew; C.-C.J. Kuo
Advances in music retrieval research greatly depend on appropriate database resources and their meaningful organization. In this paper we describe data collection efforts related to the design of query-by-humming (QBH) systems. We also provide a statistical analysis for categorizing the collected data, especially focusing on intersubject variability issues. In total, 100 people participated in our experiment, resulting in around 2000 humming samples drawn from a predefined melody list consisting of 22 different well-known music pieces and over 500 samples of melodies that were chosen spontaneously by our subjects. These data are being made available to the research community. The data from each subject were compared to the expected melody features, and an objective measure was derived to quantify the statistical deviation from the baseline. The results showed that the uncertainty in human humming varies depending on the musical structure of the melodies and the musical background of the subjects. Such details are important for designing robust QBH systems.
international conference on multimedia and expo | 2001
Hsuan-Huei Shih; Shrikanth Narayanan; C.-C.J. Kuo
A dictionary-based approach for extracting repetitive patterns in music, aimed at music feature extraction and indexing for audio database management, is proposed. In this system, segmentation is achieved with the tempo information, and a music score is decomposed into bars. Each bar is indexed to construct a bar index table. Then, an adaptive dictionary-based compression algorithm known as Lempel Ziv 78 (LZ78) is applied to the bar-represented music scores to extract repetitive patterns. Finally, pruning is applied to this dictionary to remove non-repeating patterns and to combine shorter repeating patterns into a longer one. The LZ78 algorithm is slightly modified to achieve better results in the current application context. Experiments performed on a popular music database of MIDI files demonstrated that the proposed algorithm extracts repeating melodies effectively, running roughly four times faster than the traditional linear search approach.
Storage and Retrieval for Image and Video Databases | 2001
Hsuan-Huei Shih; Shrikanth Narayanan; C.-C. Jay Kuo
Automatic melody extraction techniques can be used to index and retrieve songs in music databases. Here, we consider a piece of music consisting of numerical music scores (e.g. the MIDI file format) as the input. Segmentation is done based on the tempo information, and a music score is decomposed into bars. Each bar is indexed, and a bar index table is built accordingly. The authors recently proposed two approaches to finding repeating patterns. In the first approach, an adaptive dictionary-based algorithm known as Lempel Ziv 78 (LZ78) was modified and applied to melody extraction; it is called the modified LZ78 algorithm, or MLZ78. In the second approach, a sliding window is applied to generate the pattern dictionary; it is called the Exhaustive Search with Progressive LEngth algorithm, or ESPLE. Dictionaries generated from both approaches are pruned to remove non-repeating patterns. Each iteration of either MLZ78 or ESPLE is followed by pruning of the updated dictionaries generated from the previous cycle, until the dictionaries converge. Experiments are performed on MIDI files to evaluate the performance of the proposed algorithms. In this research, we compare results obtained from these two systems in terms of complexity, performance accuracy, and efficiency. Their relative merits and shortcomings are discussed in detail.
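The sliding-window idea behind ESPLE can be sketched as follows. This is a hypothetical simplification: patterns of progressively increasing length are counted over the bar sequence and non-repeating ones are dropped; the paper's per-iteration pruning and convergence test are not modeled.

```python
from collections import Counter

def esple_patterns(bars, max_len=None):
    """Sliding-window search over a bar-index sequence.

    Counts every contiguous pattern of length 1, 2, 3, ... and keeps
    only patterns that occur at least twice.  Stops early once a
    length yields no repeats, since (with overlapping counting) any
    repeat of length L+1 would imply a repeat of length L.
    """
    max_len = max_len or len(bars)
    repeating = {}
    for length in range(1, max_len + 1):
        counts = Counter(tuple(bars[i:i + length])
                         for i in range(len(bars) - length + 1))
        found = {p: c for p, c in counts.items() if c >= 2}
        if not found:
            break  # no repeats at this length, so none at longer lengths
        repeating.update(found)
    return repeating

print(esple_patterns([1, 2, 1, 2, 3]))  # -> {(1,): 2, (2,): 2, (1, 2): 2}
```

Compared with the MLZ78 sketch of dictionary growth, this exhaustive window scan trades extra counting work for guaranteed coverage of every contiguous pattern, which mirrors the complexity-versus-accuracy comparison the abstract describes.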
ITCom 2001: International Symposium on the Convergence of IT and Communications | 2001
Hsuan-Huei Shih; Shrikanth Narayanan; C.-C. Jay Kuo
Extraction of repetitive patterns of the main melody in a given music piece is investigated in this research. A dictionary-based approach is proposed to achieve the task. The input to the proposed system is a piece of music consisting of numerical music scores (e.g. the MIDI file format); other music forms such as the sound wave have to be converted to numerical music scores first. In the system, segmentation is done based on the tempo information, and a music score is decomposed into bars. Each bar is indexed, and a bar index table is built accordingly. Then, an adaptive dictionary-based algorithm known as Lempel Ziv 78 (LZ78) is modified and applied to the bar-represented music scores to extract repetitive patterns. The LZ78 algorithm is slightly modified to achieve better results, and the modified LZ78 is named "Exhaustive Search with Progressive LEngth" (ESPLE). After this step, pruning is applied to this dictionary to remove non-repeating patterns. Modified LZ78 and pruning are repeatedly applied to the updated dictionary, which is generated from the previous cycle, until the dictionary converges. Experiments are performed on MIDI files to demonstrate the superior performance of the proposed algorithm.
international conference on acoustics, speech, and signal processing | 2002
Hsuan-Huei Shih; Shrikanth Narayanan; C.-C. Jay Kuo
A statistical pattern recognition approach applied to human humming data is examined in this research. Query by humming provides a natural means for content-based retrieval from music databases. The proposed system aims at providing a robust front-end for such an application. The segment of a note in the humming waveform is modeled by a hidden Markov model (HMM) while data features such as pitch measures are modeled by a Gaussian mixture model (GMM). Preliminary real-time recognition experiments are carried out based on humming data obtained from eight users and an overall correct recognition rate of around 80% is demonstrated.
multimedia information retrieval | 2003
Erdem Unal; Shrikanth Narayanan; Hsuan-Huei Shih; Elaine Chew; C.-C. Jay Kuo