
Publication


Featured research published by Chin-Chia Michael Yeh.


International Conference on Data Mining | 2016

Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View That Includes Motifs, Discords and Shapelets

Chin-Chia Michael Yeh; Yan Zhu; Liudmila Ulanova; Nurjahan Begum; Yifei Ding; Hoang Anh Dau; Diego Furtado Silva; Abdullah Mueen; Eamonn J. Keogh

The all-pairs-similarity-search (or similarity join) problem has been extensively studied for text and a handful of other datatypes. However, surprisingly little progress has been made on similarity joins for time series subsequences. The lack of progress probably stems from the daunting nature of the problem. For even modest sized datasets the obvious nested-loop algorithm can take months, and the typical speed-up techniques in this domain (i.e., indexing, lower-bounding, triangular-inequality pruning and early abandoning) at best produce one or two orders of magnitude speedup. In this work we introduce a novel scalable algorithm for time series subsequence all-pairs-similarity-search. For exceptionally large datasets, the algorithm can be trivially cast as an anytime algorithm and produce high-quality approximate solutions in reasonable time. The exact similarity join algorithm computes the answer to the time series motif and time series discord problem as a side-effect, and our algorithm incidentally provides the fastest known algorithm for both these extensively-studied problems. We demonstrate the utility of our ideas for two time series data mining problems, including motif discovery and novelty discovery.
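
The matrix profile itself is simple to compute naively; the paper's contribution is making it scale. Below is a minimal, unoptimized sketch of the z-normalized all-pairs subsequence search the abstract describes (the algorithmic speedups from the paper are omitted); the window length and exclusion-zone width are illustrative choices, not values from the paper.

```python
import numpy as np

def matrix_profile_naive(ts, m):
    """Brute-force matrix profile: for every length-m subsequence, the
    z-normalized Euclidean distance to its nearest non-trivial neighbor."""
    n = len(ts) - m + 1
    subs = np.array([ts[i:i + m] for i in range(n)])
    # z-normalize each subsequence
    subs = (subs - subs.mean(axis=1, keepdims=True)) / (subs.std(axis=1, keepdims=True) + 1e-12)
    profile = np.full(n, np.inf)
    index = np.zeros(n, dtype=int)
    excl = m // 2  # exclusion zone to skip trivial (self-overlapping) matches
    for i in range(n):
        d = np.linalg.norm(subs - subs[i], axis=1)
        d[max(0, i - excl):i + excl + 1] = np.inf
        index[i] = np.argmin(d)
        profile[i] = d[index[i]]
    return profile, index

# The lowest profile values locate the top motif pair; the highest locate discords.
ts = np.sin(np.linspace(0, 20 * np.pi, 2000)) + 0.1 * np.random.randn(2000)
mp, mpi = matrix_profile_naive(ts, m=100)
motif_idx = int(np.argmin(mp))     # one half of the best motif pair
discord_idx = int(np.argmax(mp))   # most anomalous subsequence
```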


IEEE Transactions on Multimedia | 2014

A Systematic Evaluation of the Bag-of-Frames Representation for Music Information Retrieval

Li Su; Chin-Chia Michael Yeh; Jen-Yu Liu; Ju-Chiang Wang; Yi-Hsuan Yang

There has been increasing attention on learning feature representations from the complex, high-dimensional audio data applied in various music information retrieval (MIR) problems. Unsupervised feature learning techniques, such as sparse coding and deep belief networks, have been utilized to represent music information as a term-document structure comprising elementary audio codewords. Despite the widespread use of such a bag-of-frames (BoF) model, few attempts have been made to systematically compare different component settings. Moreover, whether techniques developed in the text retrieval community are applicable to audio codewords is poorly understood. To further our understanding of the BoF model, we present in this paper a comprehensive evaluation that compares a large number of BoF variants on three different MIR tasks, by considering different ways of low-level feature representation, codebook construction, codeword assignment, segment-level and song-level feature pooling, tf-idf term weighting, power normalization, and dimension reduction. Our evaluations lead to the following findings: 1) modeling music information by two levels of abstraction improves the result for difficult tasks such as predominant instrument recognition, 2) tf-idf weighting and power normalization improve system performance in general, and 3) topic modeling methods such as latent Dirichlet allocation do not work for audio codewords.
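
As a rough illustration of the pipeline the abstract evaluates, the sketch below builds a codeword histogram per track from frame-level features, then applies tf-idf weighting and power normalization. The feature dimensions, codebook size, and data here are placeholders, not the settings compared in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfTransformer

rng = np.random.default_rng(0)
# Placeholder frame-level features: one (n_frames, n_dims) array per track.
tracks = [rng.normal(size=(rng.integers(200, 400), 13)) for _ in range(20)]

# 1) Codebook construction: cluster pooled frames into K audio codewords.
K = 64
codebook = KMeans(n_clusters=K, n_init=4, random_state=0).fit(np.vstack(tracks))

# 2) Codeword assignment + song-level pooling: count codewords per track.
counts = np.zeros((len(tracks), K))
for t, frames in enumerate(tracks):
    labels = codebook.predict(frames)
    counts[t] = np.bincount(labels, minlength=K)

# 3) tf-idf term weighting and power (square-root) normalization.
bof = TfidfTransformer().fit_transform(counts).toarray()
bof = np.sqrt(bof)   # power normalization (tf-idf values are non-negative)
# Each row of `bof` is the bag-of-frames representation fed to a classifier.
```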


International Conference on Multimedia Retrieval | 2012

Supervised dictionary learning for music genre classification

Chin-Chia Michael Yeh; Yi-Hsuan Yang

This paper concerns the development of a music codebook for summarizing local feature descriptors computed over time. Compared to a holistic representation, this text-like representation better captures the rich and time-varying information of music. We systematically compare a number of existing codebook generation techniques and also propose a new one that incorporates labeled data in the dictionary learning process. Several aspects of the encoding system, such as local feature extraction and codeword encoding, are also analyzed. Our result demonstrates the superiority of sparsity-enforced dictionary learning over conventional VQ-based or exemplar-based methods. With the new supervised dictionary learning algorithm and the optimal settings inferred from the performance study, we achieve state-of-the-art accuracy of music genre classification using just the log-power spectrogram as the local feature descriptor. The classification accuracies for the benchmark datasets GTZAN and ISMIR2004Genre are 84.7% and 90.8%, respectively.
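
A minimal sketch of the unsupervised baseline this paper improves on: learn a dictionary from frame-level spectra, sparse-code each frame, pool the codes per track, and classify with a linear SVM. The label-aware dictionary update that makes the method supervised is specific to the paper and not reproduced here; all data and sizes below are illustrative.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Placeholder "log-power spectrogram" frames per track, plus genre labels.
tracks = [rng.normal(size=(100, 40)) for _ in range(30)]
labels = rng.integers(0, 3, size=len(tracks))

# Learn a sparse-coding dictionary on pooled frames (unsupervised variant).
dico = MiniBatchDictionaryLearning(n_components=128, alpha=1.0, random_state=0)
dico.fit(np.vstack(tracks))

# Encode each track's frames and max-pool the sparse codes over time.
X = np.array([np.abs(dico.transform(frames)).max(axis=0) for frames in tracks])

clf = LinearSVC(C=1.0).fit(X, labels)
```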


International Conference on Data Mining | 2016

Matrix Profile II: Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins

Yan Zhu; Zachary Zimmerman; Nader Shakibay Senobari; Chin-Chia Michael Yeh; Gareth J. Funning; Abdullah Mueen; Philip Brisk; Eamonn J. Keogh

Time series motifs have been in the literature for about fifteen years, but have only recently begun to receive significant attention in the research community. This is perhaps due to the growing realization that they implicitly offer solutions to a host of time series problems, including rule discovery, anomaly detection, density estimation, semantic segmentation, etc. Recent work has improved the scalability to the point where exact motifs can be computed on datasets with up to a million data points in tenable time. However, in some domains, for example seismology, there is an insatiable need to address even larger datasets. In this work we show that a combination of a novel algorithm and a high-performance GPU allows us to significantly improve the scalability of motif discovery. We demonstrate the scalability of our ideas by finding the full set of exact motifs on a dataset with one hundred million subsequences, by far the largest dataset ever mined for time series motifs. Furthermore, we demonstrate that our algorithm can produce actionable insights in seismology and other domains.
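
For practical use at scale, the matrix profile described here is available in third-party packages; for example, the open-source stumpy library exposes a CUDA-backed routine. A usage sketch, assuming stumpy and a CUDA-capable GPU are installed (window length and series length are arbitrary):

```python
import numpy as np
import stumpy  # third-party: pip install stumpy

ts = np.random.randn(100_000).astype(np.float64)
m = 200  # subsequence (window) length, an illustrative choice

# gpu_stump computes the matrix profile on a CUDA device; stumpy.stump is the CPU variant.
mp = stumpy.gpu_stump(ts, m)
profile = mp[:, 0].astype(float)        # nearest-neighbor distances
profile_index = mp[:, 1].astype(int)    # nearest-neighbor locations
```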


International Conference on Acoustics, Speech, and Signal Processing | 2013

Dual-layer bag-of-frames model for music genre classification

Chin-Chia Michael Yeh; Li Su; Yi-Hsuan Yang

This paper concerns the development of a music dictionary-based model for summarizing local feature descriptors computed over time. Compared to a holistic representation, this text-like, bag-of-frames representation better captures the rich and time-varying information of music. However, the dictionary used in the classical bag-of-frames model only captures frame-level elements of the music; thus there exists a semantic gap between the dictionary elements and commonly used music descriptions. In order to reduce the gap, a new feature representation called dual-layer bag-of-frames is proposed in this paper. It models the music with a two-layer structure, where the first-layer dictionary captures frame-level characteristics and the second-layer dictionary captures segment-level semantics. This hierarchical structure resembles the alphabet-word-document structure of text. Our result demonstrates that the proposed dual-layer bag-of-frames feature achieves state-of-the-art accuracy of music genre classification. The classification accuracy for the GTZAN benchmark reaches 86.7% with a dictionary trained on GTZAN, and 83.6% with a dictionary trained on another data set, USPOP.
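
A rough sketch of the two-layer idea: a frame-level codebook turns frames into codewords, segment-level codeword histograms are quantized by a second codebook, and a track is summarized by its histogram over second-layer codewords. Codebook sizes, segment length, and the data are illustrative, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
tracks = [rng.normal(size=(300, 13)) for _ in range(20)]  # placeholder frame features
K1, K2, seg_len = 64, 32, 50

# Layer 1: frame-level codebook.
layer1 = KMeans(n_clusters=K1, n_init=4, random_state=0).fit(np.vstack(tracks))

# Build segment-level histograms of layer-1 codewords across all tracks.
segments = []
for frames in tracks:
    labels = layer1.predict(frames)
    for s in range(0, len(labels) - seg_len + 1, seg_len):
        segments.append(np.bincount(labels[s:s + seg_len], minlength=K1))
segments = np.array(segments, dtype=float)

# Layer 2: codebook over segment histograms captures segment-level "words".
layer2 = KMeans(n_clusters=K2, n_init=4, random_state=0).fit(segments)

def track_feature(frames):
    """Track-level histogram over layer-2 codewords (the dual-layer BoF vector)."""
    labels = layer1.predict(frames)
    segs = [np.bincount(labels[s:s + seg_len], minlength=K1)
            for s in range(0, len(labels) - seg_len + 1, seg_len)]
    return np.bincount(layer2.predict(np.array(segs, dtype=float)), minlength=K2)
```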


International Conference on Acoustics, Speech, and Signal Processing | 2014

Modified lasso screening for audio word-based music classification using large-scale dictionary

Ping-Keng Jao; Chin-Chia Michael Yeh; Yi-Hsuan Yang

Representing music information using audio codewords has led to state-of-the-art performance on various music classification benchmarks. Compared to conventional audio descriptors, audio words offer greater flexibility in capturing the nuance of music signals, in that each codeword can be viewed as a quantization of the music universe and the quantization grows finer as the size of the dictionary (i.e., the audio codebook) increases. In practice, however, the high computational cost of codeword assignment might discourage the use of a large dictionary. This paper presents two modifications of a LASSO screening technique developed in the compressive sensing field to speed up the codeword assignment process. The first modification exploits the repetitive nature of music signals, whereas the second one relaxes a screening constraint that is specific to reconstruction but not to classification. Our experiments show that the proposed method enables the use of a dictionary of 10,000 codewords with runtime close to the case of using a dictionary of 1,000 codewords. Moreover, using the larger dictionary significantly improves the mean average precision (MAP) from 0.219 to 0.246 for tagging thousands of tracks with 147 possible genre tags.
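
The screening idea can be sketched as a cheap correlation test that discards dictionary atoms before the LASSO solve runs on the survivors. The simplified rule below only illustrates the general principle; it is not the modified screening test derived in the paper, and all sizes are placeholders.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
D = rng.normal(size=(40, 10_000))   # dictionary: 10,000 codewords (atoms)
D /= np.linalg.norm(D, axis=0)      # unit-norm atoms
x = rng.normal(size=40)             # one audio frame to encode
lam = 0.1

# Screening: keep only atoms whose correlation with the signal is large enough.
# (A simplified stand-in for the screening rules discussed in the paper.)
corr = np.abs(D.T @ x)
keep = np.where(corr >= 0.5 * corr.max())[0]

# Solve the LASSO on the surviving atoms only, then scatter codes back.
lasso = Lasso(alpha=lam, max_iter=5000).fit(D[:, keep], x)
codes = np.zeros(D.shape[1])
codes[keep] = lasso.coef_
```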


Data Mining and Knowledge Discovery | 2018

Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile

Chin-Chia Michael Yeh; Yan Zhu; Liudmila Ulanova; Nurjahan Begum; Yifei Ding; Hoang Anh Dau; Zachary Zimmerman; Diego Furtado Silva; Abdullah Mueen; Eamonn J. Keogh

The last decade has seen a flurry of research on all-pairs-similarity-search (or similarity joins) for text, DNA and a handful of other datatypes, and these systems have been applied to many diverse data mining problems. However, there has been surprisingly little progress made on similarity joins for time series subsequences. The lack of progress probably stems from the daunting nature of the problem. For even modest sized datasets the obvious nested-loop algorithm can take months, and the typical speed-up techniques in this domain (i.e., indexing, lower-bounding, triangular-inequality pruning and early abandoning) at best produce only one or two orders of magnitude speedup. In this work we introduce a novel scalable algorithm for time series subsequence all-pairs-similarity-search. For exceptionally large datasets, the algorithm can be trivially cast as an anytime algorithm and produce high-quality approximate solutions in reasonable time and/or be accelerated by a trivial porting to a GPU framework. The exact similarity join algorithm computes the answer to the time series motif and time series discord problem as a side-effect, and our algorithm incidentally provides the fastest known algorithm for both these extensively-studied problems. We demonstrate the utility of our ideas for many time series data mining problems, including motif discovery, novelty discovery, shapelet discovery, semantic segmentation, density estimation, and contrast set mining. Moreover, we demonstrate the utility of our ideas on domains as diverse as seismology, music processing, bioinformatics, human activity monitoring, electrical power-demand monitoring and medicine.
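
Once the matrix profile and its companion index are available (see the sketch after the Matrix Profile I entry above), the motif and discord by-products the abstract mentions fall out in a few lines. The helper below is a hypothetical illustration, not code from the paper.

```python
import numpy as np

def motif_and_discord(profile, profile_index):
    """Extract the top motif pair (lowest profile value) and the top
    discord (highest finite profile value) from a matrix profile."""
    i = int(np.argmin(profile))           # one member of the best motif pair
    j = int(profile_index[i])             # its nearest neighbor
    finite = np.where(np.isfinite(profile), profile, -np.inf)
    d = int(np.argmax(finite))            # subsequence with the most distant neighbor
    return (i, j), d
```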


International Conference on Acoustics, Speech, and Signal Processing | 2014

Improving music auto-tagging by intra-song instance bagging

Chin-Chia Michael Yeh; Ju-Chiang Wang; Yi-Hsuan Yang; Hsin-Min Wang

Bagging is one of the most classic ensemble learning techniques in the machine learning literature. The idea is to generate multiple subsets of the training data via bootstrapping (random sampling with replacement), and then aggregate the output of the models trained from each subset via voting or averaging. As music is a temporal signal, we propose and study two bagging methods in this paper: inter-song instance bagging, which bootstraps song-level features, and intra-song instance bagging, which draws bootstrap samples directly from the short-time features of each training song. In particular, we focus on the latter method, as it better exploits the temporal information of music signals. The bagging methods result in surprisingly effective models for music auto-tagging: incorporating the idea into a simple linear support vector machine (SVM) based system yields accuracies that are comparable or even superior to state-of-the-art, possibly more sophisticated methods on three different datasets. As the bagging method is a meta-algorithm, it holds the promise of improving other MIR systems.
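
A minimal sketch of the intra-song variant under stated assumptions: each bootstrap replicate resamples frame-level instances (with replacement) from every training song, a linear SVM is trained per replicate, and decision values are averaged at test time. Mean-pooling the resampled frames into a song vector, the tag setup, and all sizes are assumptions made for brevity, not details from the paper.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Placeholder data: frame features per song plus a binary tag per song.
songs = [rng.normal(size=(200, 20)) for _ in range(40)]
tags = rng.integers(0, 2, size=len(songs))

def bootstrap_song_features(frames, rng):
    """Intra-song instance bagging: resample frames with replacement, then mean-pool."""
    idx = rng.integers(0, len(frames), size=len(frames))
    return frames[idx].mean(axis=0)

n_models = 10
models = []
for _ in range(n_models):
    X = np.array([bootstrap_song_features(f, rng) for f in songs])
    models.append(LinearSVC(C=1.0).fit(X, tags))

def predict_tag_score(frames):
    """Average decision values over the bagged ensemble (no resampling at test time)."""
    x = frames.mean(axis=0, keepdims=True)
    return float(np.mean([m.decision_function(x) for m in models]))
```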


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2013

Towards a more efficient sparse coding based audio-word feature extraction system

Chin-Chia Michael Yeh; Yi-Hsuan Yang

This paper is concerned with the efficiency of sparse coding based audio-word feature extraction systems. In particular, we define and add the concept of early and late temporal pooling to the classic sparse coding based audio-word feature extraction pipeline, and we test them on the genre-tags subset of the CAL10k data set. We define temporal pooling as any function that transforms the input time series representation into a more temporally compact representation. Under this definition, we examine two temporal pooling functions for improving feature extraction efficiency: early texture window pooling and multiple frame representation. Early texture window pooling boosts efficiency tremendously at the cost of some retrieval accuracy, while multiple frame representation slightly improves both feature extraction efficiency and retrieval accuracy. Overall, our best feature extraction setup achieves 0.202 in mean average precision on the genre-tags subset of the CAL10k data set.
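
A tiny sketch of early texture window pooling as described: consecutive frames are averaged into texture windows before any sparse coding, so the expensive encoding step runs on far fewer vectors. The window length and feature dimensions are illustrative values.

```python
import numpy as np

def early_texture_window_pooling(frames, win=20):
    """Average every `win` consecutive frames into one texture window,
    shrinking the number of vectors passed to the sparse-coding step."""
    n = (len(frames) // win) * win
    return frames[:n].reshape(-1, win, frames.shape[1]).mean(axis=1)

frames = np.random.randn(1000, 40)             # placeholder frame-level features
pooled = early_texture_window_pooling(frames)  # shape: (50, 40)
```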


ACM Multimedia | 2012

Bilingual analysis of song lyrics and audio words

Jen-Yu Liu; Chin-Chia Michael Yeh; Yi-Hsuan Yang; Yuan-Ching Teng

Thanks to the development of music audio analysis, state-of-the-art techniques can now detect musical attributes such as timbre, rhythm, and pitch with a certain level of reliability and effectiveness. An emerging body of research has begun to model the high-level perceptual properties of music listening, including the mood and the preferable listening context of a music piece. Towards this goal, we propose a novel text-like feature representation that encodes the rich and time-varying information of music using a composite of features extracted from the song lyrics and audio signals. In particular, we investigate dictionary learning algorithms to optimize the generation of local feature descriptors, and also probabilistic topic models to group semantically relevant text and audio words. This text-like representation leads to significant improvement in automatic mood classification over conventional audio features.
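
A rough sketch of the topic-modeling step: lyric word counts and audio-word counts for each song are concatenated into one vocabulary, and LDA groups text and audio words that co-occur across songs. Vocabulary sizes, the topic count, and the data are placeholders, not the paper's configuration.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
n_songs, n_lyric_words, n_audio_words = 200, 500, 256

# Placeholder per-song counts over lyric words and audio codewords.
lyric_counts = rng.poisson(0.2, size=(n_songs, n_lyric_words))
audio_counts = rng.poisson(1.0, size=(n_songs, n_audio_words))

# Concatenate the two vocabularies so each topic can mix text and audio words.
X = np.hstack([lyric_counts, audio_counts])
lda = LatentDirichletAllocation(n_components=20, random_state=0).fit(X)
song_topics = lda.transform(X)   # topic mixtures used as features for mood classification
```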

Collaboration


Top co-authors of Chin-Chia Michael Yeh and their affiliations:


Yan Zhu, University of California
Abdullah Mueen, University of New Mexico
Yifei Ding, University of California
Hoang Anh Dau, University of California
Diego Furtado Silva, Spanish National Research Council