Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Ju-Chiang Wang is active.

Publication


Featured research published by Ju-Chiang Wang.


IEEE Transactions on Multimedia | 2011

Cost-Sensitive Multi-Label Learning for Audio Tag Annotation and Retrieval

Hung-Yi Lo; Ju-Chiang Wang; Hsin-Min Wang; Shou-De Lin

Audio tags correspond to keywords that people use to describe different aspects of a music clip. With the explosive growth of digital music available on the Web, automatic audio tagging, which can be used to annotate unknown music or retrieve desirable music, is becoming increasingly important. This can be achieved by training a binary classifier for each tag based on the labeled music data. Our method, which won the MIREX 2009 audio tagging competition, is one such method. However, since social tags are usually assigned by people with different levels of musical knowledge, they inevitably contain noisy information. By treating the tag counts as costs, we can model the audio tagging problem as a cost-sensitive classification problem. In addition, tag correlation information is useful for automatic audio tagging since some tags often co-occur. By considering the co-occurrences of tags, we can model the audio tagging problem as a multi-label classification problem. To exploit the tag count and correlation information jointly, we formulate the audio tagging task as a novel cost-sensitive multi-label (CSML) learning problem and propose two solutions to solve it. The experimental results demonstrate that the new approach outperforms our MIREX 2009 winning method.
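
As a rough illustration of the cost-sensitive idea (not the paper's CSML formulation or its two proposed solvers), the sketch below trains one binary classifier per tag and passes tag counts as per-example weights; all data, dimensions, and weighting choices are hypothetical.

# Minimal sketch: per-tag binary classifiers with tag counts used as
# misclassification costs (one simple weighting choice, not the paper's exact one).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_clips, n_dims, n_tags = 200, 40, 5

X = rng.normal(size=(n_clips, n_dims))                  # clip-level audio features
Y = rng.integers(0, 2, size=(n_clips, n_tags))          # binary tag labels
counts = rng.integers(1, 10, size=(n_clips, n_tags))    # how many users applied each tag

classifiers = []
for t in range(n_tags):
    # Tagged clips carry a cost proportional to their tag count; untagged clips get unit cost.
    w = np.where(Y[:, t] == 1, counts[:, t], 1).astype(float)
    clf = LinearSVC()
    clf.fit(X, Y[:, t], sample_weight=w)
    classifiers.append(clf)

# Annotation: rank tags for a clip by the per-tag decision values.
scores = np.array([clf.decision_function(X[:1])[0] for clf in classifiers])
print("tag ranking:", np.argsort(-scores))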


IEEE Transactions on Multimedia | 2014

A Systematic Evaluation of the Bag-of-Frames Representation for Music Information Retrieval

Li Su; Chin-Chia Michael Yeh; Jen-Yu Liu; Ju-Chiang Wang; Yi-Hsuan Yang

There has been increasing attention on learning feature representations from the complex, high-dimensional audio data used in various music information retrieval (MIR) problems. Unsupervised feature learning techniques, such as sparse coding and deep belief networks, have been utilized to represent music information as a term-document structure comprising elementary audio codewords. Despite the widespread use of such bag-of-frames (BoF) models, few attempts have been made to systematically compare different component settings. Moreover, whether techniques developed in the text retrieval community are applicable to audio codewords is poorly understood. To further our understanding of the BoF model, we present in this paper a comprehensive evaluation that compares a large number of BoF variants on three different MIR tasks, considering different ways of low-level feature representation, codebook construction, codeword assignment, segment-level and song-level feature pooling, tf-idf term weighting, power normalization, and dimension reduction. Our evaluations lead to the following findings: 1) modeling music information at two levels of abstraction improves the results for difficult tasks such as predominant instrument recognition, 2) tf-idf weighting and power normalization improve system performance in general, and 3) topic modeling methods such as latent Dirichlet allocation do not work for audio codewords.
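
The following sketch walks through a plain bag-of-frames pipeline on synthetic frames, covering codebook construction, codeword assignment, song-level pooling, tf-idf weighting, and power normalization; it is a simplified stand-in for the many variants the paper compares, and all data are made up.

# Minimal bag-of-frames sketch on synthetic MFCC-like frames (illustrative only).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfTransformer

rng = np.random.default_rng(0)
songs = [rng.normal(size=(rng.integers(200, 400), 13)) for _ in range(20)]

# 1) Codebook construction: cluster all frames into K codewords.
K = 64
codebook = KMeans(n_clusters=K, n_init=4, random_state=0).fit(np.vstack(songs))

# 2) Codeword assignment + song-level pooling: histogram of codeword counts per song.
hist = np.zeros((len(songs), K))
for i, frames in enumerate(songs):
    idx = codebook.predict(frames)
    hist[i] = np.bincount(idx, minlength=K)

# 3) tf-idf term weighting, treating songs as documents and codewords as terms.
tfidf = TfidfTransformer().fit_transform(hist).toarray()

# 4) Power (square-root) normalization to soften bursty codewords.
bof = np.sqrt(tfidf)
print(bof.shape)  # (20, 64) song-level BoF features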


ACM Multimedia | 2012

The acoustic emotion Gaussians model for emotion-based music annotation and retrieval

Ju-Chiang Wang; Yi-Hsuan Yang; Hsin-Min Wang; Shyh-Kang Jeng

One of the most exciting but challenging endeavors in music research is to develop a computational model that comprehends the affective content of music signals and organizes a music collection according to emotion. In this paper, we propose a novel acoustic emotion Gaussians (AEG) model that defines a proper generative process of emotion perception in music. As a generative model, AEG permits easy and straightforward interpretations of the model learning processes. To bridge the acoustic feature space and the music emotion space, a set of latent feature classes, which are learned from data, is introduced to perform the end-to-end semantic mappings between the two spaces. Based on the space of latent feature classes, the AEG model is applicable to both automatic music emotion annotation and emotion-based music retrieval. To gain insights into the AEG model, we also provide illustrations of the model learning process. A comprehensive performance study is conducted to demonstrate the superior accuracy of AEG over its predecessors, using two emotion-annotated music corpora, MER60 and MTurk. Our results show that the AEG model outperforms the state-of-the-art methods in automatic music emotion annotation. Moreover, for the first time, a quantitative evaluation of emotion-based music retrieval is reported.
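
A minimal sketch of the prediction idea behind AEG, assuming clip-level features, valence-arousal annotations, and a GMM over acoustic features as the latent feature classes; the per-class emotion means are estimated naively from posterior-weighted annotations rather than with the paper's learning algorithm, and all data are synthetic.

# Sketch: a clip's posterior over latent acoustic classes weights a set of
# per-class positions in the valence-arousal plane.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n_clips, n_dims, n_classes = 300, 20, 8

X = rng.normal(size=(n_clips, n_dims))       # clip-level acoustic features
VA = rng.uniform(-1, 1, size=(n_clips, 2))   # valence-arousal annotations

# Latent feature classes: a GMM over the acoustic features.
gmm = GaussianMixture(n_components=n_classes, random_state=0).fit(X)
post = gmm.predict_proba(X)                  # (n_clips, n_classes) responsibilities

# Per-class emotion means: posterior-weighted averages of the VA annotations.
class_va = (post.T @ VA) / post.sum(axis=0, keepdims=True).T

def predict_va(x):
    # Prediction for a clip: posterior-weighted combination of the class VA means.
    p = gmm.predict_proba(x.reshape(1, -1))[0]
    return p @ class_va

print(predict_va(X[0]))   # predicted (valence, arousal)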


International Conference on Multimedia and Expo | 2010

Homogeneous segmentation and classifier ensemble for audio tag annotation and retrieval

Hung-Yi Lo; Ju-Chiang Wang; Hsin-Min Wang

Audio tags describe different types of musical information such as genre, mood, and instrument. This paper aims to automatically annotate audio clips with tags and retrieve relevant clips from a music database by tags. Given an audio clip, we divide it into several homogeneous segments using an audio novelty curve, and then extract audio features from each segment with respect to various musical information, such as dynamics, rhythm, timbre, pitch, and tonality. Features extracted as frame-based feature vector sequences are further summarized by their mean and standard deviation so that they can be combined with other segment-based features to form a fixed-dimensional feature vector for each segment. We train an ensemble classifier, which consists of SVM and AdaBoost classifiers, for each tag. For the audio annotation task, the individual classifier outputs are transformed into calibrated probability scores so that a probability ensemble can be employed. For the audio retrieval task, we propose using a ranking ensemble. We participated in the MIREX 2009 audio tag classification task, and our system was ranked first in terms of F-measure and the per-tag area under the ROC curve.
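
The sketch below illustrates two of the steps described above, mean/standard-deviation pooling of frame sequences and a calibrated probability ensemble of SVM and AdaBoost classifiers, on hypothetical segments; novelty-curve segmentation and the ranking ensemble are not reproduced.

# Rough sketch of segment feature pooling and a probability ensemble for one tag.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)

def pool_segment(frames):
    # Summarize a frame-based feature sequence by its mean and standard deviation.
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

# Hypothetical data: each segment is a variable-length sequence of 12-d frames.
segments = [rng.normal(size=(rng.integers(50, 150), 12)) for _ in range(200)]
X = np.vstack([pool_segment(s) for s in segments])
y = rng.integers(0, 2, size=len(segments))        # one binary tag

svm = SVC(probability=True, random_state=0).fit(X, y)   # Platt-calibrated scores
ada = AdaBoostClassifier(random_state=0).fit(X, y)

# Probability ensemble for annotation: average the calibrated scores.
p = 0.5 * svm.predict_proba(X[:3])[:, 1] + 0.5 * ada.predict_proba(X[:3])[:, 1]
print(p)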


IEEE Transactions on Affective Computing | 2015

Modeling the Affective Content of Music with a Gaussian Mixture Model

Ju-Chiang Wang; Yi-Hsuan Yang; Hsin-Min Wang; Shyh-Kang Jeng

Modeling the association between music and emotion has been considered important for music information retrieval and affective human-computer interaction. This paper presents a novel generative model called acoustic emotion Gaussians (AEG) for computational modeling of emotion. Instead of assigning a music excerpt a deterministic (hard) emotion label, AEG treats the affective content of music as a (soft) probability distribution in the valence-arousal space and parameterizes it with a Gaussian mixture model (GMM). In this way, the subjective nature of emotion perception is explicitly modeled. Specifically, AEG employs two GMMs to characterize the audio and emotion data. The fitting algorithm of the GMM parameters makes the model learning process transparent and interpretable. Based on AEG, a probabilistic graphical structure for predicting the emotion distribution from music audio data is also developed. A comprehensive performance study over two emotion-labeled datasets demonstrates that AEG offers new insights into the relationship between music and emotion (e.g., to assess the “affective diversity” of a corpus) and represents an effective means of emotion modeling. Readers can easily implement AEG via the publicly available code. As the AEG model is generic, it holds the promise of analyzing any signal that carries affective or other highly subjective information.
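
As a small illustration of the "soft label" idea, the snippet below fits a GMM directly to a clip's hypothetical listener annotations in the valence-arousal plane, representing its affective content as a distribution rather than a single point; this is not the AEG learning procedure itself.

# A clip's emotion as a distribution in valence-arousal space (synthetic annotations).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Hypothetical per-clip annotations: 40 listeners, each giving (valence, arousal).
annotations = np.vstack([
    rng.normal(loc=[0.6, 0.3], scale=0.15, size=(25, 2)),   # most listeners agree
    rng.normal(loc=[-0.2, 0.5], scale=0.2, size=(15, 2)),   # a minority hears it differently
])

emotion_gmm = GaussianMixture(n_components=2, random_state=0).fit(annotations)
print("component means:\n", emotion_gmm.means_)
print("component weights:", emotion_gmm.weights_)
# Comparing such per-clip distributions across a corpus gives one rough way to
# look at its "affective diversity".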


International Conference on Acoustics, Speech, and Signal Processing | 2014

Linear regression-based adaptation of music emotion recognition models for personalization

Yu-An Chen; Ju-Chiang Wang; Yi-Hsuan Yang; Homer H. Chen

Personalization techniques can be applied to address the subjectivity issue of music emotion recognition, which is important for music information retrieval. However, achieving satisfactory accuracy in personalized music emotion recognition for a user is difficult because it requires an impractically large number of annotations from the user. In this paper, we adopt a probabilistic framework for valence-arousal music emotion modeling and propose an adaptation method based on linear regression to personalize a background model in an online learning fashion. We also incorporate a component-tying strategy to enhance model flexibility. Comprehensive experiments are conducted to test the performance of the proposed method on three datasets, including a new one created specifically in this work for personalized music emotion recognition. Our results demonstrate the effectiveness of the proposed method.
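
A simplified view of the adaptation idea, assuming hypothetical background-model predictions and a few user annotations: a linear map learned from those annotations personalizes the predicted valence-arousal values. The paper adapts model parameters with component tying; this sketch only adapts the output space.

# Personalize valence-arousal predictions with a linear transform fit to a few user ratings.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

background_pred = rng.uniform(-1, 1, size=(15, 2))      # background model's VA predictions
# Simulated user ratings: a linearly shifted, noisy version of the background predictions.
user_annotation = (background_pred @ np.array([[0.8, 0.1], [-0.1, 0.9]])
                   + 0.1 + rng.normal(scale=0.05, size=(15, 2)))

# Fit a linear map from background predictions to this user's annotations.
adapter = LinearRegression().fit(background_pred, user_annotation)

new_clip_pred = np.array([[0.3, -0.4]])
print("personalized VA:", adapter.predict(new_clip_pred)[0])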


ACM Multimedia | 2011

Colorizing tags in tag cloud: a novel query-by-tag music search system

Ju-Chiang Wang; Yu-Chin Shih; Meng-Sung Wu; Hsin-Min Wang; Shyh-Kang Jeng

This paper presents a novel content-based query-by-tag music search system for an untagged music database. We design a new tag query interface that allows users to input multiple tags with multiple levels of preference (denoted as an MTML query) by colorizing desired tags in a web-based tag cloud. When a user clicks and holds the left mouse button (or presses and holds his/her finger on a touch screen) on a desired tag, the color of the tag changes cyclically according to a color map (from dark blue to bright red), which represents the level of preference (from 0 to 1). In this way, the user can easily organize and check a query of multiple tags with multiple levels of preference through the colored tags. To enable MTML content-based music retrieval, we introduce a probabilistic fusion model (denoted as GMFM), which consists of two mixture models, namely a Gaussian mixture model and a multinomial mixture model. GMFM jointly models the auditory features and tag labels of a song. Two indexing methods and their corresponding matching methods, namely pseudo song-based matching and tag affinity-based matching, are incorporated into the pre-learned GMFM. We evaluate the proposed system on the MajorMiner and CAL-500 datasets. The experimental results demonstrate the effectiveness of GMFM and the potential of using MTML queries to retrieve music from an untagged music database.
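
The sketch below shows one simple way an MTML query could be matched against per-song tag affinities (here by cosine similarity); GMFM, the pseudo song-based indexing, and the tag affinity-based matching are not reproduced, and all tags and scores are hypothetical.

# Match a multi-tag, multi-level-preference query against predicted tag affinities.
import numpy as np

tags = ["rock", "piano", "sad", "female vocal", "electronic"]
rng = np.random.default_rng(0)
song_affinity = rng.uniform(size=(100, len(tags)))   # hypothetical per-song tag affinities

# MTML query: "rock" at preference 0.9, "electronic" at 0.4, other tags unselected;
# the preference values come from the color each tag was given in the cloud.
query = np.array([0.9, 0.0, 0.0, 0.0, 0.4])

def cosine(a, B):
    # Cosine similarity between a query vector and every row of B.
    return (B @ a) / (np.linalg.norm(B, axis=1) * np.linalg.norm(a) + 1e-12)

ranking = np.argsort(-cosine(query, song_affinity))
print("top-5 song indices:", ranking[:5])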


Proceedings of the Second International ACM Workshop on Music Information Retrieval with User-Centered and Multimodal Strategies | 2012

Exploring the relationship between categorical and dimensional emotion semantics of music

Ju-Chiang Wang; Yi-Hsuan Yang; Kaichun Chang; Hsin-Min Wang; Shyh-Kang Jeng

Computational modeling of music emotion has been addressed primarily by two approaches: the categorical approach that categorizes emotions into mood classes and the dimensional approach that regards emotions as numerical values over a few dimensions such as valence and activation. Being two extreme scenarios (discrete/continuous), the two approaches actually share a unified goal of understanding the emotion semantics of music. This paper presents the first computational model that unifies the two semantic modalities under a probabilistic framework, which makes it possible to explore the relationship between them in a computational way. With the proposed framework, mood labels can be mapped into the emotion space in an unsupervised and content-based manner, without any training ground truth annotations for the semantic mapping. Such a function can be applied to automatically generate a semantically structured tag cloud in the emotion space. To demonstrate the effectiveness of the proposed framework, we qualitatively evaluate the mood tag clouds generated from two emotion-annotated corpora, and quantitatively evaluate the accuracy of the categorical-dimensional mapping by comparing the results with those created by psychologists, including the one proposed by Whissell & Plutchik and the one defined in the Affective Norms for English Words (ANEW).
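
As a simplified, hypothetical illustration of placing a mood label in the valence-arousal plane without a supervised mapping, the snippet below weights each song's predicted VA point by the predicted strength of the tag for that song and averages; the paper's probabilistic framework is more involved than this.

# Content-based, unsupervised placement of one mood tag in valence-arousal space.
import numpy as np

rng = np.random.default_rng(0)
n_songs = 500

song_va = rng.uniform(-1, 1, size=(n_songs, 2))   # predicted (valence, arousal) per song
tag_strength = rng.uniform(size=n_songs)          # predicted relevance of the tag "happy"

# Tag position: relevance-weighted average of the songs' VA points.
tag_position = (tag_strength @ song_va) / tag_strength.sum()
print("estimated position of the tag in VA space:", tag_position)
# Repeating this for every mood label yields a tag cloud laid out in emotion space.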


International Conference on Multimedia and Expo | 2011

Query by multi-tags with multi-level preferences for content-based music retrieval

Ju-Chiang Wang; Meng-Sung Wu; Hsin-Min Wang; Shyh-Kang Jeng

This paper presents a novel content-based music retrieval system that accepts a query containing multiple tags with multiple levels of preference (denoted as an MTML query) to retrieve music from an untagged music database. We select a limited number of popular music tags to form the tag space and design an interface for users to input queries by operating scroll bars. To enable MTML content-based music retrieval, we introduce a tag-based music aspect model that jointly models the auditory features and tag-based text features of a song. Two indexing methods and their corresponding matching methods, namely pseudo song-based matching and tag co-occurrence pattern-based matching, are incorporated into the pre-learned tag-based music aspect model. Finally, we evaluate the proposed system on the MajorMiner dataset. The results demonstrate the potential of using MTML queries to retrieve music from an untagged music database.


International Conference on Acoustics, Speech, and Signal Processing | 2015

The AMG1608 dataset for music emotion recognition

Yu-An Chen; Yi-Hsuan Yang; Ju-Chiang Wang; Homer H. Chen

Automated recognition of musical emotion from audio signals has received considerable attention recently. To construct an accurate model for music emotion prediction, the emotion-annotated music corpus has to be of high quality. It is desirable to have a large number of songs annotated by numerous subjects to characterize the general emotional response to a song. Due to the need to personalize the music emotion prediction model to address the subjective nature of emotion perception, it is also important to have a large number of annotations per subject for training and evaluating a personalization method. In this paper, we discuss the deficiencies of existing datasets and present a new one. The new dataset, which is publicly available to the research community, is composed of 1608 30-second music clips annotated by 665 subjects. Furthermore, 46 subjects annotated more than 150 songs, making this dataset the largest of its kind to date.

Collaboration


Dive into Ju-Chiang Wang's collaborations.

Top Co-Authors

Shyh-Kang Jeng | National Taiwan University
Homer H. Chen | National Taiwan University
Yu-An Chen | National Taiwan University
Shou-De Lin | National Taiwan University
Chin-Chia Michael Yeh | Center for Information Technology
Chih-Yi Chiu | National Chiayi University