Jen-Yu Liu
Center for Information Technology
Publications
Featured research published by Jen-Yu Liu.
IEEE Transactions on Multimedia | 2013
Yi-Hsuan Yang; Jen-Yu Liu
A scientific understanding of emotion experience requires information on the contexts in which the emotion is induced. Moreover, as one of the primary functions of music is to regulate the listener's mood, an individual's short-term music preference may reveal that individual's emotional state. In light of these observations, this paper presents the first scientific study that exploits an online repository of social data to investigate the connections between a blogger's emotional state, the user context manifested in the blog articles, and the content of the music titles the blogger attached to the post. A number of computational models are developed to evaluate the accuracy of different content and context cues in predicting emotional state, using 40,000 music listening records collected from the social blogging website LiveJournal. Our study shows that it is feasible to computationally model the latent structure underlying music listening and mood regulation. The average area under the receiver operating characteristic curve (AUC) for the content-based and context-based models reaches 0.5462 and 0.6851, respectively. The association among user mood, music emotion, and the individual's personality is also identified.
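A minimal sketch of the kind of evaluation reported above: train separate content-based and context-based classifiers for a binary mood label and compare their ROC AUC scores. The feature matrices, labels, and classifier choice here are synthetic placeholders, not the paper's actual data or models.

```python
# Sketch only: synthetic stand-ins for content (audio) and context (text) features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
X_content = rng.normal(size=(n, 64))   # e.g. audio descriptors of the attached song (placeholder)
X_context = rng.normal(size=(n, 300))  # e.g. bag-of-words features of the blog post (placeholder)
y = rng.integers(0, 2, size=n)         # binary mood label, e.g. "happy" vs. not (placeholder)

for name, X in [("content", X_content), ("context", X_context)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{name}-based model AUC: {auc:.4f}")
```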
IEEE Transactions on Multimedia | 2014
Li Su; Chin-Chia Michael Yeh; Jen-Yu Liu; Ju-Chiang Wang; Yi-Hsuan Yang
There has been increasing attention on learning feature representations from the complex, high-dimensional audio data used in various music information retrieval (MIR) problems. Unsupervised feature learning techniques, such as sparse coding and deep belief networks, have been utilized to represent music information as a term-document structure comprising elementary audio codewords. Despite the widespread use of the bag-of-frames (BoF) model, few attempts have been made to systematically compare different component settings. Moreover, whether techniques developed in the text retrieval community are applicable to audio codewords is poorly understood. To further our understanding of the BoF model, we present in this paper a comprehensive evaluation that compares a large number of BoF variants on three different MIR tasks, considering different ways of low-level feature representation, codebook construction, codeword assignment, segment-level and song-level feature pooling, tf-idf term weighting, power normalization, and dimension reduction. Our evaluation leads to the following findings: 1) modeling music information at two levels of abstraction improves the results for difficult tasks such as predominant instrument recognition, 2) tf-idf weighting and power normalization improve system performance in general, and 3) topic modeling methods such as latent Dirichlet allocation do not work for audio codewords.
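A compact sketch of a generic BoF pipeline along the lines described above, under the assumption of a k-means codebook with hard codeword assignment: frame-level descriptors are quantized into codewords, pooled into song-level histograms, then tf-idf weighted and power normalized. The data and parameter choices are illustrative, not the paper's settings.

```python
# Sketch: k-means codebook + song-level codeword histograms + tf-idf + power normalization.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfTransformer

rng = np.random.default_rng(0)
# fake per-song frame features (e.g. MFCC-like vectors); real input would come from audio
songs = [rng.normal(size=(rng.integers(200, 400), 20)) for _ in range(50)]

codebook = KMeans(n_clusters=128, n_init=4, random_state=0).fit(np.vstack(songs))

def bof_histogram(frames, k=128):
    codes = codebook.predict(frames)                      # hard codeword assignment
    return np.bincount(codes, minlength=k).astype(float)  # song-level pooling by counting

H = np.array([bof_histogram(s) for s in songs])           # term-document matrix (songs x codewords)
H_tfidf = TfidfTransformer().fit_transform(H).toarray()   # tf-idf term weighting
H_power = np.abs(H_tfidf) ** 0.5                          # power (square-root) normalization
```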
ACM Multimedia | 2013
Chih-Ming Chen; Ming-Feng Tsai; Jen-Yu Liu; Yi-Hsuan Yang
This paper proposes a context-aware approach that recommends music to a user based on the user's emotional state, predicted from the article the user writes. We analyze the association between user-generated text and music by using a real-world dataset with user-text-music tripartite information, collected from the social blogging website LiveJournal. The audio information represents various perceptual dimensions of music listening, including danceability, loudness, mode, and tempo; the emotional text information consists of bag-of-words features and three-dimensional affective states within an article: valence, arousal, and dominance. To combine these factors for music recommendation, a factorization machine-based approach is taken. Our evaluation shows that the emotional context information mined from user-generated articles does improve the quality of recommendation, compared to either the collaborative filtering approach or the content-based approach.
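For reference, a minimal sketch of the second-order factorization machine score that such an approach builds on; the feature layout (one-hot user plus affective and audio descriptors) and the dimensions are illustrative assumptions, not the paper's configuration.

```python
# Sketch: second-order FM prediction for one (user, article, song) feature vector.
import numpy as np

def fm_score(x, w0, w, V):
    """x: (d,) feature vector; w0: bias; w: (d,) linear weights;
    V: (d, k) latent factors modeling pairwise feature interactions."""
    linear = w0 + w @ x
    # O(d*k) reformulation of the sum over all pairwise interactions <v_i, v_j> x_i x_j
    interactions = 0.5 * np.sum((x @ V) ** 2 - (x ** 2) @ (V ** 2))
    return linear + interactions

d, k = 10, 4
rng = np.random.default_rng(0)
x = np.concatenate([np.eye(3)[1],    # one-hot user id (3 users, illustrative)
                    rng.random(4),   # e.g. valence, arousal, dominance, loudness (placeholder)
                    rng.random(3)])  # e.g. danceability, mode, tempo, scaled (placeholder)
print(fm_score(x, 0.1, rng.normal(size=d), rng.normal(size=(d, k))))
```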
Proceedings of the Second International ACM Workshop on Music Information Retrieval with User-Centered and Multimodal Strategies | 2012
Jen-Yu Liu; Yi-Hsuan Yang
Nowadays, we often leave personal information on the Internet without noticing it, and people can learn things about us from this information. It has been reported that it is possible to infer personal information from web browsing records or blog articles. As music streaming services become increasingly popular, a person's music listening history can be acquired easily. This paper investigates the possibility for a computer to automatically infer personal traits such as gender and age from the music listening history. Specifically, we consider three types of features for building the machine learning models, including 1) statistics of the listening timestamps, 2) song/artist metadata, and 3) song signal features, and evaluate the accuracy of binary age classification and gender classification utilizing a 1K-user dataset obtained from the online music service Last.fm. Our study not only brings new insights into the human behavior of music listening, but also raises concerns over the privacy issues involved in music streaming services.
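A hypothetical sketch of the pipeline implied above: aggregate the three feature types from a per-play listening log into per-user features and train a binary classifier. The data layout, column names, and classifier are assumptions for illustration only.

```python
# Sketch: per-user feature aggregation from a fake listening log + binary classification.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

np.random.seed(0)
log = pd.DataFrame({                                   # one row per play event (synthetic)
    "user": np.random.randint(0, 100, 5000),
    "hour": np.random.randint(0, 24, 5000),            # 1) listening timestamp statistics
    "artist_id": np.random.randint(0, 500, 5000),      # 2) song/artist metadata
    "energy": np.random.rand(5000),                    # 3) an audio signal feature (placeholder)
})
labels = pd.Series(np.random.randint(0, 2, 100))       # per-user label, e.g. gender (synthetic)

feats = log.groupby("user").agg(
    mean_hour=("hour", "mean"),
    std_hour=("hour", "std"),
    n_artists=("artist_id", "nunique"),
    mean_energy=("energy", "mean"),
).fillna(0)

scores = cross_val_score(RandomForestClassifier(random_state=0),
                         feats, labels.loc[feats.index], cv=5)
print("accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```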
International Conference on Multimedia and Expo | 2014
Jen-Yu Liu; Sung-Yen Liu; Yi-Hsuan Yang
Recent years have witnessed a growing interest in modeling user behaviors in multimedia research, emphasizing the need to consider human factors such as preference, activity, and emotion in system development and evaluation. Following this research line, we present in this paper the LiveJournal two-million post (LJ2M) dataset to foster research on user-centered music information retrieval. The new dataset is characterized by the great diversity of real-life listening contexts in which people and music interact. It contains blog articles from the social blogging website LiveJournal, along with tags self-reporting a user's emotional state while posting and the musical track that the user considered the best match for the post. More importantly, the data are contributed by users spontaneously in their daily lives, instead of being collected in a controlled environment. Therefore, it offers new opportunities to understand the interrelationship among the personal, situational, and musical factors of music listening. As an example application, we present research investigating the interaction between the affective context of the listener and the affective content of music, using audio-based music emotion recognition techniques and a psycholinguistic tool. The study offers insights into the role of music in mood regulation and demonstrates how LJ2M can contribute to studies on real-world music listening behavior.
ACM Multimedia | 2016
Jen-Yu Liu; Yi-Hsuan Yang
In music auto-tagging, people develop models to automatically label a music clip with attributes such as instruments, styles, or acoustic properties. Many of these tags are actually descriptors of local events in a music clip, rather than holistic descriptions of the whole clip. Localizing such tags in time could change the way people retrieve and interact with music, but little work has been done to date due to the scarcity of labeled data with frame-level granularity. Most labeled data for training a learning-based music auto-tagging model are at the clip level, providing no cues about when and for how long these attributes appear in a music clip. To bridge this gap, we propose in this paper a convolutional neural network (CNN) architecture that is able to make accurate frame-level predictions of tags in unseen music clips by using only clip-level annotations in the training phase. Our approach is motivated by recent advances in computer vision for localizing visual objects, but we propose new designs of the CNN architecture to account for the temporal information of music and the variable duration of such local tags in time. We report extensive experiments to gain insights into the problem of event localization in music, and validate through experiments the effectiveness of the proposed approach. In addition to quantitative evaluations, we also present qualitative analyses showing that the model can indeed learn certain characteristics of music tags.
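A sketch of the general weakly supervised recipe this line of work follows: a fully convolutional network emits frame-level tag scores, which are pooled over time into clip-level predictions so that only clip-level labels are needed during training. The layer sizes and the max-pooling aggregation are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch: frame-level tag logits from a 1-D fully convolutional net, pooled for clip-level training.
import torch
import torch.nn as nn

class FrameTagger(nn.Module):
    def __init__(self, n_mels=128, n_tags=50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(256, n_tags, kernel_size=1),          # per-frame tag logits
        )

    def forward(self, mel):                                  # mel: (batch, n_mels, frames)
        frame_logits = self.conv(mel)                        # (batch, n_tags, frames)
        clip_logits = frame_logits.max(dim=-1).values        # pool over time for clip-level loss
        return frame_logits, clip_logits

model = FrameTagger()
frame_logits, clip_logits = model(torch.randn(4, 128, 1000))
# only clip-level labels are used for the loss; frame_logits give the localization at test time
loss = nn.BCEWithLogitsLoss()(clip_logits, torch.randint(0, 2, (4, 50)).float())
```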
International Conference on Acoustics, Speech, and Signal Processing | 2017
Ting-Wei Su; Jen-Yu Liu; Yi-Hsuan Yang
Audio event detection aims at discovering the elements inside an audio clip. In addition to labeling the clips with the audio events, we want to find the temporal locations of these events. However, creating clearly annotated training data can be time-consuming. Therefore, we provide a model based on convolutional neural networks that relies only on weakly supervised data for training. These data can be obtained directly from online platforms, such as Freesound, with clip-level labels assigned by the uploaders. The structure of our model is extended to a fully convolutional network, and an event-specific Gaussian filter layer is designed to improve its learning ability. Moreover, this model is able to detect frame-level information, e.g., the temporal position of sounds, even when it is trained merely with clip-level labels.
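One possible reading of such an event-specific Gaussian filter layer is sketched below: each event class smooths its own frame-level activation curve with a Gaussian kernel whose width is a learnable parameter. This is an illustrative interpretation, not the authors' implementation.

```python
# Sketch: per-event learnable Gaussian smoothing of frame-level activations.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EventGaussianFilter(nn.Module):
    def __init__(self, n_events, kernel_size=41):
        super().__init__()
        self.log_sigma = nn.Parameter(torch.zeros(n_events))  # one learnable width per event
        self.register_buffer("t", torch.arange(kernel_size).float() - kernel_size // 2)

    def forward(self, x):                                      # x: (batch, n_events, frames)
        sigma = self.log_sigma.exp().unsqueeze(1)              # (n_events, 1)
        kernel = torch.exp(-0.5 * (self.t / sigma) ** 2)       # (n_events, kernel_size)
        kernel = (kernel / kernel.sum(dim=1, keepdim=True)).unsqueeze(1)
        # depthwise (grouped) 1-D convolution: each event is smoothed by its own kernel
        return F.conv1d(x, kernel, padding=kernel.shape[-1] // 2, groups=x.shape[1])

smoothed = EventGaussianFilter(n_events=10)(torch.rand(2, 10, 500))  # usage example
```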
Web Intelligence | 2013
Chih-Ming Chen; Ming-Feng Tsai; Jen-Yu Liu; Yi-Hsuan Yang
This paper proposes a music recommendation approach based on various types of similarity information via Factorization Machines (FM). We introduce the idea of similarity, which has been widely studied in the field of information retrieval, and incorporate multiple feature similarities into the FM framework, including content-based and context-based similarities. The similarity information not only captures similar patterns among the referred objects, but also enhances the convergence speed and accuracy of FM. In addition, to reduce the noise within the large set of similarity features, we also adopt the grouping FM as an extended method to model the problem. In our experiments, a music recommendation dataset is used to assess the performance of the proposed approach. The dataset is collected from an online blogging website and includes user listening history, user profiles, social information, and music information. Our experimental results show that, with various types of feature similarities, the performance of music recommendation can be enhanced significantly. Furthermore, via the grouping technique, the performance can be improved significantly in terms of Mean Average Precision, compared to the traditional collaborative filtering approach.
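An illustrative sketch of how content-based and context-based similarity features could be computed for a (user, candidate song) pair before being appended to the FM feature vector; the vectors and the mean-cosine aggregation are assumptions, not the paper's exact formulation.

```python
# Sketch: similarity scores between a candidate song and a user's listening history.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def similarity_features(candidate_audio, candidate_tags, history_audio, history_tags):
    """Return content- and context-based similarity scores (mean cosine similarity
    between the candidate song and the songs in the user's history)."""
    content_sim = cosine_similarity(candidate_audio[None, :], history_audio).mean()
    context_sim = cosine_similarity(candidate_tags[None, :], history_tags).mean()
    return np.array([content_sim, context_sim])

rng = np.random.default_rng(0)
feat = similarity_features(rng.random(32), rng.random(100),       # candidate audio / tag vectors
                           rng.random((20, 32)), rng.random((20, 100)))  # 20 history songs
# these scores would be concatenated with the usual one-hot user/item indicators for FM
print(feat)
```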
ACM Multimedia | 2012
Jen-Yu Liu; Chin-Chia Michael Yeh; Yi-Hsuan Yang; Yuan-Ching Teng
Thanks to the development of music audio analysis, state-of-the-art techniques can now detect musical attributes such as timbre, rhythm, and pitch with a certain level of reliability and effectiveness. An emerging body of research has begun to model the high-level perceptual properties of music listening, including the mood and the preferred listening context of a music piece. Towards this goal, we propose a novel text-like feature representation that encodes the rich and time-varying information of music using a composite of features extracted from the song lyrics and audio signals. In particular, we investigate dictionary learning algorithms to optimize the generation of local feature descriptors, and probabilistic topic models to group semantically relevant text and audio words. This text-like representation leads to significant improvement in automatic mood classification over conventional audio features.
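A small sketch of the dictionary-learning step described above, under the assumption of sparse coding over per-frame audio descriptors: each frame is encoded against a learned dictionary, yielding "audio words" analogous to text words. The descriptors and dictionary size are placeholders.

```python
# Sketch: learn a sparse dictionary over frame descriptors and derive audio words.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
frames = rng.normal(size=(5000, 39))             # e.g. MFCC + deltas per frame (placeholder)

dico = MiniBatchDictionaryLearning(n_components=256, alpha=1.0,
                                   batch_size=256, random_state=0)
codes = dico.fit(frames).transform(frames)       # sparse activations, shape (5000, 256)
audio_words = np.abs(codes).argmax(axis=1)       # most active atom per frame = its "audio word"
# a topic model (e.g. LDA) could then group these audio words with lyric words
```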
International Conference on Acoustics, Speech, and Signal Processing | 2017
Li-Chia Yang; Jen-Yu Liu; Yi-Hsuan Yang; Yi-An Chen
Being able to predict whether a song will be a hit has important applications in the music industry. Although it is true that the popularity of a song can be greatly affected by external factors such as social and commercial influences, to what degree audio features computed from musical signals (which we regard as internal factors) can predict song popularity is an interesting research question on its own. Motivated by the recent success of deep learning techniques, we attempt to extend previous work on hit song prediction by jointly learning the audio features and prediction models using deep learning. Specifically, we experiment with a convolutional neural network model that takes the primitive mel-spectrogram as the input for feature learning, a more advanced JYnet model that uses an external song dataset for supervised pre-training and auto-tagging, and the combination of these two models. We also consider the inception model to characterize audio information at different scales. Our experiments suggest that deep structures are indeed more accurate than shallow structures in predicting the popularity of either Chinese or Western pop songs in Taiwan. We also use the tags predicted by JYnet to gain insights into the results of different models.
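A toy sketch of the mel-spectrogram CNN variant described above: a small convolutional network maps a mel-spectrogram to a scalar popularity score. Layer sizes, pooling choices, and the regression setup are illustrative assumptions, not JYnet or the paper's exact models.

```python
# Sketch: a small CNN regressor from mel-spectrogram input to a popularity score.
import torch
import torch.nn as nn

class HitNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 4)),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, 1)                 # scalar popularity score

    def forward(self, mel):                        # mel: (batch, 1, n_mels, frames)
        return self.fc(self.conv(mel).flatten(1))

scores = HitNet()(torch.randn(2, 1, 128, 1000))    # two clips -> two predicted scores
```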