Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Hung-Yi Lo is active.

Publication


Featured research published by Hung-Yi Lo.


IEEE Transactions on Multimedia | 2011

Cost-Sensitive Multi-Label Learning for Audio Tag Annotation and Retrieval

Hung-Yi Lo; Ju-Chiang Wang; Hsin-Min Wang; Shou-De Lin

Audio tags correspond to keywords that people use to describe different aspects of a music clip. With the explosive growth of digital music available on the Web, automatic audio tagging, which can be used to annotate unknown music or retrieve desirable music, is becoming increasingly important. This can be achieved by training a binary classifier for each tag based on the labeled music data. Our method, which won the MIREX 2009 audio tagging competition, is one such method. However, since social tags are usually assigned by people with different levels of musical knowledge, they inevitably contain noisy information. By treating the tag counts as costs, we can model the audio tagging problem as a cost-sensitive classification problem. In addition, tag correlation information is useful for automatic audio tagging since some tags often co-occur. By considering the co-occurrences of tags, we can model the audio tagging problem as a multi-label classification problem. To exploit the tag count and correlation information jointly, we formulate the audio tagging task as a novel cost-sensitive multi-label (CSML) learning problem and propose two solutions to solve it. The experimental results demonstrate that the new approach outperforms our MIREX 2009 winning method.
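
The cost-sensitive idea described here, treating tag counts as misclassification costs, can be approximated with per-example weights in an off-the-shelf linear classifier. The following is a minimal sketch under that assumption, not the authors' implementation; X, Y, and C are hypothetical feature, tag, and tag-count matrices.

```python
# Minimal sketch (not the paper's code): one binary classifier per tag,
# with tag counts used as per-example misclassification costs.
import numpy as np
from sklearn.svm import LinearSVC

def train_cost_sensitive_taggers(X, Y, C):
    """X: (clips, features), Y: (clips, tags) in {0,1}, C: (clips, tags) tag counts."""
    models = []
    for t in range(Y.shape[1]):
        y = Y[:, t]
        # Positive clips are weighted by how many listeners applied the tag,
        # so tags assigned only once (likely noisy) contribute less to the loss.
        w = np.where(y == 1, np.maximum(C[:, t], 1.0), 1.0)
        clf = LinearSVC()
        clf.fit(X, y, sample_weight=w)
        models.append(clf)
    return models
```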


IEEE Transactions on Knowledge and Data Engineering | 2014

Generalized k-Labelsets Ensemble for Multi-Label and Cost-Sensitive Classification

Hung-Yi Lo; Shou-De Lin; Hsin-Min Wang

The label powerset (LP) method is one category of multi-label learning algorithms. This paper presents a basis expansion model for multi-label classification, where a basis function is an LP classifier trained on a random k-labelset. The expansion coefficients are learned to minimize the global error between the prediction and the ground truth. We derive an analytic solution to learn the coefficients efficiently. We further extend this model to handle the cost-sensitive multi-label classification problem, and apply it to social tagging to address the issue of a noisy training set by treating the tag counts as the misclassification costs. We have conducted experiments on several benchmark datasets and compared our method with other state-of-the-art multi-label learning methods. Experimental results on both multi-label classification and cost-sensitive social tagging demonstrate that our method performs better than the other methods.
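
A rough sketch of the basis-expansion idea, assuming label-powerset bases built on random k-labelsets and a plain least-squares fit for the combination coefficients; this is an illustration of the technique, not the paper's algorithm in detail.

```python
# Illustrative sketch of a k-labelset basis expansion: each basis is a label
# powerset (LP) classifier on a random k-labelset, and the combination
# coefficients are obtained analytically by least squares.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_gkl_ensemble(X, Y, k=3, n_bases=20, seed=0):
    rng = np.random.default_rng(seed)
    n_samples, n_labels = Y.shape
    bases, H = [], np.zeros((n_bases, n_samples, n_labels))
    for i in range(n_bases):
        subset = rng.choice(n_labels, size=k, replace=False)
        # LP reduction: each distinct label combination on the subset is a class.
        combos, y_lp = np.unique(Y[:, subset], axis=0, return_inverse=True)
        clf = LogisticRegression(max_iter=1000).fit(X, y_lp)
        bases.append((subset, combos, clf))
        H[i][:, subset] = combos[clf.predict(X)]   # per-basis label predictions
    # Solve min_alpha || sum_i alpha_i * H_i - Y ||^2 in closed form.
    A = H.reshape(n_bases, -1).T                   # (n_samples*n_labels, n_bases)
    alpha, *_ = np.linalg.lstsq(A, Y.ravel().astype(float), rcond=None)
    return bases, alpha
```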


International Conference on Multimedia and Expo | 2010

Homogeneous segmentation and classifier ensemble for audio tag annotation and retrieval

Hung-Yi Lo; Ju-Chiang Wang; Hsin-Min Wang

Audio tags describe different types of musical information, such as genre, mood, and instrument. This paper aims to automatically annotate audio clips with tags and retrieve relevant clips from a music database by tags. Given an audio clip, we divide it into several homogeneous segments by using an audio novelty curve, and then extract audio features from each segment with respect to various musical information, such as dynamics, rhythm, timbre, pitch, and tonality. Features in frame-based feature-vector-sequence format are further represented by their means and standard deviations so that they can be combined with other segment-based features to form a fixed-dimensional feature vector for a segment. We train an ensemble classifier, which consists of SVM and AdaBoost classifiers, for each tag. For the audio annotation task, the individual classifier outputs are transformed into calibrated probability scores so that a probability ensemble can be employed. For the audio retrieval task, we propose using a ranking ensemble. We participated in the MIREX 2009 audio tag classification task, and our system was ranked first in terms of F-measure and the area under the ROC curve given a tag.
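
As a rough illustration of the segment-level representation and probability ensemble described above (the novelty-curve segmentation and feature extraction are assumed to happen elsewhere, and the function names are hypothetical):

```python
# Sketch: summarize a frame-based feature sequence by its mean and standard
# deviation, then average calibrated probabilities from SVM and AdaBoost.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier

def segment_vector(frame_features, segment_level_features):
    """frame_features: (n_frames, d) feature sequence for one homogeneous segment."""
    return np.concatenate([frame_features.mean(axis=0),
                           frame_features.std(axis=0),
                           segment_level_features])

def tag_probability(x, svm, ada):
    """Probability ensemble for one tag; svm is an SVC(probability=True)."""
    p_svm = svm.predict_proba(x.reshape(1, -1))[0, 1]
    p_ada = ada.predict_proba(x.reshape(1, -1))[0, 1]
    return 0.5 * (p_svm + p_ada)
```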


International Conference on Acoustics, Speech, and Signal Processing | 2007

Phonetic Boundary Refinement using Support Vector Machine

Hung-Yi Lo; Hsin-Min Wang

In this paper, we propose using a support vector machine (SVM) to refine the hypothesized phone transition boundaries given by HMM-based Viterbi forced alignment. We conducted experiments on the TIMIT speech corpus. The phone transitions were automatically partitioned into 46 clusters according to their acoustic characteristics and cross-validation on the training data; hence, 46 phone-transition-dependent SVM classifiers were used for phone boundary refinement. The proposed HMM-SVM approach performs as well as recent discriminative HMM-based segmentation. The best accuracies achieved are 81.23% within a tolerance of 10 ms and 92.47% within a tolerance of 20 ms. The mean boundary distance is 7.73 ms.
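
The refinement step might look like the sketch below (the window size and feature handling are assumptions, not taken from the paper): every candidate frame near an HMM boundary is scored by the cluster-specific SVM, and the highest-scoring frame becomes the refined boundary.

```python
# Hypothetical sketch of SVM-based boundary refinement around a forced-alignment
# boundary; `svm` is the classifier for the relevant phone-transition cluster.
import numpy as np

def refine_boundary(hmm_boundary, frame_features, svm, window=5):
    lo = max(0, hmm_boundary - window)
    hi = min(len(frame_features), hmm_boundary + window + 1)
    candidates = np.arange(lo, hi)
    # Higher decision values indicate frames that look more like a transition.
    scores = svm.decision_function(frame_features[candidates])
    return int(candidates[np.argmax(scores)])
```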


SIGKDD Explorations | 2008

Learning to improve area-under-FROC for imbalanced medical data classification using an ensemble method

Hung-Yi Lo; Chun-Min Chang; Tsung-Hsien Chiang; Cho-Yi Hsiao; Anta Huang; Tsung-Ting Kuo; Wei-Chi Lai; Ming-Han Yang; Jung-Jung Yeh; Chun-Chao Yen; Shou-De Lin

This paper presents our solution for the KDD Cup 2008 competition, which aims at optimizing the area under the ROC curve for breast cancer detection. We exploited a weight-based classification mechanism to improve the accuracy of patient classification (each patient is represented by a collection of data points). Final predictions for challenge 1 were generated by combining outputs from a weighted SVM and AdaBoost, whereas we integrated SVM, AdaBoost, and a genetic algorithm (GA) to produce the results for challenge 2. We also tried location-based classification and model adaptation, which adds the testing data into training. Our results outperformed those of the other participants given the same set of features, and our entry was selected as a joint winner of KDD Cup 2008.
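
A much-simplified sketch of the weighted ensemble idea follows; the classifier settings, the positive-class weight, and the per-patient aggregation rule are assumptions made for illustration, not the competition code.

```python
# Sketch: weighted SVM + AdaBoost ensemble on imbalanced candidate points,
# aggregated to patient level by taking the maximum candidate score.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.ensemble import AdaBoostClassifier

def patient_scores(X, y, X_test, patients_test, pos_weight=20.0):
    svm = LinearSVC(class_weight={0: 1.0, 1: pos_weight}).fit(X, y)
    ada = AdaBoostClassifier(n_estimators=200).fit(X, y)
    # Standardize the two decision scores before averaging, since they live
    # on different scales.
    z = lambda s: (s - s.mean()) / (s.std() + 1e-9)
    score = 0.5 * z(svm.decision_function(X_test)) + 0.5 * z(ada.decision_function(X_test))
    # A patient is flagged according to its highest-scoring candidate point.
    return {p: float(score[patients_test == p].max())
            for p in np.unique(patients_test)}
```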


Proceedings of the International Workshop on Computer Music and Audio Technology and New Media | 2010

Audio Classification Using Semantic Transformation and Classifier Ensemble

Ju-Chiang Wang; Hung-Yi Lo; Shyh-Kang Jeng; Hsin-Min Wang

This paper presents our winning audio classification system in MIREX 2010. Our system is implemented as follows. First, in the training phase, frame-based 70-dimensional feature vectors are extracted from each training audio clip by MIRToolbox. Next, the Posterior Weighted Bernoulli Mixture Model (PWBMM) is applied to transform the frame-decomposed feature vectors of the training song into a fixed-dimensional semantic vector representation based on pre-defined music tags; this procedure is called Semantic Transformation. Finally, for each class, the semantic vectors of the associated training clips are used to train an ensemble classifier consisting of SVM and AdaBoost classifiers. In the classification phase, a testing audio clip is first represented by a semantic vector, and then the class with the highest score is selected as the final output. Our system was ranked first out of 36 submissions in the MIREX 2010 audio mood classification task.
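
Schematically, the classification phase reduces to the pipeline below; the semantic transformation is shown only as an abstract callable standing in for the PWBMM, and score_clip is a hypothetical ensemble-scoring method, both assumptions for illustration.

```python
# Schematic of the classification phase: frame features -> semantic (tag)
# vector -> per-class SVM+AdaBoost ensemble scores -> argmax over classes.
def classify_clip(frame_features, semantic_transform, class_ensembles):
    """frame_features: (n_frames, 70) MIRToolbox features for one test clip."""
    semantic_vec = semantic_transform(frame_features)   # fixed-dimensional tag vector
    scores = {label: ens.score_clip(semantic_vec)       # ensemble score per class
              for label, ens in class_ensembles.items()}
    return max(scores, key=scores.get)
```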


International Conference on Acoustics, Speech, and Signal Processing | 2011

Cost-sensitive stacking for audio tag annotation and retrieval

Hung-Yi Lo; Ju-Chiang Wang; Hsin-Min Wang; Shou-De Lin

Audio tags correspond to keywords that people use to describe different aspects of a music clip, such as the genre, mood, and instrumentation. Since social tags are usually assigned by people with different levels of musical knowledge, they inevitably contain noisy information. By treating the tag counts as costs, we can model the audio tagging problem as a cost-sensitive classification problem. In addition, tag correlation is another source of useful information for automatic audio tagging since some tags often co-occur. By considering the co-occurrences of tags, we can model the audio tagging problem as a multi-label classification problem. To exploit the tag count and correlation information jointly, we formulate the audio tagging task as a novel cost-sensitive multi-label (CSML) learning problem. The results of audio tag annotation and retrieval experiments demonstrate that the new approach outperforms our MIREX 2009 winning method.


International Conference on Acoustics, Speech, and Signal Processing | 2012

Generalized k-labelset ensemble for multi-label classification

Hung-Yi Lo; Shou-De Lin; Hsin-Min Wang

The label powerset (LP) method is one category of multi-label learning algorithms. It reduces the multi-label classification problem to a multi-class classification problem by treating each distinct combination of labels in the training set as a different class. This paper proposes a basis expansion model for multi-label classification, where a basis function is an LP classifier trained on a random k-labelset. The expansion coefficients are learned to minimize the global error between the prediction and the multi-label ground truth. We derive an analytic solution to learn the coefficients efficiently. We have conducted experiments using several benchmark datasets and compared our method with other state-of-the-art multi-label learning methods. The results show that our method performs better than or comparably to the other methods.


Conference on Multimedia Modeling | 2011

Audio tag annotation and retrieval using tag count information

Hung-Yi Lo; Shou-De Lin; Hsin-Min Wang

Audio tags correspond to keywords that people use to describe different aspects of a music clip, such as the genre, mood, and instrumentation. With the explosive growth of digital music available on the Web, automatic audio tagging, which can be used to annotate unknown music or retrieve desirable music, is becoming increasingly important. This can be achieved by training a binary classifier for each tag based on the labeled music data. However, since social tags are usually assigned by people with different levels of musical knowledge, they inevitably contain noisy information. To address the noisy label problem, we propose a novel method that exploits the tag count information. By treating the tag counts as costs, we model the audio tagging problem as a cost-sensitive classification problem. The results of audio tag annotation and retrieval experiments show that the proposed approach outperforms our previous method, which won the MIREX 2009 audio tagging competition.


International Conference on Multimedia and Expo | 2006

Directional Weighting-Based Demosaicking Algorithm for Noisy CFA Environments

Hung-Yi Lo; Tsung-Nan Lin; Chih-Lung Hsu; Cheng-Hsien Lee

CFA data captured by image sensors such as CCD or CMOS are often corrupted by noise. To produce high-quality images acquired by CCD/CMOS digital cameras, the problem of noise needs to be addressed. In this paper, we propose a novel demosaicking algorithm with the ability to handle noisy CFA data directly. By utilizing the proposed spatial filter, which can accurately characterize the similarity likelihood in the local structure, each noisy pixel is filtered according to the degree of similarity between the current pixel and a weighted average of its neighboring pixels. Therefore, edge information can be preserved without blurring artifacts, while the degree of noise reduction can be maximized in smooth regions. Our algorithm is the first that can accomplish demosaicking and noise removal simultaneously, which contributes to reducing hardware cost since one module can achieve two functions efficiently at the same time.
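
The similarity-weighted filtering idea can be sketched as below; this is a toy range-weighted average over same-color Bayer neighbours with an assumed Gaussian similarity kernel, not the paper's exact filter.

```python
# Toy sketch: replace a noisy CFA pixel with a similarity-weighted average of
# its same-color neighbours (2 pixels apart on a Bayer pattern). Weights decay
# with intensity difference, so edges are preserved while smooth regions are
# denoised more strongly.
import numpy as np

def similarity_filtered_pixel(cfa, r, c, sigma=10.0):
    center = float(cfa[r, c])
    acc, wsum = 0.0, 0.0
    for dr in (-2, 0, 2):
        for dc in (-2, 0, 2):
            rr, cc = r + dr, c + dc
            if 0 <= rr < cfa.shape[0] and 0 <= cc < cfa.shape[1]:
                w = np.exp(-(float(cfa[rr, cc]) - center) ** 2 / (2.0 * sigma ** 2))
                acc += w * float(cfa[rr, cc])
                wsum += w
    return acc / wsum
```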

Collaboration


Dive into Hung-Yi Lo's collaborations.

Top Co-Authors

Shou-De Lin

National Taiwan University


Tsung-Ting Kuo

National Taiwan University


Chih-Jen Lin

National Taiwan University


Cho-Yi Hsiao

National Taiwan University


Hsuan-Tien Lin

National Taiwan University


Shyh-Kang Jeng

National Taiwan University


Hsiang-Fu Yu

University of Texas at Austin
