Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Sheng-Jun Huang is active.

Publications


Featured research published by Sheng-Jun Huang.


Artificial Intelligence | 2012

Multi-instance multi-label learning

Zhi-Hua Zhou; Min-Ling Zhang; Sheng-Jun Huang; Yu-Feng Li

In this paper, we propose the MIML (Multi-Instance Multi-Label learning) framework, where an example is described by multiple instances and associated with multiple class labels. Compared to traditional learning frameworks, the MIML framework is more convenient and natural for representing complicated objects that have multiple semantic meanings. To learn from MIML examples, we propose the MimlBoost and MimlSvm algorithms based on a simple degeneration strategy, and experiments show that solving problems involving complicated objects with multiple semantic meanings in the MIML framework can lead to good performance. Considering that the degeneration process may lose information, we propose the D-MimlSvm algorithm, which tackles MIML problems directly in a regularization framework. Moreover, we show that even when we do not have access to the real objects and thus cannot capture more information from them by using the MIML representation, MIML is still useful. We propose the InsDif and SubCod algorithms: InsDif works by transforming single-instance examples into the MIML representation for learning, while SubCod works by transforming single-label examples into the MIML representation for learning. Experiments show that in some tasks they are able to achieve better performance than learning from the single-instance or single-label examples directly.
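A minimal sketch of the MIML representation described above: each example is a bag of instances (feature vectors) paired with a set of labels. The degeneration shown here (averaging a bag into one vector) is an illustrative simplification, not the clustering-based step the MimlSvm algorithm actually uses.

```python
# Toy MIML example: a bag of instances plus a set of labels.
def degenerate_to_multilabel(bag):
    """Collapse a bag of instances into a single averaged instance
    (an illustrative degeneration, losing within-bag structure)."""
    dim = len(bag[0])
    return [sum(inst[d] for inst in bag) / len(bag) for d in range(dim)]

# One MIML example: two instances, two labels.
example = {"instances": [[1.0, 2.0], [3.0, 4.0]], "labels": {"cat", "outdoor"}}
flat = degenerate_to_multilabel(example["instances"])
# flat is now a single-instance multi-label example paired with example["labels"].
```

The information loss of such a degeneration is exactly what motivates the paper's D-MimlSvm, which avoids it by working on bags directly.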


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2014

Active Learning by Querying Informative and Representative Examples

Sheng-Jun Huang; Rong Jin; Zhi-Hua Zhou

Active learning reduces the labeling cost by iteratively selecting the most valuable data to query their labels. It has attracted a lot of interest, given the abundance of unlabeled data and the high cost of labeling. Most active learning approaches select either informative or representative unlabeled instances to query their labels, which could significantly limit their performance. Although several active learning algorithms have been proposed to combine the two query selection criteria, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this limitation by developing a principled approach, termed QUIRE, based on the min-max view of active learning. The proposed approach provides a systematic way of measuring and combining the informativeness and representativeness of an unlabeled instance. Further, by incorporating the correlation among labels, we extend the QUIRE approach to multi-label learning by actively querying instance-label pairs. Extensive experimental results show that the proposed QUIRE approach outperforms several state-of-the-art active learning approaches in both single-label and multi-label learning.
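A toy scorer in the spirit of combining the two criteria: an informativeness term (prediction uncertainty) multiplied by a representativeness term (similarity to the unlabeled pool). The real QUIRE criterion comes from a min-max kernel formulation; this product score and the RBF similarity are illustrative assumptions only.

```python
import math

def uncertainty(p):
    """Binary entropy of a predicted positive probability (informativeness)."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def representativeness(x, pool):
    """Mean RBF similarity between x and the unlabeled pool."""
    return sum(
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, u))) for u in pool
    ) / len(pool)

def query_index(xs, probs):
    """Pick the instance that is both uncertain and representative."""
    scores = [uncertainty(p) * representativeness(x, xs) for x, p in zip(xs, probs)]
    return scores.index(max(scores))
```

An instance near the decision boundary but also near an outlier scores lower than an equally uncertain instance in a dense region, which is the behavior the combined criterion is after.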


Knowledge Discovery and Data Mining | 2012

Multi-label hypothesis reuse

Sheng-Jun Huang; Yang Yu; Zhi-Hua Zhou

Multi-label learning arises in many real-world tasks where an object is naturally associated with multiple concepts. It is well accepted that, in order to achieve good performance, the relationship among labels should be exploited. Most existing approaches require the label relationship as prior knowledge, or exploit it by counting label co-occurrences. In this paper, we propose the MAHR approach, which is able to automatically discover and exploit the label relationship. Our basic idea is that, if two labels are related, the hypothesis generated for one label can be helpful for the other. MAHR implements this idea as a boosting approach with a hypothesis reuse mechanism. In each boosting round, the base learner for a label is generated not only by learning on its own task but also by reusing the hypotheses from other labels, and the amount of reuse across labels provides an estimate of the label relationship. Extensive experimental results validate that MAHR is able to achieve superior performance and discover reasonable label relationships. Moreover, we disclose that the label relationship is usually asymmetric.
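A toy illustration of hypothesis reuse: the predictor for one label may reuse hypotheses trained for other labels, and the reuse weights give an (asymmetric) estimate of the label relationship. MAHR learns these weights inside a boosting loop; setting them from each hypothesis's accuracy on the target label, as below, is purely for illustration.

```python
def accuracy(h, xs, ys):
    """Fraction of examples on which hypothesis h matches the target label."""
    return sum(1 for x, y in zip(xs, ys) if h(x) == y) / len(ys)

def reuse_weights(hypotheses, xs, ys_target):
    """Weight each label's hypothesis by how well it predicts the target label.
    Larger weight on another label's hypothesis = stronger estimated relation."""
    accs = {name: accuracy(h, xs, ys_target) for name, h in hypotheses.items()}
    total = sum(accs.values()) or 1.0
    return {name: a / total for name, a in accs.items()}
```

Note that the weights are computed per target label, so the relationship estimate from label A to label B need not equal the one from B to A, matching the asymmetry the paper reports.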


IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2014

Genome-wide protein function prediction through multi-instance multi-label learning

Jian-Sheng Wu; Sheng-Jun Huang; Zhi-Hua Zhou

Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the vast majority of proteins can only be annotated computationally. Nature often brings several domains together to form multi-domain and multi-functional proteins with a vast number of possibilities, and each domain may fulfill its own function independently or in concert with its neighbors. Thus, it is evident that the protein function prediction problem is naturally and inherently a Multi-Instance Multi-Label (MIML) learning task. Based on the state-of-the-art MIML algorithm MIMLNN, we propose a novel ensemble MIML learning framework, EnMIMLNN, and design three algorithms for this task by combining the advantages of three kinds of Hausdorff distance metrics. Experiments on seven real-world organisms covering the biological three-domain system, i.e., archaea, bacteria, and eukaryotes, show that the EnMIMLNN algorithms are superior to most state-of-the-art MIML and Multi-Label learning algorithms.
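A sketch of three Hausdorff-style distances between bags of instances, of the kind EnMIMLNN combines (maximal, minimal, and average variants). The exact definitions follow the usual multi-instance learning conventions and are assumed here for illustration.

```python
def _d(a, b):
    """Euclidean distance between two instances."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def max_hausdorff(A, B):
    """Classical (maximal) Hausdorff distance between bags A and B."""
    return max(max(min(_d(a, b) for b in B) for a in A),
               max(min(_d(a, b) for a in A) for b in B))

def min_hausdorff(A, B):
    """Minimal variant: distance between the closest pair of instances."""
    return min(_d(a, b) for a in A for b in B)

def avg_hausdorff(A, B):
    """Average variant: mean nearest-neighbor distance in both directions."""
    dists = [min(_d(a, b) for b in B) for a in A] + \
            [min(_d(a, b) for a in A) for b in B]
    return sum(dists) / len(dists)
```

Each variant captures a different notion of bag similarity (worst case, best case, and typical case), which is why an ensemble over them can be more robust than any single one.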


International Conference on Data Mining | 2013

Active Query Driven by Uncertainty and Diversity for Incremental Multi-label Learning

Sheng-Jun Huang; Zhi-Hua Zhou

In multi-label learning, it is rather expensive to label instances since they are simultaneously associated with multiple labels. Therefore, active learning, which reduces the labeling cost by actively querying the labels of the most valuable data, becomes particularly important for multi-label learning. A strong multi-label active learning algorithm usually consists of two crucial elements: a reasonable criterion to evaluate the gain of a queried label, and an effective classification model, based on whose predictions the criterion can be accurately computed. In this paper, we first introduce an effective multi-label classification model by combining label ranking with threshold learning, which is incrementally trained to avoid retraining from scratch after every query. Based on this model, we then propose to exploit both uncertainty and diversity in the instance space as well as the label space, and to actively query the instance-label pairs which can improve the classification model most. Experimental results demonstrate the superiority of the proposed approach over state-of-the-art methods.
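A toy instance-label pair scorer: uncertainty is how close a predicted label score sits to the decision threshold, and diversity penalizes pairs whose instance is similar to ones already queried. The paper's criterion is richer (label ranking plus learned thresholds); this is only a sketch under those simplified assumptions.

```python
def pair_score(score, threshold, x, queried, sim):
    """Score an instance-label pair for querying.
    score: model output for this label; threshold: decision threshold;
    queried: instances already queried; sim: a similarity function in [0, 1]."""
    unc = 1.0 - abs(score - threshold)          # high when near the threshold
    div = 1.0 - max((sim(x, q) for q in queried), default=0.0)
    return unc * div
```

Multiplying the two terms means a pair must be both uncertain and different from previous queries to be selected, which avoids spending the budget on near-duplicate queries.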


Frontiers of Computer Science in China | 2016

Multi-label active learning by model guided distribution matching

Nengneng Gao; Sheng-Jun Huang; Songcan Chen

Multi-label learning is an effective framework for learning with objects that have multiple semantic labels, and it has been successfully applied to many real-world tasks. In contrast with traditional single-label learning, the cost of labeling a multi-label example is rather high, so it becomes an important task to train an effective multi-label learning model with as few labeled examples as possible. Active learning, which actively selects the most valuable data to query their labels, is the most important approach to reducing labeling cost. In this paper, we propose a novel approach, MADM, for batch-mode multi-label active learning. On one hand, MADM exploits representativeness and diversity in both the feature and label space by matching the distribution between labeled and unlabeled data. On the other hand, it tends to query predicted positive instances, which are expected to be more informative than negative ones. Experiments on benchmark datasets demonstrate that the proposed approach can reduce the labeling cost significantly.
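A minimal maximum-mean-discrepancy (MMD) sketch of the distribution-matching idea: a batch is good if adding it to the labeled set makes the labeled and unlabeled distributions more alike. MADM's actual objective also weights predicted-positive instances; the RBF kernel and the biased MMD estimator here are illustrative assumptions.

```python
import math

def rbf(a, b, gamma=1.0):
    """RBF kernel between two feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def mmd2(X, Y):
    """Biased squared MMD between samples X and Y: zero when the
    empirical distributions coincide, larger the more they differ."""
    kxx = sum(rbf(a, b) for a in X for b in X) / (len(X) ** 2)
    kyy = sum(rbf(a, b) for a in Y for b in Y) / (len(Y) ** 2)
    kxy = sum(rbf(a, b) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2 * kxy
```

A batch-mode selector would then choose, among candidate batches, the one minimizing `mmd2(labeled + batch, unlabeled)`.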


IEEE Transactions on Systems, Man, and Cybernetics | 2018

WoCE: A framework for Clustering Ensemble by Exploiting the Wisdom of Crowds Theory

Muhammad Yousefnezhad; Sheng-Jun Huang; Daoqiang Zhang

The wisdom of crowds (WOC), a theory from the social sciences, has found a new paradigm in computer science. The WOC theory explains that the aggregate decision made by a group is often better than the decisions of its individual members if specific conditions are satisfied. This paper presents a novel framework for unsupervised and semi-supervised cluster ensembles by exploiting the WOC theory. We employ four conditions of the WOC theory, i.e., diversity, independency, decentralization, and aggregation, to guide both the construction of individual clustering results and their final combination in the clustering ensemble. First, the independency criterion, implemented as a novel mapping of the raw data set, removes the correlation between features in our proposed method. Then, decentralization, as a novel mechanism, generates high-quality individual clustering results. Next, uniformity, a new diversity metric, evaluates the generated clustering results. Further, a weighted evidence accumulation clustering method is proposed for the final aggregation without a thresholding procedure. An experimental study on varied data sets demonstrates that the proposed approach achieves performance superior to state-of-the-art methods.
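A sketch of (weighted) evidence accumulation, the aggregation step the framework builds on: each base clustering votes on whether two points belong together, producing a co-association matrix that a final clustering step can consume. Uniform weights are used below; WoCE derives them from its quality and diversity criteria.

```python
def co_association(labelings, weights=None):
    """Build a weighted co-association matrix from base clusterings.
    labelings: list of cluster-label lists, one per base clustering.
    M[i][j] = weighted fraction of clusterings that put i and j together."""
    n = len(labelings[0])
    weights = weights or [1.0] * len(labelings)
    total = sum(weights)
    M = [[0.0] * n for _ in range(n)]
    for w, lab in zip(weights, labelings):
        for i in range(n):
            for j in range(n):
                if lab[i] == lab[j]:
                    M[i][j] += w / total
    return M
```

A hierarchical clustering run on `1 - M[i][j]` as a distance then yields the consensus partition, which is the standard way evidence accumulation is finished off.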


International Joint Conference on Artificial Intelligence | 2017

Multi-instance multi-label active learning

Sheng-Jun Huang; Nengneng Gao; Songcan Chen

Multi-instance multi-label learning (MIML) has achieved success in various applications, especially those involving complicated learning objects. Along with this enhanced expressive power, the cost of annotating an MIML example also increases significantly. In this paper, we propose a novel active learning approach to reduce the labeling cost of MIML. The approach actively queries the most valuable information by exploiting diversity and uncertainty in both the input and output spaces. It designs a query strategy specifically for MIML objects and acquires more precise information from the oracle without additional cost. Based on the queried information, the MIML model is then effectively trained by simultaneously optimizing the relevance ranks among instances and labels. Experiments on benchmark datasets demonstrate that the proposed approach achieves superior performance on various criteria.
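A toy (bag, label) pair selector in this spirit: output-space uncertainty (model score near zero) combined with input-space diversity (distance to already-queried bags). The bag distance and the scoring model are illustrative assumptions, not the paper's actual query strategy.

```python
def bag_dist(A, B):
    """Closest-pair distance between two bags of instances."""
    return min(sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5
               for x in A for y in B)

def select_pair(scores, bags, queried):
    """Pick the (bag_index, label) pair with the best uncertainty-diversity
    tradeoff.  scores: {(bag_index, label): model score, zero = most uncertain};
    queried: indices of bags already queried."""
    best, best_val = None, -1.0
    for (i, lab), s in scores.items():
        unc = 1.0 / (1.0 + abs(s))
        div = min((bag_dist(bags[i], bags[q]) for q in queried), default=1.0)
        val = unc * min(div, 1.0)
        if val > best_val:
            best, best_val = (i, lab), val
    return best
```

A maximally uncertain pair on an already-queried bag gets diversity zero and is skipped, steering the budget toward unexplored bags.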


International Joint Conference on Artificial Intelligence | 2017

Cost-Effective Active Learning from Diverse Labelers

Sheng-Jun Huang; Jia-Lve Chen; Xin Mu; Zhi-Hua Zhou

In traditional active learning, there is only one labeler, which always returns the ground truth of queried labels. However, in many applications, multiple labelers are available, offering diverse qualities of labeling at different costs. In this paper, we perform active selection on both instances and labelers, aiming to improve the classification model the most at the lowest cost. While the cost of a labeler is proportional to its overall labeling quality, we also observe that different labelers usually have diverse expertise, and thus labelers with a low overall quality may still provide accurate labels on some specific instances. Based on this fact, we propose a novel active selection criterion to evaluate the cost-effectiveness of instance-labeler pairs, which ensures that the selected instance is helpful for improving the classification model, and meanwhile the selected labeler can provide an accurate label for the instance at a relatively low cost. Experiments on both UCI and real crowdsourcing data sets demonstrate the superiority of our proposed approach in selecting cost-effective queries.
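A sketch of a cost-effectiveness criterion over instance-labeler pairs: prefer pairs where the labeler is likely accurate on that specific instance relative to its cost. Estimating per-instance labeler accuracy is the crux of the paper; taking it as a given function, as below, is an illustrative simplification.

```python
def best_pair(instances, labelers):
    """Pick the most cost-effective (instance_index, labeler_index) pair.
    labelers: list of (cost, acc_fn), where acc_fn(x) estimates that
    labeler's accuracy on instance x (assumed given here)."""
    best, best_val = None, -1.0
    for i, x in enumerate(instances):
        for j, (cost, acc_fn) in enumerate(labelers):
            val = acc_fn(x) / cost          # estimated accuracy per unit cost
            if val > best_val:
                best, best_val = (i, j), val
    return best
```

With this criterion a cheap labeler that happens to be expert on the queried instance can beat an expensive high-overall-quality one, which is exactly the effect the paper exploits.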


Knowledge Discovery and Data Mining | 2018

Active Feature Acquisition with Supervised Matrix Completion

Sheng-Jun Huang; Miao Xu; Ming-Kun Xie; Masashi Sugiyama; Gang Niu; Songcan Chen

Missing features are a serious problem in many applications, which may lead to low-quality training data and further significantly degrade learning performance. Since feature acquisition usually involves special devices or complex processes, it is expensive to acquire all feature values for the whole dataset. On the other hand, features may be correlated with each other, and some values may be recovered from the others. It is thus important to decide which features are most informative for recovering the other features as well as improving the learning performance. In this paper, we aim to train an effective classification model with the least acquisition cost by jointly performing active feature querying and supervised matrix completion. When completing the feature matrix, a novel objective function is proposed to simultaneously minimize the reconstruction error on observed entries and the supervised loss on training data. When querying a feature value, the most uncertain entry is actively selected based on the variance over previous iterations. In addition, a bi-objective optimization method is presented for cost-aware active selection when features bear different acquisition costs. The effectiveness of the proposed approach is well validated by both theoretical analysis and an experimental study.
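A sketch of the variance-based query rule: entries whose imputed values fluctuate most across previous completion iterations are treated as the most uncertain and acquired first. The imputation histories would come from the supervised matrix-completion step; here they are simply given as input.

```python
def most_uncertain_entry(histories):
    """Select the matrix entry to acquire next.
    histories: {(row, col): [imputed value at each previous iteration]}.
    Returns the entry whose imputed values have the largest variance."""
    def var(vs):
        m = sum(vs) / len(vs)
        return sum((v - m) ** 2 for v in vs) / len(vs)
    return max(histories, key=lambda e: var(histories[e]))
```

An entry the completion keeps imputing to the same value carries little information and is left unacquired; an entry whose imputation keeps changing is where a real measurement helps most.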

Collaboration


Dive into Sheng-Jun Huang's collaborations.

Top Co-Authors

Songcan Chen

Nanjing University of Aeronautics and Astronautics


Nengneng Gao

Nanjing University of Aeronautics and Astronautics


Daoqiang Zhang

Nanjing University of Aeronautics and Astronautics


Ming-Kun Xie

Nanjing University of Aeronautics and Astronautics


Yifan Yan

Nanjing University of Aeronautics and Astronautics


Feihu Huang

Nanjing University of Aeronautics and Astronautics
