Publication


Featured research published by Bin Tong.


Knowledge and Information Systems | 2013

A feature-free and parameter-light multi-task clustering framework

Thach Nguyen Huy; Hao Shao; Bin Tong; Einoshin Suzuki

The last two decades have witnessed extensive research on multi-task learning algorithms in diverse domains such as bioinformatics, text mining, natural language processing, and image and video content analysis. However, all existing multi-task learning methods require either domain-specific knowledge to extract features or a careful setting of many input parameters. Both requirements carry serious disadvantages: one of the most obvious is that we may find a wrong or non-existent pattern because of poorly extracted features or incorrectly set parameters. In this work, we propose a feature-free and parameter-light multi-task clustering framework that overcomes these disadvantages. Our proposal is motivated by the recent successes of Kolmogorov complexity-based methods in various applications. However, such methods are only defined for single-task problems because they lack a mechanism for sharing knowledge between different tasks. To address this problem, we create a novel dictionary-based compression dissimilarity measure that allows us to share knowledge across different tasks effectively. Experimental results with extensive comparisons demonstrate the generality and effectiveness of our proposal.
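To make the compression-based idea concrete: in the Normalized Compression Distance family, the dissimilarity between two objects is estimated from how much one helps to compress the other. Below is a minimal sketch using zlib's preset-dictionary support to stand in for the paper's dictionary-based measure; the function names and the exact ratio are illustrative assumptions, not the authors' formulation.

    import zlib

    def deflate_len(data: bytes, zdict: bytes = b"") -> int:
        """Length of `data` after DEFLATE compression, optionally primed
        with a preset dictionary (the knowledge-sharing hook)."""
        if zdict:
            c = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS,
                                 zlib.DEF_MEM_LEVEL, zlib.Z_DEFAULT_STRATEGY,
                                 zdict=zdict)
        else:
            c = zlib.compressobj(9)
        return len(c.compress(data) + c.flush())

    def dictionary_dissimilarity(x: bytes, y: bytes, shared: bytes = b"") -> float:
        """NCD-style score: compressing x with y (plus a cross-task
        dictionary) as context should be cheaper when x and y are similar."""
        cx = deflate_len(x, zdict=shared)
        cx_given_y = deflate_len(x, zdict=y + shared)
        return cx_given_y / cx  # near 1.0 => y tells us little about x

A pairwise matrix of such scores can then be fed to any off-the-shelf clustering algorithm, which is what makes the approach feature-free.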


Knowledge and Information Systems | 2013

Extended MDL principle for feature-based inductive transfer learning

Hao Shao; Bin Tong; Einoshin Suzuki

Transfer learning addresses the problem of how to learn a target task when a large amount of auxiliary data from source domains is available. Despite numerous studies on this topic, few have a solid theoretical framework, and few are parameter-free. In this paper, we propose an Extended Minimum Description Length Principle (EMDLP) for feature-based inductive transfer learning, in which both the source and the target data sets contain class labels and relevant features are transferred from the source domain to the target one. Unlike conventional methods, our encoding measure rests on a theoretical foundation and requires no parameters. To obtain features useful for the target task, we design an enhanced encoding length by adopting a code book that stores useful information obtained from the source task. With the code book building connections between the source and the target tasks, our EMDLP evaluates the results of transfer learning with the sum of the code lengths of five components: those of the two hypotheses, the two data sets encoded with the help of the hypotheses, and the set of the transferred features. The proposed method inherits the nice property of the MDLP that it carefully evaluates hypotheses, balancing their simplicity against their goodness of fit to the data. Extensive experiments using both synthetic and real data sets show that the proposed method provides better classification accuracy and is robust against noise.
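Schematically, the five-component code length described above can be written as follows, with notation assumed for illustration (h_S and h_T the source and target hypotheses, D_S and D_T the data sets, and F the transferred feature set):

    L_EMDLP = L(h_S) + L(h_T) + L(D_S | h_S) + L(D_T | h_T) + L(F)

A smaller total indicates a better trade-off between hypothesis simplicity and fit, in the usual MDL sense.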


Advanced Data Mining and Applications | 2012

Query by Committee in a Heterogeneous Environment

Hao Shao; Bin Tong; Einoshin Suzuki

In real applications of inductive learning, labeled instances are often scarce. The countermeasure is either to ask experts to label informative instances, as in active learning, or to borrow useful information from abundant labeled instances in a source domain, as in transfer learning. Given the high cost of querying experts, it is promising to integrate the two methodologies into a more robust and reliable classification framework that compensates for the disadvantages of each. Recently, a few studies have investigated integrating the two methods, an approach called transfer active learning. However, when there are unrelated domains with different distributions or label assignments, a problem known as negative transfer arises and degrades performance. How to avoid querying irrelevant samples also remains an open question. To tackle these issues, we propose a hybrid algorithm for active learning aided by transfer learning, adopting a divergence measure to quantify the similarities between domains so that negative effects can be alleviated. To avoid querying irrelevant instances, we also present an adaptive strategy that eliminates unnecessary instances in the input space and unnecessary models in the model space. Extensive experiments on both synthetic and real data sets show that our algorithm queries fewer instances and converges faster than state-of-the-art methods.
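A standard way to realize the committee's disagreement is vote entropy; the sketch below pairs it with a toy divergence used to down-weight dissimilar source domains. Both source_weight and the exponential weighting are illustrative assumptions, not the paper's divergence measure.

    import numpy as np

    def vote_entropy(votes: np.ndarray, n_classes: int) -> np.ndarray:
        """Committee disagreement per sample; votes has shape
        (n_members, n_samples) with integer class labels."""
        ent = np.zeros(votes.shape[1])
        for c in range(n_classes):
            p = (votes == c).mean(axis=0)
            p_safe = np.where(p > 0, p, 1.0)  # log(1) = 0 for empty classes
            ent -= p * np.log(p_safe)
        return ent

    def source_weight(X_source: np.ndarray, X_target: np.ndarray) -> float:
        """Toy domain divergence: distance between domain means, mapped to
        (0, 1]; unrelated domains get weight near 0 (negative-transfer guard)."""
        return float(np.exp(-np.linalg.norm(X_source.mean(0) - X_target.mean(0))))

The unlabeled target instance whose committee votes have maximum entropy is queried next, while each source domain enters the committee's training sets with its divergence-based weight.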


International Symposium on Methodologies for Intelligent Systems | 2011

A compression-based dissimilarity measure for multi-task clustering

Nguyen Huy Thach; Hao Shao; Bin Tong; Einoshin Suzuki

Virtually all existing multi-task learning methods for string data require either domain-specific knowledge to extract feature representations or a careful setting of many input parameters. In this work, we propose a feature-free and parameter-light multi-task clustering algorithm for string data. To transfer knowledge between different domains, we propose a novel dictionary-based compression dissimilarity measure. Experimental results with extensive comparisons demonstrate the generality and effectiveness of our proposal.
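The transfer mechanism can be pictured with a toy experiment: a dictionary built from one task's strings is reused when compressing another task's strings, so regularities learned in the first task reduce code lengths in the second. The data and the deflate_len helper here are illustrative only.

    import zlib

    def deflate_len(data: bytes, zdict: bytes = b"") -> int:
        if zdict:
            c = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS,
                                 zlib.DEF_MEM_LEVEL, zlib.Z_DEFAULT_STRATEGY,
                                 zdict=zdict)
        else:
            c = zlib.compressobj(9)
        return len(c.compress(data) + c.flush())

    task_a = [b"ACGTACGTACGTACGTACGT", b"ACGTTTACGTTTACGTTTAC"]  # source task
    task_b = [b"ACGTACGTACGAACGTACGT", b"TTTTGGGGCCCCTTTTGGGG"]  # target task
    shared = b"".join(task_a)  # cross-task dictionary
    for s in task_b:
        saved = deflate_len(s) - deflate_len(s, zdict=shared)
        print(s, "bytes saved by the cross-task dictionary:", saved)

Strings resembling the source task compress noticeably better with the shared dictionary, which is exactly the signal the dissimilarity measure exploits.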


European Conference on Machine Learning | 2011

Compact coding for hyperplane classifiers in heterogeneous environment

Hao Shao; Bin Tong; Einoshin Suzuki

Transfer learning techniques have seen significant development in real applications where knowledge from previous tasks is required to reduce the high cost of acquiring labeled information for the target task. However, how to avoid negative transfer, which arises from differing task distributions in a heterogeneous environment, is still an open problem. To handle this issue, we propose a Compact Coding method for Hyperplane Classifiers (CCHC) under a two-level framework in the inductive transfer learning setting. Unlike traditional methods, we measure the similarities among tasks at the macro level through minimum encoding. Specifically, the degree of similarity is represented by the code length of the class boundary of each source task with respect to the target task. In addition, informative parts of the source tasks are adaptively selected at the micro level to make the choice of the specific source task more accurate. Extensive experiments show the effectiveness of our algorithm in terms of classification accuracy on both UCI and text data sets.
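One way to picture the macro-level code length is as the number of bits needed to correct a source hyperplane's predictions on the target data: a closely related boundary misclassifies little and is therefore cheap to encode. The sketch below uses a simple binomial two-part code; it illustrates the minimum-encoding idea under that assumption and is not the authors' exact encoding.

    import numpy as np

    def boundary_code_length(w_source: np.ndarray,
                             X_target: np.ndarray,
                             y_target: np.ndarray) -> float:
        """Bits to correct the source hyperplane's labels on target data,
        approximated by n * H(error rate); labels are in {-1, +1}."""
        pred = np.sign(X_target @ w_source)
        k = int(np.sum(pred != y_target))
        n = len(y_target)
        p = min(max(k, 1), n - 1) / n           # clamp error rate to (0, 1)
        H = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
        return n * H                            # small => similar task

Ranking the source tasks by this code length and transferring from the cheapest ones mirrors the macro-level selection described above.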


Journal of Intelligent Information Systems | 2012

Linear semi-supervised projection clustering by transferred centroid regularization

Bin Tong; Hao Shao; Bin-Hui Chou; Einoshin Suzuki

We propose a novel method, called Semi-supervised Projection Clustering in Transfer Learning (SPCTL), which assumes multiple source domains and one target domain. Traditional semi-supervised projection clustering methods assume that the data and the pairwise constraints are all drawn from the same domain. However, many related data sets with different distributions are available in real applications, so the traditional methods cannot be directly extended to such a scenario. One major challenge is how to exploit constraint knowledge from multiple source domains and transfer it to the target domain, where all the data are unlabeled. To handle this difficulty, we construct a common subspace in which the difference in distributions among domains is reduced. We also introduce a transferred centroid regularization, which acts as a bridge that transfers the constraint knowledge to the target domain by modeling the geometric structure formed by the centroids from different domains. Extensive experiments on both synthetic and benchmark data sets show the effectiveness of our method.
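The regularization term can be sketched as follows: after projecting into the common subspace with a matrix W, each target centroid is pulled toward its matched source centroid. The matching step and the full alternating optimization are omitted; the names and the quadratic form are illustrative assumptions, not the paper's exact objective.

    import numpy as np

    def transferred_centroid_penalty(W: np.ndarray,
                                     target_centroids: np.ndarray,
                                     source_centroids: np.ndarray,
                                     lam: float = 1.0) -> float:
        """Penalize distance between matched centroids in the subspace
        defined by W (shape d x k); rows of the two centroid arrays are
        already paired up across domains."""
        diff = (target_centroids - source_centroids) @ W
        return lam * float(np.sum(diff ** 2))

Adding this penalty to the projection-clustering objective is what lets constraint knowledge, carried by the source centroids, shape the otherwise unlabeled target clustering.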


European Conference on Machine Learning | 2010

Semi-supervised projection clustering with transferred centroid regularization

Bin Tong; Hao Shao; Bin-Hui Chou; Einoshin Suzuki

We propose a novel method, called Semi-supervised Projection Clustering in Transfer Learning (SPCTL), which assumes multiple source domains and one target domain. Traditional semi-supervised projection clustering methods assume that the data and the pairwise constraints are all drawn from the same domain. However, many related data sets with different distributions are available in real applications, so the traditional methods cannot be directly extended to such a scenario. One major challenge is how to exploit constraint knowledge from multiple source domains and transfer it to the target domain, where all the data are unlabeled. To handle this difficulty, we construct a common subspace in which the difference in distributions among domains is reduced. We also introduce a transferred centroid regularization, which acts as a bridge that transfers the constraint knowledge to the target domain by modeling the geometric structure formed by the centroids from different domains. Extensive experiments on both synthetic and practical data sets show the effectiveness of our method.


Journal of Intelligent Information Systems | 2013

Transfer learning by centroid pivoted mapping in noisy environment

Thach Nguyen Huy; Bin Tong; Hao Shao; Einoshin Suzuki

Transfer learning is a widely investigated learning paradigm, initially proposed to reuse informative knowledge from related domains when supervised information in the target domain is scarce but sufficiently available in multiple source domains. One challenging issue in transfer learning is how to handle the distribution differences between the source domains and the target domain. Most studies in the field implicitly assume that the data distributions of the source domains and the target domain are similar in a well-designed feature space. However, it is often the case that the label assignments in the source domains and the target domain differ significantly. In reality, then, even if the distribution difference between a source domain and the target domain is reduced, knowledge from multiple source domains is not transferred well unless the label information is carefully considered. In addition, noisy data often emerge in real-world applications, and handling them in the transfer learning setting is a challenging problem, as noisy data inevitably cause side effects during knowledge transfer. For these reasons, we propose a framework that is robust against noise in the transfer learning setting and that explicitly considers the differences in data distributions and label assignments among the multiple source domains and the target domain. Experimental results on one synthetic data set, three UCI data sets, and one real-world text data set at different noise levels demonstrate the effectiveness of our method.


Knowledge Discovery and Data Mining | 2010

Subclass-oriented dimension reduction with constraint transformation and manifold regularization

Bin Tong; Einoshin Suzuki

We propose a new method, called Subclass-oriented Dimension Reduction with Pairwise Constraints (SODRPaC), for dimension reduction on high-dimensional data. Current linear semi-supervised dimension reduction methods using pairwise constraints, e.g., must-link constraints and cannot-link constraints, cannot appropriately handle data with multiple subclasses, where the points of a class are distributed separately in different groups. To illustrate this problem, we classify the must-link constraint into two categories: the inter-subclass must-link constraint and the intra-subclass must-link constraint. We argue that handling the inter-subclass must-link constraint is challenging for current discriminant criteria. Inspired by this observation and by the cluster assumption that nearby points are likely to belong to the same class, we carefully transform must-link constraints into cannot-link constraints and then propose a new discriminant criterion that employs the cannot-link constraints and the compactness of shared nearest neighbors. Because the local data structure is one of the most significant features of data with multiple subclasses, manifold regularization is also incorporated into our dimension reduction framework. Extensive experiments on both synthetic and practical data sets illustrate the effectiveness of our method.
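The inter- vs. intra-subclass diagnosis can be illustrated with shared nearest neighbors: a must-link pair whose points share almost no neighbors likely spans two subclasses. The thresholds k and tau below are hypothetical, and the actual SODRPaC transformation into cannot-link constraints is more involved than this sketch.

    import numpy as np

    def shared_nn_count(X: np.ndarray, i: int, j: int, k: int = 10) -> int:
        """Number of k-nearest neighbors that points i and j have in common."""
        ni = set(np.argsort(np.linalg.norm(X - X[i], axis=1))[1:k + 1])
        nj = set(np.argsort(np.linalg.norm(X - X[j], axis=1))[1:k + 1])
        return len(ni & nj)

    def split_must_links(X, must_links, k=10, tau=2):
        """Heuristically separate intra-subclass from inter-subclass
        must-link constraints by shared-neighbor compactness."""
        intra, inter = [], []
        for i, j in must_links:
            (intra if shared_nn_count(X, i, j, k) >= tau else inter).append((i, j))
        return intra, inter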


European Conference on Artificial Intelligence | 2014

Probabilistic two-level anomaly detection for correlated systems

Bin Tong; Tetsuro Morimura; Einoshin Suzuki; Tsuyoshi Idé

We propose a novel probabilistic semi-supervised anomaly detection framework for multi-dimensional systems with high correlation among variables. Our method is able to identify both abnormal instances and abnormal variables of an instance.
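The two-level idea can be sketched with a plain Gaussian model: the instance-level score is the Mahalanobis distance, and the variable-level score measures each variable's surprise conditioned on all the others, which is what exposes a broken correlation. The paper's semi-supervised probabilistic model is richer; this is a minimal stand-in under that assumption.

    import numpy as np

    def fit_gaussian(X: np.ndarray):
        """Fit mean and (regularized) precision matrix on normal data."""
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        return mu, np.linalg.inv(cov)

    def two_level_scores(x: np.ndarray, mu: np.ndarray, prec: np.ndarray):
        """Instance score plus one score per variable."""
        d = x - mu
        instance_score = float(d @ prec @ d)        # Mahalanobis distance
        var_scores = np.zeros(len(x))
        for i in range(len(x)):
            cond_var = 1.0 / prec[i, i]             # Var(x_i | rest)
            cond_mu = mu[i] - cond_var * (prec[i] @ d - prec[i, i] * d[i])
            var_scores[i] = (x[i] - cond_mu) ** 2 / cond_var
        return instance_score, var_scores

A high instance score flags the abnormal record; the largest entries of var_scores point to the variables that deviate from what their correlated peers predict.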
