Publication


Featured research published by Jiangtao Ren.


European Conference on Machine Learning | 2008

Actively transfer domain knowledge

Xiaoxiao Shi; Wei Fan; Jiangtao Ren

When labeled examples are not readily available, active learning and transfer learning are separate efforts to obtain labeled examples for inductive learning. Active learning asks domain experts to label a small set of examples, but a cost is incurred for each answer. Transfer learning can borrow labeled examples from a different domain without incurring any labeling cost, but there is no guarantee that the transferred examples will actually improve learning accuracy. To solve both problems, we propose a framework that actively transfers knowledge across domains. The key intuition is to use knowledge transferred from the other domain as much as possible to help learn the current domain, and to query experts only when necessary. To do so, labeled examples from the other domain (out-of-domain) are examined on the basis of their likelihood of correctly labeling the examples of the current domain (in-domain). When this likelihood is low, the out-of-domain examples are not used to label the in-domain example; instead, domain experts are consulted to provide the class label. We derive a sampling error bound and a querying bound to show that the proposed method can effectively mitigate the risk of domain difference by transferring domain knowledge only when it is useful and querying domain experts only when necessary. Experimental studies employ synthetic datasets and two types of real-world datasets, covering remote sensing and text classification problems. The proposed method is compared with previously proposed transfer learning and active learning methods. Across all comparisons, the proposed approach clearly outperforms the transfer learning model in classification accuracy given different out-of-domain datasets. For example, on the remote sensing dataset, the proposed approach achieves an accuracy of around 94.5%, while the comparable transfer learning model drops below 89% in most cases. The software and datasets are available from the authors.
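
As a rough illustration of the transfer-or-query intuition above, the sketch below trains a model on the out-of-domain examples and accepts its label for an in-domain example only when the predicted confidence clears a threshold, consulting the (costly) oracle otherwise. The logistic-regression model, the 0.8 threshold, and the oracle interface are illustrative assumptions, not the paper's exact procedure.

```python
# Hedged sketch of "transfer when confident, query when necessary".
# Model, threshold, and oracle interface are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def actively_transfer(X_out, y_out, X_in, oracle, threshold=0.8):
    """Label in-domain rows of X_in, querying `oracle` only when the
    out-of-domain model's confidence falls below `threshold`."""
    out_model = LogisticRegression(max_iter=1000).fit(X_out, y_out)
    labels, n_queries = [], 0
    for x in X_in:
        proba = out_model.predict_proba(x.reshape(1, -1))[0]
        if proba.max() >= threshold:
            # High likelihood the transferred label is correct: use it.
            labels.append(out_model.classes_[proba.argmax()])
        else:
            # Low likelihood: pay the labeling cost and ask the expert.
            labels.append(oracle(x))
            n_queries += 1
    return np.array(labels), n_queries
```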


International Conference on Data Mining | 2009

Naive Bayes Classification of Uncertain Data

Jiangtao Ren; Sau Dan Lee; Xianlu Chen; Ben Kao; Reynold Cheng; David W. Cheung

Traditional machine learning algorithms assume that data are exact or precise. However, this assumption may not hold because of data uncertainty arising from measurement errors, data staleness, repeated measurements, and so on. With uncertainty, the value of each data item is represented by a probability distribution function (pdf). In this paper, we propose a novel naive Bayes classification algorithm for uncertain data with a pdf. Our key solution is to extend the class conditional probability estimation in the Bayes model to handle pdfs. Extensive experiments on UCI datasets show that the accuracy of the naive Bayes model can be improved by taking the uncertainty information into account.
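
To make the extension concrete, here is a toy sketch that assumes each uncertain attribute is a Gaussian pdf given by a per-item mean and standard deviation; integrating an item Gaussian against a per-class Gaussian then has a closed form, a Gaussian with summed variances. The Gaussian model and the fitting details are assumptions for illustration, not the paper's estimator.

```python
# Toy naive Bayes for uncertain data: each attribute value is a Gaussian
# pdf (mean, std). The class-conditional term integrates the item pdf
# against a per-class Gaussian; for two Gaussians this is a Gaussian
# with summed variances. An assumed model, not the paper's estimator.
import numpy as np
from scipy.stats import norm

def fit_uncertain_nb(means, y):
    """means: (n, d) per-item means; fits one Gaussian per class/feature."""
    params = {}
    for c in np.unique(y):
        m = means[y == c]
        params[c] = (m.mean(axis=0), m.var(axis=0) + 1e-9, (y == c).mean())
    return params

def predict_uncertain_nb(params, means, stds):
    preds = []
    for mu_i, sd_i in zip(means, stds):
        scores = {
            c: np.log(prior)
               + norm.logpdf(mu_i, mu_c, np.sqrt(var_c + sd_i ** 2)).sum()
            for c, (mu_c, var_c, prior) in params.items()
        }
        preds.append(max(scores, key=scores.get))
    return np.array(preds)
```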


Knowledge Discovery and Data Mining | 2009

Cross domain distribution adaptation via kernel mapping

Erheng Zhong; Wei Fan; Jing Peng; Kun Zhang; Jiangtao Ren; Deepak S. Turaga; Olivier Verscheure

When labeled examples are limited and difficult to obtain, transfer learning employs knowledge from a source domain to improve learning accuracy in the target domain. However, the assumption made by existing approaches, that the marginal and conditional probabilities are directly related between the source and target domains, has limited applicability in either the original space or its linear transformations. To solve this problem, we propose an adaptive kernel approach that maps the marginal distributions of the target-domain and source-domain data into a common kernel space, and utilizes a sample selection strategy to draw the conditional probabilities of the two domains closer. We formally show that, in the kernel-mapped space, the difference in distributions between the two domains is bounded, and the prediction error of the proposed approach can also be bounded. Experimental results demonstrate that the proposed method outperforms both traditional inductive classifiers and state-of-the-art boosting-based transfer algorithms on most domains, including text categorization and web page ratings. In particular, it achieves around 10% higher accuracy than other approaches on the text categorization problem. The source code and datasets are available from the authors.
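
A minimal sketch of the two steps described above, with KernelPCA standing in for the paper's kernel mapping and a nearest-neighbour distance rule standing in for its sample selection strategy (both stand-ins are assumptions):

```python
# Sketch: map both domains into a shared kernel space, then select the
# source examples that sit closest to the target cloud before training.
# KernelPCA and the nearest-neighbour rule are illustrative stand-ins.
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

def kernel_map_adapt(X_src, y_src, X_tgt, n_components=20, keep=0.5):
    # 1. Common kernel space for the marginal distributions.
    kpca = KernelPCA(n_components=n_components, kernel="rbf")
    Z = kpca.fit_transform(np.vstack([X_src, X_tgt]))
    Z_src, Z_tgt = Z[: len(X_src)], Z[len(X_src):]
    # 2. Sample selection: keep source points nearest the target data,
    #    a crude proxy for drawing conditional probabilities closer.
    dist, _ = NearestNeighbors(n_neighbors=1).fit(Z_tgt).kneighbors(Z_src)
    idx = np.argsort(dist.ravel())[: int(keep * len(Z_src))]
    # 3. Train in the shared space on the selected source examples.
    return SVC().fit(Z_src[idx], y_src[idx]).predict(Z_tgt)
```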


European Conference on Machine Learning | 2009

Relaxed transfer of different classes via spectral partition

Xiaoxiao Shi; Wei Fan; Qiang Yang; Jiangtao Ren

Most existing transfer learning techniques are limited to problems of knowledge transfer across tasks sharing the same set of class labels. In this paper, we relax this constraint and propose a spectral-based solution that aims to unveil the intrinsic structure of the data and generate a partition of the target data by transferring the eigenspace that well separates the source data. Furthermore, a clustering-based KL divergence is proposed to automatically adjust how much to transfer. We evaluate the proposed model on text and image datasets where the class categories of the source and target data are explicitly different, e.g., transferring from 3 classes to 2 classes, and show that the proposed approach outperforms other baselines by an average of 10% in accuracy. The source code and datasets are available from the authors.
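
A bare-bones sketch of the eigenspace-transfer idea, assuming linear discriminant analysis as a stand-in for the spectral eigenspace learned on the source and k-means for the target partition; the paper's clustering-based KL adjustment is omitted:

```python
# Sketch: learn a projection that separates the source classes, project
# the target data into it, and partition the target into its own number
# of classes. LDA + k-means are stand-ins for the spectral machinery.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.cluster import KMeans

def eigenspace_transfer(X_src, y_src, X_tgt, n_tgt_classes):
    lda = LinearDiscriminantAnalysis().fit(X_src, y_src)  # source "eigenspace"
    Z_tgt = lda.transform(X_tgt)                          # transfer projection
    return KMeans(n_clusters=n_tgt_classes, n_init=10).fit_predict(Z_tgt)
```

Note that the target partition can use a different number of clusters than the source has classes, mirroring the paper's relaxation of the shared-label-set constraint.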


International Conference on Data Mining | 2008

Graph-Based Iterative Hybrid Feature Selection

Erheng Zhong; Sihong Xie; Wei Fan; Jiangtao Ren; Jing Peng; Kun Zhang

When the number of labeled examples is limited, traditional supervised feature selection techniques often fail due to sample selection bias or the unrepresentative-sample problem. To address this, semi-supervised feature selection techniques exploit the statistical information of both labeled and unlabeled examples at the same time. However, the results of semi-supervised feature selection can at times be unsatisfactory, and the culprit is how to effectively use the unlabeled data. Quite different from both supervised and semi-supervised feature selection, we propose a "hybrid" framework based on graph models. We first apply supervised methods to select a small set of the most critical features from the labeled data. Importantly, these initial features might otherwise be missed when selection is performed on the labeled and unlabeled examples simultaneously. Next, this initial feature set is expanded and corrected with the use of unlabeled data. We formally analyze why the expected performance of the hybrid framework is better than that of both supervised and semi-supervised feature selection. Experimental results demonstrate that the proposed method outperforms both traditional supervised and state-of-the-art semi-supervised feature selection algorithms by at least 10% in accuracy on a number of text and biomedical problems with thousands of features to choose from. Software and datasets are available from the authors.
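
A simplified sketch of the hybrid flow: a supervised score seeds the feature set from labeled data only, and a feature-similarity graph built over all examples expands it. The F-score seeding, correlation graph, and greedy expansion are illustrative assumptions rather than the paper's exact graph model:

```python
# Sketch of the hybrid flow: supervised seeding on labeled data, then
# graph-based expansion over labeled + unlabeled data. The F-score seed,
# correlation graph, and greedy rule are illustrative assumptions.
import numpy as np
from sklearn.feature_selection import f_classif

def hybrid_select(X_lab, y_lab, X_all, n_seed=10, n_total=50):
    # 1. Seed: a small set of critical features from labeled data only.
    scores, _ = f_classif(X_lab, y_lab)
    selected = list(np.argsort(-scores)[:n_seed])
    # 2. Feature-similarity graph over all (labeled + unlabeled) examples.
    C = np.abs(np.corrcoef(X_all, rowvar=False))
    np.fill_diagonal(C, 0.0)
    # 3. Expand: repeatedly add the feature most tied to the current set.
    while len(selected) < n_total:
        rest = [j for j in range(X_all.shape[1]) if j not in selected]
        gains = C[np.ix_(rest, selected)].mean(axis=1)
        selected.append(rest[int(np.argmax(gains))])
    return selected
```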


Knowledge Discovery and Data Mining | 2012

Domain transfer dimensionality reduction via discriminant kernel learning

Ming Zeng; Jiangtao Ren

Kernel discriminant analysis (KDA) is a popular technique for discriminative dimensionality reduction in data analysis, but when only a limited amount of labeled data is available, it is often hard to extract the required low-dimensional representation from a high-dimensional feature space. One therefore hopes to improve performance using labeled data from other domains. In this paper, we propose a method, referred to as domain transfer discriminant kernel learning (DTDKL), that finds the optimal kernel by using labeled data from an out-of-domain distribution to carry out discriminant dimensionality reduction. Our method learns a kernel function and a discriminative projection by simultaneously maximizing the Fisher discriminant distance and minimizing the mismatch between the in-domain and out-of-domain distributions, yielding a better feature space for cross-domain discriminative dimensionality reduction.
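
One way to picture the trade-off DTDKL optimizes is to score each kernel in a small candidate family by a Fisher-style discriminability term minus a maximum mean discrepancy (MMD) term between the domains, as in the sketch below; the grid search, the crude Fisher proxy, and the trade-off weight lambda are all illustrative assumptions:

```python
# Sketch: score each candidate RBF width by a crude Fisher term (within-
# vs between-class similarity) minus the MMD between domains. The grid,
# the Fisher proxy, and lambda are illustrative assumptions only.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def mmd(K_ss, K_tt, K_st):
    # Maximum mean discrepancy between the two domains in kernel space.
    return K_ss.mean() + K_tt.mean() - 2.0 * K_st.mean()

def fisher_proxy(K, y):
    # Within-class vs between-class mean similarity: a crude stand-in
    # for the Fisher discriminant distance.
    y = np.asarray(y)
    same = y[:, None] == y[None, :]
    return K[same].mean() / max(K[~same].mean(), 1e-9)

def pick_kernel(X_out, y_out, X_in, gammas=(0.01, 0.1, 1.0), lam=1.0):
    best_gamma, best_score = None, -np.inf
    for g in gammas:
        K_ss = rbf_kernel(X_out, X_out, gamma=g)
        K_tt = rbf_kernel(X_in, X_in, gamma=g)
        K_st = rbf_kernel(X_out, X_in, gamma=g)
        score = fisher_proxy(K_ss, y_out) - lam * mmd(K_ss, K_tt, K_st)
        if score > best_score:
            best_gamma, best_score = g, score
    return best_gamma
```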


European Conference on Machine Learning | 2010

Efficient and numerically stable sparse learning

Sihong Xie; Wei Fan; Olivier Verscheure; Jiangtao Ren

We consider the problems of numerical stability and model density growth when training a sparse linear model from massive data. We focus on scalable algorithms that optimize a loss function using gradient descent with either l0 or l1 regularization. We observed numerical stability problems in several existing methods, leading to divergence and low accuracy. In addition, these methods typically have weak control over sparsity, such that model density grows faster than necessary. We propose a framework to address these problems. First, the update rule is numerically stable, comes with a convergence guarantee, and results in more reasonable models. Second, besides l1 regularization, it exploits the sparsity of the data distribution and achieves a higher degree of sparsity with a PAC generalization error bound. Lastly, it is parallelizable and suitable for training large-margin classifiers on huge datasets. Experiments show that the proposed method converges consistently and, using only 10% of the features, outperforms other baselines by as much as a 6% reduction in error rate on average. Datasets and software are available from the authors.
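
As one concrete example of a numerically stable sparse update (not the paper's specific rule), the sketch below takes a gradient step on the logistic loss and then applies soft-thresholding, the proximal operator of the l1 penalty, which keeps weights bounded and prunes small coefficients at each step:

```python
# Sketch of a numerically stable sparse update: a gradient step on the
# logistic loss followed by soft-thresholding (the l1 proximal operator).
# A generic proximal-SGD pattern, not the paper's specific update rule.
import numpy as np

def prox_sgd_l1(X, y, lam=0.01, lr=0.1, epochs=5):
    """X: (n, d) array, y in {-1, +1}; returns a sparse weight vector."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Gradient of log(1 + exp(-y w.x)) with respect to w.
            grad = -yi * xi / (1.0 + np.exp(yi * xi.dot(w)))
            w -= lr * grad
            # Soft-threshold: shrinks weights, zeroing small coefficients
            # so model density cannot grow unchecked.
            w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w
```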


Knowledge Discovery and Data Mining | 2008

Forward semi-supervised feature selection

Jiangtao Ren; Zhengyuan Qiu; Wei Fan; Hong Cheng; Philip S. Yu


European Conference on Machine Learning | 2010

Cross validation framework to choose amongst models and datasets for transfer learning

Erheng Zhong; Wei Fan; Qiang Yang; Olivier Verscheure; Jiangtao Ren


International World Wide Web Conference | 2009

Latent space domain transfer between high dimensional overlapping distributions

Sihong Xie; Wei Fan; Jing Peng; Olivier Verscheure; Jiangtao Ren

Collaboration


Jiangtao Ren's most frequent co-authors and their affiliations:

Jing Peng
Montclair State University

Erheng Zhong
Hong Kong University of Science and Technology

Xiaoxiao Shi
University of Illinois at Chicago

Qiang Yang
Harbin Institute of Technology

Philip S. Yu
University of Illinois at Chicago

Ming Zeng
Sun Yat-sen University