Qi Mao
Nanyang Technological University
Publication
Featured research published by Qi Mao.
Nature Methods | 2017
Xiaojie Qiu; Qi Mao; Ying Tang; Li Wang; Raghav Chawla; Hannah A. Pliner; Cole Trapnell
Single-cell trajectories can unveil how gene regulation governs cell fate decisions. However, learning the structure of complex trajectories with multiple branches remains a challenging computational problem. We present Monocle 2, an algorithm that uses reversed graph embedding to describe multiple fate decisions in a fully unsupervised manner. We applied Monocle 2 to two studies of blood development and found that mutations in the genes encoding key lineage transcription factors divert cells to alternative fates.
IEEE Transactions on Image Processing | 2013
Qi Mao; Ivor W. Tsang; Shenghua Gao
Automatic image annotation, which is usually formulated as a multi-label classification problem, is one of the major tools used to enhance the semantic understanding of web images. Many multimedia applications (e.g., tag-based image retrieval) can greatly benefit from image annotation. However, the insufficient performance of image annotation methods prevents these applications from being practical. On the other hand, specific measures are usually designed to evaluate how well one annotation method performs for a specific objective or application, but most image annotation methods do not consider optimization of these measures, so they are inevitably trapped in suboptimal performance on these objective-specific measures. To address this issue, we first summarize a variety of objective-guided performance measures under a unified representation. Our analysis reveals that macro-averaging measures are very sensitive to infrequent keywords, and the Hamming measure is easily affected by skewed distributions. We then propose a unified multi-label learning framework, which directly optimizes a variety of objective-specific measures of multi-label learning tasks. Specifically, we first present a multilayer hierarchical structure of learning hypotheses for multi-label problems based on which a variety of loss functions with respect to objective-guided measures are defined. We then formulate these loss functions as relaxed surrogate functions and optimize them by structural SVMs. According to the analysis of various measures and the high time complexity of optimizing micro-averaging measures, in this paper, we focus on example-based measures that are tailor-made for image annotation tasks but are seldom explored in the literature. Experiments show consistency with the formal analysis on two widely used multi-label datasets, and demonstrate the superior performance of our proposed method over state-of-the-art baseline methods in terms of example-based measures on four image annotation datasets.
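The contrast the abstract draws between measures can be illustrated with a small sketch. It uses the standard textbook definitions of Hamming loss, macro-averaged F1, and example-based F1, not the paper's unified representation. The toy label matrices, the function names, and the convention that an empty denominator scores 1.0 are all assumptions made for illustration.

```python
# Toy multi-label data (hypothetical): 4 images, 3 keywords.
# Keyword 2 is infrequent: it appears on only one image.
Y_TRUE = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 1, 1]]
# The prediction is perfect except that it misses the rare keyword.
Y_PRED = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 1, 0]]


def f1(tp, fp, fn):
    """F1 from counts; an empty denominator scores 1.0 (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def counts(true_bits, pred_bits):
    """Return (tp, fp, fn) for two equal-length 0/1 sequences."""
    tp = sum(t == 1 and p == 1 for t, p in zip(true_bits, pred_bits))
    fp = sum(t == 0 and p == 1 for t, p in zip(true_bits, pred_bits))
    fn = sum(t == 1 and p == 0 for t, p in zip(true_bits, pred_bits))
    return tp, fp, fn


def hamming_loss(y_true, y_pred):
    """Fraction of (example, label) slots that are predicted incorrectly."""
    slots = sum(len(row) for row in y_true)
    wrong = sum(t != p for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr))
    return wrong / slots


def macro_f1(y_true, y_pred):
    """Average of per-label F1: every label counts equally, however rare."""
    n_labels = len(y_true[0])
    per_label = []
    for j in range(n_labels):
        col_true = [row[j] for row in y_true]
        col_pred = [row[j] for row in y_pred]
        per_label.append(f1(*counts(col_true, col_pred)))
    return sum(per_label) / n_labels


def example_f1(y_true, y_pred):
    """Average of per-example F1: every example counts equally."""
    per_example = [f1(*counts(tr, pr)) for tr, pr in zip(y_true, y_pred)]
    return sum(per_example) / len(per_example)


if __name__ == "__main__":
    print("Hamming loss:", round(hamming_loss(Y_TRUE, Y_PRED), 4))
    print("Macro F1:    ", round(macro_f1(Y_TRUE, Y_PRED), 4))
    print("Example F1:  ", round(example_f1(Y_TRUE, Y_PRED), 4))
```

On this toy data a single missed rare keyword costs one third of the macro-averaged F1, while the Hamming loss stays small because most label slots are zeros that are trivially predicted correctly. The example-based F1 only penalizes the one affected image.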
Pattern Recognition Letters | 2015
Jin Yao; Qi Mao; Steve Goodison; Volker Mai; Yijun Sun
Feature selection method for unsupervised learning inspired by human learning. Detected features supporting complex structure not limited to clusters. Automatic parameter estimation alleviating the burden of manually tuning parameters. A scheme to assess the statistical significance of discovered data patterns. We consider the problem of feature selection for unsupervised learning and develop a new algorithm capable of identifying informative features supporting complex structures embedded in a high-dimensional space. The development of the algorithm is inspired by human learning in detecting complex data structures. We formulate it as an optimization problem with a well-defined objective function, and solve the problem by using an iterative approach. The algorithm can be easily implemented and is computationally very efficient. We use gap statistics to estimate the parameters so that the proposed method is completely parameter-free. We also develop a scheme based on permutation tests to estimate the statistical significance of the presence of a data structure. We demonstrate the effectiveness and versatility of the algorithm by comparing it with seven existing methods on a set of synthetic datasets with a wide variety of structures and cancer microarray gene expression datasets.
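The general idea of a permutation test for the presence of structure can be sketched as follows. This is a generic illustration, not the scheme developed in the paper: the structure statistic (absolute Pearson correlation between two features), the function names, and the default of 199 permutations are all assumptions chosen to keep the example small.

```python
import random


def abs_pearson(xs, ys):
    """Stand-in structure statistic (assumption): |Pearson correlation|."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no linear structure
    return abs(cov) / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, statistic=abs_pearson, n_permutations=199, seed=0):
    """Estimate how often shuffled data looks at least as structured as the real data.

    Shuffling ys breaks any dependence between the two features while
    keeping each feature's marginal distribution unchanged.
    """
    rng = random.Random(seed)
    observed = statistic(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if statistic(xs, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed data as one draw from the null,
    # so the estimated p-value is never exactly zero.
    return (1 + hits) / (1 + n_permutations)


if __name__ == "__main__":
    xs = [float(i) for i in range(30)]
    structured = [2.0 * x + 1.0 for x in xs]   # strong linear structure
    noise_rng = random.Random(1)
    unstructured = [noise_rng.random() for _ in xs]
    print("p (structured):  ", permutation_p_value(xs, structured))
    print("p (unstructured):", permutation_p_value(xs, unstructured))
```

A small p-value suggests that the observed structure is unlikely to arise from independent features. Any other structure score could be passed in through the `statistic` argument.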
bioRxiv | 2017
Xiaojie Qiu; Qi Mao; Ying Tang; Li Wang; Raghav Chawla; Hannah A. Pliner; Cole Trapnell
Organizing single cells along a developmental trajectory has emerged as a powerful tool for understanding how gene regulation governs cell fate decisions. However, learning the structure of complex single-cell trajectories with two or more branches remains a challenging computational problem. We present Monocle 2, which uses reversed graph embedding to reconstruct single-cell trajectories in a fully unsupervised manner. Monocle 2 learns an explicit principal graph to describe the data, greatly improving the robustness and accuracy of its trajectories compared to other algorithms. Monocle 2 uncovered a new, alternative cell fate in what we previously reported to be a linear trajectory for differentiating myoblasts. We also reconstruct branched trajectories for two studies of blood development, and show that loss-of-function mutations in key lineage transcription factors divert cells to alternative branches on the trajectory. Monocle 2 is thus a powerful tool for analyzing cell fate decisions with single-cell genomics.
Knowledge Discovery and Data Mining | 2015
Qi Mao; Li Wang; Steve Goodison; Yijun Sun
We present a new dimensionality reduction setting for a large family of real-world problems. Unlike traditional methods, the new setting aims to explicitly represent and learn an intrinsic structure from data in a high-dimensional space, which can greatly facilitate data visualization and scientific discovery in downstream analysis. We propose a new dimensionality-reduction framework that involves the learning of a mapping function that projects data points in the original high-dimensional space to latent points in a low-dimensional space that are then used directly to construct a graph. Local geometric information of the projected data is naturally captured by the constructed graph. As a showcase, we develop a new method to obtain a discriminative and compact feature representation for clustering problems. In contrast to assumptions used in traditional clustering methods, we assume that centers of clusters should be close to each other if they are connected in a learned graph, and other cluster centers should be distant. Extensive experiments are performed that demonstrate that the proposed method is able to obtain discriminative feature representations yielding superior clustering performance, and correctly recover the intrinsic structures of various real-world datasets including curves, hierarchies and a cancer progression path.
IEEE Transactions on Neural Networks | 2015
Qi Mao; Ivor W. Tsang; Shenghua Gao; Li Wang
Multiple kernel learning (MKL) and classifier ensemble are two mainstream methods for solving learning problems in which some sets of features/views are more informative than others, or the features/views within a given set are inconsistent. In this paper, we first present a novel probabilistic interpretation of MKL such that maximum entropy discrimination with a noninformative prior over multiple views is equivalent to the formulation of MKL. Instead of using the noninformative prior, we introduce a novel data-dependent prior based on an ensemble of kernel predictors, which enhances the prediction performance of MKL by leveraging the merits of the classifier ensemble. With the proposed probabilistic framework of MKL, we propose a hierarchical Bayesian model to learn the proposed data-dependent prior and classification model simultaneously. The resultant problem is convex and other information (e.g., instances with either missing views or missing labels) can be seamlessly incorporated into the data-dependent priors. Furthermore, a variety of existing MKL models can be recovered under the proposed MKL framework and can be readily extended to incorporate these priors. Extensive experiments demonstrate the benefits of our proposed framework in supervised and semisupervised settings, as well as in tasks with partial correspondence among multiple views.
International Conference on Data Mining | 2011
Qi Mao; Ivor W. Tsang
Feature selection with specific multivariate performance measures is the key to the success of many applications, such as information retrieval and bioinformatics. The existing feature selection methods are usually designed for classification error. In this paper, we present a unified feature selection framework for general loss functions. In particular, we study the novel feature selection paradigm by optimizing multivariate performance measures. The resultant formulation is a challenging problem for high-dimensional data. Hence, a two-layer cutting plane algorithm is proposed to solve this problem, and the convergence is presented. Extensive experiments on large-scale and high-dimensional real world datasets show that the proposed method outperforms l_1-SVM and SVM-RFE when choosing a small subset of features, and achieves significantly improved performances over SVM^{perf}
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2017
Qi Mao; Li Wang; Ivor W. Tsang; Yijun Sun
International Conference on Data Mining | 2012
Chun-Wei Seah; Ivor W. Tsang; Yew-Soon Ong; Qi Mao
PLOS Computational Biology | 2017
Yunpeng Cai; Wei Zheng; Jin Yao; Yujie Yang; Volker Mai; Qi Mao; Yijun Sun