Buyue Qian | Researchain


Publication

Featured researches published by Buyue Qian.

Data Mining and Knowledge Discovery | 2014

On constrained spectral clustering and its applications

Xiang Wang; Buyue Qian; Ian Davidson

Constrained clustering has been well-studied for algorithms such as K-means and hierarchical clustering. However, how to satisfy many constraints in these algorithmic settings has been shown to be intractable. One alternative to encode many constraints is to use spectral clustering, which remains a developing area. In this paper, we propose a flexible framework for constrained spectral clustering. In contrast to some previous efforts that implicitly encode Must-Link (ML) and Cannot-Link (CL) constraints by modifying the graph Laplacian or constraining the underlying eigenspace, we present a more natural and principled formulation, which explicitly encodes the constraints as part of a constrained optimization problem. Our method offers several practical advantages: it can encode the degree of belief in ML and CL constraints; it guarantees a lower bound, set by a user-specified threshold, on how well the given constraints are satisfied; it can be solved deterministically in polynomial time through generalized eigendecomposition. Furthermore, by inheriting the objective function from spectral clustering and encoding the constraints explicitly, much of the existing analysis of unconstrained spectral clustering techniques remains valid for our formulation. We validate the effectiveness of our approach by empirical results on both artificial and real datasets. We also demonstrate an innovative use of encoding a large number of constraints: transfer learning via constraints.
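
The explicit constraint encoding described in this abstract can be sketched roughly as follows. The sketch builds a constraint matrix in which positive entries stand for Must-Link beliefs and negative entries for Cannot-Link beliefs, then scores a candidate two-way partition against a threshold. The sign convention, the +1/-1 indicator vector, and the names `build_constraint_matrix`, `satisfaction`, and `alpha` are assumptions made for illustration, not the paper's exact formulation, and the generalized eigendecomposition step is not shown.

```python
def build_constraint_matrix(n, must_link, cannot_link):
    """Return an n x n symmetric matrix Q as a list of lists.

    must_link / cannot_link: iterables of (i, j, belief), belief in (0, 1].
    Assumed convention: Q[i][j] > 0 for Must-Link, < 0 for Cannot-Link.
    """
    Q = [[0.0] * n for _ in range(n)]
    for i, j, belief in must_link:
        Q[i][j] = Q[j][i] = belief
    for i, j, belief in cannot_link:
        Q[i][j] = Q[j][i] = -belief
    return Q


def satisfaction(Q, u):
    """Compute u^T Q u for a +1/-1 cluster indicator vector u."""
    n = len(u)
    return sum(u[i] * Q[i][j] * u[j] for i in range(n) for j in range(n))


Q = build_constraint_matrix(
    4,
    must_link=[(0, 1, 1.0)],      # nodes 0 and 1 should share a cluster
    cannot_link=[(0, 2, 0.5)],    # nodes 0 and 2 should be separated (weaker belief)
)
u_good = [1, 1, -1, -1]   # respects both constraints
u_bad = [1, -1, 1, -1]    # violates both constraints
alpha = 1.0               # hypothetical user-specified threshold
```

Under these assumptions, a partition that respects the constraints yields a positive score and one that violates them yields a negative score, so requiring `satisfaction(Q, u) >= alpha` acts as the lower bound on constraint satisfaction.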

Data Mining and Knowledge Discovery | 2015

A relative similarity based method for interactive patient risk prediction

Buyue Qian; Xiang Wang; Nan Cao; Hongfei Li; Yu-Gang Jiang

This paper investigates the patient risk prediction problem in the context of active learning with relative similarities. Active learning has been extensively studied and successfully applied to solve real problems. The typical setting of active learning methods is to query absolute questions. In a medical application where the goal is to predict patients' risk of a certain disease using Electronic Health Records (EHR), the absolute questions take the form of “Will this patient suffer from Alzheimer’s later in his/her life?”, or “Are these two patients similar or not?”. Due to the excessive requirements of domain knowledge, such absolute questions are usually difficult to answer, even for experienced medical experts. In addition, the performance of active learning methods that focus on absolute questions is less stable, since incorrect answers often occur and can be detrimental to the risk prediction model. In this paper, alternatively, we focus on designing relative questions that can be easily answered by domain experts. The proposed relative queries take the form of “Is patient A or patient B more similar to patient C?”, which can be answered by medical experts with more confidence. These questions poll relative information as opposed to absolute information, and can even be answered by non-experts in some cases. In this paper we propose an interactive patient risk prediction method, which actively queries medical experts about the relative similarity of patients. We explore our method on both benchmark and real clinical datasets, and make several interesting discoveries, including that querying relative similarities is effective in patient risk prediction, and can sometimes even yield better prediction accuracy than asking absolute questions.

Knowledge Discovery and Data Mining | 2014

Clinical risk prediction with multilinear sparse logistic regression

Fei Wang; Ping Zhang; Buyue Qian; Xiang Wang; Ian Davidson

Logistic regression is one core predictive modeling technique that has been used extensively in health and biomedical problems. Recently a lot of research has focused on enforcing sparsity on the learned model to enhance its effectiveness and interpretability, which results in the sparse logistic regression model. However, both the original and the sparse logistic regression require the inputs to be in vector form. This limits the applicability of logistic regression to problems where the data cannot be naturally represented as vectors (e.g., functional magnetic resonance imaging and electroencephalography signals). To handle the cases when the data are in the form of multi-dimensional arrays, we propose MulSLR: Multilinear Sparse Logistic Regression. MulSLR can be viewed as a high order extension of sparse logistic regression. Instead of solving for one classification vector as in conventional logistic regression, we solve for K classification vectors in MulSLR (K is the number of modes in the data). We propose a block proximal descent approach to solve the problem and prove its convergence. The convergence rate of the proposed algorithm is also analyzed. Finally we validate the efficiency and effectiveness of MulSLR on predicting the onset risk of patients with Alzheimer's disease and heart failure.
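
To make the idea of K classification vectors concrete, here is a minimal sketch of the prediction step for the two-mode case (K = 2), where each sample is a matrix and the score is assumed to be the bilinear form a^T X b plus a bias. The function names, the example data, and the bilinear form itself are illustrative assumptions; the sparsity penalty and the block proximal descent solver from the paper are not shown.

```python
import math


def bilinear_score(X, a, b, bias=0.0):
    """Compute a^T X b + bias for a matrix-valued sample X (list of rows)."""
    return sum(
        a[i] * X[i][j] * b[j]
        for i in range(len(a))
        for j in range(len(b))
    ) + bias


def predict_proba(X, a, b, bias=0.0):
    """Apply the logistic link to the bilinear score."""
    return 1.0 / (1.0 + math.exp(-bilinear_score(X, a, b, bias)))


# One 2 x 3 sample, e.g. 2 channels by 3 time steps (made-up values).
X = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 0.0]]
a = [1.0, 0.0]           # mode-1 vector; the zero entry mimics sparsity
b = [0.5, 0.0, -0.25]    # mode-2 vector; the zero entry mimics sparsity
```

With one vector per mode, the model in this sketch needs 2 + 3 weights rather than the 2 x 3 weights a flattened vector representation would require.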

IEEE Transactions on Knowledge and Data Engineering | 2015

A Reconstruction Error Based Framework for Multi-Label and Multi-View Learning

Buyue Qian; Xiang Wang; Jieping Ye; Ian Davidson

A significant challenge in making learning techniques more suitable for general purpose use is to move beyond i) complete supervision, ii) low dimensional data, iii) a single label and single view per instance. Solving these challenges allows working with complex learning problems that are typically high dimensional with multiple (but possibly incomplete) labelings and views. While other work has addressed each of these problems separately, in this paper we show how to address them together, namely semi-supervised dimension reduction for multi-label and multi-view learning (SSDR-MML), which performs optimization for dimension reduction and label inference in a semi-supervised setting. The proposed framework is designed to handle both multi-label and multi-view learning settings, and can be easily extended to many useful applications. Our formulation has a number of advantages. We explicitly model the information combining mechanism as a data structure (a weight/nearest-neighbor matrix) which allows investigating fundamental questions in multi-label and multi-view learning. We address one such question by presenting a general measure to quantify the success of simultaneous learning of multiple labels or views. We empirically demonstrate the usefulness of our SSDR-MML approach, and show that it can outperform many state-of-the-art baseline methods.

International Conference on Data Mining | 2013

Fast Pairwise Query Selection for Large-Scale Active Learning to Rank

Buyue Qian; Xiang Wang; Jun Wang; Hongfei Li; Nan Cao; Weifeng Zhi; Ian Davidson

Pairwise learning to rank algorithms (such as RankSVM) teach a machine how to rank objects given a collection of ordered object pairs. However, their accuracy is highly dependent on the abundance of training data. To address this limitation and reduce annotation efforts, the framework of active pairwise learning to rank was introduced recently. However, in such a framework the number of possible query pairs increases quadratically with the number of instances. In this work, we present the first scalable pairwise query selection method using a layered (two-step) hashing framework. The first step, relevance hashing, aims to retrieve the strongly relevant or highly ranked points, and the second step, uncertainty hashing, is used to nominate pairs whose ranking is uncertain. The proposed framework aims to efficiently reduce the search space of pairwise queries and can be used with any pairwise learning to rank algorithm with a linear ranking function. We evaluate our approach on large-scale real problems and show it has comparable performance to exhaustive search. The experimental results demonstrate the effectiveness of our approach, and validate the efficiency of hashing in accelerating the search of massive pairwise queries.
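
The quadratic growth mentioned in this abstract, and the exhaustive search that the hashing framework is compared against, can be illustrated with a small sketch. It assumes a linear ranking function f(x) = w . x and treats the pair with the smallest score gap as the most uncertain; that uncertainty measure and all names here are assumptions for illustration, and the paper's relevance and uncertainty hashing steps are not shown.

```python
from itertools import combinations


def score(w, x):
    """Linear ranking function f(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def num_pairs(n):
    """Number of unordered candidate query pairs among n instances."""
    return n * (n - 1) // 2


def most_uncertain_pair(w, points):
    """Scan every pair and return the index pair with the smallest score gap."""
    return min(
        combinations(range(len(points)), 2),
        key=lambda p: abs(score(w, points[p[0]]) - score(w, points[p[1]])),
    )


w = [1.0, 2.0]  # made-up ranking weights
points = [[0.0, 0.0], [1.0, 1.0], [3.0, 0.0], [0.0, 5.0]]
```

Here `num_pairs(1000)` is already 499500, which is why scanning all pairs as `most_uncertain_pair` does becomes impractical at scale.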

Conference on Information and Knowledge Management | 2012

Improving document clustering using automated machine translation

Xiang Wang; Buyue Qian; Ian Davidson

With the development of statistical machine translation, we have ready-to-use tools that can translate documents from one language to many other languages. These translations provide different yet correlated views of the same set of documents. This gives rise to an intriguing question: can we use the extra information to achieve a better clustering of the documents? Some recent work on multiview clustering provided positive answers to this question. In this work, we propose an alternative approach to address this problem using the constrained clustering framework. Unlike traditional Must-Link and Cannot-Link constraints, the constraints generated from machine translation are dense yet noisy. We show how to incorporate this type of constraint by presenting two algorithms, one parametric and one non-parametric. Our algorithms are easy to implement, efficient, and can consistently improve the clustering of real data, namely the Reuters RCV1/RCV2 Multilingual Dataset. In contrast to existing multiview clustering algorithms, our technique does not need the compatibility or the conditional independence assumption, nor does it involve subtle parameter tuning.

International Conference on Big Data | 2013

Alarm prediction in large-scale sensor networks — A case study in railroad

Hongfei Li; Buyue Qian; Dhaivat P. Parikh; Arun Hampapur

Sensor networks are broadly used across industries to monitor equipment conditions. The huge volume of information collected from a large set of sensors poses great challenges for making inferences to predict significant events. We collaborate with a US Class I railway company and apply advanced analytics techniques to predict alarms associated with catastrophic equipment failures several days ahead of time. We use this railroad case study to demonstrate techniques that address big data concerns. In addition, alarm-prediction rule development needs to satisfy critical constraints to meet the railroad industry's high standards of prediction accuracy combined with human interpretability. We build a customized SVM algorithm to meet these requirements. By adding human interpretability to the rules we develop, our solution facilitates the decision making process of operators and leads to efficient operational decision support.

International Conference on Data Mining | 2012

Labels vs. Pairwise Constraints: A Unified View of Label Propagation and Constrained Spectral Clustering

Xiang Wang; Buyue Qian; Ian Davidson

In many real-world applications we can model the data as a graph with each node being an instance and the edges indicating a degree of similarity. Side information is often available in the form of labels for a small subset of instances, which gives rise to two problem settings and two types of algorithms. In the label propagation style algorithms, the known labels are propagated to the unlabeled nodes. In the constrained clustering style algorithms, known labels are first converted to pairwise constraints (Must-Link and Cannot-Link), then a constrained cut is computed as a tradeoff between minimizing the cut cost and maximizing the constraint satisfaction. Both techniques are evaluated by their ability to recover the ground truth labeling, i.e. by 0/1 loss function either directly on the labels or on the pairwise relations derived from the labels. These two fields have developed separately, but in this paper, we show that they are indeed related. This insight allows us to propose a novel way to generate constraints from the propagated labels, which our empirical study shows outperforms and is more stable than the state-of-the-art label propagation and constrained spectral clustering algorithms.
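
The conversion of known labels into pairwise constraints mentioned in this abstract can be sketched as follows: two labeled nodes with the same label yield a Must-Link pair, and two with different labels yield a Cannot-Link pair. The function name and the dictionary input format are assumptions for illustration; the paper's method of generating constraints from propagated labels is not shown.

```python
from itertools import combinations


def labels_to_constraints(labels):
    """Convert known labels into Must-Link and Cannot-Link pairs.

    labels: dict mapping node id -> label, for the labeled subset only.
    Returns (must_link, cannot_link), each a list of (i, j) with i < j.
    """
    must_link, cannot_link = [], []
    for i, j in combinations(sorted(labels), 2):
        if labels[i] == labels[j]:
            must_link.append((i, j))
        else:
            cannot_link.append((i, j))
    return must_link, cannot_link


known = {0: "a", 3: "a", 5: "b"}   # three labeled nodes (made-up example)
ml, cl = labels_to_constraints(known)
```

Unlabeled nodes simply do not appear in either list, so the constraints cover only the small labeled subset.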

IEEE Transactions on Image Processing | 2014

Learning multiple relative attributes with humans in the loop

Buyue Qian; Xiang Wang; Nan Cao; Yu-Gang Jiang; Ian Davidson

Semantic attributes have been recognized as a more spontaneous manner to describe and annotate image content. It is widely accepted that image annotation using semantic attributes is a significant improvement to the traditional binary or multiclass annotation due to its naturally continuous and relative properties. Though useful, existing approaches rely on abundant supervision and high-quality training data, which limits their applicability. Two standard methods to overcome small amounts of guidance and low-quality training data are transfer and active learning. In the context of relative attributes, this would entail learning multiple relative attributes simultaneously and actively querying a human for additional information. This paper addresses the two main limitations in existing work: 1) it actively adds humans to the learning loop so that minimal additional guidance can be given and 2) it learns multiple relative attributes simultaneously and thereby leverages dependence amongst them. In this paper, we formulate a joint active learning to rank framework with pairwise supervision to achieve these two aims, which also has other benefits such as the ability to be kernelized. The proposed framework optimizes over a set of ranking functions (measuring the strength of the presence of attributes) simultaneously and dependently on each other. The proposed pairwise queries take the form of "Which one of these two pictures is more natural?" These queries can be easily answered by humans. Extensive empirical study on real image data sets shows that our proposed method, compared with several state-of-the-art methods, achieves superior retrieval performance while requiring significantly less human input.

International Conference on Data Mining | 2016

Measuring Patient Similarities via a Deep Architecture with Medical Concept Embedding

Zihao Zhu; Changchang Yin; Buyue Qian; Yu Cheng; Jishang Wei; Fei Wang

Evaluating the clinical similarities between pairwise patients is a fundamental problem in healthcare informatics. A proper patient similarity measure enables various downstream applications, such as cohort study and treatment comparative effectiveness research. One major carrier for conducting patient similarity research is the Electronic Health Records (EHRs), which are usually heterogeneous, longitudinal, and sparse. Though existing studies on learning patient similarity from EHRs have proven useful in solving real clinical problems, their applicability is limited due to the lack of medical interpretations. Moreover, most previous methods assume a vector based representation for patients, which typically requires aggregation of medical events over a certain time period. As a consequence, the temporal information will be lost. In this paper, we propose a patient similarity evaluation framework based on temporal matching of longitudinal patient EHRs. Two efficient methods are presented, unsupervised and supervised, both of which preserve the temporal properties in EHRs. The supervised scheme takes a convolutional neural network architecture, and learns an optimal representation of patient clinical records with medical concept embedding. The empirical results on real-world clinical data demonstrate substantial improvement over the baselines.