Qiong Fang
Hong Kong University of Science and Technology
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Qiong Fang.
international conference on data mining | 2012
Qiong Fang; Wilfred Ng; Jianlin Feng; Yuliang Li
The Order-Preserving SubMatrices (OPSMs) are employed to discover significant biological associations between genes and experiment conditions. Herein, we propose a new relaxed OPSM model by considering the linearity relaxation, which is called the Bucket OPSM (BOPSM) model. An efficient method called ApriBopsm is developed to exhaustively mine such BOPSM patterns. We further generalize the BOPSM model by incorporating the similarity relaxation strategy. We develop a generalized BOPSM model called GeBOPSM and adopt a pattern growing method called SeedGrowth to mine GeBOPSM patterns. Informally, the SeedGrowth algorithm adopts two different growing strategies on rows and columns in order to expand a seed BOPSM into a maximal GeBOPSM pattern. We conduct a series of experiments using both synthetic and biological datasets to study the effectiveness of our proposed relaxed models and the efficiency of the relevant mining methods. The BOPSM model is shown to be able to capture the characteristics of noisy OPSM patterns, and is superior to the strict counterparts. ApriBopsm is also significantly more efficient than OPC-Tree, which is the state-of-the-art OPSM mining method. Compared to all the current relaxed OPSM models, the GeBOPSM model achieves the best performance in terms of the number of mined quality patterns.
database systems for advanced applications | 2013
Yuanfeng Song; Kenneth Wai-Ting Leung; Qiong Fang; Wilfred Ng
Ranking documents in terms of their relevance to a given query is fundamental to many real-life applications such as document retrieval and recommendation systems. Extensive studies in this area have focused on developing efficient ranking models. While ranking models are usually trained based on given training datasets, besides model training algorithms, the quality of the document features selected for model training also plays a very important aspect on the model performance. The main objective of this paper is to present an approach to discover “significant” document features for learning to rank (LTR) problem. We conduct a systematic exploration of frequent pattern-based ranking. First, we formally analyze the effectiveness of frequent patterns for ranking. Combined features, which constitute a large portion of frequent patterns, perform better than single features in terms of capturing rich underlying semantics of the documents and hence provide good feature candidates for ranking. Based on our analysis, we propose a new ranking approach called FP-Rank. Essentially, FP-Rank adopts frequent pattern mining algorithms to mine frequent patterns, and then a new pattern selection algorithm is adopted to select a set of patterns with high overall significance and low redundancy. Our experiments on the real datasets confirm that, by incorporating effective frequent patterns to train a ranking model, such as RankSVM, the performance of the ranking model can be substantially improved.
knowledge discovery and data mining | 2010
Qiong Fang; Wilfred Ng; Jianlin Feng
Mining order-preserving submatrix (OPSM) patterns has received much attention from researchers, since in many scientific applications, such as those involving gene expression data, it is natural to express the data in a matrix and also important to find the order-preserving submatrix patterns. However, most current work assumes the noise-free OPSM model and thus is not practical in many real situations when sample contamination exists. In this paper, we propose a relaxed OPSM model called ROPSM. The ROPSM model supports mining more reasonable noise-corrupted OPSM patterns than another well-known model called AOPC (approximate order-preserving cluster). While OPSM mining is known to be an NP-hard problem, mining ROPSM patterns is even a harder problem. We propose a novel method called ROPSM-Growth to mine ROPSM patterns. Specifically, two pattern growing strategies, such as column-centric strategy and row-centric strategy, are presented, which are effective to grow the seed OPSMs into significant ROPSMs. An effective median-rank based method is also developed to discover the underlying true order of conditions involved in an ROPSM pattern. Our experiments on a biological dataset show that the ROPSM model better captures the characteristics of noise in gene expression data matrix compared to the AOPC model. Importantly, we find that our approach is able to detect more quality biologically significant patterns with comparable efficiency with the counterparts of AOPC. Specifically, at least 26.6% (75 out of 282) of the patterns mined by our approach are strongly associated with more than 10 gene categories (high biological significance), which is 3 times better than that obtained from using the AOPC approach.
Knowledge and Information Systems | 2015
Yuanfeng Song; Wilfred Ng; Kenneth Wai-Ting Leung; Qiong Fang
Ranking documents in terms of their relevance to a given query is fundamental to many real-life applications such as information retrieval and recommendation systems. Extensive study in these application domains has given rise to the development of many efficient ranking models. While most existing research focuses on developing learning to rank (LTR) models, the quality of the training features, which plays an important role in ranking performance, has not been fully studied. Thus, we propose a new approach that discovers effective features for the LTR problem. In this paper, we present a theoretical analysis on which frequent patterns are potentially effective for improving the performance of LTR and then propose an efficient method that selects frequent patterns for LTR. First, we define a new criterion, namely feature significance (or simply significance). Specifically, we use each feature’s value to rank the training instances and define the ranking effectiveness in terms of a performance measure as the significance of the feature. We show that the significance of an infrequent pattern is limited by using formal connection between pattern support and its significance. Then, we propose a methodology that sets the support value when performing frequent pattern mining. Finally, since frequent patterns are not equally effective for LTR, we further provide a coverage-based significant pattern generation algorithm to discover effective patterns and propose a new ranking approach called Significant Frequent Pattern-based Ranking (SFP-Rank), in which the ranking model is built upon the original features as well as the significant frequent patterns. Our experiments confirm that, by incorporating significant frequent patterns to train the ranking model, the performance of the ranking model can be substantially improved.
ACM Transactions on Database Systems | 2014
Qiong Fang; Wilfred Ng; Jianlin Feng; Yuliang Li
Order-preserving submatrices (OPSMs) capture consensus trends over columns shared by rows in a data matrix. Mining OPSM patterns discovers important and interesting local correlations in many real applications, such as those involving biological data or sensor data. The prevalence of uncertain data in various applications, however, poses new challenges for OPSM mining, since data uncertainty must be incorporated into OPSM modeling and the algorithmic aspects. In this article, we define new probabilistic matrix representations to model uncertain data with continuous distributions. A novel probabilistic order-preserving submatrix (POPSM) model is formalized in order to capture similar local correlations in probabilistic matrices. The POPSM model adopts a new probabilistic support measure that evaluates the extent to which a row belongs to a POPSM pattern. Due to the intrinsic high computational complexity of the POPSM mining problem, we utilize the anti-monotonic property of the probabilistic support measure and propose an efficient Apriori-based mining framework called ProbApri to mine POPSM patterns. The framework consists of two mining methods, UniApri and NormApri, which are developed for mining POPSM patterns, respectively, from two representative types of probabilistic matrices, the UniDist matrix (assuming uniform data distributions) and the NormDist matrix (assuming normal data distributions). We show that the NormApri method is practical enough for mining POPSM patterns from probabilistic matrices that model more general data distributions. We demonstrate the superiority of our approach by two applications. First, we use two biological datasets to illustrate that the POPSM model better captures the characteristics of the expression levels of biologically correlated genes and greatly promotes the discovery of patterns with high biological significance. Our result is significantly better than the counterpart OPSMRM (OPSM with repeated measurement) model which adopts a set-valued matrix representation to capture data uncertainty. Second, we run the experiments on an RFID trace dataset and show that our POPSM model is effective and efficient in capturing the common visiting subroutes among users.
international conference on data mining | 2011
Qiong Fang; Jianlin Feng; Wilfred Ng
Identifying differentially expressed genes is an important problem in gene expression analysis, since these genes, exhibiting sufficiently different expression levels under distinct experiment conditions, could be critical for tracing the progression of a disease. In a micro array study, genes are usually sorted in terms of their differentiation abilities with the more differentially expressed genes being ranked higher in the list. As more micro array studies are conducted, rank aggregation becomes an important means to combine such ranked gene lists in order to discover more reliable differentially expressed genes. In this paper, we study a novel weighted gene rank aggregation problem whose complexity is at least NP-hard. To tackle the problem, we develop a new Markov-chain based rank aggregation method called Weighted MC (WMC). The WMC algorithm makes use of rank-based weight information to generate the transition matrix. Extensive experiments on the real biological datasets show that our approach is more efficient in aggregating long gene lists. Importantly, the WMC method is much more robust for identifying biologically significant genes compared with the state-of-the-art methods.
very large data bases | 2017
Qiang Huang; Jianlin Feng; Qiong Fang; Wilfred Ng; Wei Wang
The problem of c-Approximate Nearest Neighbor (c-ANN) search in high-dimensional space is fundamentally important in many applications, such as image database and data mining. Locality-Sensitive Hashing (LSH) and its variants are the well-known indexing schemes to tackle the c-ANN search problem. Traditionally, LSH functions are constructed in a query-oblivious manner, in the sense that buckets are partitioned before any query arrives. However, objects closer to a query may be partitioned into different buckets, which is undesirable. Due to the use of query-oblivious bucket partition, the state-of-the-art LSH schemes for external memory, namely C2LSH and LSB-Forest, only work with approximation ratio of integer
international conference on data engineering | 2017
Qiang Huang; Jianlin Feng; Qiong Fang
knowledge discovery and data mining | 2018
Qiang Huang; Guihong Ma; Jianlin Feng; Qiong Fang; Anthony K. H. Tung
c \ge 2
international conference on management of data | 2012
Junhao Gan; Jianlin Feng; Qiong Fang; Wilfred Ng