
Haixuan Yang

The Chinese University of Hong Kong



Publication

Featured research published by Haixuan Yang.

Conference on Information and Knowledge Management | 2008

Mining social networks using heat diffusion processes for marketing candidates selection

Hao Ma; Haixuan Yang; Michael R. Lyu; Irwin King

Social network marketing techniques employ pre-existing social networks to increase brand or product awareness through word-of-mouth promotion. A full understanding of social network marketing, and of the potential candidates that can be marketed to, offers lucrative opportunities for prospective sellers. Due to the complexity of social networks, few models exist to interpret social network marketing realistically. We propose to model social network marketing using Heat Diffusion Processes. This paper presents three diffusion models, along with three algorithms for selecting the best individuals to receive marketing samples. These approaches have the following advantages, which reflect the properties of real-world social networks: (1) We can plan a marketing strategy sequentially in time, since we include a time factor in the simulation of product adoptions; (2) The algorithm for selecting marketing candidates represents and utilizes the clustering property of real-world social networks; and (3) The model we construct can diffuse both positive and negative comments on products or brands in order to simulate the complicated communications within social networks. Our work represents a novel approach to the analysis of social network marketing, and is the first work to propose how to defend against negative comments within social networks. Complexity analysis shows that our model also scales to very large social networks.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007

DiffusionRank: a possible penicillin for web spamming

Haixuan Yang; Irwin King; Michael R. Lyu

While the PageRank algorithm has proven to be very effective for ranking Web pages, the rank scores of Web pages can be manipulated. To handle the manipulation problem and to cast new insight on the Web structure, we propose a ranking algorithm called DiffusionRank. DiffusionRank is motivated by the heat diffusion phenomenon, which can be connected to Web ranking because the flow of activity on the Web can be imagined as heat flow, the link from one page to another can be treated as the pipe of an air-conditioner, and heat flow can embody the structure of the underlying Web graph. Theoretically, we show that DiffusionRank can serve as a generalization of PageRank when the heat diffusion coefficient γ tends to infinity. In such a case, 1/γ = 0 and DiffusionRank (PageRank) has a low anti-manipulation ability. When γ = 0, DiffusionRank obtains the highest anti-manipulation ability, but in such a case the Web structure is completely ignored. Consequently, γ is an interesting factor that can control the balance between the ability to preserve the original Web structure and the ability to reduce the effect of manipulation. It is found empirically that, when γ = 1, DiffusionRank has a penicillin-like effect on link manipulation. Moreover, DiffusionRank can be employed to find group-to-group relations on the Web, to divide the Web graph into several parts, and to find link communities. Experimental results show that the DiffusionRank algorithm achieves the above-mentioned advantages as expected.
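The heat-flow picture in the abstract above can be illustrated with a minimal Python sketch. This is not the DiffusionRank algorithm as published: it assumes a simple explicit discretization in which each page passes its heat equally along its out-links, and the names `heat_diffusion_scores`, `out_links`, `initial_heat` and `steps` are hypothetical. Every link target is assumed to appear as a key of `out_links`.

```python
def heat_diffusion_scores(out_links, initial_heat, gamma=1.0, steps=100):
    """Diffuse heat over a directed graph for a total "time" of gamma.

    out_links: dict mapping each node to a list of nodes it links to.
    initial_heat: dict mapping some nodes to their starting heat.
    gamma: diffusion coefficient (assumed to act as total diffusion time).
    steps: number of small explicit update steps (an assumption of this sketch).
    """
    nodes = list(out_links)
    heat = {node: float(initial_heat.get(node, 0.0)) for node in nodes}
    dt = gamma / steps  # size of one small time step

    for _ in range(steps):
        inflow = {node: 0.0 for node in nodes}
        for src, targets in out_links.items():
            if targets:
                # Each page sends its heat equally along its out-links.
                share = heat[src] / len(targets)
                for dst in targets:
                    inflow[dst] += share
            else:
                # A page without out-links keeps its own heat.
                inflow[src] += heat[src]
        # Move each node a small step towards the heat flowing into it.
        heat = {node: heat[node] + dt * (inflow[node] - heat[node])
                for node in nodes}
    return heat


# Hypothetical three-page cycle with all heat starting on page "a".
example_links = {"a": ["b"], "b": ["c"], "c": ["a"]}
example_scores = heat_diffusion_scores(example_links, {"a": 1.0}, gamma=1.0)
```

With `gamma=0.0` the loop leaves the initial heat unchanged, which mirrors the abstract's remark that the Web structure is ignored in that case. The total heat is conserved at every step because whatever leaves one page arrives at another.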

Bioinformatics | 2012

Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty

Haixuan Yang; Tamás Nepusz; Alberto Paccanaro

Motivation: Several measures have been recently proposed for quantifying the functional similarity between gene products according to well-structured controlled vocabularies where biological terms are organized in a tree or in a directed acyclic graph (DAG) structure. However, existing semantic similarity measures ignore two important facts. First, when calculating the similarity between two terms, they disregard the descendants of these terms. While this makes no difference when the ontology is a tree, we shall show that it has important consequences when the ontology is a DAG, as is the case, for example, with the Gene Ontology (GO). Second, existing similarity measures do not model the inherent uncertainty which comes from the fact that our current knowledge of the gene annotation and of the ontology structure is incomplete. Here, we propose a novel approach based on downward random walks that can be used to improve any of the existing similarity measures to exhibit these two properties. The approach is computationally efficient: random walks do not need to be simulated, as we provide formulas to calculate their stationary distributions. Results: To show that our approach can potentially improve any semantic similarity measure, we test it on six different semantic similarity measures: three commonly used measures by Resnik (1999), Lin (1998), and Jiang and Conrath (1997); and three recently proposed measures: simUI, simGIC by Pesquita et al. (2008); GraSM by Couto et al. (2007); and Couto and Silva (2011). We applied these improved measures to the GO annotations of the yeast Saccharomyces cerevisiae, and tested how they correlate with sequence similarity, mRNA co-expression and protein-protein interaction data. Our results consistently show that the use of downward random walks leads to more reliable similarity measures.

Bioinformatics | 2014

GOssTo: a stand-alone application and a web tool for calculating semantic similarities on the Gene Ontology.

Horacio Caniza; Alfonso Romero; Samuel Heron; Haixuan Yang; Alessandra Devoto; Marco Frasca; Marco Mesiti; Giorgio Valentini; Alberto Paccanaro

Summary: We present GOssTo, the Gene Ontology semantic similarity Tool, a user-friendly software system for calculating semantic similarities between gene products according to the Gene Ontology. GOssTo is bundled with six semantic similarity measures, including both term- and graph-based measures, and has extension capabilities to allow the user to add new similarities. Importantly, for any measure, GOssTo can also calculate the Random Walk Contribution that has been shown to greatly improve the accuracy of similarity measures. GOssTo is very fast, easy to use, and it allows the calculation of similarities on a genomic scale in a few minutes on a regular desktop machine. Contact: [email protected] Availability: GOssTo is available both as a stand-alone application running on GNU/Linux, Windows and MacOS from www.paccanarolab.org/gossto and as a web application from www.paccanarolab.org/gosstoweb. The stand-alone application features a simple and concise command line interface for easy integration into high-throughput data processing pipelines.

International World Wide Web Conferences | 2005

Predictive ranking: a novel page ranking approach by estimating the web structure

Haixuan Yang; Irwin King; Michael R. Lyu

PageRank (PR) is one of the most popular ways to rank web pages. However, as the Web continues to grow in volume, it is becoming more and more difficult to crawl all the available pages. As a result, the page ranks computed by PR are based on only a subset of the whole Web. This produces inaccurate results because of the incomplete information (dangling pages) inherent in the calculation. To overcome this incompleteness, we propose a new variant of the PageRank algorithm called Predictive Ranking (PreR), in which different classes of dangling pages are analyzed individually so that the link structure can be predicted more accurately. We detail our proposed steps. Furthermore, experimental results show that this algorithm achieves encouraging results when compared with previous methods.
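To show where dangling pages enter the calculation, here is a minimal Python sketch of the standard PageRank power iteration, in which the rank held by dangling pages is spread uniformly over all pages. This is the common baseline treatment, not Predictive Ranking (PreR) itself; the function name `pagerank`, the damping value 0.85 and the example graph are assumptions made for illustration.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Baseline PageRank with uniform redistribution of dangling mass.

    out_links: dict mapping each page to a list of pages it links to;
    every link target is assumed to appear as a key as well.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank currently held by pages with no known out-links.
        dangling_mass = sum(rank[node] for node in nodes if not out_links[node])
        # Teleportation term plus an equal share of the dangling mass.
        new_rank = {node: (1.0 - damping) / n + damping * dangling_mass / n
                    for node in nodes}
        # Each non-dangling page splits its rank over its out-links.
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


# Hypothetical crawl in which page "c" is dangling (no known out-links).
example_ranks = pagerank({"a": ["b"], "b": ["c"], "c": []})
```

Treating every dangling page identically is exactly the simplification the abstract argues against; a predictive approach would replace the uniform `dangling_mass / n` share with an estimate that depends on the class of dangling page.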

International Joint Conference on Neural Networks | 2006

Predictive Random Graph Ranking on the Web

Haixuan Yang; Irwin King; Michael R. Lyu

Incomplete information about the Web structure causes various ranking algorithms to produce inaccurate results. In this paper, we propose a solution to this problem by formulating a new framework called Predictive Random Graph Ranking, in which we generate a random graph based on the known information about the Web structure. The random graph can be considered as the predicted Web structure, on which ranking algorithms are expected to be more accurate. For this purpose, we extend some current ranking algorithms from a static graph to a random graph. Experimental results show that the Predictive Random Graph Ranking framework can improve the accuracy of ranking algorithms such as PageRank, Common Neighbor, and Jaccard's Coefficient.
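For reference, the static-graph versions of two of the measures named above can be sketched in a few lines of Python. This shows only the standard definitions on a fixed graph, not the random-graph extension the paper proposes; the function names and the adjacency-set representation are assumptions.

```python
def common_neighbors(adj, u, v):
    """Number of nodes adjacent to both u and v."""
    return len(adj[u] & adj[v])


def jaccard_coefficient(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    if not union:
        return 0.0  # convention chosen here for two isolated nodes
    return len(adj[u] & adj[v]) / len(union)


# Hypothetical undirected graph stored as a dict of neighbor sets.
example_adj = {
    "a": {"c", "d"},
    "b": {"c", "e"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": {"b"},
}
shared = common_neighbors(example_adj, "a", "b")
similarity = jaccard_coefficient(example_adj, "a", "b")
```

Both measures depend entirely on the observed links, so any links missing from the crawl change the scores directly, which is the inaccuracy the framework sets out to reduce.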

Conference on Information and Knowledge Management | 2009

Semi-nonnegative matrix factorization with global statistical consistency for collaborative filtering

Hao Ma; Haixuan Yang; Irwin King; Michael R. Lyu

Collaborative Filtering, considered by many researchers as the most important technique for information filtering, has been extensively studied by both academic and industrial communities. One of the most popular approaches to collaborative filtering recommendation algorithms is based on low-dimensional factor models. The assumption behind such models is that a user's preferences can be modeled by linearly combining item factor vectors using user-specific coefficients. In this paper, aiming at several aspects ignored by previous work, we propose a semi-nonnegative matrix factorization method with global statistical consistency. The major contribution of our work is twofold: (1) We offer a new understanding of the generation, or latent composition, of the user-item rating matrix. Under the new interpretation, our work can be formulated as a semi-nonnegative matrix factorization problem. (2) Moreover, we propose a novel method of imposing consistency between the statistics given by the predicted values and the statistics given by the data. We further develop an optimization algorithm to determine the model complexity automatically. The complexity of our method is linear in the number of observed ratings, hence it scales to very large datasets. Finally, experimental analysis on the EachMovie dataset illustrates the effectiveness of our approach in comparison with other state-of-the-art methods.

International Conference on Wireless Communications and Mobile Computing | 2006

A point-distribution index and its application to sensor-grouping in wireless sensor networks

Yangfan Zhou; Haixuan Yang; Michael R. Lyu; Edith C.-H. Ngai

We propose ι, a novel index for evaluating point distributions. ι is the minimum pairwise distance between points, normalized by the average pairwise distance between points. We find that a set of points that achieves the maximum value of ι forms a honeycomb structure. We propose that ι can serve as a good index to evaluate the distribution of points, and that it can be employed in coverage-related problems in wireless sensor networks (WSNs). To validate this idea, we formulate a general sensor-grouping problem for WSNs and provide a general sensing model. We show that locally maximizing ι at sensor nodes is a good approach to solving this problem, using an algorithm called Maximizing-ι Node-Deduction (MIND). Simulation results verify that MIND outperforms a greedy algorithm we designed that exploits sensor redundancy. This demonstrates a good application of ι to coverage-related problems in WSNs.
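The definition of ι given in the abstract translates directly into code. The following Python sketch computes it for a list of coordinate tuples; the function name `iota_index` and the handling of degenerate inputs are assumptions, and the sketch does not implement the MIND algorithm.

```python
import math
from itertools import combinations


def iota_index(points):
    """Minimum pairwise distance divided by the average pairwise distance."""
    distances = [math.dist(p, q) for p, q in combinations(points, 2)]
    if not distances:
        raise ValueError("at least two points are required")
    mean_distance = sum(distances) / len(distances)
    if mean_distance == 0.0:
        return 0.0  # all points coincide; convention chosen for this sketch
    return min(distances) / mean_distance


# Three points at the corners of an equilateral triangle: all pairwise
# distances are equal, so the minimum equals the average.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0)]
# Four corners of a unit square: the diagonals raise the average distance.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

iota_triangle = iota_index(triangle)
iota_square = iota_index(square)
```

Because the minimum can never exceed the average, ι lies between 0 and 1, and it is largest when the points are as evenly spaced as possible. Clustered layouts, where two points sit close together, push ι towards 0.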

PLOS ONE | 2012

Computational Selection of Transcriptomics Experiments Improves Guilt-by-Association Analyses

Prajwal Bhat; Haixuan Yang; László Bögre; Alessandra Devoto; Alberto Paccanaro

The Guilt-by-Association (GBA) principle, according to which genes with similar expression profiles are functionally associated, is widely applied for functional analyses using large heterogeneous collections of transcriptomics data. However, the use of such large collections could hamper GBA functional analysis for genes whose expression is condition-specific. In these cases a smaller set of condition-related experiments should instead be used, but identifying such functionally relevant experiments from large collections based on literature knowledge alone is an impractical task. We begin this paper by analyzing, both from a mathematical and a biological point of view, why only condition-specific experiments should be used in GBA functional analysis. We are able to show that this phenomenon is independent of the functional categorization scheme and of the organisms being analyzed. We then present a semi-supervised algorithm that can select functionally relevant experiments from large collections of transcriptomics experiments. Our algorithm is able to select experiments relevant to a given GO term, MIPS FunCat term or even KEGG pathway. We extensively test our algorithm on large dataset collections for yeast and Arabidopsis. We demonstrate that: using the selected experiments, there is a statistically significant improvement in correlation between genes in the functional category of interest; the selected experiments improve GBA-based gene function prediction; the effectiveness of the selected experiments increases with annotation specificity; and our algorithm can be successfully applied to GBA-based pathway reconstruction. Importantly, the set of experiments selected by the algorithm reflects the existing literature knowledge about the experiments. A MATLAB implementation of the algorithm and all the data used in this paper can be downloaded from the paper website: http://www.paccanarolab.org/papers/CorrGene/.

PLOS ONE | 2016

Impact of the Choice of Normalization Method on Molecular Cancer Class Discovery Using Nonnegative Matrix Factorization.

Haixuan Yang; Cathal Seoighe

Nonnegative Matrix Factorization (NMF) has proved to be an effective method for unsupervised clustering analysis of gene expression data. Under the nonnegativity constraint, NMF decomposes the data matrix into two factor matrices that have been used for clustering analysis. However, the decomposition is not unique. This allows different clustering results to be obtained, resulting in different interpretations of the decomposition. To alleviate this problem, some existing methods directly enforce uniqueness to some extent by adding regularization terms to the NMF objective function. Alternatively, various normalization methods have been applied to the factor matrices; however, the effects of the choice of normalization have not been carefully investigated. Here we investigate the performance of NMF for the task of cancer class discovery under a wide range of normalization choices. After extensive evaluations, we observe that the maximum norm showed the best performance, although the maximum norm has not previously been used for NMF. MATLAB code is freely available from: http://maths.nuigalway.ie/~haixuanyang/pNMF/pNMF.htm.
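The non-uniqueness mentioned above can be seen in a small Python sketch: rescaling the columns of one factor and the rows of the other leaves the product unchanged but can alter which factor dominates each sample. The sketch assumes a factorization X ≈ WH, assumes that the maximum norm is applied to the columns of W, and assumes that each sample is assigned to the factor with the largest entry in its column of H. The abstract confirms none of these choices, and the names `normalize_by_max` and `assign_clusters` are hypothetical.

```python
def normalize_by_max(W, H):
    """Rescale so every column of W has maximum entry 1, keeping W @ H fixed.

    W: list of rows (features x k); H: list of rows (k x samples).
    """
    k = len(H)
    scales = [max(row[j] for row in W) for j in range(k)]
    if any(scale <= 0.0 for scale in scales):
        raise ValueError("every column of W needs a positive entry")
    # Divide column j of W by its maximum ...
    W_norm = [[row[j] / scales[j] for j in range(k)] for row in W]
    # ... and multiply row j of H by the same value to compensate.
    H_norm = [[scales[j] * value for value in H[j]] for j in range(k)]
    return W_norm, H_norm


def assign_clusters(H):
    """Assign each sample (column of H) to the factor with the largest entry."""
    k = len(H)
    n_samples = len(H[0])
    return [max(range(k), key=lambda j: H[j][s]) for s in range(n_samples)]


# Hypothetical factors for 2 features, 2 factors and 2 samples.
W = [[2.0, 0.0], [1.0, 4.0]]
H = [[1.0, 0.5], [0.1, 1.0]]
W_norm, H_norm = normalize_by_max(W, H)
labels = assign_clusters(H_norm)
```

Since cluster labels are read from the normalized factor, two normalizations of the same decomposition can yield different clusterings, which is why the choice of norm matters for class discovery.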
