Publication


Featured research published by Haiquan Li.


PLOS ONE | 2012

Oligo- and Polymetastatic Progression in Lung Metastasis(es) Patients Is Associated with Specific MicroRNAs

Yves A. Lussier; Nikolai N. Khodarev; Kelly Regan; Kimberly S. Corbin; Haiquan Li; Sabha Ganai; Sajid A. Khan; Jennifer L. Gnerlich; Thomas E. Darga; Hanli Fan; Oleksiy Karpenko; Philip B. Paty; Mitchell C. Posner; Steven J. Chmura; Samuel Hellman; Mark K. Ferguson; Ralph R. Weichselbaum

Rationale: Strategies to stage and treat cancer rely on a presumption of either localized or widespread metastatic disease. An intermediate state of metastasis termed oligometastasis(es) characterized by limited progression has been proposed. Oligometastases are amenable to treatment by surgical resection or radiotherapy. Methods: We analyzed microRNA expression patterns from lung metastasis samples of patients with ≤5 initial metastases resected with curative intent. Results: Patients were stratified into subgroups based on their rate of metastatic progression. We prioritized microRNAs between patients with the highest and lowest rates of recurrence. We designated these as high rate of progression (HRP) and low rate of progression (LRP); the latter group included patients with no recurrences. The prioritized microRNAs distinguished HRP from LRP and were associated with rate of metastatic progression and survival in an independent validation dataset. Conclusion: Oligo- and polymetastasis are distinct entities at the clinical and molecular level.


IEEE Transactions on Knowledge and Data Engineering | 2007

Maximal Biclique Subgraphs and Closed Pattern Pairs of the Adjacency Matrix: A One-to-One Correspondence and Mining Algorithms

Jinyan Li; Guimei Liu; Haiquan Li; Limsoon Wong

Maximal biclique (also known as complete bipartite) subgraphs can model many applications in Web mining, business, and bioinformatics. Enumerating maximal biclique subgraphs from a graph is a computationally challenging problem, as the size of the output can become exponentially large with respect to the vertex number when the graph grows. In this paper, we efficiently enumerate them through the use of closed patterns of the adjacency matrix of the graph. For an undirected graph G without self-loops, we prove that 1) the number of closed patterns in the adjacency matrix of G is even, 2) the number of the closed patterns is precisely double the number of maximal biclique subgraphs of G, and 3) for every maximal biclique subgraph, there always exists a unique pair of closed patterns that matches the two vertex sets of the subgraph. Therefore, the problem of enumerating maximal bicliques can be solved by using efficient algorithms for mining closed patterns, which are algorithms extensively studied in the data mining field. However, this direct use of existing algorithms causes a duplicated enumeration. To achieve high efficiency, we propose an O(mn) time delay algorithm for a nonduplicated enumeration, in particular, for enumerating those maximal bicliques with a large size, where m and n are the number of edges and vertices of the graph, respectively. We evaluate the high efficiency of our algorithm by comparing it to state-of-the-art algorithms on three categories of graphs: randomly generated graphs, benchmarks, and a real-life protein interaction network. In this paper, we also prove that if self-loops are allowed in a graph, then the number of closed patterns in the adjacency matrix is not necessarily even, but the maximal bicliques are exactly the same as those of the graph after removing all the self-loops.
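The correspondence stated in this abstract can be checked by brute force on a small graph. The sketch below is not the paper's O(mn)-delay algorithm. It assumes that a closed pattern is a non-empty vertex set X whose common-neighbour set is non-empty and whose double closure returns X, and all function names (`common_neighbors`, `closed_patterns`, `maximal_bicliques`) are hypothetical.

```python
from itertools import combinations


def common_neighbors(adj, vertices):
    """Return the vertices adjacent to every vertex in `vertices` (non-empty)."""
    return frozenset(set.intersection(*(set(adj[v]) for v in vertices)))


def closed_patterns(adj):
    """Brute-force all closed patterns of the adjacency matrix.

    Assumption: X is closed when it has non-empty support (its common
    neighbours) and the common neighbours of that support are exactly X.
    Exponential in the number of vertices; for illustration only.
    """
    nodes = sorted(adj)
    closed = set()
    for size in range(1, len(nodes) + 1):
        for combo in combinations(nodes, size):
            x = frozenset(combo)
            support = common_neighbors(adj, x)
            if support and common_neighbors(adj, support) == x:
                closed.add(x)
    return closed


def maximal_bicliques(adj):
    """Pair each closed pattern with its support; unordered pairs are bicliques."""
    return {frozenset((x, common_neighbors(adj, x))) for x in closed_patterns(adj)}


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list without self-loops."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


# Toy graph: a triangle (0, 1, 2) with a tail 2-3-4.
graph = build_adjacency([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)])
patterns = closed_patterns(graph)
bicliques = maximal_bicliques(graph)
print(len(patterns), len(bicliques))
```

Because the graph has no self-loops, the two sides of every biclique are disjoint, so each unordered pair accounts for exactly two closed patterns.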


Plant Physiology | 2010

Genomic Inventory and Transcriptional Analysis of Medicago truncatula Transporters

Vagner A. Benedito; Haiquan Li; Xinbin Dai; Maren Wandrey; Ji He; R. Kaundal; Ivone Torres-Jerez; S. K. Gomez; Maria J. Harrison; Yuhong Tang; Patrick Xuechun Zhao; Michael K. Udvardi

Transporters move hydrophilic substrates across hydrophobic biological membranes and play key roles in plant nutrition, metabolism, and signaling and, consequently, in plant growth, development, and responses to the environment. To initiate and support systematic characterization of transporters in the model legume Medicago truncatula, we identified 3,830 transporters and classified 2,673 of these into 113 families and 146 subfamilies. Analysis of gene expression data for 2,611 of these transporters identified 129 that are expressed in an organ-specific manner, including 50 that are nodule specific and 36 specific to mycorrhizal roots. Further analysis uncovered 196 transporters that are induced at least 5-fold during nodule development and 44 in roots during arbuscular mycorrhizal symbiosis. Among the nodule- and mycorrhiza-induced transporter genes are many candidates for known transport activities in these beneficial symbioses. The data presented here are a unique resource for the selection and functional characterization of legume transporters.


Bioinformatics | 2006

Discovering motif pairs at interaction sites from protein sequences on a proteome-wide scale

Haiquan Li; Jinyan Li; Limsoon Wong

Motivation: Protein-protein interaction, mediated by protein interaction sites, is intrinsic to many functional processes in the cell. In this paper, we propose a novel method to discover patterns in protein interaction sites. We observed from protein interaction networks that there exists a kind of significant substructure called interacting protein group pairs, which exhibit an all-versus-all interaction between the two protein sets in such a pair. The full interaction between the pair indicates a common interaction mechanism shared by the proteins in the pair, which can be referred to as an interaction type. Motif pairs at the interaction sites of the protein group pairs can be used to represent such an interaction type, with each motif derived from the sequences of a protein group by standard motif discovery algorithms. The systematic discovery of all pairs of interacting protein groups from large protein interaction networks is a computationally challenging problem. By a careful and sophisticated problem transformation, the problem is solved using efficient algorithms for mining frequent patterns, a problem extensively studied in data mining. Results: We found 5349 pairs of interacting protein groups from a yeast interaction dataset. The expected value of sequence identity within the groups is only 7.48%, indicating non-homology within these protein groups. We derived 5343 motif pairs from these group pairs, represented in the form of blocks. Comparing our motifs with domains in the BLOCKS and PRINTS databases, we found that our blocks could be mapped to an average of 3.08 correlated blocks in these two databases. The mapped blocks occur in 4221 out of a total of 6794 domains (protein groups) in these two databases. Comparing our motif pairs with iPfam, consisting of 3045 interacting domain pairs derived from PDB, we found 47 matches occurring in 105 distinct PDB complexes. Comparing with another putative domain interaction database, InterDom, we found 203 matches. Availability: http://research.i2r.a-star.edu.sg/BindingMotifPairs/resources. Supplementary information: http://research.i2r.a-star.edu.sg/BindingMotifPairs and Bioinformatics online.


Symposium on Principles of Database Systems | 2005

Relative risk and odds ratio: a data mining perspective

Haiquan Li; Jinyan Li; Limsoon Wong; Mengling Feng; Yap-Peng Tan

We are often interested in testing whether a given cause has a given effect. If we cannot specify the nature of the factors involved, such tests are called model-free studies. There are two major strategies to demonstrate associations between risk factors (i.e., patterns) and outcome phenotypes (i.e., class labels). The first is that of prospective study designs, and the analysis is based on the concept of relative risk: What fraction of the exposed (i.e., has the pattern) or unexposed (i.e., lacks the pattern) individuals have the phenotype (i.e., the class label)? The second is that of retrospective designs, and the analysis is based on the concept of odds ratio: The odds that a case has been exposed to a risk factor is compared to the odds for a case that has not been exposed. The efficient extraction of patterns that have good relative risk and/or odds ratio has not been previously studied in the data mining context. In this paper, we investigate such patterns. We show that this pattern space can be systematically stratified into plateaus of convex spaces based on their support levels. Exploiting convexity, we formulate a number of sound and complete algorithms to extract the most general and the most specific of such patterns at each support level. We compare these algorithms. We further demonstrate that the most efficient among these algorithms is able to mine these sophisticated patterns at a speed comparable to that of mining frequent closed patterns, which are patterns that satisfy considerably simpler conditions.
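For readers unfamiliar with the two measures named in this abstract, the sketch below computes them from a 2×2 contingency table using the standard epidemiological definitions. It does not reproduce the paper's mining algorithms. The cell names `a`, `b`, `c`, `d` and the example counts are assumptions chosen for illustration.

```python
def relative_risk(a, b, c, d):
    """Risk of the phenotype among exposed divided by the risk among unexposed.

    a: has pattern, has phenotype      b: has pattern, lacks phenotype
    c: lacks pattern, has phenotype    d: lacks pattern, lacks phenotype
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Cross-product ratio (a*d)/(b*c) of the same 2x2 table."""
    return (a * d) / (b * c)


# Made-up counts: 100 records with the pattern, 100 without it.
rr = relative_risk(30, 70, 10, 90)
or_value = odds_ratio(30, 70, 10, 90)
print(f"relative risk = {rr:.3f}, odds ratio = {or_value:.3f}")
```

Both functions divide by cell counts, so a table with a zero in `c`, `b`, or a whole row raises `ZeroDivisionError`; a real analysis would need a policy for such tables.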


BMC Bioinformatics | 2009

TransportTP: A two-phase classification approach for membrane transporter prediction and characterization

Haiquan Li; Vagner A. Benedito; Michael K. Udvardi; Patrick Xuechun Zhao

Background: Membrane transporters play crucial roles in living cells. Experimental characterization of transporters is costly and time-consuming. Current computational methods for transporter characterization still require extensive curation efforts, especially for eukaryotic organisms. We developed a novel genome-scale transporter prediction and characterization system called TransportTP that combined homology-based and machine learning methods in a two-phase classification approach. First, traditional homology methods were employed to predict novel transporters based on sequence similarity to known classified proteins in the Transporter Classification Database (TCDB). Second, machine learning methods were used to integrate a variety of features to refine the initial predictions. A set of rules based on transporter features was developed by machine learning using well-curated proteomes as guides. Results: In a cross-validation using the yeast proteome for training and the proteomes of ten other organisms for testing, TransportTP achieved an equivalent recall and precision of 81.8%, based on TransportDB, a manually annotated transporter database. In an independent test using the Arabidopsis proteome for training and four recently sequenced plant proteomes for testing, it achieved a recall of 74.6% and a precision of 73.4%, according to our manual curation. Conclusions: TransportTP is the most effective tool for eukaryotic transporter characterization to date.
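The recall and precision figures quoted in this abstract compare a set of predicted transporters against a curated reference set. A minimal sketch of that comparison follows. The function name `precision_recall` and the protein identifiers are hypothetical, and the sketch does not reproduce TransportTP itself.

```python
def precision_recall(predicted, reference):
    """Compare predicted transporter IDs against a curated reference set.

    precision = correct predictions / all predictions
    recall    = correct predictions / all reference entries
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Made-up identifiers standing in for predicted and curated transporters.
predicted_ids = {"P1", "P2", "P3", "P4"}
curated_ids = {"P2", "P3", "P4", "P5", "P6"}
precision, recall = precision_recall(predicted_ids, curated_ids)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```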


Bioinformatics | 2008

A nearest neighbor approach for automated transporter prediction and categorization from protein sequences

Haiquan Li; Xinbin Dai; Xuechun Zhao

Motivation: Membrane transport proteins play a crucial role in the import and export of ions, small molecules or macromolecules across biological membranes. Currently, there are a limited number of published computational tools which enable the systematic discovery and categorization of transporters prior to costly experimental validation. To approach this problem, we utilized a nearest neighbor method which seamlessly integrates homologous search and topological analysis into a machine-learning framework. Results: Our approach satisfactorily distinguished 484 transporter families in the Transporter Classification Database, a curated and representative database for transporters. A five-fold cross-validation on the database achieved a positive classification rate of 72.3% on average. Furthermore, this method successfully detected transporters in seven model and four non-model organisms, ranging from archaean to mammalian species. A preliminary literature-based validation has cross-validated 65.8% of our predictions on the 11 organisms, including 55.9% of our predictions overlapping with 83.6% of the predicted transporters in TransportDB.


Journal of the American Medical Informatics Association | 2013

Network models of genome-wide association studies uncover the topological centrality of protein interactions in complex diseases

Younghee Lee; Haiquan Li; Jianrong Li; Ellen Rebman; Ikbel Achour; Kelly Regan; Eric R. Gamazon; James L. Chen; Xinan Holly Yang; Nancy J. Cox; Yves A. Lussier

Background: While genome-wide association studies (GWAS) of complex traits have revealed thousands of reproducible genetic associations to date, these loci collectively confer very little of the heritability of their respective diseases and, in general, have contributed little to our understanding of the underlying disease biology. Physical protein interactions have been utilized to increase our understanding of human Mendelian disease loci but have yet to be fully exploited for complex traits. Methods: We hypothesized that protein interaction modeling of GWAS findings could highlight important disease-associated loci and unveil the role of their network topology in the genetic architecture of diseases with complex inheritance. Results: Network modeling of proteins associated with the intragenic single nucleotide polymorphisms of the National Human Genome Research Institute catalog of complex trait GWAS revealed that complex trait associated loci are more likely to be hub and bottleneck genes in available, albeit incomplete, networks (OR=1.59, Fisher's exact test p<2.24×10⁻¹²). Network modeling also prioritized novel type 2 diabetes (T2D) genetic variations from the Finland–USA Investigation of Non-Insulin-Dependent Diabetes Mellitus Genetics and the Wellcome Trust GWAS data, and demonstrated the enrichment of hubs and bottlenecks in prioritized T2D GWAS genes. The potential biological relevance of the T2D hub and bottleneck genes was revealed by their increased number of first degree protein interactions with known T2D genes according to several independent sources (p<0.01, probability of being first interactors of known T2D genes). Conclusion: Virtually all common diseases are complex human traits, and thus the topological centrality in protein networks of complex trait genes has implications in genetics, personal genomics, and therapy.
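The enrichment statistic reported in this abstract (an odds ratio with a Fisher's exact test p-value) can be illustrated on a 2×2 table of trait-associated versus other genes, split by hub status. The sketch below assumes a one-sided test for enrichment, which the abstract does not specify. The function names and the counts are hypothetical and are not the study's data.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Cross-product ratio of the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for enrichment (assumed direction).

    a: trait-associated hubs        b: trait-associated non-hubs
    c: other hubs                   d: other non-hubs
    Sums the hypergeometric probabilities of tables with at least `a`
    trait-associated hubs, keeping all row and column totals fixed.
    """
    n_total = a + b + c + d
    n_hubs = a + c
    n_trait = a + b
    denominator = comb(n_total, n_trait)
    upper = min(n_hubs, n_trait)
    tail = sum(
        comb(n_hubs, k) * comb(n_total - n_hubs, n_trait - k)
        for k in range(a, upper + 1)
    )
    return tail / denominator


# Made-up toy counts, far smaller than any real GWAS gene set.
toy_or = odds_ratio(3, 1, 1, 3)
toy_p = fisher_exact_greater(3, 1, 1, 3)
print(f"OR = {toy_or:.2f}, one-sided p = {toy_p:.4f}")
```

Exact integer arithmetic via `math.comb` keeps the tail sum free of rounding error until the final division, which matters for very small p-values.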


Journal of the American Medical Informatics Association | 2012

Complex-disease networks of trait-associated single-nucleotide polymorphisms (SNPs) unveiled by information theory

Haiquan Li; Younghee Lee; James L. Chen; Ellen Rebman; Jianrong Li; Yves A. Lussier

Objective: Thousands of complex-disease single-nucleotide polymorphisms (SNPs) have been discovered in genome-wide association studies (GWAS). However, these intragenic SNPs have not been collectively mined to unveil the genetic architecture between complex clinical traits. The authors hypothesize that biological annotations of host genes of trait-associated SNPs may reveal the biomolecular modularity across complex-disease traits and offer insights for drug repositioning. Methods: Trait-to-polymorphism (SNPs) associations confirmed in GWAS were used. A novel method to quantify trait–trait similarity anchored in Gene Ontology annotations of human proteins and information theory was developed. The results were then validated with the shortest paths of physical protein interactions between biologically similar traits. Results: A network was constructed consisting of 280 significant intertrait similarities among 177 disease traits, which covered 1438 well-validated disease-associated SNPs. Thirty-nine percent of intertrait connections were confirmed by curators, and the following additional studies demonstrated the validity of a proportion of the remainder. On a phenotypic trait level, higher Gene Ontology similarity between proteins correlated with smaller ‘shortest distance’ in protein interaction networks of complexly inherited diseases (Spearman p<2.2×10⁻¹⁶). Further, ‘cancer traits’ were similar to one another, as were ‘metabolic syndrome traits’ (Fisher's exact test p=0.001 and 3.5×10⁻⁷, respectively). Conclusion: An imputed disease network by information-anchored functional similarity from GWAS trait-associated SNPs is reported. It is also demonstrated that small shortest paths of protein interactions correlate with complex-disease function. Taken together, these findings provide the framework for investigating drug targets with unbiased functional biomolecular networks rather than worn-out single-gene and subjective canonical pathway approaches.


European Conference on Machine Learning | 2005

A correspondence between maximal complete bipartite subgraphs and closed patterns

Jinyan Li; Haiquan Li; Donny Soh; Limsoon Wong

For an undirected graph G without self-loops, we prove: (i) that the number of closed patterns in the adjacency matrix of G is even; (ii) that the number of the closed patterns is precisely double the number of maximal complete bipartite subgraphs of G; (iii) that for every maximal complete bipartite subgraph, there always exists a unique pair of closed patterns that matches the two vertex sets of the subgraph. Therefore, we can enumerate all maximal complete bipartite subgraphs by using efficient algorithms for mining closed patterns, which have been extensively studied in the data mining field.

Collaboration


Top Co-Authors

Ikbel Achour

University of Illinois at Chicago

Qike Li

University of Arizona

Limsoon Wong

National University of Singapore