Xiaowei Xu
University of Arkansas at Little Rock
Publications
Featured research published by Xiaowei Xu.
knowledge discovery and data mining | 2007
Xiaowei Xu; Nurcan Yuruk; Zhidan Feng; Thomas A. J. Schweiger
Network clustering (or graph partitioning) is an important task for discovering the underlying structures in networks. Many algorithms find clusters by maximizing the number of intra-cluster edges. While such algorithms find useful and interesting structures, they tend to fail to identify and isolate two kinds of vertices that play special roles: vertices that bridge clusters (hubs) and vertices that are marginally connected to clusters (outliers). Identifying hubs is useful for applications such as viral marketing and epidemiology, since hubs are responsible for spreading ideas or disease. In contrast, outliers have little or no influence and may be isolated as noise in the data. In this paper, we propose a novel algorithm called SCAN (Structural Clustering Algorithm for Networks), which detects clusters, hubs, and outliers in networks. It clusters vertices based on a structural similarity measure. The algorithm is fast and efficient, visiting each vertex only once. An empirical evaluation using both synthetic and real datasets demonstrates superior performance over other methods, such as modularity-based algorithms.
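The abstract does not spell out the similarity measure, so the following is only a minimal sketch, assuming the definition commonly associated with SCAN: σ(v, w) = |Γ(v) ∩ Γ(w)| / sqrt(|Γ(v)| · |Γ(w)|), where Γ(v) is v together with its neighbors, plus two thresholds, eps (minimum similarity) and mu (minimum number of eps-neighbors for a core vertex). Unclustered vertices touching two or more clusters are reported as hubs and the rest as outliers. For readability it recomputes similarities rather than caching them, so it does not reproduce the efficiency claimed above; the toy graph is hypothetical.

```python
from collections import deque
from math import sqrt


def structural_similarity(adj, v, w):
    # Closed neighborhoods: each vertex counts as its own neighbor (assumed definition).
    nv, nw = adj[v] | {v}, adj[w] | {w}
    return len(nv & nw) / sqrt(len(nv) * len(nw))


def scan(adj, eps, mu):
    """Return (labels, hubs, outliers) for an undirected graph {vertex: set(neighbors)}."""

    def eps_neighbors(v):
        # Includes v itself, since sigma(v, v) == 1.
        return [w for w in adj[v] | {v} if structural_similarity(adj, v, w) >= eps]

    labels, next_id = {}, 0
    for v in adj:
        if v in labels or len(eps_neighbors(v)) < mu:
            continue  # already clustered, or not a core vertex
        next_id += 1
        labels[v] = next_id
        queue = deque([v])
        while queue:
            u = queue.popleft()
            nbrs = eps_neighbors(u)
            if len(nbrs) < mu:
                continue  # border vertex: joins the cluster but does not expand it
            for x in nbrs:
                if x not in labels:
                    labels[x] = next_id
                    queue.append(x)

    hubs, outliers = set(), set()
    for v in adj:
        if v in labels:
            continue
        touching = {labels[w] for w in adj[v] if w in labels}
        (hubs if len(touching) >= 2 else outliers).add(v)
    return labels, hubs, outliers


# Hypothetical toy graph: two 4-cliques bridged by vertex 8, with vertex 9 hanging off 8.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
         (0, 8), (4, 8), (8, 9)]
adj = {v: set() for v in range(10)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

labels, hubs, outliers = scan(adj, eps=0.75, mu=3)
print(hubs, outliers)  # {8} {9}
```

Here vertex 8 is only weakly similar to each clique (σ ≈ 0.45), so it is not absorbed, yet it touches both clusters and is reported as a hub.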
BMC Bioinformatics | 2008
Mutlu Mete; Fusheng Tang; Xiaowei Xu; Nurcan Yuruk
Background: Biological systems can be modeled as complex networks with many interactions between their components. These interactions give rise to the function and behavior of the system. For example, the protein-protein interaction (PPI) network is the physical basis of multiple cellular functions. One goal of emerging systems biology is to analyze very large complex biological networks, such as protein-protein interaction networks, metabolic networks, and regulatory networks, to identify functional modules and assign functions to certain components of the system. Network modules do not occur by chance, so identifying modules is likely to capture the biologically meaningful interactions in large-scale PPI data. Unfortunately, existing computer-based clustering methods developed to find such modules are either insufficiently accurate or too slow.
Results: We devised a new methodology called SCAN (Structural Clustering Algorithm for Networks) that can efficiently find clusters, or functional modules, in complex biological networks, as well as hubs and outliers. More specifically, we demonstrated that we can find functional modules in complex networks and classify nodes into various roles based on their structure. In this study, we showed the effectiveness of our methodology using the budding yeast (Saccharomyces cerevisiae) protein-protein interaction network. To validate our clustering results, we compared our clusters with the known functions of each protein. Our predicted functional modules achieved very high purity compared with state-of-the-art approaches. Additionally, theoretical and empirical analyses demonstrated that the algorithm runs in linear time, which makes it the fastest approach for network clustering.
Conclusion: We compared our algorithm with CNM, a well-known modularity-based clustering algorithm, and successfully detected functional groups annotated with putative GO terms. The top-10 clusters with the minimum p-values show that the newly proposed algorithm partitions the network more accurately than CNM. Furthermore, manual interpretation of the functional groups found by SCAN shows superior performance over CNM.
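The abstract ranks clusters by their minimum p-value but does not say which test produced those p-values. A common choice for GO-term enrichment is the hypergeometric upper tail, sketched below as an assumption: given N proteins in total, K of which carry a term, the probability that a cluster of n proteins contains k or more annotated members by chance. The numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric: n proteins drawn from N, of which K carry the term."""
    # Sum exact integer counts first, then divide once, to avoid accumulating rounding error.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Hypothetical: a 10-protein cluster in which 8 members carry a GO term
# that annotates 50 of 6,000 proteins in the network.
p = enrichment_p_value(N=6000, K=50, n=10, k=8)
print(p)
```

Computing this for every term and keeping each cluster's smallest value gives one way to produce a "minimum p-value" ranking like the one described above.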
Drug Discovery Today | 2013
Zhichao Liu; Hong Fang; Kelly Reagan; Xiaowei Xu; Donna L. Mendrick; William Slikker; Weida Tong
Drug repositioning, exemplified by sildenafil and thalidomide, is a promising way to explore alternative indications for existing drugs. Recent research has shown that bioinformatics-based approaches have the potential to offer systematic insights into the complex relationships among drugs, targets and diseases necessary for successful repositioning. In this article, we propose the key bioinformatics steps essential for discovering valuable repositioning methods. The proposed steps (repurposing with a purpose, repurposing with a strategy and repurposing with confidence) are aimed at providing a repurposing pipeline, with particular focus on the proposed Drugs of New Indications (DNI) database, which can be used alongside currently available resources to improve in silico drug repositioning.
web intelligence | 2010
Halil Bisgin; Nitin Agarwal; Xiaowei Xu
The fact that similarity breeds connections, the principle of homophily, has been well studied in the existing sociology literature. Several studies have observed this phenomenon by conducting surveys on human subjects. These studies have concluded that new ties are formed between similar individuals. This phenomenon has been used to explain several socio-psychological concepts, such as segregation, community development, and social mobility. However, because of the nature of these studies and the limitations imposed by involving human subjects, their conclusions are not easily extended to online social media. Social media, which is becoming an unbounded space for interaction, has exceeded all expectations in terms of growth, for reasons that are hard to fully explain. New ties are formed in social media just as they emerge in the real world. However, given the differences between the real world and online social media, do the same factors that govern the construction of new ties in the real world also govern the construction of new ties in social media? In other words, does homophily exist in social media? In this article, we study this important question. We propose a systematic approach by studying two online social media sites, BlogCatalog and Last.fm, and report our findings along with some interesting observations.
BMC Bioinformatics | 2011
Halil Bisgin; Zhichao Liu; Hong Fang; Xiaowei Xu; Weida Tong
Background: The Food and Drug Administration (FDA) approved drug labels contain a broad array of information, ranging from adverse drug reactions (ADRs) to drug efficacy, risk-benefit considerations, and more. However, this information is written in free text that often contains ambiguous semantic descriptions, which makes it very challenging to retrieve useful information from the labeling text consistently and accurately for comparative analysis across drugs. Consequently, this task has largely relied on experts manually reading the full text, which is time-consuming and labor-intensive.
Method: In this study, a text mining method that is unsupervised in nature, called topic modeling, was applied to drug labeling with the goal of discovering "topics" that group together drugs with similar safety concerns and/or therapeutic uses. A total of 794 FDA-approved drug labels were used. First, three labeling sections of each drug label (Boxed Warning, Warnings and Precautions, and Adverse Reactions) were processed with the Medical Dictionary for Regulatory Activities (MedDRA) to convert the free text of each label into standard ADR terms. Next, topic modeling with latent Dirichlet allocation (LDA) was applied to generate 100 topics, each associated with a set of drugs grouped together based on probability analysis. Lastly, the efficacy of the topic modeling was evaluated against known information about the therapeutic uses and safety data of the drugs.
Results: The results demonstrate that drugs grouped by topics are associated with the same safety concerns and/or therapeutic uses with statistical significance (P<0.05). The identified topics have distinct contexts that can be directly linked to specific adverse events (e.g., liver injury or kidney injury) or therapeutic applications (e.g., anti-infectives for systemic use). We were also able to identify, via topics, potential adverse events that might arise from specific medications.
Conclusions: The successful application of topic modeling to FDA drug labeling demonstrates its potential utility as a means of hypothesis generation for inferring hidden relationships between concepts in biomedical documents, such as drug safety and therapeutic use in this study.
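The abstract does not say which LDA implementation or inference method was used. As an illustration only, here is a minimal collapsed Gibbs sampler for LDA in plain Python, treating each label as a bag of standardized adverse-event terms. The drug names, terms, topic count, and the hyperparameters alpha and beta are all hypothetical.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of term ids.

    Returns one topic-proportion vector (theta) per document.
    """
    rng = random.Random(seed)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]               # tokens of topic k in document d
    nkw = [[0] * vocab_size for _ in topics]           # occurrences of term w in topic k
    nk = [0] * n_topics                                # total tokens assigned to topic k

    def bump(d, k, w, delta):
        ndk[d][k] += delta
        nkw[k][w] += delta
        nk[k] += delta

    # Random initial topic assignment for every token.
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            bump(d, k, w, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                bump(d, z[d][i], w, -1)                # remove this token's assignment
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                           for t in topics]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                bump(d, z[d][i], w, +1)

    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]


# Hypothetical labels already mapped to standardized ADR terms.
vocab = ["hepatotoxicity", "jaundice", "ALT increased",
         "renal failure", "nephrotoxicity", "creatinine increased"]
idx = {term: i for i, term in enumerate(vocab)}
labels = {
    "drug_A": ["hepatotoxicity", "jaundice", "ALT increased", "jaundice"],
    "drug_B": ["ALT increased", "hepatotoxicity", "jaundice"],
    "drug_C": ["renal failure", "nephrotoxicity", "creatinine increased"],
    "drug_D": ["nephrotoxicity", "renal failure", "renal failure"],
}
docs = [[idx[t] for t in terms] for terms in labels.values()]
theta = lda_gibbs(docs, n_topics=2, vocab_size=len(vocab))
for name, mix in zip(labels, theta):
    print(name, max(range(len(mix)), key=mix.__getitem__))  # dominant topic per drug
```

Grouping drugs by their dominant topic is one simple way to form the drug sets that the study then checks against known therapeutic uses and safety data.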
World Wide Web | 2012
Halil Bisgin; Nitin Agarwal; Xiaowei Xu
The fact that similarity breeds connections, the principle of homophily, has been well studied in the existing sociology literature. Several studies have observed this phenomenon by conducting surveys on human subjects. These studies have concluded that new ties are formed between similar individuals. This phenomenon has been used to explain several socio-psychological concepts, such as segregation, community development, and social mobility. However, because of the nature of these studies and the limitations imposed by involving human subjects, their conclusions are not easily extended to online social media. Social media, which is becoming an unbounded space for interaction, has exceeded all expectations in terms of growth, for reasons that are hard to fully explain. New ties are formed in social media in the same way that they emerge in the real world. However, given the differences between the real world and online social media, do the same factors that govern the construction of new ties in the real world also govern the construction of new ties in social media? In other words, does homophily exist in social media? In this article, we study this important question. We propose a systematic approach by studying three online social media sites, BlogCatalog, Last.fm, and LiveJournal, and report our findings along with some interesting observations. The results indicate that interest-based homophily is not a very strong leading factor in constructing new ties in these three social media sites, with implications for strategic advertising, recommendations, and promoting applications at large.
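The abstract does not describe the measurement itself. One deliberately simplified way to probe interest-based homophily, shown here only as an illustration and not as the authors' methodology, is to compare the average Jaccard similarity of interest sets for pairs that are tied against pairs that are not. The users, interests, and ties below are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    # Overlap of two interest sets; defined as 0 when both are empty.
    return len(a & b) / len(a | b) if a | b else 0.0


def homophily_gap(interests, ties):
    """Mean interest similarity over tied pairs vs. untied pairs."""
    tied = {frozenset(edge) for edge in ties}
    linked, unlinked = [], []
    for u, v in combinations(sorted(interests), 2):
        s = jaccard(interests[u], interests[v])
        (linked if frozenset((u, v)) in tied else unlinked).append(s)
    return sum(linked) / len(linked), sum(unlinked) / len(unlinked)


# Hypothetical users, interest tags, and ties.
interests = {
    "ann": {"jazz", "rock"},
    "bob": {"jazz", "rock", "pop"},
    "cat": {"metal"},
    "dan": {"metal", "pop"},
}
ties = [("ann", "bob"), ("cat", "dan")]
tied_mean, untied_mean = homophily_gap(interests, ties)
print(tied_mean, untied_mean)
```

A large gap would point toward homophily; the study's finding that interest-based homophily is not a strong leading factor corresponds to a weak signal of this kind, although the paper's actual analysis is more systematic than this comparison.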
BMC Bioinformatics | 2012
Halil Bisgin; Zhichao Liu; Reagan Kelly; Hong Fang; Xiaowei Xu; Weida Tong
Background: Drug repositioning offers an opportunity to revitalize the slowing drug discovery pipeline by finding new uses for existing drugs. Our hypothesis is that drugs sharing similar side effect profiles are likely to be effective for the same disease, and thus repositioning opportunities can be identified by finding drug pairs with similar side effects documented in U.S. Food and Drug Administration (FDA) approved drug labels. The safety information in drug labels is usually obtained in clinical trials and augmented with observations from post-market use of the drug. Therefore, our drug repositioning approach can take advantage of more comprehensive safety information than conventional de novo approaches.
Method: A probabilistic topic model was constructed from the terms in the Medical Dictionary for Regulatory Activities (MedDRA) that appeared in the Boxed Warning, Warnings and Precautions, and Adverse Reactions sections of the labels of 870 drugs. Fifty-two unique topics, each containing a set of terms, were identified by topic modeling. The resulting probabilistic topic associations were used to measure the distance (similarity) between drugs. The success of the proposed model was evaluated by checking whether a drug and its nearest neighbor (i.e., a drug pair) share common indications, as listed in the Indications and Usage section of the drug labels.
Results: Given a drug with more than three indications, the model yielded a 75% recall, meaning that 75% of drug pairs shared one or more common indications. This is significantly higher than the 22% recall achieved by random selection. Additionally, the recall grows rapidly as the number of drug indications increases, reaching 84% for drugs with 11 indications. The analysis also showed that 65 drugs with a Boxed Warning, which indicates a significant risk of serious and possibly life-threatening adverse effects, might be replaced with safer alternatives that do not carry a Boxed Warning. In addition, we identified two therapeutic groups of drugs (Musculo-skeletal system and Anti-infectives for systemic use) in which over 80% of the drugs have a potential replacement with high significance.
Conclusion: Topic modeling can be a powerful tool for identifying repositioning opportunities by examining the adverse event terms in FDA-approved drug labels. The proposed framework not only suggests drugs that can be repurposed, but also provides insight into the safety of repositioned drugs.
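The evaluation above pairs each drug with its nearest neighbor in topic space and counts how often the pair shares an indication. The abstract does not name the distance measure, so the sketch below assumes the Hellinger distance between topic distributions; the drugs, topic vectors, and indications are hypothetical, and the restriction to drugs with more than three indications is omitted.

```python
from math import sqrt


def hellinger(p, q):
    # Distance between two topic distributions: 0 = identical, 1 = disjoint support.
    return sqrt(0.5 * sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q)))


def nearest_neighbor(drug, topics):
    others = (d for d in topics if d != drug)
    return min(others, key=lambda d: hellinger(topics[drug], topics[d]))


def pair_recall(topics, indications):
    """Fraction of drugs whose nearest neighbor shares at least one indication."""
    hits = sum(1 for d in topics
               if indications[d] & indications[nearest_neighbor(d, topics)])
    return hits / len(topics)


# Hypothetical two-topic mixtures and label indications.
topics = {"A": [0.9, 0.1], "B": [0.8, 0.2], "C": [0.1, 0.9], "D": [0.2, 0.8]}
indications = {
    "A": {"hypertension"},
    "B": {"hypertension", "angina"},
    "C": {"infection"},
    "D": {"arthritis"},
}
print(pair_recall(topics, indications))  # 0.5
```

A and B are mutual nearest neighbors and share an indication; C and D are also paired but do not, so half of the drugs count as hits.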
advances in social networks analysis and mining | 2009
Nurcan Yuruk; Mutlu Mete; Xiaowei Xu; Thomas A. J. Schweiger
Many systems in science, engineering, and nature can be modeled as networks. Examples include the Internet, the World Wide Web, and social networks. Finding hidden structures is important for making sense of complex networked data. In this paper, we present a new network clustering method that finds clusters in an agglomerative fashion using the structural similarity of vertices in the given network. Experiments conducted on real datasets demonstrate the promising performance of the new method.
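The abstract gives only the outline: merge clusters bottom-up, guided by structural similarity. The sketch below is one plausible reading, not necessarily the paper's exact procedure. It assumes the closed-neighborhood structural similarity, merges the endpoints of edges in decreasing order of similarity using union-find, and, as an assumed stopping rule, keeps the intermediate partition with the highest Newman-Girvan modularity. The toy graph is hypothetical.

```python
from math import sqrt


def sigma(adj, v, w):
    # Structural similarity over closed neighborhoods (assumed definition).
    nv, nw = adj[v] | {v}, adj[w] | {w}
    return len(nv & nw) / sqrt(len(nv) * len(nw))


def modularity(adj, comm):
    """Newman-Girvan modularity of the partition comm: {vertex: community id}."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    intra, deg = {}, {}
    for v, nbrs in adj.items():
        c = comm[v]
        deg[c] = deg.get(c, 0) + len(nbrs)
        # Each intra-community edge is counted once from each endpoint.
        intra[c] = intra.get(c, 0) + sum(1 for w in nbrs if comm[w] == c)
    return sum(intra[c] / two_m - (deg[c] / two_m) ** 2 for c in deg)


def agglomerate(adj):
    """Merge along edges by decreasing similarity; return the best-modularity partition."""
    parent = {v: v for v in adj}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    edges = sorted({(min(v, w), max(v, w)) for v in adj for w in adj[v]},
                   key=lambda e: sigma(adj, *e), reverse=True)
    best = {v: v for v in adj}
    best_q = modularity(adj, best)
    for v, w in edges:
        rv, rw = find(v), find(w)
        if rv == rw:
            continue
        parent[rv] = rw
        comm = {u: find(u) for u in adj}
        q = modularity(adj, comm)
        if q > best_q:
            best, best_q = comm, q
    return best, best_q


# Hypothetical toy graph: two 4-cliques joined by the single edge 3-4.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7), (3, 4)]
adj = {v: set() for v in range(8)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
partition, q = agglomerate(adj)
```

The bridging edge 3-4 has the lowest similarity (0.4), so it is merged last, and the two-clique partition is the one retained.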
BMC Bioinformatics | 2007
Mutlu Mete; Xiaowei Xu; Chun-Yang Fan; Gal Shafirstein
Background: Histopathology, one of the most important routine laboratory procedures in pathology, is decisive for the diagnosis of cancer. Experienced histopathologists review histological slides acquired from biopsy specimens to outline malignant areas. Recently, improvements in imaging technologies for histological image analysis have led to the development of virtual histological slides. In this technique, a computerized microscope scans a glass slide and generates virtual slides at a resolution of 0.25 μm/pixel. Because manual recognition of intrinsic cancer areas is time-consuming and error-prone, in this study we develop a novel method for automatically detecting squamous cell carcinoma of the head and neck in high-resolution, wholly scanned histopathological slides.
Results: A density-based clustering algorithm, improved for this study, plays a key role in identifying the corrupted cell nuclei. Using a Support Vector Machine (SVM) classifier, experimental results on seven head and neck slides show that the proposed algorithm performs well, achieving an average classification accuracy of 96%.
Conclusion: Recent advances in imaging technology enable us to investigate cancer tissue at the cellular level. In this study we focus on wholly scanned histopathological slides of head and neck tissues. In the context of computer-aided diagnosis, malignant regions are delineated using a powerful classification algorithm that depends heavily on features extracted with the aid of a newly proposed cell nuclei clustering technique. The preliminary experimental results demonstrate the high accuracy of the proposed method.
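The abstract does not describe how the density-based clustering was improved, so the sketch below shows only standard DBSCAN, the baseline such methods typically build on. It is applied to hypothetical 2-D pixel coordinates, with eps as the neighborhood radius and min_pts as the density threshold. Feature extraction and the SVM step are not shown.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Standard DBSCAN; returns a cluster id per point, or -1 for noise."""
    labels = [None] * len(points)  # None = not yet visited

    def region(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = region(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


# Hypothetical pixel coordinates: two nucleus-like blobs and one stray pixel.
pixels = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (5, 5)]
print(dbscan(pixels, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Each dense blob becomes one candidate nucleus, and isolated pixels are left as noise. The paper's improved variant would replace this baseline.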
advanced information networking and applications | 2013
Venkata Swamy Martha; Weizhong Zhao; Xiaowei Xu
The big data analytics community has adopted MapReduce as a programming model for processing massive data on distributed systems such as a Hadoop cluster. MapReduce has been evolving to improve its performance. We identified skewed workloads among workers in the MapReduce ecosystem, a serious concern for massive data processing. We tackled this workload balancing issue by introducing a hierarchical MapReduce, or h-MapReduce for short. h-MapReduce identifies a heavy task using a properly defined cost function. The heavy task is divided into child tasks that are distributed among the available workers as a new job in the MapReduce framework. Invoking new jobs from within a task poses several challenges, which h-MapReduce addresses. Our experiments demonstrated the performance gain of h-MapReduce over standard MapReduce for data-intensive algorithms. More specifically, the performance gain grows exponentially with the size of the networks. In addition to these exponential performance gains, our investigations also found a negative effect of deploying h-MapReduce when heavy tasks are defined inappropriately, which provides a guideline for applying h-MapReduce effectively.
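To show why splitting heavy tasks helps, here is a small scheduling simulation. It is not h-MapReduce itself, and it uses no Hadoop API. Task costs are given directly, standing in for the paper's unspecified cost function. Any task above a hypothetical threshold is split into equal child tasks, and tasks are assigned greedily, largest first, to the least-loaded worker. The job finishes when the busiest worker does.

```python
import heapq
import math


def makespan(task_costs, n_workers):
    """Greedy largest-first assignment to the least-loaded worker; returns the max load."""
    loads = [0.0] * n_workers
    heapq.heapify(loads)
    for cost in sorted(task_costs, reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + cost)
    return max(loads)


def split_heavy(task_costs, threshold):
    """Replace each task costing more than threshold with equal-cost child tasks."""
    out = []
    for cost in task_costs:
        if cost > threshold:
            k = math.ceil(cost / threshold)
            out.extend([cost / k] * k)
        else:
            out.append(cost)
    return out


# Hypothetical skewed job: one heavy task (e.g. a very high-degree vertex) and five light ones.
costs = [100, 5, 5, 5, 5, 5]
print(makespan(costs, 4))                    # 100.0
print(makespan(split_heavy(costs, 25), 4))   # 35.0
```

The simulation ignores the overhead of launching child jobs. Once that overhead is added, splitting tasks that are not really heavy can make things slower, which matches the negative effect of poorly defined heavy tasks noted above.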