Houping Xiao
University at Buffalo
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Houping Xiao.
knowledge discovery and data mining | 2016
Houping Xiao; Jing Gao; Qi Li; Fenglong Ma; Lu Su; Yunlong Feng; Aidong Zhang
The demand for automatic extraction of true information (i.e., truths) from conflicting multi-source data has soared recently. A variety of truth discovery methods have witnessed great successes via jointly estimating source reliability and truths. All existing truth discovery methods focus on providing a point estimator for each objects truth, but in many real-world applications, confidence interval estimation of truths is more desirable, since confidence interval contains richer information. To address this challenge, in this paper, we propose a novel truth discovery method (ETCIBoot) to construct confidence interval estimates as well as identify truths, where the bootstrapping techniques are nicely integrated into the truth discovery procedure. Due to the properties of bootstrapping, the estimators obtained by ETCIBoot are more accurate and robust compared with the state-of-the-art truth discovery approaches. Theoretically, we prove the asymptotical consistency of the confidence interval obtained by ETCIBoot. Experimentally, we demonstrate that ETCIBoot is not only effective in constructing confidence intervals but also able to obtain better truth estimates.
knowledge discovery and data mining | 2017
Fenglong Ma; Chuishi Meng; Houping Xiao; Qi Li; Jing Gao; Lu Su; Aidong Zhang
Drug side-effects become a worldwide public health concern, which are the fourth leading cause of death in the United States. Pharmaceutical industry has paid tremendous effort to identify drug side-effects during the drug development. However, it is impossible and impractical to identify all of them. Fortunately, drug side-effects can also be reported on heterogeneous platforms (i.e., data sources), such as FDA Adverse Event Reporting System and various online communities. However, existing supervised and semi-supervised approaches are not practical as annotating labels are expensive in the medical field. In this paper, we propose a novel and effective unsupervised model Sifter to automatically discover drug side-effects. Sifter enhances the estimation on drug side-effects by learning from various online platforms and measuring platform-level and user-level quality simultaneously. In this way, Sifter demonstrates better performance compared with existing approaches in terms of correctly identifying drug side-effects. Experimental results on five real-world datasets show that Sifter can significantly improve the performance of identifying side-effects compared with the state-of-the-art approaches.
knowledge discovery and data mining | 2016
Houping Xiao; Jing Gao; Zhaoran Wang; Shiyu Wang; Lu Su; Han Liu
In the information age, people can easily collect information about the same set of entities from multiple sources, among which conflicts are inevitable. This leads to an important task, truth discovery, i.e., to identify true facts (truths) via iteratively updating truths and source reliability. However, the convergence to the truths is never discussed in existing work, and thus there is no theoretical guarantee in the results of these truth discovery approaches. In contrast, in this paper we propose a truth discovery approach with theoretical guarantee. We propose a randomized gaussian mixture model (RGMM) to represent multi-source data, where truths are model parameters. We incorporate source bias which captures its reliability degree into RGMM formulation. The truth discovery task is then modeled as seeking the maximum likelihood estimate (MLE) of the truths. Based on expectation-maximization (EM) techniques, we propose population-based (i.e., on the limit of infinite data) and sample-based (i.e., on a finite set of samples) solutions for the MLE. Theoretically, we prove that both solutions are contractive to an ε-ball around the MLE, under certain conditions. Experimentally, we evaluate our method on both simulated and real-world datasets. Experimental results show that our method achieves high accuracy in identifying truths with convergence guarantee.
international world wide web conferences | 2015
Houping Xiao; Jing Gao; Deepak S. Turaga; Long H. Vu; Alain Biem
In this paper, we investigate the problem of identifying inconsistent hosts in large-scale enterprise networks by mining multiple views of temporal data collected from the networks. The time-varying behavior of hosts is typically consistent across multiple views, and thus hosts that exhibit inconsistent behavior are possible anomalous points to be further investigated. To achieve this goal, we develop an effective approach that extracts common patterns hidden in multiple views and detects inconsistency by measuring the deviation from these common patterns. Specifically, we first apply various anomaly detectors on the raw data and form a three-way tensor (host, time, detector) for each view. We then develop a joint probabilistic tensor factorization method to derive the latent tensor subspace, which captures common time-varying behavior across views. Based on the extracted tensor subspace, an inconsistency score is calculated for each host that measures the deviation from common behavior. We demonstrate the effectiveness of the proposed approach on two enterprise-wide network-based anomaly detection tasks. An enterprise network consists of multiple hosts (servers, desktops, laptops) and each host sends/receives a time-varying number of bytes across network protocols (e.g.,TCP, UDP, ICMP) or send URL requests to DNS under various categories. The inconsistent behavior of a host is often a leading indicator of potential issues (e.g., instability, malicious behavior, or hardware malfunction). We perform experiments on real-world data collected from IBM enterprise networks, and demonstrate that the proposed method can find hosts with inconsistent behavior that are important to cybersecurity applications.
knowledge discovery and data mining | 2017
Houping Xiao; Jing Gao; Long H. Vu; Deepak S. Turaga
Diabetes is a serious disease affecting a large number of people. Although there is no cure for diabetes, it can be managed. Especially, with advances in sensor technology, lots of data may lead to the improvement of diabetes management, if properly mined. However, there usually exists noise or errors in the observed behavioral data which poses challenges in extracting meaningful knowledge. To overcome this challenge, we learn the latent state which represents the patients condition. Such states should be inferred from the behavioral data but unknown a priori. In this paper, we propose a novel framework to capture the trajectory of latent states for patients from behavioral data while exploiting their demographic differences and similarities to other patients. We conduct a hypothesis test to illustrate the importance of the demographic data in diabetes management, and validate that each behavioral feature follows an exponential or a Gaussian distribution. Integrating these aspects, we use a Demographic feature restricted hidden Markov model (DfrHMM) to estimate the trajectory of latent states by integrating the demographic and behavioral data. In DfrHMM, the latent state is mainly determined by the previous state and the demographic features in a nonlinear way. Markov Chain Monte Carlo techniques are used for model parameter estimation. Experiments on synthetic and real datasets show that DfrHMM is effective in diabetes management.
international conference on data mining | 2015
Xiaoyi Li; Xiaowei Jia; Hui Li; Houping Xiao; Jing Gao; Aidong Zhang
Sequential data modeling has received growing interests due to its impact on real world problems. Sequential data is ubiquitous - financial transactions, advertise conversions and disease evolution are examples of sequential data. A long-standing challenge in sequential data modeling is how to capture the strong hidden correlations among complex features in high volumes. The sparsity and skewness in the features extracted from sequential data also add to the complexity of the problem. In this paper, we address these challenges from both discriminative and generative perspectives, and propose novel stochastic learning algorithms to model nonlinear variances from static time frames and their transitions. The proposed model, Deep Recurrent Network (DRN), can be trained in an unsupervised fashion to capture transitions, or in a discriminative fashion to conduct sequential labeling. We analyze the conditional independence of each functional module and tackle the diminishing gradient problem by developing a two-pass training algorithm. Extensive experiments on both simulated and real-world dynamic networks show that the trained DRN outperforms all baselines in the sequential classification task and obtains excellent performance in the regression task.
design automation conference | 2018
Cunxi Yu; Houping Xiao; Giovanni De Micheli
Design flows are the explicit combinations of design transformations, primarily involved in synthesis, placement and routing processes, to accomplish the design of Integrated Circuits (ICs) and System-on-Chip (SoC). Mostly, the flows are developed based on the knowledge of the experts. However, due to the large search space of design flows and the increasing design complexity, developing Intellectual Property (IP)-specific synthesis flows providing high Quality of Result (QoR) is extremely challenging. This work presents a fully autonomous framework that artificially produces design-specific synthesis flows without human guidance and baseline flows, using Convolutional Neural Network (CNN). The demonstrations are made by successfully designing logic synthesis flows of three large scaled designs.
international conference on data mining | 2015
Houping Xiao; Jing Gao
Nowadays, a vast ocean of data from different sources is collected, and numerous applications call for the extraction of actionable insights from multi-source data. One important task is to detect untrustworthy information because such information usually indicates critical, unusual, or suspicious activities. The limitation of existing approaches is that they focus on one single source or ignore temporal information. To tackle the challenge brought by dynamic multi-source data, in this dissertation, we propose a multi-source information trustworthiness analysis framework. We represent the data as high-dimensional tensors and then apply joint tensor factorization techniques to find the common subspace across multiple sources, based on which untrustworthy information is detected. In the future, we will consider unique characteristics of various application domains and develop effective trustworthiness analysis approaches for these applications. We will also develop approaches to speed up the framework for processing large-scale data based on parallel tucker decomposition or low rank representation.
mobile ad hoc networking and computing | 2016
Haiming Jin; Lu Su; Houping Xiao; Klara Nahrstedt
international conference on embedded networked sensor systems | 2015
Chenglin Miao; Wenjun Jiang; Lu Su; Yaliang Li; Suxin Guo; Zhan Qin; Houping Xiao; Jing Gao; Kui Ren