Haohan Wang | Researchain

Publication

Featured researches published by Haohan Wang.

bioinformatics and biomedicine | 2015

Learning structure in gene expression data using deep architectures, with an application to gene clustering

Aman Gupta; Haohan Wang; Madhavi Ganapathiraju

Genes play a central role in all biological processes. DNA microarray technology has made it possible to study the expression behavior of thousands of genes in one go. Often, gene expression data is used to generate features for supervised and unsupervised learning tasks. At the same time, advances in the field of deep learning have made available a plethora of architectures. In this paper, we use deep architectures pre-trained in an unsupervised manner using denoising autoencoders as a preprocessing step for a popular unsupervised learning task. Denoising autoencoders (DA) can be used to learn a compact representation of input, and have been used to generate features for further supervised learning tasks. We propose that our deep architectures can be treated as empirical versions of Deep Belief Networks (DBNs). We use our deep architectures to regenerate gene expression time series data for two different data sets. We test our hypothesis on two popular datasets for the unsupervised learning task of clustering and find promising improvements in performance.

bioinformatics and biomedicine | 2016

Multiple confounders correction with regularized linear mixed effect models, with application in biological processes

Haohan Wang; Jingkang Yang

In this paper, we inspect the performance of regularized linear mixed effect models, an extension of the linear mixed effect model, when multiple confounding factors coexist. We first review the parameter estimation algorithms, then introduce three different methods for correcting multiple confounding factors: concatenation, sequence, and interpolation. We then investigate performance on a variable selection task and a predictive task using three data sets: a synthetic data set, a semi-empirical synthetic data set based on genome sequences, and a brain wave data set connected to confused mental states. Our results suggest that the sequence method performs best when different confounders contribute equally to the response variable. On the other hand, when the confounders affect the response variable unevenly, the results depend mainly on how well the major confounder is corrected.

international conference on multimedia and expo | 2017

Select-additive learning: Improving generalization in multimodal sentiment analysis

Haohan Wang; Aaksha Meghawat; Louis-Philippe Morency; Eric P. Xing

Multimodal sentiment analysis is drawing an increasing amount of attention these days. It enables mining of opinions in video reviews which are now available aplenty on online platforms. However, multimodal sentiment analysis has only a few high-quality data sets annotated for training machine learning algorithms. These limited resources restrict the generalizability of models, where, for example, the unique characteristics of a few speakers (e.g., wearing glasses) may become a confounding factor for the sentiment classification task. In this paper, we propose a Select-Additive Learning (SAL) procedure that improves the generalizability of trained neural networks for multimodal sentiment analysis. In our experiments, we show that our SAL approach improves prediction accuracy significantly in all three modalities (verbal, acoustic, visual), as well as in their fusion. Our results show that SAL, even when trained on one dataset, achieves good generalization across two new test datasets.

Applied Mechanics and Materials | 2014

Localized model to segmentally estimate miles per gallon (MPG) for equipment engines

Jiu Lin Luo; Hao Jing Luo; Ai Min Li; Haohan Wang

In this paper, we built a localized regression model to estimate the miles per gallon (MPG) characteristic of equipment engines based on a series of physical features of the engine. First, we examined these parameters statistically to build a basic understanding of the data we collected. Then, with the belief that engines with similar characteristics will perform similarly, we proposed a novel localized model, with a novel optimal-function-based EM algorithm and a novel self-adjusted optimal clustering algorithm, to estimate MPG from other fully studied engines with similar physical features.

International Journal of Data Mining and Bioinformatics | 2017

Extracting compact representation of knowledge from gene expression data for protein-protein interaction

Haohan Wang; Aman Gupta; Ming Xu

DNA microarrays help measure the expression levels of thousands of genes concurrently. A major challenge is to extract biologically relevant information and knowledge from massive amounts of microarray data. In this paper, we explore learning a compact representation of gene expression profiles by using a multi-task neural network model, so that further analyses can be carried out more efficiently on the data. The proposed network is trained on prediction tasks for Protein-Protein Interactions (PPIs) and Gene Ontology (GO) similarities, together with geometrical constraints, while simultaneously learning a high-level representation of gene expression data. We argue that deep networks can extract more information from expression data than standard statistical models. We tested the utility of our method by comparing its performance with well-known feature extraction and dimensionality reduction methods on the task of PPI prediction, and found the results to be promising.

computational intelligence | 2017

Learning functional embedding of genes governed by pair-wised labels

Jingjun Cao; Zhengli Wu; Wenting Ye; Haohan Wang

In this work, we build a deep neural network architecture that learns a compact numerical representation of genes, supervised by multiple sources of pair-wise information, including Protein-Protein Interaction information and Gene Ontology information. We introduce a new network architecture that can process gene expression data and generate representations of individual genes while governed by pair-wise information. The learnt representation is intended for further use in bioinformatics research on related tasks, including tasks beyond the information sources from which the embedding was learnt. In this paper, we evaluate the representation on a Protein-Protein Interaction task, where it outperforms representations learnt by traditional dimension reduction and feature selection methods.

international conference on software engineering | 2016

Predicting usefulness of Yelp reviews with localized linear regression models

Ruhui Shen; Jialiang Shen; Yuhong Li; Haohan Wang

Many websites such as Yelp provide a platform for users to write reviews about places they have visited, but not all reviews are equally useful. It generally takes from several weeks to months for a review to receive feedback about its “usefulness” from the online community, so there is a need to predict the “usefulness” of a review automatically. In this paper, we try to answer the specific question “How many ‘useful’ votes will a Yelp review receive?” by using bag-of-words, linguistic, geographical, statistical, popularity and other qualitative features extracted from the user, business and review information provided by Yelp. We use state-of-the-art machine learning algorithms for regression to predict the required numeric value of the ‘usefulness’ of a review. We gained a further performance improvement by introducing a batch mode localized weighted regression model. This localized regression approach resulted in an RMSLE of 0.47769, which is better than traditional methods.
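
For readers unfamiliar with the metric quoted in the abstract above, RMSLE (root mean squared logarithmic error) can be sketched as follows. This is a minimal illustration using the standard definition of the metric, not the authors' evaluation code; the function name `rmsle` and the sample vote counts are hypothetical.

```python
import math


def rmsle(predicted, actual):
    """Root mean squared logarithmic error (standard definition).

    Both inputs are equal-length sequences of non-negative numbers,
    for example predicted and observed 'useful' vote counts.
    """
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # log1p(x) computes log(1 + x), so a count of zero is handled safely.
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for p, a in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


if __name__ == "__main__":
    # Hypothetical vote counts, for illustration only.
    print(rmsle([2.0, 0.5, 7.0], [3, 0, 5]))
```

Because the error is taken on a logarithmic scale, RMSLE penalizes relative rather than absolute differences, which suits heavy-tailed targets such as vote counts.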

international conference on big data | 2018

GLDA-FP: Gaussian LDA Model for Forward Prediction

Yunpeng Xiao; Liangyun Liu; Ming Xu; Haohan Wang; Yanbing Liu

In social networks, information propagation is affected by diverse factors. In this work, we study the formation of forward behavior, map it onto multidimensional driving mechanisms, and apply behavioral and structural features to forward prediction. First, by considering the effects of behavioral interest, user activity and network influence, we propose three driving mechanisms: interest-driven, habit-driven and structure-driven. Second, taking advantage of the ability of the Latent Dirichlet Allocation (LDA) model to deal with polysemy and synonymy, we improve the traditional text modeling method with a Gaussian distribution and apply it to modeling user interest, activity and influence. In this way, the user topic distribution for each dimension can be obtained regardless of whether the word is discrete or continuous. Moreover, the model can be extended using a pre-discretizing method, which helps LDA detect topic evolution automatically. By introducing time information, we can dynamically monitor user activity and mine hidden behavioral habits. Finally, a novel model, Gaussian LDA, is proposed for forward prediction. The experimental results indicate that the model not only mines users' latent interests, but also improves forward prediction performance effectively.

bioRxiv | 2018

Removing Confounding Factors Associated Weights in Deep Neural Networks Improves the Prediction Accuracy for Healthcare Applications.

Haohan Wang; Zhenglin Wu; Eric P. Xing

The proliferation of healthcare data has brought opportunities to apply data-driven approaches, such as machine learning methods, to assist diagnosis. Recently, many deep learning methods have shown impressive success in predicting disease status from raw input data. However, the “black-box” nature of deep learning and the high-reliability requirement of biomedical applications have created new challenges regarding the existence of confounding factors. In this paper, with a brief argument that inappropriate handling of confounding factors will lead to models’ sub-optimal performance in real-world applications, we present an efficient method that can remove the influence of confounding factors such as age or gender to improve the across-cohort prediction accuracy of neural networks. One distinct advantage of our method is that it requires only minimal changes to the baseline model’s architecture, so it can be plugged into most existing neural networks. We conduct experiments across CT-scan, MRA, and EEG brain wave data with convolutional neural networks and LSTM to verify the efficiency of our method.

bioRxiv | 2018

Automatic Human-like Mining and Constructing Reliable Genetic Association Database with Deep Reinforcement Learning

Haohan Wang; Xiang Liu; Yifeng Tao; Wenting Ye; Qiao Jin; William W. Cohen; Eric P. Xing

The increasing amount of scientific literature in biological and biomedical science research has created a challenge in the continuous and reliable curation of the latest knowledge discovered, and automatic biomedical text-mining has been one of the answers to this challenge. In this paper, we aim to further improve the reliability of biomedical text-mining by training the system to directly simulate human behaviors such as querying PubMed, selecting articles from the query results, and reading the selected articles for knowledge. We combine the efficiency of biomedical text-mining, the flexibility of deep reinforcement learning, and the massive amount of knowledge collected in UMLS into an integrative artificial intelligence reader that can automatically identify authentic articles and effectively acquire the knowledge conveyed in them. We construct a system whose current primary task is to build a genetic association database between genes and complex human traits. Our contributions in this paper are three-fold: 1) We propose to improve the reliability of text-mining by building a system that can directly simulate the behavior of a researcher, and we develop corresponding methods, such as Bi-directional LSTM for text mining and Deep Q-Network for organizing behaviors. 2) We demonstrate the effectiveness of our system with an example of constructing a genetic association database. 3) We release our implementation as a generic framework for researchers in the community to conveniently construct other databases.
