Hanjun Dai
Georgia Institute of Technology
Publication
Featured research published by Hanjun Dai.
Knowledge Discovery and Data Mining | 2016
Nan Du; Hanjun Dai; Rakshit Trivedi; Utkarsh Upadhyay; Manuel Gomez-Rodriguez; Le Song
Large volumes of event data are becoming increasingly available in a wide variety of applications, such as healthcare analytics, smart cities and social network analysis. The precise time interval or the exact distance between two events carries a great deal of information about the dynamics of the underlying systems. These characteristics make such data fundamentally different from independently and identically distributed data and time-series data where time and space are treated as indexes rather than random variables. Marked temporal point processes are the mathematical framework for modeling event data with covariates. However, typical point process models often make strong assumptions about the generative processes of the event data, which may or may not reflect the reality, and the specifically fixed parametric assumptions also have restricted the expressive power of the respective processes. Can we obtain a more expressive model of marked temporal point processes? How can we learn such a model from massive data? In this paper, we propose the Recurrent Marked Temporal Point Process (RMTPP) to simultaneously model the event timings and the markers. The key idea of our approach is to view the intensity function of a temporal point process as a nonlinear function of the history, and use a recurrent neural network to automatically learn a representation of influences from the event history. We develop an efficient stochastic gradient algorithm for learning the model parameters which can readily scale up to millions of events. Using both synthetic and real world datasets, we show that, in the case where the true models have parametric specifications, RMTPP can learn the dynamics of such models without the need to know the actual parametric forms; and in the case where the true models are unknown, RMTPP can also learn the dynamics and achieve better predictive performance than other parametric alternatives based on particular prior assumptions.
ACM Transactions on Information Systems | 2015
Qing Cui; Bin Gao; Jiang Bian; Siyu Qiu; Hanjun Dai; Tie-Yan Liu
Neural network techniques are widely applied to obtain high-quality distributed representations of words (i.e., word embeddings) to address text mining, information retrieval, and natural language processing tasks. Most recent efforts have proposed several efficient methods to learn word embeddings from context such that they can encode both semantic and syntactic relationships between words. However, it is quite challenging to handle unseen or rare words with insufficient context. Inspired by the study on the word recognition process in cognitive psychology, in this article, we propose to take advantage of seemingly less obvious but essentially important morphological knowledge to address these challenges. In particular, we introduce a novel neural network architecture called KNET that leverages both words’ contextual information and morphological knowledge to learn word embeddings. Meanwhile, this new learning architecture is also able to benefit from noisy knowledge and balance between contextual information and morphological knowledge. Experiments on an analogical reasoning task and a word similarity task both demonstrate that the proposed KNET framework can greatly enhance the effectiveness of word embeddings.
Bioinformatics | 2017
Hanjun Dai; Ramzan Umarov; Hiroyuki Kuwahara; Yu Li; Le Song; Xin Gao
Motivation: An accurate characterization of transcription factor (TF)‐DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. While recent advances in biotechnology have brought the opportunity for building binding affinity prediction methods, the accurate characterization of TF‐DNA binding affinity landscape still remains a challenging problem. Results: Here we propose a novel sequence embedding approach for modeling the transcription factor binding affinity landscape. Our method represents DNA binding sequences as a hidden Markov model which captures both position specific information and long‐range dependency in the sequence. A cornerstone of our method is a novel message passing‐like embedding algorithm, called Sequence2Vec, which maps these hidden Markov models into a common nonlinear feature space and uses these embedded features to build a predictive model. Our method is a novel combination of the strength of probabilistic graphical models, feature space embedding and deep learning. We conducted comprehensive experiments on over 90 large‐scale TF‐DNA datasets which were measured by different high‐throughput experimental technologies. Sequence2Vec outperforms alternative machine learning methods as well as the state‐of‐the‐art binding affinity prediction methods. Availability and implementation: Our program is freely available at https://github.com/ramzan1990/sequence2vec. Contact: [email protected] or [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.
National Conference on Artificial Intelligence | 2014
Yuyu Zhang; Hanjun Dai; Chang Xu; Jun Feng; Taifeng Wang; Jiang Bian; Bin Wang; Tie-Yan Liu
International Conference on Computational Linguistics | 2014
Fei Tian; Hanjun Dai; Jiang Bian; Bin Gao; Rui Zhang; Enhong Chen; Tie-Yan Liu
International Conference on Machine Learning | 2016
Hanjun Dai; Bo Dai; Le Song
Neural Information Processing Systems | 2017
Elias B. Khalil; Hanjun Dai; Yuyu Zhang; Bistra Dilkina; Le Song
International Conference on Artificial Intelligence and Statistics | 2016
Bo Dai; Niao He; Hanjun Dai; Le Song
Neural Information Processing Systems | 2015
Shuang Li; Yao Xie; Hanjun Dai; Le Song
National Conference on Artificial Intelligence | 2018
Yuyu Zhang; Hanjun Dai; Zornitsa Kozareva; Alexander J. Smola; Le Song