Hanjun Dai

Georgia Institute of Technology

Publication

Featured research published by Hanjun Dai.

Knowledge Discovery and Data Mining | 2016

Recurrent Marked Temporal Point Processes: Embedding Event History to Vector

Nan Du; Hanjun Dai; Rakshit Trivedi; Utkarsh Upadhyay; Manuel Gomez-Rodriguez; Le Song

Large volumes of event data are becoming increasingly available in a wide variety of applications, such as healthcare analytics, smart cities and social network analysis. The precise time interval or the exact distance between two events carries a great deal of information about the dynamics of the underlying systems. These characteristics make such data fundamentally different from independently and identically distributed data and time-series data where time and space are treated as indexes rather than random variables. Marked temporal point processes are the mathematical framework for modeling event data with covariates. However, typical point process models often make strong assumptions about the generative processes of the event data, which may or may not reflect reality, and the fixed parametric assumptions also restrict the expressive power of the respective processes. Can we obtain a more expressive model of marked temporal point processes? How can we learn such a model from massive data? In this paper, we propose the Recurrent Marked Temporal Point Process (RMTPP) to simultaneously model the event timings and the markers. The key idea of our approach is to view the intensity function of a temporal point process as a nonlinear function of the history, and use a recurrent neural network to automatically learn a representation of influences from the event history. We develop an efficient stochastic gradient algorithm for learning the model parameters which can readily scale up to millions of events. Using both synthetic and real-world datasets, we show that, in the case where the true models have parametric specifications, RMTPP can learn the dynamics of such models without the need to know the actual parametric forms; and in the case where the true models are unknown, RMTPP can also learn the dynamics and achieve better predictive performance than other parametric alternatives based on particular prior assumptions.
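The abstract above does not state a functional form for the intensity, so the following is only a minimal sketch of the key idea, not the authors' implementation. It assumes that the log-intensity is linear in a recurrent hidden state and in the time elapsed since the last event, and it scores only the event timings (a marker term, for example a softmax over the hidden state, is omitted). All names (`rnn_step`, `log_density`, `timing_log_likelihood`, and every parameter) are hypothetical, and the parameter values are arbitrary.

```python
import math

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def rnn_step(h_prev, marker, gap, W_h, W_y, w_t, bias):
    """One tanh recurrent update from the previous state, a one-hot marker,
    and the gap since the previous event (assumed architecture)."""
    return [
        math.tanh(dot(W_h[i], h_prev) + dot(W_y[i], marker) + w_t[i] * gap + bias[i])
        for i in range(len(h_prev))
    ]

def log_intensity(h, gap, v, w, b):
    """Assumed form: log lambda(gap | h) = v.h + w * gap + b."""
    return dot(v, h) + w * gap + b

def log_density(h, gap, v, w, b):
    """Log-density of the next gap: log-intensity minus the intensity
    integrated over [0, gap]. Requires w != 0."""
    base = dot(v, h) + b
    integrated = (math.exp(base + w * gap) - math.exp(base)) / w
    return base + w * gap - integrated

def timing_log_likelihood(events, p):
    """Sum the log-density of each gap given the history seen so far."""
    h = [0.0] * len(p["bias"])
    total = 0.0
    for gap, marker in events:
        total += log_density(h, gap, p["v"], p["w"], p["b"])
        h = rnn_step(h, marker, gap, p["W_h"], p["W_y"], p["w_t"], p["bias"])
    return total

# Arbitrary illustrative parameters: hidden size 2, two marker types.
params = {
    "W_h": [[0.1, -0.2], [0.0, 0.3]],
    "W_y": [[0.5, -0.5], [0.2, 0.1]],
    "w_t": [0.1, -0.1],
    "bias": [0.0, 0.0],
    "v": [0.4, -0.3],
    "w": 0.5,
    "b": -0.2,
}
# Each event is (gap since previous event, one-hot marker).
events = [(0.5, [1, 0]), (1.2, [0, 1]), (0.3, [1, 0])]
print(timing_log_likelihood(events, params))
```

In a training loop, the negative of this log-likelihood would be the loss minimized by stochastic gradient steps over mini-batches of sequences; an automatic differentiation library would normally replace the hand-written arithmetic.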

ACM Transactions on Information Systems | 2015

KNET: A General Framework for Learning Word Embedding Using Morphological Knowledge

Qing Cui; Bin Gao; Jiang Bian; Siyu Qiu; Hanjun Dai; Tie-Yan Liu

Neural network techniques are widely applied to obtain high-quality distributed representations of words (i.e., word embeddings) to address text mining, information retrieval, and natural language processing tasks. Most recent efforts have proposed several efficient methods to learn word embeddings from context such that they can encode both semantic and syntactic relationships between words. However, it is quite challenging to handle unseen or rare words with insufficient context. Inspired by the study on the word recognition process in cognitive psychology, in this article, we propose to take advantage of seemingly less obvious but essentially important morphological knowledge to address these challenges. In particular, we introduce a novel neural network architecture called KNET that leverages both words’ contextual information and morphological knowledge to learn word embeddings. Meanwhile, this new learning architecture is also able to benefit from noisy knowledge and balance between contextual information and morphological knowledge. Experiments on an analogical reasoning task and a word similarity task both demonstrate that the proposed KNET framework can greatly enhance the effectiveness of word embeddings.

Bioinformatics | 2017

Sequence2Vec: a novel embedding approach for modeling transcription factor binding affinity landscape

Hanjun Dai; Ramzan Umarov; Hiroyuki Kuwahara; Yu Li; Le Song; Xin Gao

Motivation: An accurate characterization of transcription factor (TF)-DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. While recent advances in biotechnology have brought the opportunity for building binding affinity prediction methods, the accurate characterization of TF-DNA binding affinity landscape still remains a challenging problem.

Results: Here we propose a novel sequence embedding approach for modeling the transcription factor binding affinity landscape. Our method represents DNA binding sequences as a hidden Markov model which captures both position specific information and long-range dependency in the sequence. A cornerstone of our method is a novel message passing-like embedding algorithm, called Sequence2Vec, which maps these hidden Markov models into a common nonlinear feature space and uses these embedded features to build a predictive model. Our method is a novel combination of the strength of probabilistic graphical models, feature space embedding and deep learning. We conducted comprehensive experiments on over 90 large-scale TF-DNA datasets which were measured by different high-throughput experimental technologies. Sequence2Vec outperforms alternative machine learning methods as well as the state-of-the-art binding affinity prediction methods.

Availability and implementation: Our program is freely available at https://github.com/ramzan1990/sequence2vec.

Contact: [email protected] or [email protected].

Supplementary information: Supplementary data are available at Bioinformatics online.

National Conference on Artificial Intelligence | 2014