Zhi Jin
Peking University
Publications
Featured research published by Zhi Jin.
empirical methods in natural language processing | 2015
Yan Xu; Lili Mou; Ge Li; Yunchuan Chen; Hao Peng; Zhi Jin
Relation classification is an important research arena in the field of natural language processing (NLP). In this paper, we present SDP-LSTM, a novel neural network that classifies the relation between two entities in a sentence. Our architecture leverages the shortest dependency path (SDP) between the two entities; multichannel recurrent neural networks with long short-term memory (LSTM) units pick up heterogeneous information along the SDP. Our proposed model has several distinct features: (1) The shortest dependency paths retain the information most relevant to relation classification while eliminating irrelevant words in the sentence. (2) The multichannel LSTM networks allow effective information integration from heterogeneous sources over the dependency paths. (3) A customized dropout strategy regularizes the neural network to alleviate overfitting. We test our model on the SemEval 2010 relation classification task and achieve an F1-score of 83.7%, higher than competing methods in the literature.
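A minimal sketch of the multichannel idea is given below in PyTorch; the channel count, dimensions, class names, and plain dropout rate are illustrative assumptions, and the paper's actual channels (words, POS tags, grammatical relations, WordNet hypernyms) and customized dropout strategy are not reproduced here.

# Sketch: multichannel LSTMs along the shortest dependency path (assumptions noted above).
import torch
import torch.nn as nn

class SDPChannel(nn.Module):
    def __init__(self, vocab_size, emb_dim=50, hidden_dim=100):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, ids):                 # ids: (batch, path_len) token indices along the SDP
        out, _ = self.lstm(self.emb(ids))   # (batch, path_len, hidden_dim)
        return out.max(dim=1).values        # max pooling over the path

class SDPLSTM(nn.Module):
    def __init__(self, channel_vocab_sizes, num_relations=19):  # e.g., 19 classes in SemEval-2010 Task 8
        super().__init__()
        self.channels = nn.ModuleList([SDPChannel(v) for v in channel_vocab_sizes])
        self.drop = nn.Dropout(0.5)         # stand-in for the paper's customized dropout
        self.cls = nn.Linear(100 * len(channel_vocab_sizes), num_relations)

    def forward(self, channel_ids):         # list of (batch, path_len) tensors, one per channel
        pooled = [ch(x) for ch, x in zip(self.channels, channel_ids)]
        return self.cls(self.drop(torch.cat(pooled, dim=-1)))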
meeting of the association for computational linguistics | 2016
Lili Mou; Rui Men; Ge Li; Yan Xu; Lu Zhang; Rui Yan; Zhi Jin
In this paper, we propose the TBCNN-pair model to recognize entailment and contradiction between two sentences. In our model, a tree-based convolutional neural network (TBCNN) captures sentence-level semantics; heuristic matching layers such as concatenation and element-wise product/difference then combine the information from the individual sentences. Experimental results show that our model outperforms existing sentence encoding-based approaches by a large margin.
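A rough sketch of the heuristic matching step follows, assuming the two sentence vectors have already been produced by a sentence encoder (the paper uses TBCNN); the dimensions and the small MLP classifier are illustrative assumptions.

# Sketch: heuristic matching of two sentence vectors (encoder omitted; see assumptions above).
import torch
import torch.nn as nn

class HeuristicMatcher(nn.Module):
    def __init__(self, sent_dim=300, hidden=300, num_labels=3):  # entailment/contradiction/neutral
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4 * sent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_labels),
        )

    def forward(self, h1, h2):              # h1, h2: (batch, sent_dim) sentence embeddings
        feats = torch.cat([h1, h2, h1 * h2, h1 - h2], dim=-1)   # concatenation, product, difference
        return self.mlp(feats)              # logits over the three labels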
empirical methods in natural language processing | 2015
Lili Mou; Hao Peng; Ge Li; Yan Xu; Lu Zhang; Zhi Jin
This paper proposes a tree-based convolutional neural network (TBCNN) for discriminative sentence modeling. Our models leverage either constituency trees or dependency trees of sentences. The tree-based convolution process extracts sentences' structural features, and these features are aggregated by max pooling. This architecture allows short propagation paths between the output layer and the underlying feature detectors, which enables effective structural feature learning and extraction. We evaluate our models on two tasks: sentiment analysis and question classification. In both experiments, TBCNN outperforms previous state-of-the-art results, including existing neural networks and dedicated feature/rule engineering. We also make efforts to visualize the tree-based convolution process, shedding light on how our models work.
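A simplified sketch of tree-based convolution over a dependency tree is given below, assuming each node already carries an embedding and using a single parent/child weight pair; the paper's convolution windows and weight interpolation are richer than this toy version.

# Sketch: one tree-convolution layer followed by max pooling (simplified; see assumptions above).
import numpy as np

def tree_convolution(node_vecs, children, W_parent, W_child, b):
    """node_vecs: (n_nodes, d) embeddings; children[i]: child indices of node i."""
    feats = []
    for i, vec in enumerate(node_vecs):
        window = W_parent @ vec + b
        for c in children[i]:
            window += W_child @ node_vecs[c]        # sum over the node's children
        feats.append(np.tanh(window))
    return np.max(np.stack(feats), axis=0)          # max pooling over all convolution windows

d, k = 50, 100
rng = np.random.default_rng(0)
W_p, W_c, b = rng.normal(size=(k, d)), rng.normal(size=(k, d)), np.zeros(k)
vecs = rng.normal(size=(5, d))                      # a toy 5-node dependency tree
kids = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}
sentence_vec = tree_convolution(vecs, kids, W_p, W_c, b)   # (k,) structural feature vector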
knowledge science, engineering and management | 2015
Hao Peng; Lili Mou; Ge Li; Yuxuan Liu; Lu Zhang; Zhi Jin
Deep learning has made significant breakthroughs in various fields of artificial intelligence. However, it is still virtually impossible to use deep learning to analyze programs, since deep architectures cannot be trained effectively with pure back-propagation. In this pioneering paper, we propose the coding criterion to build program vector representations, which are the premise of deep learning for program analysis. We evaluate the learned vector representations both qualitatively and quantitatively. We conclude, based on the experiments, that the coding criterion is successful in building program representations. To evaluate whether deep learning is beneficial for program analysis, we feed the representations to deep neural networks and achieve higher accuracy on the program classification task than shallow methods. This result confirms the feasibility of using deep learning to analyze programs.
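A toy sketch of the coding criterion on a single AST node follows, assuming two position-interpolated weight matrices and randomly generated vectors; the paper additionally weights children by subtree size and trains with negative sampling, as noted in the comments.

# Sketch: coding criterion for one AST parent/children pair (toy version; see assumptions above).
import numpy as np

rng = np.random.default_rng(0)
d = 30
W_left, W_right, b = rng.normal(size=(d, d)), rng.normal(size=(d, d)), np.zeros(d)

def coding_distance(parent_vec, child_vecs, W_left, W_right, b):
    """Distance between a parent's vector and the code built from its children."""
    n = len(child_vecs)
    code = b.copy()
    for i, c in enumerate(child_vecs):
        alpha = i / (n - 1) if n > 1 else 0.5        # position coefficient in [0, 1]
        W = (1 - alpha) * W_left + alpha * W_right   # interpolate the two weight matrices
        code += W @ c
    return np.sum((parent_vec - np.tanh(code)) ** 2)

# One "positive" AST example: a parent node with three children.
parent, kids = rng.normal(size=d), [rng.normal(size=d) for _ in range(3)]
dist = coding_distance(parent, kids, W_left, W_right, b)
# Training would minimize max(0, margin + d(positive) - d(negative)) over many AST nodes,
# where a negative example replaces one symbol's vector with a random one.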
empirical methods in natural language processing | 2016
Lili Mou; Zhao Meng; Rui Yan; Ge Li; Yan Xu; Lu Zhang; Zhi Jin
Transfer learning aims to make use of valuable knowledge in a source domain to help model performance in a target domain. It is particularly important for neural networks, which are prone to overfitting. In some fields, such as image processing, many studies have shown the effectiveness of neural network-based transfer learning. For neural NLP, however, existing studies have only casually applied transfer learning, and conclusions are inconsistent. In this paper, we conduct systematic case studies and provide an illuminating picture of the transferability of neural networks in NLP.
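A minimal sketch of parameter-based transfer (initialize, then fine-tune) is shown below, assuming two small classifiers that share an embedding and a hidden layer; which layers to copy, and whether to freeze or fine-tune them, are exactly the choices such case studies examine.

# Sketch: transfer by parameter initialization (illustrative; see assumptions above).
import torch.nn as nn

def build_model(vocab=10000, emb=100, hid=200, num_classes=2):
    return nn.Sequential(nn.EmbeddingBag(vocab, emb), nn.Linear(emb, hid),
                         nn.ReLU(), nn.Linear(hid, num_classes))

source = build_model(num_classes=5)     # assume this model was trained on the source task
target = build_model(num_classes=2)

# Copy the embedding and first hidden layer; keep the output layer task-specific.
target[0].load_state_dict(source[0].state_dict())
target[1].load_state_dict(source[1].state_dict())
for p in target[0].parameters():
    p.requires_grad = False             # e.g., freeze embeddings; fine-tuning them is the alternative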
empirical methods in natural language processing | 2015
Hao Peng; Lili Mou; Ge Li; Yunchuan Chen; Yangyang Lu; Zhi Jin
This paper aims to compare different regularization strategies to address a common phenomenon, severe overfitting, in embedding-based neural networks for NLP. We chose two widely studied neural models and tasks as our testbed. We tried several frequently applied or newly proposed regularization strategies, including penalizing weights (embeddings excluded), penalizing embeddings, re-embedding words, and dropout. We also emphasized incremental hyperparameter tuning and the combination of different regularization methods. The results provide a picture of how to tune hyperparameters for neural NLP models.
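A small illustrative sketch of two of the compared strategies follows, assuming a toy PyTorch classifier: an L2 weight penalty that excludes the embeddings (via optimizer parameter groups) and dropout on the hidden layer; the hyperparameter values are placeholders.

# Sketch: weight penalty excluding embeddings, plus dropout (see assumptions above).
import torch.nn as nn
import torch.optim as optim

class SmallClassifier(nn.Module):
    def __init__(self, vocab=10000, emb=50, hid=100, num_classes=2):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab, emb)
        self.hidden = nn.Linear(emb, hid)
        self.drop = nn.Dropout(0.5)                       # dropout strategy
        self.out = nn.Linear(hid, num_classes)

    def forward(self, ids, offsets):
        h = self.drop(self.hidden(self.emb(ids, offsets)).relu())
        return self.out(h)

model = SmallClassifier()
emb_params = list(model.emb.parameters())
other_params = [p for n, p in model.named_parameters() if not n.startswith("emb")]
optimizer = optim.SGD([
    {"params": other_params, "weight_decay": 1e-4},       # penalize weights, embeddings excluded
    {"params": emb_params, "weight_decay": 0.0},          # or a separate penalty, for comparison
], lr=0.1)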
conference on information and knowledge management | 2016
Lili Mou; Ran Jia; Yan Xu; Ge Li; Lu Zhang; Zhi Jin
Distilling knowledge from a well-trained, cumbersome network into a small one has recently become a new research topic, as lightweight neural networks with high performance are in particular demand in various resource-restricted systems. This paper addresses the problem of distilling word embeddings for NLP tasks. We propose an encoding approach to distill task-specific knowledge from a set of high-dimensional embeddings, so that we can reduce model complexity by a large margin while retaining high accuracy, achieving a good compromise between efficiency and performance. Experiments reveal that distilling knowledge from cumbersome embeddings is better than directly training neural networks with small embeddings.
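A sketch of the encoding idea is given below, assuming hypothetical 300-dimensional pretrained embeddings distilled to 50 dimensions through a trainable linear layer; after training with the task's loss, the small table can be exported and the large one discarded.

# Sketch: distilling large embeddings into small task-specific ones via an encoding layer.
import torch
import torch.nn as nn

class DistilledEmbedding(nn.Module):
    def __init__(self, big_embedding_matrix, small_dim=50):
        super().__init__()
        self.big = nn.Embedding.from_pretrained(big_embedding_matrix, freeze=True)  # cumbersome, fixed
        self.encode = nn.Linear(big_embedding_matrix.size(1), small_dim)

    def forward(self, ids):
        return torch.tanh(self.encode(self.big(ids)))    # low-dimensional, task-specific vectors

    def export_small_table(self):
        # After task training, precompute the small table and drop the big embeddings.
        with torch.no_grad():
            return torch.tanh(self.encode(self.big.weight))

pretrained = torch.randn(10000, 300)                     # stand-in for real 300-d embeddings
layer = DistilledEmbedding(pretrained, small_dim=50)
small_table = layer.export_small_table()                 # (10000, 50)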
meeting of the association for computational linguistics | 2016
Yunchuan Chen; Lili Mou; Yan Xu; Ge Li; Zhi Jin
Neural networks are among the state-of-the-art techniques for language modeling. Existing neural language models typically map discrete words to distributed, dense vector representations. After the preceding context words are processed by hidden layers, an output layer estimates the probability of the next word. Such approaches are time- and memory-intensive because of the large numbers of parameters in the word embeddings and the output layer. In this paper, we propose to compress neural language models with sparse word representations. In our experiments, the number of parameters in our model increases only marginally as the vocabulary grows. Moreover, our approach not only reduces the parameter space to a large extent, but also improves performance in terms of perplexity.
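A numerical sketch of the core idea follows, assuming a hypothetical base of 8,000 common words: a rare word's embedding is a sparse combination of base embeddings, so only a handful of coefficients per word are added as the vocabulary grows.

# Sketch: rare-word embeddings as sparse combinations of a common-word base (see assumptions above).
import numpy as np

rng = np.random.default_rng(0)
base_vocab, dim = 8000, 100
B = rng.normal(size=(base_vocab, dim))     # embeddings of the common (base) words

def rare_word_embedding(sparse_coeffs):
    """sparse_coeffs: dict {base_word_index: weight}, typically only a few entries."""
    vec = np.zeros(dim)
    for idx, w in sparse_coeffs.items():
        vec += w * B[idx]                  # combine a handful of base embeddings
    return vec

# A rare word stored with 4 coefficients instead of a full 100-d vector:
emb = rare_word_embedding({12: 0.6, 503: 0.3, 777: 0.08, 4021: 0.02})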
knowledge science, engineering and management | 2016
Zhao Meng; Lili Mou; Ge Li; Zhi Jin
knowledge science, engineering and management | 2016
Yangyang Lu; Ge Li; Rui Miao; Zhi Jin