Xuanjing Huang
Fudan University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Xuanjing Huang.
empirical methods in natural language processing | 2015
Xinchi Chen; Xipeng Qiu; Chenxi Zhu; Pengfei Liu; Xuanjing Huang
Currently most of state-of-the-art methods for Chinese word segmentation are based on supervised learning, whose features aremostly extracted from a local context. Thesemethods cannot utilize the long distance information which is also crucial for word segmentation. In this paper, we propose a novel neural network model for Chinese word segmentation, which adopts the long short-term memory (LSTM) neural network to keep the previous important information inmemory cell and avoids the limit of window size of local context. Experiments on PKU, MSRA and CTB6 benchmark datasets show that our model outperforms the previous neural network models and state-of-the-art methods.
international acm sigir conference on research and development in information retrieval | 2010
Qi Zhang; Yue Zhang; Haomin Yu; Xuanjing Huang
With the ever-increasing growth of the Internet, numerous copies of documents become serious problem for search engine, opinion mining and many other web applications. Since partial-duplicates only contain a small piece of text taken from other sources and most existing near-duplicate detection approaches focus on document level, partial duplicates can not be dealt with well. In this paper, we propose a novel algorithm to realize the partial-duplicate detection task. Besides the similarities between documents, our proposed algorithm can simultaneously locate the duplicated parts. The main idea is to divide the partial-duplicate detection task into two subtasks: sentence level near-duplicate detection and sequence matching. For evaluation, we compare the proposed method with other approaches on both English and Chinese web collections. Experimental results appear to support that our proposed method is effectively and efficiently to detect both partial-duplicates on large web collections.
international joint conference on natural language processing | 2015
Xinchi Chen; Xipeng Qiu; Chenxi Zhu; Xuanjing Huang
Recently, neural network models for natural language processing tasks have been increasingly focused on for their ability of alleviating the burden of manual feature engineering. However, the previous neural models cannot extract the complicated feature compositions as the traditional methods with discrete features. In this paper, we propose a gated recursive neural network (GRNN) for Chinese word segmentation, which contains reset and update gates to incorporate the complicated combinations of the context characters. Since GRNN is relative deep, we also use a supervised layer-wise training method to avoid the problem of gradient diffusion. Experiments on the benchmark datasets show that our model outperforms the previous neural network models as well as the state-of-the-art methods.
empirical methods in natural language processing | 2015
Pengfei Liu; Xipeng Qiu; Xinchi Chen; Shiyu Wu; Xuanjing Huang
Neural network based methods have obtained great progress on a variety of natural language processing tasks. However, it is still a challenge task to model long texts, such as sentences and documents. In this paper, we propose a multi-timescale long short-termmemory (MT-LSTM) neural network to model long texts. MTLSTM partitions the hidden states of the standard LSTM into several groups. Each group is activated at different time periods. Thus, MT-LSTM can model very long documents as well as short sentences. Experiments on four benchmark datasets show that our model outperforms the other neural models in text classification task.
international acm sigir conference on research and development in information retrieval | 2009
Qi Zhang; Yuanbin Wu; Tao Li; Mitsunori Ogihara; Joseph Johnson; Xuanjing Huang
This paper presents a novel method for mining product reviews, where it mines reviews by identifying product features, expressions of opinions and relations between them. By taking advantage of the fact that most of product features are phrases, a concept of shallow dependency parsing is introduced, which extends traditional dependency parsing to phrase level. This concept is then implemented for extracting relation between product features and expressions of opinions. Experimental evaluations show that the mining task can benefit from shallow dependency parsing.
international joint conference on natural language processing | 2015
Chenxi Zhu; Xipeng Qiu; Xinchi Chen; Xuanjing Huang
In this work, we address the problem to model all the nodes (words or phrases) in a dependency tree with the dense representations. We propose a recursive convolutional neural network (RCNN) architecture to capture syntactic and compositional-semantic representations of phrases and words in a dependency tree. Different with the original recursive neural network, we introduce the convolution and pooling layers, which can model a variety of compositions by the feature maps and choose the most informative compositions by the pooling layers. Based on RCNN, we use a discriminative model to re-rank a
international conference on machine learning | 2009
Xian Qian; Xiaoqian Jiang; Qi Zhang; Xuanjing Huang; Lide Wu
k
meeting of the association for computational linguistics | 2017
Pengfei Liu; Xipeng Qiu; Xuanjing Huang
-best list of candidate dependency parsing trees. The experiments show that RCNN is very effective to improve the state-of-the-art dependency parsing on both English and Chinese datasets.
international joint conference on natural language processing | 2005
Zhushuo Zhang; Yaqian Zhou; Xuanjing Huang; Lide Wu
In real sequence labeling tasks, statistics of many higher order features are not sufficient due to the training data sparseness, very few of them are useful. We describe Sparse Higher Order Conditional Random Fields (SHO-CRFs), which are able to handle local features and sparse higher order features together using a novel tractable exact inference algorithm. Our main insight is that states and transitions with same potential functions can be grouped together, and inference is performed on the grouped states and transitions. Though the complexity is not polynomial, SHO-CRFs are still efficient in practice because of the feature sparseness. Experimental results on optical character recognition and Chinese organization name recognition show that with the same higher order feature set, SHO-CRFs significantly outperform previous approaches.
meeting of the association for computational linguistics | 2016
Pengfei Liu; Xipeng Qiu; Jifan Chen; Xuanjing Huang
Neural network models have shown their promising opportunities for multi-task learning, which focus on learning the shared layers to extract the common and task-invariant features. However, in most existing approaches, the extracted shared features are prone to be contaminated by task-specific features or the noise brought by other tasks. In this paper, we propose an adversarial multi-task learning framework, alleviating the shared and private latent feature spaces from interfering with each other. We conduct extensive experiments on 16 different text classification tasks, which demonstrates the benefits of our approach. Besides, we show that the shared knowledge learned by our proposed model can be regarded as off-the-shelf knowledge and easily transferred to new tasks. The datasets of all 16 tasks are publicly available at \url{this http URL}