Lin Gui
Harbin Institute of Technology Shenzhen Graduate School
Publication
Featured research published by Lin Gui.
Meeting of the Association for Computational Linguistics | 2014
Lin Gui; Ruifeng Xu; Qin Lu; Jun Xu; Jian Xu; Bin Liu; Xiaolong Wang
Transfer learning has been used in opinion analysis to make use of available language resources for other resource-scarce languages. However, the cumulative class noise in transfer learning adversely affects performance when more training data is used. In this paper, we propose a novel method in transductive transfer learning to identify noise through the detection of negative transfers. Evaluation on the NLP&CC 2013 cross-lingual opinion analysis dataset shows that our approach outperforms the state-of-the-art systems. More significantly, the performance of our system increases monotonically as more training data is used.
Knowledge Based Systems | 2017
Lin Gui; Yu Zhou; Ruifeng Xu; Yulan He; Qin Lu
There has been increasing interest in natural language processing in exploring effective methods for learning better representations of text for sentiment classification in product reviews. However, most existing methods do not consider the subtle interplay among the words appearing in review text, the authors of reviews and the products the reviews are associated with. In this paper, we make use of a heterogeneous network to model the shared polarity in product reviews and simultaneously learn representations of users, the products they commented on and the words they used. The basic idea is to first construct a heterogeneous network which links users, products and the words appearing in product reviews, as well as the polarities of the words. Based on the constructed network, representations of nodes are learned using a network embedding method and are subsequently incorporated into a convolutional neural network for sentiment analysis. Evaluations on product reviews, including the IMDB, Yelp 2013 and Yelp 2014 datasets, show that the proposed approach achieves state-of-the-art performance.
Database | 2016
Qikang Wei; Tao Chen; Ruifeng Xu; Yulan He; Lin Gui
The recognition of disease and chemical named entities in scientific articles is a very important subtask of information extraction in the biomedical domain. Due to the diversity and complexity of disease names, the recognition of disease named entities is considerably harder than that of chemical names. Although some remarkable chemical named entity recognition systems, such as ChemSpot and tmChem, are available online, publicly available recognition systems for disease named entities are rare. This article presents a system for disease named entity recognition (DNER) and normalization. First, two separate DNER models are developed. One is based on a conditional random fields model with a rule-based post-processing module. The other is based on bidirectional recurrent neural networks. Then the named entities recognized by each DNER model are fed into a support vector machine classifier that combines the results. Finally, each recognized disease named entity is normalized to a Medical Subject Headings (MeSH) disease name by using a method based on the vector space model. Experimental results show that, using 1000 PubMed abstracts for training, our proposed system achieves an F1-measure of 0.8428 at the mention level and 0.7804 at the concept level on the testing data of the chemical-disease relation task in BioCreative V. Database URL: http://219.223.252.210:8080/SS/cdr.html
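The normalization step described above can be illustrated with a minimal sketch of vector space matching. This is not the authors' implementation: the bag-of-words weighting, the `tokenize` and `normalize_mention` helpers and the three-entry `TOY_VOCABULARY` are all assumptions made for illustration. The sketch maps a recognized mention to the vocabulary name with the highest cosine similarity.

```python
import math
from collections import Counter

# Toy stand-in for a controlled vocabulary of disease names (illustrative only).
TOY_VOCABULARY = ["Breast Neoplasms", "Lung Neoplasms", "Diabetes Mellitus"]


def tokenize(text):
    """Lowercase and split on whitespace, treating hyphens as spaces."""
    return text.lower().replace("-", " ").split()


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def normalize_mention(mention, vocabulary):
    """Return (best_name, score) for the vocabulary entry closest to the mention."""
    mention_vec = Counter(tokenize(mention))
    scored = [(name, cosine(mention_vec, Counter(tokenize(name)))) for name in vocabulary]
    return max(scored, key=lambda pair: pair[1])


best_name, score = normalize_mention("lung neoplasms in smokers", TOY_VOCABULARY)
print(best_name, round(score, 3))
```

A real system would likely use a weighting such as TF-IDF and a much larger vocabulary with synonyms; plain term counts are used here only to keep the example short.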
World Wide Web | 2015
Ruifeng Xu; Lin Gui; Jun Xu; Qin Lu; Kam-Fai Wong
Fine-grained opinion analysis places a much higher demand on annotated corpora, which makes high-quality analysis difficult when resources are insufficient. In this paper we explore the use of cross-lingual resources for opinion mining in resource-poor languages. We present a novel approach to cross-lingual opinion holder extraction that selectively leverages a finely annotated opinion corpus from a source language as supplementary training samples for the target language. First, the opinion corpus in the source language, with its fine-grained annotations, is translated and projected to the target language to generate the training samples. Then, a classifier based on multi-kernel Support Vector Machines (SVMs) is developed to identify opinion holders in the target language, using a tree kernel based on syntactic features and a polynomial kernel based on semantic features. The two kernels are further improved by incorporating a pivot function based on word pair similarity. To reduce the noise from low-quality translated samples, a transfer learning algorithm is applied to iteratively select high-quality translated samples for training the multi-kernel classifiers on the target language. Evaluations on transferring MPQA, an English opinion corpus (the source language), to Chinese opinion analysis (the target language) show that opinion holder extraction performance on the NTCIR-7 MOAT dataset is improved, exceeding that of the Conditional Random Fields (CRFs) based approach and of most systems reported in the NTCIR-7 MOAT evaluation.
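The idea of combining a structural kernel with a polynomial kernel can be sketched as a weighted sum of two kernel functions. The code below is an illustration under stated assumptions, not the paper's method: `overlap_kernel` is a deliberately simple stand-in for a tree kernel (it only counts shared fragment strings), the mixing weight `alpha` is hypothetical, and the pivot function and sample selection steps are omitted.

```python
def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel on dense feature vectors: (x . y + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def overlap_kernel(fragments_x, fragments_y):
    """Simplified stand-in for a tree kernel: number of shared fragments."""
    return float(len(set(fragments_x) & set(fragments_y)))


def composite_kernel(s1, s2, alpha=0.5):
    """Weighted sum of the structural and polynomial kernels (alpha is assumed)."""
    structural = overlap_kernel(s1["fragments"], s2["fragments"])
    semantic = polynomial_kernel(s1["features"], s2["features"])
    return alpha * structural + (1.0 - alpha) * semantic


def gram_matrix(samples, alpha=0.5):
    """Pairwise kernel values, in the form a precomputed-kernel SVM would consume."""
    return [[composite_kernel(a, b, alpha) for b in samples] for a in samples]


# Hypothetical samples: syntactic fragments plus a small semantic feature vector.
samples = [
    {"fragments": ["NP->PRP", "S->NP VP"], "features": [1.0, 0.0]},
    {"fragments": ["S->NP VP", "VP->VBD NP"], "features": [0.5, 0.5]},
]
for row in gram_matrix(samples):
    print(row)
```

A non-negative weighted sum of valid kernels is itself a valid kernel, which is why this kind of combination can be passed to a standard SVM solver as a precomputed Gram matrix.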
NLPCC | 2013
Lin Gui; Ruifeng Xu; Jun Xu; Li Yuan; Yuanlin Yao; Jiyun Zhou; Qiaoyun Qiu; Shuwei Wang; Kam-Fai Wong; Ricky Cheung
The performance of machine learning based opinion analysis systems is often hampered by an insufficient opinion training corpus. This problem is more serious for resource-poor languages. Thus, the cross-lingual opinion analysis (CLOA) technique, which leverages opinion resources in one (source) language to improve opinion analysis in another (target) language, is attracting growing research interest. Currently, the transfer learning based CLOA approach sometimes overfits to a single language resource, while the co-training based CLOA approach often achieves only limited improvement during bilingual decision making. To address these problems, we propose a mixed CLOA model, which estimates the confidence of each monolingual opinion analysis system from its training errors through bilingual transfer self-training and co-training, respectively. The opinion polarity of each testing sample is classified by using the weighted average distance between the sample and the classification hyperplanes as the confidence. Evaluations on the NLP&CC 2013 CLOA bakeoff dataset show that this approach achieves the best performance, outperforming both transfer learning and co-training based approaches.
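The final decision step, a weighted average of signed distances to each classifier's hyperplane, can be sketched as follows. The abstract does not give the exact weighting formula, so using `1 - training_error` as each system's weight is an assumption made here for illustration, as is the `combine_polarity` helper itself.

```python
def combine_polarity(distances, training_errors):
    """Combine signed hyperplane distances from several monolingual classifiers.

    distances: signed distance of one test sample to each classifier's hyperplane
               (positive means the classifier leans towards the positive class).
    training_errors: training error rate of each classifier, in [0, 1].
    The weight 1 - error is an assumed confidence estimate, not the paper's formula.
    """
    if len(distances) != len(training_errors):
        raise ValueError("one training error is needed per classifier")
    weights = [1.0 - err for err in training_errors]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("at least one classifier must have training error below 1")
    score = sum(w * d for w, d in zip(weights, distances)) / total
    label = "positive" if score >= 0.0 else "negative"
    return label, score


# Hypothetical sample: the source-language system says positive, the target-language one negative.
label, score = combine_polarity(distances=[0.8, -0.3], training_errors=[0.1, 0.4])
print(label, round(score, 2))
```

In this sketch the more reliable classifier (lower training error) dominates when the two systems disagree.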
Bioinformatics and Biomedicine | 2016
Jiyun Zhou; Qin Lu; Ruifeng Xu; Lin Gui; Hongpeng Wang
Protein-DNA complexes play crucial roles in gene regulation. The prediction of the residues involved in protein-DNA interactions is critical for understanding gene regulation. Although many methods have been proposed, most of them overlook motif features. Motif features are subsequences that are important for the recognition between a protein and DNA. In order to use motif features efficiently for the prediction of DNA-binding residues, we first apply a Convolutional Neural Network (CNN) to capture the motif features from the sequences around the target residues. The CNN consists of a set of learnable motif detectors that capture the important motif features by scanning the sequences around the target residues. We then build a neural network classifier, referred to as CNNsite, which combines the captured motif features, sequence features and evolutionary features to predict binding residues from sequences. The datasets PDNA-62 and PDNA-224 are used to evaluate the performance of CNNsite by five-fold cross-validation. Performance evaluation shows that the motif features perform better than the sequence features and evolutionary features by at least 6.73% on ST, 0.097 on MCC and 0.069 on AUC. Compared with previously published methods, CNNsite performs better by at least 0.019 on MCC, 4.37% on ST and 0.040 on AUC. CNNsite is also evaluated on an independent dataset, TS-72, where it outperforms the previous methods by at least 0.012 on AUC. The discriminant powers of the motif features of sizes from 2 to 6 residues show that many motif features with large discriminant power are composed of residues that play important roles in DNA-protein interactions. The standalone version of CNNsite is available at http://hlt.hitsz.edu.cn:8080/CNNsite/.
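How a motif detector scans a sequence can be shown with a small, hand-written one-dimensional convolution. This is a sketch under assumptions rather than the CNNsite code: the filter weights are set by hand through the hypothetical `detector_for` helper instead of being learned, and only one-hot residue encoding, a ReLU and a sliding window are modeled.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def one_hot(sequence):
    """Encode each residue as a 20-dimensional indicator vector."""
    return [[1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS] for residue in sequence]


def detector_for(motif):
    """Hand-set filter that responds most strongly to an exact motif match.

    In a trained CNN these weights would be learned from data.
    """
    return one_hot(motif)


def scan_motif(sequence, detector):
    """Slide the detector over the sequence; return ReLU activations per window."""
    encoded = one_hot(sequence)
    width = len(detector)
    activations = []
    for start in range(len(encoded) - width + 1):
        window = encoded[start:start + width]
        score = sum(
            w * x
            for w_row, x_row in zip(detector, window)
            for w, x in zip(w_row, x_row)
        )
        activations.append(max(0.0, score))  # ReLU
    return activations


activations = scan_motif("AKRKA", detector_for("KR"))
print(activations)       # one activation per window of width 2
print(max(activations))  # max pooling keeps the strongest response
```

Max pooling over the activations yields one feature per detector, which is the kind of value a downstream classifier could combine with sequence and evolutionary features.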
Empirical Methods in Natural Language Processing | 2016
Lin Gui; Dongyin Wu; Ruifeng Xu; Qin Lu; Yu Zhou
In this paper, we present our work on emotion cause extraction. Since there is no open dataset available, the lack of annotated resources has limited research in this area. Thus, we first present a dataset we built using SINA city news. The annotation is based on the scheme of the W3C Emotion Markup Language. Second, we propose a 7-tuple definition to describe emotion cause events. Based on this general definition, we propose a new event-driven emotion cause extraction method using multi-kernel SVMs, in which a syntactic tree based approach is used to represent events in text. A convolution kernel based multi-kernel SVM is used to extract emotion causes. Because traditional convolution kernels do not use lexical information at the terminal nodes of syntactic trees, we modify the kernel function with a synonym based improvement. Even with very limited training data, we can still extract sufficient features for the task. Evaluations show that our approach achieves an F-measure 11.6% higher than the reference methods. The contributions of our work include resource construction, concept definition and algorithm development.
NLPCC | 2014
Lin Gui; Li Yuan; Ruifeng Xu; Bin Liu; Qin Lu; Yu Zhou
Identifying the cause of an emotion is a new challenge for researchers in natural language processing. Currently, there is no existing work on emotion cause detection from Chinese micro-blogging (Weibo) text. In this study, an emotion cause annotated corpus is first designed and developed by annotating the emotion cause expressions in Chinese Weibo text. To date, the corpus contains annotations for 1,333 Chinese Weibo posts, making it the largest available resource in this research area. Based on observations on this corpus, the characteristics of emotion cause expressions are identified. Accordingly, a rule-based emotion cause detection method is developed which uses 25 manually compiled rules. Furthermore, two machine learning based cause detection methods are developed: a classification-based method using support vector machines and a sequence labeling based method using a conditional random fields model. The experimental results show that the rule-based method achieves an accuracy of 68.30%. Furthermore, the method based on the conditional random fields model achieves 77.57% accuracy, which is 37.45% higher than the reference baseline method. These results show the effectiveness of our proposed emotion cause detection method.
Empirical Methods in Natural Language Processing | 2017
Lin Gui; Jiannan Hu; Yulan He; Ruifeng Xu; Lu Qin; Jiachen Du
Emotion cause extraction aims to identify the reasons behind a certain emotion expressed in text. It is a much more difficult task than emotion classification. Inspired by recent advances in using deep memory networks for question answering (QA), we propose a new approach which treats emotion cause identification as a reading comprehension task in QA. Inspired by convolutional neural networks, we propose a new mechanism that stores relevant context in different memory slots to model context information. Our proposed approach can extract both word-level sequence features and lexical features. Performance evaluation shows that our method achieves state-of-the-art performance on a recently released emotion cause dataset, outperforming a number of competitive baselines by at least 3.01% in F-measure.
NLPCC/ICCPOL | 2016
Ruifeng Xu; Yu Zhou; Dongyin Wu; Lin Gui; Jiachen Du; Yun Xue
This paper presents an overview of the shared task on stance detection in Chinese microblogs in NLPCC-ICCPOL 2016. The submitted systems are expected to automatically determine whether the author of a Chinese microblog is in favor of the given target, against the given target, or whether neither inference is likely. Unlike regular evaluation tasks on sentiment analysis, the microblog text may or may not contain the target of interest, and the opinion expressed may or may not be directed towards the target of interest. We designed two tasks. Task A is a mandatory supervised task which detects stance towards five targets of interest given labeled data. Task B is an optional unsupervised task for which only unlabeled data is given. The shared task attracted sixteen participating teams for Task A and five submissions for Task B. The highest F-score obtained was 0.7106 for Task A and 0.4687 for Task B.