Man Lan
East China Normal University
Publication
Featured research published by Man Lan.
International Conference on Computational Linguistics | 2014
Jiang Zhao; Tian Tian Zhu; Man Lan
This paper presents our approach to the semantic relatedness and textual entailment subtasks organized as Task 1 in SemEval 2014. Specifically, we address two questions: (1) Can we solve these two subtasks together? (2) Are features proposed for the textual entailment task still effective for the semantic relatedness task? To address them, we extracted seven types of features, including text difference measures proposed for the entailment judgement subtask as well as common text similarity measures used in both subtasks. We then used the same feature set to solve both subtasks, treating them as a regression task and a classification task respectively, and studied the influence of the different features. We ranked first on the relatedness subtask and second on the entailment subtask.
International World Wide Web Conferences | 2012
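As a rough illustration of reusing one feature set for both subtasks, the sketch below computes a few toy similarity and difference features for a sentence pair and passes the same vector to a regression-style scorer and a classification-style labeler. The feature choices, the function names (`pair_features`, `relatedness_score`, `entailment_label`), the weight, and the decision rule are all hypothetical and are not taken from the paper, which used seven feature types and trained models.

```python
def pair_features(s1: str, s2: str) -> list:
    """Toy feature vector shared by both subtasks (hypothetical features)."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    set1, set2 = set(t1), set(t2)
    union = set1 | set2
    # Similarity measure: Jaccard overlap of the two token sets.
    jaccard = len(set1 & set2) / len(union) if union else 1.0
    # Difference measure: relative length gap between the sentences.
    len_diff = abs(len(t1) - len(t2)) / max(len(t1), len(t2), 1)
    # Difference measure: share of second-sentence tokens unseen in the first.
    only_in_second = len(set2 - set1) / len(set2) if set2 else 0.0
    return [jaccard, len_diff, only_in_second]


def relatedness_score(features: list) -> float:
    """Regression-style head: maps features to a score (made-up weight)."""
    return 1.0 + 4.0 * features[0]


def entailment_label(features: list) -> str:
    """Classification-style head: maps the same features to a label (toy rule)."""
    return "ENTAILMENT" if features[2] == 0.0 else "NEUTRAL"


if __name__ == "__main__":
    feats = pair_features("a dog runs", "a cat runs")
    print(feats, relatedness_score(feats), entailment_label(feats))
```

In a real system both heads would be trained models (a regressor and a classifier) fitted on the same feature matrix; the hand-written rules here only stand in for them.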
Zhi-Min Zhou; Man Lan; Zheng-Yu Niu; Yue Lu
Answer ranking is very important for cQA services due to the high variance in the quality of answers. Most existing work in this area focuses on using various features or employing machine learning techniques to address this problem. Only a few studies have considered user profile information in this particular task. In this work, we assume a close relationship between user profile information and the quality of a user's answers, on the premise that user information summarizes the user's behaviors and history. We therefore explored the effectiveness of three categories of user profile information, i.e., engagement-related, authority-related and level-related, on answer ranking in cQA. Unlike previous work, we employed only information that is easy to extract without limitations such as user privacy. Experimental results on Yahoo! Answers manner questions showed that our system, using the user profile information, achieved comparable or even better results than the state-of-the-art baseline system. Moreover, we found that whether a user has a profile picture in the cQA community contributed more than other information in the answer ranking task.
Journal of Biomedical Informatics | 2009
Man Lan; Chew Lim Tan; Jian Su
Automatically detecting protein-protein interaction (PPI) relevant articles is a crucial step for large-scale biological database curation. Previous work adopted POS tagging, shallow parsing and sentence splitting techniques, but achieved worse performance than the simple bag-of-words representation. In this paper, we generated and investigated multiple types of feature representations in order to further improve the performance of the PPI text classification task. Besides the traditional domain-independent bag-of-words approach and term weighting methods, we also explored other domain-dependent features, i.e., protein-protein interaction trigger keywords, protein named entities and advanced ways of incorporating Natural Language Processing (NLP) output. The integration of these multiple features was evaluated on the BioCreAtIvE II corpus. The experimental results showed that both the advanced way of using NLP output and the integration of bag-of-words and NLP output improved the performance of text classification. Specifically, in comparison with the best performance achieved in the BioCreAtIvE II IAS, the feature-level and classifier-level integration of multiple features improved classification performance by 2.71% and 3.95%, respectively.
International Conference on Acoustics, Speech, and Signal Processing | 2008
Shiliang Sun; Man Lan; Yue Lu
Classification of time-varying electrophysiological signals is an important problem in the development of brain-computer interfaces (BCIs). Designing adaptive classifiers is a potential way to address this task. In this paper, Bayesian classifiers with Gaussian mixture models (GMMs) are adopted as the decision rule to classify electroencephalogram (EEG) signals. The stochastic approximation method (SAM) is used as the specific gradient descent method for updating the parameters of mean values and covariance matrices in the distribution of GMMs, where the parameters are simultaneously updated in a batch mode. Experimental results using data from a BCI show that the stochastic approximation method is effective for EEG classification tasks.
International Conference on Computational Linguistics | 2014
Jiang Zhao; Man Lan; Tian Tian Zhu
Microblogging websites (such as Twitter and Facebook) are rich sources of data for opinion mining and sentiment analysis. In this paper, we describe our approaches to sentiment analysis in Twitter (Task 9) organized in SemEval 2014. This task aims to determine whether the sentiment orientation conveyed by a whole tweet or a piece of a tweet is positive, negative or neutral. To solve this problem, we extracted several simple and basic features covering the following aspects: surface text, syntax, sentiment score and Twitter-specific characteristics. We then used these features to build a classifier with the SVM algorithm. Despite the simplicity of the features, our systems rank above the average.
Conference on Information and Knowledge Management | 2011
Mitra Mohtarami; Hadi Amiri; Man Lan; Chew Lim Tan
Opinion question answering (QA) requires automatic and correct interpretation of an answer relative to its question. However, the ambiguity that often exists in the question-answer pairs causes complexity in interpreting the answers. This paper aims to infer yes/no answers from indirect yes/no question-answer pairs (IQAPs) that are ambiguous due to the presence of ambiguous sentiment adjectives. We propose a method to measure the uncertainty of the answer in an IQAP relative to its question. In particular, to infer the yes or no response from an IQAP, our method employs antonyms, synonyms, word sense disambiguation as well as the semantic association between the sentiment adjectives that appear in the IQAP. Extensive experiments demonstrate the effectiveness of our method over the baseline.
International Symposium on Neural Networks | 2012
Yu Xu; Man Lan; Yue Lu; Zheng Yu Niu; Chew Lim Tan
Implicit discourse relation classification is a challenging task because the discourse connective is missing. Some work directly adopted machine learning algorithms and linguistically informed features to address this task. However, one interesting solution is to automatically predict the implicit discourse connective. In this paper, we present a novel two-step machine learning-based approach to implicit discourse relation classification. We first use a machine learning method to automatically predict the discourse connective that can best express the implicit discourse relation. The predicted implicit discourse connective is then used to classify the implicit discourse relation. Experiments on the Penn Discourse Treebank 2.0 (PDTB) and the Biomedical Discourse Relation Bank (BioDRB) show that our method performs better than the baseline system and previous work.
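The two-step idea described above (predict a connective, then classify the relation from it) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' system: the count-based connective predictor, the majority-vote sense mapper, the function names, and the toy training data are all hypothetical stand-ins for the learned models and features in the paper.

```python
from collections import Counter, defaultdict


def train_connective_predictor(examples):
    """examples: list of (tokens, connective). Returns word counts per connective."""
    counts = defaultdict(Counter)
    for tokens, connective in examples:
        counts[connective].update(tokens)
    return counts


def predict_connective(counts, tokens):
    """Step 1: pick the connective whose training words overlap most with the input."""
    def score(connective):
        return sum(counts[connective][t] for t in tokens)
    # sorted() makes tie-breaking deterministic (alphabetical order).
    return max(sorted(counts), key=score)


def train_sense_mapper(pairs):
    """pairs: list of (connective, relation). Keeps the majority relation per connective."""
    by_connective = defaultdict(Counter)
    for connective, relation in pairs:
        by_connective[connective][relation] += 1
    return {c: cnt.most_common(1)[0][0] for c, cnt in by_connective.items()}


def classify_relation(counts, mapper, tokens):
    """Step 2: classify the relation using the connective predicted in step 1."""
    connective = predict_connective(counts, tokens)
    return connective, mapper[connective]


# Toy, made-up training data for illustration only.
TRAIN = [
    (["rain", "stayed", "home"], "because"),
    (["tired", "slept", "early"], "because"),
    (["rich", "unhappy"], "but"),
    (["small", "strong"], "but"),
]
SENSES = [("because", "Contingency"), ("but", "Comparison")]

COUNTS = train_connective_predictor(TRAIN)
MAPPER = train_sense_mapper(SENSES)

if __name__ == "__main__":
    print(classify_relation(COUNTS, MAPPER, ["rain", "home", "late"]))
```

Splitting the task this way lets the second step treat an implicit relation much like an explicit one, at the cost of propagating any error made when predicting the connective.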
Proceedings of the CoNLL-16 shared task | 2016
Jianxiang Wang; Man Lan
This paper describes our two discourse parsers (i.e., an English discourse parser and a Chinese discourse parser) submitted to the CoNLL-2016 shared task on Shallow Discourse Parsing. For the English discourse parser, we build two separate argument extractors for the single sentence (SS) case, and adopt a convolutional neural network for Non-Explicit sense classification based on the work of Wang and Lan (2015b). For the Chinese discourse parser, we build a pipeline system following the annotation procedure of the Chinese Discourse Treebank (Zhou and Xue, 2015). Our English discourse parser achieves better performance than the best system of CoNLL-2015, and the Chinese discourse parser achieves encouraging results. Both parsers rank second on the blind datasets.
International Symposium on Neural Networks | 2014
Jiang Zhao; Man Lan; Zheng-Yu Niu; Donghong Ji
Cross-lingual textual entailment is a relatively new problem that detects the entailment relationship between two text fragments written in different languages. Previous work adopted machine learning algorithms and similarity measures as features to address this task. In order to overcome the high cost of human annotation and further improve the recognition performance, we present a novel co-training approach to solve this problem. We first use an off-the-shelf machine translation tool to eliminate the language gap between two texts. Then we measure the similarities and differences between two texts and regard them as sufficient and redundant views. We use those two views to conduct the co-training procedure to perform classification. Besides, a new effective Kullback-Leibler (KL) based criterion is proposed to select the results from all possible iterations. Experiments on cross-lingual datasets provided by SemEval 2013 show that our method significantly outperforms the baseline systems and previous work.
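The abstract does not say how the KL-based selection criterion is computed. Purely as an assumption-laden sketch, the code below shows the Kullback-Leibler divergence between two discrete distributions and one hypothetical way to use it: choosing the co-training iteration whose predicted label distribution is closest to a reference distribution. The function names, the `eps` smoothing, and the selection rule are illustrative and are not taken from the paper.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    # Terms with p_i == 0 contribute nothing; eps guards against log(0) in q.
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def select_iteration(label_distributions, reference):
    """Hypothetical rule: index of the iteration whose label distribution
    has the smallest KL divergence from the reference distribution."""
    divergences = [kl_divergence(d, reference) for d in label_distributions]
    return divergences.index(min(divergences))


if __name__ == "__main__":
    per_iteration = [[0.9, 0.1], [0.6, 0.4], [0.5, 0.5]]
    print(select_iteration(per_iteration, [0.5, 0.5]))
```

KL divergence is asymmetric, so swapping the arguments generally changes the ranking of iterations; which direction (if either) matches the paper's criterion is not stated in the abstract.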
Conference on Computational Natural Language Learning | 2015
Jianxiang Wang; Man Lan
The CoNLL-2015 shared task focuses on shallow discourse parsing, which takes a piece of newswire text as input and returns the discourse relations in PDTB style. In this paper, we describe the discourse parser that we entered in the shared task. The parser is built from 9 components that identify discourse connectives, label arguments and classify the sense of Explicit or Non-Explicit relations in free text. Compared to previous discourse parsers, our system adds new components and features, which further improve overall performance. Our parser ranks first on two test datasets, i.e., PDTB Section 23 and a blind test dataset.