Yunfang Wu | Researchain

Publication

Featured research published by Yunfang Wu.

meeting of the association for computational linguistics | 2007

SemEval-2007 Task 05: Multilingual Chinese-English Lexical Sample

Peng Jin; Yunfang Wu; Shiwen Yu

The Multilingual Chinese-English lexical sample task at SemEval-2007 provides a framework to evaluate Chinese word sense disambiguation and to promote research. This paper reports on the task preparation and the results of six participants.

international conference on computational linguistics | 2011

Combining contextual and structural information for supersense tagging of Chinese unknown words

Likun Qiu; Yunfang Wu; Yanqiu Shao

Supersense tagging classifies unknown words into semantic categories defined by lexicographers and inserts them into a thesaurus. Previous studies on supersense tagging show that context-based methods perform well for English unknown words, while structure-based methods perform well for Chinese unknown words. The challenge is how to combine contextual and structural information for supersense tagging of Chinese unknown words. We propose a simple yet effective approach to address this challenge. In this approach, contextual information is used to measure contextual similarity between words, while structural information is used to filter candidate synonyms and adjust the contextual similarity score. Experimental results show that the proposed approach outperforms the state-of-the-art context-based and structure-based methods.
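The combination described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: the structural cue (a shared final character, or any shared character), the weights 1.0 and 0.5, the toy context vectors, and the names `structural_weight` and `tag_unknown` are all hypothetical.

```python
import math
from typing import Dict, Optional

Vector = Dict[str, float]

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse context vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * \
        math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def structural_weight(unknown: str, candidate: str) -> float:
    """Assumed structural cue: same final character > any shared character > none."""
    if unknown[-1] == candidate[-1]:
        return 1.0
    if set(unknown) & set(candidate):
        return 0.5
    return 0.0  # no shared character: the candidate is filtered out

def tag_unknown(unknown: str, unknown_ctx: Vector,
                lexicon: Dict[str, str],
                contexts: Dict[str, Vector]) -> Optional[str]:
    """Return the category of the best structurally compatible, contextually similar word."""
    best_label, best_score = None, 0.0
    for word, label in lexicon.items():
        weight = structural_weight(unknown, word)
        if weight == 0.0:
            continue  # structural filter
        score = weight * cosine(unknown_ctx, contexts[word])  # adjusted similarity
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy thesaurus and context counts.
LEXICON = {"汽车": "artifact", "面包": "food", "苹果": "food"}
CONTEXTS = {
    "汽车": {"drive": 4.0, "park": 1.0, "buy": 1.0},
    "面包": {"eat": 3.0, "buy": 2.0},
    "苹果": {"eat": 4.0, "buy": 1.0},
}
UNKNOWN_CTX = {"drive": 3.0, "park": 2.0}

print(tag_unknown("面包车", UNKNOWN_CTX, LEXICON, CONTEXTS))
```

In this toy setup the unknown word 面包车 shares characters with both 汽车 and 面包, but only 汽车 has a similar context, so its category is chosen; 苹果 is removed by the structural filter before any similarity is computed.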

international conference on computational linguistics | 2009

Word Clustering for Collocation-Based Word Sense Disambiguation

Peng Jin; Xu Sun; Yunfang Wu; Shiwen Yu

The main disadvantage of collocation-based word sense disambiguation is that its recall is low, although its precision is relatively high. How can recall be improved without decreasing precision? In this paper, we investigate a word-class approach to extend the collocation list, which is constructed from a manually sense-tagged corpus. The word classes, however, are obtained from a larger corpus that is not sense-tagged. The experimental results show that the F-measure improves to 71%, compared with 54% for the baseline system in which word classes are not considered, although precision decreases slightly. Further study reveals the relationship between the F-measure and the number of word classes trained from corpora of various sizes.
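The idea of extending a collocation list through word classes can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy English collocations, the class IDs, and the function name `disambiguate` do not come from the paper, and the paper's actual clustering and matching procedure may differ.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical collocation list: (target word, collocate) -> sense,
# as it might be harvested from a small sense-tagged corpus.
COLLOCATIONS: Dict[Tuple[str, str], str] = {
    ("bank", "river"): "bank/shore",
    ("bank", "loan"): "bank/finance",
}

# Hypothetical word classes, as they might come from clustering
# a larger corpus that is not sense-tagged.
WORD_CLASS: Dict[str, int] = {
    "river": 1, "stream": 1, "creek": 1,
    "loan": 2, "mortgage": 2, "credit": 2,
}

def disambiguate(target: str, context: List[str],
                 use_classes: bool = True) -> Optional[str]:
    """Pick a sense from an exact collocation match, else from a class-level match."""
    # 1) Exact collocation match: high precision, low recall.
    for word in context:
        sense = COLLOCATIONS.get((target, word))
        if sense is not None:
            return sense
    if not use_classes:
        return None
    # 2) Back off to word classes: a context word in the same class as a
    #    known collocate triggers that collocate's sense.
    class_to_sense = {
        WORD_CLASS[colloc]: sense
        for (tgt, colloc), sense in COLLOCATIONS.items()
        if tgt == target and colloc in WORD_CLASS
    }
    for word in context:
        cls = WORD_CLASS.get(word)
        if cls in class_to_sense:
            return class_to_sense[cls]
    return None

print(disambiguate("bank", ["a", "mortgage"], use_classes=False))  # no exact match
print(disambiguate("bank", ["a", "mortgage"]))                     # class-level match
```

The class-level back-off answers cases the exact list cannot, which is how recall can rise; because a class may group words that signal different senses, precision can drop slightly, consistent with the trade-off the abstract reports.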

international conference natural language processing | 2008

Exploiting external knowledge sources to improve kernel-based Word Sense Disambiguation

Peng Jin; Fuxin Li; Danqing Zhu; Yunfang Wu; Shiwen Yu

This paper proposes a novel approach to improve kernel-based word sense disambiguation (WSD). We first explain why linear kernels are more suitable than translation-invariant kernels for WSD and many other natural language processing problems. Based on the linear kernel, two external knowledge sources are integrated. The first is a set of linguistic rules for finding the crucial features. The second is a distributional similarity thesaurus, which is used to alleviate data sparseness by generalizing crucial features when they do not match the word form exactly. The experiments show that the approach outperforms the state-of-the-art system on the benchmark data from the English lexical sample task of SemEval-2007, and that the improvement is statistically significant.
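For readers unfamiliar with the distinction, the sketch below contrasts a linear kernel with one common translation-invariant kernel, the Gaussian (RBF) kernel, whose value depends only on the difference between its two arguments. It illustrates the definitions only and does not reproduce the paper's argument for preferring linear kernels; the feature vectors and the `gamma` value are arbitrary assumptions.

```python
import math
from typing import List, Sequence

def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """k(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 0.5) -> float:
    """k(x, y) = exp(-gamma * ||x - y||^2), a function of x - y only."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def shift(v: Sequence[float], delta: float) -> List[float]:
    """Translate every coordinate of v by the same amount."""
    return [a + delta for a in v]

# Two toy binary feature vectors (for example, presence of context words).
x = [1.0, 0.0, 1.0]
y = [0.0, 1.0, 1.0]
xs, ys = shift(x, 1.0), shift(y, 1.0)

print(linear_kernel(x, y), linear_kernel(xs, ys))  # changes under translation
print(rbf_kernel(x, y), rbf_kernel(xs, ys))        # unchanged under translation
```

Shifting both inputs by the same vector leaves the RBF value untouched, while the linear kernel value changes, which is the property the term "translation-invariant" refers to.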

international conference natural language processing | 2008

Disambiguating sentiment ambiguous adjectives

Yunfang Wu; Miao Wang; Peng Jin

This paper presents a systematic study of disambiguating sentiment-ambiguous adjectives in context in real text, a task that involves both word sense disambiguation and sentiment analysis. First, we address inter-annotator agreement on assigning semantic orientations to word occurrences in real text. Second, we demonstrate that co-occurring sentiment-monosemous adjectives cannot effectively disambiguate sentiment-ambiguous adjectives. We then apply collocation-based disambiguation and a support vector machine (SVM) to the task, and present a new approach that combines collocation information with the SVM to disambiguate sentiment-ambiguous words. The experimental results show that the combined approach, Coll+SVM, outperforms both the collocation-based method and the SVM alone.
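The abstract does not say how the collocation information and the SVM are combined. One plausible reading, sketched below purely as an assumption, is a cascade: use a collocation rule when one fires, and otherwise fall back to a trained classifier. The rule table, the English examples, and the stand-in `toy_classifier` (which is not an SVM) are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Classifier = Callable[[str, List[str]], str]

# Hypothetical collocation rules: (adjective, modified word) -> polarity.
COLLOCATION_POLARITY: Dict[Tuple[str, str], str] = {
    ("high", "salary"): "positive",
    ("high", "price"): "negative",
}

def collocation_vote(adj: str, context: List[str]) -> Optional[str]:
    """Return a polarity if a known collocate appears in the context, else None."""
    for word in context:
        polarity = COLLOCATION_POLARITY.get((adj, word))
        if polarity is not None:
            return polarity
    return None

def combined_polarity(adj: str, context: List[str], fallback: Classifier) -> str:
    """Assumed cascade: collocation rule first, classifier when no rule fires."""
    vote = collocation_vote(adj, context)
    return vote if vote is not None else fallback(adj, context)

def toy_classifier(adj: str, context: List[str]) -> str:
    """Stand-in for a trained SVM; here it just keys on one cue word."""
    return "negative" if "risk" in context else "positive"

print(combined_polarity("high", ["the", "price"], toy_classifier))  # rule fires
print(combined_polarity("high", ["the", "risk"], toy_classifier))   # classifier decides
```

In practice the fallback would be a model trained on annotated occurrences; the cascade order is a design choice that favors the typically more precise collocation evidence.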

international conference on computational linguistics | 2012

Mining market trend from blog titles based on lexical semantic similarity

Fei Wang; Yunfang Wu

Blogs have become an important medium for people to post ideas and share new information, and upward or downward (Up/Down) price trends in the market always draw people's attention. In this paper, we make a thorough study of mining market trends from blog titles in the housing and stock markets, based on lexical semantic similarity. We focus on the automatic extraction and construction of a Chinese Up/Down verb lexicon, using both Chinese and Chinese-English bilingual semantic similarity. The experimental results show that verb lexicon extraction based on semantic similarity is of great use in mining public opinion on market trends, and that applying English similar words to Chinese verb lexicon extraction compares well with using Chinese similar words.

linguistic annotation workshop | 2007

Building Chinese Sense Annotated Corpus with the Help of Software Tools

Yunfang Wu; Peng Jin; Tao Guo; Shiwen Yu

This paper presents the procedure for building a Chinese sense-annotated corpus. A set of software tools is designed to help human annotators speed up annotation and maintain consistency. The tools include 1) a tagger for word segmentation and POS tagging, 2) an annotation interface for describing senses in the lexicon and annotating senses in the corpus, 3) a checker for maintaining consistency, 4) a transformer for converting text files to XML format, and 5) a counter for calculating sense frequency distributions.
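Two of the tools listed above, the text-to-XML transformer and the sense frequency counter, can be sketched in a few lines. The input format (`word/POS` with an optional `!sense` suffix), the XML element and attribute names, and the function names are assumptions made for this illustration; the abstract does not describe the actual formats.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict
from typing import DefaultDict, List, Optional, Tuple

Token = Tuple[str, str, Optional[str]]  # (word, POS tag, sense ID or None)

def parse_line(line: str) -> List[Token]:
    """Parse an assumed plain-text format: 'word/POS' or 'word/POS!sense'."""
    tokens: List[Token] = []
    for item in line.split():
        word_pos, _, sense = item.partition("!")
        word, _, pos = word_pos.partition("/")
        tokens.append((word, pos, sense or None))
    return tokens

def sense_frequencies(tokens: List[Token]) -> DefaultDict[str, Counter]:
    """Count how often each sense of each word occurs (the 'counter' tool)."""
    freq: DefaultDict[str, Counter] = defaultdict(Counter)
    for word, _pos, sense in tokens:
        if sense is not None:
            freq[word][sense] += 1
    return freq

def to_xml(tokens: List[Token]) -> str:
    """Serialize one sentence to XML (the 'transformer' tool)."""
    sentence = ET.Element("sentence")
    for word, pos, sense in tokens:
        attrs = {"pos": pos}
        if sense is not None:
            attrs["sense"] = sense
        ET.SubElement(sentence, "token", attrs).text = word
    return ET.tostring(sentence, encoding="unicode")

tokens = parse_line("他/r 打/v!1 球/n 打/v!1 电话/n 打/v!2")
print(dict(sense_frequencies(tokens)["打"]))
print(to_xml(tokens))
```

A consistency checker could be built on the same token representation, for instance by flagging words whose sense IDs are missing from the lexicon.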

international conference on computational linguistics | 2010