Publication


Featured research published by Tianfang Yao.


international universal communication symposium | 2010

Combining dependency parsing with shallow semantic analysis for Chinese opinion-element relation identification

Mosha Chen; Tianfang Yao

Sentiment analysis is an important subtask of opinion mining, and identifying the opinion-element relation between a topic and the sentiment modifying it is an essential step. This paper presents a novel method for identifying the opinion-element relation based on dependency parsing and shallow semantic analysis, using an ontology dictionary and a collocation database to take full account of the semantics behind the topic and sentiment. The experimental results show that, compared to the baseline, our method improves recall and precision by 7.38% and 1.4% respectively on the annotated corpus. We also conduct experiments on the COAE2008 public corpus to demonstrate its generality. Finally, this paper offers a simple but efficient method to construct and refine the collocation database for further use.


international conference on advanced language processing and web information technology | 2007

Kernel-based Sentiment Classification for Chinese Sentence

Linlin Li; Tianfang Yao

Recent years have seen a large growth in online customer reviews. Classifying these reviews as positive or negative would be helpful in business intelligence applications and recommender systems. This paper addresses sentiment classification at a fine-grained level, i.e. the sentence level. The challenging aspect of this problem, which distinguishes it from traditional classification problems, is that sentiment expression is more free-style, so classification features are more difficult to determine. In this paper, we propose a kernel-based method that makes it feasible to incorporate multiple features from the word, n-gram and syntactic levels. Experimental results show that our method is effective and that it outperforms the very competitive n-gram method.
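As a rough illustration of how a kernel can combine features from several levels, the sketch below adds a word-count kernel to a bigram-count kernel. It relies only on the general fact that a non-negatively weighted sum of valid kernels is itself a valid kernel. The function names, the weights and the choice of these two feature levels are assumptions made for illustration; the abstract does not specify the kernels actually used, and syntactic features are omitted here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_kernel(a, b):
    """Inner product of the count vectors of two feature lists."""
    ca, cb = Counter(a), Counter(b)
    return sum(ca[k] * cb[k] for k in ca.keys() & cb.keys())


def combined_kernel(x, y, w_word=1.0, w_bigram=1.0):
    """Weighted sum of a word-level kernel and a bigram-level kernel.

    The weights are hypothetical tuning parameters, not values from the paper.
    """
    return (w_word * count_kernel(x, y)
            + w_bigram * count_kernel(ngrams(x, 2), ngrams(y, 2)))


s1 = "the screen is very good".split()
s2 = "the battery is very good".split()
similarity = combined_kernel(s1, s2)  # 4 shared words + 2 shared bigrams
print(similarity)
```

A kernel of this form could be passed to any kernel machine that accepts a precomputed Gram matrix; which learner the authors used is not stated in the abstract.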


Proceedings of the Second SIGHAN Workshop on Chinese Language Processing | 2003

CHINERS: A Chinese Named Entity Recognition System for the Sports Domain

Tianfang Yao; Wei Ding; Gregor Erbach

In investigating Chinese named entity (NE) recognition, we are confronted with two principal challenges. One is how to ensure the quality of word segmentation and Part-of-Speech (POS) tagging, because errors in these steps adversely affect the performance of NE recognition. The other is how to recognize NEs flexibly, reliably and accurately. To cope with these challenges, we propose a system architecture that is divided into two phases. In the first phase, we reduce as far as possible the word segmentation and POS tagging errors passed on to the second phase. For this purpose, we utilize machine learning techniques to repair such errors. In the second phase, we design Finite State Cascades (FSC), which can be automatically constructed from the recognition rule sets, as a shallow parser for the recognition of NEs. The advantages of FSC are that they are reliable, accurate and easy to maintain. Additionally, to recognize special NEs, we work out corresponding strategies to enhance the correctness of the recognition. The experimental evaluation of the system has shown that the total average recall and precision for six types of NEs are 83% and 85% respectively. These results indicate that the system architecture is reasonable and effective.


computer science and information engineering | 2009

A Kernel-Based Sentiment Classification Approach for Chinese Sentences

Tianfang Yao; Linlin Li

There has been a large growth in online opinionated customer reviews in recent years. Classifying such reviews by polarity would be beneficial in business intelligence and other application domains. This paper aims to find a solution for sentiment classification at a fine-grained level, namely the sentence level. The challenge is that, because sentiment expression is more free-style, it is more difficult to determine classification features. Therefore, we propose a kernel-based machine learning approach that makes it feasible to incorporate multiple features from the lexical and syntactic levels. The experimental results show that our approach is effective and outperforms the very competitive n-gram method.


international conference on machine learning and cybernetics | 2002

The model design of a case-based reasoning multilingual natural language interface for database

Dongmo Zhang; Huanye Sheng; Fang Li; Tianfang Yao

A multilingual natural language interface for databases (NLIDB) is a key component of a multilingual information retrieval system. This paper presents a case-based multilingual NLIDB model, which is motivated by the idea of case-based reasoning in machine learning. The model avoids the difficulty of constructing parsers for all the natural languages to be supported by storing every query pattern and its solution as a case in a casebase. A query sentence entered by the user is syntactically compared to the cases in the casebase, and the solution of the most similar case is reused to query the database. Each case is represented as an XML document fragment and the casebase is a valid XML document. The facilities provided by XML greatly enhance the maintainability and scalability of the model. The model has been implemented in a multilingual NLIDB for a stock market information retrieval system.
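The retrieve-and-reuse step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the XML element names (`casebase`, `case`, `pattern`, `solution`), the SQL templates, the `STOCK` placeholder and the use of token-level Jaccard overlap as the similarity measure are all hypothetical, since the abstract does not give the case schema or the syntactic comparison method.

```python
import xml.etree.ElementTree as ET

# Hypothetical casebase; the schema is an assumption, not the paper's format.
CASEBASE_XML = """
<casebase>
  <case>
    <pattern>what is the price of STOCK</pattern>
    <solution>SELECT price FROM quotes WHERE name = :stock</solution>
  </case>
  <case>
    <pattern>show the volume of STOCK</pattern>
    <solution>SELECT volume FROM quotes WHERE name = :stock</solution>
  </case>
</casebase>
"""


def load_cases(xml_text):
    """Parse the casebase into (pattern, solution) pairs."""
    root = ET.fromstring(xml_text.strip())
    return [(c.findtext("pattern"), c.findtext("solution"))
            for c in root.iter("case")]


def jaccard(a, b):
    """Token-set overlap, used here as a stand-in similarity measure."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def retrieve(query, cases):
    """Return the stored case whose pattern is most similar to the query."""
    q = query.lower().split()
    return max(cases, key=lambda c: jaccard(q, c[0].lower().split()))


cases = load_cases(CASEBASE_XML)
pattern, solution = retrieve("what is the price of ACME", cases)
print(solution)
```

Supporting another language in this sketch would mean adding cases with patterns in that language, without writing a new parser, which is the property the abstract emphasizes.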


Archive | 2002

Correcting Word Segmentation and Part-of-Speech Tagging Errors for Chinese Named Entity Recognition

Tianfang Yao; Wei Ding; Gregor Erbach

In exploring Chinese named entity recognition for a specific domain, the authors found that errors introduced during word segmentation and part-of-speech (POS) tagging obstructed improvements in recognition performance. To further enhance recognition recall and precision, the authors propose an error correction approach for Chinese named entity recognition. In the error correction component, transformation-based machine learning is adopted because it is suitable for fixing Chinese word segmentation and POS tagging errors and produces effective correcting rules automatically. The Chinese named entity recognition component utilizes Finite-State Cascades, which are automatically constructed from POS rules with semantic constraints. A prototype system, CNERS (Chinese Named Entity Recognition System), has been implemented. The experimental results show that the recognition performance for most named entities has been significantly improved. In addition, the system is fast and reliable.


semantics, knowledge and grid | 2006

An Efficient Token-based Approach for Web-Snippet Clustering

Jianchao Li; Tianfang Yao

Online clustering of the results returned by search engines has become prevalent in recent times. It addresses the problem that current search engines return too many records, which makes the manual search for the desired information difficult, especially if the query encompasses several subtopics. Clustering is a useful technique for grouping records into clusters, thereby making it more convenient to retrieve information of interest. We first propose an innovative approach that uses tokens as the basic units for clustering, which avoids segmentation for oriental languages and can be applied to any language. Second, we introduce a Directed Probability Graph (DPG) model that identifies meaningful phrases as cluster labels using statistical methods without any external knowledge. The clustering procedure is performed without calculating the pairwise similarity between documents. As shown by our experiments, our clustering algorithm is very efficient and suitable for online Web-snippet clustering.


meeting of the association for computational linguistics | 2006

Chinese Named Entity and Relation Identification System

Tianfang Yao; Hans Uszkoreit

In this interactive presentation, a Chinese named entity and relation identification system is demonstrated. The domain-specific system has a three-stage pipeline architecture which includes word segmentation and part-of-speech (POS) tagging, named entity recognition, and named entity relation identification. The experimental results have shown that the average F-measure for word segmentation and POS tagging after correcting errors reaches 92.86 and 90.01 respectively. Moreover, the overall average F-measure for 6 kinds of named entities and 14 kinds of named entity relations is 83.08% and 70.46% respectively.


meeting of the association for computational linguistics | 2005

A Novel Machine Learning Approach for the Identification of Named Entity Relations

Tianfang Yao; Hans Uszkoreit

In this paper, a novel machine learning approach for the identification of named entity relations (NERs), called positive and negative case-based learning (PNCBL), is proposed. It aims to improve the identification performance for NERs by simultaneously learning two opposite kinds of cases and automatically selecting effective multi-level linguistic features for NERs and non-NERs. This approach has been applied to the identification of domain-specific and cross-sentence NERs for Chinese texts. The experimental results have shown that the overall average recall, precision, and F-measure for 14 NERs are 78.50%, 63.92% and 70.46% respectively. In addition, the above F-measure has been enhanced from 63.61% to 70.46% due to the adoption of both positive and negative cases.
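The reported figures are mutually consistent if the F-measure is taken to be the balanced F1 score, i.e. the harmonic mean of precision and recall. That reading is an assumption, since the abstract does not state which F-measure variant is used. A short check:

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
f = f_measure(precision=63.92, recall=78.50)
print(round(f, 2))
```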


international conference on machine learning and cybernetics | 2002

Repairing errors for Chinese word segmentation and part-of-speech tagging

Tianfang Yao; Wei Ding; Gregor Erbach

To improve the recognition performance for Chinese named entities, transformation-based machine learning has been introduced to repair errors introduced during word segmentation and part-of-speech (POS) tagging. Since Chinese is not a segmented language, the words in a sentence must be segmented before they are processed by subsequent Chinese named entity recognition components. Similarly, POS tagging is an important fundamental task for Chinese named entity recognition. To enhance the quality of word segmentation and POS tagging, it is necessary to explore different approaches for improving the performance. One approach is to repair as many errors as possible when a word segmentation and POS tagging tool is already at hand. This paper introduces an effective error repairer using a transformation-based error-driven machine learning technique. It deals with detecting error positions, producing error repairing rules, selecting higher-scoring rules, ordering rules, distinguishing rule usage conditions, etc. The experimental results show that word segmentation and POS tagging errors are significantly reduced and the performance has been improved.
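The general shape of transformation-based error-driven learning can be sketched as a greedy loop: find positions where the current tags disagree with a reference annotation, propose repair rules from those positions, score each rule by the net number of errors it removes, and keep the best rule at each round so that the learned rules come out ordered. The sketch below covers POS tag repair only and uses a single hypothetical rule template ("change tag A to B when the previous tag is C"); the rule templates, scoring details and segmentation repair used in the paper are not given in the abstract.

```python
def apply_rule(tags, rule):
    """Apply one rule (from_tag, to_tag, prev_tag) to a tag sequence."""
    frm, to, prev = rule
    out = list(tags)
    for i in range(1, len(tags)):
        # Context is read from the input sequence, not the partly edited one.
        if tags[i] == frm and tags[i - 1] == prev:
            out[i] = to
    return out


def count_errors(tags, gold):
    """Number of positions where the tags differ from the reference."""
    return sum(t != g for t, g in zip(tags, gold))


def propose_rules(current, gold):
    """Generate one candidate rule per detected error position."""
    return {(current[i], gold[i], current[i - 1])
            for i in range(1, len(current)) if current[i] != gold[i]}


def score(rule, current, gold):
    """Net error reduction achieved by applying the rule."""
    return count_errors(current, gold) - count_errors(apply_rule(current, rule), gold)


def learn(initial, gold, max_rules=10):
    """Greedily learn an ordered list of repair rules."""
    current, learned = list(initial), []
    for _ in range(max_rules):
        candidates = sorted(propose_rules(current, gold))
        if not candidates:
            break
        best = max(candidates, key=lambda r: score(r, current, gold))
        if score(best, current, gold) <= 0:
            break
        learned.append(best)
        current = apply_rule(current, best)
    return learned, current


# Toy data: the initial tagger labels every noun after a determiner as a verb.
initial = ["DT", "VB", "VB", "DT", "VB", "VB"]
gold = ["DT", "NN", "VB", "DT", "NN", "VB"]
rules, repaired = learn(initial, gold)
print(rules, repaired)
```

At repair time the learned rules would be applied in the order they were learned, since later rules are scored against the output of earlier ones.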

Collaboration


Dive into Tianfang Yao's collaboration.

Top Co-Authors

Dongmo Zhang, Shanghai Jiao Tong University
Fang Li, Shanghai Jiao Tong University
Huanye Sheng, Shanghai Jiao Tong University
Linlin Li, Shanghai Jiao Tong University
Mosha Chen, Shanghai Jiao Tong University
Jianchao Li, Shanghai Jiao Tong University
Jun Liu, Shanghai Jiao Tong University