Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Zheng-Yu Niu is active.

Publication


Featured research published by Zheng-Yu Niu.


Meeting of the Association for Computational Linguistics | 2005

Word Sense Disambiguation Using Label Propagation Based Semi-Supervised Learning

Zheng-Yu Niu; Dong-Hong Ji; Chew Lim Tan

Shortage of manually sense-tagged data is an obstacle to supervised word sense disambiguation (WSD) methods. In this paper we investigate a label propagation based semi-supervised learning algorithm for WSD, which combines labeled and unlabeled data in the learning process to fully realize a global consistency assumption: similar examples should have similar labels. Our experimental results on benchmark corpora indicate that it consistently outperforms SVM when only very few labeled examples are available, and that its performance is also better than monolingual bootstrapping and comparable to bilingual bootstrapping.
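The label propagation scheme described above can be sketched with scikit-learn's generic LabelPropagation class (a stand-in for the paper's algorithm, not the authors' implementation); the two-dimensional context vectors and sense labels below are toy data:

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Toy WSD data: each row is a context feature vector for one occurrence
# of an ambiguous word; label -1 marks an unlabeled example.
X = np.array([
    [1.0, 0.1], [0.9, 0.2],      # sense 0 region
    [0.1, 1.0], [0.2, 0.9],      # sense 1 region
    [0.95, 0.15], [0.15, 0.95],  # unlabeled occurrences
])
y = np.array([0, 0, 1, 1, -1, -1])  # -1 = unlabeled

# Labels spread over an RBF similarity graph until similar examples
# share similar labels (the global consistency assumption).
model = LabelPropagation(kernel='rbf', gamma=20)
model.fit(X, y)
print(model.transduction_)  # labels inferred for all six examples
```

With only two labeled examples per sense, each unlabeled occurrence inherits the sense of the cluster it sits in, which is exactly what the global consistency assumption asks for.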


Meeting of the Association for Computational Linguistics | 2006

Relation Extraction Using Label Propagation Based Semi-Supervised Learning

Jinxiu Chen; Dong-Hong Ji; Chew Lim Tan; Zheng-Yu Niu

Shortage of manually labeled data is an obstacle to supervised relation extraction methods. In this paper we investigate a graph-based semi-supervised learning algorithm, a label propagation (LP) algorithm, for relation extraction. It represents labeled and unlabeled examples and their distances as the nodes and edge weights of a graph, and tries to obtain a labeling function that satisfies two constraints: (1) it should be fixed on the labeled nodes, and (2) it should be smooth over the whole graph. Experimental results on the ACE corpus showed that this LP algorithm achieves better performance than SVM when only very few labeled examples are available, and that it also performs better than bootstrapping for the relation extraction task.
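The two constraints can be illustrated with a minimal NumPy sketch (illustrative only, not the paper's code): labels are clamped on labeled nodes at every iteration, while unlabeled nodes repeatedly average their neighbors' labels, which enforces smoothness over the graph. The toy graph below uses similarity-style edge weights.

```python
import numpy as np

def label_propagation(W, Y, labeled, n_iter=100):
    """W: (n, n) symmetric edge-weight matrix; Y: (n, k) one-hot label
    rows for labeled nodes, zeros for unlabeled; labeled: boolean mask."""
    T = W / W.sum(axis=1, keepdims=True)  # row-normalized transition matrix
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        F = T @ F                  # smoothness: average over neighbors
        F[labeled] = Y[labeled]    # constraint: fixed on labeled nodes
    return F.argmax(axis=1)

# Tiny graph: nodes 0 and 1 are labeled (classes 0 and 1); node 2 is
# unlabeled but strongly connected to node 0.
W = np.array([[0.0, 0.1, 1.0],
              [0.1, 0.0, 0.1],
              [1.0, 0.1, 0.0]])
Y = np.array([[1, 0], [0, 1], [0, 0]])
labeled = np.array([True, True, False])
print(label_propagation(W, Y, labeled))  # node 2 inherits class 0
```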


Meeting of the Association for Computational Linguistics | 2007

I2R: Three Systems for Word Sense Discrimination, Chinese Word Sense Disambiguation, and English Word Sense Disambiguation

Zheng-Yu Niu; Dong-Hong Ji; Chew Lim Tan

This paper describes the implementation of our three systems at SemEval-2007: for task 2 (word sense discrimination), task 5 (Chinese word sense disambiguation), and the first subtask of task 17 (English word sense disambiguation). For task 2, we applied a cluster validation method to estimate the number of senses of a target word in untagged data, and then grouped the instances of this target word into the estimated number of clusters. For both task 5 and task 17, we used the label propagation algorithm as the classifier for sense disambiguation. Our system for task 2 achieved a 63.9% F-score under unsupervised evaluation and 71.9% supervised recall under supervised evaluation. For task 5, our system obtained 71.2% micro-average precision and 74.7% macro-average precision. For the lexical sample subtask of task 17, our system achieved 86.4% coarse-grained precision and recall.


Conference on Information and Knowledge Management | 2004

Document clustering based on cluster validation

Zheng-Yu Niu; Dong-Hong Ji; Chew Lim Tan

This paper presents a cluster validation based document clustering algorithm, which is capable of identifying both important feature words and the true model order (cluster number). The important feature subset is selected by optimizing a cluster validity criterion subject to some constraint. To achieve model order identification capability, this feature selection procedure is conducted for each possible value of the cluster number. The feature subset and cluster number which maximize the cluster validity criterion are chosen as our answer. We have applied our algorithm to several datasets from the 20Newsgroup corpus. Experimental results show that our algorithm can find the important feature subset, estimate the model order, and yield higher micro-averaged precision than four other document clustering algorithms which require the cluster number to be provided.
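The model-order search can be sketched as follows, using k-means plus the silhouette score as stand-ins for the paper's clustering procedure and validity criterion: cluster the data once for each candidate cluster number and keep the value that maximizes the criterion.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Toy "documents": three well-separated Gaussian blobs in 2-D.
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(20, 2))
               for c in ([0, 0], [5, 0], [0, 5])])

best_k, best_score = None, -1.0
for k in range(2, 7):  # candidate model orders
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)  # stand-in validity criterion
    if score > best_score:
        best_k, best_score = k, score

print(best_k)  # the criterion peaks at the true cluster number
```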


Meeting of the Association for Computational Linguistics | 2006

Unsupervised Relation Disambiguation Using Spectral Clustering

Jinxiu Chen; Dong-Hong Ji; Chew Lim Tan; Zheng-Yu Niu

This paper presents an unsupervised learning approach to disambiguating various relations between named entities by the use of various lexical and syntactic features from the contexts. It works by calculating the eigenvectors of an adjacency graph's Laplacian to recover a submanifold of the data from a high-dimensional space, and then performing cluster number estimation on those eigenvectors. Experimental results on the ACE corpora show that this spectral clustering based approach outperforms other clustering methods.
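The eigenvector step can be illustrated with a generic normalized-Laplacian computation (a textbook sketch, not the authors' pipeline): build an affinity matrix, take the smallest eigenvectors of its normalized Laplacian, and cluster in that low-dimensional embedding.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_embed(W, dim):
    """Embed graph nodes via the eigenvectors of the normalized
    Laplacian L = I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vecs[:, :dim]            # smallest eigenvectors span the clusters

# Affinity matrix with two obvious groups: {0, 1} and {2, 3}.
W = np.array([[0.00, 1.00, 0.01, 0.01],
              [1.00, 0.00, 0.01, 0.01],
              [0.01, 0.01, 0.00, 1.00],
              [0.01, 0.01, 1.00, 0.00]])
emb = spectral_embed(W, dim=2)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
print(labels)  # nodes 0,1 land in one cluster, 2,3 in the other
```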


North American Chapter of the Association for Computational Linguistics | 2006

Semi-supervised Relation Extraction with Label Propagation

Jinxiu Chen; Dong-Hong Ji; Chew Lim Tan; Zheng-Yu Niu

To overcome the problem of not having enough manually labeled relation instances for supervised relation extraction methods, in this paper we propose a label propagation (LP) based semi-supervised learning algorithm for the relation extraction task that learns from both labeled and unlabeled data. Evaluation on the ACE corpus showed that, when only a few labeled examples are available, our LP-based relation extraction achieves better performance than SVM and another bootstrapping method.


Information Processing and Management | 2007

Using cluster validation criterion to identify optimal feature subset and cluster number for document clustering

Zheng-Yu Niu; Dong-Hong Ji; Chew Lim Tan

This paper presents a cluster validation based document clustering algorithm, which is capable of identifying an important feature subset and the intrinsic value of model order (cluster number). The important feature subset is selected by optimizing a cluster validity criterion subject to some constraint. For achieving model order identification capability, this feature selection procedure is conducted for each possible value of cluster number. The feature subset and the cluster number which maximize the cluster validity criterion are chosen as our answer. We have evaluated our algorithm using several datasets from the 20Newsgroup corpus. Experimental results show that our algorithm can find the important feature subset, estimate the cluster number and achieve higher micro-averaged precision than previous document clustering algorithms which require the value of cluster number to be provided.


Empirical Methods in Natural Language Processing | 2005

A Semi-Supervised Feature Clustering Algorithm with Application to Word Sense Disambiguation

Zheng-Yu Niu; Dong-Hong Ji; Chew Lim Tan

In this paper we investigate an application of feature clustering to word sense disambiguation, and propose a semi-supervised feature clustering algorithm. Compared with other feature clustering methods (e.g., supervised feature clustering), it can infer the distribution of class labels over (unseen) features unavailable in the training data (labeled data) by the use of the distribution of class labels over (seen) features available in the training data. Thus, it can deal with both seen and unseen features in the feature clustering process. Our experimental results show that feature clustering can aggressively reduce the dimensionality of the feature space while still maintaining state-of-the-art sense disambiguation accuracy. Furthermore, when combined with a semi-supervised WSD algorithm, semi-supervised feature clustering outperforms other dimensionality reduction techniques, which indicates that using unlabeled data in the learning process helps to improve the performance of feature clustering and sense disambiguation.
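One way to picture the seen/unseen distinction (a loose toy sketch under simplified assumptions, not the paper's algorithm): represent each seen feature by its class-label counts from labeled data, and give an unseen feature a borrowed distribution from the seen features it co-occurs with in unlabeled data. Features can then be clustered by these distribution rows.

```python
import numpy as np

# Rows = documents, columns = binary word features. Labeled documents
# carry sense labels; the last feature never occurs in them (unseen).
X_lab = np.array([[1, 0, 1, 0],
                  [1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 1, 1, 0]])
y_lab = np.array([0, 0, 1, 1])
X_unl = np.array([[1, 0, 0, 1],   # unlabeled docs: feature 3 (unseen)
                  [1, 0, 1, 1]])  # co-occurs mostly with sense-0 features

n_feat, n_cls = X_lab.shape[1], 2

# Class-label counts over seen features, from labeled data.
dist = np.zeros((n_feat, n_cls))
for f in range(n_feat):
    for c in range(n_cls):
        dist[f, c] = X_lab[y_lab == c, f].sum()

# Unseen features (all-zero counts): borrow the distributions of the
# seen features they co-occur with in the unlabeled data.
for f in range(n_feat):
    if dist[f].sum() == 0:
        co = (X_unl[:, f][:, None] * X_unl).sum(axis=0)  # co-occurrence counts
        dist[f] = (co / co.sum()) @ dist

# Normalize rows to probabilities; features can now be clustered by row.
dist = dist / dist.sum(axis=1, keepdims=True)
print(dist[3])  # the unseen feature leans toward sense 0
```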


Computer Speech & Language | 2007

Learning model order from labeled and unlabeled data for partially supervised classification, with application to word sense disambiguation

Zheng-Yu Niu; Dong-Hong Ji; Chew Lim Tan

Previous partially supervised classification methods can partition unlabeled data into positive examples and negative examples for a given class by learning from positive labeled examples and unlabeled examples, but they cannot further group the negative examples into meaningful clusters even if there are many different classes among them. Here we propose an automatic method to obtain a natural partitioning of mixed data (labeled data + unlabeled data) by maximizing a stability criterion, defined on the classification results of an extended label propagation algorithm, over all possible values of the model order (the number of classes) in the mixed data. Our experimental results on benchmark corpora for the word sense disambiguation task indicate that this model order identification algorithm, with the extended label propagation algorithm as the base classifier, outperforms SVM, a one-class partially supervised classification algorithm, and the model order identification algorithm with semi-supervised k-means clustering as the base classifier when labeled data is incomplete.
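A stability criterion of the general flavor described above can be sketched with bootstrap resampling and k-means (stand-ins for the paper's extended label propagation classifier): the model order whose partitions agree most across resamples wins.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(1)
# Mixed toy data drawn from two underlying classes.
X = np.vstack([rng.normal([0, 0], 0.15, (30, 2)),
               rng.normal([4, 4], 0.15, (30, 2))])

def stability(X, k, n_trials=5):
    """Stand-in stability criterion: fit k-means on bootstrap resamples,
    predict labels for the full data set, and average the pairwise
    agreement. A model order that survives resampling scores near 1."""
    preds = []
    for t in range(n_trials):
        idx = rng.choice(len(X), size=len(X), replace=True)
        km = KMeans(n_clusters=k, n_init=10, random_state=t).fit(X[idx])
        preds.append(km.predict(X))
    pairs = [(i, j) for i in range(n_trials) for j in range(i + 1, n_trials)]
    return float(np.mean([adjusted_rand_score(preds[i], preds[j])
                          for i, j in pairs]))

best_k = max(range(2, 6), key=lambda k: stability(X, k))
print(best_k)  # the most stable model order matches the two true classes
```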


Asia Information Retrieval Symposium | 2006

Multi-document summarization using a clustering-based hybrid strategy

Yu Nie; Donghong Ji; Lingpeng Yang; Zheng-Yu Niu; Tingting He

In this paper we propose a clustering-based hybrid approach to multi-document summarization which integrates sentence clustering, local recommendation, and global search. For sentence clustering, we adopt a stability-based method which can determine the optimal cluster number automatically. We weight sentences by the terms they contain for local sentence recommendation within each cluster. For global selection, we propose a global criterion to evaluate the overall performance of a summary. Thus the sentences in the final summary are determined not only by the configuration of individual clusters but also by the overall performance. This approach achieves top-level performance on the DUC04 corpus.
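The cluster-then-recommend steps can be sketched with TF-IDF and k-means on toy sentences (the cluster number is fixed here, whereas the paper estimates it with a stability-based method, and the global search step is omitted):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "A powerful storm hit the coast on Monday.",
    "The storm damaged homes along the coast.",
    "Aid agencies delivered food and water to survivors.",
    "Relief groups distributed water and food supplies.",
]

# Clustering step: group sentences into topic clusters.
X = TfidfVectorizer().fit_transform(sentences)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Local recommendation: from each cluster, pick the sentence whose
# terms carry the highest total weight.
summary = []
for c in sorted(set(labels)):
    members = np.flatnonzero(labels == c)
    weights = np.asarray(X[members].sum(axis=1)).ravel()
    summary.append(sentences[members[int(np.argmax(weights))]])
print(summary)  # one representative sentence per topic cluster
```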

Collaboration


Dive into Zheng-Yu Niu's collaborations.

Top Co-Authors


Chew Lim Tan

National University of Singapore


Man Lan

East China Normal University


Tingting He

Central China Normal University


Yu Xu

East China Normal University


Zhi-Min Zhou

East China Normal University
