Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Deyu Zhou is active.

Publication


Featured researches published by Deyu Zhou.


IEEE Transactions on Knowledge and Data Engineering | 2009

Discriminative Training of the Hidden Vector State Model for Semantic Parsing

Deyu Zhou; Yulan He

In this paper, we discuss how discriminative training can be applied to the hidden vector state (HVS) model in different task domains. The HVS model is a discrete hidden Markov model (HMM) in which each HMM state represents the state of a push-down automaton with a finite stack size. In previous applications, maximum-likelihood estimation (MLE) is used to derive the parameters of the HVS model. However, MLE makes a number of assumptions and unfortunately some of these assumptions do not hold. Discriminative training, without making such assumptions, can improve the performance of the HVS model by discriminating the correct hypothesis from the competing hypotheses. Experiments have been conducted in two domains: the travel domain for the semantic parsing task using the DARPA Communicator data and the Air Travel Information Services (ATIS) data and the bioinformatics domain for the information extraction task using the GENIA corpus. The results demonstrate modest improvements of the performance of the HVS model using discriminative training. In the travel domain, discriminative training of the HVS model gives a relative error reduction rate of 31 percent in F-measure when compared with MLE on the DARPA Communicator data and 9 percent on the ATIS data. In the bioinformatics domain, a relative error reduction rate of 4 percent in F-measure is achieved on the GENIA corpus.


international conference on computational science | 2006

Extracting protein-protein interactions from the literature using the hidden vector state model

Deyu Zhou; Yulan He; Chee Keong Kwoh

In the field of bioinformatics in solving biological problems, the huge amount of knowledge is often locked in textual documents such as scientific publications. Hence there is an increasing focus on extracting information from this vast amount of scientific literature. In this paper, we present an information extraction system which employs a semantic parser using the Hidden Vector State (HVS) model for protein-protein interactions. Unlike other hierarchical parsing models which require fully annotated treebank data for training, the HVS model can be trained using only lightly annotated data whilst simultaneously retaining sufficient ability to capture the hierarchical structure needed to robustly extract task domain semantics. When applied in extracting protein-protein interactions information from medical literature, we found that it performed better than other established statistical methods and achieved 47.9% and 72.8% in recall and precision respectively.


International Journal of Bioinformatics Research and Applications | 2008

Extracting Protein-Protein Interactions from MEDLINE using the Hidden Vector State model

Deyu Zhou; Yulan He; Chee Keong Kwoh

A major challenge in text mining for biomedicine is automatically extracting protein-protein interactions from the vast amount of biomedical literature. We have constructed an information extraction system based on the Hidden Vector State (HVS) model for protein-protein interactions. The HVS model can be trained using only lightly annotated data whilst simultaneously retaining sufficient ability to capture the hierarchical structure. When applied in extracting protein-protein interactions, we found that it performed better than other established statistical methods and achieved 61.5% in F-score with balanced recall and precision values. Moreover, the statistical nature of the pure data-driven HVS model makes it intrinsically robust and it can be easily adapted to other domains.


pattern recognition in bioinformatics | 2007

Ant-MST: an ant-based minimum spanning tree for gene expression data clustering

Deyu Zhou; Yulan He; Chee Keong Kwoh; Hao Wang

We have proposed an ant-based clustering algorithm for document clustering based on the travelling salesperson scenario. In this paper, we presented an approach called Ant-MST for gene expression data clustering based on both ant-based clustering and minimum spanning trees (MST). The ant-based clustering algorithm is firstly used to construct a fully connected network of nodes. Each node represents one gene, and every edge is associated with a certain level of pheromone intensity describing the co-expression level between two genes. Then MST is used to break the linkages in order to generate clusters. Comparing to other MST-based clustering approaches, our proposed method uses pheromone intensity to measure the similarity between two genes instead of using Euclidean distance or correlation distance. Pheromone intensities associated with every edge in a fully-connected network records the collective memory of the ants. Self-organizing behavior could be easily discovered through pheromone intensities. Experimental results on three gene expression datasets show that our approach in general outperforms the classical clustering methods such as K-means and agglomerate hierarchical clustering.


international conference on computational linguistics | 2008

A Hybrid Generative/Discriminative Framework to Train a Semantic Parser from an Un-annotated Corpus

Deyu Zhou; Yulan He

We propose a hybrid generative/discriminative framework for semantic parsing which combines the hidden vector state (HVS) model and the hidden Markov support vector machines (HM-SVMs). The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. The HM-SVMs combine the advantages of the hidden Markov models and the support vector machines. By employing a modified K-means clustering method, a small set of most representative sentences can be automatically selected from an un-annotated corpus. These sentences together with their abstract annotations are used to train an HVS model which could be subsequently applied on the whole corpus to generate semantic parsing results. The most confident semantic parsing results are selected to generate a fully-annotated corpus which is used to train the HM-SVMs. The proposed framework has been tested on the DARPA Communicator Data. Experimental results show that an improvement over the baseline HVS parser has been observed using the hybrid framework. When compared with the HM-SVMs trained from the fully-annotated corpus, the hybrid framework gave a comparable performance with only a small set of lightly annotated sentences.


Computational Intelligence in Biomedicine and Bioinformatics | 2008

From biomedical literature to knowledge:mining protein-protein interactions

Deyu Zhou; Yulan He; Chee Keong Kwoh

To date, more than 16 million citations of published articles in biomedical domain are available in the MEDLINE database. These articles describe the new discoveries which accompany a tremendous development in biomedicine during the last decade. It is crucial for biomedical researchers to retrieve and mine some specific knowledge from the huge quantity of published articles with high efficiency. Researchers have been engaged in the development of text mining tools to find knowledge such as protein-protein interactions, which are most relevant and useful for specific analysis tasks. This chapter provides a road map to the various information extraction methods in biomedical domain, such as protein name recognition and discovery of protein-protein interactions. Disciplines involved in analyzing and processing unstructured-text are summarized. Current work in biomedical information extracting is categorized. Challenges in the field are also presented and possible solutions are discussed.


international conference on data mining | 2008

Ontology-Based Protein-Protein Interactions Extraction from Literature Using the Hidden Vector State Model

Yulan He; Keiichi Nakata; Deyu Zhou

This paper proposes a novel framework of incorporating protein-protein interactions (PPI) ontology knowledge into PPI extraction from biomedical literature in order to address the emerging challenges of deep natural language understanding. It is built upon the existing work on relation extraction using the hidden vector state (HVS) model. The HVS model belongs to the category of statistical learning methods. It can be trained directly from un-annotated data in a constrained way whilst at the same time being able to capture the underlying named entity relationships. However, it is difficult to incorporate background knowledge or non-local information into the HVS model. This paper proposes to represent the HVS model as a conditionally trained undirected graphical model in which non-local features derived from PPI ontology through inference would be easily incorporated. The seamless fusion of ontology inference with statistical learning produces a new paradigm to information extraction.


international conference on computational science | 2007

Training the Hidden Vector State Model from Un-annotated Corpus

Deyu Zhou; Yulan He; Chee Keong Kwoh

Since most knowledge about protein-protein interactions still hides in biological publications, there is an increasing focus on automatically extracting information from the vast amount of biological literature. Existing approaches can be broadly categorized as rule-based or statistically-based. Rule-based approaches require heavy manual effort. On the other hand, statistically-based approaches require large-scale, richly annotated corpora in order to reliably estimate model parameters. This is normally difficult to obtain in practical applications. We have proposed a hidden vector state (HVS) model for protein-protein interactions extraction. The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. State transitions are factored into a stack shift operation similar to those of a push-down automaton followed by the push of a new preterminal category label. In this paper, we propose a novel approach based on the k-nearest-neighbors classifier to automatically train the HVS model from un-annotated data. Experimental results show the improved performance over the baseline system with the HVS model trained from a small amount of the annotated data.


computational intelligence and data mining | 2007

Semi-supervised Learning of the Hidden Vector State Model for Protein-Protein Interactions Extraction

Deyu Zhou; Yulan He; Chee Keong Kwoh

A major challenge in text mining for biology and biomedicine is automatically extracting protein-protein interactions from the vast amount of biological literature since most knowledge about them still hides in biological publications. Existing approaches can be broadly categorized as rule-based or statistical-based. Rule-based approaches require heavy manual efforts. On the other hand, statistical-based approaches require large-scale, richly annotated corpora in order to reliably estimate model parameters. This is normally difficult to obtain in practical applications. The hidden vector state (HVS) model, an extension of the basic discrete Markov model, has been successfully applied to extract protein-protein interactions. In this paper, we propose a novel approach to train the HVS model on both annotated and un-annotated corpus. Sentences selection algorithm is designed to utilize the semantic parsing results of the un-annotated corpus generated by the HVS model. Experimental results show that the performance of the initial HVS model trained on a small amount of the annotated data can be improved by employing this approach


Artificial Intelligence in Medicine | 2007

Semi-supervised learning of the hidden vector state model for extracting protein-protein interactions

Deyu Zhou; Yulan He; Chee Keong Kwoh

Collaboration


Dive into the Deyu Zhou's collaboration.

Top Co-Authors

Avatar
Top Co-Authors

Avatar

Chee Keong Kwoh

Nanyang Technological University

View shared research outputs
Top Co-Authors

Avatar

Hao Wang

Nanyang Technological University

View shared research outputs
Top Co-Authors

Avatar
Researchain Logo
Decentralizing Knowledge