Publication


Featured research published by David K. Y. Chiu.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 1987

Synthesizing Statistical Knowledge from Incomplete Mixed-Mode Data

Andrew K. C. Wong; David K. Y. Chiu

The difficulties in analyzing and clustering (synthesizing) multivariate data of the mixed type (discrete and continuous) are largely due to 1) nonuniform scaling in different coordinates, 2) the lack of order in nominal data, and 3) the lack of a suitable similarity measure. This paper presents a new approach which bypasses these difficulties and can acquire statistical knowledge from incomplete mixed-mode data. The proposed method adopts an event-covering approach which covers a subset of statistically relevant outcomes in the outcome space of variable pairs. Once the covered event patterns are acquired, subsequent analysis tasks such as probabilistic inference, cluster analysis, and detection of event patterns for each cluster based on the incomplete probability scheme can be performed. There are four phases in our method: 1) the discretization of the continuous components based on a maximum entropy criterion so that the data can be treated as n-tuples of discrete-valued features; 2) the estimation of the missing values using our newly developed inference procedure; 3) the initial formation of clusters by analyzing the nearest-neighbor distance on subsets of selected samples; and 4) the reclassification of the n-tuples into more reliable clusters based on the detected interdependence relationships. For performance evaluation, experiments have been conducted using both simulated and real-life data.
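The first phase above discretizes each continuous component by a maximum entropy criterion. A minimal sketch, assuming the criterion is applied to each variable's marginal distribution: for k intervals, the entropy of the discretized variable is largest when every interval holds about the same number of samples, so the cut points fall at empirical quantiles. The function names are hypothetical and this is not the authors' code; it assumes at least k samples and no heavy ties.

```python
import math
from collections import Counter


def max_entropy_cut_points(values, k):
    """Return k-1 cut points splitting `values` into k (near) equal-frequency bins.

    Equal-frequency bins give each interval probability of about 1/k, which
    maximizes the Shannon entropy of the discretized (marginal) variable.
    Assumes len(values) >= k.
    """
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, k):
        idx = (i * n) // k  # index of the first sample in the (i+1)-th bin
        # Place the cut midway between the two neighboring samples.
        cuts.append((ordered[idx - 1] + ordered[idx]) / 2)
    return cuts


def discretize(values, cuts):
    """Map each continuous value to the index of the interval it falls in."""
    return [sum(v > c for c in cuts) for v in values]


def entropy(labels):
    """Shannon entropy (bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())
```

For example, eight values cut into four intervals yield two samples per interval and an entropy of log2(4) = 2 bits, the maximum possible for four outcomes.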


Bulletin of Mathematical Biology | 1992

A survey of multiple sequence comparison methods.

S. C. Chan; Andrew K. C. Wong; David K. Y. Chiu

Multiple sequence comparison refers to the search for similarity in three or more sequences. This article presents a survey of the exhaustive (optimal) and heuristic (possibly sub-optimal) methods developed for the comparison of multiple macromolecular sequences. Emphasis is given to the different approaches of the heuristic methods. Four distance measures derived from information engineering and genetic studies are introduced for the comparison between two alignments of sequences. The use of entropy, which plays a central role in information theory as a measure of information, choice and uncertainty, is proposed as a simple measure for evaluating the optimality of an alignment in the absence of any a priori knowledge about the structures of the sequences being compared. This article also gives two examples of comparison between alternative alignments of the same set of 5S RNAs as obtained by several different heuristic methods.
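The survey proposes entropy as a simple measure of alignment optimality. A minimal sketch, assuming the measure is the sum of Shannon entropies over the alignment columns, with the gap character `-` counted as an ordinary symbol (both assumptions; the survey's exact formulation may differ). Lower totals mean more conserved columns, so among alternative alignments of the same sequences and the same length, the one with the lowest total would be preferred.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the symbols in one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def alignment_entropy(rows):
    """Sum of column entropies for an alignment given as equal-length strings.

    Gaps ('-') are treated as an ordinary symbol here, which is an assumption.
    """
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    # zip(*rows) walks the alignment column by column.
    return sum(column_entropy(col) for col in zip(*rows))
```

A fully conserved alignment such as `["ACGT", "ACGT", "ACGT"]` scores 0, while changing one residue in the last column of one row raises the total to about 0.918 bits.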


IEEE Transactions on Evolutionary Computation | 2006

An evolutionary clustering algorithm for gene expression microarray data analysis

Patrick C. H. Ma; Keith C. C. Chan; Xin Yao; David K. Y. Chiu

Clustering is concerned with the discovery of interesting groupings of records in a database. Many algorithms have been developed to tackle clustering problems in a variety of application domains. In particular, some of them have been used in bioinformatics research to uncover inherent clusters in gene expression microarray data. In this paper, we show how some popular clustering algorithms have been used for this purpose. Based on experiments using simulated and real data, we also show that the performance of these algorithms can be further improved. For more effective clustering of gene expression microarray data, which are typically very noisy, we propose a novel evolutionary algorithm called evolutionary clustering (EvoCluster). EvoCluster encodes an entire cluster grouping in a chromosome so that each gene in the chromosome encodes one cluster. Based on this encoding scheme, it uses a set of reproduction operators to facilitate the exchange of grouping information between chromosomes. The fitness function that EvoCluster adopts can distinguish how relevant a feature value is in determining a particular cluster grouping. As such, instead of considering only local pairwise distances, it also takes into account how clusters are arranged globally. Unlike many popular clustering algorithms, EvoCluster does not require the number of clusters to be decided in advance. Also, patterns hidden in each cluster can be explicitly revealed and presented for easy interpretation, even by casual users. For performance evaluation, we have tested EvoCluster using both simulated and real data. Experimental results show that it can be very effective and robust even in the presence of noise and missing values. Also, when correlating the gene expression microarray data with DNA sequences, we were able to uncover significant biological binding sites (both previously known and unknown) in each cluster discovered by EvoCluster.
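The encoding described above stores a whole cluster grouping in one chromosome, one gene per cluster. A minimal sketch of that encoding, assuming each gene is a set of record indices, together with one hypothetical reproduction operator (`transplant_cluster`) that copies a cluster from a donor chromosome into a receiver while keeping the grouping a valid partition. The paper's actual operators and its fitness function are not reproduced here; all names are illustrative.

```python
import random


def random_chromosome(n_records, max_clusters, rng):
    """Randomly partition records 0..n_records-1 into at most max_clusters clusters.

    The chromosome is a list of sets; each set (one "gene") is one cluster.
    """
    clusters = {}
    for record in range(n_records):
        clusters.setdefault(rng.randrange(max_clusters), set()).add(record)
    return list(clusters.values())


def transplant_cluster(receiver, donor, rng):
    """Hypothetical operator: copy one cluster from donor into receiver.

    Records of the transplanted cluster are removed from the receiver's other
    clusters and emptied clusters are dropped, so the offspring is still a
    partition and its number of clusters may differ from either parent.
    """
    gene = set(rng.choice(donor))
    child = [c - gene for c in receiver]
    child = [c for c in child if c]
    child.append(gene)
    return child


def is_partition(chromosome, n_records):
    """Check that every record appears in exactly one cluster."""
    seen = [r for c in chromosome for r in c]
    return len(seen) == n_records and set(seen) == set(range(n_records))
```

Because the number of genes is not fixed, this kind of encoding fits the abstract's point that the number of clusters need not be chosen in advance.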


Proteomics | 2009

Application of iTRAQ to catalogue the skeletal muscle proteome in pigs and assessment of effects of gender and diet dephytinization.

Hatam A. Hakimov; Sandra Walters; T.C. Wright; Roy G. Meidinger; Chris P. Verschoor; Moshe A. Gadish; David K. Y. Chiu; Martina V. Strömvik; Cecil W. Forsberg; Serguei Golovan

In this study iTRAQ was used to produce a high-confidence catalogue of 542 proteins identified in porcine muscle (false-positive rate <5%). To our knowledge this is the largest reported set of skeletal muscle proteins in livestock. Comparison with the human muscle proteome demonstrated a low level of false positives, with 83% of the proteins common to both proteomes. In addition, for the first time we assess variations in the muscle proteome caused by sexually dimorphic gene expression and diet dephytinization. Preliminary analysis identified 19 skeletal muscle proteins differentially expressed between male and female pigs (≥1.2-fold, p<0.05), but only one of them, GDP-dissociation inhibitor 1, remained significant (p<0.05) after false discovery rate correction. Diet dephytinization affected expression of 20 proteins (p<0.05). This study contributes to an evaluation of the suitability of the pig as a model to study human gender-related differences in gene expression. The transgenic pigs used in this study might also serve as a useful model to understand changes in human physiology resulting from diet dephytinization.
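The abstract reports that most nominally significant proteins did not survive false discovery rate correction, without naming the procedure. As an illustration only, here is a sketch of the Benjamini-Hochberg step-up procedure, one standard FDR correction; it is not necessarily the method used in the study, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level alpha.

    Step-up rule: sort p-values ascending, find the largest rank k with
    p_(k) <= k * alpha / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

With ten tests, p-values of 0.039 or 0.042 are below 0.05 individually but exceed their rank-based thresholds (0.015 and 0.025), which is how a list of nominal hits can shrink sharply after correction.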


IEEE Transactions on Systems, Man, and Cybernetics | 1994

Learning sequential patterns for probabilistic inductive prediction

Keith C. C. Chan; Andrew K. C. Wong; David K. Y. Chiu

Suppose we are given a sequence of events that are generated probabilistically in the sense that the attributes of one event are dependent, to a certain extent, on those observed before it. This paper presents an inductive method that is capable of detecting the inherent patterns in such a sequence and of making predictions about the attributes of future events. Unlike previous AI-based prediction methods, the proposed method is particularly effective in discovering knowledge in ordered event sequences even when the data are noisy. The method can be divided into three phases: (i) detection of underlying patterns in an ordered event sequence; (ii) construction of sequence-generation rules based on the detected patterns; and (iii) use of these rules to predict the attributes of future events. The method has been implemented in a program called OBSERVER-II, which has been tested with both simulated and real-life data. Experimental results indicate that it is capable of discovering underlying patterns and explaining the behaviour of certain sequence-generation processes that are not obvious or easily understood. The performance of OBSERVER-II has been compared with that of existing AI-based prediction systems, and it was found to successfully solve prediction problems on which programs such as SPARC have failed.
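The three phases above (detect patterns, build rules, predict) can be illustrated in a much simplified form. A minimal sketch, assuming a single attribute per event and rules of the form "the value seen `lag` steps earlier determines the next value", with the most frequent follower used as the prediction. This is not OBSERVER-II, which also tests the statistical significance of patterns and handles multiple attributes; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def learn_lagged_rules(sequence, lag):
    """Phases (i)-(ii): count how often each value follows each earlier value
    at distance `lag`; the counts act as probabilistic rules."""
    table = defaultdict(Counter)
    for t in range(lag, len(sequence)):
        table[sequence[t - lag]][sequence[t]] += 1
    return table


def predict_next(sequence, table, lag):
    """Phase (iii): predict the next value from the value `lag` steps back.

    Returns None when that earlier value was never seen as a rule antecedent.
    """
    previous = sequence[len(sequence) - lag]
    counts = table.get(previous)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

For the periodic sequence `ABCABCABCAB`, both a lag of 1 (B is always followed by C) and a lag of 3 (each value repeats three steps later) predict `C` as the next event.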


IEEE Transactions on Systems, Man, and Cybernetics | 1986

Synthesizing Knowledge: A Cluster Analysis Approach Using Event Covering

David K. Y. Chiu; Andrew K. C. Wong

An event-covering method [1] for synthesizing knowledge gathered from empirical observations is presented. Based on the detection of statistically significant events, knowledge is synthesized through the use of a special clustering algorithm. This algorithm, employing a probabilistic information measure and a subsidiary distance, is capable of clustering ordered and unordered discrete-valued data that are subject to noise perturbation. It consists of two phases: cluster initiation and cluster refinement. During cluster initiation, an analysis of the nearest-neighbor distance distribution is performed to select a criterion for merging samples into clusters. During cluster refinement, the samples are regrouped using the event-covering method, which selects subsets of statistically relevant events. For performance evaluation, we tested the algorithm using both simulated data and a set of radiological data collected from normal subjects and spina bifida patients.
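The cluster-initiation phase merges samples on the basis of nearest-neighbor distances. A minimal sketch, assuming Hamming distance between discrete-valued samples and a fixed merging threshold supplied by the caller; the paper instead selects its criterion from the nearest-neighbor distance distribution and uses a probabilistic information measure with a subsidiary distance, none of which is reproduced here. The function names and the `threshold` parameter are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two discrete-valued samples differ."""
    return sum(x != y for x, y in zip(a, b))


def initial_clusters(samples, threshold):
    """Link each sample to its nearest neighbor when their Hamming distance is
    <= threshold, then return the connected groups (sorted lists of indices).

    Assumes at least two samples. Unlinked samples come back as singletons.
    """
    n = len(samples)
    parent = list(range(n))  # union-find forest over sample indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        d, j = min((hamming(samples[i], samples[k]), k) for k in range(n) if k != i)
        if d <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

The refinement phase described in the abstract would then regroup these initial clusters using the event-covering method.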


Briefings in Functional Genomics | 2016

Advances in long noncoding RNAs: identification, structure prediction and function annotation

Xingli Guo; Lin Gao; Yu Wang; David K. Y. Chiu; Tong Wang; Yue Deng

Long noncoding RNAs (lncRNAs), generally longer than 200 nucleotides and with poor protein-coding potential, are usually considered collectively as a heterogeneous class of RNAs. Recently, an increasing number of studies have shown that lncRNAs can be involved in various critical biological processes and in a number of complex human diseases. Not only are the primary sequences of many lncRNAs directly related to specific functional roles, but strong evidence also suggests that their secondary structures are even more closely related to their known functions. As functional molecules, lncRNAs have become increasingly relevant to many researchers. Here, we review recent, state-of-the-art advances at three levels of lncRNA research (the primary sequence, the secondary structure and the function annotation), as well as computational methods for lncRNA data analysis.


Pattern Recognition | 1987

An event-covering method for effective probabilistic inference

Andrew K. C. Wong; David K. Y. Chiu

The probabilistic approach is useful in many artificial intelligence applications, especially when a certain degree of uncertainty or probabilistic variation exists in either the data or the decision process. The event-covering approach detects statistically significant event associations and can deduce a certain structure of inherent data relationships. By event-covering, we mean the process of covering or selecting statistically significant events which are outcomes in the outcome space of variable pairs, disregarding whether the variables (with regard to the complete outcome space) are statistically significant for inference or not. This approach enables us to tackle two problems well known in many artificial intelligence applications, namely: (1) the selection of useful information inherent in the data when the causal relationship is uncertain or unknown, and (2) the necessity to discover and disregard uncertain events which are erroneous or simply irrelevant. Our proposed method can be applied to a large class of decision-support tasks. By analyzing only the useful statistically significant information extracted from the event-covering process, we can formulate an effective probabilistic inference method applicable to incomplete discrete-valued (symbolic) data. The statistical patterns detected by our method then represent important empirical knowledge gained. To demonstrate the method's effectiveness in solving pattern recognition problems with incomplete data and/or data with high "noise" content (with uncertain and irrelevant events), this method has been evaluated using both simulated and real-life biomolecular data.
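Event covering selects individual outcomes of a variable pair whose observed frequency deviates significantly from what independence would predict. A minimal sketch, assuming the significance of each outcome is tested with Haberman's adjusted residual against a two-sided 5% critical value of 1.96; the exact statistic and threshold used in the paper may differ, and the function name is hypothetical.

```python
import math
from collections import Counter


def covered_events(pairs, z_crit=1.96):
    """Return {(x, y): adjusted_residual} for outcomes of one variable pair whose
    frequency deviates significantly from the independence expectation.

    pairs: list of observed (x, y) outcomes for a single variable pair.
    """
    n = len(pairs)
    joint = Counter(pairs)
    row = Counter(x for x, _ in pairs)
    col = Counter(y for _, y in pairs)
    covered = {}
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            z = (joint[(x, y)] - expected) / math.sqrt(expected)  # standardized residual
            v = (1 - row[x] / n) * (1 - col[y] / n)  # estimated variance of z
            if v == 0:
                continue  # a variable with one observed value carries no association
            d = z / math.sqrt(v)  # adjusted residual, approximately N(0, 1)
            if abs(d) > z_crit:
                covered[(x, y)] = d
    return covered
```

Outcomes with a positive residual occur more often than chance and those with a negative residual less often; both kinds are covered, while outcomes consistent with independence are disregarded, matching the second problem named in the abstract.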


Journal of Experimental and Theoretical Artificial Intelligence | 1990

Information synthesis based on hierarchical maximum entropy discretization

David K. Y. Chiu; Benny Cheung; Andrew K. C. Wong

This paper outlines a new approach to the synthesis of information from data. Information is defined as a detected organization of data after a process of discretization (or partitioning) and event covering. The discretization is based on a hierarchical maximum entropy scheme which iteratively minimizes the loss of information according to Shannon. The event-covering process is based on an evaluation of the deviation of the observed frequencies of an event from the expectation due to prior knowledge (defined by the null hypothesis and/or domain knowledge). The hierarchical maximum entropy discretization scheme provides a rigorous and efficient way of solving the non-uniform scaling problem in multivariate data analysis. Because our method refines the boundaries dynamically depending on the detection of information, it directs the analysis to the outcome subspace with high information content. In addition, it naturally produces a hierarchical view of information so that data can be analyzed/synthe...


Comparative Biochemistry and Physiology Part D: Genomics and Proteomics | 2008

Analysis of Sus scrofa liver proteome and identification of proteins differentially expressed between genders, and conventional and genetically enhanced lines

Serguei P. Golovan; Hatam A. Hakimov; Chris P. Verschoor; Sandra Walters; Moshe A. Gadish; Christine G. Elsik; F.S. Schenkel; David K. Y. Chiu; Cecil W. Forsberg

Porcine liver proteome iTRAQ analysis enabled the confident identification of 880 proteins with a false-positive identification rate of less than 5%. Proteins involved in energy metabolism, catabolism, protein biosynthesis, electron transport, and other oxidoreductase reactions were highly enriched, confirming the central role of the liver as the major chemical and energy factory. Comparative analysis with human and mouse liver proteomes demonstrated that 80% of proteins were common to all three liver proteomes. It was also demonstrated that the sex of the animal and the introduction of a novel phytase transgene into the genome each affected around 5% of the total liver proteome. After controlling the false discovery rate (FDR ≤ 0.1) using the Storey q value, only four proteins (EPHX1, CAT, PAH, ST13) were shown to be differentially expressed between genders (males/females) and two proteins (SELENBP2, TAGLN) were differentially expressed between the two lines (transgenic/conventional pigs). The current analysis is the largest proteome analysis for pig and complements the more extensive human and mouse proteome projects.

Collaboration

Top Co-Authors

Keith C. C. Chan, Hong Kong Polytechnic University
Iker Gondra, St. Francis Xavier University
Tao Xu, University of Guelph
Helen C. Shen, Hong Kong University of Science and Technology
C.Y.C. Bie, University of Waterloo