Publication


Featured research published by Jonathan H. Chan.


Journal of Biomedical Informatics | 2013

Methodological Review: Biomedical text mining and its applications in cancer research

Fei Zhu; Preecha Patumcharoenpol; Cheng Zhang; Yang Yang; Jonathan H. Chan; Asawin Meechai; Wanwipa Vongsangnak; Bairong Shen

Cancer is a malignant disease that has caused millions of human deaths, and its study has a history of well over 100 years. There is an enormous number of publications on cancer research, and this integrated but unstructured biomedical text is of great value for cancer diagnostics, treatment, and prevention. The size and rapid growth of the cancer literature have led to a large number of text mining techniques aimed at extracting novel knowledge from scientific text. Biomedical text mining for cancer research is automatic and high-throughput by nature; however, it is error-prone because of the complexity of natural language processing. In this review, we introduce the basic concepts underlying text mining, examine some frequently used algorithms, tools, and datasets, and assess how widely these algorithms have been used. We then discuss current state-of-the-art text mining applications in cancer research and provide some resources for cancer text mining. With the development of systems biology, researchers increasingly seek to understand complex biomedical systems from a systems biology viewpoint. Making full use of text mining to support cancer systems biology research is therefore fast becoming a major concern. To address this issue, we describe the general workflow of text mining in cancer systems biology and each phase of that workflow. We hope that this review can (i) provide a useful overview of current work in this field; (ii) help researchers choose text mining tools and datasets; and (iii) show how to apply text mining to assist cancer systems biology research.
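One of the simplest text mining workflows is dictionary-based entity tagging followed by sentence-level co-occurrence counting, which can be sketched in a few lines. The dictionaries (`GENES`, `CANCERS`), the regex sentence splitter, and the function names are hypothetical placeholders, not tools or datasets from the review; real pipelines use curated lexicons and trained taggers.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical toy dictionaries; real pipelines use curated lexicons or trained NER models.
GENES = {"TP53", "BRCA1", "EGFR"}
CANCERS = {"breast cancer", "lung cancer"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag(sentence):
    """Dictionary-based entity tagging for a single sentence."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
    genes = {g for g in GENES if g in tokens}       # exact, case-sensitive symbol match
    lowered = sentence.lower()
    cancers = {c for c in CANCERS if c in lowered}  # case-insensitive phrase match
    return genes, cancers


def cooccurrences(text):
    """Count gene-cancer pairs mentioned in the same sentence."""
    counts = Counter()
    for sentence in split_sentences(text):
        genes, cancers = tag(sentence)
        counts.update(product(sorted(genes), sorted(cancers)))
    return counts


abstract = ("EGFR mutations are common in lung cancer. "
            "BRCA1 and TP53 are linked to breast cancer. EGFR is also studied.")
print(cooccurrences(abstract))
```

Co-occurrence is only a weak signal of a relationship; the review's point about error-proneness applies directly, since a shared sentence does not guarantee that the two entities are actually linked.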


Neurocomputing | 2015

Pathway activity transformation for multi-class classification of lung cancer datasets

Worrawat Engchuan; Jonathan H. Chan

Pathway-based microarray analysis has proven to be a powerful tool for studying disease mechanisms and identifying biological markers of complex diseases such as lung cancer. Previous studies have shown that pathway activity transformed from gene expression data is more informative for disease classification. However, existing pathway activity transformation methods address only binary classification. In this study, we propose a pathway activity transformation method for multi-class data, termed Analysis-of-Variance-based Feature Set (AFS). Classification using pathway activity derived from the proposed method showed high classification power in three-fold cross-validation and robustness in across-dataset validation for all four lung cancer datasets used.
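The abstract does not spell out how AFS builds its feature set, so the following is only a generic sketch of the underlying idea under an assumed scheme: rank a pathway's member genes by their one-way ANOVA F-statistic across all classes, keep the top `k`, and summarize each sample as the mean z-scored expression of those genes. The function names, the parameter `k`, and the mean-of-z-scores summary are illustrative assumptions, not the published method.

```python
from statistics import mean, pstdev


def anova_f(values, labels):
    """One-way ANOVA F-statistic of one gene's expression across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    k, n = len(groups), len(values)
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        # Perfect separation gives an infinite F; a constant gene carries no signal.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def zscore(values):
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def pathway_activity(expr, pathway_genes, labels, k=2):
    """expr maps gene -> expression values (one per sample).

    Hypothetical scheme: keep the k member genes with the largest F-statistic,
    then average their z-scored expression per sample.
    """
    ranked = sorted(pathway_genes, key=lambda g: anova_f(expr[g], labels), reverse=True)
    z_rows = [zscore(expr[g]) for g in ranked[:k]]
    return [mean(sample) for sample in zip(*z_rows)]
```

Because the F-statistic is unsigned, averaging genes that move in opposite directions across classes can cancel out; a practical multi-class method needs some way to handle that, which this sketch deliberately leaves out.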


Neural Computing and Applications | 2012

Pathway-based microarray analysis for robust disease classification

Pitak Sootanan; Santitham Prom-on; Asawin Meechai; Jonathan H. Chan

The advent of high-throughput technology has made it possible to measure genome-wide expression profiles, providing a new basis for microarray-based diagnosis of disease states. Numerous methods have been proposed to identify biomarkers that can accurately discriminate between case and control classes. Many of these methods use only a subset of ranked genes in each pathway and may not fully represent the classification boundaries between the two disease classes. This study proposes the use of negatively correlated feature sets (NCFS) to obtain more relevant features, in the form of phenotype-correlated genes (PCOGs), and to infer pathway activities. The two pathway activity inference schemes that use NCFS significantly improved the power of pathway markers to discriminate between two phenotype classes in microarray expression datasets of breast cancer. In particular, the NCFS-i method provided better contrasting features for classification. The improvement was consistent across all pathways used, in both within- and across-dataset validation. The two proposed NCFS-based methods clearly outperformed other pathway-based classifiers in terms of both ROC area and discriminative score. That is, identifying PCOGs within each pathway, especially with the NCFS-i method, helps reduce the effect of noisy or variable measurements, leading to a higher-performing and more robust classifier. In summary, we have demonstrated that effectively incorporating pathway information into expression-based disease diagnosis, together with NCFS, can yield more discriminative and more robust models.
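As a rough illustration of the negatively-correlated-feature-set idea, the sketch below assumes one plausible reading of it: correlate each pathway gene with an ideal marker (the 0/1 class labels), split the phenotype-correlated genes into a positively and a negatively correlated set using a correlation threshold, and score each sample as the mean z-score of the positive set minus that of the negative set. The threshold, the function names, and this exact scoring rule are assumptions for illustration; the papers' NCFS-i definition may differ in its details.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    return mean((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def zscore(values):
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def ncfs_activity(expr, pathway_genes, labels, threshold=0.5):
    """Hypothetical NCFS-style pathway activity.

    labels are 0 (control) / 1 (case) and double as the ideal marker.
    Genes with r >= threshold form the positive set, r <= -threshold the negative set.
    """
    ideal = [float(y) for y in labels]
    r = {g: pearson(expr[g], ideal) for g in pathway_genes}
    pos = [g for g in pathway_genes if r[g] >= threshold]
    neg = [g for g in pathway_genes if r[g] <= -threshold]
    n = len(labels)

    def set_mean(genes):
        # Per-sample mean z-score of a gene set (zeros if the set is empty).
        if not genes:
            return [0.0] * n
        z_rows = [zscore(expr[g]) for g in genes]
        return [mean(sample) for sample in zip(*z_rows)]

    up, down = set_mean(pos), set_mean(neg)
    return [u - d for u, d in zip(up, down)]
```

Subtracting the negatively correlated set, rather than averaging all genes together, keeps up- and down-regulated genes from cancelling each other, which is one intuitive reason such contrasting features can separate phenotypes better.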


Journal of Bioinformatics and Computational Biology | 2011

Enhancing biological relevance of a weighted gene co-expression network for functional module identification

Santitham Prom-on; Atthawut Chanthaphan; Jonathan H. Chan; Asawin Meechai

Relationships among gene expression levels may be associated with disease mechanisms. While identifying a direct association, such as a difference in expression levels between case and control groups, links genes to disease mechanisms, uncovering an indirect association in the form of a network structure may help reveal the underlying functional modules associated with the disease under scrutiny. This paper presents a method that improves the biological relevance of functional module identification from gene expression microarray data by enhancing the structure of a weighted gene co-expression network with a minimum spanning tree. The enhanced network, called a backbone network, contains only the essential structural information needed to represent the gene co-expression network. The backbone network is decoupled into a number of coherent sub-networks, and the functional modules are then reconstructed from these sub-networks to ensure minimum redundancy. The method was tested on a simulated gene expression dataset and on case-control expression datasets from autism spectrum disorder and colorectal cancer studies. The results indicate that the proposed method accurately identifies clusters in the simulated dataset and that the functional modules of the backbone network are more biologically relevant than those obtained with the original approach.
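A minimal sketch of the backbone idea follows, assuming the common choices of absolute Pearson correlation as the co-expression weight and `1 - |r|` as the edge distance. It builds the minimum spanning tree with Kruskal's algorithm and then cuts tree edges whose distance exceeds a threshold, so that the connected components serve as sub-networks. The distance definition, the `cut` threshold, and the function names are assumptions; the paper's own decoupling and module reconstruction steps are not reproduced here.

```python
from itertools import combinations
from statistics import mean, pstdev


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    return mean((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def mst_edges(genes, dist):
    """Kruskal's algorithm with union-find; returns MST edges as (d, a, b)."""
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    tree = []
    for d, a, b in sorted((dist[(a, b)], a, b) for a, b in combinations(genes, 2)):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            tree.append((d, a, b))
    return tree


def backbone_modules(expr, cut=0.5):
    """expr maps gene -> expression profile. Returns sorted gene lists, one per sub-network."""
    genes = sorted(expr)
    dist = {(a, b): 1 - abs(pearson(expr[a], expr[b])) for a, b in combinations(genes, 2)}
    adj = {g: set() for g in genes}
    for d, a, b in mst_edges(genes, dist):
        if d <= cut:  # keep only strong backbone edges
            adj[a].add(b)
            adj[b].add(a)
    seen, modules = set(), []
    for g in genes:  # connected components via depth-first search
        if g in seen:
            continue
        stack, comp = [g], set()
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        modules.append(sorted(comp))
    return modules
```

A spanning tree keeps exactly one path between any two genes, which is what strips the dense, redundant correlation edges down to a backbone before the graph is split.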


BMC Medical Genomics | 2015

Performance of case-control rare copy number variation annotation in classification of autism

Worrawat Engchuan; Kiret Dhindsa; Anath C. Lionel; Stephen W. Scherer; Jonathan H. Chan; Daniele Merico

Background: A substantial proportion of Autism Spectrum Disorder (ASD) risk resides in de novo germline and rare inherited genetic variation. In particular, rare copy number variation (CNV) contributes to ASD risk in up to 10% of ASD subjects. Despite the striking degree of genetic heterogeneity, case-control studies have detected a specific burden of rare disruptive CNVs affecting neuronal and neurodevelopmental pathways. Here, we used machine learning methods to classify ASD subjects and controls based on rare CNV data and comprehensive gene annotations. We investigated the performance of different methods and estimated the percentage of ASD subjects that could be reliably classified based on the presumed etiologic CNVs they carry.

Results: We analyzed 1,892 Caucasian ASD subjects and 2,342 matched controls. Rare CNVs (frequency of 1% or less) were detected using Illumina 1M and 1M-Duo BeadChips. Conditional Inference Forest (CF) typically performed as well as or better than other classification methods. We found a maximum AUC (area under the ROC curve) of 0.533 when considering all ASD subjects with rare genic CNVs, corresponding to 7.9% of ASD subjects correctly classified and less than 3% of controls incorrectly classified; performance was significantly higher when considering only subjects harboring de novo or pathogenic CNVs. We also found rare losses to be more predictive than gains, and curated neurally relevant annotations (brain expression, synaptic components and neurodevelopmental phenotypes) to outperform Gene Ontology and pathway-based annotations.

Conclusions: CF is an optimal classification approach for case-control rare CNV data, and it can be used to prioritize subjects carrying not-yet-recognized variants that potentially contribute to ASD risk. The neurally relevant annotations used in this study could be successfully applied to rare CNV case-control datasets for other neuropsychiatric disorders.
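The results combine two standard ROC summaries: the AUC, and the fraction of cases correctly classified while the false-positive rate among controls stays low. A minimal sketch of computing both from classifier scores follows; it uses the rank-based (Mann-Whitney) form of the AUC and is independent of the Conditional Inference Forest model itself. The function names and toy scores are hypothetical, and the quadratic loops are chosen for clarity rather than speed.

```python
def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 1/2)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_at_fpr(case_scores, control_scores, max_fpr):
    """Best true-positive rate of a rule 'score >= t' whose control FPR is <= max_fpr."""
    best = 0.0
    for t in sorted(set(case_scores) | set(control_scores)):
        fpr = sum(k >= t for k in control_scores) / len(control_scores)
        if fpr <= max_fpr:
            tpr = sum(c >= t for c in case_scores) / len(case_scores)
            best = max(best, tpr)
    return best


cases = [0.9, 0.8, 0.4, 0.3]
controls = [0.7, 0.35, 0.2, 0.1]
print(auc(cases, controls), sensitivity_at_fpr(cases, controls, 0.25))
```

An AUC close to 0.5, as reported for the full cohort, is compatible with a small subset of cases being separable at a strict false-positive rate, which is why the abstract reports both numbers.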


International Symposium on Neural Networks | 2011

Feature selection of pathway markers for microarray-based disease classification using negatively correlated feature sets

Jonathan H. Chan; Pitak Sootanan; Ponlavit Larpeampaisarl

Microarray-based classification of disease states relies on the gene expression profiles of subjects. Various methods have been proposed to identify diagnostic markers that can accurately discriminate between two classes, such as case and control. Methods that use only a subset of ranked genes in each pathway may not fully represent the classification boundaries between the two disease classes. Here, negatively correlated feature sets (NCFS) are used to identify phenotype-correlated genes (PCOGs) and to infer pathway activities. The NCFS-based pathway activity inference schemes significantly improved the power of pathway markers to discriminate between normal and cancer classes, as well as between relapse and non-relapse classes, in microarray expression datasets of breast cancer. Furthermore, ranker feature selection methods with the top three pathway markers were shown to be suitable for both logistic regression and naive Bayes (NB) classifiers. In addition, the proposed single pathway classification (SPC) ranker performed similarly to the traditional SVM and Relief-F feature selection methods. Identifying PCOGs within each pathway, especially using NCFS based on correlation with ideal markers (NCFS-i), helps minimize the effect of potentially noisy experimental data, leading to accurate and robust classification results.
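To illustrate the "ranker plus top three markers" setup in general terms, the sketch below ranks pathway-marker features by the absolute Welch t-statistic between the two classes, keeps the top `k = 3`, and fits a Gaussian naive Bayes classifier on them. The t-statistic ranker is a stand-in chosen for illustration; it is not the paper's SPC ranker, SVM-based ranking, or Relief-F, and all names here are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    denom = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / denom if denom > 0 else 0.0


def top_k_features(X, y, k=3):
    """X: samples as lists of marker values; y: 0/1 labels. Returns k feature indices."""
    def score(j):
        a = [x[j] for x, lab in zip(X, y) if lab == 1]
        b = [x[j] for x, lab in zip(X, y) if lab == 0]
        return abs(welch_t(a, b))
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]


class GaussianNB:
    """Naive Bayes with one Gaussian per (class, selected feature)."""

    def fit(self, X, y, features):
        self.features = features
        self.params = {}
        for c in set(y):
            rows = [x for x, lab in zip(X, y) if lab == c]
            stats = []
            for j in features:
                col = [r[j] for r in rows]
                stats.append((mean(col), max(variance(col), 1e-9)))  # variance floor
            self.params[c] = (len(rows) / len(X), stats)
        return self

    def predict(self, x):
        def log_posterior(c):
            prior, stats = self.params[c]
            lp = math.log(prior)
            for j, (mu, var) in zip(self.features, stats):
                lp += -0.5 * math.log(2 * math.pi * var) - (x[j] - mu) ** 2 / (2 * var)
            return lp
        return max(self.params, key=log_posterior)
```

Keeping only a handful of top-ranked markers suits naive Bayes in particular, since its independence assumption becomes harder to justify as more correlated pathway features are added.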


The Journal of Supercomputing | 2016

NSPRING: the SPRING extension for subsequence matching of time series supporting normalization

Xueyuan Gong; Simon Fong; Jonathan H. Chan; Sabah Mohammed

Mining sequences and patterns in time series data streams is fast becoming common practice. Rapid progress in data collection and web technologies has produced tremendous growth in streaming data of various complex forms that must be analyzed in real time. Traditional data mining methods, which typically require the data to be scanned repeatedly, are not feasible for stream data applications. Newer techniques such as SPRING address these challenges by identifying subsequences that match a query pattern in time series streams, with complexity that is linear in both time and space. Unfortunately, SPRING does not support data normalization, which makes it inapplicable to most datasets. In this paper, we propose NSPRING, an extension of SPRING that retains SPRING's advantages (e.g., low time and space complexity) while supporting normalization. Furthermore, NSPRING achieves mining accuracy similar to that of SPRING.
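For context, the core of SPRING is a dynamic time warping (DTW) recurrence that is updated once per incoming stream value, keeping one distance column and one start-position column of length m + 1 for a query of length m. The sketch below implements that recurrence in its simplest best-match form, with a zero-cost "star" row so that a match may begin at any time step. It assumes absolute difference as the local cost, omits SPRING's threshold-based reporting of disjoint matches, and does not implement NSPRING's normalization; the function name is hypothetical.

```python
import math


def spring_best_match(stream, query):
    """Best-matching subsequence of `stream` under DTW, found in one pass.

    Returns (distance, start_index, end_index). Only two columns of length
    m + 1 are kept, so each new stream value costs O(m) time and memory.
    """
    m = len(query)
    # Row 0 is the zero-cost "star" row: a match may begin at any time step.
    d_prev = [0.0] + [math.inf] * m
    s_prev = [0] * (m + 1)
    best = (math.inf, -1, -1)
    for t, x in enumerate(stream):
        d_cur = [0.0] + [math.inf] * m
        s_cur = [t] + [0] * m
        for i in range(1, m + 1):
            # Predecessors: (t, i-1), (t-1, i), (t-1, i-1); min() keeps the first on ties.
            d_best, s_best = min(
                (d_cur[i - 1], s_cur[i - 1]),
                (d_prev[i], s_prev[i]),
                (d_prev[i - 1], s_prev[i - 1]),
                key=lambda pair: pair[0],
            )
            d_cur[i] = abs(x - query[i - 1]) + d_best
            s_cur[i] = s_best  # propagate where this warping path started
        if d_cur[m] < best[0]:
            best = (d_cur[m], s_cur[m], t)
        d_prev, s_prev = d_cur, s_cur
    return best


print(spring_best_match([9, 1, 2, 2, 3, 9], [1, 2, 3]))
```

The example stream repeats the value 2, so the zero-distance match from index 1 to index 4 is only found because DTW allows one query element to align with several stream values. It also hints at the normalization problem: if the stream were a scaled copy of the query, this raw-value recurrence would no longer report a zero-distance match.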


Congress on Evolutionary Computation | 2011

Classification-assisted memetic algorithms for solving optimization problems with restricted equality constraint function mapping

Stephanus Daniel Handoko; Kwoh Chee Keong; Ong Yew Soon; Jonathan H. Chan

The success of Memetic Algorithms (MAs) has led many researchers to focus on their efficiency, so that MAs can be effectively employed to solve computationally expensive optimization problems in which a single evaluation of the objective and constraint functions may require minutes to hours of CPU time. One important design issue in MAs is choosing the individuals to which the local search procedure should be applied. Applying local search to only some promising individuals reduces the number of function evaluations and hence accelerates convergence to the global optimum. In recent years, advances have been made on optimization problems with a single equality constraint h(x) = 0: when previously evaluated candidate solutions with constraint values of opposite sign lie within the same locality, the constraint boundary can be estimated, and an individual undergoes local search only if it is sufficiently close to the approximated boundary. Elegant as it may seem, this approach assumes that every constraint function maps the design variables onto unbounded real values, which may not always hold in practice. In this paper, we present a strategy to efficiently solve problems with a single equality constraint whose function maps the design variables onto restricted (either strictly non-negative or strictly non-positive) real values only.
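The sign-change test described above can be sketched as follows, under simplifying assumptions: evaluated solutions are kept in an archive with their constraint values h(x); if solutions with opposite signs of h lie within `radius` of a candidate, the zero crossing is estimated by linear interpolation between the nearest positive and nearest negative neighbours, and local search is triggered only when the candidate lies within `tol` of that estimate. Linear interpolation is a stand-in for whatever boundary approximation the original work uses, and all names are hypothetical.

```python
import math


def interpolated_zero(p_pos, h_pos, p_neg, h_neg):
    """Point on segment p_pos -> p_neg where a linear model of h crosses zero."""
    frac = h_pos / (h_pos - h_neg)  # h_pos > 0 > h_neg, so 0 < frac < 1
    return tuple(a + frac * (b - a) for a, b in zip(p_pos, p_neg))


def should_local_search(x, archive, radius, tol):
    """Decide whether candidate x is close enough to the estimated h(x) = 0 boundary.

    archive: list of (point, h_value) pairs for already-evaluated solutions.
    Archived points with h == 0 exactly are ignored in this simplified sketch.
    """
    nearby = [(p, h) for p, h in archive if math.dist(x, p) <= radius]
    positives = [(p, h) for p, h in nearby if h > 0]
    negatives = [(p, h) for p, h in nearby if h < 0]
    if not positives or not negatives:
        return False  # no sign change in this locality, so no boundary estimate
    p_pos, h_pos = min(positives, key=lambda ph: math.dist(x, ph[0]))
    p_neg, h_neg = min(negatives, key=lambda ph: math.dist(x, ph[0]))
    return math.dist(x, interpolated_zero(p_pos, h_pos, p_neg, h_neg)) <= tol


archive = [((0.0,), 1.0), ((2.0,), -1.0)]
print(should_local_search((0.9,), archive, radius=5.0, tol=0.2))

# Restricted mapping: h(x) >= 0 everywhere, so a sign change never appears.
restricted = [((0.0,), 1.0), ((2.0,), 4.0)]
print(should_local_search((0.9,), restricted, radius=5.0, tol=0.2))
```

The second call shows the limitation that motivates the paper: when the constraint function only takes non-negative (or only non-positive) values, there are never neighbours of opposite sign, so a sign-change test of this kind can never select an individual for local search.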


PeerJ | 2016

An integrated text mining framework for metabolic interaction network reconstruction

Preecha Patumcharoenpol; Narumol Doungpan; Asawin Meechai; Bairong Shen; Jonathan H. Chan; Wanwipa Vongsangnak

Text mining (TM) in biology is fast becoming a routine analysis for extracting and curating biological entities (e.g., genes, proteins, simple chemicals) and their relationships. Because TM is widely applicable in situations involving complex relationships, it is valuable to apply it to the extraction of metabolic interactions (i.e., enzyme and metabolite interactions) through metabolic events. Here we present an integrated TM framework containing two modules: one for the extraction of metabolic events (Metabolic Event Extraction module, MEE) and one for the construction of a metabolic interaction network (Metabolic Interaction Network Reconstruction module, MINR). The proposed framework performed well on the standard measures of recall, precision and F-score. Evaluation of the MEE module on the constructed Metabolic Entities (ME) corpus yielded F-scores of 59.15% and 48.59% for the detection of production and consumption events, respectively. When the entity taggers for genes and proteins (GP) and for metabolites were tested on the test corpus, the F-score exceeded 80% for the Superpathway of leucine, valine, and isoleucine biosynthesis. Mapping of enzyme and metabolite interactions through network reconstruction showed fair performance for the MINR module on the test corpus, with an F-score >70%. Finally, applying the framework to large-scale data (i.e., EcoCyc extraction data) to reconstruct a metabolic interaction network gave reasonable precision of 69.93%, 70.63% and 46.71% for enzymes, metabolites and enzyme–metabolite interactions, respectively. This study presents the first open-source integrated TM framework for reconstructing a metabolic interaction network. The framework can be a powerful tool that helps biologists extract metabolic events for the subsequent reconstruction of a metabolic interaction network.
The ME corpus, test corpus, source code, and virtual machine image with pre-configured software are available at www.sbi.kmutt.ac.th/ preecha/metrecon.
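The recall, precision and F-score figures above follow the standard definitions. As a small sketch, assuming extracted events are (enzyme, metabolite, kind) triples and that the network stores directed edges, a reconstructed network can be scored against a gold standard as follows. The edge-direction convention, the identifiers, and the function names are hypothetical and not taken from the ME corpus or the MINR module.

```python
def build_network(events):
    """events: (enzyme, metabolite, kind) triples with kind "production" or "consumption".

    Hypothetical convention: production gives an enzyme -> metabolite edge,
    consumption gives a metabolite -> enzyme edge. Other kinds are ignored.
    """
    edges = set()
    for enzyme, metabolite, kind in events:
        if kind == "production":
            edges.add((enzyme, metabolite))
        elif kind == "consumption":
            edges.add((metabolite, enzyme))
    return edges


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F-score (harmonic mean of the two)."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


extracted = build_network([("E1", "M1", "production"),
                           ("E1", "M0", "consumption"),
                           ("E2", "M3", "production")])
gold = {("E1", "M1"), ("M0", "E1"), ("E3", "M4")}
print(precision_recall_f1(extracted, gold))
```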


Advances in Information Technology | 2012

Multimedia Delivery for Elderly People: A Conceptual Model

Jutarat Choomkasean; Pornchai Mongkolnam; Jonathan H. Chan

The number of elderly people living alone is increasing in proportion to the growth of the ageing population. Because elderly people are considered to be at greater risk of loneliness, depression, and decreased mobility, they may need help from others to receive proper health and social care. This concern is particularly important for those living alone, most of whom become less active and less focused in their daily lives. The care they receive largely depends on how many family members and friends they have and whether those people live nearby. This research proposes a conceptual model for delivering multimedia to elderly people in order to lessen these problems. Content is sent via the Internet from family members’ and friends’ devices, such as smartphones and tablet PCs, to the elderly person’s television set. The proposed model is simple, efficient, and effective, and is intended to increase the quality of life (QoL) of the ageing population.
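The delivery flow in the proposed model (family members' and friends' devices send content over the Internet to an elderly person's television) can be pictured as a simple store-and-forward relay. Everything in this sketch is an assumption for illustration: the class and method names, the per-television queue, the idea that only registered senders may deliver content, and the television polling the relay. The abstract does not specify a protocol or server design.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class MediaItem:
    sender: str
    kind: str  # e.g. "photo", "video", "message" (hypothetical categories)
    uri: str   # where the television can fetch the content


class Relay:
    """Hypothetical in-memory relay between senders' devices and a television."""

    def __init__(self):
        self._queues = defaultdict(deque)  # tv_id -> pending items
        self._allowed = defaultdict(set)   # tv_id -> registered senders

    def register(self, tv_id, sender):
        """Allow a family member or friend to send content to this television."""
        self._allowed[tv_id].add(sender)

    def send(self, tv_id, item):
        """Queue an item from a smartphone or tablet; reject unknown senders."""
        if item.sender not in self._allowed[tv_id]:
            return False
        self._queues[tv_id].append(item)
        return True

    def poll(self, tv_id):
        """Called by the television: return and clear everything pending."""
        queue = self._queues[tv_id]
        items = list(queue)
        queue.clear()
        return items


relay = Relay()
relay.register("tv-living-room", "alice")
relay.send("tv-living-room", MediaItem("alice", "photo", "https://example.org/p1.jpg"))
print(relay.poll("tv-living-room"))
```

Putting the queue on the relay rather than on the senders' devices means the television only needs a single, simple pull operation, which fits the model's aim of keeping the elderly person's side as simple as possible.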

Collaboration



Top Co-Authors

Asawin Meechai, King Mongkut's University of Technology Thonburi
Worrawat Engchuan, King Mongkut's University of Technology Thonburi
Santitham Prom-on, King Mongkut's University of Technology Thonburi
Pornchai Mongkolnam, King Mongkut's University of Technology Thonburi
Narumol Doungpan, King Mongkut's University of Technology Thonburi
Sissades Tongsima, Thailand National Science and Technology Development Agency
Pitak Sootanan, King Mongkut's University of Technology Thonburi
Vajirasak Vanijja, King Mongkut's University of Technology Thonburi
Praisan Padungweang, King Mongkut's University of Technology Thonburi
Thammarsat Visutarrom, King Mongkut's University of Technology Thonburi