Canh Hao Nguyen
Kyoto University
Publications
Featured research published by Canh Hao Nguyen.
Pattern Recognition | 2008
Canh Hao Nguyen; Tu Bao Ho
We study the problem of evaluating the goodness of a kernel matrix for a classification task. As kernel matrix evaluation is usually embedded in other expensive procedures such as feature and model selection, the goodness measure must be calculated efficiently. Most previous approaches are not efficient, except for kernel target alignment (KTA), which can be calculated in O(n^2) time. Although KTA is widely used, we show that it has some serious drawbacks. We propose an efficient surrogate measure that evaluates the goodness of a kernel matrix based on the data distributions of the classes in the feature space. The measure not only overcomes the limitations of KTA but also possesses other desirable properties such as invariance, efficiency and an error bound guarantee. Comparative experiments show that the measure is a good indicator of the goodness of a kernel matrix.
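For context, the KTA baseline that the paper critiques has a standard closed form that can be computed directly from the kernel matrix and the label vector in O(n^2) time. A minimal NumPy sketch of that baseline (not the paper's proposed FSM measure) is:

```python
import numpy as np

def kernel_target_alignment(K, y):
    """Kernel-target alignment between kernel matrix K and labels y in {-1, +1}.

    KTA(K, y) = <K, y y^T>_F / (||K||_F * ||y y^T||_F),
    computable in O(n^2) time for an n x n kernel matrix.
    """
    y = np.asarray(y, dtype=float)
    alignment_num = y @ K @ y              # <K, y y^T>_F = y^T K y
    K_norm = np.linalg.norm(K, "fro")      # ||K||_F
    target_norm = np.dot(y, y)             # ||y y^T||_F = n when y_i in {-1, +1}
    return alignment_num / (K_norm * target_norm)

# Example: a linear kernel on toy two-class data
X = np.array([[1.0, 0.2], [0.9, 0.1], [-1.1, 0.0], [-0.8, -0.3]])
y = np.array([1, 1, -1, -1])
K = X @ X.T
print(kernel_target_alignment(K, y))
```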
Briefings in Bioinformatics | 2016
Ahmed Mohamed; Canh Hao Nguyen; Hiroshi Mamitsuka
Research in natural products has always enhanced drug discovery by providing new and unique chemical compounds. Recently, however, drug discovery from natural products has been slowed down by the increasing chance of re-isolating known compounds. Rapid identification of previously isolated compounds in an automated manner, called dereplication, steers researchers toward novel findings, thereby reducing the time and effort spent identifying new drug leads. Dereplication identifies compounds by comparing processed experimental data with those of known compounds, so diverse computational resources, such as databases and tools to process and compare compound data, are necessary. Automating the dereplication process through the integration of computational resources has long been an aspired goal of natural product researchers. To increase the utilization of current computational resources for natural products, we first provide an overview of the dereplication process, and then list useful resources, categorizing them into databases, methods and software tools and explaining them from a dereplication perspective. Finally, we discuss the current challenges to automating dereplication and the solutions that have been proposed.
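As a toy illustration of the comparison step described above (not a tool from the review), an experimental spectrum could be scored against a library of known compounds with a simple similarity measure; the `dereplicate` helper, the binned intensity vectors and the 0.9 threshold are hypothetical choices:

```python
import numpy as np

def cosine_score(query, reference):
    """Cosine similarity between two aligned spectral intensity vectors."""
    q, r = np.asarray(query, float), np.asarray(reference, float)
    denom = np.linalg.norm(q) * np.linalg.norm(r)
    return float(q @ r / denom) if denom > 0 else 0.0

def dereplicate(query_spectrum, library, threshold=0.9):
    """Return known compounds whose library spectra match the query.

    `library` maps compound names to intensity vectors binned on the same
    axis as `query_spectrum` (a hypothetical pre-processing step).
    """
    hits = [(name, cosine_score(query_spectrum, ref))
            for name, ref in library.items()]
    return sorted([h for h in hits if h[1] >= threshold], key=lambda h: -h[1])

library = {"compound_A": [0.0, 1.0, 0.2, 0.0], "compound_B": [0.5, 0.0, 0.9, 0.1]}
print(dereplicate([0.0, 0.9, 0.3, 0.0], library))
```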
New Generation Computing | 2007
Tu Bao Ho; Canh Hao Nguyen; Saori Kawasaki; Si Quang Le; Katsuhiko Takabayashi
Various data mining methods have been developed in the last few years for the study of hepatitis using a large temporal and relational database provided to the research community. In this work we introduce a novel temporal abstraction method to this study by detecting and exploiting temporal patterns and relations between events in viral hepatitis, such as "event A slightly happened before event B and B simultaneously ended with event C". We developed algorithms to first detect significant temporal patterns in temporal sequences and then to identify temporal relations between these temporal patterns. Many findings obtained by data mining methods applied to transactions/graphs of temporal relations were shown to be significant through physician evaluation and through matching against results published in Medline.
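To make the kind of relation quoted above concrete, the sketch below classifies a coarse temporal relation between two events given as (start, end) intervals, with a tolerance so that near-coincident endpoints count as simultaneous. The relation names and the tolerance are illustrative assumptions, not the paper's actual definitions:

```python
def interval_relation(a, b, tol=1.0):
    """Classify a coarse temporal relation between two events, each given as a
    (start, end) pair, with tolerance `tol` so near-coincident endpoints match."""
    a_start, a_end = a
    b_start, b_end = b
    if abs(a_start - b_start) <= tol and abs(a_end - b_end) <= tol:
        return "simultaneous"
    if a_end <= b_start - tol:
        return "before"
    if 0 < b_start - a_start <= tol:
        return "slightly before"
    if abs(a_end - b_end) <= tol:
        return "ends with"
    return "overlaps" if a_start < b_end and b_start < a_end else "other"

# "event A slightly happened before event B and B ended with event C"
A, B, C = (0.0, 5.0), (0.5, 8.0), (3.0, 8.2)
print(interval_relation(A, B), interval_relation(B, C))
```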
2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies | 2008
Hiroaki Tanabe; Tu Bao Ho; Canh Hao Nguyen; Saori Kawasaki
Complex biological data generated from various experiments are stored in diverse data types in multiple datasets. By representing each biological dataset appropriately as a kernel matrix and then combining the matrices when solving problems, the kernel-based approach has become a spotlight of data integration and its applications in bioinformatics and other fields. While a linear combination of unweighted multiple kernels (UMK) is popular, there have been efforts on multiple kernel learning (MKL), where optimal weights are learned by semi-definite programming or sequential minimal optimization (SMO-MKL). These methods provide high accuracy on biological prediction problems, but they are complicated and hard to use, especially for non-experts in optimization, and are usually of high computational cost and not suitable for large datasets. In this paper, we propose two simple but effective methods for determining the weights of a conic combination of multiple kernels. The first learns optimal weights formulated with our kernel matrix evaluation measure FSM (feature space-based kernel matrix evaluation measure) and is denoted FSM-MKL. The second assigns each kernel a weight proportional to its quality, determined by direct cross-validation, and is named proportionally weighted multiple kernels (PWMK). An experimental comparative evaluation of the four methods UMK, SMO-MKL, FSM-MKL and PWMK on the problem of protein-protein interactions shows that our proposed methods are simpler and more efficient yet still effective: they achieve performance almost as high as that of MKL and higher than that of UMK.
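A rough sketch of the PWMK idea under stated assumptions: each precomputed kernel is scored by cross-validated SVM accuracy, and the scores, normalized to sum to one, serve as conic combination weights. The scorer, the normalization and the toy data are assumptions rather than the paper's exact procedure (the FSM-based variant is not shown):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def pwmk_combine(kernels, y, cv=5):
    """Weight each precomputed kernel by its cross-validated accuracy and
    return the conic (non-negative weighted) combination of the kernels."""
    scores = []
    for K in kernels:
        clf = SVC(kernel="precomputed")
        scores.append(cross_val_score(clf, K, y, cv=cv).mean())
    w = np.array(scores)
    w = w / w.sum()                         # normalize to obtain conic weights
    return sum(wi * Ki for wi, Ki in zip(w, kernels)), w

# Toy usage with two random positive semi-definite kernels
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(40, 5)), rng.normal(size=(40, 3))
y = np.where(X1[:, 0] > 0, 1, -1)
K_combined, weights = pwmk_combine([X1 @ X1.T, X2 @ X2.T], y, cv=3)
print(weights)
```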
IEEE Transactions on Neural Networks | 2012
Canh Hao Nguyen; Hiroshi Mamitsuka
Predicting new links in a network is a problem of interest in many application domains. Most prediction methods utilize information on the network's entities, such as nodes, to build a model of links; network structures are usually not used except for networks with similarity or relatedness semantics. In this paper, we use network structures for link prediction on a more general network type with latent feature models. The problem with these models is the computational cost of training them directly on large data. We propose a method to solve this problem using kernels, casting the link prediction problem as a binary classification problem. The key idea is not to infer latent features explicitly, but to represent them implicitly in the kernels, making the method scalable to large networks. In contrast to other methods for latent feature models, our method inherits all the advantages of the kernel framework: optimality, efficiency, and nonlinearity. On sparse graphs, we show that our proposed kernels are close enough to the ideal kernels defined directly on latent features. We apply our method to real data of protein-protein interaction and gene regulatory networks to show the merits of our method.
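To illustrate how link prediction can be cast as kernel-based binary classification over node pairs, the sketch below uses a generic symmetrized tensor-product pairwise kernel built from a node kernel; this construction and the toy data are assumptions for illustration, not the paper's specific kernels (which encode latent features implicitly from the network structure):

```python
import numpy as np
from sklearn.svm import SVC

def pairwise_kernel(K_node, pairs_a, pairs_b):
    """Symmetrized tensor-product pairwise kernel between node pairs:
    K((i, j), (k, l)) = K(i, k) K(j, l) + K(i, l) K(j, k)."""
    Kp = np.zeros((len(pairs_a), len(pairs_b)))
    for a, (i, j) in enumerate(pairs_a):
        for b, (k, l) in enumerate(pairs_b):
            Kp[a, b] = K_node[i, k] * K_node[j, l] + K_node[i, l] * K_node[j, k]
    return Kp

# Toy node kernel from random node features; pairs labeled as link / non-link
rng = np.random.default_rng(1)
F = rng.normal(size=(10, 4))
K_node = F @ F.T
train_pairs = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9), (0, 9)]
labels = [1, 1, 1, 0, 0, 0]
clf = SVC(kernel="precomputed").fit(
    pairwise_kernel(K_node, train_pairs, train_pairs), labels)
test_pairs = [(1, 2), (3, 8)]
print(clf.predict(pairwise_kernel(K_node, test_pairs, train_pairs)))
```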
European Conference on Machine Learning | 2011
Canh Hao Nguyen; Hiroshi Mamitsuka
Predicting new links in a network is a problem of interest in many application domains. Most prediction methods utilize information on the network's entities, such as nodes, to build a model of links; network structures are usually not used except for networks with similarity or relatedness semantics. In this work, we use network structures for link prediction on a more general network type with latent feature models. The problem is the difficulty of training these models directly on large data. We propose a method to solve this problem using kernels, casting the link prediction problem as a binary classification problem. The key idea is not to infer latent features explicitly, but to represent them implicitly in the kernels, making the method scalable to large networks. In contrast to other methods for latent feature models, our method inherits all the advantages of the kernel framework: optimality, efficiency and nonlinearity. We apply our method to real data of protein-protein interactions to show the merits of our method.
Bioinformatics | 2016
Ahmed Mohamed; Canh Hao Nguyen; Hiroshi Mamitsuka
The popularity of NMR spectroscopy in metabolomics and natural products has driven the development of an array of NMR spectral analysis tools and databases. In particular, web applications have recently become widely used because they are platform-independent and easy to extend through reusable web components. Currently available web applications provide analysis of NMR spectra, but they still lack the necessary processing and interactive visualization functionalities. To overcome these limitations, we present NMRPro, a web component that can easily be incorporated into current web applications, enabling easy-to-use online interactive processing and visualization. NMRPro integrates server-side processing with client-side interactive visualization through three parts: a Python package to efficiently process large NMR datasets on the server side, a Django app managing server-client interaction, and SpecdrawJS for client-side interactive visualization. Availability and implementation: Demo and installation instructions are available at http://mamitsukalab.org/tools/nmrpro/. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.
European Conference on Machine Learning | 2005
Canh Hao Nguyen; Tu Bao Ho
Imbalanced data learning has recently begun to receive much attention from the research and industrial communities, as traditional machine learners no longer give satisfactory results. Solutions to the problem generally attempt to adapt standard learners to the imbalanced data setting. Basically, higher weights are assigned to small-class examples to keep them from being overshadowed by the large-class ones. The difficulty of determining a reasonable weight for each example remains. In this work, we propose a scheme to weight examples of the small class based solely on local data distributions. The approach is designed for categorical data, and a rule learning algorithm is constructed that takes the weighting scheme into account. Empirical evaluations demonstrate the advantages of this approach.
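A small sketch of the general idea of weighting small-class examples by their local neighbourhood, assuming categorical attributes compared with overlap (Hamming) distance; the specific weighting formula and the toy data are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def local_weights(X_small, X_large, k=5):
    """Weight each small-class example by the presence of the large class among
    its k nearest neighbours under Hamming distance on categorical attributes."""
    X_all = np.vstack([X_small, X_large])
    is_large = np.array([0] * len(X_small) + [1] * len(X_large))
    weights = []
    for x in X_small:
        dist = (X_all != x).sum(axis=1)        # Hamming distance to all examples
        neighbours = np.argsort(dist)[1:k + 1]  # drop the nearest (the example itself)
        # more large-class neighbours -> example is more easily overshadowed
        weights.append(1.0 + is_large[neighbours].mean())
    return np.array(weights)

X_small = np.array([["a", "x"], ["a", "y"], ["b", "y"]])
X_large = np.array([["a", "x"], ["b", "x"], ["b", "y"], ["b", "z"], ["a", "z"]])
print(local_weights(X_small, X_large, k=3))
```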
Bioinformatics | 2014
Ahmed Mohamed; Timothy Hancock; Canh Hao Nguyen; Hiroshi Mamitsuka
NetPathMiner is a general framework for mining, from genome-scale networks, paths that are related to specific experimental conditions. NetPathMiner interfaces with various input formats, including KGML, SBML and BioPAX files, and allows for manipulation of networks in three different forms: metabolic, reaction and gene representations. NetPathMiner ranks the obtained paths and applies Markov model-based clustering and classification methods to the ranked paths for easy interpretation. NetPathMiner also provides static and interactive visualizations of networks and paths to aid manual investigation. Availability: The package is available through Bioconductor and from GitHub at http://github.com/ahmohamed/NetPathMiner.
IEEE Transactions on Neural Networks | 2014
Canh Hao Nguyen; Nicolas Wicker; Hiroshi Mamitsuka
Graph cut is a common way of clustering nodes on similarity graphs. As a clustering method, it does not give a unique solution under the commonly used loss functions; we specifically show, in the similarity graph-based clustering setting, that the resulting clusters might even be disconnected. This is counter-intuitive, as one wishes to have good clustering solutions in the sense that each cluster is well connected and the clusters are balanced. The key property of good clustering solutions is that the resulting graphs (after clustering) share large components with the original ones. We wish to detect this by deriving a graph similarity measure that gives high similarity to the original graph for good clustering solutions. The similarity measure considers the global connectivity of graphs by treating graphs as distributions in (potentially different) Euclidean spaces; the global graph comparison is thereby turned into a distribution comparison. Simulations show that the similarity measure can consistently distinguish clustering solutions of different quality, beyond what can be done with the commonly used loss functions of clustering algorithms.
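The motivating problem, that a cut solution's clusters can be internally disconnected even though the usual cut loss does not report it, can be checked directly with a small networkx sketch; this only illustrates the phenomenon and is not the paper's distribution-based similarity measure:

```python
import networkx as nx

def cluster_connectivity_report(G, labels):
    """Check whether each cluster of a graph-cut solution is internally connected;
    the cut loss alone (number of removed inter-cluster edges) does not report this."""
    report = {}
    for c in set(labels.values()):
        sub = G.subgraph([n for n in G if labels[n] == c])
        report[c] = {"nodes": sub.number_of_nodes(),
                     "connected": nx.is_connected(sub)}
    cut_size = sum(1 for u, v in G.edges() if labels[u] != labels[v])
    return cut_size, report

G = nx.barbell_graph(5, 0)                           # two 5-cliques joined by a bridge
good = {n: 0 if n < 5 else 1 for n in G}             # cut along the bridge
bad = {n: 0 if n in (0, 1, 5, 6) else 1 for n in G}  # both clusters end up disconnected
print(cluster_connectivity_report(G, good))
print(cluster_connectivity_report(G, bad))
```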