Si Quang Le
Japan Advanced Institute of Science and Technology
Publications
Featured research published by Si Quang Le.
Knowledge Discovery and Data Mining | 2003
Tu Bao Ho; Trong Dung Nguyen; Saori Kawasaki; Si Quang Le; Dung Duc Nguyen; Hideto Yokoi; Katsuhiko Takabayashi
The hepatitis temporal database collected at Chiba University Hospital between 1982 and 2001 was recently released as a challenge to the KDD research community. The database is large: each patient corresponds to 983 tests, each represented as a sequence of irregularly timestamped points of varying length. This paper presents a temporal abstraction approach to mining knowledge from this hepatitis database. Exploiting hepatitis background knowledge and data analysis, we introduce new notions and methods for abstracting short-term-changing and long-term-changing tests. The abstracted data allow us to apply different machine learning methods to discover knowledge, part of which medical doctors consider new and interesting.
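The abstract does not spell out the abstraction rules, so what follows is only a minimal Python sketch of the general idea, assuming a fixed reference range per test. A fluctuating (short-term-changing) test is collapsed into runs of qualitative states, and a slowly drifting (long-term-changing) test is summarized by the sign of its least-squares slope. The function names, the reference range 10-40, the slope tolerance and the sample values are all hypothetical.

```python
# Minimal temporal-abstraction sketch (hypothetical thresholds, not the paper's rules).


def state(value, low, high):
    """Map one measurement to a qualitative state relative to an assumed reference range."""
    if value < low:
        return "L"  # below range
    if value > high:
        return "H"  # above range
    return "N"  # within range


def abstract_states(points, low, high):
    """Collapse irregularly timestamped (time, value) points into runs of (state, start, end)."""
    runs = []
    for t, v in sorted(points):
        s = state(v, low, high)
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1], t)  # extend the current run to time t
        else:
            runs.append((s, t, t))  # open a new run
    return runs


def long_term_trend(points, tol):
    """Label the least-squares slope of the whole sequence as increasing/decreasing/stable."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    if var_t == 0:
        return "stable"  # a single time point carries no trend
    slope = sum((t - mean_t) * (v - mean_v) for t, v in points) / var_t
    if slope > tol:
        return "increasing"
    if slope < -tol:
        return "decreasing"
    return "stable"


# Hypothetical short-term-changing test, assumed reference range 10-40.
short_term_test = [(0, 30), (7, 35), (40, 120), (45, 130), (100, 38)]
print(abstract_states(short_term_test, low=10, high=40))
# [('N', 0, 7), ('H', 40, 45), ('N', 100, 100)]

# Hypothetical long-term-changing test sampled on irregular days.
long_term_test = [(0, 4.5), (90, 4.2), (400, 3.6), (700, 3.1)]
print(long_term_trend(long_term_test, tol=0.0005))  # decreasing
```

Because each run carries its own start and end times, irregular sampling needs no resampling step, and the resulting state and trend labels are discrete attributes that standard machine learning methods can consume.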
Pattern Recognition Letters | 2005
Si Quang Le; Tu Bao Ho
In this paper, we propose a novel method to measure the dissimilarity of categorical data. The key idea is to consider the dissimilarity between two categorical values of an attribute as a combination of dissimilarities between the conditional probability distributions of other attributes given these two values. Experiments with real data show that our dissimilarity estimation method improves the accuracy of the popular nearest neighbor classifier.
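A minimal Python sketch of this idea follows. It assumes that the dissimilarity between two conditional distributions is their total-variation distance and that the per-attribute dissimilarities are simply summed; the paper's actual choice of distance and combination may differ. All function names and the toy data are hypothetical.

```python
from collections import Counter


def conditional_dist(rows, i, value, j):
    """Empirical P(A_j | A_i = value) as a dict mapping values of attribute j to probabilities."""
    matches = [r[j] for r in rows if r[i] == value]
    if not matches:
        return {}  # value never seen in the training rows
    counts = Counter(matches)
    return {v: c / len(matches) for v, c in counts.items()}


def total_variation(p, q):
    """Total-variation distance between two discrete distributions (assumed choice of distance)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def value_dissimilarity(rows, i, x, y):
    """Dissimilarity of values x and y of attribute i, summed over the other attributes j."""
    if x == y:
        return 0.0
    return sum(
        total_variation(conditional_dist(rows, i, x, j), conditional_dist(rows, i, y, j))
        for j in range(len(rows[0]))
        if j != i
    )


def object_dissimilarity(rows, a, b):
    """Sum of per-attribute value dissimilarities between two objects."""
    return sum(value_dissimilarity(rows, i, x, y) for i, (x, y) in enumerate(zip(a, b)))


def nearest_neighbor(rows, labels, query):
    """1-NN classification of `query` using the dissimilarity above."""
    best = min(range(len(rows)), key=lambda k: object_dissimilarity(rows, rows[k], query))
    return labels[best]


rows = [("red", "small"), ("red", "small"), ("blue", "large"), ("blue", "large"), ("green", "small")]
labels = ["a", "a", "b", "b", "a"]
print(value_dissimilarity(rows, 0, "red", "green"))  # 0.0: both co-occur only with "small"
print(value_dissimilarity(rows, 0, "red", "blue"))   # 1.0: disjoint size distributions
print(nearest_neighbor(rows, labels, ("blue", "large")))  # b
```

The point of the toy data is that "red" and "green" never occur together, yet they come out as identical because they induce the same distribution over the other attribute; a plain overlap measure would call them maximally different.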
New Generation Computing | 2007
Tu Bao Ho; Canh Hao Nguyen; Saori Kawasaki; Si Quang Le; Katsuhiko Takabayashi
Various data mining methods have been developed in the last few years for the study of hepatitis using a large temporal and relational database given to the research community. In this work we introduce a novel temporal abstraction method to this study by detecting and exploiting temporal patterns and relations between events in viral hepatitis, such as “event A slightly happened before event B and B simultaneously ended with event C”. We developed algorithms to first detect significant temporal patterns in temporal sequences and then to identify temporal relations between these temporal patterns. Many findings obtained by applying data mining methods to transactions/graphs of temporal relations were shown to be significant through physician evaluation and by matching against findings published in Medline.
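The following Python sketch shows one way relations such as "slightly before" and "ended simultaneously with" could be computed between detected patterns and turned into per-patient transactions for mining. It is a simplified, Allen-style classification under assumed tolerances (`eps`, `slight`); the relation names, thresholds and the `Interval` type are hypothetical, not the authors' definitions.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """A temporal pattern (e.g. an abstracted episode) occupying [start, end]."""
    name: str
    start: float
    end: float


def relation(a: Interval, b: Interval, eps: float = 1.0, slight: float = 30.0) -> str:
    """Classify how interval a relates to interval b.

    eps:    tolerance within which two endpoints count as simultaneous (assumed).
    slight: maximum gap for "slightly-before" rather than plain "before" (assumed).
    """
    same_start = abs(a.start - b.start) <= eps
    same_end = abs(a.end - b.end) <= eps
    if same_start and same_end:
        return "equals"
    if same_end:
        return "ends-with"
    if same_start:
        return "starts-with"
    if a.end < b.start:
        return "slightly-before" if b.start - a.end <= slight else "before"
    if a.start < b.start < a.end < b.end:
        return "overlaps"
    return "other"


def relation_transaction(intervals: List[Interval], **params) -> set:
    """Turn one patient's intervals into a transaction of 'X rel Y' items for pattern mining."""
    ordered = sorted(intervals, key=lambda iv: (iv.start, iv.end))
    items = set()
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            items.add(f"{a.name} {relation(a, b, **params)} {b.name}")
    return items


patient = [Interval("A", 0, 10), Interval("B", 20, 50), Interval("C", 5, 50.5)]
print(sorted(relation_transaction(patient)))
# ['A overlaps C', 'A slightly-before B', 'C ends-with B']
```

Each transaction can then be passed to an ordinary frequent-itemset or graph miner. In this toy patient, "C ends-with B" plays the role of "B simultaneously ended with C" from the abstract's example.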
Discovery Science | 2004
Si Quang Le; Tu Bao Ho
In this paper we propose a solution to measuring similarity for heterogeneous data. The key idea is to consider the similarity of a given attribute-value pair as the probability of randomly picking a value pair that is less similar than or equally similar to it, in terms of order relations defined appropriately for each data type. Similarities of attribute-value pairs are then integrated into similarities between data objects using a statistical method. Applying our method in combination with distance-based clustering to real data demonstrates the merit of the proposed method.
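For numeric attributes, the ordered-probability idea can be sketched in Python as below, assuming "less similar" means "at least as far apart in absolute difference" and estimating the probability from all ordered pairs of observed values. The abstract does not name the statistical method used to combine attributes; this sketch assumes Fisher's method, treating each per-attribute probability like a p-value. Categorical attributes would need their own order relation and are omitted. Function names and data are hypothetical.

```python
import math
from itertools import product


def attribute_similarity(values, a, b):
    """Probability that a random ordered pair (x, y) drawn from `values` is less similar than
    or equally similar to (a, b); here "less similar" is assumed to mean |x - y| >= |a - b|."""
    d = abs(a - b)
    pairs = list(product(values, repeat=2))
    return sum(1 for x, y in pairs if abs(x - y) >= d) / len(pairs)


def chi2_sf_even(x, dof):
    """P(X >= x) for a chi-square variable X with an even number `dof` of degrees of freedom."""
    half = x / 2.0
    term = total = 1.0
    for i in range(1, dof // 2):
        term *= half / i  # term = half**i / i!
        total += term
    return math.exp(-half) * total


def object_similarity(columns, obj1, obj2):
    """Combine per-attribute similarities with Fisher's method (an assumed choice)."""
    sims = [attribute_similarity(col, a, b) for col, a, b in zip(columns, obj1, obj2)]
    stat = max(0.0, -2.0 * sum(math.log(max(s, 1e-12)) for s in sims))
    return chi2_sf_even(stat, 2 * len(sims))


columns = [[1, 2, 3, 10], [5.0, 5.5, 6.0, 9.0]]  # hypothetical numeric attributes
print(attribute_similarity(columns[0], 1, 10))  # 0.125: only (1, 10) and (10, 1) are that far apart
print(object_similarity(columns, (1, 5.0), (2, 5.5)))
print(object_similarity(columns, (1, 5.0), (10, 9.0)))
```

Identical objects get similarity 1, and the combined score shrinks as more attributes take unusually distant values, which is the behaviour a distance-based clustering method would rely on.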
Knowledge Discovery and Data Mining | 2006
Si Quang Le; Tu Bao Ho; Le Sy Vinh
Measuring similarity for categorical data is a challenging task in data mining due to the poor structure of categorical data. This paper presents a dissimilarity measure for categorical data based on the relations among attributes. This measure not only has the advantage of value variance but also overcomes the limitation of the conditional-probability-based measure when applied to databases whose attributes are independent. Experiments on 30 databases also showed that the proposed measure improved the accuracy of nearest neighbor classification compared with the other measures tested.
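The limitation of conditional-probability-based measures is easy to reproduce: when attributes are independent, every conditional distribution equals the marginal, so every pair of distinct values receives dissimilarity 0. The minimal Python sketch below assumes total-variation distance and a sum over the other attributes; it illustrates the problem only and does not implement the paper's proposed measure. Names and data are hypothetical.

```python
from collections import Counter
from itertools import product


def conditional_dist(rows, i, value, j):
    """Empirical P(A_j | A_i = value)."""
    matches = [r[j] for r in rows if r[i] == value]
    counts = Counter(matches)
    return {v: c / len(matches) for v, c in counts.items()}


def total_variation(p, q):
    """Total-variation distance between two discrete distributions (assumed choice)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def conditional_dissimilarity(rows, i, x, y):
    """Conditional-probability-based dissimilarity of values x and y of attribute i."""
    return sum(
        total_variation(conditional_dist(rows, i, x, j), conditional_dist(rows, i, y, j))
        for j in range(len(rows[0]))
        if j != i
    )


# Two independent attributes: every (colour, shape) combination occurs exactly once.
rows = list(product(["red", "green", "blue"], ["circle", "square"]))
print(conditional_dissimilarity(rows, 0, "red", "blue"))  # 0.0
```

Every distinct pair of colours (and of shapes) collapses to 0.0 here, so a nearest neighbor classifier using this measure alone could no longer tell any values apart.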
Archive | 2005
Tu Bao Ho; Trong Dung Nguyen; Saori Kawasaki; Si Quang Le
Medicine has been a traditional domain for artificial intelligence (AI) research and application. The focus on expert systems (ES) in medicine in the early days of AI has shifted to intelligent data analysis (IDA) in medicine, especially through machine learning and data mining techniques [Kononenko 01], [Lavrac et al. 97], [Cios 01]. At least two reasons explain this trend: the knowledge acquisition bottleneck and the explosive growth of medical databases. Intelligent data analysis in medicine has its own features because of the characteristics of medical data. These characteristics include incompleteness (missing values), incorrectness (noise in data), sparseness (few and/or non-representative patient records available), and inexactness (inappropriate selection of parameters for a given task). Moreover, medical databases are characterized by the particular constraints and difficulties of medical data, which are privacy-sensitive, heterogeneous, and voluminous [Cios and Moore 02].
Genome Informatics | 2004
Si Quang Le; Tu Bao Ho; T.T Hang Phan
Studies in Health Technology and Informatics | 2007
Katsuhiko Takabayashi; Tu Bao Ho; Hideto Yokoi; Trong Dung Nguyen; Saori Kawasaki; Si Quang Le; Takahiro Suzuki; Osamu Yokosuka
Lecture Notes in Computer Science | 2006
Si Quang Le; Tu Bao Ho; Le Sy Vinh
Knowledge-Based Systems Study Group (知識ベースシステム研究会) | 2004
Tu Bao Ho; Si Quang Le; Canh Hao Nguyen