Si Quang Le

Japan Advanced Institute of Science and Technology

Publication

Featured research published by Si Quang Le.

knowledge discovery and data mining | 2003

Mining hepatitis data with temporal abstraction

Tu Bao Ho; Trong Dung Nguyen; Saori Kawasaki; Si Quang Le; Dung Duc Nguyen; Hideto Yokoi; Katsuhiko Takabayashi

The hepatitis temporal database collected at Chiba University Hospital between 1982 and 2001 was recently released as a challenge to the KDD research community. The database is large: each patient corresponds to 983 tests, represented as sequences of irregularly timestamped points with different lengths. This paper presents a temporal abstraction approach to mining knowledge from this hepatitis database. Exploiting hepatitis background knowledge and data analysis, we introduce new notions and methods for abstracting tests that change over the short term and tests that change over the long term. The abstracted data allow us to apply different machine learning methods to find knowledge, part of which medical doctors consider new and interesting.

Pattern Recognition Letters | 2005

An association-based dissimilarity measure for categorical data

Si Quang Le; Tu Bao Ho

In this paper, we propose a novel method to measure the dissimilarity of categorical data. The key idea is to consider the dissimilarity between two categorical values of an attribute as a combination of dissimilarities between the conditional probability distributions of other attributes given these two values. Experiments with real data show that our dissimilarity estimation method improves the accuracy of the popular nearest neighbor classifier.
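The key idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which divergence or which combination rule is used, so the total variation distance and a plain average are assumptions made here for illustration, and the function names and the toy `rows` table are hypothetical.

```python
from collections import Counter


def conditional_distribution(rows, given_attr, given_value, target_attr):
    """Empirical P(target_attr | given_attr == given_value) as a dict."""
    counts = Counter(
        row[target_attr] for row in rows if row[given_attr] == given_value
    )
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    Assumption: the abstract does not name the divergence; this is a stand-in.
    """
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def value_dissimilarity(rows, attr, a, b):
    """Dissimilarity between values a and b of attribute attr.

    Compares, for every other attribute, the conditional distributions
    given attr == a and given attr == b, then averages the distances
    (the average is an assumed combination rule).
    Requires at least one attribute besides attr.
    """
    if a == b:
        return 0.0
    others = [name for name in rows[0] if name != attr]
    distances = [
        total_variation(
            conditional_distribution(rows, attr, a, other),
            conditional_distribution(rows, attr, b, other),
        )
        for other in others
    ]
    return sum(distances) / len(distances)


# Hypothetical toy data: "red" and "pink" co-occur with the same values of
# the other attributes, while "blue" co-occurs with different ones.
rows = [
    {"color": "red", "shape": "round", "size": "small"},
    {"color": "red", "shape": "round", "size": "small"},
    {"color": "pink", "shape": "round", "size": "small"},
    {"color": "blue", "shape": "square", "size": "large"},
]

print(value_dissimilarity(rows, "color", "red", "pink"))
print(value_dissimilarity(rows, "color", "red", "blue"))
```

In this toy table, "red" and "pink" come out as identical in context while "red" and "blue" come out as maximally different, which is the kind of graded distance a nearest neighbor classifier could use in place of simple value matching.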

New Generation Computing | 2007

Exploiting temporal relations in mining hepatitis data

Tu Bao Ho; Canh Hao Nguyen; Saori Kawasaki; Si Quang Le; Katsuhiko Takabayashi

Various data mining methods have been developed in the last few years for hepatitis study, using a large temporal and relational database given to the research community. In this work we introduce a novel temporal abstraction method to this study by detecting and exploiting temporal patterns and relations between events in viral hepatitis, such as “event A happened slightly before event B, and B ended simultaneously with event C”. We developed algorithms to first detect significant temporal patterns in temporal sequences and then to identify temporal relations between these temporal patterns. Many findings obtained by applying data mining methods to transactions/graphs of temporal relations were shown to be significant by physician evaluation and by matching with publications in Medline.
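As a rough illustration of the kind of temporal relation mentioned in this abstract, the sketch below labels the relation between two event intervals. It is only a simplified example under stated assumptions: the `Interval` type, the relation names, and the `slight_gap` threshold (in days) are hypothetical and are not taken from the paper.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A detected temporal pattern, with start and end given in days."""
    start: float
    end: float


def temporal_relation(a, b, slight_gap=7):
    """Return a coarse label for how interval a relates to interval b.

    slight_gap is an assumed threshold: a gap of at most this many days
    between the end of a and the start of b counts as "slightly before".
    """
    if a.end < b.start:
        gap = b.start - a.end
        return "slightly before" if gap <= slight_gap else "before"
    if b.end < a.start:
        return "after"
    if a.end == b.end:
        return "ends with"
    return "overlaps"


event_a = Interval(0, 10)
event_b = Interval(12, 20)
event_c = Interval(15, 20)

print(temporal_relation(event_a, event_b))
print(temporal_relation(event_b, event_c))
```

Relations labelled this way could then be collected per patient as transactions or graph edges, which is the form of input the abstract says the data mining methods were applied to.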

discovery science | 2004

Measuring the similarity for heterogenous data: An ordered probability-based approach

Si Quang Le; Tu Bao Ho

In this paper we propose a solution to measuring similarity for heterogeneous data. The key idea is to consider the similarity of a given attribute-value pair as the probability of randomly picking a value pair that is less similar or equally similar, in terms of order relations defined appropriately for each data type. Similarities of attribute-value pairs are then integrated into similarities between data objects using a statistical method. Applying our method in combination with distance-based clustering to real data shows the merit of the proposed method.
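The probability-based idea can be sketched for a single numeric attribute. This is an illustrative sketch only: it assumes that one value pair is "less similar or equally similar" than another when its absolute difference is at least as large, and that random pairs are drawn independently from the observed values. The abstract does not specify the order relations for each data type or the statistical method used to combine attributes, so neither is reproduced here.

```python
from itertools import product


def pair_similarity(values, a, b):
    """Similarity of the value pair (a, b) for one numeric attribute.

    Estimated as the fraction of all ordered pairs (x, y) drawn from the
    observed values whose absolute difference is at least |a - b|, that is,
    pairs that are less similar than or equally similar to (a, b) under the
    assumed order relation.
    """
    gap = abs(a - b)
    pairs = list(product(values, repeat=2))
    less_or_equally_similar = sum(1 for x, y in pairs if abs(x - y) >= gap)
    return less_or_equally_similar / len(pairs)


# Hypothetical observed values of one numeric attribute.
values = [1, 2, 3, 10]

print(pair_similarity(values, 1, 1))
print(pair_similarity(values, 2, 3))
print(pair_similarity(values, 1, 10))
```

Because the score is a probability, it lies in [0, 1] whatever the attribute's scale or type, which is what makes scores from different kinds of attributes comparable before they are combined.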

knowledge discovery and data mining | 2006

Association-Based dissimilarity measures for categorical data: limitation and improvement

Si Quang Le; Tu Bao Ho; Le Sy Vinh

Measuring similarity for categorical data is a challenging task in data mining because categorical data are poorly structured. This paper presents a dissimilarity measure for categorical data based on the relations among attributes. This measure not only has the advantage of value variance but also overcomes the limitations of the conditional probability-based measure when applied to databases whose attributes are independent. Experiments with 30 databases also showed that the proposed measure boosted the accuracy of Nearest Neighbor classification in comparison with other tested measures.

Archive | 2005

Combining Temporal Abstraction and Data Mining Methods in Medical Data Analysis

Tu Bao Ho; Trong Dung Nguyen; Saori Kawasaki; Si Quang Le

Medicine has been a traditional domain for artificial intelligence (AI) research and application. The focus on expert systems (ES) in medicine in the early days of AI has shifted to intelligent data analysis (IDA) in medicine, especially by machine learning and data mining techniques [Kononenko 01], [Lavrac et al. 97], [Cios 01]. At least two reasons for the new trend are the bottleneck of knowledge acquisition and the explosive growth of medical databases. Intelligent data analysis in medicine has its own features because of the characteristics of medical data. These characteristics include incompleteness (missing values), incorrectness (noise in data), sparseness (few and/or non-representable patient records available), and inexactness (inappropriate selection of parameters for a given task). Moreover, medical databases are characterized by the particular constraints and difficulties of the privacy-sensitive, heterogeneous, but voluminous, data of medicine [Cios and Moore 02].

Genome Informatics | 2004