Kanoksri Sarinnapakorn
University of Miami
Publication
Featured research published by Kanoksri Sarinnapakorn.
acm international workshop on multimedia databases | 2003
Mei Ling Shyu; Shu-Ching Chen; Min Chen; Chengcui Zhang; Kanoksri Sarinnapakorn
Recent research in Content-Based Image Retrieval (CBIR) focuses on bridging the gap between low-level features and the high-level semantic contents of images, as this gap has become the bottleneck of CBIR. In this paper, an effective image database retrieval framework using a new mechanism called the Markov Model Mediator (MMM) is presented to meet this demand by taking into consideration not only the low-level image features, but also the high-level concepts learned from the history of users' access patterns and access frequencies on the images in the database. The proposed framework is also efficient in two respects: 1) Overhead for real-time training is avoided in the image retrieval process because the high-level concepts of images are captured in the off-line training process. 2) Before the exact similarity matching process, Principal Component Analysis (PCA) is applied to reduce the image search space. A training subsystem for this framework is implemented and integrated into our system. The experimental results demonstrate that the MMM mechanism can effectively assist in retrieving more accurate results from image databases.
international workshop on research issues in data engineering | 2005
Mei Ling Shyu; Kanoksri Sarinnapakorn; Indika Kuruppu-Appuhamilage; Shu-Ching Chen; LiWu Chang; Thomas Goldring
Computer network data streams used in intrusion detection usually involve many data types. A common data type is that of symbolic or nominal features. Whether or not they are coded into numerical values, nominal features need to be treated differently from numeric features. This paper studies the effectiveness of two approaches to handling nominal features: a simple coding scheme that uses indicator variables and a scaling method based on multiple correspondence analysis (MCA). In particular, we apply the techniques with two anomaly detection methods: the principal component classifier (PCC) and the Canberra metric. The experiments with KDD 1999 data demonstrate that MCA works better than the indicator variable approach for both detection methods, with the PCC coming out well ahead of the Canberra metric.
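As a rough illustration of the two ingredients named in this abstract, the sketch below codes a nominal feature with indicator variables and computes the textbook Canberra distance between two numeric vectors. The function names and the example protocol categories are hypothetical, and the paper's exact coding scheme and metric variant are not given here, so this should be read as an assumption-laden sketch rather than the authors' method.

```python
def indicator_code(value, categories):
    """Code one nominal value as a list of 0/1 indicator variables.

    `categories` is an assumed, fixed ordering of the possible values.
    """
    return [1.0 if value == c else 0.0 for c in categories]


def canberra(x, y):
    """Textbook Canberra distance: sum of |a - b| / (|a| + |b|).

    Terms where both coordinates are zero contribute nothing,
    which is the usual convention for the 0/0 case.
    """
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


# Hypothetical records: one nominal feature (protocol) plus one numeric feature.
protocols = ["icmp", "tcp", "udp"]
record_a = indicator_code("tcp", protocols) + [120.0]
record_b = indicator_code("udp", protocols) + [80.0]
distance = canberra(record_a, record_b)
```

With indicator coding, two records that differ in a nominal value always contribute two full unit terms to the Canberra distance, regardless of how similar the categories are. A scaling method such as MCA is designed to avoid that limitation.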
intelligent data analysis | 2014
Peerapon Vateekul; Miroslav Kubat; Kanoksri Sarinnapakorn
Hierarchical multi-label classification is a relatively new research topic in the field of classifier induction. What distinguishes it from earlier tasks is that it allows each example to belong to two or more classes at the same time, and that it assumes the classes are mutually related by generalization/specialization operators. The paper first investigates the problem of performance evaluation in these domains. After this, it proposes a new induction system, HR-SVM, built around support vector machines. In our experiments, we demonstrate that this system's performance compares favorably with that of earlier attempts, and then we proceed to an investigation of how HR-SVM's individual modules contribute to the overall system's behavior. As a testbed, we use a set of benchmark domains from the field of gene-function prediction.
Applied Artificial Intelligence | 2008
Kanoksri Sarinnapakorn; Miroslav Kubat
Information retrieval systems often use machine-learning techniques to induce classifiers capable of categorizing documents. Unfortunately, the fact that the same document may simultaneously belong to two or more categories has so far received inadequate attention, and induction techniques currently in use often suffer from prohibitive computational costs. In the case study reported in this article, we managed to reduce these costs by running a “baseline induction algorithm” on the training examples described by diverse feature subsets, thus obtaining several subclassifiers. When asked about a document's classes, a “master classifier” combines the outputs of the subclassifiers. This combination can be accomplished in several different ways, but we achieved the best results with our own mechanism inspired by the Dempster-Shafer Theory (DST). We describe the technique, compare its performance experimentally with that of more traditional voting approaches, and show that its substantial computational savings were achieved in exchange for an acceptable loss in classification performance.
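The abstract does not spell out the authors' DST-inspired mechanism, so the sketch below shows only the standard Dempster rule of combination, applied to two hypothetical subclassifiers that each assign belief mass to "the document belongs to the class", "it does not", or "undecided". All names and mass values are illustrative assumptions, not taken from the article.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with the standard Dempster rule.

    Each mass function maps a frozenset of hypotheses to its mass.
    Mass falling on empty intersections is treated as conflict and
    the remaining masses are renormalized.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}


# Hypothetical outputs of two subclassifiers for one class label.
YES = frozenset({"yes"})
NO = frozenset({"no"})
EITHER = YES | NO  # mass left uncommitted

sub1 = {YES: 0.6, EITHER: 0.4}
sub2 = {YES: 0.5, NO: 0.3, EITHER: 0.2}
fused = dempster_combine(sub1, sub2)
```

Unlike plain voting, this style of combination lets a subclassifier express partial ignorance by leaving some mass on the whole frame, here `EITHER`.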
international conference on data mining | 2009
Peerapon Vateekul; Kanoksri Sarinnapakorn
Missing data is a well-recognized issue in data mining, and imputation is one way to handle the problem. In this paper, we propose a novel tree-based imputation algorithm called “Imputation Tree” (ITree). It first studies the predictability of missingness using all observations by constructing a binary classification tree called “Missing Pattern Tree” (MPT). Then, missing values in each cluster or terminal node are estimated by a regression tree built from the observations at that node. We present empirical results using both synthetic and real data. Almost all experiments demonstrate that ITree is superior to other commonly used methods in estimating missing values. The algorithm not only achieves impressive accuracy, but also provides information on the nature of missingness.
Multimedia Tools and Applications | 2007
Mei Ling Shyu; Shu-Ching Chen; Min Chen; Chengcui Zhang; Kanoksri Sarinnapakorn
In this paper, we present a mechanism called Markov Model Mediator (MMM) to facilitate the efficient and effective capturing of high-level image concepts in content-based image retrieval (CBIR). MMM serves as the retrieval engine of the CBIR system and uses affinity-based similarity measures. This mechanism is effective in capturing subjective user concepts in that it not only takes into consideration the global image features, but also learns the high-level concepts of the images from the history of user access patterns and access frequencies on the images in the image database, which differentiates it from the common methods in CBIR. The advantage of our proposed mechanism is that it exploits the richness in the structured description of visual contents as well as the relative affinity relationships among the images. Consequently, it provides the capability to bridge the gap between the low-level features and the high-level concepts. This mechanism is also efficient in that it integrates Principal Component Analysis (PCA) to significantly reduce the image search space at a low cost before performing exact similarity matching. An off-line training subsystem for this framework was implemented and integrated into our system. The experimental results demonstrate that MMM can capture users' high-level concepts effectively and more quickly.
asian conference on intelligent information and database systems | 2014
Pattarasai Markpeng; Piraya Wongnimmarn; Nattarat Champreeda; Peerapon Vateekul; Kanoksri Sarinnapakorn
Climate change has increased the number of extreme events around the world. Warning and monitoring systems are very important for reducing the damage caused by disasters. The performance of a warning system relies heavily on the quality of data from the automated telemetry system (ATS) and the accuracy of the prediction system. Traditional quality management systems cannot discover complicated cases, such as outliers, missing patterns, and inhomogeneity. This paper proposes novel procedures to handle these complex issues in hydrological data, focusing on water level. In the proposed system, DBSCAN, a clustering algorithm, is applied to discover outliers and missing patterns. The experimental results show that the system outperforms a statistical criterion, mean ± n×SD, where n is a constant. Also, all missing patterns are discovered by our approach. For the inhomogeneity problem, several statistical approaches are compared. The comparison results suggest that the best homogenization tool is changepoint, a method based on the F-test.
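The statistical baseline mentioned in this abstract, mean ± n×SD, can be sketched in a few lines. The code below is a minimal illustration of that criterion only, not of the DBSCAN-based system; the function name, the use of the population standard deviation, and the sample water levels are all assumptions made for the example.

```python
import statistics


def sd_outlier_indices(values, n=3.0):
    """Return indices of values outside mean ± n × SD.

    Uses the population standard deviation; whether the paper used the
    population or sample form is not stated, so this is an assumption.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    lower, upper = mean - n * sd, mean + n * sd
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical water-level readings with one spike at the end.
levels = [1.0] * 10 + [100.0]
flagged = sd_outlier_indices(levels, n=2.0)
```

A known weakness of this criterion is that a large outlier inflates both the mean and the SD, so the threshold depends heavily on the choice of n. In the example above the spike is flagged with n = 2 but not with n = 4.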
international joint conference on computer science and software engineering | 2015
Nuttapon Pattanavijit; Peerapon Vateekul; Kanoksri Sarinnapakorn
The Hydro and Agro Informatics Institute (HAII) has installed more than 800 telemetry stations across Thailand to collect water level data for operational tasks and research, e.g., a flood prevention system. To obtain accurate results, it is crucial to control the quality of the data by detecting and filtering out anomalies. In our previous work, a data quality management system to capture various types of errors was proposed. However, the algorithms to detect outliers and missing patterns are based on DBSCAN, which requires a complicated implementation and excessive computational cost. In this paper, we present a novel clustering algorithm specially designed for water-level data called “Linear Clustering.” Compared to DBSCAN, it is not only much easier to develop, but it also requires less computational time without losing any detection accuracy. An analysis of the runtime showed that the proposed algorithm requires linear time. Experiments were conducted on large-scale water-level data. For outlier detection, the new method took only 3 seconds on 30,000 records of data, while the previous work took 261 seconds. For missing pattern detection, although there is no difference in runtime, Linear Clustering's code is uncomplicated, and therefore it requires less development time.
Advances in Machine Learning II | 2010
Miroslav Kubat; Kanoksri Sarinnapakorn; Sareewan Dendamrongvit
Automated classification of text documents has two distinctive aspects. First, each training or testing example can be labeled with more than two classes at the same time; this has serious consequences not only for the induction algorithms, but also for how we evaluate the performance of the induced classifier. Second, the examples are usually described by a great many attributes, which makes induction from hundreds of thousands of training examples prohibitively expensive. Both issues have been addressed by recent machine-learning literature, but the behaviors of existing solutions in real-world domains are still far from satisfactory. Here, we describe our own technique and report experiments with a concrete text database.
Archive | 2003
Mei Ling Shyu; Shu-Ching Chen; Kanoksri Sarinnapakorn; LiWu Chang