Chotirat Ann Ratanamahatana

Publication

Featured research published by Chotirat Ann Ratanamahatana.

Knowledge and Information Systems | 2005

Exact indexing of dynamic time warping

Eamonn J. Keogh; Chotirat Ann Ratanamahatana

The problem of indexing time series has attracted much interest. Most algorithms used to index time series utilize the Euclidean distance or some variation thereof. However, it has been forcefully shown that the Euclidean distance is a very brittle distance measure. Dynamic time warping (DTW) is a much more robust distance measure for time series, allowing similar shapes to match even if they are out of phase in the time axis. Because of this flexibility, DTW is widely used in science, medicine, industry and finance. Unfortunately, however, DTW does not obey the triangular inequality and thus has resisted attempts at exact indexing. Instead, many researchers have introduced approximate indexing techniques or abandoned the idea of indexing and concentrated on speeding up sequential searches. In this work, we introduce a novel technique for the exact indexing of DTW. We prove that our method guarantees no false dismissals and we demonstrate its vast superiority over all competing approaches in the largest and most comprehensive set of time series indexing experiments ever undertaken.
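The abstract does not state how exactness is achieved, so the following is only a minimal sketch of the general idea: a cheap lower bound on DTW lets a search skip candidates without ever discarding the true nearest neighbor. It assumes an envelope-style lower bound (often referred to as LB_Keogh), equal-length series, and a Sakoe-Chiba band of half-width `r`. The function names are illustrative and are not taken from the paper.

```python
import math

def dtw_distance(q, c, r):
    """DTW distance between q and c, with warping limited to |i - j| <= r."""
    n, m = len(q), len(c)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return math.sqrt(prev[m])

def lb_keogh(q, c, r):
    """Envelope lower bound on dtw_distance(q, c, r) for equal-length series."""
    total = 0.0
    for i, ci in enumerate(c):
        window = q[max(0, i - r): i + r + 1]
        upper, lower = max(window), min(window)
        # Only the part of c that falls outside q's envelope contributes.
        if ci > upper:
            total += (ci - upper) ** 2
        elif ci < lower:
            total += (ci - lower) ** 2
    return math.sqrt(total)

def nearest_neighbor(query, candidates, r):
    """Linear scan that skips any candidate whose lower bound cannot win."""
    best_index, best_dist = -1, math.inf
    for idx, cand in enumerate(candidates):
        if lb_keogh(query, cand, r) >= best_dist:
            continue  # safe to skip: the true DTW distance is at least the bound
        d = dtw_distance(query, cand, r)
        if d < best_dist:
            best_index, best_dist = idx, d
    return best_index, best_dist
```

Because the bound never exceeds the true DTW distance, skipping a candidate whose bound is already no better than the best match found so far cannot cause a false dismissal. An index structure would apply the same test to groups of series rather than one at a time.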

Knowledge Discovery and Data Mining | 2004

Towards parameter-free data mining

Eamonn J. Keogh; Stefano Lonardi; Chotirat Ann Ratanamahatana

Most data mining algorithms require the setting of many input parameters. Two main dangers of working with parameter-laden algorithms are the following. First, incorrect settings may cause an algorithm to fail in finding the true patterns. Second, a perhaps more insidious problem is that the algorithm may report spurious patterns that do not really exist, or greatly overestimate the significance of the reported patterns. This is especially likely when the user fails to understand the role of parameters in the data mining process. Data mining algorithms should have as few parameters as possible, ideally none. A parameter-free algorithm would limit our ability to impose our prejudices, expectations, and presumptions on the problem at hand, and would let the data itself speak to us. In this work, we show that recent results in bioinformatics and computational theory hold great promise for a parameter-free data-mining paradigm. The results are motivated by observations in Kolmogorov complexity theory. However, as a practical matter, they can be implemented using any off-the-shelf compression algorithm with the addition of just a dozen or so lines of code. We will show that this approach is competitive or superior to the state-of-the-art approaches in anomaly/interestingness detection, classification, and clustering with empirical tests on time series/DNA/text/video datasets.
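To illustrate how little code a compression-based measure can need, here is a minimal sketch that assumes a dissimilarity defined as the compressed size of two concatenated inputs divided by the sum of their individual compressed sizes. The choice of `zlib` and the function names are assumptions made for this example, not details given in the abstract.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Size in bytes of data after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))

def compression_dissimilarity(x: bytes, y: bytes) -> float:
    """Ratio of the joint compressed size to the sum of the individual sizes.

    Values well below 1 suggest x and y share structure the compressor can
    reuse; values near 1 suggest they have little in common.
    """
    return compressed_size(x + y) / (compressed_size(x) + compressed_size(y))
```

The measure has no tunable parameters of its own. It can be plugged into any distance-based method such as nearest-neighbor classification or hierarchical clustering, although the results depend on how well the chosen compressor models the data.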

International Conference on Machine Learning | 2006

Fast time series classification using numerosity reduction

Xiaopeng Xi; Eamonn J. Keogh; Christian R. Shelton; Li Wei; Chotirat Ann Ratanamahatana

Many algorithms have been proposed for the problem of time series classification. However, it is clear that one-nearest-neighbor with Dynamic Time Warping (DTW) distance is exceptionally difficult to beat. This approach has one weakness, however; it is computationally too demanding for many real-time applications. One way to mitigate this problem is to speed up the DTW calculations. Nonetheless, there is a limit to how much this can help. In this work, we propose an additional technique, numerosity reduction, to speed up one-nearest-neighbor DTW. While the idea of numerosity reduction for nearest-neighbor classifiers has a long history, we show here that we can leverage an original observation about the relationship between dataset size and DTW constraints to produce an extremely compact dataset with little or no loss in accuracy. We test our ideas with a comprehensive set of experiments, and show that they can efficiently produce extremely fast, accurate classifiers.

Very Large Data Bases | 2005

Scaling and time warping in time series querying

Ada Wai-Chee Fu; Eamonn J. Keogh; Leo Yung Hang Lau; Chotirat Ann Ratanamahatana; Raymond Chi-Wing Wong

The last few years have seen an increasing understanding that dynamic time warping (DTW), a technique that allows local flexibility in aligning time series, is superior to the ubiquitous Euclidean distance for time series classification, clustering, and indexing. More recently, it has been shown that for some problems, uniform scaling (US), a technique that allows global scaling of time series, may be just as important. In this work, we note that for many real world problems, it is necessary to combine both DTW and US to achieve meaningful results. This is particularly true in domains where we must account for the natural variability of human actions, including biometrics, query by humming, motion-capture/animation, and handwriting recognition. We introduce the first technique which can handle both DTW and US simultaneously. Our technique involves search pruning by means of a lower bounding technique and multi-dimensional indexing to speed up the search. We demonstrate the utility and effectiveness of our method on a wide range of problems in industry, medicine, and entertainment.

Knowledge Discovery and Data Mining | 2005

A novel bit level time series representation with implication of similarity search and clustering

Chotirat Ann Ratanamahatana; Eamonn J. Keogh; Anthony J. Bagnall; Stefano Lonardi

Because time series are a ubiquitous and increasingly prevalent type of data, there has been much research effort devoted to time series data mining recently. As with all data mining problems, the key to effective and scalable algorithms is choosing the right representation of the data. Many high level representations of time series have been proposed for data mining. In this work, we introduce a new technique based on a bit level approximation of the data. The representation has several important advantages over existing techniques. One unique advantage is that it allows raw data to be directly compared to the reduced representation, while still guaranteeing lower bounds to Euclidean distance. This fact can be exploited to produce faster exact algorithms for similarity search. In addition, we demonstrate that our new representation allows time series clustering to scale to much larger datasets.

Applied Artificial Intelligence | 2003

Feature selection for the naive bayesian classifier using decision trees

Chotirat Ann Ratanamahatana; Dimitrios Gunopulos

It is known that the Naive Bayesian classifier (NB) works very well on some domains, and poorly on others. The performance of NB suffers in domains that involve correlated features. C4.5 decision trees, on the other hand, typically perform better than the Naive Bayesian algorithm on such domains. This paper describes a Selective Bayesian classifier (SBC) that simply uses only those features that C4.5 would use in its decision tree when learning a small example of a training set, a combination of the two different natures of classifiers. Experiments conducted on ten data sets indicate that SBC performs markedly better than NB on all domains, and SBC outperforms C4.5 on many data sets on which C4.5 outperforms NB. An Augmented Bayesian classifier (ABC) is also tested on the same data, and SBC appears to perform as well as ABC. SBC can also eliminate, in most cases, more than half of the original attributes, which can greatly reduce the size of the training and test data as well as the running time. Further, the SBC algorithm typically learns faster than both C4.5 and NB, needing fewer training examples to reach high classification accuracy.

Data Mining and Knowledge Discovery | 2007

Compression-based data mining of sequential data

Eamonn J. Keogh; Stefano Lonardi; Chotirat Ann Ratanamahatana; Li Wei; Sang-Hee Lee; John C. Handley

The vast majority of data mining algorithms require the setting of many input parameters. The dangers of working with parameter-laden algorithms are twofold. First, incorrect settings may cause an algorithm to fail in finding the true patterns. Second, a perhaps more insidious problem is that the algorithm may report spurious patterns that do not really exist, or greatly overestimate the significance of the reported patterns. This is especially likely when the user fails to understand the role of parameters in the data mining process. Data mining algorithms should have as few parameters as possible. A parameter-light algorithm would limit our ability to impose our prejudices, expectations, and presumptions on the problem at hand, and would let the data itself speak to us. In this work, we show that recent results in bioinformatics, learning, and computational theory hold great promise for a parameter-light data-mining paradigm. The results are strongly connected to Kolmogorov complexity theory. However, as a practical matter, they can be implemented using any off-the-shelf compression algorithm with the addition of just a dozen lines of code. We will show that this approach is competitive or superior to many of the state-of-the-art approaches in anomaly/interestingness detection, classification, and clustering with empirical tests on time series/DNA/text/XML/video datasets. As further evidence of the advantages of our method, we will demonstrate its effectiveness in solving a real-world classification problem in recommending printing services and products.

Multimedia and Ubiquitous Engineering | 2007

On Clustering Multimedia Time Series Data Using K-Means and Dynamic Time Warping

Vit Niennattrakul; Chotirat Ann Ratanamahatana

Since the generation of multimedia data turned digital, interest in its storage, retrieval, and processing has increased drastically. This data includes video, images, and audio, and we now have higher expectations of exploiting the data at hand. Typical manipulations take some form of video/image/audio processing, including automatic speech recognition, which requires a fairly large amount of storage and is computationally intensive. In our recent work, we have demonstrated the utility of time series representation in the task of clustering multimedia data using the k-medoids method, which allows a considerable reduction in computational effort and storage space. However, k-means is a much more generic clustering method when Euclidean distance is used. In this work, we will demonstrate that, unfortunately, k-means clustering will sometimes fail to give correct results, a fact that many researchers may overlook. This is especially the case when Dynamic Time Warping (DTW) is used as the distance measure in averaging the shape of time series. We will also demonstrate that the current averaging algorithm may not produce the real average of the time series, thus generating incorrect k-means clustering results, and then show potential causes why DTW averaging methods may not achieve meaningful clustering results. Lastly, we conclude with a suggestion of a method to potentially find the shape-based time series average that satisfies the required properties.

Data Mining and Knowledge Discovery | 2006

A Bit Level Representation for Time Series Data Mining with Shape Based Similarity

Anthony J. Bagnall; Chotirat Ann Ratanamahatana; Eamonn J. Keogh; Stefano Lonardi; Gareth J. Janacek

Clipping is the process of transforming a real valued series into a sequence of bits representing whether each data point is above or below the average. In this paper, we argue that clipping is a useful and flexible transformation for the exploratory analysis of large time dependent data sets. We demonstrate how time series stored as bits can be very efficiently compressed and manipulated and that, under some assumptions, the discriminatory power with clipped series is asymptotically equivalent to that achieved with the raw data. Unlike other transformations, clipped series can be compared directly to the raw data series. We show that this means we can form a tight lower bounding metric for Euclidean and Dynamic Time Warping distance and hence efficiently query by content. Clipped data can be used in conjunction with a host of algorithms and statistical tests that naturally follow from the binary nature of the data. A series of experiments illustrates how clipped series can be used in increasingly complex ways to achieve better results than other popular representations. The usefulness of the proposed representation is demonstrated by the fact that the results with clipped data are consistently better than those achieved with a Wavelet or Discrete Fourier Transformation at the same compression ratio for both clustering and query by content. The flexibility of the representation is shown by the fact that we can take advantage of a variable Run Length Encoding of clipped series to define an approximation of the Kolmogorov complexity and hence perform Kolmogorov based clustering.
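The clipping transformation described above can be sketched in a few lines. This is an illustrative sketch only: it assumes that a value exactly equal to the mean maps to 0 (a tie-breaking choice the abstract does not specify), and the function names are hypothetical.

```python
def clip(series):
    """Map each value to 1 if it lies above the series mean, else 0.

    Assumption: values exactly equal to the mean are mapped to 0.
    """
    mean = sum(series) / len(series)
    return [1 if value > mean else 0 for value in series]

def run_length_encode(bits):
    """Collapse a bit sequence into (bit, run_length) pairs."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([bit, 1])  # start a new run
    return [(bit, length) for bit, length in runs]
```

For example, `clip([1, 2, 3, 4, 5, 6])` has mean 3.5 and yields `[0, 0, 0, 1, 1, 1]`, which run-length encodes to `[(0, 3), (1, 3)]`. Smooth series produce few, long runs, which is why clipped series can be stored so compactly.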

International Conference on Data Mining | 2005

Partial elastic matching of time series

Longin Jan Latecki; Vasileios Megalooikonomou; Qiang Wang; Rolf Lakaemper; Chotirat Ann Ratanamahatana; Eamonn J. Keogh

We consider the problem of elastic matching of time series. We propose an algorithm that determines a subsequence of a target time series that best matches a query series. In the proposed algorithm, we map the problem of the best matching subsequence to the problem of a cheapest path in a DAG (directed acyclic graph). The proposed approach allows us to also compute the optimal scale and translation of time series values, which is a nontrivial problem in the case of subsequence matching.
