Michail Vlachos

University of California, Riverside

Publication

Featured research published by Michail Vlachos.

international conference on data engineering | 2002

Discovering similar multidimensional trajectories

Michail Vlachos; George Kollios; Dimitrios Gunopulos

We investigate techniques for analysis and retrieval of object trajectories in two- or three-dimensional space. Such data usually contain a large amount of noise, which has caused previously used metrics to fail. Therefore, we formalize non-metric similarity functions based on the longest common subsequence (LCSS), which are very robust to noise and furthermore provide an intuitive notion of similarity between trajectories by giving more weight to similar portions of the sequences. Stretching of sequences in time is allowed, as well as global translation of the sequences in space. Efficient approximate algorithms that compute these similarity measures are also provided. We compare these new methods to the widely used Euclidean and time warping distance functions (for real and synthetic data) and show the superiority of our approach, especially in the strong presence of noise. We prove a weaker version of the triangle inequality and employ it in an indexing structure to answer nearest neighbor queries. Finally, we present experimental results that validate the accuracy and efficiency of our approach.
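To make the LCSS idea concrete, here is a minimal sketch of an LCSS-based similarity for 2D trajectories. It assumes a spatial matching threshold `epsilon` and a time window `delta`; these parameter names, the function names, and the normalization by the shorter length are illustrative choices, and the sketch leaves out the translation handling and the approximate algorithms mentioned in the abstract.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def lcss_length(a: List[Point], b: List[Point], epsilon: float, delta: int) -> int:
    """Length of the longest common subsequence under threshold matching.

    Two points match when both coordinates differ by less than `epsilon`
    and their positions in the sequences differ by at most `delta`.
    Both parameters are assumptions made for this sketch.
    """
    n, m = len(a), len(b)
    # table[i][j] holds the LCSS length of the prefixes a[:i] and b[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (ax, ay), (bx, by) = a[i - 1], b[j - 1]
            close_in_space = abs(ax - bx) < epsilon and abs(ay - by) < epsilon
            close_in_time = abs(i - j) <= delta
            if close_in_space and close_in_time:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def lcss_similarity(a: List[Point], b: List[Point], epsilon: float, delta: int) -> float:
    """LCSS length normalized by the shorter trajectory, in [0, 1]."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b, epsilon, delta) / min(len(a), len(b))


clean = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
noisy = [(0.0, 0.0), (1.0, 1.0), (50.0, 50.0), (3.0, 3.0)]  # one outlier
similarity = lcss_similarity(clean, noisy, epsilon=0.5, delta=1)
```

The single outlier in `noisy` is skipped rather than penalized, which is the noise robustness the abstract refers to.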

knowledge discovery and data mining | 2003

Indexing multi-dimensional time-series with support for multiple distance measures

Michail Vlachos; Marios Hadjieleftheriou; Dimitrios Gunopulos; Eamonn J. Keogh

Although most time-series data mining research has concentrated on providing solutions for a single distance function, in this work we motivate the need for a single index structure that can support multiple distance measures. Our specific area of interest is the efficient retrieval and analysis of trajectory similarities. Trajectory datasets are very common in environmental applications, mobility experiments, and video surveillance, and are especially important for the discovery of certain biological patterns. Our primary similarity measure is based on the Longest Common Subsequence (LCSS) model, which offers enhanced robustness, particularly for noisy data, which are encountered very often in real-world applications. However, our index is able to accommodate other distance measures as well, including the ubiquitous Euclidean distance and the increasingly popular Dynamic Time Warping (DTW). While other researchers have advocated one or another of these similarity measures, a major contribution of our work is the ability to support all these measures without the need to restructure the index. Our framework guarantees no false dismissals and can also be tailored to provide much faster response time at the expense of slightly reduced precision/recall. The experimental results demonstrate that our index can help speed up the computation of expensive similarity measures such as the LCSS and the DTW.
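For readers unfamiliar with the measures named above, the following sketch contrasts Euclidean distance with a textbook dynamic-programming formulation of DTW on 1D series. It is not the index described in the abstract; the function names and the unconstrained warping window are assumptions made for illustration.

```python
import math
from typing import List


def euclidean(a: List[float], b: List[float]) -> float:
    """Point-by-point Euclidean distance; requires equal lengths."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a: List[float], b: List[float]) -> float:
    """Dynamic Time Warping distance with squared point costs.

    Unconstrained warping window (an assumption of this sketch).
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


series = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
shifted = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]  # same shape, one step earlier
d_euclid = euclidean(series, shifted)
d_dtw = dtw(series, shifted)
```

Here DTW aligns the shifted peak at no cost, while Euclidean distance penalizes the misalignment.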

extending database technology | 2004

Iterative Incremental Clustering of Time Series

Jessica Lin; Michail Vlachos; Eamonn J. Keogh; Dimitrios Gunopulos

We present a novel anytime version of partitional clustering algorithms, such as k-Means and EM, for time series. The algorithm works by leveraging the multi-resolution property of wavelets. The dilemma of choosing the initial centers is mitigated by initializing the centers at each approximation level using the final centers returned by the coarser representations. In addition to casting the clustering algorithms as anytime algorithms, this approach has two other very desirable properties. By working at lower dimensionalities we can efficiently avoid local minima. Therefore, the quality of the clustering is usually better than that of the batch algorithm. In addition, even if the algorithm is run to completion, our approach is much faster than its batch counterpart. We explain and empirically demonstrate these surprising and desirable properties with comprehensive experiments on several publicly available real data sets. We further demonstrate that our approach can be generalized to a framework covering a much broader range of algorithms or data mining problems.
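The coarse-to-fine scheme can be sketched as follows. This is an illustrative approximation, not the authors' implementation: it uses block averages (Haar approximation coefficients up to a scale factor) as the coarse representation, seeds the coarsest level with the first `k` series, and assumes series lengths that are powers of two. All function names are hypothetical.

```python
from typing import Iterator, List, Tuple

Series = List[float]


def coarsen(series: Series, level_len: int) -> Series:
    """Average consecutive blocks so the result has `level_len` values."""
    block = len(series) // level_len
    return [sum(series[i * block:(i + 1) * block]) / block for i in range(level_len)]


def sq_dist(a: Series, b: Series) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data: List[Series], centers: List[Series], max_iter: int = 100):
    """Plain k-means starting from the given centers."""
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: sq_dist(row, centers[c]))
            for row in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def multires_kmeans(data: List[Series], k: int) -> Iterator[Tuple[int, List[int]]]:
    """Yield (resolution, labels) after each level, coarsest first."""
    full_len = len(data[0])  # assumed to be a power of two, at least 2
    level_len = 2
    # Seeding with the first k series is an assumption of this sketch.
    centers = [coarsen(row, level_len) for row in data[:k]]
    while level_len <= full_len:
        level_data = [coarsen(row, level_len) for row in data]
        centers, labels = kmeans(level_data, centers)
        yield level_len, labels
        # Final centers seed the next, finer level (each value duplicated).
        centers = [[v for v in c for _ in range(2)] for c in centers]
        level_len *= 2


data = [[0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.1, 0.9], [0.9, 1.1, 0.0, 0.0]]
results = list(multires_kmeans(data, k=2))
```

Because the generator yields a labeling after every level, a caller can stop consuming it at any point and keep the most recent result, which is the anytime behavior in miniature.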

knowledge discovery and data mining | 2002

Non-linear dimensionality reduction techniques for classification and visualization

Michail Vlachos; Carlotta Domeniconi; Dimitrios Gunopulos; George Kollios; Nick Koudas

In this paper we address the issue of using local embeddings for data visualization in two and three dimensions, and for classification. We advocate their use on the basis that they provide an efficient mapping procedure from the original dimension of the data to a lower intrinsic dimension. We show how they can accurately capture the user's perception of similarity in high-dimensional data for visualization purposes. Moreover, we exploit the low-dimensional mapping provided by these embeddings to develop new classification techniques, and we show experimentally that the classification accuracy is comparable (albeit using fewer dimensions) to that of a number of other classification procedures.

international conference on data engineering | 2004

Online amnesic approximation of streaming time series

Themistoklis Palpanas; Michail Vlachos; Eamonn J. Keogh; Dimitrios Gunopulos; Wagner Truppel

The past decade has seen a wealth of research on time series representations, because the manipulation, storage, and indexing of large volumes of raw time series data is impractical. The vast majority of research has concentrated on representations that are calculated in batch mode and represent each value with approximately equal fidelity. However, the increasing deployment of mobile devices and real time sensors has brought home the need for representations that can be incrementally updated, and can approximate the data with fidelity proportional to its age. The latter property allows us to answer queries about the recent past with greater precision, since in many domains recent information is more useful than older information. We call such representations amnesic. While there has been previous work on amnesic representations, the class of amnesic functions possible was dictated by the representation itself. We introduce a novel representation of time series that can represent arbitrary, user-specified amnesic functions. For example, a meteorologist may decide that data that is twice as old can tolerate twice as much error, and thus, specify a linear amnesic function. In contrast, an econometrist might opt for an exponential amnesic function. We propose online algorithms for our representation, and discuss their properties. Finally, we perform an extensive empirical evaluation on 40 datasets, and show that our approach can efficiently maintain a high quality amnesic approximation.
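The two example amnesic functions from the abstract can be written down directly. The sketch below only illustrates what such a function expresses, namely the error tolerated as a function of age; it is not the representation or the online algorithms proposed in the paper, and the function names, `slope`, `base`, and the constant-approximation check are all assumptions.

```python
from typing import Callable, List


def linear_amnesic(age: float, slope: float = 1.0) -> float:
    """Tolerated error grows in proportion to age: twice as old, twice the error."""
    return slope * age


def exponential_amnesic(age: float, base: float = 2.0) -> float:
    """Tolerated error grows exponentially with age."""
    return base ** age


def within_tolerance(values: List[float], ages: List[float], approx: float,
                     amnesic: Callable[[float], float]) -> bool:
    """Check whether one constant `approx` represents every value within
    the error that the amnesic function allows for that value's age."""
    return all(abs(v - approx) <= amnesic(a) for v, a in zip(values, ages))


readings = [10.0, 10.5, 12.0, 14.0]   # newest first
ages = [1.0, 2.0, 3.0, 4.0]           # age of each reading, in time steps
ok_linear = within_tolerance(readings, ages, approx=10.5, amnesic=linear_amnesic)
ok_strict = within_tolerance(readings, ages, approx=10.5,
                             amnesic=lambda age: 0.25 * age)
```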

knowledge discovery and data mining | 2004

Rotation invariant distance measures for trajectories

Michail Vlachos; Dimitrios Gunopulos; Gautam Das

For the discovery of similar patterns in 1D time-series, it is very typical to perform a normalization of the data (for example, a transformation so that the data have zero mean and unit standard deviation). Such transformations can reveal latent patterns and are very commonly used in data mining applications. However, when dealing with multidimensional time-series, which appear naturally in applications such as video tracking, motion capture, etc., similar motion patterns can also be expressed at different orientations. It is therefore imperative to provide support for additional transformations, such as rotation. In this work, we transform the positional information of moving data into a space that is translation, scale and rotation invariant. Our distance measure in the new space is able to detect elastic matches and can be efficiently lower bounded, thus being computationally tractable. The proposed methods are easy to implement, fast to compute and can have many applications for real-world problems, in areas such as handwriting recognition and posture estimation in motion-capture data. Finally, we empirically demonstrate the accuracy and the efficiency of the technique, using real and synthetic handwriting data.
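The 1D normalization mentioned at the start of the abstract (zero mean, unit standard deviation) is commonly called z-normalization. A minimal sketch follows; the function name and the handling of constant series are assumptions, and the rotation-invariant transformation itself is not reproduced here.

```python
import statistics
from typing import List


def z_normalize(series: List[float]) -> List[float]:
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        # A constant series has no shape to preserve; map it to zeros.
        return [0.0] * len(series)
    return [(x - mean) / std for x in series]


raw = [1.0, 2.0, 3.0, 4.0, 5.0]
scaled_and_shifted = [10.0 * x + 100.0 for x in raw]
z_raw = z_normalize(raw)
z_other = z_normalize(scaled_and_shifted)
```

After normalization, `raw` and its scaled, shifted copy become numerically almost identical, which is why the transformation gives translation and scale invariance in 1D.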

database and expert systems applications | 2002

Robust similarity measures for mobile object trajectories

Michail Vlachos; Dimitrios Gunopulos; George Kollios

We investigate techniques for similarity analysis of spatio-temporal trajectories for mobile objects. Such data may contain a large number of outliers, which degrade the performance of Euclidean and time warping distance. Therefore, we propose the use of non-metric distance functions based on the longest common subsequence (LCSS), in conjunction with a sigmoidal matching function. Finally, we compare these new methods to various L_p norms and also to time warping distance (for real and synthetic data) and present experimental results that validate the accuracy and efficiency of our approach, especially in the presence of noise.

knowledge discovery and data mining | 2005

A MPAA-Based iterative clustering algorithm augmented by nearest neighbors search for time-series data streams

Jessica Lin; Michail Vlachos; Eamonn J. Keogh; Dimitrios Gunopulos; Jianwei Liu; Shoujian Yu; Jiajin Le

In streaming time series, the clustering problem is more complex, since the dynamic nature of streaming data makes previous clustering methods inappropriate. In this paper, we first propose a new method to evaluate clustering in streaming time series databases. We introduce a novel multi-resolution PAA (MPAA) transform to support our iterative clustering algorithm. The method is based on a multi-resolution piecewise aggregate approximation representation, which is used to extract features of time series. Then, we propose our iterative clustering approach for streaming time series. We take advantage of the multi-resolution property of MPAA and add a stopping criterion based on the Hoeffding bound in order to achieve fast response time. Our streaming time-series clustering algorithm also leverages the nearest neighbors of the incoming streaming time series to perform incremental clustering. Comprehensive experiments on several publicly available real data sets show that our approach achieves significant performance improvements and produces high-quality clusters in comparison to previous methods.
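Piecewise aggregate approximation (PAA) replaces a series by the means of equal-width segments. The sketch below computes PAA and stacks it at doubling resolutions; the abstract does not specify how MPAA is laid out, so the doubling scheme, the power-of-two length assumption, and the function names are illustrative guesses rather than the paper's construction.

```python
from typing import List


def paa(series: List[float], segments: int) -> List[float]:
    """Mean of each of `segments` equal-width, non-overlapping windows."""
    n = len(series)
    if segments <= 0 or n % segments != 0:
        raise ValueError("series length must be a multiple of `segments`")
    width = n // segments
    return [sum(series[i * width:(i + 1) * width]) / width for i in range(segments)]


def multires_paa(series: List[float]) -> List[List[float]]:
    """PAA at 1, 2, 4, ... segments, coarsest level first.

    Assumes the series length is a power of two (an assumption of this sketch).
    """
    levels = []
    segments = 1
    while segments <= len(series):
        levels.append(paa(series, segments))
        segments *= 2
    return levels


levels = multires_paa([1.0, 3.0, 5.0, 7.0])
```

A clustering routine could then start on the coarsest level and move to finer ones only when needed.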

Archive | 2007

Multiresolution Clustering of Time Series and Application to Images

Jessica Lin; Michail Vlachos; Eamonn J. Keogh; Dimitrios Gunopulos

Clustering is vital in the process of condensing and outlining information, since it can provide a synopsis of the stored data. However, the high dimensionality of multimedia data today presents an insurmountable challenge for clustering algorithms. Based on the well-known fact that time series and image histograms can both be represented accurately in a lower resolution using orthonormal decompositions, we present an anytime version of the k-means algorithm. The algorithm works by leveraging off the multiresolution property of wavelets. The dilemma of choosing the initial centers for k-means is mitigated by assigning the final centers at each approximation level as the initial centers for the subsequent, finer approximation. In addition to casting k-means as an anytime algorithm, our approach has two other very desirable properties. We observe that even by working at coarser approximations, the achieved quality is better than the batch algorithm, and that even if the algorithm is run to completion, the running time is significantly reduced. We show how this algorithm can be suitably extended to chromatic and textural features extracted from images. Finally, we demonstrate the applicability of this approach on the online image search engine scenario.

international conference on management of data | 2004