Dragomir Yankov

University of California, Riverside

Publication

Featured research published by Dragomir Yankov.

knowledge discovery and data mining | 2007

Detecting time series motifs under uniform scaling

Dragomir Yankov; Eamonn J. Keogh; Jose Medina; Bill Yuan-chi Chiu; Victor B. Zordan

Time series motifs are approximately repeated patterns found within the data. Such motifs have utility for many data mining algorithms, including rule-discovery, novelty-detection, summarization and clustering. Since the formalization of the problem and the introduction of efficient linear time algorithms, motif discovery has been successfully applied to many domains, including medicine, motion capture, robotics and meteorology. In this work we show that most previous applications of time series motifs have been severely limited by the definition's brittleness to even slight changes of uniform scaling, the speed at which the patterns develop. We introduce a new algorithm that allows discovery of time series motifs with invariance to uniform scaling, and show that it produces objectively superior results in several important domains. Apart from being more general than all other motif discovery algorithms, a further contribution of our work is that it is simpler than previous approaches; in particular, we have drastically reduced the number of parameters that need to be specified.
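To make the notion of uniform scaling concrete, here is a minimal Python sketch of a distance that tolerates a pattern unfolding at a different speed. It is an illustration only, not the algorithm from the paper: the names `rescale`, `uniform_scaling_distance` and the `max_stretch` parameter are hypothetical, resampling is done by nearest index, and the normalization that a practical system would apply is omitted.

```python
import math

def rescale(series, new_len):
    """Resample a series to new_len points by nearest-index lookup."""
    n = len(series)
    return [series[(j * n) // new_len] for j in range(new_len)]

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def uniform_scaling_distance(query, candidate, max_stretch=2.0):
    """Smallest distance between query and any prefix of candidate
    after that prefix is shrunk back to the length of query.

    Returns (distance, scale), where scale is prefix length / query length.
    Returns (inf, None) if candidate is shorter than query.
    """
    m = len(query)
    longest = min(len(candidate), int(m * max_stretch))
    best_dist, best_scale = math.inf, None
    for p in range(m, longest + 1):
        d = euclidean(query, rescale(candidate[:p], m))
        if d < best_dist:
            best_dist, best_scale = d, p / m
    return best_dist, best_scale
```

For example, a ramp played at half speed is far from the original under the plain Euclidean distance, but matches it exactly once a stretch of 2 is allowed.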

international conference on data mining | 2007

Disk Aware Discord Discovery: Finding Unusual Time Series in Terabyte Sized Datasets

Dragomir Yankov; Eamonn J. Keogh; Umaa Rebbapragada

The problem of finding unusual time series has recently attracted much attention, and several promising methods are now in the literature. However, virtually all proposed methods assume that the data reside in main memory. For many real-world problems this is not the case. For example, in astronomy, multi-terabyte time series datasets are the norm. Most current algorithms faced with data which cannot fit in main memory resort to multiple scans of the disk/tape and are thus intractable. In this work we show how one particular definition of unusual time series, the time series discord, can be discovered with a disk aware algorithm. The proposed algorithm is exact and requires only two linear scans of the disk with a tiny buffer of main memory. Furthermore, it is very simple to implement. We use the algorithm to provide further evidence of the effectiveness of the discord definition in areas as diverse as astronomy, Web query mining, video surveillance, etc., and show the efficiency of our method on datasets which are many orders of magnitude larger than anything else attempted in the literature.
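The two-scan idea can be sketched as follows. This is one plausible reading of a two-pass scheme, written under stated assumptions rather than taken from the paper: the range threshold `r`, the function names, and the `scan` callable (a stand-in for one sequential read of the data, yielding `(index, series)` pairs) are all hypothetical. The first pass keeps a small set of candidates that have no neighbor within `r` so far; the second pass computes the exact nearest-neighbor distance of each surviving candidate.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_candidates(scan, r):
    """Pass 1: keep only items with no neighbor closer than r seen so far."""
    candidates = {}  # index -> series
    for i, item in scan():
        is_candidate = True
        for j in list(candidates):
            if dist(item, candidates[j]) < r:
                del candidates[j]      # j has a close neighbor: not a discord
                is_candidate = False   # and neither is the current item
        if is_candidate:
            candidates[i] = item
    return candidates

def refine(scan, candidates, r):
    """Pass 2: exact nearest-neighbor distance for each candidate."""
    nn = {j: math.inf for j in candidates}
    for i, item in scan():
        for j in list(nn):
            if i == j:
                continue
            d = dist(item, candidates[j])
            if d < r:
                del nn[j]              # false positive from pass 1
            elif d < nn[j]:
                nn[j] = d
    return nn

def find_discords(scan, r):
    """Map each item whose nearest neighbor is at least r away to that distance."""
    return refine(scan, select_candidates(scan, r), r)
```

In this sketch only the candidate set is held in memory; its size depends on the choice of `r`, so a threshold that is too small would let the set grow large.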

Knowledge and Information Systems | 2008

Disk aware discord discovery: finding unusual time series in terabyte sized datasets

Dragomir Yankov; Eamonn J. Keogh; Umaa Rebbapragada

The problem of finding unusual time series has recently attracted much attention, and several promising methods are now in the literature. However, virtually all proposed methods assume that the data reside in main memory. For many real-world problems this is not the case. For example, in astronomy, multi-terabyte time series datasets are the norm. Most current algorithms faced with data which cannot fit in main memory resort to multiple scans of the disk/tape and are thus intractable. In this work we show how one particular definition of unusual time series, the time series discord, can be discovered with a disk aware algorithm. The proposed algorithm is exact and requires only two linear scans of the disk with a tiny buffer of main memory. Furthermore, it is very simple to implement. We use the algorithm to provide further evidence of the effectiveness of the discord definition in areas as diverse as astronomy, web query mining, video surveillance, etc., and show the efficiency of our method on datasets which are many orders of magnitude larger than anything else attempted in the literature.

international conference on data mining | 2006

Manifold Clustering of Shapes

Dragomir Yankov; Eamonn J. Keogh

Shape clustering can significantly facilitate the automatic labeling of objects present in image collections. For example, it could outline the existing groups of pathological cells in a bank of cyto-images; the groups of species on photographs collected from certain aerials; or the groups of objects observed on surveillance scenes from an office building. Here we demonstrate that a nonlinear projection algorithm such as Isomap can attract together shapes of similar objects, suggesting the existence of isometry between the shape space and a low dimensional nonlinear embedding. Whenever there is a relatively small amount of noise in the data, the projection forms compact, convex clusters that can easily be learned by a subsequent partitioning scheme. We further propose a modification of the Isomap projection based on the concept of degree-bounded minimum spanning trees. The proposed approach is demonstrated to move apart bridged clusters and to alleviate the effect of noise in the data.

international conference on tools with artificial intelligence | 2005

Dot plots for time series analysis

Dragomir Yankov; Eamonn J. Keogh; Stefano Lonardi

Since their introduction in the seventies by Gibbs and McIntyre, dot plots have proved to be a powerful and intuitive technique for visual sequence analysis and mining. Their main domain of application is the field of bioinformatics where they are frequently used by researchers in order to elucidate genomic sequence similarities and alignment. However, this useful technique has remained comparatively constrained to domains where the data has an inherent discrete structure (i.e., text). In this paper we demonstrate how dot plots can be used for the analysis and mining of real-valued time series. We design a tool that creates highly descriptive dot plots which allow one to easily detect similarities, anomalies, reverse similarities, and periodicities, as well as changes in the frequencies of repetitions. As the underlying algorithm scales well with the input size, we also show the feasibility of the plots for on-line data monitoring.
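A dot plot for real-valued data can be illustrated with a brute-force Python sketch: slide a window over the series and place a dot wherever two windows are within a tolerance of each other. The window length `w`, the tolerance `eps` and both function names are assumptions made for this illustration; the sketch is quadratic in the number of windows and does not attempt the scalable algorithm the abstract refers to.

```python
import math

def dot_plot(series, w, eps):
    """Boolean matrix: entry (i, j) is True when the windows of length w
    starting at i and j are within Euclidean distance eps."""
    windows = [series[i:i + w] for i in range(len(series) - w + 1)]
    n = len(windows)
    plot = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(windows[i], windows[j])))
            plot[i][j] = plot[j][i] = d <= eps
    return plot

def render(plot):
    """Draw the matrix as text, one character per cell."""
    return "\n".join("".join("#" if cell else "." for cell in row)
                     for row in plot)
```

Calling `print(render(dot_plot(series, w, eps)))` on a periodic series shows lines parallel to the main diagonal, spaced by the period.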

IEEE Transactions on Multimedia | 2008

Fast Best-Match Shape Searching in Rotation-Invariant Metric Spaces

Dragomir Yankov; Eamonn J. Keogh; Li Wei; Xiaopeng Xi; Wendy L. Hodges

Object recognition and content-based image retrieval systems rely heavily on the accurate and efficient identification of 2-D shapes. Features such as color, texture, positioning, etc., are insufficient to convey the information that could be obtained through shape analysis. A fundamental requirement in this analysis is that shape similarities are computed invariantly to basic geometric transformations, e.g., scaling, shifting, and most importantly, rotations. And while scale and shift invariance are easily achievable through a suitable shape representation, rotation invariance is much harder to deal with. In this work, we explore the metric properties of the rotation-invariant distance measures and propose an algorithm for fast similarity search in the shape space. The algorithm can be utilized in a number of important data mining tasks such as shape clustering and classification, or for discovering motifs and discords in large image collections. The technique is demonstrated to introduce a dramatic speed-up over the current approaches, and is guaranteed to introduce no false dismissals.
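As a point of reference for what a rotation-invariant distance computes, the Python sketch below takes the minimum Euclidean distance over all circular shifts. It assumes, for illustration only, that each shape has already been converted to a fixed-length 1-D profile (for instance, distances from the centroid to the contour) so that a rotation becomes a circular shift; the function name is hypothetical. This is the brute-force measure, not the fast search algorithm the abstract proposes.

```python
import math

def rotation_invariant_distance(a, b):
    """Minimum Euclidean distance between profile a and every
    circular shift of profile b. Costs O(n^2) for profiles of length n."""
    n = len(a)
    if len(b) != n:
        raise ValueError("profiles must have the same length")
    best = math.inf
    for shift in range(n):
        d = math.sqrt(sum((a[i] - b[(i + shift) % n]) ** 2
                          for i in range(n)))
        best = min(best, d)
    return best
```

Every comparison tries all `n` shifts, which is the cost a faster search method would aim to avoid.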

international conference on data mining | 2007

Locally Constrained Support Vector Clustering

Dragomir Yankov; Eamonn J. Keogh; Kin Fai Kan

Support vector clustering transforms the data into a high dimensional feature space, where a decision function is computed. In the original space, the function outlines the boundaries of higher density regions, naturally splitting the data into individual clusters. The method, however, though theoretically sound, has certain drawbacks which make it less appealing to the practitioner. Namely, it is unstable in the presence of outliers and it is hard to control the number of clusters that it identifies. Parametrizing the algorithm incorrectly in noisy settings can either disguise some objectively present clusters in the data, or can identify a large number of small and nonintuitive clusters. Here, we explore the properties of the data in small regions by building a mixture of factor analyzers. The obtained information is used to regularize the complexity of the outlined cluster boundaries, by assigning suitable weighting to each example. The approach is demonstrated to be less susceptible to noise and to outline more interpretable clusters than support vector clustering alone.

european conference on machine learning | 2006

Ensembles of nearest neighbor forecasts

Dragomir Yankov; Dennis DeCoste; Eamonn J. Keogh

Nearest neighbor forecasting models are attractive for their simplicity and their ability to predict complex nonlinear behavior. They rely on the assumption that observations similar to the target one are also likely to have similar outcomes. A common practice in nearest neighbor model selection is to compute the globally optimal number of neighbors on a validation set, which is later applied for all incoming queries. For certain queries, however, this number may be suboptimal and forecasts that deviate a lot from the true realization could be produced. To address the problem we propose an alternative approach of training ensembles of nearest neighbor predictors that determine the best number of neighbors for individual queries. We demonstrate that the forecasts of the ensembles improve significantly on the globally optimal single predictors.
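The baseline that the abstract describes, a single number of neighbors chosen on a validation set and then used for every query, can be sketched in a few lines of Python. This shows only that baseline, not the proposed ensembles; the function names and the `(features, outcome)` data layout are assumptions made for the example.

```python
def knn_forecast(train, query, k):
    """Mean outcome of the k training observations closest to query.
    train is a list of (features, outcome) pairs."""
    nearest = sorted(
        train,
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )[:k]
    return sum(outcome for _, outcome in nearest) / len(nearest)

def best_global_k(train, validation, ks):
    """Pick the single k with the lowest squared error on the validation set."""
    def squared_error(k):
        return sum((knn_forecast(train, x, k) - y) ** 2
                   for x, y in validation)
    return min(ks, key=squared_error)
```

Because `best_global_k` returns one value for all queries, a query that would be better served by a different number of neighbors still gets the global choice, which is the limitation the abstract points to.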

siam international conference on data mining | 2008