Publication


Featured research published by Liudmila Ulanova.


Conference on Software Maintenance and Reengineering | 2013

An Empirical Analysis of Bug Reports and Bug Fixing in Open Source Android Apps

Pamela Bhattacharya; Liudmila Ulanova; Iulian Neamtiu; Sai Charan Koduru

Smartphone platforms and applications (apps) have gained tremendous popularity recently. Due to the novelty of the smartphone platform and tools, and the low barrier to entry for app distribution, apps are prone to errors, which affects user experience and requires frequent bug fixes. An essential step towards correcting this situation is understanding the nature of the bugs and bug-fixing processes associated with smartphone platforms and apps. However, prior empirical bug studies have focused mostly on desktop and server applications. Therefore, in this paper, we perform an in-depth empirical study on bugs in the Google Android smartphone platform and 24 widely-used open-source Android apps from diverse categories such as communication, tools, and media. Our analysis has three main thrusts. First, we define several metrics to understand the quality of bug reports and analyze the bug-fix process, including developer involvement. Second, we show how differences in bug life-cycles can affect the bug-fix process. Third, as Android devices carry significant amounts of security-sensitive information, we perform a study of Android security bugs. We found that, although contributor activity in these projects is generally high, developer involvement decreases in some projects; similarly, while bug-report quality is high, bug triaging is still a problem. Finally, we observe that in Android apps, security bug reports are of higher quality but are fixed more slowly than non-security bugs. We believe that the findings of our study could potentially benefit both developers and users of Android apps.


Knowledge Discovery and Data Mining | 2015

Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy

Nurjahan Begum; Liudmila Ulanova; Jun Wang; Eamonn J. Keogh

Clustering time series is a useful operation in its own right, and an important subroutine in many higher-level data mining analyses, including data editing for classifiers, summarization, and outlier detection. While it has been noted that the general superiority of Dynamic Time Warping (DTW) over Euclidean Distance for similarity search diminishes as we consider ever larger datasets, as we shall show, the same is not true for clustering. Thus, clustering time series under DTW remains a computationally challenging task. In this work, we address this lethargy in two ways. We propose a novel pruning strategy that exploits both upper and lower bounds to prune off a large fraction of the expensive distance calculations. This pruning strategy is admissible, giving us provably identical results to the brute force algorithm, but is at least an order of magnitude faster. For datasets where even this level of speedup is inadequate, we show that we can use a simple heuristic to order the unavoidable calculations in a most-useful-first ordering, thus casting the clustering as an anytime algorithm. We demonstrate the utility of our ideas with both single and multidimensional case studies in the domains of astronomy, speech physiology, medicine and entomology.
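
The idea of admissible pruning mentioned in this abstract can be illustrated with a generic sketch. This is not the paper's clustering algorithm; it is a minimal nearest-neighbor search, assuming an unconstrained DTW with squared point costs, a simple first/last-point lower bound, and the Euclidean distance as an upper bound for equal-length series. All function names (`dtw`, `lower_bound`, `upper_bound`, `nearest_neighbor`) are hypothetical and chosen only for illustration.

```python
import math

def dtw(a, b):
    """Unconstrained DTW distance (squared point cost, square root at the end)."""
    n, m = len(a), len(b)
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[m])

def lower_bound(a, b):
    """Cheap lower bound: every warping path aligns the first and last points."""
    first = (a[0] - b[0]) ** 2
    last = (a[-1] - b[-1]) ** 2
    if len(a) == 1 and len(b) == 1:
        return math.sqrt(first)  # first and last are the same cell
    return math.sqrt(first + last)

def upper_bound(a, b):
    """Euclidean distance: the diagonal is one valid warping path."""
    if len(a) != len(b):
        raise ValueError("upper bound assumes equal-length series")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(query, candidates):
    """Return (index, distance, number of full DTW computations)."""
    best_idx, best, computed = -1, float("inf"), 0
    for idx, cand in enumerate(candidates):
        if lower_bound(query, cand) >= best:
            continue  # cannot beat the best so far: skip the expensive DTW
        d = dtw(query, cand)
        computed += 1
        if d < best:
            best_idx, best = idx, d
    return best_idx, best, computed

query = [0, 1, 2, 3]
candidates = [[0, 0, 1, 2], [5, 6, 7, 8], [0, 1, 2, 3]]
print(nearest_neighbor(query, candidates))  # (2, 0.0, 2)
```

Because a candidate is skipped only when its lower bound already rules it out, the result matches an exhaustive search while computing fewer full DTW distances; in this toy run, two of the three candidates need a full computation.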


International Conference on Data Mining | 2016

Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View That Includes Motifs, Discords and Shapelets

Chin-Chia Michael Yeh; Yan Zhu; Liudmila Ulanova; Nurjahan Begum; Yifei Ding; Hoang Anh Dau; Diego Furtado Silva; Abdullah Mueen; Eamonn J. Keogh

The all-pairs-similarity-search (or similarity join) problem has been extensively studied for text and a handful of other datatypes. However, surprisingly little progress has been made on similarity joins for time series subsequences. The lack of progress probably stems from the daunting nature of the problem. For even modest sized datasets the obvious nested-loop algorithm can take months, and the typical speed-up techniques in this domain (i.e., indexing, lower-bounding, triangular-inequality pruning and early abandoning) at best produce one or two orders of magnitude speedup. In this work we introduce a novel scalable algorithm for time series subsequence all-pairs-similarity-search. For exceptionally large datasets, the algorithm can be trivially cast as an anytime algorithm and produce high-quality approximate solutions in reasonable time. The exact similarity join algorithm computes the answer to the time series motif and time series discord problem as a side-effect, and our algorithm incidentally provides the fastest known algorithm for both these extensively-studied problems. We demonstrate the utility of our ideas for two time series data mining problems, including motif discovery and novelty discovery.
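
For orientation, the "obvious nested-loop algorithm" that this abstract uses as its slow baseline can be sketched as follows. This is not the scalable algorithm the paper introduces; it is a brute-force illustration, assuming z-normalized Euclidean distance between subsequences and an exclusion zone of half the subsequence length to avoid trivial self-matches. The names `znorm` and `naive_matrix_profile` are hypothetical.

```python
import math

def znorm(x):
    """Z-normalize a subsequence (constant subsequences map to all zeros)."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    if sd == 0:
        return [0.0] * len(x)
    return [(v - mu) / sd for v in x]

def naive_matrix_profile(ts, m):
    """Brute-force self-join: for each subsequence of length m, find the
    distance to, and index of, its nearest non-trivial neighbor."""
    n = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(n)]
    excl = max(1, m // 2)  # assumption: exclusion zone around each position
    profile = [float("inf")] * n
    index = [-1] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) < excl:
                continue  # skip trivial matches near the subsequence itself
            d = math.sqrt(sum((p - q) ** 2 for p, q in zip(subs[i], subs[j])))
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index

# A short series in which the pattern 0,1,3,1,0 occurs twice.
ts = [0, 1, 3, 1, 0, 9, 2, 7, 4, 0, 1, 3, 1, 0]
profile, index = naive_matrix_profile(ts, 5)
motif_at = profile.index(min(profile))    # lowest value: best-matching pair
discord_at = profile.index(max(profile))  # highest value: most unusual subsequence
print(motif_at, index[motif_at], discord_at)
```

The lowest profile value marks a motif pair and the highest marks a discord, which is the sense in which the join answers both problems as a side effect. The quadratic number of distance computations in this sketch is the cost the abstract describes as impractical for larger datasets.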


Sensors | 2015

Energy-Efficient Integration of Continuous Context Sensing and Prediction into Smartwatches

Reza Rawassizadeh; Martin Tomitsch; Manouchehr Nourizadeh; Elaheh Momeni; Aaron Peery; Liudmila Ulanova; Michael Pazzani

As the availability and use of wearables increase, they are becoming a promising platform for context sensing and context analysis. Smartwatches are a particularly interesting platform for this purpose, as they offer salient advantages, such as their proximity to the human body. However, they also have limitations associated with their small form factor, such as processing power and battery life, which make it difficult to simply transfer smartphone-based context sensing and prediction models to smartwatches. In this paper, we introduce an energy-efficient, generic, integrated framework for continuous context sensing and prediction on smartwatches. Our work extends previous approaches for context sensing and prediction on wrist-mounted wearables that perform predictive analytics outside the device. We offer a generic sensing module and a novel energy-efficient, on-device prediction module that is based on a semantic abstraction approach to convert sensor data into meaningful information objects, similar to human perception of a behavior. Through six evaluations, we analyze the energy efficiency of our framework modules, identify the optimal file structure for data access and demonstrate an increase in accuracy of prediction through our semantic abstraction method. The proposed framework is hardware independent and can serve as a reference model for implementing context sensing and prediction on small wearable devices beyond smartwatches, such as body-mounted cameras.


Knowledge Discovery and Data Mining | 2015

Efficient Long-Term Degradation Profiling in Time Series for Complex Physical Systems

Liudmila Ulanova; Tan Yan; Haifeng Chen; Guofei Jiang; Eamonn J. Keogh; Kai Zhang

The long-term operation of physical systems inevitably leads to their wearing out, and may cause degradations in performance or the unexpected failure of the entire system. To reduce the possibility of such unanticipated failures, the system must be monitored for tell-tale symptoms of degradation that are suggestive of imminent failure. In this work, we introduce a novel time series analysis technique that allows the decomposition of the time series into trend and fluctuation components, providing the monitoring software with actionable information about the changes of the system's behavior over time. We analyze the underlying problem and formulate it as a Quadratic Programming (QP) problem that can be solved with existing QP-solvers. However, when the profiling resolution is high, as generally required by real-world applications, such a decomposition becomes intractable to general QP-solvers. To speed up the problem solving, we further transform the problem and present a novel QP formulation, Non-negative QP, for the problem and demonstrate a tractable solution that bypasses the use of slow general QP-solvers. We demonstrate our ideas on both synthetic and real datasets, showing that our method allows us to accurately extract the degradation phenomenon of time series. We further demonstrate the generality of our ideas by applying them beyond classic machine prognostics to problems in identifying the influence of news events on currency exchange rates and stock prices. We fully implement our profiling system and deploy it into several physical systems, such as chemical plants and nuclear power plants, and it greatly helps detect the degradation phenomenon and diagnose the corresponding components.


Data Mining and Knowledge Discovery | 2018

Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile

Chin-Chia Michael Yeh; Yan Zhu; Liudmila Ulanova; Nurjahan Begum; Yifei Ding; Hoang Anh Dau; Zachary Zimmerman; Diego Furtado Silva; Abdullah Mueen; Eamonn J. Keogh

The last decade has seen a flurry of research on all-pairs-similarity-search (or similarity joins) for text, DNA and a handful of other datatypes, and these systems have been applied to many diverse data mining problems. However, there has been surprisingly little progress made on similarity joins for time series subsequences. The lack of progress probably stems from the daunting nature of the problem. For even modest sized datasets the obvious nested-loop algorithm can take months, and the typical speed-up techniques in this domain (i.e., indexing, lower-bounding, triangular-inequality pruning and early abandoning) at best produce only one or two orders of magnitude speedup. In this work we introduce a novel scalable algorithm for time series subsequence all-pairs-similarity-search. For exceptionally large datasets, the algorithm can be trivially cast as an anytime algorithm and produce high-quality approximate solutions in reasonable time and/or be accelerated by a trivial porting to a GPU framework. The exact similarity join algorithm computes the answer to the time series motif and time series discord problem as a side-effect, and our algorithm incidentally provides the fastest known algorithm for both these extensively-studied problems. We demonstrate the utility of our ideas for many time series data mining problems, including motif discovery, novelty discovery, shapelet discovery, semantic segmentation, density estimation, and contrast set mining. Moreover, we demonstrate the utility of our ideas on domains as diverse as seismology, music processing, bioinformatics, human activity monitoring, electrical power-demand monitoring and medicine.


International Conference on Data Mining | 2017

Matrix Profile VIII: Domain Agnostic Online Semantic Segmentation at Superhuman Performance Levels

Shaghayegh Gharghabi; Yifei Ding; Chin-Chia Michael Yeh; Kaveh Kamgar; Liudmila Ulanova; Eamonn J. Keogh

Unsupervised semantic segmentation in the time series domain is a much-studied problem due to its potential to detect unexpected regularities and regimes in poorly understood data. However, current techniques have three primary shortcomings that have limited the adoption of time series semantic segmentation beyond academic settings. First, most methods require setting/learning many parameters and thus may have problems generalizing to novel situations. Second, most methods implicitly assume that all the data is segmentable, and have difficulty when that assumption is unwarranted. Finally, most research efforts have been confined to the batch case, but online segmentation is clearly more useful and actionable. To address these issues, we present an algorithm which is domain agnostic, has only one easily determined parameter, and can handle data streaming at a high rate. In this context, we test our algorithm on the largest and most diverse collection of time series datasets ever considered, and demonstrate our algorithm's superiority over current solutions. Furthermore, we are the first to show that semantic segmentation may be possible at superhuman performance levels.


SIAM International Conference on Data Mining | 2016

Clustering in the Face of Fast Changing Streams

Liudmila Ulanova; Nurjahan Begum; Mohammad Shokoohi-Yekta; Eamonn J. Keogh

Clustering is arguably the most important primitive for data mining, finding use as a subroutine in many higher-order algorithms. In recent years, the community has redirected its attention from the batch case to the online case. This need to support online clustering is engendered by the proliferation of cheap ubiquitous sensors that continuously monitor various aspects of our world, from heartbeats as we exercise to the number of mosquitoes visiting a well in a village in Ethiopia. In this work, we argue that current online clustering solutions leave room for improvement. To some degree they all have at least one of the following shortcomings: they are parameter-laden, only defined for certain distance functions, sensitive to outliers, and/or they are approximate. This last point requires clarification; in some sense almost all clustering algorithms are approximate. For example, in general, k-means only approximately optimizes its objective function. However, streaming versions of the k-means algorithm are further approximating this approximation, potentially leading to very poor solutions. In this work, we introduce an algorithm that mitigates these flaws. It is parameter-lite, defined for any distance function, insensitive to outliers and produces the same output as the batch version of the algorithm. We demonstrate the utility and effectiveness of our ideas with case studies in entomology, cardiology and biological audio processing.


Similarity Search and Applications | 2014

Generating Synthetic Data to Allow Learning from a Single Exemplar per Class

Liudmila Ulanova; Yuan Hao; Eamonn J. Keogh

Recent years have seen an explosion in the volume of historical documents placed online. The individuality of fonts combined with the degradation suffered by century-old manuscripts means that Optical Character Recognition Systems do not work well here. As human transcription is prohibitively expensive, recent efforts have focused on human/computer cooperative transcription: a human annotates a small fraction of a text to provide labeled data for recognition algorithms. Such a system naturally raises the question of how much data the human must label. In this work we show that we can do well even if the human labels only a single instance from each class. We achieve this good result using two novel observations: we can leverage a recently introduced parameter-free distance measure, improving it by taking into account the “complexity” of the glyphs being compared; we can estimate this complexity using synthetic but plausible instances made from the single training instance. We demonstrate the utility of our observations on diverse historical manuscripts.


International Conference on Data Mining | 2013

Classification of Multi-dimensional Streaming Time Series by Weighting Each Classifier's Track Record

Bing Hu; Yanping Chen; Jesin Zakaria; Liudmila Ulanova; Eamonn J. Keogh

Collaboration


Explore Liudmila Ulanova's collaborations.

Top Co-Authors

Nurjahan Begum | University of California
Hoang Anh Dau | University of California
Yifei Ding | University of California
Chin-Chia Michael Yeh | Center for Information Technology
Abdullah Mueen | University of New Mexico
Jun Wang | University of Texas at Dallas
Yan Zhu | University of California
Diego Furtado Silva | Spanish National Research Council
Bing Hu | University of California