Bilson J. L. Campana | Researchain

Publication

Featured research published by Bilson J. L. Campana.

Knowledge Discovery and Data Mining | 2012

Searching and mining trillions of time series subsequences under dynamic time warping

Thanawin Rakthanmanon; Bilson J. L. Campana; Abdullah Mueen; Gustavo E. A. P. A. Batista; M. Brandon Westover; Qiang Zhu; Jesin Zakaria; Eamonn J. Keogh

Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets largely explains why most academic work on time series data mining has plateaued at considering a few million time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that by using a combination of four novel ideas we can search and mine truly massive time series for the first time. We demonstrate the following extremely unintuitive fact: in large datasets we can exactly search under DTW much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. In particular, the largest dataset we consider is larger than the combined size of all of the time series datasets considered in all data mining papers ever published. We show that our ideas allow us to solve higher-level time series data mining problems such as motif discovery and clustering at scales that would otherwise be untenable. In addition to mining massive datasets, we will show that our ideas also have implications for real-time monitoring of data streams, allowing us to handle much faster arrival rates and/or use cheaper and lower powered devices than are currently possible.
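The abstract does not spell out its four ideas, so the following is only a minimal sketch of the general setting it describes: exact subsequence search under DTW, where a cheap lower bound lets most candidates be discarded before the full DTW computation. It assumes z-normalized subsequences, a Sakoe-Chiba warping band of half-width `r`, and the well-known LB_Keogh bound. The function names (`znorm`, `envelope`, `lb_keogh`, `dtw`, `search`) are illustrative and are not taken from the paper.

```python
import math


def znorm(x):
    """Rescale a sequence to zero mean and unit variance."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in x]


def envelope(q, r):
    """Upper and lower envelope of q over a window of half-width r."""
    n = len(q)
    upper = [max(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    lower = [min(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    return upper, lower


def lb_keogh(c, upper, lower):
    """LB_Keogh: a cheap lower bound on the banded, squared-cost DTW."""
    total = 0.0
    for v, u, l in zip(c, upper, lower):
        if v > u:
            total += (v - u) ** 2
        elif v < l:
            total += (l - v) ** 2
    return total


def dtw(a, b, r):
    """Squared-cost DTW for equal-length sequences within a band of half-width r."""
    n = len(a)
    prev = [math.inf] * (n + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [math.inf] * (n + 1)
        for j in range(max(1, i - r), min(n, i + r) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[n]


def search(series, query, r):
    """Return (best_start, best_cost, n_pruned) for the closest subsequence."""
    m = len(query)
    q = znorm(query)
    upper, lower = envelope(q, r)
    best_pos, best, pruned = -1, math.inf, 0
    for start in range(len(series) - m + 1):
        c = znorm(series[start:start + m])
        # If even the lower bound cannot beat the best match so far,
        # the full DTW computation is skipped for this candidate.
        if lb_keogh(c, upper, lower) >= best:
            pruned += 1
            continue
        d = dtw(q, c, r)
        if d < best:
            best_pos, best = start, d
    return best_pos, best, pruned
```

Because the bound never exceeds the true banded DTW cost, pruning does not change the answer; it only reduces how often `dtw` is called. The search remains exact.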

Statistical Analysis and Data Mining | 2010

A compression‐based distance measure for texture

Bilson J. L. Campana; Eamonn J. Keogh

The analysis of texture is an important subroutine in application areas as diverse as biology, medicine, robotics, and forensic science. While the last three decades have seen extensive research in algorithms to measure texture similarity, almost all existing methods require the careful setting of many parameters. Having many parameters causes several problems, the most obvious being that overfitting becomes very difficult to avoid. In this work, we propose to extend recent advances in Kolmogorov complexity‐based similarity measures to texture matching problems. These Kolmogorov‐based methods have been shown to be very useful in intrinsically discrete domains such as DNA, protein sequences, MIDI music, and natural languages; however, they are not well defined for real‐valued data. To address this, we introduce a very simple idea, the Campana‐Keogh (CK) video compression‐based method for texture measures. These measures utilize video compressors to approximate the Kolmogorov complexity. Using the parameter‐free CK method, we make novel use of lossy compression to create an efficient and robust parameter‐lite texture similarity measure: the CK‐1 distance measure. We demonstrate the utility of our measure with extensive empirical evaluations on real‐world case studies drawn from nematology, arachnology, entomology, medicine, forensics, texture analysis benchmarks, and many other domains.
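To illustrate the family of Kolmogorov complexity-based measures that the abstract builds on, here is a minimal sketch of the normalized compression distance (NCD) for discrete data. This is not the CK-1 measure, which per the abstract relies on video compressors applied to images. It swaps in `zlib` on byte strings as the compressor, and the helper names `csize` and `ncd` are illustrative.

```python
import zlib


def csize(data: bytes) -> int:
    """Compressed size in bytes, a computable stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = csize(x), csize(y)
    # If y shares structure with x, compressing the concatenation costs
    # little more than compressing the larger of the two on its own.
    return (csize(x + y) - min(cx, cy)) / max(cx, cy)
```

The intuition is that two related inputs compress better together than apart, so their distance is small. No features or tuning parameters are involved beyond the choice of compressor.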

Knowledge Discovery and Data Mining | 2015

Discovery of Meaningful Rules in Time Series

Mohammad Shokoohi-Yekta; Yanping Chen; Bilson J. L. Campana; Bing Hu; Jesin Zakaria; Eamonn J. Keogh

The ability to make predictions about future events is at the heart of much of science, so it is not surprising that prediction has been a topic of great interest in the data mining community for the last decade. Most of the previous work has attempted to predict the future based on the current value of a stream. However, for many problems the actual values are irrelevant, whereas the shape of the current time series pattern may foretell the future. The handful of research efforts that consider this variant of the problem have met with limited success. In particular, it is now understood that most of these efforts allow the discovery of spurious rules. We believe the reason rule discovery in real-valued time series has failed thus far is that most efforts have more or less indiscriminately applied the ideas of symbolic stream rule discovery to real-valued rule discovery. In this work, we show why these ideas are not directly suitable for rule discovery in time series. Beyond our novel definitions/representations, which allow for meaningful and extendable specifications of rules, we further show novel algorithms that allow us to quickly discover high quality rules in very large datasets that accurately predict the occurrence of future events.

Journal of Insect Behavior | 2013

Monitoring and Mining Animal Sounds in Visual Space

Yuan Hao; Bilson J. L. Campana; Eamonn J. Keogh

Monitoring animals by the sounds they produce is an important and challenging task, whether the application is outdoors in a natural habitat, or in the controlled environment of a laboratory setting. In the former case, the density and diversity of animal sounds can act as a measure of biodiversity. In the latter case, researchers often create control and treatment groups of animals, expose them to different interventions, and test for different outcomes. One possible manifestation of different outcomes may be changes in the bioacoustics of the animals. With such a plethora of important applications, there have been significant efforts to build bioacoustic classification tools. However, we argue that most current tools are severely limited. They often require the careful tuning of many parameters (and thus huge amounts of training data), are too computationally expensive for deployment in resource-limited sensors, are specialized for a very small group of species, or are simply not accurate enough to be useful. In this work we introduce a novel bioacoustic recognition/classification framework that mitigates or solves all of the above problems. We propose to classify animal sounds in the visual space, by treating the texture of their sonograms as an acoustic fingerprint, using a recently introduced parameter-free texture measure as a distance measure. We further show that by searching for the most representative acoustic fingerprint, we can significantly outperform other techniques in terms of speed and accuracy.

Pattern Analysis and Applications | 2015

Establishing the provenance of historical manuscripts with a novel distance measure

Bing Hu; Thanawin Rakthanmanon; Bilson J. L. Campana; Abdullah Mueen; Eamonn J. Keogh

The recent digitization of more than 20 million books has been led by initiatives from countries wishing to preserve their cultural heritage and by several commercial endeavors, including the Google Print Library Project. It is expected that within a few years a significant fraction of the world’s books will be online. However, for millions of complete books and tens of millions of loose pages, the provenance of the manuscripts may be completely unknown or disputed, thus denying historians an understanding of the context in which the content was created. In a handful of cases, it may be possible for experts to regain the provenance by examining linguistic, cultural and/or stylistic clues. However, such experts are a rarity and these investigations are time-consuming and expensive. One technique used by experts to establish provenance is the examination of the ornate initial letters appearing in the questioned manuscript. By comparing the initial letters in the manuscript to annotated initial letters whose origin is known, the provenance can be determined. In this work, we show for the first time that we can reproduce this ability with a computer algorithm. We use a recently introduced technique to measure texture similarity and show that it can recognize initial letters with an accuracy that rivals or exceeds human performance. A brute force implementation of this measure would require several months to process a single large book; however, we introduce a novel lower bound that allows us to process the books in hours or minutes.

International Conference on Machine Learning and Applications | 2010

Classification of Live Moths Combining Texture, Color and Shape Primitives

Gustavo E. A. P. A. Batista; Bilson J. L. Campana; Eamonn J. Keogh

Each year, insect-borne diseases kill more than one million people, and harmful insects destroy tens of billions of dollars' worth of crops and livestock. At the same time, beneficial insects pollinate three-quarters of all food consumed by humans. Given the extraordinary impact of insects on human life, it is somewhat surprising that machine learning has made very little impact on understanding (and hence, controlling) insects. In this work we discuss why this is the case, and argue that a confluence of facts makes the time ripe for machine learning research to reach out to the entomological community and help them solve some important problems. As a concrete example, we show how we can solve an important classification problem in commercial entomology by leveraging recent progress in shape, color and texture measures.

SIAM International Conference on Data Mining | 2010