Yasuko Matsubara | Researchain

Publication

Featured research published by Yasuko Matsubara.

knowledge discovery and data mining | 2012

Rise and fall patterns of information diffusion: model and implications

Yasuko Matsubara; Yasushi Sakurai; B. Aditya Prakash; Lei Li; Christos Faloutsos

The recent explosion in the adoption of search engines and new media such as blogs and Twitter has facilitated faster propagation of news and rumors. How quickly does a piece of news spread over these media? How does its popularity diminish over time? Does the rising and falling pattern follow a simple universal law? In this paper, we propose SpikeM, a concise yet flexible analytical model for the rise and fall patterns of influence propagation. Our model has the following advantages: (a) unification power: it generalizes and explains earlier theoretical models and empirical observations; (b) practicality: it matches the observed behavior of diverse sets of real data; (c) parsimony: it requires only a handful of parameters; and (d) usefulness: it enables further analytics tasks such as forecasting, spotting anomalies, and interpretation by reverse-engineering the system parameters of interest (e.g., quality of news, count of interested bloggers, etc.). Using SpikeM, we analyzed 7.2GB of real data, most of which were collected from the public domain. We have shown that our SpikeM model accurately and succinctly describes all the patterns of the rise-and-fall spikes in these real datasets.

knowledge discovery and data mining | 2012

Fast mining and forecasting of complex time-stamped events

Yasuko Matsubara; Yasushi Sakurai; Christos Faloutsos; Tomoharu Iwata; Masatoshi Yoshikawa

Given huge collections of time-evolving events such as web-click logs, which consist of multiple attributes (e.g., URL, userID, timestamp), how do we find patterns and trends? How do we go about capturing daily patterns and forecasting future events? We need two properties: (a) effectiveness, that is, the patterns should help us understand the data, discover groups, and enable forecasting, and (b) scalability, that is, the method should be linear with the data size. We introduce TriMine, which performs three-way mining for all three attributes, namely, URLs, users, and time. Specifically, TriMine discovers hidden topics, groups of URLs, and groups of users, simultaneously. Thanks to its concise but effective summarization, it makes it possible to accomplish the most challenging and important task, namely, to forecast future events. Extensive experiments on real datasets demonstrate that TriMine discovers meaningful topics and makes long-range forecasts, which are notoriously difficult to achieve. In fact, TriMine consistently outperforms the best state-of-the-art existing methods in terms of accuracy and execution speed (up to 74x faster).

international conference on management of data | 2014

AutoPlait: automatic mining of co-evolving time sequences

Yasuko Matsubara; Yasushi Sakurai; Christos Faloutsos

Given a large collection of co-evolving multiple time-series, which contains an unknown number of patterns of different durations, how can we efficiently and effectively find typical patterns and the points of variation? How can we statistically summarize all the sequences, and achieve a meaningful segmentation? In this paper we present AutoPlait, a fully automatic mining algorithm for co-evolving time sequences. Our method has the following properties: (a) effectiveness: it operates on large collections of time-series, and finds similar segment groups that agree with human intuition; (b) scalability: it is linear with the input size, and thus scales up very well; and (c) AutoPlait is parameter-free, and requires no user intervention, no prior training, and no parameter tuning. Extensive experiments on 67GB of real datasets demonstrate that AutoPlait does indeed detect meaningful patterns correctly, and it outperforms state-of-the-art competitors as regards accuracy and speed: AutoPlait achieves near-perfect, over 95% precision and recall, and it is up to 472 times faster than its competitors.

knowledge discovery and data mining | 2014

FUNNEL: automatic mining of spatially coevolving epidemics

Yasuko Matsubara; Yasushi Sakurai; Willem G. van Panhuis; Christos Faloutsos

Given a large collection of epidemiological data consisting of the count of d contagious diseases for l locations of duration n, how can we find patterns, rules and outliers? For example, the Project Tycho provides open access to the count of infections for U.S. states from 1888 to 2013, for 56 contagious diseases (e.g., measles, influenza), which include missing values, possible recording errors, sudden spikes (or dives) of infections, etc. So how can we find a combined model, for all these diseases, locations, and time-ticks? In this paper, we present FUNNEL, a unifying analytical model for large scale epidemiological data, as well as a novel fitting algorithm, FUNNELFIT, which solves the above problem. Our method has the following properties: (a) Sense-making: it detects important patterns of epidemics, such as periodicities, the appearance of vaccines, external shock events, and more; (b) Parameter-free: our modeling framework frees the user from providing parameter values; (c) Scalable: FUNNELFIT is carefully designed to be linear on the input size; (d) General: our model is general and practical, which can be applied to various types of epidemics, including computer-virus propagation, as well as human diseases. Extensive experiments on real data demonstrate that FUNNELFIT does indeed discover important properties of epidemics: (P1) disease seasonality, e.g., influenza spikes in January, Lyme disease spikes in July and the absence of yearly periodicity for gonorrhea; (P2) disease reduction effect, e.g., the appearance of vaccines; (P3) local/state-level sensitivity, e.g., many measles cases in NY; (P4) external shock events, e.g., historical flu pandemics; (P5) incongruous values, i.e., data reporting errors.

international conference on management of data | 2015

Mining and Forecasting of Big Time-series Data

Yasushi Sakurai; Yasuko Matsubara; Christos Faloutsos

Given a large collection of time series, such as motion capture sensors and automobile trajectories, how can we efficiently and effectively find typical patterns? How can we statistically summarize all the sequences, and achieve a meaningful segmentation? What are the major tools for forecasting and outlier detection? Time-series data analysis is becoming increasingly important, thanks to the decreasing cost of hardware and the increasing online processing abilities. The objective of our project is to develop fundamental technologies for the real-time modeling and forecasting of big time-series data. We provide the intuition behind these powerful technologies, and introduce case studies that illustrate their practical use.

european conference on machine learning | 2014

Revisit Behavior in Social Media: The Phoenix-R Model and Discoveries

Flavio Figueiredo; Jussara M. Almeida; Yasuko Matsubara; Bruno F. Ribeiro; Christos Faloutsos

How many listens will an artist receive on an online radio? How about plays on a YouTube video? How many of these visits are new or returning users? Modeling and mining popularity dynamics of social activity has important implications for researchers, content creators and providers. We here investigate the effect of revisits (successive visits from a single user) on content popularity. Using four datasets of social activity, with up to tens of millions of media objects (e.g., YouTube videos, Twitter hashtags or LastFM artists), we show the effect of revisits in the popularity evolution of such objects. Secondly, we propose the Phoenix-R model which captures the popularity dynamics of individual objects. Phoenix-R has the desired properties of being: (1) parsimonious, being based on the minimum description length principle, and achieving lower root mean squared error than state-of-the-art baselines; (2) applicable, the model is effective for predicting future popularity values of objects.

international conference on data mining | 2014

Fast and Exact Monitoring of Co-Evolving Data Streams

Yasuko Matsubara; Yasushi Sakurai; Naonori Ueda; Masatoshi Yoshikawa

Given a huge stream of multiple co-evolving sequences, such as motion capture and web-click logs, how can we find meaningful patterns and spot anomalies? Our aim is to monitor data streams statistically, and find subsequences that have the characteristics of a given hidden Markov model (HMM). For example, consider an online web-click stream, where massive amounts of access logs of millions of users are continuously generated every second. So how can we find meaningful building blocks and typical access patterns such as weekday/weekend patterns, and also, detect anomalies and intrusions? In this paper, we propose StreamScan, a fast and exact algorithm for monitoring multiple co-evolving data streams. Our method has the following advantages: (a) it is effective, leading to novel discoveries and surprising outliers, (b) it is exact, and we theoretically prove that StreamScan guarantees the exactness of the output, (c) it is fast, and requires O(1) time and space per time-tick. Our experiments on 67GB of real data illustrate that StreamScan does indeed detect the qualifying subsequence patterns correctly and that it can offer great improvements in speed (up to 479,000 times) over its competitors.
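
To make the problem statement above concrete, here is a minimal sketch of the naive approach: score every fixed-width window of a stream against a given discrete HMM with the forward algorithm and report the best match. This is a brute-force baseline under stated assumptions, not the algorithm from the paper, which the abstract says needs only O(1) time and space per time-tick. The function names, the fixed window width, and the toy model parameters are all hypothetical.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | HMM) for a discrete HMM."""
    n_states = len(start)
    # Initial step: start probability times emission of the first symbol.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then emit obs[t].
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p

def best_window(stream, width, start, trans, emit):
    """Naive scan: score every window of length `width`; return (index, score)."""
    scores = [
        (log_likelihood(stream[i:i + width], start, trans, emit), i)
        for i in range(len(stream) - width + 1)
    ]
    score, idx = max(scores)
    return idx, score

# Hypothetical toy model: two states that tend to alternate,
# state 0 mostly emits symbol 0 and state 1 mostly emits symbol 1.
START = [0.5, 0.5]
TRANS = [[0.1, 0.9], [0.9, 0.1]]
EMIT = [[0.9, 0.1], [0.1, 0.9]]

stream = [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0]
idx, score = best_window(stream, 4, START, TRANS, EMIT)
print(idx, score)
```

The scan recomputes each window from scratch, so its cost grows with both stream length and window width; that repeated work is the kind of overhead a streaming method has to avoid.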

international conference on data mining | 2009

Scalable Algorithms for Distribution Search

Yasuko Matsubara; Yasushi Sakurai; Masatoshi Yoshikawa

Distribution data naturally arise in countless domains, such as meteorology, biology, geology, industry and economics. However, relatively little attention has been paid to data mining for large distribution sets. Given n distributions of multiple categories and a query distribution Q, we want to find similar clouds (i.e., distributions), to discover patterns, rules and outlier clouds. For example, consider the numerical case of sales of items, where, for each item sold, we record the unit price and quantity; then, each customer is represented as a distribution of 2-d points (one for each item he/she bought). We want to find similar users, e.g., for market segmentation, anomaly/fraud detection. We propose to address this problem and present D-Search, which includes fast and effective algorithms for similarity search in large distribution datasets. Our main contributions are (1) approximate KL divergence, which can speed up cloud-similarity computations, (2) multi-step sequential scan, which efficiently prunes a significant number of search candidates and leads to a direct reduction in the search cost. We also introduce an extended version of D-Search: (3) time-series distribution mining, which finds similar subsequences in time-series distribution datasets. Extensive experiments on real multi-dimensional datasets show that our solution achieves up to 2,300 times faster wall-clock time over the naive implementation while it does not sacrifice accuracy.
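
As a rough illustration of the search task described above, the following sketch compares bucketized distributions with an exact KL divergence and finds the nearest candidate by a plain sequential scan. It corresponds to the naive implementation the abstract uses as its baseline, not to D-Search's approximate KL divergence or its multi-step pruning. The use of a symmetrized KL divergence, the smoothing constant `eps`, and all function names are assumptions made for this example.

```python
import math

def normalize(counts):
    """Turn raw bucket counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over the same buckets; eps (an assumption) guards against log(0)."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def symmetric_kl(p, q):
    """Symmetrized divergence, used here as the dissimilarity between two clouds."""
    return kl_divergence(p, q) + kl_divergence(q, p)

def nearest(query, candidates):
    """Sequential scan: return (index, distance) of the closest candidate."""
    distances = [symmetric_kl(query, c) for c in candidates]
    idx = min(range(len(candidates)), key=distances.__getitem__)
    return idx, distances[idx]

# Hypothetical purchase histograms over three price buckets.
query = normalize([8, 1, 1])
candidates = [normalize([1, 1, 8]), normalize([7, 2, 1]), normalize([3, 4, 3])]
idx, dist = nearest(query, candidates)
print(idx, dist)
```

Every query touches every candidate at full resolution, which is the cost that an approximate divergence and candidate pruning are meant to reduce.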

pacific-asia conference on knowledge discovery and data mining | 2013

F-Trail: Finding Patterns in Taxi Trajectories

Yasuko Matsubara; Lei Li; Evangelos E. Papalexakis; David Lo; Yasushi Sakurai; Christos Faloutsos

Given a large number of taxi trajectories, we would like to find interesting and unexpected patterns from the data. How can we summarize the major trends, and how can we spot anomalies? The analysis of trajectories has been an issue of considerable interest with many applications such as tracking trails of migrating animals and predicting the path of hurricanes. Several recent works propose methods for clustering and indexing trajectory data. However, these approaches are not especially well suited to pattern discovery with respect to the dynamics of social and economic behavior. To further analyze a huge collection of taxi trajectories, we develop a novel method, called F-Trail, which allows us to find meaningful patterns and anomalies. Our approach has the following advantages: (a) it is fast, and scales linearly on the input size, (b) it is effective, leading to novel discoveries, and surprising outliers. We demonstrate the effectiveness of our approach, by performing experiments on real taxi trajectories. In fact, F-Trail does produce concise, informative and interesting patterns.

international world wide web conferences | 2016

Mining Big Time-series Data on the Web

Yasushi Sakurai; Yasuko Matsubara; Christos Faloutsos

Online news, blogs, SNS and many other Web-based services have been attracting considerable interest for business and marketing purposes. Given a large collection of time series, such as web-click logs, online search queries, blog and review entries, how can we efficiently and effectively find typical time-series patterns? What are the major tools for mining, forecasting and outlier detection? Time-series data analysis is becoming increasingly important, thanks to the decreasing cost of hardware and the increasing on-line processing capability. The objective of this tutorial is to provide a concise and intuitive overview of the most important tools that can help us find meaningful patterns in large-scale time-series data. Specifically, we review the state of the art in three related fields: (1) similarity search, pattern discovery and summarization, (2) non-linear modeling and forecasting, and (3) the extension of time-series mining and tensor analysis. We also introduce case studies that illustrate their practical use for social media and Web-based services.
