Huiping Cao | Researchain

Publication

Featured research published by Huiping Cao.

knowledge discovery and data mining | 2004

Mining, indexing, and querying historical spatiotemporal data

Nikos Mamoulis; Huiping Cao; George Kollios; Marios Hadjieleftheriou; Yufei Tao; David W. Cheung

In many applications that track and analyze spatiotemporal data, movements obey periodic patterns; the objects follow the same routes (approximately) over regular time intervals. For example, people wake up at the same time and follow more or less the same route to their work every day. The discovery of hidden periodic patterns in spatiotemporal data, apart from unveiling important information to the data analyst, can facilitate data management substantially. Based on this observation, we propose a framework that analyzes, manages, and queries object movements that follow such patterns. We define the spatiotemporal periodic pattern mining problem and propose an effective and fast mining algorithm for retrieving maximal periodic patterns. We also devise a novel, specialized index structure that can benefit from the discovered patterns to support more efficient execution of spatiotemporal queries. We evaluate our methods experimentally using datasets with object trajectories that exhibit periodicity.

international conference on data mining | 2005

Mining frequent spatio-temporal sequential patterns

Huiping Cao; Nikos Mamoulis; David W. Cheung

Many applications track the movement of mobile objects, which can be represented as sequences of timestamped locations. Given such a spatiotemporal series, we study the problem of discovering sequential patterns, which are routes frequently followed by the object. Sequential pattern mining algorithms for transaction data are not directly applicable to this setting. The challenges to address are: (i) the fuzziness of locations in patterns, and (ii) the identification of non-explicit pattern instances. In this paper, we define pattern elements as spatial regions around frequent line segments. Our method first transforms the original sequence into a list of sequence segments, and detects frequent regions in a heuristic way. Then, we propose algorithms to find patterns by employing a newly proposed substring tree structure and improving the Apriori technique. A performance evaluation demonstrates the effectiveness and efficiency of our approach.

IEEE Transactions on Knowledge and Data Engineering | 2007

Discovery of Periodic Patterns in Spatiotemporal Sequences

Huiping Cao; Nikos Mamoulis; David W. Cheung

In many applications that track and analyze spatiotemporal data, movements obey periodic patterns; the objects follow the same routes (approximately) over regular time intervals. For example, people wake up at the same time and follow more or less the same route to their work every day. The discovery of hidden periodic patterns in spatiotemporal data could unveil important information to the data analyst. Existing approaches for discovering periodic patterns focus on symbol sequences. However, these methods cannot directly be applied to a spatiotemporal sequence because of the fuzziness of spatial locations in the sequence. In this paper, we define the problem of mining periodic patterns in spatiotemporal data and propose an effective and efficient algorithm for retrieving maximal periodic patterns. In addition, we study two interesting variants of the problem. The first is the retrieval of periodic patterns that are frequent only during a continuous subinterval of the whole history. The second is the discovery of periodic patterns whose instances may be shifted or distorted. We demonstrate how our mining technique can be adapted for these variants. Finally, we present a comprehensive experimental evaluation, where we show the effectiveness and efficiency of the proposed techniques.

international conference on data mining | 2006

Discovery of Collocation Episodes in Spatiotemporal Data

Huiping Cao; Nikos Mamoulis; David W. Cheung

Given a collection of trajectories of moving objects with different types (e.g., pumas, deer, vultures, etc.), we introduce the problem of discovering collocation episodes in them (e.g., if a puma is moving near a deer, then a vulture is also going to move close to the same deer with high probability within the next 3 minutes). Collocation episodes capture the inter-movement regularities among different types of objects. We formally define the problem of mining collocation episodes and propose two scalable algorithms for its efficient solution. We empirically evaluate the performance of the proposed methods using synthetically generated data that emulate real-world object movements.

pacific-asia conference on knowledge discovery and data mining | 2004

Discovering Partial Periodic Patterns in Discrete Data Sequences

Huiping Cao; David W. Cheung; Nikos Mamoulis

The problem of partial periodic pattern mining in a discrete data sequence is to find subsequences that appear periodically and frequently in the data sequence. Two essential subproblems are the efficient mining of frequent patterns and the automatic discovery of periods that correspond to these patterns. Previous methods for this problem in event sequence databases assume that the periods are given in advance or require additional database scans to compute periods that define candidate patterns. In this work, we propose a new structure, the abbreviated list table (ALT), and several efficient algorithms to compute the periods and the patterns, that require only a small number of passes. A performance study is presented to demonstrate the effectiveness and efficiency of our method.
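To make the problem statement concrete, the following is a minimal sketch of the simplest case only: single-symbol patterns when the period is given in advance, which is the assumption the abstract attributes to earlier methods. It is not the abbreviated list table (ALT) or any algorithm from the paper. The function name `partial_periodic_1patterns` and the parameters `period` and `min_conf` are hypothetical, chosen for illustration.

```python
from collections import Counter


def partial_periodic_1patterns(seq, period, min_conf):
    """Find single-symbol partial periodic patterns for a GIVEN period.

    Illustrative only: the period is assumed known, and only patterns
    with one fixed symbol (at one offset) are considered.
    Returns {(offset, symbol): confidence}.
    """
    n_segments = len(seq) // period  # trailing incomplete segment is ignored
    if n_segments == 0:
        return {}
    counts = Counter()
    for seg in range(n_segments):
        for offset in range(period):
            counts[(offset, seq[seg * period + offset])] += 1
    # Confidence = fraction of period segments in which the symbol
    # appears at that offset.
    return {
        key: c / n_segments
        for key, c in counts.items()
        if c / n_segments >= min_conf
    }


# Segments of length 3: "abc", "abd", "abc", "aec"
patterns = partial_periodic_1patterns("abcabdabcaec", period=3, min_conf=0.75)
```

Here `a` occurs at offset 0 in every segment, while `b` at offset 1 and `c` at offset 2 each occur in three of the four segments, so all three pass the 0.75 threshold. Discovering the period itself, and combining symbols into longer patterns, is the part the paper addresses and is not attempted here.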

symposium on large spatial databases | 2003

Evaluation of Iceberg Distance Joins

Yutao Shou; Nikos Mamoulis; Huiping Cao; Dimitris Papadias; David W. Cheung

The iceberg distance join returns object pairs within some distance from each other, provided that the first object appears at least a number of times in the result, e.g., “find hotels which are within 1 km of at least 10 restaurants”. The output of this query is the subset of the corresponding distance join (e.g., “find hotels which are within 1 km of some restaurant”) that satisfies the additional cardinality constraint. Therefore, it could be processed by using a conventional spatial join algorithm and then filtering out the non-qualifying pairs. This approach, however, is expensive, especially when the cardinality constraint is highly selective. In this paper, we propose output-sensitive algorithms that prune the search space by integrating the cardinality with the distance constraint. We deal with cases of indexed/non-indexed datasets and evaluate the performance of the proposed techniques with extensive experimental evaluation covering a wide range of problem parameters.
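The query semantics can be illustrated with a minimal sketch of the join-then-filter baseline that the abstract describes as expensive. This is not one of the paper's output-sensitive algorithms; it uses a plain nested loop in place of a spatial join, and the function name `iceberg_distance_join`, the parameters `eps` and `min_count`, and the sample coordinates are all hypothetical.

```python
from collections import defaultdict
from math import hypot


def iceberg_distance_join(left, right, eps, min_count):
    """Naive baseline: full distance join, then a cardinality filter.

    left, right: dicts mapping an object id to an (x, y) point.
    Returns {left_id: [right_id, ...]} for every left object that has
    at least `min_count` right objects within Euclidean distance `eps`.
    """
    matches = defaultdict(list)
    # Step 1: conventional distance join (here a quadratic nested loop).
    for lid, (lx, ly) in left.items():
        for rid, (rx, ry) in right.items():
            if hypot(lx - rx, ly - ry) <= eps:
                matches[lid].append(rid)
    # Step 2: discard left objects that fail the cardinality constraint.
    return {lid: rids for lid, rids in matches.items() if len(rids) >= min_count}


hotels = {"h1": (0.0, 0.0), "h2": (5.0, 5.0)}
restaurants = {"r1": (0.3, 0.4), "r2": (0.0, 0.9), "r3": (5.0, 5.5)}
result = iceberg_distance_join(hotels, restaurants, eps=1.0, min_count=2)
```

In this toy input, `h1` has two restaurants within distance 1.0 and qualifies, while `h2` has only one and is filtered out. The baseline still computes every pair for `h2` before discarding it, which is the wasted work that pruning on the cardinality constraint aims to avoid.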

power and energy society general meeting | 2016

Characterizing and quantifying noise in PMU data

Michael Brown; Milan Biswal; Sukumar M. Brahma; Satish J. Ranade; Huiping Cao

Data recorded by Phasor Measurement Units (PMUs) contains noise. This paper characterizes and quantifies this noise for voltage, current and frequency data recorded at three different voltage levels. The probability distribution of the measurement noise and its typical power are identified. The PMU noise quantification can help in generation of experimental PMU data in close conformity with field PMU data, bad data removal, missing data prediction, and effective design of statistical filters for noise rejection.

IEEE Transactions on Power Delivery | 2017

Real Time Identification of Dynamic Events in Power Systems using PMU data, and Potential Applications - Models, Promises, and Challenges

Sukumar M. Brahma; Rajesh Kavasseri; Huiping Cao; Nilanjan Ray Chaudhuri; Theodoros Alexopoulos; Yinan Cui

This paper explores the task of real-time identification of dynamic events leading to a layer of situational awareness that can become a reality due to increased penetration of phasor measurement units in transmission systems. Two underlying models for this task—data driven and physics based—are explored with examples. Challenges, advantages, and drawbacks of each model are discussed based on the availability of data, attributes of such data, and processing options. Potential applications of the task to improve security of power system protection and anomaly detection in the case of a cyberattack are conceptualized. Some known issues in data communications are discussed vis-a-vis the requirements imposed by the proposed task.

IEEE Transactions on Power Delivery | 2016

Supervisory Protection and Automated Event Diagnosis Using PMU Data

Milan Biswal; Sukumar M. Brahma; Huiping Cao

This paper presents a new framework for supervisory protection and situational awareness to enhance grid operations and protection using modern wide-area monitoring systems. In contrast to earlier approaches dealing with the combined processing of data from multiple phasor measurement units (PMUs), the proposed approach analyzes only the PMU data with the strongest or the most prominent disturbance signature. The specific contributions of this paper are: (a) new criteria for identification of the PMU with the strongest signature, (b) a simplified approach for quick detection of faults, (c) early classification of eight other disturbances suitable for near real-time response, (d) time-frequency transform-based feature extraction techniques for speedy and reliable classifiers, and (e) a promising approach to locate disturbances within narrow geographical constraints. The contributions are verified with exhaustive simulation data from the Western Electricity Coordinating Council system model and limited real PMU data.

extending database technology | 2009

AlphaSum: size-constrained table summarization using value lattices

K. Selçuk Candan; Huiping Cao; Yan Qi; Maria Luisa Sapino

Consider a scientist who wants to explore multiple data sets to select the relevant ones for further analysis. Since the visualization real estate may put a stringent constraint on how much detail can be presented to this user in a single page, effective table summarization techniques are needed to create summaries that are both sufficiently small and effective in communicating the available content. In this paper, we first argue that table summarization can benefit from knowledge about acceptable alternatives for clustering the values in the database. We formulate the problem of table summarization with the help of value lattices. We then provide a framework to express alternative clustering strategies and to account for various utility measures (such as information loss) in assessing different summarization alternatives. Based on this interpretation, we introduce three preference criteria, max-min-util (cautious), max-sum-util (cumulative), and pareto-util, for the problem of table summarization. To tackle the inherent complexity, we rely on the properties of the fuzzy interpretation to further develop a novel ranked set cover based evaluation mechanism (RSC). These are brought together in AlphaSum, a table summarization system. Experimental evaluations showed that RSC improves both execution times and the summary qualities in AlphaSum, by pruning the search space more effectively than the existing solutions.
