Pusheng Zhang | Researchain

Publication

Featured research published by Pusheng Zhang.

Geoinformatica | 2003

A Unified Approach to Detecting Spatial Outliers

Shashi Shekhar; Chang-Tien Lu; Pusheng Zhang

Spatial outliers represent locations which are significantly different from their neighborhoods even though they may not be significantly different from the entire population. Identification of spatial outliers can lead to the discovery of unexpected, interesting, and implicit knowledge, such as local instability. In this paper, we first provide a general definition of S-outliers for spatial outliers. This definition subsumes the traditional definitions of spatial outliers. Second, we characterize the computation structure of spatial outlier detection methods and present scalable algorithms. Third, we provide a cost model of the proposed algorithms. Finally, we experimentally evaluate our algorithms using a Minneapolis-St. Paul (Twin Cities) traffic data set.
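As an illustration of the neighborhood comparison this abstract describes, the following Python sketch flags locations whose attribute value departs from the average of their neighbors. The function name `spatial_outliers`, the adjacency-dictionary input, and the cutoff `theta` are assumptions made for illustration; this is not the paper's S-outlier definition or its algorithms.

```python
from statistics import mean, pstdev


def spatial_outliers(values, neighbors, theta=2.0):
    """Flag locations whose value differs sharply from their neighborhood.

    values:    dict mapping location id -> attribute value
    neighbors: dict mapping location id -> list of neighboring location ids
    theta:     cutoff on the standardized difference (hypothetical default)
    """
    # Difference between each location and the mean of its neighbors.
    # Locations without neighbors are skipped.
    diffs = {
        loc: values[loc] - mean([values[n] for n in neighbors[loc]])
        for loc in values
        if neighbors.get(loc)
    }
    if not diffs:
        return []

    all_diffs = list(diffs.values())
    mu = mean(all_diffs)
    sigma = pstdev(all_diffs)
    if sigma == 0:
        return []  # every location agrees with its neighborhood

    # Standardize the differences and keep the extreme ones.
    return sorted(
        loc for loc, d in diffs.items() if abs((d - mu) / sigma) > theta
    )
```

A location can pass this test without being extreme in the whole population, because only the difference from its own neighborhood is standardized.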

knowledge discovery and data mining | 2001

Detecting graph-based spatial outliers: algorithms and applications (a summary of results)

Shashi Shekhar; Chang-Tien Lu; Pusheng Zhang

Identification of outliers can lead to the discovery of unexpected, interesting, and useful knowledge. Existing methods are designed for detecting spatial outliers in multidimensional geometric data sets, where a distance metric is available. In this paper, we focus on detecting spatial outliers in graph-structured data sets. We define statistical tests, analyze the statistical foundation underlying our approach, design several fast algorithms to detect spatial outliers, and provide a cost model for outlier detection procedures. In addition, we provide experimental results from the application of our algorithms on a Minneapolis-St. Paul (Twin Cities) traffic dataset to show their effectiveness and usefulness.

IEEE Transactions on Knowledge and Data Engineering | 2008

A Framework for Mining Sequential Patterns from Spatio-Temporal Event Data Sets

Yan Huang; Liqin Zhang; Pusheng Zhang

Given a large spatio-temporal database of events, where each event consists of the fields event ID, time, location, and event type, mining spatio-temporal sequential patterns identifies significant event-type sequences. Such spatio-temporal sequential patterns are crucial to the investigation of spatial and temporal evolutions of phenomena in many application domains. Recent research literature has explored sequential patterns in transaction data and trajectory analysis of moving objects. However, these methods cannot be directly applied to mining sequential patterns from a large number of spatio-temporal events. Two major research challenges still remain: 1) the definition of significance measures for spatio-temporal sequential patterns to avoid spurious ones and 2) the algorithmic design under the significance measures, which may not guarantee the downward closure property. In this paper, we propose a sequence index as the significance measure for spatio-temporal sequential patterns, which is meaningful due to its interpretability using spatial statistics. We propose a novel algorithm called Slicing-STS-miner to tackle the algorithmic design challenge using the spatial sequence index, which does not preserve the downward closure property. We compare the proposed algorithm with a simple algorithm called STS-miner that utilizes the weak monotone property of the sequence index. Performance evaluations using both synthetic and real-world data sets show that Slicing-STS-miner is an order of magnitude faster than STS-miner for large data sets.

international conference on tools with artificial intelligence | 2002

Data mining for selective visualization of large spatial datasets

S. Sekhar; Chang-Tien Lu; Pusheng Zhang; Rulin Liu

Data mining is the process of extracting implicit, valuable, and interesting information from large sets of data. Visualization is the process of visually exploring data for pattern and trend analysis, and it is a common method of browsing spatial datasets to look for patterns. However, the growing volume of spatial datasets makes it difficult for humans to browse such datasets in their entirety, and data mining algorithms are needed to filter out large uninteresting parts of spatial datasets. We construct a web-based visualization software package for observing the summarization of spatial patterns and temporal trends. We also present data mining algorithms for filtering out vast parts of datasets for spatial outlier patterns. The algorithms were implemented and tested with a real-world set of Minneapolis-St. Paul (Twin Cities) traffic data.

knowledge discovery and data mining | 2003

Correlation analysis of spatial time series datasets: a filter-and-refine approach

Pusheng Zhang; Yan Huang; Shashi Shekhar; Vipin Kumar

A spatial time series dataset is a collection of time series, each referencing a location in a common spatial framework. Correlation analysis is often used to identify pairs of potentially interacting elements from the cross product of two spatial time series datasets. However, the computational cost of correlation analysis is very high when the dimension of the time series and the number of locations in the spatial frameworks are large. The key contribution of this paper is the use of spatial autocorrelation among spatial neighboring time series to reduce computational cost. A filter-and-refine algorithm based on coning, i.e. grouping of locations, is proposed to reduce the cost of correlation analysis over a pair of spatial time series datasets. Cone-level correlation computation can be used to eliminate (filter out) a large number of element pairs whose correlation is clearly below (or above) a given threshold. Element pair correlation needs to be computed for remaining pairs. Using experimental studies with Earth science datasets, we show that the filter-and-refine approach can save a large fraction of the computational cost, particularly when the minimal correlation threshold is high.
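The cone-level filtering step can be sketched as follows, under stated assumptions. Each time series is centered and scaled to unit length, so that the correlation of two series equals the cosine of the angle between them. A cone is then summarized by a center direction and the largest angle from that center to any member, and the triangle inequality on angles bounds the correlation of every cross-cone pair. All function names, the cone summary, and the decision labels are hypothetical and not taken from the paper.

```python
import math


def normalize(series):
    """Center a series and scale it to unit length, so that the Pearson
    correlation of two series equals their dot product.
    Assumes the series is not constant."""
    m = sum(series) / len(series)
    centered = [x - m for x in series]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def angle(u, v):
    """Angle between two unit vectors, clamped against rounding error."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))


def make_cone(unit_series):
    """Summarize a group of unit vectors as (center, span).
    Assumes the members are similar enough that their sum is nonzero."""
    dim = len(unit_series[0])
    total = [sum(s[i] for s in unit_series) for i in range(dim)]
    norm = math.sqrt(sum(t * t for t in total))
    center = [t / norm for t in total]
    span = max(angle(center, s) for s in unit_series)
    return center, span


def cone_correlation_bounds(cone_a, cone_b):
    """Lower and upper bound on the correlation of any cross-cone pair."""
    (center_a, span_a), (center_b, span_b) = cone_a, cone_b
    gamma = angle(center_a, center_b)
    smallest = max(0.0, gamma - span_a - span_b)
    largest = min(math.pi, gamma + span_a + span_b)
    # Cosine decreases on [0, pi]: the largest angle gives the lower bound.
    return math.cos(largest), math.cos(smallest)


def filter_cone_pair(cone_a, cone_b, threshold):
    """Decide a whole cone pair at once where the bounds allow it."""
    low, high = cone_correlation_bounds(cone_a, cone_b)
    if high < threshold:
        return "reject_all"   # no pair can reach the threshold
    if low >= threshold:
        return "accept_all"   # every pair reaches the threshold
    return "refine"           # compute pairwise correlations
```

Only cone pairs labeled `refine` need element-level correlation, which is where the savings come from when spatially neighboring series are similar and cones are narrow.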

international conference on tools with artificial intelligence | 2006

On the Relationships between Clustering and Spatial Co-location Pattern Mining

Yan Huang; Pusheng Zhang

The goal of spatial co-location pattern mining is to find subsets of spatial features frequently located together in spatial proximity. Example co-location patterns include services requested frequently and located together from mobile devices (e.g., PDAs and cellular phones) and symbiotic species in ecology (e.g., Nile crocodile and Egyptian plover). Spatial clustering groups similar spatial objects together. Reusing research results in clustering, e.g., algorithms and visualization techniques, by mapping the co-location mining problem into a clustering problem would be very useful. However, directly clustering spatial objects from various spatial features may not yield well-defined co-location patterns. Clustering spatial objects in each layer followed by overlaying the layers of clusters may not be applicable to many application domains where the spatial objects in some layers are not clustered. In this paper, we propose a new approach to the problem of mining co-location patterns using clustering techniques. First, we propose a novel framework for co-location mining using clustering techniques. We show that the proximity of two spatial features can be captured by summarizing their spatial objects embedded in a continuous space via various techniques. We define the desired properties of proximity functions compared to similarity functions in clustering. Furthermore, we summarize the properties of a list of popular spatial statistical measures as the proximity functions. Finally, we show that clustering techniques can be applied to reveal the rich structure formed by co-located spatial features. A case study on real datasets shows that our method is effective for mining co-locations from large spatial datasets.
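One way to picture clustering features rather than objects is the Python sketch below. It assumes a simple proximity function (the share of objects that have an object of the other feature within a radius) and groups features by single linkage on that proximity. Both choices, along with the names `proximity` and `group_features`, are illustrative assumptions and not the proximity functions or clustering techniques studied in the paper.

```python
import math
from itertools import combinations


def proximity(points_a, points_b, radius):
    """Share of objects in either feature that have at least one object of
    the other feature within `radius` (an assumed, illustrative measure)."""
    def has_near(p, others):
        return any(math.dist(p, q) <= radius for q in others)

    hits = sum(has_near(p, points_b) for p in points_a)
    hits += sum(has_near(q, points_a) for q in points_b)
    return hits / (len(points_a) + len(points_b))


def group_features(features, radius, min_proximity):
    """Single-linkage grouping: features whose pairwise proximity reaches
    `min_proximity` are merged into the same group.

    features: dict mapping feature name -> list of (x, y) object locations
    """
    parent = {name: name for name in features}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(features, 2):
        if proximity(features[a], features[b], radius) >= min_proximity:
            parent[find(a)] = find(b)

    groups = {}
    for name in features:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```

With made-up coordinates in which crocodile and plover sightings sit side by side and a third feature lies far away, the first two features fall into one group and the third stays alone.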

symposium on large spatial databases | 2003

Exploiting Spatial Autocorrelation to Efficiently Process Correlation-Based Similarity Queries

Pusheng Zhang; Yan Huang; Shashi Shekhar; Vipin Kumar

Archive | 2005

Discovery of patterns in earth science data using data mining

Pusheng Zhang; Michael Steinbach; Vipin Kumar; Shashi Shekhar; Pang Ning Tan; Steven A. Klooster; Christopher Potter

This chapter focuses on the development of an active learning approach to an image mining problem for detecting Egeria densa (a Brazilian waterweed) in digital imagery. An effective way of automatic image classification is to employ learning systems. However, due to a large number of images, it is often impractical to manually create labeled data for supervised learning. On the other hand, classification systems generally require labeled data to carry out learning. In order to strike a balance between the difficulty of obtaining labeled images and the need for labeled data, we explore an active learning approach to image mining. The goal is to minimize the task of expert labeling of images: if labeling is necessary, only those important parts of an image will be presented to experts for labeling. The critical issues are: (1) how to determine what should be presented to experts; (2) how to minimize the number of those parts for labeling; and (3) after a small number of labeled instances are available, how to effectively learn a classifier and apply it to new images. We propose to use ensemble methods for active learning in Egeria detection. Our approach is to use the combined classifications of the ensemble of classifiers to reduce the number of uncertain instances in the image classification process and thus achieve reduced expert involvement in image labeling. We demonstrate the effectiveness of our proposed system via experiments using a real-world application of Egeria detection. Practical concerns in image mining using active learning are also addressed and discussed.

Trends in data-mining applications: from research labs to Fortune 500 companies.
1. Mining wafer fabrication: framework and challenges.
2. Damage detection employing data-mining techniques.
3. Data projection techniques and their application in sensor array data processing.
4. An application of evolutionary and neural data-mining techniques to customer relationship management.
5. Sales opportunity miner: data mining for automatic evaluation of sales opportunity.
6. A fully distributed framework for cost-sensitive data mining.
7. Application of variable precision rough set approach to care driver assessment.
8. Discovery of patterns in earth science data using data mining.
9. An active learning approach to Egeria densa detection in digital imagery.
10. Experiences in mining data from computer simulations.
11. Statistical modeling of large-scale scientific simulation data.
12. Data mining for gene mapping.
13. Data-mining techniques for microarray data analysis.
14. The use of emerging patterns in the analysis of gene expression profiles for the diagnosis and understanding of diseases.
15. Proteomic data analysis: pattern recognition for medical diagnosis and biomarker discovery.
16. Discovering patterns and reference models in the medical domain of isokinetics.
17. Mining the cystic fibrosis data.
18. On learning strategies for topic-specific web crawling.
19. On analyzing web log data: a parallel sequence-mining algorithm.
20. Interactive methods for taxonomy editing and validation.
21. The use of data-mining techniques in operational crime fighting.
22. Using data mining for intrusion detection.
23. Mining closed and maximal frequent itemsets.
24. Using fractals in data mining.
25. Genetic search for logic structures in data.

Earth Interactions | 2004

Understanding Controls on Historical River Discharge in the World’s Largest Drainage Basins

Christopher Potter; Pusheng Zhang; Steven A. Klooster; Vanessa Genovese; Shashi Shekhar; Vipin Kumar

Long-term (20 yr) river discharge records from 30 of the world’s largest river basins have been used to characterize surface hydrologic flows in relation to net precipitation inputs, ocean climate teleconnections, and human land/water use patterns. This groundwork study is presented as a precedent to distributed simulation modeling of surface hydrologic flows in large river basins. Correlation analysis is used as a screening method to classify river basins into categories based on major controls on discharge, for example, climate, land use, and dams. Comparisons of paired station records at upstream and downstream discharge locations within each major river basin suggest that the discharge signals represented in upstream discharge records are sustained in the downstream station records for nearly two-thirds of the drainage basins selected. River basins that showed the strongest localized climate control over historical discharge records, in terms of correlations with net basinwide precipitation r...

International Journal on Artificial Intelligence Tools | 2008