William Perrizo | Researchain

Publication

Featured research published by William Perrizo.

knowledge discovery and data mining | 2002

k-nearest Neighbor Classification on Spatial Data Streams Using P-trees

Maleq Khan; Qin Ding; William Perrizo

Classification of spatial data streams is crucial, since the training dataset changes often. Building a new classifier each time can be very costly with most techniques. In this situation, k-nearest neighbor (KNN) classification is a very good choice, since no residual classifier needs to be built ahead of time. KNN is extremely simple to implement and lends itself to a wide variety of variations. We propose a new method of KNN classification for spatial data using a new, rich, data-mining-ready structure, the Peano-count-tree (P-tree). We merely perform some AND/OR operations on P-trees to find the nearest neighbors of a new sample and assign the class label. We have fast and efficient algorithms for the AND/OR operations, which reduce the classification time significantly. Instead of taking exactly the k nearest neighbors we form a closed-KNN set. Our experimental results show closed-KNN yields higher classification accuracy as well as significantly higher speed.
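The closed-KNN idea in the abstract above can be illustrated with a small sketch. It assumes that "closed" means keeping every training point tied with the k-th nearest neighbor, which the abstract does not spell out. It also uses a plain linear scan with Manhattan distance instead of the P-tree AND/OR operations, and the function name `closed_knn_classify` is hypothetical.

```python
from collections import Counter


def closed_knn_classify(train, sample, k):
    """Classify `sample` by majority vote over a closed-KNN set.

    train: list of (feature_tuple, label) pairs.
    Returns (predicted_label, number_of_neighbors_used).
    """
    def dist(a, b):
        # Manhattan distance; the choice of metric is an assumption.
        return sum(abs(x - y) for x, y in zip(a, b))

    ranked = sorted(train, key=lambda item: dist(item[0], sample))
    if len(ranked) <= k:
        neighbors = ranked
    else:
        # Keep every point that is as close as the k-th nearest one,
        # so ties on the neighborhood boundary are not cut arbitrarily.
        cutoff = dist(ranked[k - 1][0], sample)
        neighbors = [item for item in ranked if dist(item[0], sample) <= cutoff]

    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0], len(neighbors)


train = [((0, 0), "a"), ((1, 0), "b"), ((0, 1), "b"), ((5, 5), "a")]
label, used = closed_knn_classify(train, (0, 0), k=2)
print(label, used)
```

With k = 2 the two points at distance 1 are tied, so the closed set holds three neighbors and the vote is no longer a 1-to-1 draw.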

acm symposium on applied computing | 2002

Decision tree classification of spatial data streams using Peano Count Trees

Qiang Ding; Qin Ding; William Perrizo

Many organizations have large quantities of spatial data collected in various application areas, including remote sensing, geographical information systems (GIS), astronomy, computer cartography, environmental assessment and planning, etc. These data collections are growing rapidly and can therefore be considered as spatial data streams. For data stream classification, time is a major issue. However, these spatial data sets are too large to be classified effectively in a reasonable amount of time using existing methods. In this paper, we developed a new method for decision tree classification on spatial data streams using a data structure called Peano Count Tree (P-tree). The Peano Count Tree is a spatial data organization that provides a lossless compressed representation of a spatial data set and facilitates efficient classification and other data mining techniques. Using P-tree structure, fast calculation of measurements, such as information gain, can be achieved. We compare P-tree based decision tree induction classification and a classical decision tree induction method with respect to the speed at which the classifier can be built (and rebuilt when substantial amounts of new data arrive). Experimental results show that the P-tree method is significantly faster than existing classification methods, making it the preferred method for mining on spatial data streams.
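The abstract above mentions information gain as a measurement that can be derived quickly from counts. As a minimal sketch, the standard entropy-based definition can be computed directly from class counts, which is the kind of input a count tree could supply. The function names and the example counts are illustrative assumptions, not taken from the paper.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent_counts, partitions):
    """Entropy reduction from splitting the parent into the given partitions.

    parent_counts: class counts before the split, e.g. [8, 8].
    partitions: class counts per attribute value, e.g. [[8, 0], [0, 8]].
    """
    total = sum(parent_counts)
    remainder = sum(sum(p) / total * entropy(p) for p in partitions)
    return entropy(parent_counts) - remainder


perfect_split = information_gain([8, 8], [[8, 0], [0, 8]])
useless_split = information_gain([8, 8], [[4, 4], [4, 4]])
print(perfect_split, useless_split)
```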

acm symposium on applied computing | 2002

The P-tree algebra

Qin Ding; Maleq Khan; Amalendu Roy; William Perrizo

The Peano Count Tree (P-tree) is a quadrant-based lossless tree representation of the original spatial data. The idea of P-tree is to recursively divide the entire spatial data, such as Remotely Sensed Imagery data, into quadrants and record the count of 1-bits for each quadrant, thus forming a quadrant count tree. Using P-tree structure, all the count information can be calculated quickly. This facilitates efficient ways for data mining. In this paper, we will focus on the algebra and properties of P-tree structure and its variations. We have implemented fast algorithms for P-tree generation and P-tree operations. Our performance analysis shows P-tree has small space and time costs compared to the original data. We have also implemented some data mining algorithms using P-trees, such as Association Rule Mining, Decision Tree Classification and K-Clustering.
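The recursive quadrant division described above can be sketched as follows. This is an illustrative, unoptimized version for a single 2^n x 2^n bit plane. The class name `PNode`, the function `build_ptree`, and the quadrant order (upper-left, upper-right, lower-left, lower-right) are assumptions, and the sketch stops subdividing at pure quadrants (all 0s or all 1s).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PNode:
    count: int  # number of 1-bits in this quadrant
    children: List["PNode"] = field(default_factory=list)  # empty if not subdivided


def build_ptree(bits, row=0, col=0, size=None):
    """Build a quadrant count tree over a 2^n x 2^n grid of 0/1 values."""
    if size is None:
        size = len(bits)
    count = sum(
        bits[r][c]
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    node = PNode(count)
    # Single cells and pure quadrants carry all their information in the count.
    if size == 1 or count == 0 or count == size * size:
        return node
    half = size // 2
    for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
        node.children.append(build_ptree(bits, row + dr, col + dc, half))
    return node


grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
root = build_ptree(grid)
print(root.count, [child.count for child in root.children])  # 9 [4, 1, 0, 4]
```

Only the mixed upper-right quadrant is expanded further, which is where the compression comes from in this sketch.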

knowledge discovery and data mining | 2002

Association Rule Mining on Remotely Sensed Images Using P-trees

Qin Ding; Qiang Ding; William Perrizo

Association Rule Mining, originally proposed for market basket data, has potential applications in many areas. Remote Sensed Imagery (RSI) data is one of the promising application areas. Extracting interesting patterns and rules from datasets composed of images and associated ground data can be of importance in precision agriculture, community planning, resource discovery and other areas. However, in most cases the image data sizes are too large to be mined in a reasonable amount of time using existing algorithms. In this paper, we propose an approach to derive association rules on RSI data using Peano Count Tree (P-tree) structure. P-tree structure, proposed in our previous work, provides a lossless and compressed representation of image data. Based on P-trees, an efficient association rule mining algorithm, P-ARM, with fast support calculation and significant pruning techniques is introduced to improve the efficiency of the rule mining process. The P-ARM algorithm is implemented and compared with the FP-growth and Apriori algorithms. Experimental results showed that our algorithm is superior for association rule mining on RSI spatial data.
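To make the idea of fast support calculation concrete, here is a minimal sketch that counts support with a bitwise AND over per-item bit vectors. It uses uncompressed Python integers as bitmaps, not actual P-trees, and it is not the P-ARM algorithm itself. The item names `red_high` and `yield_high` and the 8-pixel image are invented for illustration.

```python
def support(bitmaps, n_pixels):
    """Fraction of pixels where every item in the itemset is present.

    bitmaps: one integer per item, with bit i set if pixel i has that item.
    """
    combined = (1 << n_pixels) - 1  # start with all pixels selected
    for bitmap in bitmaps:
        combined &= bitmap  # keep only pixels that also have this item
    return bin(combined).count("1") / n_pixels


# Hypothetical 8-pixel image: one bit per pixel for each item.
red_high = 0b11011010
yield_high = 0b10011110

itemset_support = support([red_high, yield_high], n_pixels=8)
print(itemset_support)  # 0.5
```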

acm symposium on applied computing | 2004

An optimized approach for KNN text categorization using P-trees

Imad Rahal; William Perrizo

The importance of text mining stems from the availability of huge volumes of text databases holding a wealth of valuable information that needs to be mined. Text categorization is the process of assigning categories or labels to documents based entirely on their contents. Formally, it can be viewed as a mapping from the document space into a set of predefined class labels (aka subjects or categories); F: D → {C1, C2, ..., Cn} where F is the mapping function, D is the document space and {C1, C2, ..., Cn} is the set of class labels. Given an unlabeled document d, we need to find its class label, Ci, using the mapping function F where F(d) = Ci. In this paper, an optimized k-Nearest Neighbors (KNN) classifier that uses intervalization and the P-tree technology to achieve a high degree of accuracy, space utilization and time efficiency is proposed: as new samples arrive, the classifier finds the k nearest neighbors to the new sample from the training space without a single database scan.

international conference on data mining | 2004

RDF: a density-based outlier detection method using vertical data representation

Dongmei Ren; Baoying Wang; William Perrizo

Outlier detection can lead to discovering unexpected and interesting knowledge, which is critically important in areas such as monitoring of criminal activities in electronic commerce, credit card fraud, etc. In this paper, we developed an efficient density-based outlier detection method for large datasets. Our contributions are: a) we introduce a relative density factor (RDF); b) based on RDF, we propose an RDF-based outlier detection method which can efficiently prune the data points which are deep in clusters, and detect outliers only within the remaining small subset of the data; c) the performance of our method is further improved by means of a vertical data representation, P-trees. We tested our method with NHL and NBA data. Our method shows an order of magnitude speed improvement compared to the contemporary approaches.

international conference on management of data | 1991

HYDRO: a heterogeneous distributed database system

William Perrizo; Joseph Rajkumar; Prabhu Ram

In this paper we show how global serializability and atomic commit can be attained in a Heterogeneous Distributed Database Management System in which each local DBMS is assumed to be an off-the-shelf, binary-licensed commercial product providing the IBM SAA Common Programming Interface ([SAA88]). Our HYDRO system achieves global serializability using a set of objects based on the Request Order Linked List or ROLL object developed in [PER89] and [PER91]. ROLL is based on the general Serialization Graph Test methodology ([BER87]), and provides freedom from idle-wait, deadlock, livelock and restart. Atomic commitment is based on Two-Phase Commit. Two options are offered to achieve the PREPARED state locally. HYDRO-I achieves the PREPARED state by protecting writes during the uncertainty period. HYDRO-II provides more concurrency, but raises the commitment overhead in the absence of a visible PREPARED state offered by the local DBMS.

systems man and cybernetics | 2008

PARM—An Efficient Algorithm to Mine Association Rules From Spatial Data

Qin Ding; Qiang Ding; William Perrizo

Association rule mining, originally proposed for market basket data, has potential applications in many areas. Spatial data, such as remote sensed imagery (RSI) data, is one of the promising application areas. Extracting interesting patterns and rules from spatial data sets, composed of images and associated ground data, can be of importance in precision agriculture, resource discovery, and other areas. However, in most cases, the sizes of the spatial data sets are too large to be mined in a reasonable amount of time using existing algorithms. In this paper, we propose an efficient approach to derive association rules from spatial data using Peano count tree (P-tree) structure. P-tree structure provides a lossless and compressed representation of spatial data. Based on P-trees, an efficient association rule mining algorithm PARM with fast support calculation and significant pruning techniques is introduced to improve the efficiency of the rule mining process. The P-tree based association rule mining (PARM) algorithm is implemented and compared with FP-growth and Apriori algorithms. Experimental results showed that our algorithm is superior for association rule mining on RSI spatial data.

Israel Journal of Mathematics | 1978

Unique ergodicity of flows on homogeneous spaces

Robert Ellis; William Perrizo

Let G be a unimodular Lie group, Γ a co-compact discrete subgroup of G and ‘a’ a semisimple element of G. Let Ta be the map gΓ → agΓ : G/Γ → G/Γ. The following statements are pairwise equivalent: (1) (Ta, G/Γ, θ) is weak-mixing. (2) (Ta, G/Γ) is topologically weak-mixing. (3) (Gu, G/Γ) is uniquely ergodic. (4) (Gu, G/Γ, θ) is ergodic. (5) (Gu, G/Γ) is point transitive. (6) (Gu, G/Γ) is minimal. If in addition G is semisimple with finite center and no compact factors, then the statement “(Ta, G/Γ, θ) is ergodic” may be added to the above list.

web age information management | 2001

Deriving High Confidence Rules from Spatial Data Using Peano Count Trees

William Perrizo; Qin Ding; Qiang Ding; Amalendu Roy

The traditional task of association rule mining is to find all rules with high support and high confidence. In some applications, such as mining spatial datasets for natural resource location, the task is to find high confidence rules even though the support may be low. In still other applications, such as the identification of agricultural pest infestations, the task is to find high confidence rules preferably while the support is still very low. The basic Apriori algorithm cannot be used to solve these problems efficiently since it relies on first identifying all high support itemsets. In this paper, we propose a new model to derive high confidence rules for spatial data regardless of their support level. A new data structure, the Peano Count Tree (P-tree), is used in our model to represent all the information we need. P-trees represent spatial data bit-by-bit in a recursive quadrant-by-quadrant arrangement. Based on the P-tree, we build a special data cube, the Tuple Count Cube (T-cube), to derive high confidence rules. Our algorithm for deriving confident rules is fast and efficient. In addition, we discuss some strategies for avoiding over-fitting (removing redundant and misleading rules).
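The distinction the abstract draws between support and confidence can be shown with a small worked example. The sketch below uses the standard definitions (support = fraction of rows containing both sides of the rule, confidence = that count divided by the count of rows containing the antecedent) on plain Python sets, not on P-trees or a T-cube. The item names and the 100-row dataset are invented for illustration.

```python
def rule_stats(rows, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent.

    rows: list of sets of items; antecedent and consequent: sets of items.
    """
    n_antecedent = sum(1 for row in rows if antecedent <= row)
    n_both = sum(1 for row in rows if (antecedent | consequent) <= row)
    support = n_both / len(rows)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical field survey: the pest signature is rare but always
# coincides with an infestation.
rows = [{"pest_signature", "infested"}] * 2 + [{"healthy"}] * 98

stats = rule_stats(rows, {"pest_signature"}, {"infested"})
print(stats)  # (0.02, 1.0)
```

A support threshold of even 5% would discard this rule before its confidence was ever examined, which is the limitation of a support-first search that the abstract points to.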
