Cory Reina | Researchain

Publication

Featured research published by Cory Reina.

International Conference on Pattern Recognition | 2000

Clustering very large databases using EM mixture models

Paul S. Bradley; U. M. Fayyad; Cory Reina

Clustering very large databases is a challenge for traditional pattern recognition algorithms, e.g. the expectation-maximization (EM) algorithm for fitting mixture models, because of high memory and iteration requirements. Over large databases, the cost of the numerous scans required to converge and the large memory requirement of the algorithm become prohibitive. We present a decomposition of the EM algorithm that requires only a small amount of memory by limiting iterations to small data subsets. The scalable EM approach requires at most one database scan and is based on identifying regions of the data that are discardable, regions that are compressible, and regions that must be maintained in memory. Data resolution is preserved to the extent possible based upon the size of the memory buffer and the fit of the current model to the data. Computational tests demonstrate that the scalable scheme outperforms similarly constrained EM approaches.
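The idea of a "compressible" region in the abstract above can be illustrated with a small sketch. It assumes that a region is summarized by three running totals (count, sum, and sum of squares), from which a mean and variance can be recovered after the raw points are dropped from memory. The class name CompressedRegion and its methods are hypothetical and chosen for illustration only; the abstract does not state how the authors represent compressed regions.

```python
from dataclasses import dataclass


@dataclass
class CompressedRegion:
    """Hypothetical summary of a group of 1-D points (illustrative only)."""
    count: int = 0
    total: float = 0.0      # running sum of the points
    total_sq: float = 0.0   # running sum of the squared points

    def add(self, x: float) -> None:
        # Fold one point into the summary; the point itself is not stored.
        self.count += 1
        self.total += x
        self.total_sq += x * x

    def merge(self, other: "CompressedRegion") -> None:
        # Two summaries combine by adding their totals.
        self.count += other.count
        self.total += other.total
        self.total_sq += other.total_sq

    def mean(self) -> float:
        return self.total / self.count

    def variance(self) -> float:
        # Population variance recovered from the totals: E[x^2] - (E[x])^2.
        m = self.mean()
        return self.total_sq / self.count - m * m


region = CompressedRegion()
for value in [1.0, 2.0, 3.0, 4.0]:
    region.add(value)
print(region.count, region.mean(), region.variance())
```

Because the summary has a fixed size regardless of how many points it absorbs, memory use is bounded by the number of regions kept, not by the number of rows scanned.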

Archive | 2001

Scalable Probabilistic Clustering

Paul S. Bradley; U. M. Fayyad; Cory Reina

The Expectation-Maximization (EM) algorithm is a popular approach to probabilistic database clustering. A database of observations is clustered by identifying k sub-populations and summarizing each sub-population with a model or probability density function. The EM algorithm iteratively estimates the memberships of the observations in each cluster and the parameters of the k density functions. Typical EM implementations require a full database scan at each iteration, and the number of iterations required to converge is arbitrary. For large databases, these scans become prohibitively expensive. We present a scalable implementation of the EM algorithm based upon identifying regions of the data that are compressible and regions that must be maintained in memory. The approach operates within the confines of a limited main memory buffer. Data resolution is preserved to the extent possible based upon the size of the memory buffer and the fit of the current clustering model to the data. We extend the framework to update multiple cluster models simultaneously. Computational tests indicate that this scalable scheme outperforms sampling-based and incremental approaches, the straightforward alternatives to “scaling” existing traditional in-memory implementations to large databases.
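For readers unfamiliar with the iteration the abstract describes, here is a minimal sketch of standard in-memory EM for a one-dimensional Gaussian mixture. It is the textbook full-scan version, not the scalable variant the authors present. The function names, the quantile-based initialization, and the variance floor are assumptions made for this illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, k, iterations=50):
    """Fit a k-component 1-D Gaussian mixture with plain EM (full scan per pass)."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialization: evenly spaced quantiles as starting means.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [max(overall_var, 1e-6)] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: estimate the membership of each observation in each cluster.
        resp = []
        for x in data:
            probs = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(probs) or 1e-300  # guard against underflow to zero
            resp.append([p / total for p in probs])

        # M-step: re-estimate the parameters of the k density functions.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # assumed floor to avoid collapse

    return weights, means, variances


# Two synthetic clusters centred on 0 and 10.
sample = [((i % 21) - 10) / 10 for i in range(210)]
sample += [10 + ((i % 21) - 10) / 10 for i in range(210)]
weights, means, variances = em_gmm_1d(sample, k=2)
print(sorted(means))
```

Each pass of the outer loop touches every observation, which is the per-iteration database scan the abstract identifies as the bottleneck for large databases.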

Knowledge Discovery and Data Mining | 1998

Scaling clustering algorithms to large databases

Paul S. Bradley; Usama M. Fayyad; Cory Reina

Knowledge Discovery and Data Mining | 1998

Initialization of iterative refinement clustering algorithms

Usama M. Fayyad; Cory Reina; Paul S. Bradley

Archive | 1998

Scaling EM (Expectation Maximization) Clustering to Large Databases

Paul S. Bradley; Usama M. Fayyad; Cory Reina

Archive | 1998

Scalable system for K-means clustering of large databases

Usama M. Fayyad; Paul S. Bradley; Cory Reina

Archive | 1998

Scalable system for clustering of large databases

Usama M. Fayyad; Paul S. Bradley; Cory Reina

Archive | 1999

Scalable system for clustering of large databases having mixed data attributes

Usama M. Fayyad; Paul S. Bradley; Cory Reina

Archive | 1998

Scalable system for expectation maximization clustering of large databases

Usama M. Fayyad; Paul S. Bradley; Cory Reina

Archive | 1995

Interface sharing between objects

Raman Narayanan; Cory Reina

Collaboration

Dive into Cory Reina's collaborations.

Top Co-Authors

Paul S. Bradley (Microsoft)

Usama M. Fayyad (Microsoft)

Girish Bablani (Microsoft)

Raman Narayanan (Microsoft)
