Michael Steinbach
IEEE Computer Society
Publication
Featured research published by Michael Steinbach.
Archive | 2004
Michael Steinbach; Levent Ertoz; Vipin Kumar
Cluster analysis divides data into groups (clusters) for the purposes of summarization or improved understanding. For example, cluster analysis has been used to group related documents for browsing, to find genes and proteins that have similar functionality, or as a means of data compression. While clustering has a long history and a large number of clustering techniques have been developed in statistics, pattern recognition, data mining, and other fields, significant challenges still remain. In this chapter we provide a short introduction to cluster analysis, and then focus on the challenge of clustering high dimensional data. We present a brief overview of several recent techniques, including a more detailed description of recent work of our own which uses a concept-based clustering approach.
IEEE Transactions on Knowledge and Data Engineering | 2006
Hui Xiong; Gaurav Pandey; Michael Steinbach; Vipin Kumar
Removing objects that are noisy is an important goal of data cleaning as noise hinders most types of data analysis. Most existing data cleaning methods focus on removing noise that is the product of low-level data errors that result from an imperfect data collection process, but data objects that are irrelevant or only weakly relevant can also significantly hinder data analysis. Thus, if the goal is to enhance the data analysis as much as possible, these objects should also be considered as noise, at least with respect to the underlying analysis. Consequently, there is a need for data cleaning techniques that remove both types of noise. Because data sets can contain large amounts of noise, these techniques also need to be able to discard a potentially large fraction of the data. This paper explores four techniques intended for noise removal to enhance data analysis in the presence of high noise levels. Three of these methods are based on traditional outlier detection techniques: distance-based, clustering-based, and an approach based on the local outlier factor (LOF) of an object. The other technique, which is a new method that we are proposing, is a hyperclique-based data cleaner (HCleaner). These techniques are evaluated in terms of their impact on the subsequent data analysis, specifically, clustering and association analysis. Our experimental results show that all of these methods can provide better clustering performance and higher quality association patterns as the amount of noise being removed increases, although HCleaner generally leads to better clustering performance and higher quality associations than the other three methods for binary data.
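As a rough illustration of the distance-based idea mentioned in the abstract above, the sketch below scores each object by the distance to its k-th nearest neighbor and discards a chosen fraction of the highest-scoring objects. This is only one common form of distance-based outlier detection, not necessarily the variant evaluated in the paper. The function names `knn_distance_scores` and `remove_noise`, and the parameters `k` and `fraction`, are hypothetical and chosen for this example.

```python
import math


def knn_distance_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor.

    Assumption: a larger score means the point is more isolated and is
    therefore treated as more likely to be noise.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def remove_noise(points, k, fraction):
    """Drop the given fraction of points with the largest k-NN distance."""
    scores = knn_distance_scores(points, k)
    n_remove = int(len(points) * fraction)
    # Indices ordered from most to least outlying.
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    drop = set(ranked[:n_remove])
    return [p for i, p in enumerate(points) if i not in drop]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    # Remove the most isolated 20% of the points (one point here).
    print(remove_noise(data, k=2, fraction=0.2))
```

Because `fraction` is a free parameter, the same routine can discard a small or a large share of the data, which matches the abstract's point that the amount of noise removed may need to be substantial.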
Earth Interactions | 2005
C. Potter; Steven A. Klooster; Pang Ning Tan; Michael Steinbach; Vipin Kumar; V. Genovese
Seventeen years (1982–98) of net carbon flux predictions for Southern Hemisphere continents have been analyzed, based on a simulation model using satellite observations of monthly vegetation cover. The NASA Carnegie Ames Stanford Approach (CASA) model was driven by vegetation-cover properties derived from the Advanced Very High Resolution Radiometer and radiative transfer algorithms that were developed for the Moderate Resolution Imaging Spectroradiometer (MODIS). The terrestrial ecosystem flux for atmospheric CO2 for the Amazon region of South America has been predicted between a biosphere source of –0.17 Pg C per year (in 1983) and a biosphere sink of +0.64 Pg C per year (in 1989). The areas of highest variability in net ecosystem production (NEP) fluxes across all of South America were detected in the south-central rain forest areas of the Amazon basin and in southeastern Brazil. Similar levels of variability were recorded across central forested portions of Africa and in the southern horn of ...
Archive | 2005
Pang-Ning Tan; Michael Steinbach; Vipin Kumar
Archive | 2005
Pang Ning Tan; Michael Steinbach; Vipin Kumar
Archive | 2000
Michael Steinbach; George Karypis; Vipin Kumar
Archive | 2006
Pang Ning Tan; Michael Steinbach; Vipin Kumar
Pulmonology and Respiratory Research | 2015
Sarah Roark; Brian Sandri; Sanjoy Dey; Michael Steinbach; Trisha Becker; Chris H. Wendt
Archive | 2004
Michael Steinbach; Pang Ning Tan; Huayu Xiong; Vipin Kumar
Archive | 2016
Pang Ning Tan; Michael Steinbach; Vipin Kumar