Michael Steinbach | Researchain

Publications

Featured research published by Michael Steinbach.

Archive | 2004

The Challenges of Clustering High Dimensional Data

Michael Steinbach; Levent Ertoz; Vipin Kumar

Cluster analysis divides data into groups (clusters) for the purposes of summarization or improved understanding. For example, cluster analysis has been used to group related documents for browsing, to find genes and proteins that have similar functionality, or as a means of data compression. While clustering has a long history and a large number of clustering techniques have been developed in statistics, pattern recognition, data mining, and other fields, significant challenges still remain. In this chapter we provide a short introduction to cluster analysis and then focus on the challenge of clustering high dimensional data. We present a brief overview of several recent techniques, including a more detailed description of our own recent work, which uses a concept-based clustering approach.

IEEE Transactions on Knowledge and Data Engineering | 2006

Enhancing data analysis with noise removal

Hui Xiong; Gaurav Pandey; Michael Steinbach; Vipin Kumar

Removing noisy objects is an important goal of data cleaning, as noise hinders most types of data analysis. Most existing data cleaning methods focus on removing noise that is the product of low-level data errors resulting from an imperfect data collection process, but data objects that are irrelevant or only weakly relevant can also significantly hinder data analysis. Thus, if the goal is to enhance the data analysis as much as possible, these objects should also be considered noise, at least with respect to the underlying analysis. Consequently, there is a need for data cleaning techniques that remove both types of noise. Because data sets can contain large amounts of noise, these techniques also need to be able to discard a potentially large fraction of the data. This paper explores four noise-removal techniques intended to enhance data analysis in the presence of high noise levels. Three of these methods are based on traditional outlier detection techniques: distance-based, clustering-based, and an approach based on the local outlier factor (LOF) of an object. The fourth, a new method that we propose, is a hyperclique-based data cleaner (HCleaner). These techniques are evaluated in terms of their impact on the subsequent data analysis, specifically clustering and association analysis. Our experimental results show that all of these methods can provide better clustering performance and higher-quality association patterns as the amount of noise being removed increases, although HCleaner generally leads to better clustering performance and higher-quality associations than the other three methods for binary data.
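To make the "distance-based" family of noise removal concrete, here is a minimal sketch of one common variant: score each object by the distance to its k-th nearest neighbor and discard the highest-scoring fraction. This is an illustrative assumption, not the exact procedure evaluated in the paper; the function names `knn_distance_scores` and `remove_noise` and the parameters `k` and `fraction` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn_distance_scores(points: List[Point], k: int) -> List[float]:
    """Score each point by the Euclidean distance to its k-th nearest neighbor.

    Hypothetical helper: larger scores mean the point lies farther from the
    rest of the data and is therefore treated as more likely to be noise.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def remove_noise(
    points: List[Point], k: int, fraction: float
) -> Tuple[List[Point], List[int]]:
    """Discard the `fraction` of points with the largest k-NN distance scores.

    Returns the kept points (in original order) and the sorted indices of the
    removed points. The O(n^2) distance computation is fine for a sketch but
    not for large data sets.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must satisfy 0 <= fraction < 1")
    scores = knn_distance_scores(points, k)
    n_remove = int(round(fraction * len(points)))
    # Indices ordered from most to least outlying.
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    removed = set(order[:n_remove])
    kept = [p for i, p in enumerate(points) if i not in removed]
    return kept, sorted(removed)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    kept, removed = remove_noise(pts, k=2, fraction=0.2)
    print("removed indices:", removed)
    print("kept points:", kept)
```

On the toy data, the isolated point (10, 10) has by far the largest 2-NN distance, so it is the one discarded when 20% of the data is removed. Raising `fraction` reflects the abstract's point that a cleaner may need to discard a large share of the data.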

Earth Interactions | 2005

Variability in Terrestrial Carbon Sinks over Two Decades. Part III: South America, Africa, and Asia

C. Potter; Steven A. Klooster; Pang-Ning Tan; Michael Steinbach; Vipin Kumar; V. Genovese

Seventeen years (1982–98) of net carbon flux predictions for Southern Hemisphere continents have been analyzed, based on a simulation model using satellite observations of monthly vegetation cover. The NASA Carnegie Ames Stanford Approach (CASA) model was driven by vegetation-cover properties derived from the Advanced Very High Resolution Radiometer and radiative transfer algorithms that were developed for the Moderate Resolution Imaging Spectroradiometer (MODIS). The terrestrial ecosystem flux of atmospheric CO2 for the Amazon region of South America was predicted to range from a biosphere source of –0.17 Pg C per year (in 1983) to a biosphere sink of +0.64 Pg C per year (in 1989). The areas of highest variability in net ecosystem production (NEP) fluxes across all of South America were detected in the south-central rain forest areas of the Amazon basin and in southeastern Brazil. Similar levels of variability were recorded across central forested portions of Africa and in the southern horn of ...

Archive | 2005