Publication


Featured research published by Peter Filzmoser.


Applied Geochemistry | 2002

Factor analysis applied to regional geochemical data: problems and possibilities

Clemens Reimann; Peter Filzmoser; Robert G. Garrett

Cluster analysis can be used to group samples and to develop ideas about the multivariate geochemistry of the data set at hand. Due to the complex nature of regional geochemical data (neither normal nor log-normal, strongly skewed, often multi-modal data distributions, data closure), cluster analysis results often strongly depend on the preparation of the data (e.g. choice of the transformation) and on the clustering algorithm selected. Different variants of cluster analysis can lead to surprisingly different cluster centroids, cluster sizes and classifications even when using exactly the same input data. Cluster analysis should not be misused as a statistical “proof” of certain relationships in the data. The use of cluster analysis as an exploratory data analysis tool requires a powerful program system to test different data preparation, processing and clustering methods, including the ability to present the results in a number of easy-to-grasp graphics. Such a tool has been developed as a package for the R statistical software. Two example data sets from geochemistry are used to demonstrate how the results change with different data preparation and clustering methods. A data set from S-Norway with a known number of clusters and cluster membership is used to test the performance of different clustering and data preparation techniques. For a complex data set from the Kola Peninsula, cluster analysis is applied to explore regional data structures.


Computers & Geosciences | 2005

Multivariate outlier detection in exploration geochemistry

Peter Filzmoser; Robert G. Garrett; Clemens Reimann

A new method for multivariate outlier detection is presented that can distinguish between extreme values of a normal distribution and values originating from a different distribution (outliers). To facilitate visualising multivariate outliers spatially on a map, the multivariate outlier plot is introduced. In this plot different symbols refer to a distance measure from the centre of the distribution, taking into account the shape of the distribution, and different colours are used to signify the magnitude of the values for each variable. The method is illustrated using a real geochemical data set from far-northern Europe. It is demonstrated that important processes such as the input of metals from contamination sources and the contribution of sea-salts via marine aerosols to the soil can be identified and separated.
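A distance from the centre that accounts for the shape of the distribution is commonly measured with the Mahalanobis distance. The sketch below is a simplified illustration, not the method of the paper: it assumes two variables, uses the classical mean and covariance rather than robust estimates, and applies a fixed chi-square quantile as cutoff instead of an adaptive one. The function name `flag_outliers` and its `quantile` parameter are hypothetical.

```python
import math


def flag_outliers(points, quantile=0.975):
    """Return indices of 2-D points whose squared Mahalanobis distance
    exceeds a chi-square cutoff (2 degrees of freedom).

    Assumption: classical mean/covariance are used here for brevity;
    a robust estimator would be needed to avoid masking in practice.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n

    # Sample covariance matrix entries.
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2

    # For 2 degrees of freedom the chi-square CDF is 1 - exp(-x / 2),
    # so the quantile has a closed form.
    cutoff = -2.0 * math.log(1.0 - quantile)

    flagged = []
    for i, (x, y) in enumerate(points):
        dx, dy = x - mean_x, y - mean_y
        # Quadratic form with the inverse of the 2x2 covariance matrix.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        if d2 > cutoff:
            flagged.append(i)
    return flagged
```

For a regular 5 x 5 grid of points plus one distant observation, only the distant observation is flagged.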


Computational Statistics & Data Analysis | 2008

Outlier identification in high dimensions

Peter Filzmoser; Ricardo A. Maronna; Mark Werner

A computationally fast procedure for identifying outliers is presented that is particularly effective in high dimensions. This algorithm utilizes simple properties of principal components to identify outliers in the transformed space, leading to significant computational advantages for high-dimensional data. This approach requires considerably less computational time than existing methods for outlier detection, and is suitable for use on very large data sets. It is also capable of analyzing the data situation commonly found in certain biological applications in which the number of dimensions is several orders of magnitude larger than the number of observations. The performance of this method is illustrated on real and simulated data with dimension ranging in the thousands.


Journal of Multivariate Analysis | 2003

Robust factor analysis

Greet Pison; Peter J. Rousseeuw; Peter Filzmoser; Christophe Croux

Our aim is to construct a factor analysis method that can resist the effect of outliers. For this we start with a highly robust initial covariance estimator, after which the factors can be obtained from maximum likelihood or from principal factor analysis (PFA). We find that PFA based on the minimum covariance determinant scatter matrix works well. We also derive the influence function of the PFA method based on either the classical scatter matrix or a robust matrix. These results are applied to the construction of a new type of empirical influence function (EIF), which is very effective for detecting influential data. To facilitate the interpretation, we compute a cutoff value for this EIF. Our findings are illustrated with several real data examples.


Chemometrics and Intelligent Laboratory Systems | 2007

Algorithms for Projection-Pursuit Robust Principal Component Analysis

Christophe Croux; Peter Filzmoser; M.R. Oliveira

Principal Component Analysis (PCA) is very sensitive to the presence of outliers. One of the most appealing robust methods for principal component analysis uses the Projection-Pursuit principle. Here, one projects the data onto a lower-dimensional space such that a robust measure of variance of the projected data is maximized. The Projection-Pursuit based method for principal component analysis has recently been introduced in the field of chemometrics, where the number of variables is typically large. In this paper, it is shown that the currently available algorithm for robust Projection-Pursuit PCA performs poorly in the presence of many variables. A new algorithm is proposed that is more suitable for the analysis of chemical data. Its performance is studied by means of simulation experiments and illustrated on some real datasets.
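The Projection-Pursuit idea can be sketched as a search over candidate directions for the one that maximizes a robust scale of the projected data. The sketch below is illustrative only and is not the algorithm proposed in the paper: it assumes the candidate directions are the centred observations themselves, centres the data with the coordinate-wise median, uses the MAD as robust scale, and finds only the first component. The names `mad` and `first_pp_direction` are hypothetical.

```python
import math
from statistics import median


def mad(values):
    """Median absolute deviation, scaled for consistency at the normal."""
    values = list(values)
    m = median(values)
    return 1.4826 * median(abs(v - m) for v in values)


def first_pp_direction(rows):
    """Return (direction, scale) maximizing the MAD of the projections.

    Assumption: candidate directions are the centred data points,
    normalized to unit length.
    """
    p = len(rows[0])
    center = [median(r[j] for r in rows) for j in range(p)]
    centered = [[r[j] - center[j] for j in range(p)] for r in rows]

    best_dir, best_scale = None, -1.0
    for cand in centered:
        norm = math.sqrt(sum(v * v for v in cand))
        if norm == 0.0:
            continue  # a point at the centre defines no direction
        direction = [v / norm for v in cand]
        projections = [sum(a * b for a, b in zip(x, direction))
                       for x in centered]
        scale = mad(projections)
        if scale > best_scale:
            best_dir, best_scale = direction, scale
    return best_dir, best_scale
```

On data lying along the diagonal with two extreme observations on the vertical axis, this search returns the diagonal direction, whereas the classical variance would be dominated by the two extreme points.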


Computational Statistics & Data Analysis | 2010

Imputation of missing values for compositional data using classical and robust methods

Karel Hron; Matthias Templ; Peter Filzmoser

New imputation algorithms for estimating missing values in compositional data are introduced. A first proposal uses the k-nearest neighbor procedure based on the Aitchison distance, a distance measure especially designed for compositional data. It is important to adjust the estimated missing values to the overall size of the compositional parts of the neighbors. As a second proposal, an iterative model-based imputation technique is introduced which starts from the result of the proposed k-nearest neighbor procedure. The method is based on iterative regressions, thereby accounting for the whole multivariate data information. The regressions have to be performed in a transformed space, and depending on the data quality classical or robust regression techniques can be employed. The proposed methods are tested on real and simulated data sets. The results show that the proposed methods outperform standard imputation methods. In the presence of outliers, the model-based method with robust regressions is preferable.
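The first proposal can be illustrated with a minimal sketch. The Aitchison distance is computed here as the Euclidean distance between centred log-ratio (clr) coordinates, which is a standard formulation. The size adjustment shown (rescaling each neighbor's value by the ratio of the sums of the observed parts) is one plausible choice made for this illustration and is not necessarily the adjustment used in the paper. The function names and the use of `None` for a missing part are assumptions.

```python
import math
from statistics import median


def clr(parts):
    """Centred log-ratio coordinates of a composition with positive parts."""
    logs = [math.log(v) for v in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr coordinates of x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


def knn_impute(target, complete_rows, k=3):
    """Fill None entries of `target` from its k nearest complete rows.

    Distances are computed on the observed parts only. Each neighbor's
    value is rescaled to the size of the target's observed parts
    (assumption: ratio of sums), then the median is taken.
    """
    observed = [j for j, v in enumerate(target) if v is not None]
    missing = [j for j, v in enumerate(target) if v is None]
    target_obs = [target[j] for j in observed]

    def distance(row):
        return aitchison_distance(target_obs, [row[j] for j in observed])

    neighbors = sorted(complete_rows, key=distance)[:k]
    filled = list(target)
    for j in missing:
        scaled = [row[j] * sum(target_obs) / sum(row[i] for i in observed)
                  for row in neighbors]
        filled[j] = median(scaled)
    return filled
```

Because the Aitchison distance depends only on ratios between parts, a composition and any rescaled copy of it are at distance zero, which is why the neighbors' values need a size adjustment before they are used.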


Computational Statistics & Data Analysis | 2007

Robust fitting of mixtures using the Trimmed Likelihood Estimator

N. M. Neykov; Peter Filzmoser; R. Dimova; P. N. Neytchev

The maximum likelihood estimator (MLE) has commonly been used to estimate the unknown parameters in a finite mixture of distributions. However, the MLE can be very sensitive to outliers in the data. To overcome this, the trimmed likelihood estimator (TLE) is proposed to estimate mixtures in a robust way. The superiority of this approach in comparison with the MLE is illustrated by examples and simulation studies. Moreover, as a prominent measure of robustness, the breakdown point (BDP) of the TLE for the mixture component parameters is characterized. The relationship of the TLE with various other approaches that have incorporated robustness in fitting mixtures and clustering is also discussed in this context.
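The trimming idea can be shown on a much simpler model than a mixture. The sketch below assumes a single univariate normal distribution and alternates between fitting the parameters on a subset and re-selecting the observations with the largest likelihood contributions. It is an illustration of trimmed likelihood in general, not the mixture procedure of the paper; the name `trimmed_normal_fit` and the parameter `keep_fraction` are hypothetical.

```python
import math
from statistics import median


def trimmed_normal_fit(data, keep_fraction=0.75, max_iter=50):
    """Estimate (mean, std) of a normal model from the best-fitting subset.

    Assumption: single normal component. For a normal density the
    observations with the largest log-likelihood contributions are
    those closest to the current mean, so selection uses |v - mean|.
    """
    h = max(2, int(len(data) * keep_fraction))

    def fit(subset):
        mu = sum(subset) / len(subset)
        var = sum((v - mu) ** 2 for v in subset) / len(subset)
        return mu, math.sqrt(var)

    # Start from the h observations closest to the median.
    start = median(data)
    subset = sorted(sorted(data, key=lambda v: abs(v - start))[:h])

    for _ in range(max_iter):
        mu, _ = fit(subset)
        # Keep the h observations with the highest likelihood under mu.
        new_subset = sorted(sorted(data, key=lambda v: abs(v - mu))[:h])
        if new_subset == subset:
            break  # the retained subset no longer changes
        subset = new_subset
    return fit(subset)
```

With six observations near 10 and two gross outliers, keeping 75% of the data gives a mean close to 10, while the ordinary mean of all eight values is pulled far upward.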


Science of The Total Environment | 2012

The concept of compositional data analysis in practice - Total major element concentrations in agricultural and grazing land soils of Europe

Clemens Reimann; Peter Filzmoser; Karl Fabian; Karel Hron; Manfred Birke; Alecos Demetriades; Enrico Dinelli; Anna Ladenberger

Applied geochemistry and environmental sciences invariably deal with compositional data. Classically, the original or log-transformed absolute element concentrations are studied. However, compositional data do not vary independently, and a concentration based approach to data analysis can lead to faulty conclusions. For this reason a better statistical approach was introduced in the 1980s, exclusively based on relative information. Because the difference between the two methods should be most pronounced in large-scale, and therefore highly variable, datasets, here a new dataset of agricultural soils, covering all of Europe (5.6 million km²) at an average sampling density of 1 site/2500 km², is used to demonstrate and compare both approaches. Absolute element concentrations are certainly of interest in a variety of applications and can be provided in tabulations or concentration maps. Maps for the opened data (ratios to other elements) provide more specific additional information. For compositional data XY plots for raw or log-transformed data should only be used with care in an exploratory data analysis (EDA) sense, to detect unusual data behaviour, candidate subgroups of samples, or to compare pre-defined groups of samples. Correlation analysis and the Euclidean distance are not mathematically meaningful concepts for this data type. Element relationships have to be investigated via a stability measure of the (log-)ratios of elements. Logratios are also the key ingredient for an appropriate multivariate analysis of compositional data.


Science of The Total Environment | 2010

The bivariate statistical analysis of environmental (compositional) data

Peter Filzmoser; Karel Hron; Clemens Reimann

Environmental sciences usually deal with compositional (closed) data. Whenever the concentration of chemical elements is measured, the data will be closed, i.e. the relevant information is contained in the ratios between the variables rather than in the data values reported for the variables. Data closure has severe consequences for statistical data analysis. Most classical statistical methods are based on the usual Euclidean geometry - compositional data, however, do not plot into Euclidean space because they have their own geometry which is not linear but curved in the Euclidean sense. This has severe consequences for bivariate statistical analysis: correlation coefficients computed in the traditional way are likely to be misleading, and the information contained in scatterplots must be used and interpreted differently from sets of non-compositional data. As a solution, the ilr transformation applied to a variable pair can be used to display the relationship and to compute a measure of stability. This paper discusses how this measure is related to the usual correlation coefficient and how it can be used and interpreted. Moreover, recommendations are provided for how the scatterplot can still be used, and which alternatives exist for displaying the relationship between two variables.
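For a pair of variables, the isometric log-ratio (ilr) transformation reduces to a single coordinate, the log-ratio of the two parts divided by the square root of two. The sketch below computes this coordinate and its sample variance; a small variance indicates that the ratio between the two variables is stable across samples. How the paper maps this variance to its stability measure is not reproduced here, and the function names are hypothetical.

```python
import math
from statistics import variance


def ilr_pair(x, y):
    """ilr coordinate of each two-part composition (x_i, y_i)."""
    return [math.log(a / b) / math.sqrt(2.0) for a, b in zip(x, y)]


def logratio_variance(x, y):
    """Sample variance of the ilr coordinate; near zero for a stable ratio."""
    return variance(ilr_pair(x, y))
```

For example, `logratio_variance([1, 2, 4], [2, 4, 8])` is zero because the ratio is constant, whereas `logratio_variance([1, 2, 4], [8, 4, 2])` is clearly positive.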


IEEE VGTC Conference on Visualization | 2011

Uncertainty-aware exploration of continuous parameter spaces using multivariate prediction

Wolfgang Berger; Harald Piringer; Peter Filzmoser; M. Eduard Gröller

Systems projecting a continuous n‐dimensional parameter space to a continuous m‐dimensional target space play an important role in science and engineering. If evaluating the system is expensive, however, an analysis is often limited to a small number of sample points. The main contribution of this paper is an interactive approach to enable a continuous analysis of a sampled parameter space with respect to multiple target values. We employ methods from statistical learning to predict results in real‐time at any user‐defined point and its neighborhood. In particular, we describe techniques to guide the user to potentially interesting parameter regions, and we visualize the inherent uncertainty of predictions in 2D scatterplots and parallel coordinates. An evaluation describes a real‐world scenario in the application context of car engine design and reports feedback of domain experts. The results indicate that our approach is suitable to accelerate a local sensitivity analysis of multiple target dimensions, and to determine a sufficient local sampling density for interesting parameter regions.

Collaboration


Dive into Peter Filzmoser's collaborations.

Top Co-Authors

Matthias Templ

Vienna University of Technology

Christophe Croux

Katholieke Universiteit Leuven

Robert G. Garrett

Geological Survey of Canada

Kurt Varmuza

Vienna University of Technology

Rudolf Dutter

Vienna University of Technology

Andreas Alfons

Vienna University of Technology

Klaudius Kalcher

Medical University of Vienna

Valentin Todorov

United Nations Industrial Development Organization

Wolfgang Huf

Medical University of Vienna

Ewald Moser

Medical University of Vienna
