Publication


Featured research published by Inge Koch.


Computational Statistics & Data Analysis | 2008

Feature significance for multivariate kernel density estimation

Tarn Duong; Arianna Cowling; Inge Koch; M. P. Wand

Multivariate kernel density estimation provides information about structure in data. Feature significance is a technique for deciding whether features, such as local extrema, are statistically significant. This paper proposes a framework for feature significance in d-dimensional data which combines kernel density derivative estimators and hypothesis tests for modal regions. For the gradient and curvature estimators, distributional properties are given, and pointwise test statistics are derived. The hypothesis tests extend the two-dimensional feature significance ideas of Godtliebsen et al. [Godtliebsen, F., Marron, J.S., Chaudhuri, P., 2002. Significance in scale space for bivariate density estimation. Journal of Computational and Graphical Statistics 11, 1-21]. The theoretical framework is complemented by novel visualization for three-dimensional data. Applications to real data sets show that tests based on the kernel curvature estimators perform well in identifying modal regions. These results can be enhanced by corresponding tests with kernel gradient estimators.
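As a rough illustration of the kernel density and gradient estimators this abstract refers to, the following Python sketch evaluates a d-dimensional Gaussian kernel density estimate and its gradient at a single point. The function names, the single scalar bandwidth `h`, and the toy data are assumptions made for illustration only; the paper's curvature estimators and significance tests are not reproduced here.

```python
import math

def gaussian_kde(point, data, h):
    """Density estimate at `point` using a spherical Gaussian kernel.

    `data` is a list of d-dimensional points and `h` is a single scalar
    bandwidth (an assumption; a full bandwidth matrix is more general).
    """
    d = len(point)
    norm = len(data) * (h ** d) * (2 * math.pi) ** (d / 2)
    total = 0.0
    for x in data:
        sq = sum(((p - xi) / h) ** 2 for p, xi in zip(point, x))
        total += math.exp(-0.5 * sq)
    return total / norm

def gaussian_kde_gradient(point, data, h):
    """Gradient of the density estimate above with respect to `point`."""
    d = len(point)
    norm = len(data) * (h ** d) * (2 * math.pi) ** (d / 2)
    grad = [0.0] * d
    for x in data:
        # Scaled offset from the evaluation point to the data point.
        diff = [(xi - p) / h for p, xi in zip(point, x)]
        weight = math.exp(-0.5 * sum(u * u for u in diff))
        for j in range(d):
            grad[j] += weight * diff[j] / h
    return [g / norm for g in grad]

# Hypothetical toy data: four points placed symmetrically about the origin.
toy_data = [(-1.0, 0.0), (1.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
density_at_origin = gaussian_kde((0.0, 0.0), toy_data, h=0.8)
gradient_at_origin = gaussian_kde_gradient((0.0, 0.0), toy_data, h=0.8)
```

Because the toy data are symmetric about the origin, the gradient there vanishes, which is the kind of stationary point that a formal test would then have to assess for significance.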


Neural Computation | 2007

Dimension Selection for Feature Selection and Dimension Reduction with Principal and Independent Component Analysis

Inge Koch; Kanta Naito

This letter is concerned with the problem of selecting the best or most informative dimension for dimension reduction and feature extraction in high-dimensional data. The dimension of the data is reduced by principal component analysis; subsequent application of independent component analysis to the principal component scores determines the most nongaussian directions in the lower-dimensional space. A criterion for choosing the optimal dimension based on bias-adjusted skewness and kurtosis is proposed. This new dimension selector is applied to real data sets and compared to existing methods. Simulation studies for a range of densities show that the proposed method performs well and is more appropriate for nongaussian data than existing methods.
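The criterion described in this abstract is built on skewness and kurtosis as measures of nongaussianity. A minimal Python sketch of the plain sample versions of these two quantities follows. It omits the bias adjustment as well as the PCA and ICA steps, and the function names are hypothetical.

```python
def central_moment(values, order):
    """Sample central moment of the given order (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** order for v in values) / n

def sample_skewness(values):
    """Third standardized moment; zero for perfectly symmetric samples."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def sample_excess_kurtosis(values):
    """Fourth standardized moment minus 3, so a Gaussian scores near zero."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

# Hypothetical one-dimensional scores, e.g. projections onto one direction.
symmetric_scores = [-2.0, -1.0, 0.0, 1.0, 2.0]
skewed_scores = [0.0, 0.0, 0.0, 10.0]
```

Directions whose scores have large absolute skewness or excess kurtosis are the ones an approach of this kind would regard as most nongaussian.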


Biometrical Journal | 2009

Highest density difference region estimation with application to flow cytometric data

Tarn Duong; Inge Koch; M. P. Wand

Motivated by the needs of scientists using flow cytometry, we study the problem of estimating the region where two multivariate samples differ in density. We call this problem highest density difference region estimation and recognise it as a two-sample analogue of highest density region or excess set estimation. Flow cytometry samples typically have sizes of the order of 10,000 to 100,000 and dimension ranging from about 3 to 20. The industry standard for the problem being studied is called Frequency Difference Gating, due to Roederer and Hardy (2001). After couching the problem in a formal statistical framework, we devise an alternative estimator that draws upon recent statistical developments such as patient rule induction methods. Improved performance is illustrated in simulations. While motivated by flow cytometry, the methodology is suitable for general multivariate random samples where density difference regions are of interest.


Computational Statistics & Data Analysis | 2010

Prediction of multivariate responses with a selected number of principal components

Inge Koch; Kanta Naito

This paper proposes a new method and algorithm for predicting multivariate responses in a regression setting. Research into the classification of high dimension low sample size (HDLSS) data, in particular microarray data, has made considerable advances, but regression prediction for high-dimensional data with continuous responses has had less attention. Recently Bair et al. (2006) proposed an efficient prediction method based on supervised principal component regression (PCR). Motivated by the fact that using a larger number of principal components results in better regression performance, this paper extends the method of Bair et al. in several ways: a comprehensive variable ranking is combined with a selection of the best number of components for PCR, and the new method further extends to regression with multivariate responses. The new method is particularly suited to addressing HDLSS problems. Applications to simulated and real data demonstrate the performance of the new method. Comparisons with the findings of Bair et al. (2006) show that for high-dimensional data in particular the new ranking results in a smaller number of predictors and smaller errors.


International Journal of Computational Intelligence and Applications | 2008

DESIGNING RELEVANT FEATURES FOR CONTINUOUS DATA SETS USING ICA

Mithun Prasad; Arcot Sowmya; Inge Koch

Isolating relevant information and reducing the dimensionality of the original data set are key areas of interest in pattern recognition and machine learning. In this paper, a novel approach to reducing dimensionality of the feature space by employing independent component analysis (ICA) is introduced. While ICA is primarily a feature extraction technique, it is used here as a feature selection/construction technique in a generic way. The new technique, called feature selection based on independent component analysis (FS_ICA), efficiently builds a reduced set of features without loss in accuracy and also has a fast incremental version. When used as a first step in supervised learning, FS_ICA outperforms comparable methods in efficiency without loss of classification accuracy. For large data sets, as in medical image segmentation of high-resolution computed tomography images, FS_ICA reduces dimensionality of the data set substantially and results in efficient and accurate classification.


Australian & New Zealand Journal of Statistics | 2016

Evaluating the Contributions of Individual Variables to a Quadratic Form

Paul H. Garthwaite; Inge Koch

Quadratic forms capture multivariate information in a single number, making them useful, for example, in hypothesis testing. When a quadratic form is large and hence interesting, it might be informative to partition the quadratic form into contributions of individual variables. In this paper it is argued that meaningful partitions can be formed, though the precise partition that is determined will depend on the criterion used to select it. An intuitively reasonable criterion is proposed and the partition to which it leads is determined. The partition is based on a transformation that maximises the sum of the correlations between individual variables and the variables to which they transform under a constraint. Properties of the partition, including optimality properties, are examined. The contributions of individual variables to a quadratic form are less clear‐cut when variables are collinear, and forming new variables through rotation can lead to greater transparency. The transformation is adapted so that it has an invariance property under such rotation, whereby the assessed contributions are unchanged for variables that the rotation does not affect directly. Application of the partition to Hotelling's one‐ and two‐sample test statistics, Mahalanobis distance and discriminant analysis is described and illustrated through examples. It is shown that bootstrap confidence intervals for the contributions of individual variables to a partition are readily obtained.
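To make the idea of partitioning a quadratic form concrete, the Python sketch below computes x'Ax and a naive split into one term per variable, x_i (Ax)_i. This naive split is not the partition proposed in the paper; it is shown only because it sums to the quadratic form while allowing negative terms, which hints at why a more careful criterion may be wanted. The function names and the numbers are hypothetical.

```python
def quadratic_form(x, A):
    """Value of x' A x for a vector x and a square matrix A (nested lists)."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def naive_contributions(x, A):
    """Naive per-variable terms x_i * (A x)_i, which sum to x' A x.

    This is an illustrative baseline, not the paper's partition.
    """
    n = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return [x[i] * Ax[i] for i in range(n)]

# Hypothetical symmetric matrix with a strong negative off-diagonal entry.
A_example = [[2.0, -1.5],
             [-1.5, 2.0]]
x_example = [1.0, 3.0]

total = quadratic_form(x_example, A_example)
parts = naive_contributions(x_example, A_example)
```

In this example the first variable receives a negative term even though the quadratic form itself is positive, so the naive terms are hard to read as contributions.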


Cytometry Part A | 2016

Computationally efficient multidimensional analysis of complex flow cytometry data using second order polynomial histograms

John Zaunders; Junmei Jing; Michael D. Leipold; Holden T. Maecker; Anthony D. Kelleher; Inge Koch

Many methods have been described for automated clustering analysis of complex flow cytometry data, but so far the goal to efficiently estimate multivariate densities and their modes for a moderate number of dimensions and potentially millions of data points has not been attained. We have devised a novel approach to describing modes using second order polynomial histogram estimators (SOPHE). The method divides the data into multivariate bins and determines the shape of the data in each bin based on second order polynomials, which is an efficient computation. These calculations yield local maxima and allow joining of adjacent bins to identify clusters. The use of second order polynomials also optimally uses wide bins, such that in most cases each parameter (dimension) need only be divided into 4–8 bins, again reducing computational load. We have validated this method using defined mixtures of up to 17 fluorescent beads in 16 dimensions, correctly identifying all populations in data files of 100,000 beads in <10 s, on a standard laptop. The method also correctly clustered granulocytes, lymphocytes, including standard T, B, and NK cell subsets, and monocytes in 9‐color stained peripheral blood, within seconds. SOPHE successfully clustered up to 36 subsets of memory CD4 T cells using differentiation and trafficking markers, in 14‐color flow analysis, and up to 65 subpopulations of PBMC in 33‐dimensional CyTOF data, showing its usefulness in discovery research. SOPHE has the potential to greatly increase efficiency of analysing complex mixtures of cells in higher dimensions.


Proteomics | 2016

Classification of MALDI-MS imaging data of tissue microarrays using canonical correlation analysis-based variable selection

Lyron Winderbaum; Inge Koch; Parul Mittal; Peter Hoffmann

Applying MALDI‐MS imaging to tissue microarrays (TMAs) provides access to proteomics data from large cohorts of patients in a cost‐ and time‐efficient way, and opens the potential for applying this technology in clinical diagnosis. The complexity of these TMA data (high‐dimensional, low sample size) provides challenges for the statistical analysis, as classical methods typically require a nonsingular covariance matrix, a requirement that cannot be satisfied if the dimension is greater than the sample size. We use TMAs to collect data from endometrial primary carcinomas from 43 patients. Each patient has a lymph node metastasis (LNM) status of positive or negative, which we predict on the basis of the MALDI‐MS imaging TMA data. We propose a variable selection approach based on canonical correlation analysis that explicitly uses the LNM information. We apply LDA to the selected variables only. Our method misclassifies 2.3–20.9% of patients by leave‐one‐out cross‐validation and strongly outperforms LDA after reduction of the original data with principal component analysis.
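The misclassification rates quoted in this abstract come from leave-one-out cross-validation. The Python sketch below shows that evaluation loop in its simplest form. A nearest-centroid rule stands in for LDA, and the variable selection step is omitted, so this is not the authors' pipeline; all names and data are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, point):
    """Predict the label whose class mean is closest to `point`.

    A simple stand-in classifier (not LDA), used only to drive the loop.
    """
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [v / counts[label] for v in acc]
        dist = sum((p - c) ** 2 for p, c in zip(point, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_error(xs, ys, predict):
    """Fraction of samples misclassified when each is held out in turn."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)

# Hypothetical one-feature data with two well-separated classes.
toy_x = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
toy_y = ["negative"] * 3 + ["positive"] * 3
toy_error = loocv_error(toy_x, toy_y, nearest_centroid_predict)
```

In a real analysis, any variable selection would need to be repeated inside each held-out fold so that the left-out sample does not influence which variables are chosen.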


The Annals of Applied Statistics | 2015

Feature extraction for proteomics imaging mass spectrometry data

Lyron Winderbaum; Inge Koch; Ove J. R. Gustafsson; Stephan Meding; Peter Hoffmann

Imaging mass spectrometry (IMS) has transformed proteomics by providing an avenue for collecting spatially distributed molecular data. Mass spectrometry data acquired with matrix assisted laser desorption ionization (MALDI) IMS consist of tens of thousands of spectra, measured at regular grid points across the surface of a tissue section. Unlike the more standard liquid chromatography mass spectrometry, MALDI-IMS preserves the spatial information inherent in the tissue. Motivated by the need to differentiate cell populations and tissue types in MALDI-IMS data accurately and efficiently, we propose an integrated cluster and feature extraction approach for such data. We work with the derived binary data representing presence/absence of ions, as this is the essential information in the data. Our approach takes advantage of the spatial structure of the data in a noise removal and initial dimension reduction step and applies


BMC Bioinformatics | 2015

Alignment of time course gene expression data and the classification of developmentally driven genes with hidden Markov models

Sean Robinson; Garique Glonek; Inge Koch; Mark R. Thomas; Christopher Davies


Collaboration

Top Co-Authors

Kanta Naito, University of Adelaide

Arcot Sowmya, University of New South Wales

Junmei Jing, Australian National University

Mithun Prasad, University of New South Wales