Robin Nunkesser
Technical University of Dortmund
Publication
Featured research published by Robin Nunkesser.
Bioinformatics | 2007
Robin Nunkesser; Thorsten Bernholt; Holger Schwender; Katja Ickstadt; Ingo Wegener
MOTIVATION: Not individual single nucleotide polymorphisms (SNPs), but high-order interactions of SNPs are assumed to be responsible for complex diseases such as cancer. Therefore, one of the major goals of genetic association studies concerned with such genotype data is the identification of these high-order interactions. This search is additionally impeded by the fact that these interactions are often explanatory only for a relatively small subgroup of patients. Most of the feature selection methods proposed in the literature, unfortunately, fail at this task, since they can either only identify individual variables or interactions of a low order, or try to find rules that are explanatory for a high percentage of the observations. In this article, we present a procedure based on genetic programming and multi-valued logic that enables the identification of high-order interactions of categorical variables such as SNPs. This method, called GPAS, can not only be used for feature selection but can also be employed for discrimination. RESULTS: In an application to the genotype data from the GENICA study, an association study concerned with sporadic breast cancer, GPAS is able to identify high-order interactions of SNPs leading to a considerably increased breast cancer risk for different subsets of patients that are not found by other feature selection methods. As an application to a subset of the HapMap data shows, GPAS is not restricted to association studies comprising several tens of SNPs, but can also be employed to analyze whole-genome data. AVAILABILITY: Software can be downloaded from http://ls2-www.cs.uni-dortmund.de/~nunkesser/#Software
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2011
Carla Chen; Holger Schwender; Jonathan M. Keith; Robin Nunkesser; Kerrie Mengersen; Paula E. Macrossan
Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets come the inherent challenges of new methods of statistical analysis and modeling. Considering that a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.
Computational Statistics & Data Analysis | 2010
Robin Nunkesser; Oliver Morell
A drawback of robust statistical techniques is the increased computational effort often needed as compared to non-robust methods. In particular, robust estimators possessing the exact fit property are NP-hard to compute. This means that, under the widely believed assumption that the computational complexity classes NP and P are not equal, there is no hope to compute exact solutions for large high-dimensional data sets. To tackle this problem, search heuristics are used to compute NP-hard estimators in high dimensions. A new evolutionary algorithm that is applicable to different robust estimators is presented. Further, variants of this evolutionary algorithm for selected estimators (most prominently least trimmed squares and least median of squares) are introduced and shown to outperform existing popular search heuristics in difficult data situations. The results increase the applicability of robust methods and underline the usefulness of evolutionary algorithms for computational statistics.
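To make the idea of a search heuristic for least trimmed squares (LTS) concrete, the following is a minimal sketch for simple regression in the plane. It is not the evolutionary algorithm from the paper: it is a generic (1+1)-style search over pairs of points, where each pair defines a candidate line and the fitness is the sum of the h smallest squared residuals. All function names, the mutation scheme, and the parameters `generations` and `seed` are assumptions made for illustration.

```python
import random


def lts_objective(points, slope, intercept, h):
    """Sum of the h smallest squared residuals of y = slope * x + intercept."""
    squared = sorted((y - (slope * x + intercept)) ** 2 for x, y in points)
    return sum(squared[:h])


def line_through(p, q):
    """Slope and intercept of the line through two points with distinct x."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def lts_search(points, h, generations=2000, seed=0):
    """Illustrative (1+1)-style search; an individual is a pair of point indices."""
    rng = random.Random(seed)
    n = len(points)

    def evaluate(pair):
        i, j = pair
        if points[i][0] == points[j][0]:
            return float("inf"), None  # vertical line: not a valid candidate
        line = line_through(points[i], points[j])
        return lts_objective(points, line[0], line[1], h), line

    parent = tuple(rng.sample(range(n), 2))
    parent_fit, parent_line = evaluate(parent)
    for _ in range(generations):
        # Mutation: each index is resampled independently with probability 1/2.
        child = tuple(
            rng.randrange(n) if rng.random() < 0.5 else idx for idx in parent
        )
        if child[0] == child[1]:
            continue
        child_fit, child_line = evaluate(child)
        # Selection: keep the child if it is at least as good as the parent.
        if child_fit <= parent_fit:
            parent, parent_fit, parent_line = child, child_fit, child_line
    return parent_line, parent_fit


# Ten points on y = 2x + 1 plus four gross outliers.
data = [(x, 2 * x + 1) for x in range(10)]
data += [(10, 100), (11, -50), (12, 80), (13, -30)]
line, fit = lts_search(data, h=8)
```

Because ten of the fourteen points lie exactly on one line and only h = 8 residuals enter the objective, the search should settle on that line with a trimmed sum of squares of zero. This mirrors the exact fit property mentioned in the abstract. Restricting candidates to lines through two data points is a simplification and does not in general yield the exact LTS solution.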
Computational Statistics & Data Analysis | 2009
Robin Nunkesser; Roland Fried; Karen Schettlinger; Ursula Gather
A fast update algorithm for online calculation of the Qn scale estimator is presented. This algorithm allows robust analysis of high-frequency time series in real time. It provides reliable estimates of a time-varying volatility even if many large outliers are present and it offers good efficiency in the case of clean Gaussian data.
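For orientation, the Qn scale estimator can be computed naively as a scaled order statistic of all pairwise absolute differences. The sketch below does exactly that and recomputes it from scratch for every moving window. It is deliberately not the fast update algorithm of the paper, only a baseline that such an algorithm would speed up. The consistency factor 2.2219 is the commonly quoted value for Gaussian data; finite-sample corrections are omitted, and the function names and the `window` parameter are assumptions for illustration.

```python
from itertools import combinations


def qn_scale(x, consistency=2.2219):
    """Naive Qn: scaled k-th smallest pairwise absolute difference."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    h = n // 2 + 1
    k = h * (h - 1) // 2  # 1-based rank of the order statistic
    diffs = sorted(abs(a - b) for a, b in combinations(x, 2))
    return consistency * diffs[k - 1]


def rolling_qn(series, window):
    """Recompute Qn from scratch on every window (no incremental update)."""
    return [
        qn_scale(series[start:start + window])
        for start in range(len(series) - window + 1)
    ]


clean = qn_scale([1, 2, 3, 4, 5])
contaminated = qn_scale([1, 2, 3, 4, 100])
```

In this small example the single large outlier leaves the estimate unchanged, which illustrates the robustness that the abstract refers to. The naive version sorts all pairwise differences per window, so an online update that avoids this recomputation is what makes real-time use on high-frequency series practical.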
Technical reports | 2008
Robin Nunkesser
RFreak is an R package providing a framework for evolutionary computation. By enwrapping the functionality of an evolutionary algorithm kit written in Java, it offers an easy way to do evolutionary computation in R. In addition, application examples where an evolutionary approach is promising in computational statistics are included and described in this paper. The package is thus further supporting the use of evolutionary computation in computational statistics.
international symposium on algorithms and computation | 2005
Robin Nunkesser; Philipp Woelfel
In this paper, the space requirements for the OBDD representation of certain graph classes, specifically cographs, several types of graphs with few P4s, unit interval graphs, interval graphs and bipartite graphs are investigated. Upper and lower bounds are proven for all these graph classes and it is shown that in most (but not all) cases a representation of the graphs by OBDDs is advantageous with respect to space requirements.
Computational Statistics & Data Analysis | 2007
Thorsten Bernholt; Robin Nunkesser; Karen Schettlinger
A common problem in linear regression is that largely aberrant values can strongly influence the results. The least quartile difference (LQD) regression estimator is highly robust, since it can resist up to almost 50% largely deviant data values without becoming extremely biased. Additionally, it shows good behavior on Gaussian data – in contrast to many other robust regression methods. However, the LQD is not widely used yet due to the high computational effort needed when using common algorithms, e.g. the subset algorithm of Rousseeuw and Leroy. For computing the LQD estimator for n data points in the plane, we propose a randomized algorithm with expected running time O(n^2 log^2 n) and an approximation algorithm with a running time of roughly O(n^2 log n). It can be expected that the practical relevance of the LQD estimator will strongly increase thereby.
Archive | 2008
Robin Nunkesser; Karen Schettlinger; Roland Fried
Reliable automatic methods are needed for statistical online monitoring of noisy time series. Application of a robust scale estimator allows the use of adaptive thresholds for the detection of outliers and level shifts. We propose a fast update algorithm for the Qn estimator and show by simulations that it leads to more powerful tests than other highly robust scale estimators.
genetic and evolutionary computation conference | 2008
Robin Nunkesser
In this paper, a Genetic Programming algorithm for genetic association studies is reconsidered. It is shown that the application field of the algorithm is not restricted to genetic association studies, but that the algorithm can also be applied to logic minimization problems. In the context of multi-valued logic minimization on incompletely specified truth tables, it outperforms existing algorithms. In addition, the facilities of the algorithm in the original application field are complemented by new results and experiments. This includes answers to the open questions of how to automatically choose the best individual in the last population and whether crossover is necessary for the algorithm.
Technical reports | 2008
Robin Nunkesser; Oliver Morell
A drawback of robust statistical techniques is the increased computational effort often needed compared to non-robust methods. Robust estimators possessing the exact fit property, for example, are NP-hard to compute. This means that, under the widely believed assumption that the computational complexity classes NP and P are not equal, there is no hope to compute exact solutions for large high-dimensional data sets. To tackle this problem, search heuristics are used to compute NP-hard estimators in high dimensions. Here, an evolutionary algorithm that is applicable to different robust estimators is presented. Further, variants of this evolutionary algorithm for selected estimators (most prominently least trimmed squares and least median of squares) are introduced and shown to outperform existing popular search heuristics in difficult data situations. The results increase the applicability of robust methods and underline the usefulness of evolutionary computation for computational statistics.