Publication


Featured research published by Thorsten Bernholt.


Bioinformatics | 2007

Detecting high-order interactions of single nucleotide polymorphisms using genetic programming

Robin Nunkesser; Thorsten Bernholt; Holger Schwender; Katja Ickstadt; Ingo Wegener

MOTIVATION: Not individual single nucleotide polymorphisms (SNPs), but high-order interactions of SNPs are assumed to be responsible for complex diseases such as cancer. Therefore, one of the major goals of genetic association studies concerned with such genotype data is the identification of these high-order interactions. This search is additionally impeded by the fact that these interactions often are only explanatory for a relatively small subgroup of patients. Most of the feature selection methods proposed in the literature, unfortunately, fail at this task, since they can either only identify individual variables or interactions of a low order, or try to find rules that are explanatory for a high percentage of the observations. In this article, we present a procedure based on genetic programming and multi-valued logic that enables the identification of high-order interactions of categorical variables such as SNPs. This method, called GPAS, can not only be used for feature selection, but can also be employed for discrimination. RESULTS: In an application to the genotype data from the GENICA study, an association study concerned with sporadic breast cancer, GPAS is able to identify high-order interactions of SNPs leading to a considerably increased breast cancer risk for different subsets of patients that are not found by other feature selection methods. As an application to a subset of the HapMap data shows, GPAS is not restricted to association studies comprising several tens of SNPs, but can also be employed to analyze whole-genome data. AVAILABILITY: Software can be downloaded from http://ls2-www.cs.uni-dortmund.de/~nunkesser/#Software
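At its core, the kind of expression GPAS searches for is a logic rule over genotype values (coded, say, 0/1/2 for the number of variant alleles) that fires for a subgroup of cases. The sketch below is only an illustration of such a rule and of a simple case/control score, not the GPAS implementation; the rule encoding and the score_rule fitness are assumptions made for this example.

```python
import numpy as np

# Genotypes coded as 0/1/2 (number of variant alleles); rows = patients, columns = SNPs.
# A rule is a conjunction of literals (snp_index, allowed_genotypes), e.g. the
# hypothetical third-order interaction "SNP0 in {1,2} AND SNP3 == 2 AND SNP7 in {0,1}".
Rule = list[tuple[int, set[int]]]

def evaluate_rule(genotypes: np.ndarray, rule: Rule) -> np.ndarray:
    """Return a boolean vector: does the rule fire for each patient?"""
    hits = np.ones(genotypes.shape[0], dtype=bool)
    for snp, allowed in rule:
        hits &= np.isin(genotypes[:, snp], list(allowed))
    return hits

def score_rule(genotypes: np.ndarray, status: np.ndarray, rule: Rule) -> float:
    """Score a rule by how much more often it fires in cases than in controls.
    This simple difference of proportions stands in for a real fitness function."""
    fires = evaluate_rule(genotypes, rule)
    cases, controls = status == 1, status == 0
    return fires[cases].mean() - fires[controls].mean()

# Toy usage with random data (for illustration only).
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(200, 10))
y = rng.integers(0, 2, size=200)
rule = [(0, {1, 2}), (3, {2}), (7, {0, 1})]
print(score_rule(G, y, rule))
```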


Statistics and Computing | 2006

Modified repeated median filters

Thorsten Bernholt; Roland Fried; Ursula Gather; Ingo Wegener

We discuss moving window techniques for fast extraction of a signal composed of monotonic trends and abrupt shifts from a noisy time series with irrelevant spikes. Running medians remove spikes and preserve shifts, but they deteriorate in trend periods. Modified trimmed mean filters use a robust scale estimate such as the median absolute deviation about the median (MAD) to select an adaptive amount of trimming. Application of robust regression, particularly of the repeated median, has been suggested for improving upon the median in trend periods. We combine these ideas and construct modified filters based on the repeated median offering better shift preservation. All these filters are compared w.r.t. fundamental analytical properties and in basic data situations. An algorithm for the update of the MAD running in time O(log n) for window width n is presented as well.
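To make the regression idea behind these filters concrete, the following sketch applies a plain repeated median fit to each moving window and reads off the fitted level at the window centre. It covers only the basic repeated median step, not the modified trimming or the O(log n) MAD update from the paper, and uses a straightforward O(n^2)-per-window implementation.

```python
import numpy as np

def repeated_median_level(x: np.ndarray) -> float:
    """Repeated median regression on one window; returns the fitted value
    at the window centre (time indices 0..n-1, centre at (n-1)/2)."""
    n = len(x)
    t = np.arange(n)
    slopes = []
    for i in range(n):
        pairwise = [(x[j] - x[i]) / (t[j] - t[i]) for j in range(n) if j != i]
        slopes.append(np.median(pairwise))
    beta = np.median(slopes)                    # repeated median slope
    mu = np.median(x - beta * t)                # repeated median intercept
    return mu + beta * (n - 1) / 2.0            # level at the window centre

def rm_filter(series: np.ndarray, width: int) -> np.ndarray:
    """Apply the repeated median level estimate to each full moving window."""
    out = np.full(len(series), np.nan)
    half = width // 2
    for c in range(half, len(series) - half):
        out[c] = repeated_median_level(series[c - half : c + half + 1])
    return out

# Toy usage: a trend with a level shift and one irrelevant spike.
y = np.concatenate([np.linspace(0, 5, 50), np.linspace(10, 12, 50)])
y[30] += 20
print(rm_filter(y, 11)[25:35])
```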


Technical reports | 2006

Robust Estimators are Hard to Compute

Thorsten Bernholt

In modern statistics, the robust estimation of parameters of a regression hyperplane is a central problem. Robustness means that the estimation is not or only slightly affected by outliers in the data. In this paper, it is shown that the following robust estimators are hard to compute: LMS, LQS, LTS, LTA, MCD, MVE, Constrained M estimator, Projection Depth (PD) and Stahel-Donoho. In addition, a data set is presented such that the ltsReg procedure of R has probability less than 0.0001 of finding a correct answer. Furthermore, it is described how to design new robust estimators.


international conference on computational science and its applications | 2005

Computing the least median of squares estimator in time O(n^d)

Thorsten Bernholt

In modern statistics, the robust estimation of parameters of a regression hyperplane is a central problem, i.e., an estimation that is not or only slightly affected by outliers in the data. In this paper we consider the least median of squares (LMS) estimator. For n points in d dimensions we describe a randomized algorithm for LMS running in O(n^d) time and O(n) space, for d fixed, and in O(d^3 (2n)^d) time and O(dn) space, for arbitrary d.
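As a point of reference for the objective being optimized, a brute-force least median of squares fit in the plane can be sketched by restricting candidates to lines through two data points and picking the one with the smallest median squared residual. This is the usual subset-type approximation, roughly O(n^3) here, and not the randomized O(n^d) algorithm of the paper.

```python
import itertools
import numpy as np

def lms_line_approx(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Approximate least median of squares line in the plane.

    Candidates are restricted to lines through two data points, as in
    subset-type algorithms; the exact LMS optimum need not be of this form.
    Returns (slope, intercept) minimising the median squared residual."""
    best, best_line = np.inf, (0.0, 0.0)
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue                              # skip vertical candidate lines
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        crit = np.median((y - slope * x - intercept) ** 2)
        if crit < best:
            best, best_line = crit, (slope, intercept)
    return best_line

# Toy usage: a clean line y = 2x + 1 with 30% gross outliers.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, 50)
y[:15] += 50.0
print(lms_line_approx(x, y))                      # close to (2.0, 1.0)
```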


symposium on computational geometry | 2007

A geometric framework for solving subsequence problems in computational biology efficiently

Thorsten Bernholt; Friedrich Eisenbrand; Thomas Hofmeister

In this paper, we introduce the notion of a constrained Minkowski sum, which for two (finite) point-sets P, Q ⊆ ℝ² and a set of k inequalities Ax ≥ b is defined as the point-set (P ⊕ Q)_{Ax≥b} = {x = p+q ∣ p ∈ P, q ∈ Q, Ax ≥ b}. We show that typical subsequence problems from computational biology can be solved by computing a set containing the vertices of the convex hull of an appropriately constrained Minkowski sum. We provide an algorithm for computing such a set with running time O(N log N), where N = |P| + |Q|, if k is fixed. For the special case (P ⊕ Q)_{x_1≥β}, where P and Q consist of points with integer x_1-coordinates whose absolute values are bounded by O(N), we even achieve a linear running time O(N). We thereby obtain a linear running time for many subsequence problems from the literature and improve upon the best known running times for some of them. The main advantage of the presented approach is that it provides a general framework within which a broad variety of subsequence problems can be modeled and solved. This includes objective functions and constraints which are even more complex than the ones considered before.
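The definition can be illustrated with a brute-force computation: enumerate all feasible sums p + q and take their convex hull with a standard monotone-chain routine. This runs in O(|P||Q| log(|P||Q|)) time rather than the O(N log N) of the paper, and the function names are illustrative only.

```python
import itertools

def constrained_minkowski_sum(P, Q, A, b):
    """All points p + q whose coordinates satisfy every inequality
    a0*x + a1*y >= b_i, found by brute force in O(|P||Q|*k) time."""
    sums = []
    for (px, py), (qx, qy) in itertools.product(P, Q):
        x, y = px + qx, py + qy
        if all(a0 * x + a1 * y >= bi for (a0, a1), bi in zip(A, b)):
            sums.append((x, y))
    return sums

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# Toy usage: one constraint x1 >= 3, i.e. A = [(1, 0)], b = [3].
P = [(0, 0), (1, 2), (2, 1)]
Q = [(1, 1), (3, 0), (2, 3)]
print(convex_hull(constrained_minkowski_sum(P, Q, [(1.0, 0.0)], [3.0])))
```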


symposium on computational geometry | 2009

Constrained Minkowski Sums: A Geometric Framework for Solving Interval Problems in Computational Biology Efficiently

Thorsten Bernholt; Friedrich Eisenbrand; Thomas Hofmeister

In this paper, we introduce the notion of a constrained Minkowski sum: for two (finite) point-sets P, Q ⊆ ℝ² and a set of k inequalities Ax ≥ b, it is defined as the point-set (P ⊕ Q)_{Ax≥b} = {x = p+q ∣ p ∈ P, q ∈ Q, Ax ≥ b}. We show that typical interval problems from computational biology can be solved by computing a set containing the vertices of the convex hull of an appropriately constrained Minkowski sum. We provide an algorithm for computing such a set with running time O(N log N), where N = |P| + |Q|, if k is fixed. For the special case (P ⊕ Q)_{x_1≥β}, where P and Q consist of points with integer x_1-coordinates whose absolute values are bounded by O(N), we even achieve a linear running time O(N). We thereby obtain a linear running time for many interval problems from the literature and improve upon the best known running times for some of them. The main advantage of the presented approach is that it provides a general framework within which a broad variety of interval problems can be modeled and solved.


Computational Statistics & Data Analysis | 2007

Computing the least quartile difference estimator in the plane

Thorsten Bernholt; Robin Nunkesser; Karen Schettlinger

A common problem in linear regression is that largely aberrant values can strongly influence the results. The least quartile difference (LQD) regression estimator is highly robust, since it can resist up to almost 50% largely deviant data values without becoming extremely biased. Additionally, it shows good behavior on Gaussian data, in contrast to many other robust regression methods. However, the LQD is not widely used yet due to the high computational effort needed when using common algorithms, e.g. the subset algorithm of Rousseeuw and Leroy. For computing the LQD estimator for n data points in the plane, we propose a randomized algorithm with expected running time O(n^2 log^2 n) and an approximation algorithm with a running time of roughly O(n^2 log n). It can be expected that the practical relevance of the LQD estimator will strongly increase thereby.
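The LQD criterion can be stated directly: minimize an order statistic of the pairwise absolute differences of residuals, which makes the objective independent of the intercept. The sketch below evaluates this criterion over candidate slopes taken from pairs of data points; both that candidate set and the choice h ≈ (n+p+1)/2 are simplifying assumptions made for illustration, not the paper's randomized or approximation algorithms.

```python
import itertools
import numpy as np

def lqd_objective(residuals: np.ndarray, h: int) -> float:
    """Order statistic number C(h, 2) of the pairwise absolute residual
    differences (intercepts cancel in these differences, so the criterion
    depends on the slope only)."""
    diffs = [abs(a - b) for a, b in itertools.combinations(residuals, 2)]
    k = h * (h - 1) // 2
    return sorted(diffs)[k - 1]

def lqd_slope_bruteforce(x: np.ndarray, y: np.ndarray) -> float:
    """Brute-force LQD slope in the plane: candidate slopes are taken from
    pairs of data points, a heuristic candidate set rather than the exact
    algorithm of the paper (roughly O(n^4 log n) overall)."""
    n = len(x)
    h = (n + 3) // 2                       # assumed ~(n + p + 1)/2 for p = 2 parameters
    best, best_slope = np.inf, 0.0
    for i, j in itertools.combinations(range(n), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        crit = lqd_objective(y - slope * x, h)
        if crit < best:
            best, best_slope = crit, slope
    return best_slope

# Toy usage: 40% gross outliers, true slope 1.5.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 30)
y = 1.5 * x + rng.normal(0, 0.05, 30)
y[:12] -= 40.0
print(lqd_slope_bruteforce(x, y))          # close to 1.5
```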


latin american symposium on theoretical informatics | 2006

An algorithm for a generalized maximum subsequence problem

Thorsten Bernholt; Thomas Hofmeister

We consider a generalization of the maximum subsequence problem. Given an array a_1, ..., a_n of real numbers, the generalized problem consists in finding an interval [i, j] such that the length and the sum of the subsequence a_i, ..., a_j maximize a given quasiconvex function f. Problems of this type occur, e.g., in bioinformatics. We show that the generalized problem can be solved in time O(n log n). As an example, we show how the so-called multiresolution criteria problem can be solved in time O(n log n).
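A naive quadratic scan makes the problem statement concrete: evaluate f on the length and sum of every interval and keep the best. The O(n log n) method of the paper is not reproduced here; the objective used in the demo (the plain sum) is just the classic special case.

```python
import numpy as np

def generalized_max_subsequence(a, f):
    """Naive O(n^2) search over all intervals [i, j]: maximise f(length, sum).
    The paper achieves O(n log n) for quasiconvex f; this quadratic scan is
    only meant to make the problem statement concrete."""
    n = len(a)
    best_val, best_interval = -np.inf, (0, 0)
    for i in range(n):
        running = 0.0
        for j in range(i, n):
            running += a[j]
            val = f(j - i + 1, running)
            if val > best_val:
                best_val, best_interval = val, (i, j)
    return best_interval, best_val

# Toy usage: with f(length, total) = total we recover the classic maximum
# subsequence problem; length-normalised scores from bioinformatics plug into
# the same (length, sum) interface.
data = [2.0, -1.0, 4.0, 4.0, -3.0, 5.0, -6.0, 1.0]
print(generalized_max_subsequence(data, lambda length, total: total))
```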


Informatik Spektrum | 2002

Komplexitätstheorie, effiziente Algorithmen und die Bundesliga (Complexity theory, efficient algorithms, and the Bundesliga)

Thorsten Bernholt; Alexander Gülich; Thomas Hofmeister; Niels Schmitt; Ingo Wegener



mathematical foundations of computer science | 1999

Football Elimination Is Hard to Decide Under the 3-Point-Rule

Thorsten Bernholt; Alexander Gülich; Thomas Hofmeister; Niels Schmitt


Collaboration


Dive into Thorsten Bernholt's collaboration.

Top Co-Authors

Robin Nunkesser (Technical University of Dortmund)
Thomas Hofmeister (Technical University of Dortmund)
Roland Fried (Technical University of Dortmund)
Ingo Wegener (Technical University of Dortmund)
Karen Schettlinger (Technical University of Dortmund)
Alexander Gülich (Technical University of Dortmund)
Niels Schmitt (Technical University of Dortmund)
Ursula Gather (Technical University of Dortmund)
Friedrich Eisenbrand (École Polytechnique Fédérale de Lausanne)