Nico Görnitz | Researchain

Publication

Featured research published by Nico Görnitz.

Journal of Artificial Intelligence Research | 2013

Toward supervised anomaly detection

Nico Görnitz; Marius Kloft; Konrad Rieck; Ulf Brefeld

Anomaly detection is commonly regarded as an unsupervised learning task, since anomalies stem from adversarial or unlikely events with unknown distributions. However, the predictive performance of purely unsupervised anomaly detection often fails to match the detection rates required in many tasks, and labeled data are needed to guide model generation. Our first contribution shows that classical semi-supervised approaches, which originate from supervised classifiers, are inappropriate and hardly detect new and unknown anomalies. We argue that semi-supervised anomaly detection needs to be grounded in the unsupervised learning paradigm, and we devise a novel algorithm that meets this requirement. Although the resulting optimization problem is intrinsically non-convex, we further show that it has a convex equivalent under relatively mild assumptions. Additionally, we propose an active learning strategy to automatically filter candidates for labeling. In an empirical study on network intrusion detection data, we observe that the proposed learning methodology requires much less labeled data than the state of the art while achieving higher detection accuracies.
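To make the idea of filtering candidates for labeling concrete, here is a minimal sketch of the simplest margin-based query rule: among the unlabeled points, ask an expert about those whose anomaly scores lie closest to the decision boundary (score 0), i.e. the least confident predictions. This is an assumption for illustration; the abstract does not spell out the paper's criterion, which may combine further terms. The function name `select_queries` and the sign convention (positive score means anomalous) are hypothetical.

```python
import numpy as np

def select_queries(scores, labeled, k):
    """Hypothetical margin-based query rule (not necessarily the paper's strategy).

    scores:  anomaly scores, positive = predicted anomalous, 0 = decision boundary
    labeled: boolean mask, True for points that already carry a label
    k:       number of points to send to the expert
    Returns the indices of the k unlabeled points with the smallest |score|.
    """
    scores = np.asarray(scores, dtype=float)
    candidates = np.flatnonzero(~np.asarray(labeled, dtype=bool))
    # A stable sort keeps the original order among points with equal confidence.
    order = np.argsort(np.abs(scores[candidates]), kind="stable")
    return candidates[order[:k]]
```

For example, with scores `[3.0, -0.1, 0.05, -2.0, 0.2]` and only index 2 already labeled, the two queries are indices 1 and 4.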

European Conference on Machine Learning | 2009

Active and semi-supervised data domain description

Nico Görnitz; Marius Kloft; Ulf Brefeld

Data domain description techniques aim at deriving concise descriptions of objects belonging to a category of interest. For instance, the support vector domain description (SVDD) learns a hypersphere enclosing the bulk of the provided unlabeled data such that points lying outside the ball are considered anomalous. However, relevant information such as expert and background knowledge remains unused in the unsupervised setting. In this paper, we rephrase data domain description as a semi-supervised learning task; that is, we propose a semi-supervised generalization of data domain description (SSSVDD) that processes both unlabeled and labeled examples. The corresponding optimization problem is non-convex. We translate it into an unconstrained, continuous problem that can be optimized accurately by gradient-based techniques. Furthermore, we devise an effective active learning strategy to query low-confidence observations. Our empirical evaluation on network intrusion detection and object recognition tasks shows that our SSSVDDs consistently outperform baseline methods in relevant learning settings.
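The hypersphere picture can be illustrated with a minimal sketch of the plain, unsupervised SVDD using a linear kernel (no labeled term, so this is not the SSSVDD itself). Writing r = R² for the squared radius and c for the center, the soft-margin problem can be stated without constraints as minimizing r + C Σᵢ max(0, ‖xᵢ − c‖² − r), which the sketch optimizes by subgradient descent. The choice C = 1/(νn), the step-size schedule, and the function names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def train_svdd(X, nu=0.1, n_iter=2000, lr0=0.5):
    """Illustrative linear-kernel SVDD via subgradient descent on the unconstrained primal
        min_{c, r >= 0}  r + C * sum_i max(0, ||x_i - c||^2 - r),   r = R^2,  C = 1 / (nu * n).
    Roughly a fraction nu of the training points ends up outside the ball."""
    n = X.shape[0]
    C = 1.0 / (nu * n)
    c = X.mean(axis=0)   # start the center at the sample mean
    r = 0.0              # squared radius R^2
    for t in range(n_iter):
        d2 = np.sum((X - c) ** 2, axis=1)          # squared distances to the center
        active = d2 > r                            # points currently outside the ball
        grad_r = 1.0 - C * active.sum()
        grad_c = -2.0 * C * (X[active] - c).sum(axis=0)
        lr = lr0 / np.sqrt(t + 1.0)                # diminishing step size
        r = max(0.0, r - lr * grad_r)              # keep the squared radius non-negative
        c = c - lr * grad_c
    return c, r

def anomaly_score(X, c, r):
    """Squared distance to the center minus R^2; positive means outside the ball (anomalous)."""
    return np.sum((np.atleast_2d(X) - c) ** 2, axis=1) - r
```

The semi-supervised variant described in the abstract would add loss terms for labeled normal and anomalous points; those terms are not sketched here.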

Bioinformatics | 2014

Oqtans: The RNA-seq Workbench in the Cloud for Complete and Reproducible Quantitative Transcriptome Analysis

Vipin T. Sreedharan; Sebastian J. Schultheiss; Géraldine Jean; André Kahles; Regina Bohnert; Philipp Drewe; Pramod Kaushik Mudrakarta; Nico Görnitz; Georg Zeller; Gunnar Rätsch

We present Oqtans, an open-source workbench for quantitative transcriptome analysis that is integrated into Galaxy. Its distinguishing features include customizable computational workflows and a modular pipeline architecture that facilitates comparative assessment of tool and data quality. Oqtans integrates into Galaxy an assortment of machine-learning-powered tools that perform on par with or better than state-of-the-art tools. The implemented tools form a complete transcriptome analysis workflow: short-read alignment, transcript identification/quantification and differential expression analysis. Oqtans and Galaxy facilitate persistent storage, data exchange and documentation of intermediate results and analysis workflows. We illustrate with easy-to-understand use cases how Oqtans aids the interpretation of data from different experiments. Users can easily create their own workflows and extend Oqtans by integrating specific tools. Oqtans is available as (i) a cloud machine image with a demo instance at cloud.oqtans.org, (ii) a public Galaxy instance at galaxy.cbio.mskcc.org, (iii) a git repository containing all installed software (oqtans.org/git), most of which is also available from (iv) the Galaxy Toolshed, and (v) a share string to use along with Galaxy CloudMan. Supplementary information: Supplementary data are available at Bioinformatics online.

IEEE Transactions on Neural Networks | 2014

Efficient Algorithms for Exact Inference in Sequence Labeling SVMs

Alexander Bauer; Nico Görnitz; Franziska Biegler; Klaus-Robert Müller; Marius Kloft

The task of structured output prediction deals with learning general functional dependencies between arbitrary input and output spaces. In this context, two loss-sensitive formulations for maximum-margin training have been proposed in the literature, referred to as margin rescaling and slack rescaling, respectively. The latter is believed to be more accurate and easier to handle. Nevertheless, it is not popular due to the lack of known efficient inference algorithms; therefore, margin rescaling, which requires a similar type of inference as normal structured prediction, is the most often used approach. Focusing on the task of label sequence learning, we define a general framework that can handle a large class of inference problems based on Hamming-like loss functions and the concept of decomposability for the underlying joint feature map. In particular, we present an efficient generic algorithm that can handle both rescaling approaches and is guaranteed to find an optimal solution in polynomial time.
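Why margin rescaling is the easy case can be seen in a minimal sketch. Its loss-augmented inference problem, argmax over y of score(y) + Δ(y, y*), decomposes over positions when Δ is the Hamming loss, so the loss can be folded into per-position scores and solved by ordinary Viterbi decoding. Under slack rescaling the loss multiplies the score difference, which breaks this decomposition; that harder case is what the paper's algorithm targets and it is not implemented here. The first-order linear-chain model with emission table `emit` and transition table `trans` is an assumption for illustration.

```python
import numpy as np

def loss_augmented_viterbi(emit, trans, y_true):
    """Margin-rescaling inference: argmax_y score(y) + Hamming(y, y_true) for a linear chain.

    emit:   (T, K) score of label k at position t
    trans:  (K, K) score of the transition prev -> cur, indexed trans[prev, cur]
    y_true: length-T sequence of true labels
    Returns (best label sequence, its loss-augmented score).
    """
    T, K = emit.shape
    aug = emit + 1.0                          # every label earns +1 Hamming loss...
    aug[np.arange(T), y_true] -= 1.0          # ...except the true label at each position
    best = aug[0].copy()                      # best prefix score ending in each label
    back = np.zeros((T, K), dtype=int)        # back-pointers to the best previous label
    for t in range(1, T):
        cand = best[:, None] + trans          # cand[prev, cur]
        back[t] = cand.argmax(axis=0)
        best = cand.max(axis=0) + aug[t]
    y = [int(best.argmax())]                  # label at the last position
    for t in range(T - 1, 0, -1):             # follow back-pointers to position 0
        y.append(int(back[t, y[-1]]))
    return y[::-1], float(best.max())
```

The run time is O(T·K²), the same as plain Viterbi, which illustrates the remark that margin rescaling needs the same type of inference as normal structured prediction.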

European Conference on Machine Learning | 2012

Efficient training of graph-regularized multitask SVMs

Christian Widmer; Marius Kloft; Nico Görnitz; Gunnar Rätsch

We present an optimization framework for graph-regularized multi-task SVMs based on the primal formulation of the problem. Previous approaches employ a so-called multi-task kernel (MTK) and are thus inapplicable when the number of training examples n is large (typically limited to n < 20,000, even for just a few tasks). In this paper, we present a primal optimization criterion, allowing for general loss functions, and derive its dual representation. Building on the work of Hsieh et al. [1,2], we derive an algorithm for optimizing the large-margin objective and prove its convergence. Our computational experiments show a speedup of up to three orders of magnitude over LibSVM and SVMLight on several standard benchmarks as well as on challenging data sets from the application domain of computational biology. Combining our optimization methodology with the COFFIN large-scale learning framework [3], we are able to train a multi-task SVM using over 1,000,000 training points stemming from 4 different tasks. An efficient C++ implementation of our algorithm is being made publicly available as part of the SHOGUN machine learning toolbox [4].
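A minimal sketch can show what graph regularization in the primal looks like. It assumes one weight vector per task (rows of W), a symmetric task-similarity matrix A, and a regularizer λ/2 Σ₍ₛ,ₜ₎ Aₛₜ‖wₛ − wₜ‖², which equals λ·tr(Wᵀ L W) with the graph Laplacian L = D − A. The exact regularizer form is an assumption, and the sketch swaps in squared hinge loss and plain gradient descent for simplicity; the paper instead derives a dual coordinate descent algorithm following Hsieh et al. All names are hypothetical.

```python
import numpy as np

def graph_laplacian(A):
    """L = D - A for a symmetric, non-negative task-similarity matrix A."""
    return np.diag(A.sum(axis=1)) - A

def objective(W, tasks, A, lam=1.0, C=1.0):
    """Assumed primal: 1/2 sum_t ||w_t||^2 + lam/2 sum_{s,t} A_st ||w_s - w_t||^2
                       + C sum_t sum_i max(0, 1 - y_ti <w_t, x_ti>)^2.
    W has one row per task; tasks is a list of (X_t, y_t) with labels in {-1, +1}."""
    L = graph_laplacian(A)
    # The pairwise sum over ordered task pairs equals 2 tr(W^T L W).
    reg = 0.5 * np.sum(W ** 2) + lam * np.trace(W.T @ L @ W)
    loss = sum(np.sum(np.maximum(0.0, 1.0 - y * (X @ w)) ** 2) for w, (X, y) in zip(W, tasks))
    return reg + C * loss

def train(tasks, A, lam=1.0, C=1.0, lr=1e-3, n_iter=500):
    """Plain gradient descent on the objective above (illustration only)."""
    W = np.zeros((len(tasks), tasks[0][0].shape[1]))
    L = graph_laplacian(A)
    for _ in range(n_iter):
        G = W + 2.0 * lam * (L @ W)                   # gradient of both regularizers
        for t, (X, y) in enumerate(tasks):
            m = np.maximum(0.0, 1.0 - y * (X @ W[t])) # squared-hinge slack per example
            G[t] -= 2.0 * C * (X.T @ (y * m))         # gradient of the task's loss term
        W -= lr * G
    return W
```

A larger λ pulls the weight vectors of strongly connected tasks toward each other; with A = 0 the tasks decouple into independent SVMs.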

BMC Bioinformatics | 2011

Oqtans: a Galaxy-integrated workflow for quantitative transcriptome analysis from NGS Data

Sebastian J. Schultheiss; Géraldine Jean; Jonas Behr; Regina Bohnert; Philipp Drewe; Nico Görnitz; André Kahles; Pramod Mudrakarta; Vipin T. Sreedharan; Georg Zeller; Gunnar Rätsch

First published by BioMed Central: Schultheiss, Sebastian J.; Jean, Géraldine; Behr, Jonas; Bohnert, Regina; Drewe, Philipp; Görnitz, Nico; Kahles, André; Mudrakarta, Pramod; Sreedharan, Vipin T.; Zeller, Georg; Rätsch, Gunnar: Oqtans: a Galaxy-integrated workflow for quantitative transcriptome analysis from NGS Data - In: BMC Bioinformatics. - ISSN 1471-2105 (online). - 12 (2011), suppl. 11, art. A7. - doi:10.1186/1471-2105-12-S11-A7.

International Conference on Computational Linguistics | 2014

An Off-the-shelf Approach to Authorship Attribution

Jamal Abdul Nasir; Nico Görnitz; Ulf Brefeld

Authorship detection is a challenging task due to the many design choices the user has to make. Performance depends strongly on the right set of features, the amount of data, in-sample vs. out-of-sample settings, and profile- vs. instance-based approaches. So far, the variety of combinations renders off-the-shelf methods for authorship detection inappropriate. We propose a novel and generally deployable method that does not share these limitations. We treat authorship attribution as an anomaly detection problem in which author regions are learned in feature space. The right feature space for a given task is identified automatically by representing the optimal solution as a linear mixture of multiple kernel functions (multiple kernel learning, MKL). Our approach can include labelled as well as unlabelled examples to remedy the in-sample and out-of-sample problems. Empirically, we observe that our proposed technique performs either better than or on par with baseline competitors. Moreover, our method relieves the user of critical design choices (e.g., the feature set) and can therefore be used as an off-the-shelf method for authorship attribution.
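The "linear mixture of multiple kernel functions" can be illustrated with a minimal sketch. Each kernel corresponds to one candidate feature space, and the combined kernel is K = Σₘ βₘ Kₘ with non-negative weights, which keeps K a valid positive semi-definite kernel. In MKL the weights β are learned jointly with the detector; here they are fixed by hand, and the choice of an RBF and a linear kernel, as well as all function names, are assumptions for illustration.

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gaussian (RBF) Gram matrix: exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # clip rounding noise
    return np.exp(-gamma * d2)

def linear_gram(X):
    """Linear Gram matrix: <x_i, x_j>."""
    return X @ X.T

def combine_kernels(grams, beta):
    """K = sum_m beta_m K_m; a non-negative combination of PSD kernels is again PSD."""
    beta = np.asarray(beta, dtype=float)
    if np.any(beta < 0):
        raise ValueError("kernel weights must be non-negative")
    return sum(b * K for b, K in zip(beta, grams))
```

In an authorship setting, each Gram matrix would typically be computed from a different feature set of the same documents, so learning β amounts to weighting feature sets automatically.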

European Conference on Machine Learning | 2015

Opening the Black Box: revealing interpretable sequence motifs in kernel-based learning algorithms

Marina M C Vidovic; Nico Görnitz; Klaus-Robert Müller; Gunnar Rätsch; Marius Kloft

This work is in the context of kernel-based learning algorithms for sequence data. We present a probabilistic approach to automatically extract, from the output of such string-kernel-based learning algorithms, the subsequences (or motifs) truly underlying the machine's predictions. The proposed framework views motifs as free parameters in a probabilistic model, which is solved through a global optimization approach. In contrast to prevalent approaches, the proposed method can discover even difficult, long motifs and could be combined with any kernel-based learning algorithm that is based on an adequate sequence kernel. We show that, by using a discriminative kernel machine such as a support vector machine, the approach can reveal discriminative motifs underlying the kernel predictor. We demonstrate the efficacy of our approach through a series of experiments on synthetic and real data, including problems from handwritten digit recognition and a large-scale human splice site data set from the domain of computational biology.
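For readers unfamiliar with string kernels, a minimal sketch of one standard example, the k-mer spectrum kernel, shows how such a kernel compares sequences: it counts every length-k substring in each sequence and takes the inner product of the two count vectors. Whether this is the kernel used in the paper's experiments is not stated here, and the function names are hypothetical. A linear machine in this feature space assigns one weight per k-mer, which is why short, fixed-length motifs are easy to read off, whereas the abstract's point is recovering longer motifs.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(s, t, k=3):
    """k-mer spectrum kernel: sum over shared k-mers m of count_s(m) * count_t(m)."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    return sum(n * ct[m] for m, n in cs.items() if m in ct)
```

For instance, with k = 2, "ACGT" compared with itself shares the 2-mers AC, CG and GT once each, giving a kernel value of 3.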

NeuroImage | 2015

Extracting latent brain states: Towards true labels in cognitive neuroscience experiments

Anne K. Porbadnigk; Nico Görnitz; Claudia Sannelli; Alexander Binder; Mikio L. Braun; Marius Kloft; Klaus-Robert Müller

Neuroscientific data are typically analyzed based on the behavioral response of the participant. However, the errors made may or may not be in line with the neural processing. In particular, in experiments with time pressure or studies where the threshold of perception is measured, the error distribution deviates from uniformity due to the structure of the underlying experimental set-up. When we base our analysis on the behavioral labels, as is usually done, we ignore this problem of systematic and structured (non-uniform) label noise and are likely to arrive at wrong conclusions in our data analysis. This paper contributes a remedy for this important scenario: we present a novel approach for a) measuring label noise and b) removing structured label noise. We demonstrate its usefulness for EEG data analysis using a standard d2 test for visual attention (N = 20 participants).

Journal of Computing Science and Engineering | 2013

Decoding Brain States during Auditory Perception by Supervising Unsupervised Learning

Anne K. Porbadnigk; Nico Görnitz; Marius Kloft; Klaus-Robert Müller

Recent years have seen a rise in interest in using electroencephalography (EEG)-based brain-computer interfacing methodology to investigate non-medical questions, beyond the purposes of communication and control. One of these novel applications is to examine how signal quality is processed neurally, which is of particular interest for industry, besides providing neuroscientific insights. As in most behavioral experiments in the neurosciences, the subject is required to assess a given stimulus. Based on an EEG study on the speech quality of phonemes, we first discuss the information contained in the neural correlate of this judgement. Typically, this is done by analyzing the data along behavioral responses/labels. However, participants in such complex experiments often guess at the threshold of perception. This leads to labels that are only partly correct and often random, which is a problematic scenario for supervised learning. Therefore, we propose a novel supervised-unsupervised learning scheme that aims to differentiate true labels from random ones in a data-driven way. We show that this approach provides a crisper view of the brain states that experimenters are looking for, besides discovering additional brain states to which the classical analysis is blind.
