Michael Wohlmayr | Researchain

Publication

Featured research published by Michael Wohlmayr.

IEEE Transactions on Audio, Speech, and Language Processing | 2011

A Probabilistic Interaction Model for Multipitch Tracking With Factorial Hidden Markov Models

Michael Wohlmayr; Michael Stark; Franz Pernkopf

We present a simple and efficient feature modeling approach for tracking the pitch of two simultaneously active speakers. We model the spectrogram features of single speakers using Gaussian mixture models in combination with the minimum description length model selection criterion. To obtain a probabilistic representation for the speech mixture spectrogram features of both speakers, we employ the mixture maximization model (MIXMAX) and, as an alternative, a linear interaction model. A factorial hidden Markov model is applied for tracking pitch over time. This statistical model can be used for applications beyond speech, whenever the interaction between individual sources can be represented as a MIXMAX or linear model. For tracking, we use the loopy max-sum algorithm and provide empirical comparisons to exact methods. Furthermore, we discuss a scheduling mechanism of loopy belief propagation for online tracking. We demonstrate experimental results using Mocha-TIMIT as well as data from the speech separation challenge provided by Cooke et al. We show the excellent performance of the proposed method in comparison to a well-known multipitch tracking algorithm based on correlogram features. Using speaker-dependent models, the proposed method improves the accuracy of correct speaker assignment, which is important for single-channel speech separation. In particular, we are able to reduce the overall tracking error by 51% relative for the speaker-dependent case. Moreover, we use the estimated pitch trajectories to perform single-channel source separation, and demonstrate the beneficial effect of correct speaker assignment on speech separation performance.
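As a rough illustration of the MIXMAX idea mentioned in this abstract: the log-magnitude spectrum of a mixture is approximated by the element-wise maximum of the individual log-spectra. The sketch below assumes a single Gaussian per speaker and frequency bin (the paper uses Gaussian mixture models), and all function names are hypothetical. It is not the authors' implementation.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of a univariate Gaussian at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(y, mu, sigma):
    """Cumulative distribution of a univariate Gaussian at y."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))


def mixmax_density(y, mu1, sigma1, mu2, sigma2):
    """Density of y = max(x1, x2) for independent Gaussian x1 and x2.

    Assumption: one Gaussian per source and frequency bin.
    """
    return (normal_pdf(y, mu1, sigma1) * normal_cdf(y, mu2, sigma2)
            + normal_pdf(y, mu2, sigma2) * normal_cdf(y, mu1, sigma1))


def log_sum_vs_max(a, b):
    """Return (log(exp(a) + exp(b)), max(a, b)) to compare the two."""
    m = max(a, b)
    exact = m + math.log(math.exp(a - m) + math.exp(b - m))
    return exact, m
```

When one source dominates a bin, `log_sum_vs_max` shows that the exact log of the summed powers and the maximum differ only slightly, which is the motivation for the approximation.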

IEEE Transactions on Audio, Speech, and Language Processing | 2011

Source–Filter-Based Single-Channel Speech Separation Using Pitch Information

Michael Stark; Michael Wohlmayr; Franz Pernkopf

In this paper, we investigate the source-filter-based approach for single-channel speech separation. We incorporate source-driven aspects by multi-pitch estimation in the model-driven method. For multi-pitch estimation, the factorial HMM is utilized. For modeling the vocal tract filters, either vector quantization (VQ) or non-negative matrix factorization is considered. For both methods, the final combination of the source and filter model results in an utterance-dependent model that finally enables speaker-independent source separation. The contributions of the paper are the multi-pitch tracker, the gain estimation for the VQ-based method, which accounts for different mixing levels, and a fast approximation for the likelihood computation. Additionally, a linear relationship between pitch tracking performance and speech separation performance is shown.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2012

Maximum Margin Bayesian Network Classifiers

Franz Pernkopf; Michael Wohlmayr; Sebastian Tschiatschek

We present a maximum margin parameter learning algorithm for Bayesian network classifiers using a conjugate gradient (CG) method for optimization. In contrast to previous approaches, we maintain the normalization constraints on the parameters of the Bayesian network during optimization, i.e., the probabilistic interpretation of the model is not lost. This enables us to handle missing features in discriminatively optimized Bayesian networks. In experiments, we compare the classification performance of maximum margin parameter learning to conditional likelihood and maximum likelihood learning approaches. Discriminative parameter learning significantly outperforms generative maximum likelihood estimation for naive Bayes and tree augmented naive Bayes structures on all considered data sets. Furthermore, maximizing the margin dominates the conditional likelihood approach in terms of classification performance in most cases. We provide results for a recently proposed maximum margin optimization approach based on convex relaxation [1]. While the classification results are highly similar, our CG-based optimization is computationally up to orders of magnitude faster. Margin-optimized Bayesian network classifiers achieve classification performance comparable to support vector machines (SVMs) using fewer parameters. Moreover, we show that unanticipated missing feature values during classification can be easily processed by discriminatively optimized Bayesian network classifiers, a case where discriminative classifiers usually require mechanisms to complete unknown feature values in the data first.
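To make the margin and the missing-feature handling concrete, here is a minimal sketch for a naive Bayes classifier with discrete features. It assumes one common multiclass definition of the margin (the log-ratio between the joint probability of the true class and that of the strongest competing class). The paper's exact objective and optimization are not reproduced, and all names and the data layout are hypothetical.

```python
import math


def log_joint(class_prior, cond_tables, c, x):
    """log p(c, x) for naive Bayes; features given as None are marginalized out.

    class_prior: dict class -> p(c)
    cond_tables: list (one per feature) of dict class -> dict value -> p(value | c)
    """
    total = math.log(class_prior[c])
    for j, value in enumerate(x):
        if value is None:
            # Summing a normalized conditional table over all values gives 1,
            # so a missing feature simply contributes no factor.
            continue
        total += math.log(cond_tables[j][c][value])
    return total


def log_margin(class_prior, cond_tables, c_true, x):
    """Log-ratio of the true class score to the best competing class score."""
    true_score = log_joint(class_prior, cond_tables, c_true, x)
    best_rival = max(log_joint(class_prior, cond_tables, c, x)
                     for c in class_prior if c != c_true)
    return true_score - best_rival
```

A positive log-margin means the sample is classified correctly. The shortcut for missing features relies on the conditional tables staying normalized, which is the property the abstract says is maintained during optimization.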

Pattern Recognition | 2013

Stochastic margin-based structure learning of Bayesian network classifiers

Franz Pernkopf; Michael Wohlmayr

The margin criterion for parameter learning in graphical models has gained significant impact in recent years. We use the maximum margin score for discriminatively optimizing the structure of Bayesian network classifiers. Furthermore, greedy hill-climbing and simulated annealing search heuristics are applied to determine the classifier structures. In the experiments, we demonstrate the advantages of maximum margin optimized Bayesian network structures in terms of classification performance compared to traditionally used discriminative structure learning methods. Stochastic simulated annealing requires fewer score evaluations than greedy heuristics. Additionally, we compare generative and discriminative parameter learning on both generatively and discriminatively structured Bayesian network classifiers. Margin-optimized Bayesian network classifiers achieve classification performance similar to that of support vector machines. Moreover, missing feature values during classification can be handled by discriminatively optimized Bayesian network classifiers, a case where purely discriminative classifiers usually require mechanisms to complete unknown feature values in the data first.

Content-Based Multimedia Indexing | 2007

Joint Position-Pitch Tracking for 2-Channel Audio

Marian Kepesi; Franz Pernkopf; Michael Wohlmayr

In this paper, a new representation for acoustic source indexing in a multi-source environment is introduced. Each source is represented by a two-dimensional Gaussian-like probability distribution as a function of pitch and direction-of-arrival (DoA). These features of source candidates form the time-dependent Position-Pitch (PoPi) plane, which is extracted from 2-channel audio. For demonstration, the time evolution of the Gaussians corresponding to source candidates is tracked by a Viterbi decoder. The Viterbi tracking is extended to multiple paths, and similar paths are pruned using the normalized Levenshtein distance.
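The pruning of similar paths can be sketched as follows. This is only an illustration under stated assumptions: paths are sequences of discrete state indices sorted best first, the edit distance is normalized by the longer path length (other normalizations exist), and the function names and the threshold are hypothetical rather than taken from the paper.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_levenshtein(a, b):
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def prune_similar_paths(paths, threshold):
    """Keep a path only if it is far enough from every path already kept.

    Assumes `paths` is ordered from best to worst score.
    """
    kept = []
    for path in paths:
        if all(normalized_levenshtein(path, other) > threshold for other in kept):
            kept.append(path)
    return kept
```

Because better paths are visited first, each group of near-duplicate paths is represented by its best-scoring member.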

IEEE Transactions on Audio, Speech, and Language Processing | 2013

Model-Based Multiple Pitch Tracking Using Factorial HMMs: Model Adaptation and Inference

Michael Wohlmayr; Franz Pernkopf

Robustness against noise and interfering audio signals is one of the challenges in speech recognition and audio analysis technology. One avenue to approach this challenge is single-channel multiple-source modeling. Factorial hidden Markov models (FHMMs) are capable of modeling acoustic scenes with multiple sources interacting over time. While these models reach good performance on specific tasks, there are still serious limitations restricting their applicability in many domains. In this paper, we generalize these models and enhance their applicability. In particular, we develop an EM-like iterative adaptation framework which is capable of adapting the model parameters to the specific situation (e.g., actual speakers, gain, acoustic channel) using only speech mixture data. Currently, source-specific data is required to learn the model. Inference in FHMMs is an essential ingredient for adaptation. We develop efficient approaches based on observation likelihood pruning. Both adaptation and efficient inference are empirically evaluated for the task of multipitch tracking using the GRID corpus.

European Conference on Machine Learning | 2009

On Discriminative Parameter Learning of Bayesian Network Classifiers

Franz Pernkopf; Michael Wohlmayr

We introduce three discriminative parameter learning algorithms for Bayesian network classifiers based on optimizing either the conditional likelihood (CL) or a lower-bound surrogate of the CL. One training procedure is based on the extended Baum-Welch (EBW) algorithm. Similarly, the remaining two approaches iteratively optimize the parameters (initialized to ML) with a 2-step algorithm. In the first step, either the class posterior probabilities or class assignments are determined based on current parameter estimates. Based on these posteriors (class assignment, respectively), the parameters are updated in the second step. We show that one of these algorithms is strongly related to EBW. Additionally, we compare all algorithms to conjugate gradient conditional likelihood (CGCL) parameter optimization [1]. We present classification results for frame- and segment-based phonetic classification and handwritten digit recognition. Discriminative parameter learning shows a significant improvement over generative ML estimation for naive Bayes (NB) and tree augmented naive Bayes (TAN) structures on all data sets. In general, the performance improvement of discriminative parameter learning is large for simple Bayesian network structures which are not optimized for classification.

International Conference on Acoustics, Speech, and Signal Processing | 2011

Gain-robust multi-pitch tracking using sparse nonnegative matrix factorization

Robert Peharz; Michael Wohlmayr; Franz Pernkopf

While nonnegative matrix factorization (NMF) has successfully been applied for gain-robust multi-pitch detection, a method to track pitch values over time was not provided. We embed NMF-based pitch detection into a recently proposed pitch-tracking system based on a factorial hidden Markov model (FHMM). The original system models speech spectra with Gaussian mixture models, which are sensitive to a gain mismatch between training and test data. We therefore combine the advantages of these two approaches and derive a gain-adaptive observation model for the FHMM. As the training algorithm, we use a modification of ℓ0-sparse NMF, which represents the short-time spectrum with scalable basis vectors. In experiments, we show that the new approach significantly increases the gain-robustness of the original tracking system.

European Conference on Machine Learning | 2010

Large margin learning of Bayesian classifiers based on Gaussian mixture models

Franz Pernkopf; Michael Wohlmayr

We present a discriminative learning framework for Gaussian mixture models (GMMs) used for classification based on the extended Baum-Welch (EBW) algorithm [1]. We suggest two criteria for discriminative optimization, namely the class conditional likelihood (CL) and the maximization of the margin (MM). In the experiments, we present results for synthetic data, broad phonetic classification, and a remote sensing application. The experiments show that CL-optimized GMMs (CL-GMMs) achieve lower performance than MM-optimized GMMs (MM-GMMs), whereas both discriminative GMMs (DGMMs) perform significantly better than generatively learned GMMs. We also show that the generative discriminatively parameterized GMM classifiers still allow marginalization over missing features, a case where generative classifiers have an advantage over purely discriminative classifiers such as support vector machines or neural networks.

International Conference on Acoustics, Speech, and Signal Processing | 2011

Efficient implementation of probabilistic multi-pitch tracking

Michael Wohlmayr; Robert Peharz; Franz Pernkopf

We significantly improve the computational efficiency of a probabilistic approach for multiple pitch tracking. This method is based on a factorial hidden Markov model and two alternative interaction models for magnitude and log-magnitude spectra, respectively. The main computational bottleneck comprises the determination of observation likelihoods. However, we show that up to 99.5% of the smallest likelihood values can be discarded at each time frame without affecting the overall tracking accuracy. For both interaction models, we present a heuristic to efficiently find the largest likelihood values. Experiments on the GRID database show that the proposed methods result in a major speedup without significantly changing tracking accuracy.
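The effect of discarding the smallest likelihood values in a frame can be sketched as below. Note the assumption: this sketch evaluates every value first and then keeps the top fraction, which only illustrates what the pruning retains. The paper's heuristic for finding the largest values efficiently is not reproduced here, and the function name and data layout are hypothetical.

```python
import heapq


def prune_likelihoods(log_likelihoods, keep_fraction=0.005):
    """Keep only the largest observation log-likelihoods of one time frame.

    log_likelihoods: dict mapping a joint state (s1, s2) to its log-likelihood.
    keep_fraction: share of entries to keep; 0.005 corresponds to
                   discarding 99.5% of the values.
    """
    n_keep = max(1, int(round(keep_fraction * len(log_likelihoods))))
    largest = heapq.nlargest(n_keep, log_likelihoods.items(),
                             key=lambda item: item[1])
    return dict(largest)
```

Joint states missing from the returned dictionary would be treated as having negligible likelihood in the subsequent tracking step.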
