Publication


Featured research published by Martin Hagmüller.


non linear speech processing | 2006

Speaker verification security improvement by means of speech watermarking

Marcos Faundez-Zanuy; Martin Hagmüller; Gernot Kubin

This paper presents a security-enhanced speaker verification system based on speech signal watermarking. The proposed system can detect several situations in which played-back speech, synthetically generated speech, a manipulated speech signal, or a hacker imitating the speaker is fooling the biometric system. In addition, we have generated a database of watermarked speech signals, from which we have drawn conclusions about the influence of this technique on speaker verification rates. Mainly, we have verified that biometrics and watermarking can coexist while minimizing their mutual effects. Experimental results show that the proposed speech watermarking system can withstand A-law coding with a message error rate lower than 2×10^-4 for a signal-to-watermark ratio (SWR) higher than 20 dB at a message rate of 48 bits/s.


Pattern Recognition | 2007

Speaker identification security improvement by means of speech watermarking

Marcos Faundez-Zanuy; Martin Hagmüller; Gernot Kubin

This paper presents a security-enhanced speaker identification system based on speech signal watermarking. The proposed system can detect several situations in which played-back speech, synthetically generated speech, or a hacker imitating the speaker is fooling the biometric system. It is also suitable for forensic experts, who sometimes have to demonstrate in court that a digital recording has been neither manipulated nor edited. In addition, we demonstrate that this watermark can coexist with biometric speaker identification based on Gaussian mixture models (GMM) while minimizing their mutual effects.


non linear speech processing | 2006

Poincaré pitch marks

Martin Hagmüller; Gernot Kubin

A novel approach for pitch mark determination based on dynamical systems theory is presented. Pitch marks are used for speech analysis and modification, such as jitter measurement or time scale modification. The algorithm works in a pseudo-state space and calculates the Poincaré section at a chosen point in the state space. Pitch marks are then found where the trajectories cross the Poincaré plane of the initial point. The procedure is performed frame-wise to account for the changing dynamics of the speech production system. The system is intended for real-time use, so higher-level processing extending over more than one frame is not used. The processing delay is, therefore, limited to one frame. The algorithm is evaluated by calculating an average pitch value for 10 ms frames and using a small database with pitch measurements from a laryngograph signal. The results are compared to a reference correlation-based pitch mark algorithm. The performance of the proposed algorithm is comparable to that of the reference algorithm, but unlike the reference it correctly follows the pitch marks of diplophonic voices.
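The idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional time-delay embedding, the choice of the local flow direction as plane normal, the function names `embed` and `poincare_pitch_marks`, and the synthetic 100 Hz test tone are all assumptions made for illustration, and the frame-wise processing mentioned in the abstract is omitted.

```python
import math

def embed(x, delay, dim=3):
    """Time-delay embedding: one state per sample n >= offset,
    state = (x[n], x[n - delay], ..., x[n - (dim - 1) * delay])."""
    offset = (dim - 1) * delay
    states = [tuple(x[n - k * delay] for k in range(dim))
              for n in range(offset, len(x))]
    return states, offset

def poincare_pitch_marks(x, delay, n0):
    """Return sample indices where the embedded trajectory crosses the plane
    through the state at sample n0, in the same direction as at n0."""
    states, offset = embed(x, delay)
    i0 = n0 - offset
    s0 = states[i0]
    # Assumption: the plane normal is the local flow direction at the initial point.
    flow = [b - a for a, b in zip(s0, states[i0 + 1])]
    # Signed position of every state relative to the plane.
    side = [sum((c - c0) * f for c, c0, f in zip(s, s0, flow)) for s in states]
    # Keep only crossings from the negative to the non-negative side.
    return [i + offset for i in range(1, len(side))
            if side[i - 1] < 0.0 <= side[i]]

# Synthetic stand-in for voiced speech: a 100 Hz tone sampled at 8 kHz.
fs = 8000
f0 = 100.0
x = [math.sin(2 * math.pi * f0 * n / fs) for n in range(800)]
marks = poincare_pitch_marks(x, delay=20, n0=100)
periods = [b - a for a, b in zip(marks, marks[1:])]
print(marks)
print(periods)
```

For this tone the spacing between marks should stay close to the 80-sample period. A practical version would also restrict crossings to a neighborhood of the initial point, which this sketch does not do.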


Journal of Forensic Sciences | 2010

Speech watermarking: an approach for the forensic analysis of digital telephonic recordings.

Marcos Faundez-Zanuy; Jose J. Lucena‐Molina; Martin Hagmüller

In this article, the authors discuss the problem of forensic authentication of digital audio recordings. Although forensic audio has been addressed in several articles, the existing approaches focus on analog magnetic recordings, which have become less prevalent because of the large number of digital recorders available on the market (optical, solid state, hard disks, etc.). An approach based on digital signal processing that uses spread spectrum techniques for speech watermarking is presented. This approach has the advantage that authentication is based on the signal itself rather than on the recording format. Thus, it is valid for the recording devices commonly used in police‐controlled telephone intercepts. In addition, our proposal allows relevant information such as the recording date and time to be embedded (this is not always possible with classical systems). Our experimental results reveal that the speech watermarking procedure does not significantly interfere with subsequent forensic speaker identification.
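As a rough illustration of the spread spectrum principle mentioned in this abstract, the sketch below adds a low-amplitude pseudo-noise sequence per message bit and recovers the bits by correlation. It is a textbook direct-sequence scheme under stated assumptions, not the system from the article: the function names, the Gaussian noise used as a stand-in for speech, and the values of `alpha`, `chips` and the seeds are all hypothetical.

```python
import random

def pn_sequence(length, seed):
    """Pseudo-noise chips in {-1, +1}; the seed plays the role of a shared key."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]

def embed_watermark(host, bits, chips_per_bit, alpha, seed):
    """Add alpha * (+/-1) * pn to one block of host samples per message bit."""
    pn = pn_sequence(chips_per_bit, seed)
    marked = list(host)
    for b, bit in enumerate(bits):
        sign = 1.0 if bit else -1.0
        for k in range(chips_per_bit):
            marked[b * chips_per_bit + k] += alpha * sign * pn[k]
    return marked

def detect_watermark(signal, n_bits, chips_per_bit, seed):
    """Correlate each block with the same pn sequence; the sign gives the bit."""
    pn = pn_sequence(chips_per_bit, seed)
    bits = []
    for b in range(n_bits):
        block = signal[b * chips_per_bit:(b + 1) * chips_per_bit]
        corr = sum(s * c for s, c in zip(block, pn))
        bits.append(1 if corr > 0 else 0)
    return bits

# Hypothetical example: Gaussian noise stands in for the host speech signal.
rng = random.Random(0)
bits = [1, 0, 1, 1, 0, 0, 1, 0]
chips = 2000
host = [rng.gauss(0.0, 0.3) for _ in range(len(bits) * chips)]
marked = embed_watermark(host, bits, chips, alpha=0.05, seed=1234)
recovered = detect_watermark(marked, len(bits), chips, seed=1234)
print(recovered == bits)
```

Detection works because the correlation of the pseudo-noise sequence with itself grows with the block length, while its correlation with the host grows only with the square root of that length.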


Journal of Voice | 2017

Towards Objective Voice Assessment: The Diplophonia Diagram.

Philipp Aichinger; Imme Roesner; Berit Schneider-Stickler; Matthias Leonhard; Doris-Maria Denk-Linnert; Wolfgang Bigenzahn; Anna Katharina Fuchs; Martin Hagmüller; Gernot Kubin

OBJECTIVES: Diplophonia is an often misinterpreted symptom of disordered voice and needs objectification. An audio signal processing algorithm for the detection of diplophonia is proposed. Diplophonia is produced by two distinct oscillators, which yield a profound physiological interpretation. The algorithm's performance is compared with the clinical standard parameter degree of subharmonics (DSH). STUDY DESIGN: This is a prospective study. METHODS: A total of 50 dysphonic subjects (28 with diplophonia and 22 without diplophonia) and 30 subjects with euphonia were included in the study. From each subject, up to five sustained phonations were recorded during rigid telescopic high-speed video laryngoscopy. A total of 185 phonations were split up into 285 analysis segments of homogeneous voice qualities. In accordance with the clinical group allocation, the considered segmental voice qualities were (1) diplophonic, (2) dysphonic without diplophonia, and (3) euphonic. The Diplophonia Diagram is a scatter plot that relates the one-oscillator synthesis quality (SQ1) to the two-oscillator synthesis quality (SQ2). Multinomial logistic regression is used to distinguish between diplophonic and nondiplophonic segments. RESULTS: Diplophonic segments can be well distinguished from nondiplophonic segments in the Diplophonia Diagram because two-oscillator synthesis is more appropriate for imitating diplophonic signals than one-oscillator synthesis. The detection of diplophonia using the Diplophonia Diagram clearly outperforms the DSH in terms of positive likelihood ratios (56.8 versus 3.6). CONCLUSIONS: The diagnostic accuracy of the newly proposed method for detecting diplophonia is superior to that of the DSH approach, which should be taken into account in future clinical and scientific work.
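For readers unfamiliar with the metric used in the results, the positive likelihood ratio is the sensitivity divided by the false positive rate (1 minus specificity). The sketch below shows the standard calculation; the function name and the confusion-matrix counts are made up for illustration and are not taken from the study.

```python
import math

def positive_likelihood_ratio(tp, fn, fp, tn):
    """LR+ = sensitivity / (1 - specificity), computed from confusion counts."""
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    if false_positive_rate == 0.0:
        return math.inf  # no false positives: the ratio is unbounded
    return sensitivity / false_positive_rate

# Hypothetical counts, not data from the article.
lr_plus = positive_likelihood_ratio(tp=90, fn=10, fp=5, tn=95)
print(round(lr_plus, 2))
```

A larger value means that a positive test result raises the odds of the condition more strongly.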


international conference on acoustics, speech, and signal processing | 2013

Double pitch marks in diplophonic voice

Philipp Aichinger; Berit Schneider-Stickler; Wolfgang Bigenzahn; Anna Katharina Fuchs; Bernhard C. Geiger; Martin Hagmüller; Gernot Kubin

Determination of pitch marks (PMs) is necessary in clinical voice assessment for the measurement of fundamental frequency (F0) and perturbation. In voice with ambiguous F0, PM determination is crucial, and its validity needs special attention. This study proposes a new approach for PM determination from Laryngeal High-Speed Videos (LHSVs) rather than from the audio signal. In this novel approach, double PMs are extracted from a diplophonic voice sample in order to account for ambiguous F0s. The LHSVs are spectrally analyzed in order to extract the dominant oscillation frequencies of the vocal folds. Unit pulse trains with these frequencies are created as PM trains and compensated for the phase shift. The PMs are compared to Praat's single audio PMs. It is shown that double PMs are needed in order to analyze diplophonic voice, because traditional single PMs do not explain its double-source characteristic.


ieee automatic speech recognition and understanding workshop | 2015

Multi-channel speech processing architectures for noise robust speech recognition: 3rd CHiME challenge results

Lukas Pfeifenberger; Tobias Schrank; Matthias Zöhrer; Martin Hagmüller; Franz Pernkopf

Recognizing speech under noisy conditions is an ill-posed problem. The CHiME 3 challenge targets robust speech recognition in realistic environments such as street, bus, café, and pedestrian areas. We study variants of beamformers used for pre-processing multi-channel speech recordings. In particular, we investigate three variants of generalized side-lobe canceller (GSC) beamformers, i.e. GSC with sparse blocking matrix (BM), GSC with adaptive BM (ABM), and GSC with minimum variance distortionless response (MVDR) and ABM. Furthermore, we apply several post-filters to further enhance the speech signal. We introduce MaxPower postfilters and deep neural postfilters (DPFs). DPFs outperformed our baseline systems significantly when measuring the overall perceptual score (OPS) and the perceptual evaluation of speech quality (PESQ). In particular, DPFs achieved an average relative improvement of 17.54% OPS points and 18.28% in PESQ when compared to the CHiME 3 baseline. DPFs also achieved the best WER when combined with an ASR engine on simulated development and evaluation data, i.e. 8.98% and 10.82% WER, respectively. The proposed MaxPower beamformer achieved the best overall WER on CHiME 3 real development and evaluation data, i.e. 14.23% and 22.12%, respectively.
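The MVDR criterion named in this abstract has a well-known closed-form solution, w = R⁻¹d / (dᴴR⁻¹d), where R is the noise covariance matrix and d the steering vector toward the target. The sketch below evaluates that textbook formula for a single frequency bin; it is not the GSC implementation from the paper, and the uniform linear array, the far-field model, the function names and all numeric values are assumptions for illustration.

```python
import cmath
import math

def solve(a, b):
    """Solve a small complex linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x

def steering_vector(n_mics, spacing_m, angle_rad, freq_hz, c=343.0):
    """Far-field steering vector of a uniform linear array (assumed geometry)."""
    return [cmath.exp(-2j * math.pi * freq_hz * k * spacing_m
                      * math.cos(angle_rad) / c) for k in range(n_mics)]

def mvdr_weights(noise_cov, d):
    """MVDR weights w = R^-1 d / (d^H R^-1 d)."""
    r_inv_d = solve(noise_cov, d)
    denom = sum(di.conjugate() * v for di, v in zip(d, r_inv_d))
    return [v / denom for v in r_inv_d]

# Hypothetical scene: target at 60 degrees, one interferer at 120 degrees,
# plus a small amount of spatially white noise.
n = 4
d = steering_vector(n, 0.05, math.pi / 3, 1000.0)
v = steering_vector(n, 0.05, 2 * math.pi / 3, 1000.0)
R = [[v[i] * v[j].conjugate() + (0.1 if i == j else 0.0) for j in range(n)]
     for i in range(n)]
w = mvdr_weights(R, d)
response = sum(wi.conjugate() * di for wi, di in zip(w, d))
print(abs(response))
```

The printed magnitude should be close to 1, which is the distortionless constraint in the look direction. With an identity covariance matrix the weights reduce to a delay-and-sum beamformer.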


Folia Phoniatrica Et Logopaedica | 2016

Diplophonia Disturbs Jitter and Shimmer Measurement

Philipp Aichinger; Martin Hagmüller; Imme Roesner; Wolfgang Bigenzahn; Berit Schneider-Stickler; Jean Schoentgen

Objectives: The aims of this study are to investigate the effects of diplophonia on jitter and shimmer and to identify measurement limitations with regard to material selection and clinical interpretation. Materials and Methods: Four hundred and ninety-eight audio samples of sustained phonations were analyzed. The audio samples were assessed for the grade of hoarseness and the presence of diplophonia. Jitter and shimmer were reported with regard to perceptual ratings. We investigated cycle marker positions exemplarily and qualitatively to understand their implications for perturbation measurements. Results: Medians of jitter and shimmer were higher for diplophonic voices than for nondiplophonic voices with equal grades of hoarseness. The variance of jitter for moderately dysphonic voices was larger than the variance observed in a corpus from which diplophonic samples had been discarded. The positions of cycle markers in diplophonic voices did not match the positions of the pulses, indicating that the validity of jitter and shimmer values for these voices was questionable. Conclusion: Diplophonia biases the reporting of dysphonia severity via perturbation measures, and their validity is questionable for these voices. In addition, diplophonia is an influential source of variance in jitter measurements. Thus, diplophonic fragments of voice samples should be excluded prior to perturbation analysis.
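To make the two perturbation measures concrete, the sketch below computes them with the common "local" definition: the mean absolute difference between consecutive cycles divided by the mean value, applied to cycle lengths for jitter and to cycle peak amplitudes for shimmer. Whether the study used exactly this variant is an assumption, and the function name and the cycle values are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to their mean.
    Applied to cycle lengths it gives local jitter, to peak amplitudes local shimmer."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value

# Hypothetical cycle lengths in seconds, not data from the study.
regular_periods = [0.010] * 6
alternating_periods = [0.010, 0.011] * 3  # crude stand-in for misplaced cycle markers

print(local_perturbation(regular_periods))
print(local_perturbation(alternating_periods))
```

The alternating sequence yields a much larger value than the regular one even though both are perfectly repeatable patterns, which gives a feel for why cycle markers that do not match the true pulses can inflate perturbation values.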


international conference on acoustics, speech, and signal processing | 2015

Adaptive differential microphone arrays used as a front-end for an automatic speech recognition system

Elmar Messner; Hannes Pessentheiner; Juan Andres Morales-Cordovilla; Martin Hagmüller

For automatic speech recognition (ASR) systems, it is important that the input signal mainly contains the desired speech signal. For a compact arrangement, differential microphone arrays (DMAs) are a suitable choice as the front-end of ASR systems. The limiting factor of DMAs is the white noise gain, which can be treated by the minimum norm solution (MNS). In this paper, we introduce the MNS to adaptive differential microphone arrays for the first time. We compare its effect to the conventional implementation when used as the front-end of an ASR system. In experiments, we show that the proposed algorithms consistently increase the word accuracy by up to 50% relative to their conventional implementations. For PESQ, we achieve an improvement of up to 0.1 points.


IEEE Transactions on Audio, Speech, and Language Processing | 2016

Localization and characterization of multiple harmonic sources

Hannes Pessentheiner; Martin Hagmüller; Gernot Kubin

We introduce a new and intuitive algorithm to characterize and localize multiple harmonic sources intersecting in the spatial and frequency domains. It jointly estimates their fundamental frequencies, their respective amplitudes, and their directions of arrival based on an intelligent non-parametric signal representation. To obtain these parameters, we first apply variable-scale sampling on unbiased cross-correlation functions between pairs of microphone signals to generate a joint parameter space. Then, we employ a multidimensional maxima detector to represent the parameters in a sparse joint parameter space. In comparison to others, our algorithm solves the issue of pitch-period doubling when using cross-correlation functions, it estimates multiple harmonic sources with a signal power smaller than the signal power of the dominant harmonic source, and it associates the estimated parameters to their corresponding sources in a multidimensional sparse joint parameter space, which can be directly fed into a tracker. We tested our algorithm and three others on synthetic data and speech data recorded in a real reverberant environment and evaluated their performance by employing the joint recall measure, the root-mean-square error, and the cumulative distribution function of fundamental frequencies and directions of arrival. The evaluations show promising results: Our algorithm outperforms the others in terms of the joint recall measure, and it can achieve root-mean-square errors of 1 Hz or 1

Collaboration

Top Co-Authors

Gernot Kubin, Graz University of Technology
Philipp Aichinger, Medical University of Vienna
Franz Pernkopf, Graz University of Technology
Anna Katharina Fuchs, Graz University of Technology
Jean Schoentgen, Université libre de Bruxelles
Hannes Pessentheiner, Graz University of Technology
Imme Roesner, Medical University of Vienna