
Publication


Featured research published by Ewald Enzinger.


Speech Communication | 2016

Multi-laboratory evaluation of forensic voice comparison systems under conditions reflecting those of a real forensic case (forensic_eval_01) Introduction

Geoffrey Stewart Morrison; Ewald Enzinger

This paper introduces an evaluation of forensic voice comparison systems. It includes the rules for the evaluation. The training and test data reflect the conditions of a real case. Operational and research laboratories are invited to participate. Results will be published in a Virtual Special Issue of Speech Communication. There is increasing pressure on forensic laboratories to validate the performance of forensic analysis systems before they are used to assess strength of evidence for presentation in court. Different forensic voice comparison systems may use different approaches, and even among systems using the same general approach there can be substantial differences in operational details. From case to case, the relevant population, speaking styles, and recording conditions can be highly variable, but it is common to have relatively poor recording conditions and mismatches in speaking style and recording conditions between the known- and questioned-speaker recordings. In order to validate a system intended for use in casework, a forensic laboratory needs to evaluate the degree of validity and reliability of the system under forensically realistic conditions. The present paper is an introduction to a Virtual Special Issue consisting of papers reporting on the results of testing forensic voice comparison systems under conditions reflecting those of an actual forensic voice comparison case. A set of training and test data representative of the relevant population and reflecting the conditions of this particular case has been released, and operational and research laboratories are invited to use these data to train and test their systems. The present paper includes the rules for the evaluation and a description of the evaluation metrics and graphics to be used. The name of the evaluation is: forensic_eval_01.


International Conference on Acoustics, Speech, and Signal Processing | 2011

A logarithmic based pole-zero vocal tract model estimation for speaker verification

Ewald Enzinger; Peter Balazs; Damián Marelli; Timo Becker

In this paper we investigate the use of formant and antiformant measurements of nasal consonants for speaker verification. The features are obtained using a pole-zero vocal tract model estimate optimized by minimizing a logarithmic criterion which is motivated by the perception of amplitude by the human auditory system. A GMM-UBM approach is used for performing speaker comparisons within the likelihood-ratio framework. Results are compared with systems based on Mel Frequency Cepstral Coefficients (MFCCs) as well as formant center frequencies and bandwidths obtained using the Snack Toolkit. The formant and anti-formant based system attains comparable results to the MFCC system and outperforms the formant-based approach while offering a more straightforward interpretation in terms of a physical speech production model.
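The GMM-UBM comparison mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one-dimensional features, hand-specified mixture parameters, and no MAP adaptation of the speaker model, and the function names `gmm_log_pdf` and `gmm_ubm_score` are hypothetical. It only shows the general idea of scoring test frames by the log ratio of their likelihood under a speaker model to their likelihood under a background model.

```python
import math


def gmm_log_pdf(x, weights, means, variances):
    """Log density of a 1-D Gaussian mixture at x (log-sum-exp for stability)."""
    terms = [
        math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def gmm_ubm_score(frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio: speaker model vs. background model.

    Each model is a (weights, means, variances) tuple. These toy parameters are
    assumptions for illustration, not values from the paper.
    """
    total = sum(
        gmm_log_pdf(x, *speaker_gmm) - gmm_log_pdf(x, *ubm) for x in frames
    )
    return total / len(frames)


# Toy background model with two components, and a speaker model
# concentrated on one of them.
ubm = ([0.5, 0.5], [0.0, 4.0], [1.0, 1.0])
speaker = ([1.0], [4.0], [1.0])

score = gmm_ubm_score([4.0, 4.1], speaker, ubm)
print(score > 0)  # frames near the speaker mean favour the speaker model
```

A positive score means the frames are better explained by the speaker model than by the background model. In practice such scores are usually calibrated before being interpreted as log likelihood ratios.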


Forensic Science International | 2016

Use of relevant data, quantitative measurements, and statistical models to calculate a likelihood ratio for a Chinese forensic voice comparison case involving two sisters

Cuiling Zhang; Geoffrey Stewart Morrison; Ewald Enzinger

Currently, the standard approach to forensic voice comparison in China is the aural-spectrographic approach. Internationally, this approach has been the subject of much criticism. The present paper describes what we believe is the first forensic voice comparison analysis presented to a court in China in which a numeric likelihood ratio was calculated using relevant data, quantitative measurements, and statistical models, and in which the validity and reliability of the analytical procedures were empirically tested under conditions reflecting those of the case under investigation. The hypotheses addressed were whether the female speaker on a recording of a mobile telephone conversation was a particular individual, or whether it was that individual's younger sister. Known-speaker recordings of both these individuals were recorded using the same mobile telephone as had been used to record the questioned-speaker recording, and customised software was written to perform the acoustic and statistical analyses.


Science & Justice | 2016

Refining the relevant population in forensic voice comparison - A response to Hicks et alii (2015) The importance of distinguishing information from evidence/observations when formulating propositions.

Geoffrey Stewart Morrison; Ewald Enzinger; Cuiling Zhang

Hicks et alii [Sci. Just. 55 (2015) 520-525. http://dx.doi.org/10.1016/j.scijus.2015.06.008] propose that forensic speech scientists not use the accent of the speaker of questioned identity to refine the relevant population. This proposal is based on a lack of understanding of the realities of forensic voice comparison. If it were implemented, it would make data-based forensic voice comparison analysis within the likelihood ratio framework virtually impossible. We argue that it would also lead forensic speech scientists to present invalid and unreliable strength-of-evidence statements, and would not allow them to conduct the tests that would make them aware of this problem.


Journal of the Acoustical Society of America | 2013

Fusion of multiple formant-trajectory- and fundamental-frequency-based forensic-voice-comparison systems: Chinese /ei1/, /ai2/, and /iau1/

Cuiling Zhang; Ewald Enzinger

This study investigates the fusion of multiple formant-trajectory- and fundamental-frequency-trajectory-based (f0-trajectory-based) forensic-voice-comparison systems. Each system was based on tokens of a single phoneme: tokens of Chinese /ei1/, /ai2/, and /iau1/ (numbers indicate tones). Human-supervised formant-trajectory and f0-trajectory measurements were made on tokens from a database of recordings of 60 female speakers of Chinese. Discrete cosine transforms (DCT) were fitted to the trajectories and the DCT coefficients used to calculate likelihood ratios via the multivariate kernel density (MVKD) formula. The individual-phoneme systems were fused with each other and with a baseline mel-frequency cepstral-coefficient (MFCC) Gaussian-mixture-model universal-background-model (GMM-UBM) system. The latter made use of the entire speech-active portion of the recordings. Tests were conducted using high-quality recordings as nominal suspect samples and mobile-to-landline transmitted recordings as nominal offender sa...


Conference of the International Speech Communication Association | 2016

Likelihood Ratio Calculation in Acoustic-Phonetic Forensic Voice Comparison: Comparison of Three Statistical Modelling Approaches.

Ewald Enzinger

This study compares three statistical models used to calculate likelihood ratios in acoustic-phonetic forensic-voice-comparison systems: multivariate kernel density, principal component analysis kernel density, and a multivariate normal model. The data were coefficient values obtained from discrete cosine transforms fitted to human-supervised formant-trajectory measurements of tokens of /iau/ from a database of recordings of 60 female speakers of Chinese. Tests were conducted using high-quality recordings as nominal suspect samples and mobile-to-landline transmitted recordings as nominal offender samples. Performance was assessed before and after fusion with a baseline automatic mel frequency cepstral coefficient Gaussian mixture model universal background model system. In addition, Monte Carlo simulations were used to compare the output of the statistical models to true likelihood-ratio values calculated on the basis of the distribution specified for a simulated population.
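As an illustration of how a trajectory can be summarised by a few discrete cosine transform coefficients, here is a minimal sketch. It assumes equally spaced samples and an orthonormal DCT-II basis, in which case a least-squares fit with the first few basis functions reduces to simple projections. The function names and the toy trajectory are hypothetical, and the abstract does not specify the study's actual fitting procedure or number of coefficients.

```python
import math


def dct_coefficients(trajectory, n_coeffs):
    """First n_coeffs orthonormal DCT-II coefficients of an equally spaced trajectory."""
    n = len(trajectory)
    coeffs = []
    for k in range(n_coeffs):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(trajectory)
        )
        coeffs.append(scale * total)
    return coeffs


def reconstruct(coeffs, n):
    """Rebuild an n-point trajectory from (possibly truncated) DCT coefficients."""
    points = []
    for i in range(n):
        value = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            value += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        points.append(value)
    return points


# Hypothetical second-formant trajectory in Hz (made-up values, for illustration only).
f2 = [1800.0, 1900.0, 2050.0, 2100.0, 2000.0, 1700.0, 1400.0, 1200.0]

features = dct_coefficients(f2, 4)   # low-order coefficients used as features
smoothed = reconstruct(features, len(f2))
```

Keeping only the low-order coefficients gives a compact, smoothed description of the trajectory shape. Using all coefficients reproduces the original samples exactly, up to rounding.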


International Conference on Acoustics, Speech, and Signal Processing | 2014

Bayesian vocal tract model estimates of nasal stops for speaker verification

Ewald Enzinger; Christian H. Kasess

In this paper we report on speaker verification experiments using branched vocal tract model estimates of alveolar nasal (/n/) stops. While the discriminatory potential of nasal acoustics has long been established, their acoustic properties have so far mostly been characterized using spectral features. Here, we used a Bayesian estimation technique to obtain reflection coefficients of a branched-tube model of the combined nasal and oral tract. Parameters were then modeled using probabilistic linear discriminant analysis to calculate likelihood ratios for speaker verification trials. Performance was assessed on normal and high vocal effort speech using high-quality and mobile-telephone-transmitted recordings taken from the German-language Pool2010 corpus. Results are compared with those of systems based on mel-frequency cepstral coefficients (MFCC). Vocal tract parameter based systems outperform MFCC based systems in matched conditions, but lack robustness under mismatch, while being readily interpretable with respect to a physical speech production model.


Journal of the Acoustical Society of America | 2013

Mismatched distances from speakers to telephone in a forensic-voice-comparison case

Ewald Enzinger

In a forensic-voice-comparison (FVC) case, one speaker (A) was talking on a mobile telephone, and another (B) was standing a short distance away. Later, B moved closer to the telephone. Shortly thereafter, there was a section of speech where the identity of the speaker was disputed. All material for training an FVC-system could be extracted from this single recording, but there was a near-far mismatch: Training data for A were near, training data for B were far, and the disputed speech was near. We describe a procedure for addressing the degree of validity and reliability of an FVC system under such conditions, prior to it being applied to the casework recording: Sections of recordings of pairs of speakers of known identity are used to train an A and a B model; multiple other sections from each of the A and B recordings are used as test data; a likelihood ratio is calculated for each test section; and system validity and reliability are assessed. Prior to training and testing, the A and B recordings were ...


2013 7th Conference on Speech Technology and Human-Computer Dialogue (SpeD) | 2013

Experiments on using vocal tract estimates of nasal stops for speaker verification

Ewald Enzinger; Christian H. Kasess

Nasal stops have been recognized as an important source of speaker-discriminating features. The nasal cavity is, with the exception of the velar junction, independent of articulatory movements. As the complex nasal structure varies from person to person, features dependent upon nasal acoustics may have low within-speaker and high between-speaker variability. In this study we use a Bayesian estimation technique to obtain reflection coefficients of a branched-tube model of the combined nasal and oral tract. These are then used as parameters in speaker verification experiments. The performance is evaluated on the basis of speakers from the TIMIT corpus as well as the Kiel corpus and is compared with that of a system based on Mel frequency cepstral coefficient (MFCC) features. Fusion of both systems indicates that the two approaches offer complementary information.


Journal of the Acoustical Society of America | 2011

Comparison of human-supervised and fully automatic formant-trajectory measurement for forensic voice comparison

Cuiling Zhang; Felipe Ochoa; Ewald Enzinger; Geoffrey Stewart Morrison

Acoustic–phonetic approaches to forensic voice comparison often include analysis of vowel formants. Such methods depend on human-supervised formant extraction, which is often assumed to be reliable and relatively robust to transmission-channel effects, but requires substantial investment of human labor. Fully automatic formant trackers require less human labor but are usually not considered reliable. This study assesses the variability within and between four human experts and compares the results of human-supervised formant measurement with several fully automatic procedures, both on studio-quality recordings and transmission-channel degraded recordings. Measurements are made of the formant trajectories of /iau/ tokens in a database of recordings of 60 female speakers of Chinese. As well as directly comparing the formant-measurement results, the formant measurements are also used as input to likelihood-ratio forensic-voice-comparison systems, and the validity and reliability of each system is empiricall...

Collaboration



Top Co-Authors

Cuiling Zhang

Australian National University

Christian H. Kasess

Austrian Academy of Sciences

Felipe Ochoa

University of New South Wales

Peter Balazs

Austrian Academy of Sciences

Wolfgang Kreuzer

Austrian Academy of Sciences
