Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Milan Sigmund is active.

Publication


Featured research published by Milan Sigmund.


Speech Communication | 2012

Impact of vocal effort variability on automatic speech recognition

Petr Zelinka; Milan Sigmund; Jiri Schimmel

The impact of changes in a speaker's vocal effort on the performance of automatic speech recognition has largely been overlooked by researchers, and virtually no speech resources exist for the development and testing of speech recognizers at all vocal effort levels. This study deals with speech properties in the whole range of vocal modes: whispering, soft speech, normal speech, loud speech, and shouting. Fundamental acoustic and phonetic changes are documented. The impact of vocal effort variability on the performance of an isolated-word recognizer is shown, and effective means of improving the system's robustness are tested. The proposed multiple-model framework approach reaches a 50% relative reduction in word error rate compared to the baseline system. A new specialized speech database, BUT-VE1, is presented, which contains speech recordings of 13 speakers at 5 vocal effort levels with manual phonetic segmentation and sound pressure level calibration.
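The multiple-model idea, i.e. detect the vocal mode first and then decode with a recognizer trained for that mode, can be sketched as below. The energy thresholds, mode labels, and energy-only detection are illustrative assumptions for this sketch, not values or methods from the paper:

```python
import numpy as np

def frame_energy_db(frame):
    """Mean power of a frame in dB (relative units)."""
    return 10 * np.log10(np.mean(frame ** 2) + 1e-12)

def pick_model(frame, models, thresholds=(-40.0, -25.0, -12.0)):
    """Route a frame to the recognizer trained for its vocal mode.

    `models` maps mode names to recognizers; the dB thresholds are
    illustrative -- a real detector would use richer spectral cues.
    """
    mode = ("whisper", "soft", "normal", "loud")[
        int(np.searchsorted(thresholds, frame_energy_db(frame)))
    ]
    return mode, models[mode]
```

In a real system each entry of `models` would be a per-mode acoustic model; here the dictionary values can be anything.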


Nordic Signal Processing Symposium | 2006

Introducing the Database ExamStress for Speech under Stress

Milan Sigmund

This paper briefly describes a newly created database of Czech speech under realistic stress conditions and presents some selected results achieved by analyzing stressed speech. The motivation for creating a new database was the non-existence of stressed-speech corpora for Czech or any other Slavic language. The database contains read and conversational speech, in both neutral and stressed states, from 31 male speakers. The stressed speech was recorded during final oral examinations at the Brno University of Technology. In order to quantify the stress of individual speakers, each speaker's heart rate was measured and recorded simultaneously with the speech. Experiments conducted on this database show that the corpus can be used for the development and evaluation of algorithms that identify the extent of a speaker's stress from the voice alone.


Information Technology and Control | 2011

Analysis of Voiced Speech Excitation Due to Alcohol Intoxication

Milan Sigmund; Petr Zelinka

A significant part of the information carried in a speech signal refers to the speaker. This paper deals with investigating alcohol intoxication based on analyzing recorded speech signals. Speech changes resulting from alcohol intoxication were investigated in the waveform of glottal pulses estimated from speech by applying Iterative Adaptive Inverse Filtering (IAIF). Experimental results show that analysis of glottal excitation appears to be a useful approach for providing evidence of alcohol intoxication of over 1.96‰. At this alcohol level, the associated negative effects influence professional performance and may lead to fatal accidents in some cases. By analyzing the speech signal, a speaker can be monitored automatically without his or her active cooperation. For use in our experiments, a new collection of Czech alcoholized speech was created, consisting of phonetically identical speech data spoken in both sober and intoxicated states. DOI: http://dx.doi.org/10.5755/j01.itc.40.2.429
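Glottal-pulse estimation by inverse filtering can be illustrated with a stripped-down sketch: fit an LPC model of the vocal-tract envelope and pass the speech through the resulting prediction-error filter, so the residual approximates the glottal excitation. This is a simplification of IAIF (which iterates and also cancels lip-radiation and glottal contributions); the model order and filtering details are assumptions:

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients via the autocorrelation (Yule-Walker) method."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))        # prediction-error filter A(z)

def glottal_residual(speech, order=12):
    """One inverse-filtering pass: the LPC residual approximates the
    excitation once the vocal-tract envelope has been removed."""
    return np.convolve(speech, lpc(speech, order), mode="same")
```

For a known autoregressive signal, `lpc` recovers the generating coefficients, which is the property inverse filtering relies on.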


Journal of Intelligent and Robotic Systems | 2000

Transformations between Pictures from 2D to 3D

Milan Sigmund; Pavel Novotny

This paper describes programs for 3-dimensional engraving. The programs use raster or vector images to create a 3D model and, subsequently, convert this model into a sequence of control commands for 3D engraving machines. Three programs have been developed: a program for engraving general 3D surfaces from grey-scale images, a program for preparing these grey-scale images from patterns and vector images, and a program for fast 2D engraving. A simple and fast preparation of the 3D model, a user-friendly environment, and small hardware requirements were the principal goals.


Information Technology and Control | 2013

Statistical Analysis of Fundamental Frequency Based Features in Speech under Stress

Milan Sigmund

A significant part of the non-linguistic information carried in speech refers to the speaker and his/her internal state. This study investigates sixteen features based on the fundamental frequency (F0) of speech in order to detect stress in speakers. The most effective features resulting from the experiments are presented here. The total frequency range of F0 across specific short-time speech segments, created by two or three frames having stable F0 values, was evaluated as the best feature for speaker-independent stress detection. F0 contours were computed frame by frame using an optimized autocorrelation function. In our experiments, we used utterances spoken by 14 male speakers, taken from our own database of speech under real psychological stress. DOI: http://dx.doi.org/10.5755/j01.itc.42.3.3895
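Frame-wise F0 estimation with an autocorrelation function, and a segment-level F0-range feature of the kind the study builds on, can be sketched as follows. The search band and frame handling are generic assumptions, not the paper's optimized version:

```python
import numpy as np

def f0_autocorr(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 (Hz) of one frame from the autocorrelation peak
    within a plausible pitch-lag range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag

def f0_range(frames, fs):
    """Total F0 range across a short run of frames."""
    f0s = [f0_autocorr(f, fs) for f in frames]
    return max(f0s) - min(f0s)
```

On a synthetic 120 Hz tone the estimate lands within the lag quantization error of the true pitch.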


Archive | 2008

Automatic Speaker Recognition by Speech Signal

Milan Sigmund

Acoustical communication is one of the fundamental prerequisites for the existence of human society. Textual language has become extremely important in modern life, but speech has dimensions of richness that text cannot approximate. From speech alone, fairly accurate guesses can be made as to whether the speaker is male or female, adult or child. In addition, experts can extract from speech information regarding, e.g., the speaker's state of mind. As computer power increased and knowledge about speech signals improved, speech-processing research turned toward automated systems for many purposes. Speaker recognition is the complement of speech recognition, and both techniques use similar methods of speech signal processing. In automatic speech recognition, the processing tries to extract linguistic information from the speech signal to the exclusion of personal information. Conversely, speaker recognition focuses on the characteristics unique to the individual, disregarding the words currently spoken.

The uniqueness of an individual's voice is a consequence of both the physical features of the person's vocal tract and the person's ability to control the muscles in it. An ideal speaker recognition system would use only physical features to characterize speakers, since these features cannot be easily changed. However, physical features such as the vocal tract dimensions of an unknown speaker obviously cannot be measured directly. Thus, numerical values for physical features would have to be derived from parameters extracted from the speech signal by digital signal processing. Suppose that vocal tracts could be effectively represented by 10 independent physical features, with each feature taking on one of 10 discrete values. In this case, 10^10 (i.e., 10 billion) individuals could be distinguished, whereas today's world population amounts to approximately 7 billion individuals.
People can reliably identify familiar voices. About 2-3 seconds of speech is sufficient to identify a voice, although performance decreases for unfamiliar voices. One review of human speaker recognition (Lancker et al., 1985) notes that many studies of 8-10 speakers (work colleagues) yield in excess of 97% accuracy if a sentence or more of the test speech is heard. Performance falls to about 54% when the duration is shorter than 1 second and/or the speech is distorted, e.g., severely highpass or lowpass filtered. Performance also falls significantly if training and test utterances are processed through different transmission systems.


International Workshop on Machine Learning for Signal Processing | 2010

Automatic vocal effort detection for reliable speech recognition

Petr Zelinka; Milan Sigmund

This paper describes an approach for enhancing the robustness of an isolated-word recognizer by extending its flexibility with respect to the speaker's variable vocal effort level. An analysis of the spectral properties of vowels spoken in four speaking modes (whispering, soft, normal, and loud) confirms consistent spectral tilt changes. The severe impact of vocal effort variability on the accuracy of a speaker-dependent word recognizer is presented, and an efficient remedial measure using a multiple-model framework paired with an accurate speech-mode detector is proposed.
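Spectral tilt, i.e. the overall slope of the log-magnitude spectrum, can be measured with a simple line fit over log-spaced frequency. The fit band (50 Hz to 5 kHz) and windowing are illustrative assumptions, not the paper's analysis settings:

```python
import numpy as np

def spectral_tilt(frame, fs, band=(50.0, 5000.0)):
    """Slope (dB per octave) of a line fit to the log-magnitude spectrum."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    keep = (freqs >= band[0]) & (freqs <= band[1])
    x = np.log2(freqs[keep])                  # octaves
    y = 20 * np.log10(spec[keep] + 1e-12)     # dB
    slope, _ = np.polyfit(x, y, 1)
    return slope
```

A signal with more low-frequency emphasis (e.g. integrated noise, roughly -6 dB per octave) yields a clearly steeper tilt than white noise, which is the kind of contrast the vowel analysis exploits.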


International Conference Radioelektronika | 2015

Acoustical detection of gunshots

Martin Hrabina; Milan Sigmund

This paper describes the development of a reliable gunshot detection system with emphasis on low power consumption, intended for counter-poaching devices primarily protecting elephants in Africa. The intended system works as a binary detector of gunfire without further classification of the firearm used. Keeping correct gunshot detections dominant over false alarms is crucial. The proposed recognition system is based on linear predictive coefficients, correlation against a template, and comparison of spectral energy in sub-bands.


2016 SAI Computing Conference (SAI) | 2016

Implementation of developed gunshot detection algorithm on TMS320C6713 processor

Martin Hrabina; Milan Sigmund

This paper deals with implementing a real-time gunshot detection algorithm on the TMS320C6713 digital signal processor. The developed algorithm uses 3 linear predictive coding coefficients, energy in 3 spectral bands, and frame correlation. The audio input signal is processed continuously, and signal frames considered to contain a gunshot are briefly signaled by an LED indicator. Experimental results achieved 82% correctly detected gunshots and 3% false alarms. The system should increase the protection of wild elephants in Africa.
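The three ingredients the detector combines (LPC coefficients, sub-band energies, template correlation) can be sketched as a toy feature extractor. The band edges, LPC order, and normalization are assumptions for illustration, not the values implemented on the DSP:

```python
import numpy as np

def lpc_coeffs(frame, order=3):
    """LPC coefficients via the autocorrelation (Yule-Walker) method."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def gunshot_features(frame, template, fs):
    """3 LPC coefficients + 3 sub-band energies + peak template correlation."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    edges = [0.0, 1000.0, 3000.0, fs / 2]     # assumed band split
    bands = [spec[(freqs >= lo) & (freqs < hi)].sum()
             for lo, hi in zip(edges, edges[1:])]
    c = np.correlate(frame, template, mode="valid")
    corr = np.abs(c).max() / (np.linalg.norm(frame) * np.linalg.norm(template) + 1e-12)
    return np.concatenate([lpc_coeffs(frame), bands, [corr]])
```

A downstream binary detector would threshold or classify this 7-dimensional vector; when the frame equals the template, the correlation term is 1.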


International Conference Radioelektronika | 2010

Towards reliable speech recognition in operating room noise environment

Petr Zelinka; Milan Sigmund

This paper describes several practical steps for accurate statistical modeling of a known acoustical noise environment to attain good performance of a small-vocabulary recognizer for isolated words based on whole-word hidden Markov models. Hierarchical segmentation based on the Bayesian information criterion and k-means clustering, followed by split-merge Gaussian mixture model training, were utilized for noise model estimation. The parallel model combination technique produces the final noise-corrupted speech models for a small group of speakers. Experiments were carried out on real operating-room ambient noise recorded during a neurosurgery at the University Hospital in Marburg.
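The k-means step that seeds the noise model's Gaussian components can be illustrated with a minimal implementation (plain Lloyd iterations over feature frames; the actual pipeline adds BIC-driven segmentation and split-merge GMM training on top):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns cluster centers and frame labels.
    Each cluster center could then initialize one Gaussian component."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # assign every frame to its nearest center, then re-estimate centers
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers, labels
```

On two well-separated clusters of synthetic frames, the centers converge to the cluster means.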

Collaboration


Dive into Milan Sigmund's collaborations.

Top Co-Authors

Miroslav Stanek (Brno University of Technology)
Petr Zelinka (Brno University of Technology)
Lubomir Brancik (Brno University of Technology)
Martin Hrabina (Brno University of Technology)
Ales Prokes (Brno University of Technology)
A. Kuiper (Brno University of Technology)
Edita Kolarova (Brno University of Technology)
Jiri Schimmel (Brno University of Technology)
Pavel Sala (Brno University of Technology)