Bojan Petek
University of Ljubljana
Publication
Featured research published by Bojan Petek.
international conference on spoken language processing | 1996
Bojan Petek; Ove Kjeld Andersen; Paul Dalsgaard
Results from applying an improved algorithm to the automatic segmentation of spontaneous telephone-quality speech are presented and compared with results obtained after superimposing white noise. Three segmentation algorithms are compared, all based on variants of the Spectral Variation Function (SVF). Experimental results are obtained on the OGI multi-language telephone speech corpus (OGI-TS). We show that applying auditory forward and backward masking effects prior to the SVF computation increases the robustness of the algorithm to white noise. When the average signal-to-noise ratio (SNR) is decreased to 10 dB, the peak ratio (defined as the ratio of the number of peaks measured at the target SNR to that at the original SNR) is increased by 16%, 12%, and 11% for the MFC (Mel Frequency Cepstra), RASTA (Relative Spectral Processing), and FBDYN (Forward-Backward Auditory Masking Dynamic Cepstra) SVF segmentation algorithms, respectively.
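The sketch below illustrates the general idea of SVF-based segmentation: a spectral-change measure is computed per frame and candidate segment boundaries are placed at its peaks. The exact SVF definition, window span, and peak-picking threshold used in the paper are not reproduced here; the values are illustrative assumptions only.

```python
# Minimal sketch of a Spectral Variation Function (SVF) segmenter over a
# sequence of cepstral frames. Parameter values are hypothetical.
import numpy as np

def spectral_variation(cepstra: np.ndarray, span: int = 2) -> np.ndarray:
    """cepstra: (n_frames, n_coeffs). Returns one SVF value per frame, here the
    cosine distance between mean cepstra before and after the frame."""
    n = len(cepstra)
    svf = np.zeros(n)
    for t in range(span, n - span):
        left = cepstra[t - span:t].mean(axis=0)
        right = cepstra[t + 1:t + 1 + span].mean(axis=0)
        denom = np.linalg.norm(left) * np.linalg.norm(right) + 1e-12
        svf[t] = 1.0 - float(left @ right) / denom
    return svf

def pick_peaks(svf: np.ndarray, threshold: float = 0.1) -> list[int]:
    """Candidate segment boundaries: local maxima of the SVF above a threshold."""
    return [t for t in range(1, len(svf) - 1)
            if svf[t] > threshold and svf[t] >= svf[t - 1] and svf[t] >= svf[t + 1]]
```

The paper's peak ratio would then correspond to the number of peaks detected at the target SNR divided by the number detected at the original SNR.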
international conference on spoken language processing | 1996
Bojan Petek; Rastislav Sustarsic; Smiljana Komar
The Slovenian language is among the richest Slavic languages in terms of the number of dialects: more than 40 dialects in seven dialect groups can be found on a territory of about 21,000 km² with a population of 2 million. Given the richness of factors influencing Standard Slovenian, we undertook an acoustic analysis of its contemporary vowel system. Results on vowel formant frequencies and durations, measured with the Kay Elemetrics Computerized Speech Lab, are presented for monophthongs uttered by educated speakers of Standard Slovenian. The emphasis of the research reported in this paper is on a comparative analysis of the durations of the Slovenian vowels. The measurements support the basic division of vowels into long stressed and short unstressed ones, while the further subdivision of stressed vowels into long and short appears to be valid only for one particular vowel, i.e. the open central vowel /a/.
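For orientation, the sketch below shows one common way to estimate vowel formant frequencies, via LPC root-finding. The paper used the Kay Elemetrics Computerized Speech Lab, so this is only an illustrative stand-in with assumed parameter values, not the measurement procedure actually used.

```python
# Hedged sketch of LPC-based formant estimation for one windowed vowel frame.
import numpy as np

def lpc_coefficients(frame: np.ndarray, order: int) -> np.ndarray:
    """Autocorrelation-method LPC via the normal equations."""
    frame = frame * np.hamming(len(frame))
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))          # prediction polynomial A(z)

def formants(frame: np.ndarray, fs: float, order: int = 12) -> list[float]:
    """Formant candidates: frequencies of the complex roots of A(z)."""
    roots = np.roots(lpc_coefficients(frame, order))
    roots = roots[np.imag(roots) > 0]            # keep one of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    return sorted(f for f in freqs if 90.0 < f < fs / 2 - 50.0)

# Vowel duration is simply (end_sample - start_sample) / fs for hand-labelled
# segment boundaries.
```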
international conference on acoustics, speech, and signal processing | 1993
Bojan Petek; Anuška Ferligoj
The authors show how to additionally exploit the prediction error signal in the context-dependent hidden control neural network (HCNN-CDF) continuous speech recognition (SR) system to increase the discrimination among the system's predictive models. First, using linear discriminant analysis (LDA), they analyze the squared prediction errors of the system on a SR task. The results clearly show that the residual prediction error signal contains additional information that can support discrimination among the models of the system. LDA also determines which components of the residual prediction error signal contribute most to discrimination among the models, and is used as a tool to determine the dimensionality of the prediction error vector to be modeled. Second, using the results from the discriminant analysis, a new HCNN model which predicts (i.e., computes) the squared prediction error signal from the speech data is proposed. Using these HCNN models, increased discrimination among the predictive models of the system was observed.
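The first step of that analysis can be sketched as follows: fit an LDA on per-frame squared prediction error vectors labelled by model, and inspect which components carry discriminative weight. This is an illustrative sketch with synthetic data and assumed shapes, not the authors' code.

```python
# Illustrative LDA on squared prediction error vectors (hypothetical data).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
errors = rng.gamma(shape=2.0, scale=1.0, size=(1000, 16))  # (n_frames, n_coeffs)
labels = rng.integers(0, 5, size=1000)                     # predictive-model index per frame

lda = LinearDiscriminantAnalysis()
lda.fit(errors, labels)
print("explained variance of discriminants:", lda.explained_variance_ratio_)
# Components with large absolute weights in lda.scalings_ contribute most to
# discrimination; their count suggests the dimensionality worth modelling.
```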
international conference on spoken language processing | 1996
Paul Dalsgaard; Ove Kjeld Andersen; Hanne Hesselager; Bojan Petek
The paper reports results from ongoing research on language identification (LID) performed on three languages: American English, German and Spanish. The speech material is taken from the Oregon Graduate Institute Spontaneous Telephone Speech Corpus (OGI-TS). The baseline LID system consists of three parallel phoneme recognisers, each of which is followed by three language-modelling modules, each characterising bigram probabilities. The phoneme models are derived from the combined speech corpus comprising the three languages. The phonemes are handled differently in two experiments. In the first, they are trained and tested language-specifically. In the second, they are separated into a number of groups, one of which contains the language-independent speech units that are similar enough to be equated across the training languages, while the remaining groups contain the non-combinable, language-dependent phonemes for each of the languages. A data-driven technique has been devised to separate the speech sounds contained in the training corpus into these groups. In order to prepare for an optimal separation between the input classes, a linear discriminant analysis is performed on the training speech material. Results from a number of experiments show that average language identification scores of close to 90% can be retained by the LID system presented here, even for a high number of language-independent speech units.
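The language-modelling stage of such a phonotactic LID system can be sketched as below: each language's phoneme-bigram model scores the decoded phoneme string, and the highest log-likelihood wins. Counts and the add-one smoothing are illustrative assumptions, not the estimates used in the paper.

```python
# Minimal sketch of bigram scoring for phonotactic language identification.
from collections import defaultdict
import math

def train_bigram(sequences: list[list[str]]) -> dict:
    bigrams, unigrams = defaultdict(float), defaultdict(float)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            bigrams[(a, b)] += 1.0
            unigrams[a] += 1.0
    return {"bi": bigrams, "uni": unigrams}

def score(model: dict, seq: list[str], vocab_size: int) -> float:
    """Add-one smoothed bigram log-likelihood of a decoded phoneme sequence."""
    return sum(math.log((model["bi"][(a, b)] + 1.0) /
                        (model["uni"][a] + vocab_size))
               for a, b in zip(seq, seq[1:]))

def identify(models: dict[str, dict], decoded: list[str], vocab_size: int) -> str:
    """Return the language whose bigram model best explains the decoding."""
    return max(models, key=lambda lang: score(models[lang], decoded, vocab_size))
```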
Lecture Notes in Computer Science | 2005
Bojan Petek
This tutorial describes a context-dependent Hidden Control Neural Network (HCNN) architecture for large-vocabulary continuous speech recognition. Its basic building element, the context-dependent HCNN model, is a connectionist network trained to capture the dynamics of sub-word units of speech. The described HCNN model belongs to a family of Hidden Markov Model/Multi-Layer Perceptron (HMM/MLP) hybrids usually referred to as Predictive Neural Networks [1]. The model is trained to generate continuous real-valued output vector predictions, as opposed to estimating maximum a posteriori (MAP) probabilities when performing pattern classification. Explicit context-dependent modeling is introduced to refine the baseline HCNN model for continuous speech recognition. The extended HCNN system was initially evaluated on the Conference Registration Database of CMU. On the same task, HCNN modeling yielded better generalization performance than the Linked Predictive Neural Networks (LPNN). Additionally, several optimizations were possible when implementing the HCNN system. The tutorial concludes with a discussion of future research in the area of the predictive connectionist approach to speech recognition.
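A predictive HCNN-style model can be sketched as an MLP that maps the previous speech frame plus a hidden-control code (the sub-word state, and, in the context-dependent case, a context code) to a prediction of the next frame. The layer sizes and input layout below are assumptions for illustration, not the architecture described in the tutorial.

```python
# Hedged sketch of a context-dependent predictive (HCNN-style) network.
import torch
import torch.nn as nn

class HCNNPredictor(nn.Module):
    def __init__(self, frame_dim=16, n_states=8, n_contexts=10, hidden=64):
        super().__init__()
        in_dim = frame_dim + n_states + n_contexts   # frame + control + context codes
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, frame_dim))

    def forward(self, prev_frame, state_code, context_code):
        # Predict the next frame from the previous frame and the control inputs.
        return self.net(torch.cat([prev_frame, state_code, context_code], dim=-1))

# Training minimises the prediction error ||x_t - net(x_{t-1}, state, context)||^2;
# recognition picks the state sequence with the smallest accumulated error.
```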
international conference on acoustics, speech, and signal processing | 2000
Bojan Petek
This paper addresses the equivalence of mapping functions between linked predictive neural networks (LPNN) and hidden control neural networks (HCNN). Two theoretical results, supported by Mathematica experiments, are presented. First, it is proved that for every HCNN model there exists an equivalent LPNN model. Second, it is shown that the set of input-output functions of an LPNN model is strictly larger than the set of functions of an equivalent HCNN model. Therefore, when using the same architecture for the canonical building blocks (MLPs) of the LPNN and HCNN models, the LPNN models represent a superset of the approximation capabilities of the HCNN models. On the other hand, comparative experiments on the same task showed that the HCNN system outperforms the LPNN system.
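The first result can be illustrated numerically: an HCNN with a one-hot hidden-control input is reproduced by a bank of LPNN predictors obtained by folding each control column into the first-layer bias. The sketch below is a numpy illustration of that construction (not the paper's Mathematica notebooks); dimensions are arbitrary.

```python
# Numerical check that folding the control weights into the bias yields an
# equivalent per-state (LPNN-style) predictor for each control code.
import numpy as np

rng = np.random.default_rng(1)
frame_dim, n_states, hidden = 6, 4, 10
Wx = rng.normal(size=(hidden, frame_dim))   # weights on the speech frame
Wc = rng.normal(size=(hidden, n_states))    # weights on the one-hot control code
b1 = rng.normal(size=hidden)
W2 = rng.normal(size=(frame_dim, hidden))
b2 = rng.normal(size=frame_dim)

def hcnn(x, k):                              # one shared network, control input k
    h = np.tanh(Wx @ x + Wc[:, k] + b1)
    return W2 @ h + b2

def lpnn(x, k):                              # one network per state, control folded into bias
    h = np.tanh(Wx @ x + (b1 + Wc[:, k]))
    return W2 @ h + b2

x = rng.normal(size=frame_dim)
assert all(np.allclose(hcnn(x, k), lpnn(x, k)) for k in range(n_states))
```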
non-linear speech processing | 2005
Marcos Faundez-Zanuy; Unto Laine; Gernot Kubin; Stephen McLaughlin; W. Bastiaan Kleijn; Gérard Chollet; Bojan Petek; Amir Hussain
This paper summarizes the rationale for proposing the COST-277 “nonlinear speech processing” action and the work done during the last four years. In addition, future perspectives are described.
international conference on acoustics, speech, and signal processing | 1997
Bojan Petek; Ove Kjeld Andersen; Paul Dalsgaard
Two systems, statistical trajectory models (STM) and continuous-density HMMs, utilizing three preprocessing methodologies (MFCC, RASTA and FBDYN), were evaluated on two databases, namely CTIMIT and the corresponding down-sampled TIMIT. Within the bounds of the experimental setup, the comparative performance analysis showed that the STM system significantly outperforms the HMM system on the CTIMIT database. Specifically, the performance of the STM system was found to be at least 10% better than that obtained with the HMM system when RASTA preprocessing was used. The performance of both systems with FBDYN parametrization was found to be inferior to that with MFCC and RASTA. On the other hand, in the low-noise conditions of the TIMIT database, FBDYN yielded improved performance for the HMM system, whereas STM achieved its best results with the MFCC parametrization.
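Of the three front-ends compared, RASTA is the easiest to sketch in isolation: it band-pass filters the log spectral (or cepstral) trajectories over time to suppress slowly varying channel effects. The sketch below uses the classic RASTA IIR filter coefficients as an assumption; the exact front-end settings of the paper are not reproduced.

```python
# Hedged sketch of RASTA filtering of log critical-band trajectories.
import numpy as np
from scipy.signal import lfilter

def rasta_filter(log_spectra: np.ndarray, pole: float = 0.98) -> np.ndarray:
    """log_spectra: (n_frames, n_bands); filtering runs along the time axis."""
    numer = np.array([0.2, 0.1, 0.0, -0.1, -0.2])   # classic RASTA band-pass numerator
    denom = np.array([1.0, -pole])                   # single pole near the unit circle
    return lfilter(numer, denom, log_spectra, axis=0)
```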
non-linear speech processing | 2003
Robert Modic; Børge Lindberg; Bojan Petek
SPE-Based Selection of Context-Dependent Units for Speech Recognition | 2002
Matjaz Rodman; Bojan Petek; Tom Brøndsted