Bojan Kotnik
University of Maribor
Publication
Featured research published by Bojan Kotnik.
Signal Processing | 2007
Bojan Kotnik; Zdravko Kacic
This paper presents a noise robust feature extraction (NRFE) algorithm using joint wavelet packet decomposition (WPD) and autoregressive (AR) modeling of a speech signal. In contrast to the short-time Fourier transform (STFT)-based time-frequency signal representation, wavelet packet decomposition can lead to a better representation of non-stationary parts of the speech signal (e.g. consonants). The vowels are well described with an AR model, as in LPC analysis. The proposed Root-Log compression scheme is used to perform the computation of the wavelet packet parameters. The separately extracted WPD and AR-based parameters are combined and then transformed using linear discriminant analysis (LDA) to produce a lower-dimensional output feature vector. The noise robustness is improved by applying the proposed wavelet-based denoising algorithm with a modified soft thresholding procedure and a time-frequency adaptive threshold. The proposed voice activity detector, based on a skewness-to-kurtosis ratio of the LPC residual signal, is used to perform frame dropping effectively. The speech recognition results achieved on the Aurora 2 and Aurora 3 databases show overall performance improvements of 44.7% and 48.2% relative to the baseline MFCC front-end, respectively.
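The skewness-to-kurtosis statistic mentioned in the abstract can be illustrated with a minimal sketch. It assumes the textbook moment-based definitions of skewness and kurtosis and takes an already computed residual frame as input; the function names are hypothetical, and the exact definition, decision threshold, and LPC analysis used in the paper are not given in the abstract.

```python
def central_moment(samples, order):
    """Central moment of the given order (population form, divides by N)."""
    mean = sum(samples) / len(samples)
    return sum((v - mean) ** order for v in samples) / len(samples)


def skewness_to_kurtosis_ratio(residual):
    """Ratio of sample skewness to sample kurtosis for one residual frame.

    Assumption: textbook definitions, skewness = m3 / m2^1.5 and
    kurtosis = m4 / m2^2. The paper may define the ratio differently.
    """
    m2 = central_moment(residual, 2)
    if m2 == 0.0:
        return 0.0  # constant frame: no shape information to measure
    skewness = central_moment(residual, 3) / m2 ** 1.5
    kurtosis = central_moment(residual, 4) / m2 ** 2
    return skewness / kurtosis


# A symmetric frame has zero skewness; a single strong pulse does not.
symmetric_frame = [-2.0, -1.0, 0.0, 1.0, 2.0]
pulse_frame = [0.0, 0.0, 0.0, 0.0, 10.0]
ratio_symmetric = skewness_to_kurtosis_ratio(symmetric_frame)
ratio_pulse = skewness_to_kurtosis_ratio(pulse_frame)
```

A frame-dropping front-end would compare such a per-frame statistic against a threshold and discard frames judged to be non-speech; how that threshold is chosen is outside what the abstract states.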
EURASIP Journal on Advances in Signal Processing | 2005
Damjan Vlaj; Bojan Kotnik; Bogomir Horvat; Zdravko Kacic
This paper presents a novel computationally efficient voice activity detection (VAD) algorithm and emphasizes the importance of such algorithms in distributed speech recognition (DSR) systems. When using VAD algorithms in telecommunication systems, the required capacity of the speech transmission channel can be reduced if only the speech parts of the signal are transmitted. A similar objective can be adopted in DSR systems, where the nonspeech parameters are not sent over the transmission channel. A novel approach is proposed for VAD decisions based on mel-filter bank (MFB) outputs with the so-called Hangover criterion. Comparative tests are presented between the proposed MFB VAD algorithm and three VAD algorithms used in the G.729, G.723.1, and DSR (advanced front-end) standards. These tests were made on the Aurora 2 database, with different signal-to-noise ratios (SNRs). In the speech recognition tests, the proposed MFB VAD outperformed all three VAD algorithms used in the standards (G.723.1 VAD, G.729 VAD, and DSR VAD) at all SNRs.
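As a rough illustration of what a hangover criterion does, the sketch below extends each detected speech run by a fixed number of frames so that weak word endings are not cut off. It is a generic hangover scheme written for this listing, not the algorithm from the paper: the function name, the `hangover_frames` parameter, and the assumption that raw per-frame decisions are already available are all hypothetical.

```python
def apply_hangover(raw_decisions, hangover_frames=3):
    """Keep reporting speech for `hangover_frames` frames after speech ends.

    raw_decisions: iterable of booleans, True where a frame was
    classified as speech by some upstream detector (assumed given).
    """
    smoothed = []
    remaining = 0  # hangover frames still available after the last speech frame
    for is_speech in raw_decisions:
        if is_speech:
            remaining = hangover_frames  # re-arm the hangover counter
            smoothed.append(True)
        elif remaining > 0:
            remaining -= 1  # spend one hangover frame
            smoothed.append(True)
        else:
            smoothed.append(False)
    return smoothed


raw = [True, False, False, False, False, True, False]
smoothed = apply_hangover(raw, hangover_frames=2)
```

In a DSR setting, only the frames marked True after smoothing would be parameterized and sent over the channel.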
International Journal of Speech Technology | 2003
Bojan Kotnik; Damjan Vlaj; Bogomir Horvat
The evolution of robust speech recognition systems that maintain a high level of recognition accuracy in difficult and dynamically varying acoustical environments is becoming increasingly important as speech recognition technology becomes a more integral part of mobile applications. In a distributed speech recognition (DSR) architecture, the recogniser's front-end is located in the terminal and is connected over a data network to a remote back-end recognition server. The terminal performs the feature parameter extraction, that is, the front-end of the speech recognition system. These features are transmitted over a data channel to the remote back-end recogniser. DSR provides particular benefits for mobile device applications, such as improved recognition performance compared to using the voice channel and ubiquitous access from different networks with a guaranteed level of recognition performance. A feature extraction algorithm integrated into the DSR system is required to operate in real time and at the lowest possible computational cost. In this paper, two innovative front-end processing techniques for noise robust speech recognition are presented and compared: time-domain based frame-attenuation (TD-FrAtt) and frequency-domain based frame-attenuation (FD-FrAtt). These techniques include different forms of frame-attenuation, improvement of spectral subtraction based on minimum statistics, as well as a mel-cepstrum feature extraction procedure. Tests are performed using the Slovenian SpeechDat II fixed telephone database and the Aurora 2 database together with the HTK speech recognition toolkit. The results obtained are especially encouraging for mobile DSR systems with limited memory and processing power.
Journal of Real-time Image Processing | 2017
Marko Kočevar; Bojan Kotnik; Amor Chowdhury; Zdravko Kacic
Fingerprint enhancement is a key step in the Automated Fingerprint Identification System. When fingerprint quality is poor, the feature extraction algorithm may extract features incorrectly, which leads to incorrect fingerprint matching and consequently to inefficient fingerprint-based identity verification. Fingerprint image enhancement techniques are based on enhancement in the spatial domain, in the frequency domain, or in a combination of both. This article presents a block–local normalization algorithm and a technique for speeding up a two-stage algorithm for low-quality fingerprint image enhancement with image learning, which first enhances a fingerprint image in the spatial domain and then in the frequency domain. The normalization technique includes an algorithm with block–local normalization with different block sizes. Experimental results obtained on the public database FVC2004 showed that the presented normalization technique speeds up and improves a state-of-the-art two-stage algorithm, provides better results than global and local normalization, and positively affects fingerprint image enhancement, which in turn improves the efficiency of the automated fingerprint identification system.
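To make the idea of normalizing per block concrete, here is a minimal sketch that rescales every tile of a grayscale image to a common target mean and variance. It uses the widely known mean and variance normalization formula applied tile by tile; the function name, the target values, and the handling of flat blocks are assumptions made for illustration, and the abstract does not specify how the paper's algorithm combines different block sizes.

```python
from math import sqrt


def normalize_block_local(image, block_size, target_mean=128.0, target_var=1000.0):
    """Rescale each block_size x block_size tile to the target mean and variance.

    image: list of rows, each a list of gray levels (assumed layout).
    Output values are floats and are not clipped to any pixel range.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r0 in range(0, rows, block_size):
        for c0 in range(0, cols, block_size):
            # Pixel coordinates of this tile (edge tiles may be smaller).
            coords = [(r, c)
                      for r in range(r0, min(r0 + block_size, rows))
                      for c in range(c0, min(c0 + block_size, cols))]
            values = [image[r][c] for r, c in coords]
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            # A flat block has no contrast to rescale; map it to the target mean.
            scale = sqrt(target_var / var) if var > 0 else 0.0
            for r, c in coords:
                out[r][c] = target_mean + (image[r][c] - mean) * scale
    return out


image = [[10, 20, 100, 140],
         [30, 40, 120, 160]]
normalized = normalize_block_local(image, block_size=2, target_mean=100.0, target_var=100.0)
```

After this step, a dark tile and a bright tile end up with the same local brightness and contrast, which is the property a later ridge enhancement stage can rely on.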
EURASIP Journal on Advances in Signal Processing | 2007
Bojan Kotnik; Zdravko Kacic
This paper concerns the problem of automatic speech recognition in noise-intense and adverse environments. The main goal of the proposed work is the definition, implementation, and evaluation of a novel noise robust speech signal parameterization algorithm. The proposed procedure is based on time-frequency speech signal representation using wavelet packet decomposition. A new modified soft thresholding algorithm based on time-frequency adaptive threshold determination was developed to efficiently reduce the level of additive noise in the input noisy speech signal. A two-stage Gaussian mixture model (GMM)-based classifier was developed to perform speech/nonspeech as well as voiced/unvoiced classification. An adaptive topology of the wavelet packet decomposition tree based on voiced/unvoiced detection was introduced to separately analyze voiced and unvoiced segments of the speech signal. The main feature vector consists of a combination of log-root compressed wavelet packet parameters and autoregressive parameters. The final output feature vector is produced using a two-stage feature vector postprocessing procedure. In the experimental framework, the noisy speech databases Aurora 2 and Aurora 3 were used together with the corresponding standardized acoustical model training/testing procedures. The automatic speech recognition performance achieved using the proposed noise robust speech parameterization procedure was compared to the standardized mel-frequency cepstral coefficient (MFCC) feature extraction procedures ETSI ES 201 108 and ETSI ES 202 050.
Conference on Computer as a Tool | 2003
Bojan Kotnik; Zdravko Kacic; Bogomir Horvat
In this paper, a noise robust speech feature extraction algorithm using wavelet packet decomposition (WPD) of the speech signal is presented. In contrast to the time-frequency signal representation based on the short-time Fourier transform (STFT), a computationally efficient WPD can lead to a good representation of stationary (vowel phonemes) as well as non-stationary (consonants) segments of the speech signal. In the proposed WPD scheme, a novel wavelet function is developed and presented. The noise robustness is improved by applying the proposed wavelet-based denoising algorithm with a modified soft thresholding procedure. Principal component analysis (PCA) is used for decorrelation of the feature vector elements and for dimensionality reduction of the final feature vector. Automatic speech recognition results on the Aurora 3 database show a performance improvement compared to the standardized mel-frequency cepstral coefficients (MFCC) feature extraction algorithm.
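For readers unfamiliar with soft thresholding, the sketch below shows the standard (unmodified) rule applied to a list of wavelet coefficients: magnitudes below the threshold are set to zero and the rest are shrunk toward zero by the threshold. This is the textbook operation only; the modification proposed in the paper is not described in the abstract, and the function name and the fixed threshold here are illustrative assumptions.

```python
def soft_threshold(coefficients, threshold):
    """Standard soft thresholding: sign(c) * max(|c| - threshold, 0)."""
    shrunk = []
    for c in coefficients:
        magnitude = abs(c) - threshold
        if magnitude <= 0:
            shrunk.append(0.0)  # small coefficient, treated as noise
        else:
            shrunk.append(magnitude if c > 0 else -magnitude)
    return shrunk


noisy_coefficients = [3.0, -0.5, -2.0, 0.2]
denoised = soft_threshold(noisy_coefficients, threshold=1.0)
```

In a wavelet denoising pipeline, the thresholded coefficients would then be used either for signal reconstruction or directly for feature computation.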
ISA Transactions | 2015
Iztok Blazinšek; Bojan Kotnik; Amor Chowdhury; Zdravko Kacic
This paper presents the problems of implementation and adjustment (calibration) of a metrology engine embedded in NXP's EM773 series microcontroller. The metrology engine is used in a smart metering application to collect data about energy utilization and is controlled through metrology engine adjustment (calibration) parameters. The aim of this research is to develop a method that enables operators to find and verify the optimum parameters that ensure the best possible accuracy. Properly adjusted (calibrated) metrology engines can then be used as a basis for a variety of products used in smart and intelligent environments. This paper focuses on the problems encountered in the development, partial automation, implementation and verification of this method.
International Conference on Systems, Signals and Image Processing | 2008
Marko Kos; Matej Grasic; Bojan Kotnik; Zdravko Kacic
In this paper, we present research on speech bandwidth classification carried out on the Slovenian BNSI Broadcast News database. Speech recorded in a studio environment has a frequency bandwidth of 8 kHz, while speech recorded over a telephone channel has a bandwidth of 3.1 kHz. Speech bandwidth classification enables the use of separate speech models for automatic speech recognition (ASR), which helps to improve the overall recognition result. For the task of speech bandwidth classification, we used two different model-based principles. One is based on an artificial neural network and the other on Gaussian mixture models. Both principles were tested and evaluated using the same front-end features to allow a simple comparison of results.
Archive | 2006
Bojan Kotnik; Harald Höge; Zdravko Kacic
Archive | 2010
Amor Chowdhury; Milos Urbanija; Bojan Kotnik; Dani Alyamour