Niksa Jakovljevic | Researchain

Publication

Featured research published by Niksa Jakovljevic.

Applied Intelligence | 2010

Eigenvalues Driven Gaussian Selection in continuous speech recognition using HMMs with full covariance matrices

Marko Janev; Darko Pekar; Niksa Jakovljevic; Vlado Delić

In this paper a novel algorithm for Gaussian Selection (GS) of mixtures used in a continuous speech recognition system is presented. The system is based on hidden Markov models (HMM), using Gaussian mixtures with full covariance matrices as output distributions. The purpose of Gaussian selection is to increase the speed of a speech recognition system without degrading the recognition accuracy. The basic idea is to form hyper-mixtures by clustering close mixtures into a single group by means of Vector Quantization (VQ) and assigning it unique Gaussian parameters for estimation. In the decoding process only those hyper-mixtures which are above a designated threshold are selected, and only mixtures belonging to them are evaluated, improving computational efficiency. There is no problem with the clustering and evaluation if overlaps between the mixtures are small and their variances are of the same range. However, in real cases there are numerous models which do not fit this profile. The Gaussian selection scheme proposed in this paper addresses this problem. For that purpose, besides the clustering algorithm, it also incorporates an algorithm for mixture grouping. Each mixture is assigned to a group from a predefined set of groups, based on a value aggregated from the eigenvalues of the covariance matrix of that mixture using Ordered Weighted Averaging (OWA) operators. After the grouping of mixtures is carried out, Gaussian mixture clustering is performed on each group separately.
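
The eigenvalue-based grouping step described above can be illustrated with a minimal sketch. It assumes the eigenvalues of each covariance matrix are already available; the function names, the OWA weights and the group boundaries are hypothetical choices for illustration and are not taken from the paper.

```python
from bisect import bisect_right


def owa(values, weights):
    """Ordered Weighted Averaging: weights are applied to the values
    after sorting them in descending order, not to fixed positions."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("OWA weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def assign_group(eigenvalues, weights, boundaries):
    """Map one Gaussian to a group index using the OWA score of its
    covariance eigenvalues. `boundaries` is an ascending list of
    hypothetical thresholds separating the predefined groups."""
    score = owa(eigenvalues, weights)
    return bisect_right(boundaries, score)
```

Putting most of the weight on the first positions makes the score follow the largest eigenvalues, so Gaussians with wide spread in some direction land in a different group from compact ones. Clustering would then run separately within each group.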

Applied Intelligence | 2012

A novel split-and-merge algorithm for hierarchical clustering of Gaussian mixture models

Branislav M. Popovic; Marko Janev; Darko Pekar; Niksa Jakovljevic; Milan Gnjatović; Milan Sečujski; Vlado Delić

The paper presents a novel split-and-merge algorithm for hierarchical clustering of Gaussian mixture models, which tends to improve on the local optimal solution determined by the initial constellation. It is initialized by local optimal parameters obtained by using a baseline approach similar to k-means, and it tends to approach the global optimum of the target clustering function more closely by iteratively splitting and merging the clusters of Gaussian components obtained as the output of the baseline algorithm. The algorithm is further improved by introducing model selection in order to obtain the best possible trade-off between recognition accuracy and computational load in a Gaussian selection task applied within an actual recognition system. The proposed method is tested both on artificial data and in the framework of Gaussian selection performed within a real continuous speech recognition system, and in both cases an improvement over the baseline method has been observed.

International Conference on Speech and Computer | 2015

Deep Neural Network Based Continuous Speech Recognition for Serbian Using the Kaldi Toolkit

Branislav M. Popovic; Stevan Ostrogonac; Edvin Pakoci; Niksa Jakovljevic; Vlado Delić

This paper presents a deep neural network (DNN) based large vocabulary continuous speech recognition (LVCSR) system for Serbian, developed using the open-source Kaldi speech recognition toolkit. The DNNs are initialized using stacked restricted Boltzmann machines (RBMs) and trained using cross-entropy as the objective function and the standard error backpropagation procedure in order to provide posterior probability estimates for the hidden Markov model (HMM) states. Emission densities of HMM states are represented as Gaussian mixture models (GMMs). The recipes were modified based on the particularities of the Serbian language in order to achieve optimal results. A corpus of approximately 90 hours of speech (21000 utterances) is used for training. Performance is compared on two different sets of utterances between the baseline GMM-HMM algorithm and various DNN settings.

Text, Speech and Dialogue | 2008

Energy Normalization in Automatic Speech Recognition

Niksa Jakovljevic; Marko Janev; Darko Pekar; Dragisa Miskovic

In this paper a novel method for energy normalization is presented. The objective of this method is to remove unwanted energy variations caused by different microphone gains, various loudness levels across speakers, as well as changes in a single speaker's loudness level over time. The solution presented here is based on principles used in automatic gain control. The use of this method results in a 26% relative improvement in the performance of an automatic speech recognition system.

IEEE EUROCON | 2009

Vocal tract length normalization strategy based on maximum likelihood criterion

Niksa Jakovljevic; Milan Sečujski; Vlado Delić

This paper presents the performance of automatic speech recognition systems that use Vocal Tract Length Normalization (VTN). Besides the standard procedure for VTN coefficient estimation, several variants based on robust statistical methods are introduced. All systems that use VTN performed better than the reference systems, while the best performance was achieved by the system in which the VTN coefficient for a particular speaker is chosen as the one with the maximum sample mean of likelihoods per phoneme. Phoneme likelihoods are calculated as sample medians of feature vectors corresponding to particular phonemes. The relative improvement in performance for this system is about 20%.
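
The selection rule in this abstract can be sketched as follows. This is one possible reading, not the authors' implementation: it assumes that per-frame log-likelihoods are already available for each candidate VTN coefficient and each phoneme, takes the median over the frames of a phoneme as that phoneme's likelihood, and picks the coefficient with the highest mean over phonemes. The function name and the input layout are hypothetical.

```python
from statistics import mean, median


def select_vtn_coefficient(frame_loglik):
    """Pick a warping coefficient for one speaker.

    `frame_loglik` maps each candidate coefficient to a dict that maps
    a phoneme label to the list of per-frame log-likelihoods observed
    for that phoneme (hypothetical layout, assumed precomputed).
    """
    def score(per_phoneme):
        # Robust per-phoneme likelihood: median over that phoneme's frames,
        # then an unweighted mean across phonemes.
        return mean(median(frames) for frames in per_phoneme.values())

    return max(frame_loglik, key=lambda coeff: score(frame_loglik[coeff]))
```

Using the median within a phoneme limits the influence of a few badly matched frames, and averaging per phoneme rather than per frame keeps frequent phonemes from dominating the choice.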

International Symposium on Intelligent Systems and Informatics | 2012

Comparison of the automatic speaker recognition performance over standard features

Milan M. Dobrović; Vlado Delić; Niksa Jakovljevic; Ivan Jokic

This paper presents a study of speaker recognition accuracy depending on the choice of features, window width and model complexity. The standard features were considered, such as linear and perceptual prediction coefficients (LPC and PLP) and mel-frequency cepstral coefficients (MFCC). Gaussian mixture model (GMM), with the use of HTK tools, was chosen for speaker modelling. Speech database S70W100s120, recorded at the Electrical Engineering Department of Belgrade University, was used for purposes of system training and testing. Ten speaker models and the universal background model (UBM) were trained.

International Journal of Advanced Robotic Systems | 2017

Hybrid methodological approach to context-dependent speech recognition

Dragiša Mišković; Milan Gnjatović; Perica Štrbac; Branimir Trenkić; Niksa Jakovljevic; Vlado Delić

Although the importance of contextual information in speech recognition has been acknowledged for a long time now, it has remained clearly underutilized even in state-of-the-art speech recognition systems. This article introduces a novel, methodologically hybrid approach to the research question of context-dependent speech recognition in human–machine interaction. To the extent that it is hybrid, the approach integrates aspects of both statistical and representational paradigms. We extend the standard statistical pattern-matching approach with a cognitively inspired and analytically tractable model with explanatory power. This methodological extension allows for accounting for contextual information which is otherwise unavailable in speech recognition systems, and using it to improve post-processing of recognition hypotheses. The article introduces an algorithm for evaluation of recognition hypotheses, illustrates it for concrete interaction domains, and discusses its implementation within two prototype conversational agents.

International Conference on Speech and Computer | 2016

A Phonetic Segmentation Procedure Based on Hidden Markov Models

Edvin Pakoci; Branislav M. Popovic; Niksa Jakovljevic; Darko Pekar; Fathy Yassa

In this paper, a novel variant of an automatic phonetic segmentation procedure is presented, especially useful if data is scarce. The procedure uses the Kaldi speech recognition toolkit as its basis, and combines and modifies several existing methods and Kaldi recipes. Both the specifics of model training and test data alignment are explained in detail. The effectiveness of artificially extending the initial amount of manually labeled material during training is examined as well. Experimental results show the admirable overall correctness of the proposed procedure in the given test environment. Several variants of the procedure are compared, and the usage of speaker-adapted context-dependent triphone models trained without the expanded manually checked data is shown to produce the best results. A few ways to improve the procedure further, as well as future work, are also discussed.

Telecommunications Forum | 2015

Voice assistant application for the Serbian language

Branislav M. Popovic; Edvin Pakoci; Niksa Jakovljevic; Goran Kocis; Darko Pekar

This paper presents Voice Assistant, an Android-based personal assistant application for mobile phones that allows voice control in the Serbian language. A native interface is provided for a large vocabulary continuous speech recognition system based on the open-source Kaldi speech recognition toolkit. Several acoustic models were trained using a database of about 70000 utterances, with different amounts of added noise. The results are provided for a test database of about 4500 utterances and a test vocabulary of more than 14000 words.

Telecommunications Forum | 2014

Gaussian mixture model with precision matrices approximated by sparsely represented eigenvectors

Niksa Jakovljevic

This paper proposes a model which approximates full covariance matrices in Gaussian mixture models (GMM) with a reduced number of parameters and computations required for likelihood evaluations. In the proposed model inverse covariance (precision) matrices are approximated using sparsely represented eigenvectors, i.e. each eigenvector of a covariance/precision matrix is represented as a linear combination of a small number of vectors from an overcomplete dictionary. A maximum likelihood algorithm for parameter estimation and its practical implementation are presented. Experimental results on a speech recognition task show that while keeping the word error rate close to the one obtained by GMMs with full covariance matrices, the proposed model can reduce the number of parameters by 45%.
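
To illustrate why sparsely represented eigenvectors can reduce the cost of likelihood evaluation, here is a minimal sketch. It assumes the approximated eigenvectors of the precision matrix are orthonormal, so that its log-determinant is the sum of the log-eigenvalues; that assumption, the function name and the data layout are illustrative and not taken from the paper.

```python
import math


def sparse_eigvec_loglik(x, mean, atoms, sparse_coeffs, eigvals):
    """Gaussian log-likelihood with precision P = sum_i lam_i * v_i v_i^T,
    where each v_i is a sparse combination of dictionary atoms.

    atoms:         list of dictionary vectors (may be overcomplete)
    sparse_coeffs: one dict per eigenvector, {atom_index: coefficient}
    eigvals:       eigenvalues lam_i of the precision matrix
    Assumes the v_i are orthonormal (illustrative assumption).
    """
    d = len(x)
    centered = [xi - mi for xi, mi in zip(x, mean)]
    # Project the centered vector onto every atom once; each eigenvector
    # projection then only needs a few multiply-adds over its nonzeros.
    atom_proj = [sum(a * c for a, c in zip(atom, centered)) for atom in atoms]
    quad = 0.0
    log_det = 0.0
    for coeffs, lam in zip(sparse_coeffs, eigvals):
        proj = sum(w * atom_proj[k] for k, w in coeffs.items())
        quad += lam * proj * proj
        log_det += math.log(lam)
    return -0.5 * (d * math.log(2 * math.pi) - log_det + quad)
```

In this sketch each eigenvector stores only its few nonzero coefficients instead of a full d-dimensional vector, which is where the parameter saving would come from.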
