Publication


Featured research published by Darko Pekar.


Applied Intelligence | 2010

Eigenvalues Driven Gaussian Selection in continuous speech recognition using HMMs with full covariance matrices

Marko Janev; Darko Pekar; Niksa Jakovljevic; Vlado Delić

In this paper a novel algorithm for Gaussian Selection (GS) of mixtures used in a continuous speech recognition system is presented. The system is based on hidden Markov models (HMM), using Gaussian mixtures with full covariance matrices as output distributions. The purpose of Gaussian selection is to increase the speed of a speech recognition system without degrading its recognition accuracy. The basic idea is to form hyper-mixtures by clustering close mixtures into a single group by means of Vector Quantization (VQ) and assigning each group a unique set of Gaussian parameters. In the decoding process only those hyper-mixtures which score above a designated threshold are selected, and only mixtures belonging to them are evaluated, improving computational efficiency. Clustering and evaluation pose no problem if the overlaps between the mixtures are small and their variances are of the same range. However, in practice there are numerous models which do not fit this profile. The Gaussian selection scheme proposed in this paper addresses this problem. For that purpose, besides the clustering algorithm, it also incorporates an algorithm for mixture grouping. Each mixture is assigned to a group from a predefined set of groups, based on a value aggregated from the eigenvalues of the covariance matrix of that mixture using Ordered Weighted Averaging (OWA) operators. After the grouping of mixtures is carried out, Gaussian mixture clustering is performed on each group separately.
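The selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: hyper-mixtures are scored here with a cheap diagonal log-likelihood, clusters above the threshold expose their member Gaussians for full evaluation, and the OWA aggregation of covariance eigenvalues used for grouping is shown separately. All function names, the diagonal scoring, and the threshold convention are assumptions.

```python
import numpy as np

def owa_aggregate(eigvals, weights):
    """OWA aggregation: weight the eigenvalues after sorting them."""
    ordered = np.sort(np.asarray(eigvals))[::-1]   # descending order statistics
    return float(np.dot(weights, ordered))

def select_gaussians(x, cluster_of, hyper_means, hyper_vars, threshold):
    """Return indices of Gaussians whose hyper-mixture scores above threshold.

    Hyper-mixtures are scored with a cheap diagonal log-likelihood; only
    members of the selected clusters would then get a full evaluation.
    """
    diff = x - hyper_means
    ll = -0.5 * np.sum(diff * diff / hyper_vars
                       + np.log(2 * np.pi * hyper_vars), axis=1)
    active = set(np.flatnonzero(ll >= threshold).tolist())
    return [g for g, c in enumerate(cluster_of) if c in active]
```

With a well-separated second cluster, only the Gaussians belonging to the hyper-mixture near the observation survive the threshold.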


Archive | 2010

Speech Technologies for Serbian and Kindred South Slavic Languages

Marko Janev; Radovan Obradovic; Darko Pekar

This chapter presents the results of the research and development of speech technologies for Serbian and other kindred South Slavic languages used in five countries of the Western Balkans, carried out by the University of Novi Sad, Serbia in cooperation with the company AlfaNum. The first section describes the particularities of highly inflected languages (such as Serbian and the other languages dealt with in this chapter) from the point of view of speech technologies. The following sections describe the existing speech and language resources for these languages, the automatic speech recognition (ASR) and text-to-speech synthesis (TTS) systems developed on the basis of these resources, as well as auxiliary software components designed to aid this development. It is explained how the resources originally built for the Serbian language facilitated the development of speech technologies in Croatian, Bosnian, and Macedonian as well. The chapter concludes with directions for further research aimed at the development of multimodal dialogue systems in South Slavic languages.


Text, Speech and Dialogue | 2002

AlfaNum System for Speech Synthesis in Serbian Language

Milan Sečujski; Radovan Obradovic; Darko Pekar; Ljubomir Jovanov; Vlado Delić

This paper presents some basic criteria for the design of a concatenative text-to-speech synthesizer for the Serbian language. The paper describes the prosody generator which was used and reflects upon several peculiarities of the Serbian language which led to its adoption. Within the paper, the results of an experiment showing the influence of natural-sounding prosody on human speech recognition are discussed. The paper also describes criteria for on-line selection of appropriate segments from a large speech corpus, as well as criteria for off-line preparation of the speech database for synthesis.
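On-line segment selection of the kind mentioned above is commonly cast as a Viterbi search that minimizes a target cost (how well a candidate unit fits the required position) plus a concatenation cost (how well adjacent units join). The sketch below illustrates that generic search only; the actual AlfaNum cost functions are defined in the paper, and `target_cost` and `concat_cost` here are user-supplied assumptions.

```python
def select_units(candidates, target_cost, concat_cost):
    """Viterbi search over candidate units: minimize total target + join cost.

    candidates: one list of candidate unit ids per target position.
    """
    # best[i][j] = cheapest cost of a path ending in candidates[i][j]
    best = [[target_cost(0, u) for u in candidates[0]]]
    back = [[-1] * len(candidates[0])]
    for i in range(1, len(candidates)):
        row, ptr = [], []
        for u in candidates[i]:
            costs = [best[i - 1][k] + concat_cost(p, u)
                     for k, p in enumerate(candidates[i - 1])]
            k = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k] + target_cost(i, u))
            ptr.append(k)
        best.append(row)
        back.append(ptr)
    # backtrack the cheapest full path
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    return path[::-1]
```

With toy costs (distance to the target index, distance between neighbours) the search picks the smooth low-cost sequence rather than any locally attractive unit.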


Applied Intelligence | 2012

A novel split-and-merge algorithm for hierarchical clustering of Gaussian mixture models

Branislav M. Popovic; Marko Janev; Darko Pekar; Niksa Jakovljevic; Milan Gnjatović; Milan Sečujski; Vlado Delić

The paper presents a novel split-and-merge algorithm for hierarchical clustering of Gaussian mixture models, which tends to improve on the locally optimal solution determined by the initial constellation. It is initialized with locally optimal parameters obtained using a baseline approach similar to k-means, and it approaches the global optimum of the target clustering function more closely by iteratively splitting and merging the clusters of Gaussian components obtained as the output of the baseline algorithm. The algorithm is further improved by introducing model selection in order to obtain the best possible trade-off between recognition accuracy and computational load in a Gaussian selection task applied within an actual recognition system. The proposed method is tested both on artificial data and in the framework of Gaussian selection performed within a real continuous speech recognition system, and in both cases an improvement over the baseline method is observed.
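A merge step needs a distance between Gaussian components to decide which pair of clusters to fuse. The abstract does not state the distance used, so the sketch below shows one common choice, the symmetrized Kullback-Leibler divergence between full-covariance Gaussians, purely as an assumed stand-in for the paper's actual criterion.

```python
import numpy as np

def kl_gauss(m0, S0, m1, S1):
    """KL divergence KL(N(m0,S0) || N(m1,S1)) between full-covariance Gaussians."""
    d = m0.shape[0]
    S1inv = np.linalg.inv(S1)
    diff = m1 - m0
    return 0.5 * (np.trace(S1inv @ S0) + diff @ S1inv @ diff - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def closest_pair(means, covs):
    """Pair of components with the smallest symmetric KL distance,
    i.e. the most natural candidates for a merge step."""
    n = len(means)
    best, pair = np.inf, (0, 1)
    for i in range(n):
        for j in range(i + 1, n):
            dist = (kl_gauss(means[i], covs[i], means[j], covs[j])
                    + kl_gauss(means[j], covs[j], means[i], covs[i]))
            if dist < best:
                best, pair = dist, (i, j)
    return pair
```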


Archive | 2010

Applications of Speech Technologies in Western Balkan Countries

Darko Pekar; Dragiša Mišković; Dragan Knezevic; Nataša Vujnović Sedlar; Milan Sečujski; Vlado Delić

This chapter presents the first applications of speech technologies in the countries of the Western Balkans, launched by the Serbian company AlfaNum. The speech technologies for Serbian and kindred South Slavic languages are developed in cooperation with the University of Novi Sad, Serbia. Most of these applications are rather innovative in the Western Balkans, and they will serve as a base for complex systems enabling the 20 million inhabitants of this part of Europe to talk to the machines around them in their native languages, on an equal footing with their counterparts in the more developed countries of the region. Firstly, the importance of research and development of speech technologies is stressed, particularly in view of their language dependence and, on the other hand, the possibility of their wide application. The central part of the chapter focuses on the results of the research and development of the first applications of automatic speech recognition (ASR) and text-to-speech synthesis (TTS) across the Western Balkans, some of which are a novelty in a much wider region as well. The chapter concludes with directions for future research and development of new applications of speech technologies in the Western Balkan region and worldwide.


Text, Speech and Dialogue | 2008

Energy Normalization in Automatic Speech Recognition

Niksa Jakovljevic; Marko Janev; Darko Pekar; Dragisa Miskovic

In this paper a novel method for energy normalization is presented. The objective of this method is to remove unwanted energy variations caused by different microphone gains, various loudness levels across speakers, as well as changes in a single speaker's loudness level over time. The solution presented here is based on principles used in automatic gain control. The use of this method results in a relative improvement of 26% in the performance of an automatic speech recognition system.
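The abstract does not spell out the exact gain-control scheme, but the automatic-gain-control principle it names can be sketched with a one-pole level tracker: a slowly adapting estimate of the running log-energy level is subtracted from each frame, pulling all speakers and gains toward a common target. The smoothing constant `alpha` and the `target` level are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def normalize_energy(log_energies, target=0.0, alpha=0.95):
    """AGC-style energy normalization.

    Tracks the log-energy level with a one-pole smoother and subtracts it
    from each frame, so the output hovers around `target` regardless of the
    input's absolute level.
    """
    gain = log_energies[0]                     # initialize from the first frame
    out = np.empty_like(log_energies)
    for t, e in enumerate(log_energies):
        gain = alpha * gain + (1 - alpha) * e  # slowly adapting level estimate
        out[t] = e - gain + target
    return out
```

A constant-level input maps exactly onto the target, regardless of its absolute log energy.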


Telecommunications Forum | 2014

On the realization of AnSpeechCollector, system for creating transcribed speech database

Siniša Suzić; Darko Pekar; Vlado Delić

In this paper we present a system for creating a transcribed speech database in a fast and efficient manner. The system consists of a client application running on Android-based mobile phones and a dedicated server. We also present a database of approximately 55 hours of speech created with this system.


International Conference on Speech and Computer | 2017

End-to-End Large Vocabulary Speech Recognition for the Serbian Language

Branislav M. Popovic; Edvin Pakoci; Darko Pekar

This paper presents the results of large vocabulary speech recognition for the Serbian language, developed using the Eesen end-to-end framework. Eesen involves training a single deep recurrent neural network, containing a number of bidirectional long short-term memory layers, that models the connection between the speech and a set of context-independent lexicon units. This approach reduces the amount of expert knowledge needed to develop competitive speech recognition systems. The training is based on connectionist temporal classification, while decoding allows the usage of weighted finite-state transducers. This provides much faster and more efficient decoding in comparison to other similar systems. A corpus of approximately 215 h of audio data (about 171 h of speech and 44 h of silence, from 243 male and 239 female speakers) was employed for training (about 90%) and testing (about 10%). On a test set of more than 120000 words, a word error rate of 14.68% and a character error rate of 3.68% are achieved.
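The collapse rule at the heart of connectionist temporal classification can be illustrated with greedy decoding: take the best label per frame, merge consecutive repeats, then drop blanks. This is only a minimal sketch of the CTC output convention; the system described above decodes with weighted finite-state transducers, not greedily.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Greedy CTC decoding: collapse repeated frame labels, then drop blanks.

    frame_labels: per-frame argmax label ids; `blank` is the CTC blank symbol.
    """
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:   # new non-blank label starts a unit
            out.append(lab)
        prev = lab
    return out
```

Note that a blank between two identical labels keeps them distinct, which is exactly why CTC needs the blank symbol.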


International Conference on Speech and Computer | 2017

Language Model Optimization for a Deep Neural Network Based Speech Recognition System for Serbian

Edvin Pakoci; Branislav M. Popovic; Darko Pekar

This paper presents the results obtained using several variants of trigram language models in a large vocabulary continuous speech recognition (LVCSR) system for the Serbian language, based on the deep neural network (DNN) framework implemented within the Kaldi speech recognition toolkit. This training approach allows parallelization using several threads on either multiple GPUs or multiple CPUs, and provides a natural-gradient modification to the stochastic gradient descent (SGD) optimization method. Acoustic models are trained over a fixed number of training epochs with parameter averaging at the end. This paper discusses recognition using different language models trained with Kneser-Ney or Good-Turing smoothing methods, as well as several pruning parameter values. The results on a test set containing more than 120000 words and different utterance types are explored and compared to the reference results obtained with GMM-HMM speaker-adapted models for the same speech database. Online and offline recognition results are compared to each other as well. Finally, the effect of additional discriminative training using a language model prior to the DNN stage is explored.
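Language model variants like these are typically compared intrinsically by test-set perplexity before being plugged into the recognizer. A minimal perplexity computation is sketched below, assuming the model is exposed as a log2-probability function `logprob(word, history)`; the smoothing itself (Kneser-Ney, Good-Turing) would live inside that function, and the sentence markers are the usual `<s>`/`</s>` convention.

```python
def perplexity(sentences, logprob):
    """Test-set perplexity given a log2-probability function logprob(w, h).

    Histories are trigram-style pairs of preceding tokens; each sentence is
    closed with an end-of-sentence token that is also scored.
    """
    total_logp, n_tokens = 0.0, 0
    for sent in sentences:
        hist = ("<s>", "<s>")
        for w in sent + ["</s>"]:
            total_logp += logprob(w, hist)
            n_tokens += 1
            hist = (hist[1], w)
    return 2.0 ** (-total_logp / n_tokens)
```

As a sanity check, a uniform model over four words (log2 probability of -2 everywhere) has perplexity exactly 4.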


International Symposium on Intelligent Systems and Informatics | 2012

A language model for highly inflective non-agglutinative languages

Stevan Ostrogonac; Dragisa Miskovic; Milan Sečujski; Darko Pekar; Vlado Delić

This paper proposes a method of creating language models for highly inflective non-agglutinative languages. Three types of language models were considered: a common n-gram model, an n-gram model of lemmas, and a class n-gram model. The last two types were specially designed for the Serbian language, reflecting its unique grammatical structure. All the language models were trained on a carefully collected data set incorporating several literary styles and a great variety of domain-specific textual documents in Serbian. Language models of the three types were created for different sets of textual corpora and evaluated by the perplexity values they yield on the test data. A log-linear combination of the common, lemma-based, and class n-gram models shows promising results in overcoming the data sparsity problem. However, the evaluation of this combined model in the context of a large vocabulary continuous speech recognition (LVCSR) system is yet to be done in order to establish the improvement in terms of word error rate (WER).
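The log-linear combination mentioned above raises each component model's probability to an interpolation weight and renormalizes the product over the vocabulary. The sketch below shows only that mechanism; the callable-model interface and the example weights are illustrative assumptions, and in practice the weights would be tuned on held-out data.

```python
import math

def log_linear_prob(models, weights, vocab, word, history):
    """Log-linear LM combination: p(w|h) proportional to
    prod_i p_i(w|h) ** lambda_i, renormalized over the vocabulary
    so the result is a proper distribution."""
    def score(w):
        return math.exp(sum(lam * math.log(m(w, history))
                            for m, lam in zip(models, weights)))
    z = sum(score(w) for w in vocab)   # partition function over the vocabulary
    return score(word) / z
```

Combining two uniform models yields a uniform distribution again, whatever the weights, which is a convenient sanity check.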

Collaboration


Dive into Darko Pekar's collaborations.

Top Co-Authors
Marko Janev

American Academy of Arts and Sciences
