Vlado Delić | Researchain

Publication

Featured researches published by Vlado Delić.

Applied Intelligence | 2010

Eigenvalues Driven Gaussian Selection in continuous speech recognition using HMMs with full covariance matrices

Marko Janev; Darko Pekar; Niksa Jakovljevic; Vlado Delić

In this paper, a novel algorithm for Gaussian Selection (GS) of mixtures used in a continuous speech recognition system is presented. The system is based on hidden Markov models (HMM), using Gaussian mixtures with full covariance matrices as output distributions. The purpose of Gaussian selection is to increase the speed of a speech recognition system without degrading the recognition accuracy. The basic idea is to form hyper-mixtures by clustering close mixtures into a single group by means of Vector Quantization (VQ) and assigning it unique Gaussian parameters for estimation. In the decoding process, only those hyper-mixtures which are above a designated threshold are selected, and only mixtures belonging to them are evaluated, improving computational efficiency. There is no problem with the clustering and evaluation if overlaps between the mixtures are small and their variances are of the same range. However, in real cases, numerous models do not fit this profile. The Gaussian selection scheme proposed in this paper addresses this problem. For that purpose, besides the clustering algorithm, it also incorporates an algorithm for mixture grouping. Each mixture is assigned to a group from a predefined set of groups, based on a value aggregated from the eigenvalues of the covariance matrix of that mixture using Ordered Weighted Averaging (OWA) operators. After the grouping of mixtures is carried out, Gaussian mixture clustering is performed on each group separately.
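The grouping step described in this abstract can be illustrated with a minimal sketch. It assumes the eigenvalues of each covariance matrix have already been computed; the OWA weights, the group boundaries, and the function names `owa` and `assign_group` are hypothetical placeholders for illustration, not values or names taken from the paper.

```python
from bisect import bisect_right

def owa(values, weights):
    """Ordered Weighted Averaging: weights apply to values sorted in descending order."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))

def assign_group(eigenvalues, weights, boundaries):
    """Return the index of the group whose range contains the OWA aggregate.

    boundaries is an ascending list of assumed thresholds; k thresholds define k + 1 groups.
    """
    return bisect_right(boundaries, owa(eigenvalues, weights))

# Hypothetical example: three 3-dimensional mixtures described by their eigenvalues.
weights = [0.5, 0.3, 0.2]   # assumed weights that emphasise the largest eigenvalue
boundaries = [1.0, 5.0]     # assumed thresholds, giving groups 0, 1 and 2
mixtures = {
    "narrow": [0.2, 0.1, 0.4],
    "medium": [2.0, 1.0, 3.0],
    "wide": [9.0, 12.0, 4.0],
}
groups = {name: assign_group(ev, weights, boundaries) for name, ev in mixtures.items()}
```

With these assumed settings, mixtures with very different spreads land in different groups, so a clustering pass could then run on each group separately, as the abstract describes.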

Text, Speech and Dialogue | 2002

AlfaNum System for Speech Synthesis in Serbian Language

Milan Sečujski; Radovan Obradovic; Darko Pekar; Ljubomir Jovanov; Vlado Delić

This paper presents some basic criteria for the design of a concatenative text-to-speech synthesizer for the Serbian language. The paper describes the prosody generator which was used and reflects upon several peculiarities of the Serbian language which led to its adoption. The results of an experiment showing the influence of natural-sounding prosody on human speech recognition are also discussed. The paper also describes criteria for on-line selection of appropriate segments from a large speech corpus, as well as criteria for off-line preparation of the speech database for synthesis.

International Symposium on Intelligent Systems and Informatics | 2012

Adaptive multimodal interaction with industrial robot

Milan Gnjatović; Jovica Tasevski; Milutin Nikolić; Dragiša Mišković; Branislav Borovac; Vlado Delić

This paper reports on a spoken natural language dialogue system that manages the interaction between the user and the industrial robot ABB IRB 140. To the extent that the dialogue system is multimodal, it uses three communication modalities: (i) spoken language (automatic speech recognition and text-to-speech synthesis), (ii) visual recognition of the figures and determination of their positions, and (iii) typed text. To the extent that the dialogue system is adaptive, it takes the verbal and spatial contexts into account in order to adapt its dialogue behavior and to process spontaneously formulated user commands of different syntactic forms without explicit syntactic expectations. The industrial robot is slightly modified so that it can manipulate graphical figures, following the instructions of the dialogue system.

Knowledge Based Systems | 2014

Cognitively-inspired representational approach to meaning in machine dialogue

Milan Gnjatović; Vlado Delić

One of the most fundamental research questions in the field of human–machine interaction is how to enable dialogue systems to capture the meaning of spontaneously produced linguistic inputs without explicit syntactic expectations. This paper introduces a cognitively-inspired representational model intended to address this research question. To the extent that this model is cognitively-inspired, it integrates insights from behavioral and neuroimaging studies on working memory operations and language-impaired patients (i.e., Broca's aphasics). The level of detail contained in the specification of the model is sufficient for a computational implementation, while the level of abstraction is sufficient to enable generalization of the model over different interaction domains. Finally, the paper reports on a domain-independent framework for end-user programming of adaptive dialogue management modules.

Applied Intelligence | 2012

A novel split-and-merge algorithm for hierarchical clustering of Gaussian mixture models

Branislav M. Popovic; Marko Janev; Darko Pekar; Niksa Jakovljevic; Milan Gnjatović; Milan Sečujski; Vlado Delić

The paper presents a novel split-and-merge algorithm for hierarchical clustering of Gaussian mixture models, which tends to improve on the locally optimal solution determined by the initial constellation. It is initialized with locally optimal parameters obtained by a baseline approach similar to k-means, and it tends to approach the global optimum of the target clustering function more closely by iteratively splitting and merging the clusters of Gaussian components obtained as the output of the baseline algorithm. The algorithm is further improved by introducing model selection in order to obtain the best possible trade-off between recognition accuracy and computational load in a Gaussian selection task applied within an actual recognition system. The proposed method is tested both on artificial data and in the framework of Gaussian selection performed within a real continuous speech recognition system, and in both cases an improvement over the baseline method has been observed.

International Symposium on Intelligent Signal Processing and Communication Systems | 2011

Improvement of Thai speech emotion recognition by using face feature analysis

Igor Stankovic; Montri Karnjanadecha; Vlado Delić

The fact that emotions are not usually manifested in the Thai language, mostly because any emotion would otherwise interfere with meaning, makes this language very difficult for any kind of emotion recognition. Our proposed Thai emotion recognition system consists of two parts: speech emotion recognition and improvement of the system using face feature analysis. For this purpose, an audiovisual Thai emotion database was recorded. Speech emotion recognition is based on calculating fundamental frequency, zero crossing rate and energy from short-time wavelet signals, and achieves an accuracy of 97.8%. Our current research activities are directed at improving the accuracy of the overall system using face feature analysis, thereby showing that vision is as crucial as hearing for expressing and recognizing any emotion.
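Two of the features named in this abstract, zero crossing rate and short-time energy, can be sketched as follows. This is an illustration under stated assumptions only: it works on plain time-domain frames rather than the wavelet signals used in the paper, it omits fundamental frequency, and the frame length, hop size, test signal and function names are hypothetical.

```python
import math

def frames(signal, frame_len, hop):
    """Split a sequence of samples into overlapping frames of equal length."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

# Hypothetical test signal: a 100 Hz sine sampled at 8 kHz for 0.1 s.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(800)]

# One (zero crossing rate, energy) pair per 200-sample frame, with a 100-sample hop.
features = [(zero_crossing_rate(f), short_time_energy(f)) for f in frames(signal, 200, 100)]
```

Each frame yields one feature pair, and a classifier would then operate on the sequence of such pairs.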

Symposium on Neural Network Applications in Electrical Engineering | 2012

Autonomic telemedical application for Android based mobile devices

Stevan Jokić; Srđan Krčo; Dejan Sakač; Ivan Jokić; Vlado Delić

In this paper, a mobile telemedicine application implemented for Android-based devices is presented. The application's main functionality, ECG transmission, is extended by real-time ECG analysis, as well as real-time analysis of acceleration data captured by the embedded acceleration sensor. Efficient algorithms for ECG and acceleration data analysis are presented. The ECG analysis is focused on the detection of arrhythmic heartbeats and pathological ST-T segments. Arrhythmic heartbeat detection is performed on the estimated ECG model features using Artificial Neural Networks (ANN). Alarms can be defined in the mobile application; when triggered, they can send e-mail messages with attached ECG images and Excel-formatted data reports. Data from the acceleration sensor are analyzed to monitor the user's walking activity. The mobile application is integrated into the existing telemedical system using predefined interfaces, but it also provides high autonomy to end users with or without medical knowledge.

Symposium on Neural Network Applications in Electrical Engineering | 2012

Application of neural networks in emotional speech recognition

Milana Bojanić; Vladimir S. Crnojevic; Vlado Delić

Emotional speech recognition (ESR) from the aspect of human-computer interaction (HCI) is a prerequisite for the framework of interacting partners within HCI. This paper addresses the application of neural networks (NN) in ESR. The performance of the NN is tested using three different feature sets which form the basis for ESR: prosodic features, spectral features, and a combination of the two. The results for these feature sets are compared using several network topologies and two training algorithms. It is shown that using the joint prosodic-spectral feature set as input to a three-layer feed-forward NN trained with the back-propagation algorithm gives the best performance in a 5-class emotional speech recognition task.

International Conference on Speech and Computer | 2015

Deep Neural Network Based Continuous Speech Recognition for Serbian Using the Kaldi Toolkit

Branislav M. Popovic; Stevan Ostrogonac; Edvin Pakoci; Niksa Jakovljevic; Vlado Delić

This paper presents a deep neural network (DNN) based large vocabulary continuous speech recognition (LVCSR) system for Serbian, developed using the open-source Kaldi speech recognition toolkit. The DNNs are initialized using stacked restricted Boltzmann machines (RBMs) and trained using cross-entropy as the objective function and the standard error backpropagation procedure in order to provide posterior probability estimates for the hidden Markov model (HMM) states. Emission densities of HMM states are represented as Gaussian mixture models (GMMs). The recipes were modified based on the particularities of the Serbian language in order to achieve the optimal results. A corpus of approximately 90 hours of speech (21000 utterances) is used for the training. The performances are compared for two different sets of utterances between the baseline GMM-HMM algorithm and various DNN settings.
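The cross-entropy objective mentioned in this abstract can be sketched for a toy case. This is a generic illustration, not Kaldi code: the softmax output layer, the frame-level state alignments, the toy values and the function names are all assumptions made for the example.

```python
import math

def softmax(logits):
    """Convert network outputs to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits_per_frame, target_states):
    """Average negative log posterior of the aligned HMM state over all frames."""
    loss = 0.0
    for logits, target in zip(logits_per_frame, target_states):
        loss -= math.log(softmax(logits)[target])
    return loss / len(target_states)

# Hypothetical toy data: 2 frames, 3 HMM states.
logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
targets = [0, 2]   # assumed index of the aligned HMM state for each frame
loss = cross_entropy(logits, targets)
```

Minimising this quantity by backpropagation pushes the network's posterior for the aligned state of each frame towards 1.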

Archive | 2010

Applications of Speech Technologies in Western Balkan Countries

Darko Pekar; Dragiša Mišković; Dragan Knezevic; Nataša Vujnović Sedlar; Milan Sečujski; Vlado Delić

The chapter will present the first applications of speech technologies in the countries of the Western Balkans, launched by the Serbian company AlfaNum. The speech technologies for Serbian and kindred South Slavic languages are developed in cooperation with the University of Novi Sad, Serbia. Most of these applications are rather innovative in the Western Balkans, and they will serve as a base for complex systems which will enable the 20 million inhabitants of this part of Europe to talk to machines in their midst in their native languages, just like their counterparts who live in more developed countries in the region. Firstly, the importance of research and development of speech technologies will be stressed, particularly in view of their language dependence and, on the other hand, the possibility of their wide application. The central part of the chapter will focus on the results of the research and development of the first applications of automatic speech recognition (ASR) and text-to-speech synthesis (TTS) across the Western Balkans; some of them are a novelty in a much wider region as well. The chapter will conclude with the directions of future research and development of new applications of speech technologies in the Western Balkan region and worldwide.
