Milan Sečujski
University of Novi Sad
Publications
Featured research published by Milan Sečujski.
text speech and dialogue | 2002
Milan Sečujski; Radovan Obradovic; Darko Pekar; Ljubomir Jovanov; Vlado Delić
This paper presents some basic criteria for the design of a concatenative text-to-speech synthesizer for the Serbian language. The paper describes the prosody generator which was used and reflects upon several peculiarities of the Serbian language which led to its adoption. The results of an experiment showing the influence of natural-sounding prosody on human speech recognition are also discussed. The paper further describes criteria for on-line selection of appropriate segments from a large speech corpus, as well as criteria for off-line preparation of the speech database for synthesis.
Applied Intelligence | 2012
Branislav M. Popovic; Marko Janev; Darko Pekar; Niksa Jakovljevic; Milan Gnjatović; Milan Sečujski; Vlado Delić
The paper presents a novel split-and-merge algorithm for hierarchical clustering of Gaussian mixture models, which tends to improve on the locally optimal solution determined by the initial constellation. It is initialized with locally optimal parameters obtained by a baseline approach similar to k-means, and it tends to approach the global optimum of the target clustering function more closely by iteratively splitting and merging the clusters of Gaussian components obtained as the output of the baseline algorithm. The algorithm is further improved by introducing model selection in order to obtain the best possible trade-off between recognition accuracy and computational load in a Gaussian selection task applied within an actual recognition system. The proposed method is tested both on artificial data and in the framework of Gaussian selection performed within a real continuous speech recognition system, and in both cases an improvement over the baseline method has been observed.
Archive | 2010
Darko Pekar; Dragiša Mišković; Dragan Knezevic; Nataša Vujnović Sedlar; Milan Sečujski; Vlado Delić
The chapter will present the first applications of speech technologies in the countries of the Western Balkans, launched by the Serbian company AlfaNum. The speech technologies for Serbian and kindred South Slavic languages are developed in cooperation with the University of Novi Sad, Serbia. Most of these applications are rather innovative in the Western Balkans and they will serve as a base for complex systems which will enable the 20 million inhabitants of this part of Europe to talk to machines in their native languages, just like their counterparts who live in more developed countries in the region. Firstly, the importance of research and development of speech technologies will be stressed, particularly in view of their language dependence and, on the other hand, the possibility of their wide application. The central part of the chapter will focus on the results of the research and development of the first applications of automatic speech recognition (ASR) and text-to-speech synthesis (TTS) across the Western Balkans, some of which are a novelty in a much wider region as well. The chapter will conclude with the directions of future research and development of new applications of speech technologies in the Western Balkan region and worldwide.
international symposium on intelligent systems and informatics | 2013
Milana Bojanić; Milan Gnjatović; Milan Sečujski; Vlado Delić
This paper reports on the application of the dimensional emotion model in automatic emotional speech recognition. Using the perceptron rule in combination with acoustic features, an approach to speech-based emotion recognition is introduced, which can classify the utterance with respect to the valence-arousal (V-A) dimensions of its emotional content. The mapping of 5 discrete emotion classes onto the 3-class emotional clusters in the V-A space was adopted. Two corpora of acted emotional speech were used to compare recognition results: the Berlin Emotional Speech Database (in German) and the Corpus of Emotional and Attitude Expressive Speech (in Serbian). The experimental results show that the discrimination of emotional speech along the arousal dimension is better than the discrimination along the valence dimension for both corpora.
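As a rough illustration of the perceptron rule mentioned in the abstract above, the sketch below trains a binary perceptron on toy data. It is not the authors' system: the two features, their values, and the reduction to a single high/low arousal decision are assumptions made for the example, whereas the paper works with 3-class clusters in the V-A space and real acoustic features.

```python
def train_perceptron(samples, labels, epochs=100, lr=1.0):
    """Train a binary perceptron. Labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            # Perceptron rule: update only on misclassified samples.
            if y * score <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # converged on the training set
            break
    return w, b


def predict(w, b, x):
    """Return +1 (high arousal) or -1 (low arousal)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical, hand-made features (e.g. normalized pitch and energy).
features = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.95],
            [0.1, 0.2], [0.2, 0.1], [0.15, 0.3]]
arousal = [1, 1, 1, -1, -1, -1]

w, b = train_perceptron(features, arousal)
predictions = [predict(w, b, x) for x in features]
```

Because the toy data are linearly separable, the update loop stops once every training sample is classified correctly.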
ieee eurocon | 2009
Niksa Jakovljevic; Milan Sečujski; Vlado Delić
This paper presents the performance of automatic speech recognition systems that use vocal tract length normalization (VTN). Besides the standard procedure for VTN coefficient estimation, several variants based on robust statistical methods are introduced. All systems that use VTN performed better than the reference systems, while the best performance was achieved by the system in which the VTN coefficient for a particular speaker is chosen as the one with the maximum sample mean of likelihoods per phoneme. Phoneme likelihoods are calculated as sample medians of feature vectors corresponding to particular phonemes. The relative improvement in performance for this system is about 20%.
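One possible reading of the selection rule described above can be sketched as follows: for each candidate VTN coefficient, take the median of the frame-level scores of every phoneme, average those medians, and keep the coefficient with the highest average. The data layout, the function name `select_warp_factor`, and all numbers are assumptions for illustration; the abstract does not specify them.

```python
from statistics import mean, median


def select_warp_factor(scores_by_warp):
    """Pick the warp factor with the highest mean of per-phoneme medians.

    scores_by_warp maps warp factor -> phoneme -> list of frame
    log-likelihoods (a hypothetical layout chosen for this sketch).
    """
    def speaker_score(phoneme_scores):
        # The median per phoneme is robust to a few badly scored frames.
        return mean([median(frames) for frames in phoneme_scores.values()])

    return max(scores_by_warp, key=lambda a: speaker_score(scores_by_warp[a]))


# Made-up scores for one speaker, including two outlier frames.
scores = {
    0.9: {"a": [-5.0, -4.8, -30.0], "s": [-6.0, -6.2, -5.9]},
    1.0: {"a": [-4.0, -4.1, -3.9], "s": [-5.0, -5.5, -25.0]},
    1.1: {"a": [-6.0, -6.1, -5.8], "s": [-7.0, -7.2, -6.9]},
}

best = select_warp_factor(scores)
```

Using the median inside each phoneme means the outlier frames (-30.0 and -25.0) do not dominate the choice, which is the motivation for a robust statistic here.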
The Scientific World Journal | 2015
Branko Lučić; Stevan Ostrogonac; Nataša Vujnović Sedlar; Milan Sečujski
The inclusion of persons with disabilities has always represented an important issue. Advancements within the field of computer science have enabled the development of different types of aids, which have significantly improved the quality of life of the disabled. However, for some disabilities, such as visual impairment, the purpose of these aids is to establish an alternative communication channel and thus overcome the user's disability. Speech technologies play a crucial role in this process. This paper presents the ongoing efforts to create a set of educational applications based on speech technologies for Serbian for the early stages of education of blind and partially sighted children. Two educational applications dealing with memory exercises and comprehension of geometrical shapes are presented, along with the initial test results obtained from research including visually impaired pupils.
international symposium on intelligent systems and informatics | 2012
Stevan Ostrogonac; Dragisa Miskovic; Milan Sečujski; Darko Pekar; Vlado Delić
This paper proposes a method of creating language models for highly inflective non-agglutinative languages. Three types of language models were considered: a common n-gram model, an n-gram model of lemmas and a class n-gram model. The last two types were specially designed for the Serbian language, reflecting its unique grammatical structure. All the language models were trained on a carefully collected data set incorporating several literary styles and a great variety of domain-specific textual documents in Serbian. Language models of the three types were created for different sets of textual corpora and evaluated by their perplexity on the test data. A log-linear combination of the common, lemma-based and class n-gram models was also created and shows promising results in overcoming the data sparsity problem. However, the evaluation of this combined model in the context of a large vocabulary continuous speech recognition (LVCSR) system is yet to be done in order to establish the improvement in terms of word error rate (WER).
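The two quantities named in the abstract above, perplexity and a log-linear model combination, can be sketched in a few lines. This is a minimal illustration, assuming each model already yields a next-word distribution over the same vocabulary; the toy probabilities and weights are invented, and mapping lemma or class predictions back to word probabilities is not covered.

```python
import math


def perplexity(word_probs):
    """Perplexity given the model probability of each word in a test text."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


def log_linear_combine(distributions, weights):
    """Combine distributions as p(w) proportional to prod_i p_i(w) ** weight_i.

    All distributions must cover the same vocabulary with non-zero
    probabilities; the result is renormalized to sum to one.
    """
    vocab = distributions[0].keys()
    unnormalized = {
        w: math.exp(sum(lam * math.log(d[w])
                        for d, lam in zip(distributions, weights)))
        for w in vocab
    }
    z = sum(unnormalized.values())
    return {w: v / z for w, v in unnormalized.items()}


# Invented next-word distributions from three hypothetical models.
word_model = {"kuća": 0.6, "kuće": 0.3, "kući": 0.1}
lemma_model = {"kuća": 0.4, "kuće": 0.4, "kući": 0.2}
class_model = {"kuća": 0.5, "kuće": 0.25, "kući": 0.25}

combined = log_linear_combine([word_model, lemma_model, class_model],
                              [0.5, 0.3, 0.2])
uniform_ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```

A model that assigns probability 0.25 to every word of the test text has perplexity 4, which matches the intuition of choosing uniformly among four alternatives.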
international conference on speech and computer | 2016
Milan Sečujski; Branislav Gerazov; Tamás Gábor Csapó; Vlado Delić; Philip N. Garner; Aleksandar Gjoreski; David Guennec; Zoran A. Ivanovski; Aleksandar Melov; Géza Németh; Ana Stojkovic; György Szaszák
Since the prosody of a spoken utterance carries information about its discourse function, salience, and speaker attitude, prosody models and prosody generation modules have played a crucial part in text-to-speech (TTS) synthesis systems from the beginning, especially those set not only on sounding natural, but also on showing emotion or particular speaker intention. Prosody transfer within speech-to-speech translation is a recent research area with increasing importance, with one of its most important research topics being the detection and treatment of salient events, i.e. instances of prominence or focus which do not result from syntactic constraints, but are rather products of semantic or pragmatic level effects. This paper presents the design and the guidelines for the creation of a multilingual speech corpus containing prosodically rich sentences, ultimately aimed at training statistical prosody models for multilingual prosody transfer in the context of expressive speech synthesis.
international conference on speech and computer | 2016
Tijana Delic; Branislav Gerazov; Branislav M. Popovic; Milan Sečujski
One of the most recently proposed techniques for modeling the prosody of an utterance is the decomposition of its pitch, duration and/or energy contour into physiologically motivated units called atoms, based on matching pursuit. Since this model is based on the physiology of the production of sentence intonation, it is essentially language independent. However, the intonation of an utterance in a particular language is obviously under the influence of factors of a predominantly linguistic nature. In this research, restricted to the case of American English with prosody annotated using standard ToBI conventions, we have shown that, under certain mild constraints, the positive and negative atoms identified in the pitch contour coincide very well with high and low pitch accents and phrase accents of ToBI. By giving a linguistic interpretation of the atom decomposition model, this research enables its practical use in domains such as speech synthesis or cross-lingual prosody transfer.
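For readers unfamiliar with matching pursuit, the generic greedy algorithm underlying the decomposition mentioned above can be sketched as below. The rectangular toy atoms and the synthetic contour are assumptions for illustration; the physiologically motivated atom shapes used in the research are not reproduced here.

```python
import math


def matching_pursuit(signal, atoms, n_iter):
    """Greedy matching pursuit over a dictionary of unit-norm atoms.

    Returns the list of (atom index, amplitude) picks and the residual.
    """
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        best_k, best_c = None, 0.0
        for k, atom in enumerate(atoms):
            # Correlation of the residual with this atom.
            c = sum(r * a for r, a in zip(residual, atom))
            if abs(c) > abs(best_c):
                best_k, best_c = k, c
        if best_k is None:  # residual is orthogonal to every atom
            break
        picks.append((best_k, best_c))
        residual = [r - best_c * a for r, a in zip(residual, atoms[best_k])]
    return picks, residual


# Three non-overlapping, unit-norm toy atoms (hypothetical shapes).
s = 1.0 / math.sqrt(2.0)
atoms = [
    [s, s, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, s, s, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, s, s],
]

# Synthetic contour: one positive and one negative atom.
contour = [2.0 * a0 - 1.0 * a2 for a0, a2 in zip(atoms[0], atoms[2])]

picks, residual = matching_pursuit(contour, atoms, n_iter=2)
```

The sign of each recovered amplitude distinguishes positive from negative atoms, which is the property the research relates to high and low ToBI accents.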
New Journal of Physics | 2016
Norbert Cselyuszka; Milan Sečujski; Nader Engheta; Vesna Crnojevic-Bengin
Conventional approaches to the control of acoustic waves propagating along boundaries between fluids and hard grooved surfaces are limited to the manipulation of surface geometry. Here we demonstrate for the first time, through theoretical analysis, numerical simulation as well as experimentally, that the velocity of acoustic surface waves, and consequently the direction of their propagation as well as the shape of their wave fronts, can be controlled by varying the temperature distribution over the surface. This significantly increases the versatility of applications such as sound trapping, acoustic spectral analysis and acoustic focusing, by providing a simple mechanism for modifying their behavior without any change in the geometry of the system. We further discuss that the dependence between the behavior of acoustic surface waves and the temperature of the fluid can be exploited conversely as well, which opens a way for potential application in the domain of temperature sensing.
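The temperature dependence exploited above ultimately rests on the fact that the speed of sound in a gas changes with temperature. The sketch below evaluates the standard ideal-gas formula c = sqrt(γRT/M) for dry air; it gives only the bulk sound speed, not the surface-wave velocity derived in the paper, and the constants for air are textbook values rather than anything taken from the article.

```python
import math

GAMMA_AIR = 1.4            # adiabatic index of dry air
R_GAS = 8.314462618        # molar gas constant, J/(mol K)
MOLAR_MASS_AIR = 0.0289647  # kg/mol


def speed_of_sound_air(temp_celsius):
    """Ideal-gas speed of sound in dry air, in m/s."""
    temp_kelvin = temp_celsius + 273.15
    return math.sqrt(GAMMA_AIR * R_GAS * temp_kelvin / MOLAR_MASS_AIR)


c_20 = speed_of_sound_air(20.0)
c_40 = speed_of_sound_air(40.0)
```

Because the speed grows with the square root of absolute temperature, a spatially varying temperature acts like a spatially varying refractive index for sound.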