Miran Pobar
University of Rijeka
Publication
Featured research published by Miran Pobar.
international convention on information and communication technology, electronics and microelectronics | 2014
Miran Pobar; Ivo Ipšić
Speaker de-identification is the process by which speech is transformed so that the speaker's identity is masked, while the transformed speech still preserves the acoustic information that contributes to intelligibility, naturalness and clarity. Systems that perform speech de-identification could be used in voice-driven applications (for example in call centres) where the speaker's identity has to be hidden. The paper describes the experiments we have performed in order to de-identify speech using GMM-based voice transformation techniques and speaker identification using freely available tools. We propose a method by which speakers whose speech has not been used to build voice transformations (for training) can be efficiently de-identified online. The proposed method is evaluated using a database of read speech and a small set of speakers. The results show that the proposed de-identification method performs similarly to a closed-set de-identification procedure that requires previous enrolment, and that it can be used efficiently for online speaker de-identification.
Pattern Recognition | 2016
Marina Ivašić-Kos; Miran Pobar; Slobodan Ribaric
Automatic image annotation involves automatically assigning useful keywords to an unlabelled image. The major goal is to bridge the so-called semantic gap between the available image features and the keywords that people might use to annotate images. Although different people will most likely use different words to annotate the same image, most people can use object or scene labels when searching for images. We propose a two-tier annotation model where the first tier corresponds to object-level and the second tier to scene-level annotation. In the first tier, images are annotated with labels of objects present in them, using multi-label classification methods on low-level features extracted from images. Scene-level annotation is performed in the second tier, using originally developed inference-based algorithms for annotation refinement and for scene recognition. These algorithms use a fuzzy knowledge representation scheme based on Fuzzy Petri Nets (KRFPNs), which is defined to enable reasoning with concepts useful for image annotation. To define the elements of the KRFPNs scheme, novel data-driven algorithms for acquisition of fuzzy knowledge are proposed. The proposed image annotation model is evaluated separately on the first and on the second tier using a dataset of outdoor images. The results outperform the published results obtained on the same image collection, on both object-level and scene-level annotation. Different subsets of features composed of dominant colours, image moments, and GIST descriptors, as well as different classification methods (RAKEL, ML-kNN and Naive Bayes), were tested in the first tier. The results of scene-level annotation in the second tier are also compared with a common classification method (Naive Bayes) and have shown superior performance. The proposed model enables image annotation to be expanded with new concepts regardless of their level of abstraction.
Multi-label classification and knowledge-based approach to image annotation. The definition of the fuzzy knowledge representation scheme based on FPN. Novel data-driven algorithms for automatic acquisition of fuzzy knowledge. Novel inference-based algorithms for annotation refinement and scene recognition. A comparison of inference-based scene classification with an ordinary approach.
international convention on information and communication technology, electronics and microelectronics | 2014
Marina Ivašić-Kos; Miran Pobar; Luka Mikec
A person can quickly grasp the genre (drama, comedy, cartoons, etc.) from a movie poster, regardless of visual clutter and the level of detail. Bearing this in mind, it can be assumed that simple properties of a movie poster should play a significant role in automated detection of movie genres. Therefore, low-level features based on colors and edges are extracted from poster images and used for poster classification into genres. In this paper, poster classification is modeled as a multi-label classification task, where a single movie may belong to more than one class (genre). To simplify and solve the multi-label problem, two methods for multi-label data transformation are described and evaluated given the classification results obtained by distance ranking, Naïve Bayes and RAKEL. Experiments are conducted on a set of 1500 posters with 6 movie genres. Results provide insights into the properties of the discussed algorithms and features.
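The abstract does not name the two data transformation methods it evaluates. As a hedged illustration only, the sketch below shows two commonly used transformations that turn multi-label data into single-label data: a copy transformation and a binary-relevance transformation. The function names, poster ids and genres are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

# A sample is (poster id, list of genres). Ids and genres are made up.
Sample = Tuple[str, Sequence[str]]


def copy_transform(samples: Sequence[Sample]) -> List[Tuple[str, str]]:
    """Duplicate each poster once per genre so every copy has one label."""
    return [(poster, genre) for poster, genres in samples for genre in genres]


def binary_relevance(
    samples: Sequence[Sample], genres: Sequence[str]
) -> Dict[str, List[Tuple[str, bool]]]:
    """Build one yes/no dataset per genre, each usable by a binary classifier."""
    return {
        genre: [(poster, genre in labels) for poster, labels in samples]
        for genre in genres
    }


posters = [("p1", ["drama", "comedy"]), ("p2", ["action"])]
single_label = copy_transform(posters)
per_genre = binary_relevance(posters, ["action", "comedy", "drama"])
```

Either output can be passed to an ordinary single-label classifier. The copy transformation enlarges the training set, while binary relevance trains one model per genre and ignores correlations between genres.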
text speech and dialogue | 2012
Tadej Justin; Miran Pobar; Ivo Ipšić; Janez Žibert
In this paper we investigate bilingual HMM-based speech synthesis developed for the Slovenian and Croatian languages. The primary goals of this research are to investigate the performance of an HMM-based synthesis built from two similar languages and to compare such a synthesis system with standard monolingual speaker-dependent HMM-based synthesis. The bilingual HMM synthesis is built by joining all the speech material from both languages, by defining a proper mapping of Slovenian and Croatian phonemes, and by adapting acoustic models of Slovenian and Croatian speakers. The adapted acoustic models then serve as basic building blocks for speech synthesis in both languages. In this way we are able to obtain synthesized speech in both languages, but with the same speaker voice. We make a quantitative comparison of this kind of synthesis with its monolingual counterparts and study the performance of the synthesis in relation to the amount of data used for building the synthesis system.
text speech and dialogue | 2013
Miran Pobar; Tadej Justin; Janez Žibert; Ivo Ipšić
We compare the performance of two approaches that use cross-lingual data from different speakers to build bilingual speech synthesis systems capable of producing speech with the same speaker identity. One approach treats data from both languages as monolingual, by labeling all data with a manually joined phoneme set. A speaker-independent voice is trained using the joined data and adapted to the target speaker using CMLLR adaptation.
Advances in intelligent systems and computing | 2014
Marina Ivašić-Kos; Miran Pobar; Ivo Ipšić
A person can quickly grasp the movie genre (drama, comedy, cartoons, etc.) from a poster, regardless of short observation time, clutter and variety of details. Bearing this in mind, it can be assumed that simple properties of a movie poster should play a significant role in automated detection of movie genres. Therefore, visual features based on colors and structural cues are extracted from poster images and used for poster classification into genres.
computer analysis of images and patterns | 2017
Miran Pobar; Marina Ivašić-Kos
Classification of movies into genres from the accompanying promotional materials such as posters is a typical multi-label classification problem. Posters usually highlight a movie scene or characters, and at the same time should inform about the genre or the plot of the movie to attract the potential audience, so our assumption was that the relevant information can be captured in visual features.
international convention on information and communication technology electronics and microelectronics | 2017
M. Buric; Miran Pobar; M. Ivasic Kos
Action recognition in videos is currently a focus of scientific research due to improvements in automatic analysis of static images and greater availability of processing power. The paper provides an overview of the key models and methods for action recognition, comprising human models and methods based on estimation of joint trajectories, silhouettes and template matching, and spatio-temporal local descriptors. To deal with compound actions and activities, action semantic models are proposed with the help of expert knowledge. Since the action recognition task is domain dependent, the methods and models are built and tested on domain-specific databases. The paper provides an overview and description of recent video datasets that were created for developing action recognition methods, with an emphasis on datasets with additional modalities such as depth images or accelerometer data.
international conference on artificial intelligence | 2017
Marina Ivašić-Kos; Miran Pobar
Movies can belong to more than one genre, so the problem of determining the genres of a movie from its poster is a multi-label classification problem. To solve the multi-label problem, we have used the RAKEL ensemble method along with three typical single-label base classification methods: Naive Bayes, C4.5 decision tree, and k-NN. The RAKEL method strives to overcome the problem of computational cost and power set label explosion by breaking the initial set of labels into several small-sized label sets.
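The label-splitting idea can be sketched as follows. This is a minimal illustration of the disjoint variant of random k-labelsets, not the authors' implementation: the label set is shuffled and cut into groups of size k, and each sample's genres are projected onto every group so that each projection becomes one class for a single-label base classifier. The genre names, `k=3` and the function names are assumptions made for the example.

```python
import random
from typing import FrozenSet, Iterable, List, Sequence


def random_labelsets(labels: Sequence[str], k: int, seed: int = 0) -> List[List[str]]:
    """Shuffle the labels and cut them into disjoint groups of at most k labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return [shuffled[i:i + k] for i in range(0, len(shuffled), k)]


def powerset_class(sample_labels: Iterable[str], labelset: Iterable[str]) -> FrozenSet[str]:
    """Project a sample's labels onto one labelset; the subset acts as one class."""
    return frozenset(sample_labels) & frozenset(labelset)


# Hypothetical genre list, used only for illustration.
genres = ["action", "animation", "comedy", "drama", "horror", "romance"]
labelsets = random_labelsets(genres, k=3)

# One single-label target per labelset for a movie tagged comedy and romance.
targets = [powerset_class({"comedy", "romance"}, ls) for ls in labelsets]
```

With six labels, a single label-powerset classifier would face up to 2^6 = 64 classes, whereas each labelset of size 3 yields at most 2^3 = 8. At prediction time the outputs of the per-labelset classifiers are combined back into one multi-label prediction.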
international convention on information and communication technology electronics and microelectronics | 2016
Miran Pobar; Marina Ivašić-Kos
Automatic image annotation methods automatically assign labels to images in order to facilitate tasks such as image retrieval, search, organization and management. Incorrect labels may negatively influence the search results, so image annotation should be as accurate as possible. Labels pertaining to objects or to whole scenes are commonly used for image annotation, and precision is especially important when scene labels are inferred from objects, as errors in the object labels may propagate to the scene level. One way to improve the annotation precision is to detect and discard automatically assigned object labels that do not fit the context of the other detected objects. This procedure is referred to as annotation refinement. Here, an approach to detecting likely incorrect labels, based on the context of other labels and prior knowledge about the mutual occurrence of various objects in images, is tested.
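One simple way to realize this kind of context check is sketched below, under stated assumptions. The abstract does not specify the scoring rule, so the mean conditional co-occurrence frequency, the 0.25 threshold, the toy training annotations and the function names are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Tuple


def cooccurrence(annotations: Sequence[Sequence[str]]) -> Tuple[Counter, Counter]:
    """Count how often each label and each unordered label pair occurs."""
    label_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for labels in annotations:
        unique = sorted(set(labels))
        label_counts.update(unique)
        pair_counts.update(frozenset(pair) for pair in combinations(unique, 2))
    return label_counts, pair_counts


def refine(labels: Sequence[str], label_counts: Counter, pair_counts: Counter,
           threshold: float = 0.25) -> List[str]:
    """Keep a label only if it co-occurs often enough with the other labels."""
    kept = []
    for a in labels:
        others = [b for b in labels if b != a]
        if not others:  # a single label has no context to contradict it
            kept.append(a)
            continue
        total = 0.0
        for b in others:
            if label_counts[b]:  # skip context labels never seen in training
                total += pair_counts[frozenset((a, b))] / label_counts[b]
        if total / len(others) >= threshold:
            kept.append(a)
    return kept


# Toy prior knowledge: object labels of four hypothetical training images.
training = [["sky", "grass", "cow"], ["sky", "grass", "tree"],
            ["sea", "sky", "boat"], ["sea", "boat"]]
label_counts, pair_counts = cooccurrence(training)
refined = refine(["sky", "grass", "boat"], label_counts, pair_counts)
```

In this toy setting "boat" scores lowest against "sky" and "grass" and is discarded, while the other two labels are kept. A real system would estimate the co-occurrence statistics from a large annotated collection and tune the threshold on held-out data.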