Igor S. Pandžić
University of Zagreb
Publications
Featured research published by Igor S. Pandžić.
Signal Processing | 2006
Goranka Zoric; Igor S. Pandžić
Lip synchronization is a method for determining mouth and tongue motion during speech. It is widely used in multimedia production, and real-time implementation opens up application possibilities in multimodal interfaces. We present an implementation of real-time, language-independent lip synchronization based on the classification of the speech signal, represented by MFCC vectors, into visemes using neural networks (NNs). Our implementation improves real-time lip synchronization by using a genetic algorithm to obtain a near-optimal NN topology. The automatic NN configuration with genetic algorithms eliminates the need for tedious manual NN design by trial and error and considerably improves the viseme classification results. Moreover, by directly using visemes as the basic unit of classification, computational overhead is reduced, since only visemes are used for the animation of the face. The results were obtained in a comprehensive validation of the system using three different evaluation methods, two objective and one subjective. They indicate very good lip synchronization quality under real-time conditions and for different languages, making the method suitable for a wide range of applications.
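To make the pipeline concrete, here is a minimal sketch of the classification core, assuming scikit-learn and using synthetic data in place of real MFCC vectors; the hidden layer size is fixed by hand here, whereas the paper searches the topology with a genetic algorithm. All names and numbers are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

N_MFCC = 12      # coefficients per audio frame (a typical choice)
N_VISEMES = 15   # number of viseme classes (assumed; varies by scheme)

# Placeholder training data: in a real system these would be MFCC vectors
# extracted from annotated speech, with one viseme label per frame.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(2000, N_MFCC))
y_train = rng.integers(0, N_VISEMES, size=2000)

# One hidden layer, sized by hand; the paper's genetic algorithm would
# search for a near-optimal topology instead.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300)
clf.fit(X_train, y_train)

# At runtime, each incoming audio frame's MFCC vector is mapped to a
# viseme, which directly drives the mouth shape of the animated face.
frame = rng.normal(size=(1, N_MFCC))
print("predicted viseme:", clf.predict(frame)[0])
```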
Multimedia Tools and Applications | 2011
Aleksandra Cerekovic; Igor S. Pandžić
Applications with intelligent conversational virtual humans, called Embodied Conversational Agents (ECAs), seek to bring human-like abilities to machines and establish natural human-computer interaction. In this paper we discuss the realization of ECA multimodal behaviors, which include speech and nonverbal behaviors. We present RealActor, an open-source, multi-platform animation system for real-time multimodal behavior realization for ECAs. The system employs a novel solution for synchronizing gestures and speech using neural networks, as well as an adaptive face animation model based on the Facial Action Coding System (FACS) for synthesizing facial expressions. Our aim is to provide a generic animation system that helps researchers create believable and expressive ECAs.
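As a rough illustration of FACS-based synthesis, facial expressions can be formed by blending per-Action-Unit morph targets onto a neutral face. This is only a generic blendshape sketch, not RealActor's adaptive model, and the mesh, targets, and AU choices below are invented.

```python
import numpy as np

N_VERTICES = 4                        # tiny stand-in face mesh
neutral = np.zeros((N_VERTICES, 3))   # neutral vertex positions

# Hypothetical per-Action-Unit vertex offsets (morph targets).
morph_targets = {
    "AU12": np.tile([0.0, 0.10, 0.0], (N_VERTICES, 1)),   # lip corner puller
    "AU4":  np.tile([0.0, -0.05, 0.0], (N_VERTICES, 1)),  # brow lowerer
}

def synthesize(au_weights):
    """Blend weighted AU morph targets onto the neutral face."""
    face = neutral.copy()
    for au, w in au_weights.items():
        face += w * morph_targets[au]
    return face

# A smile at 80% intensity with a hint of brow lowering.
print(synthesize({"AU12": 0.8, "AU4": 0.2}))
```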
COST'09 Proceedings of the Second International Conference on Development of Multimodal Interfaces: Active Listening and Synchrony | 2009
Aleksandra Cerekovic; Tomislav Pejsa; Igor S. Pandžić
Embodied Conversational Agents (ECAs) are an application of virtual characters that is the subject of considerable ongoing research. An essential prerequisite for creating believable ECAs is the ability to describe and visually realize multimodal conversational behaviors. The recently developed Behavior Markup Language (BML) addresses this requirement by providing a means to specify physical realizations of multimodal behaviors through human-readable scripts. In this paper we present an approach to implementing a behavior realizer compatible with the BML language. The system's architecture is based on hierarchical controllers that apply preprocessed behaviors to body modalities. The animation database is easily extensible and contains behavior examples built upon existing lexicons and the theory of gestures. Furthermore, we describe a novel solution to the problem of synchronizing gestures with synthesized speech using neural networks, and we propose improvements to the BML specification.
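For context, a minimal BML-style script might look as follows; this example follows the style of the published BML specification rather than being taken from the paper:

```xml
<bml id="bml1">
  <speech id="s1" start="0">
    <text>Welcome to the <sync id="tm1"/> museum.</text>
  </speech>
  <!-- The gesture stroke is synchronized to the speech sync point tm1. -->
  <gesture id="g1" lexeme="BEAT" stroke="s1:tm1"/>
</bml>
```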
Computer Graphics Forum | 2015
Nenad Markuš; Marco Fratarcangeli; Igor S. Pandžić; Jörgen Ahlberg
An image mosaic is an assembly of a large number of small images, usually called tiles, taken from a specific dictionary/codebook. When viewed as a whole, the appearance of a single large image emerges, i.e. each tile approximates a small block of pixels. ASCII art is a related (and older) graphic design technique for producing images from printable characters. Although automatic procedures for both of these visualization schemes have been studied in the past, some are computationally heavy and cannot offer real-time and interactive performance. We propose an algorithm able to reproduce the quality of existing non-photorealistic rendering techniques, in particular ASCII art and image mosaics, obtaining large performance speed-ups. The basic idea is to partition the input image into a rectangular grid and use a decision tree to assign a tile from a pre-determined codebook to each cell. Our implementation can process video streams from webcams in real time and is suitable for modestly equipped devices. We evaluate our technique by generating renderings of a variety of images and videos, with good results. The source code of our engine is publicly available.
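A small sketch of the grid-partition idea, assuming scikit-learn: a stock DecisionTreeClassifier stands in for the paper's custom tree, and a brightness ramp of printable characters serves as the codebook. The toy training labels simply quantize each cell's mean brightness onto the ramp.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

CELL = 8                      # cell size in pixels
RAMP = " .:-=+*#%@"           # codebook: characters by increasing density

# Train the tree to map a flattened cell to a codebook index.
rng = np.random.default_rng(0)
cells = rng.random((5000, CELL * CELL))
labels = (cells.mean(axis=1) * (len(RAMP) - 1)).round().astype(int)
tree = DecisionTreeClassifier(max_depth=10).fit(cells, labels)

def ascii_render(image):
    """Partition a grayscale image (values in [0, 1]) into CELL x CELL
    blocks and let the tree pick a character for each block."""
    h, w = image.shape
    rows = []
    for y in range(0, h - CELL + 1, CELL):
        blocks = [image[y:y + CELL, x:x + CELL].ravel()
                  for x in range(0, w - CELL + 1, CELL)]
        rows.append("".join(RAMP[i] for i in tree.predict(blocks)))
    return "\n".join(rows)

# Render a simple horizontal gradient.
gradient = np.tile(np.linspace(0, 1, 64), (32, 1))
print(ascii_render(gradient))
```

Because each cell costs only one tree traversal rather than a search over the whole codebook, this structure is what makes real-time webcam processing plausible even on modest hardware.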
Active Media Technology | 2009
Aleksandra Cerekovic; Hsuan-Hung Huang; Takuya Furukawa; Yuji Yamaoka; Igor S. Pandžić; Toyoaki Nishida; Yukiko I. Nakano
In recent years, computer-generated interactive virtual characters, called Embodied Conversational Agents (ECAs), have been the subject of considerable ongoing research. Nevertheless, their conversational abilities are mediocre compared to real human behavior. Among their limitations, most ECAs are incapable of participating in natural conversations in which the number of participants can change dynamically. In ongoing work we investigate principles for integrating multi-user support into an ECA system. We present experiments with, and the implementation approach of, a prototype system in which a tour guide ECA interacts with one or two users. The system combines different technologies to detect and address the system's users and draw their attention. Experimental interaction with the system has produced encouraging results: the system can detect a user's appearance, departure, and decreased level of interest, and identify his or her conversational role.
Grid Computing | 2005
Lea Skorin-Kapov; Igor S. Pandžić; Maja Matijasevic; H. Komericki; Miran Mosmondor
Most current Grid monitoring systems provide a visual user interface. With recent advances in the multimedia capabilities of user terminals, there is a strong trend towards interactive, multi-modal, and multi-platform visualization. In this paper we describe a multi-platform visualization architecture and a Web-based service built upon it, which provides a view of the monitored Grid hierarchy and the values of selected monitoring parameters for different Grid sites. We demonstrate the application on four platforms: a desktop Personal Computer (PC), a handheld PC, a Java-enabled new-generation mobile phone, and a Wireless Application Protocol (WAP) enabled mobile phone.
Computer Graphics International | 2018
Michael Beham; Denis Gracanin; Silvana Podaras; Rainer Splechtna; Katja Bühler; Igor S. Pandžić; Kresimir Matkovic
Linking and brushing is an essential technique for interactive data exploration and analysis that leverages coordinated multiple views to identify, select, and combine data points of interest. We propose to augment this technique by directly exploring the data space using textual queries. Textual and visual queries are freely combined and modified during the data exploration process: visual queries refine the results of textual queries, and vice versa. This mixed brushing integrates procedural, textual, and visual data exploration into a unified approach to brushing. We also propose an interface, the Text Query Browser View, which allows users to specify and edit data queries as well as browse the data query history. Further, we argue why interactive, on-demand data aggregation and derivation is necessary, and we provide a flexible mechanism that supports it. We have implemented the proposed approach within an existing visualization tool using a client-server architecture, and we illustrate and evaluate it using two example data sets.
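A minimal sketch of mixed brushing, assuming pandas; the data set, column names, and queries are all illustrative rather than drawn from the paper.

```python
import pandas as pd

df = pd.DataFrame({
    "horsepower": [68, 95, 130, 220, 150],
    "mpg":        [33, 28, 20, 12, 18],
    "origin":     ["japan", "japan", "usa", "usa", "europe"],
})

# Textual query, as it might be typed into the Text Query Browser View.
text_hits = df.query("origin == 'usa' and horsepower > 100").index

# Visual brush: a rectangle in a horsepower/mpg scatterplot.
brush_hits = df.index[df["horsepower"].between(120, 250)
                      & df["mpg"].between(10, 21)]

# Mixed brush: each selection refines the other.
print(df.loc[text_hits.intersection(brush_hits)])
```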
The Visual Computer | 2018
Rainer Splechtna; Michael Beham; Denis Gracanin; María Luján Ganuza; Katja Bühler; Igor S. Pandžić; Kresimir Matkovic
Studying complex problems often requires identifying and exploring connections and dependencies among several, seemingly unrelated, data sets. Those data sets are often represented as data tables. We propose a novel approach to studying such data sets using linking and brushing across multiple data tables in a coordinated multiple views system. We first identify possible mappings from a subset of one data set to a subset of another data set. That collection of mappings is then used to specify linking among data sets and to support brushing across data sets. Brushing in one data set is then mapped to a brush in the destination data set. If the brush is refined in the destination data set, the inverse mapping, or a back-link, is used to determine the refined brush in the original data set. Brushing and back-links make it possible to efficiently create and analyze complex queries interactively in an iterative process. That process is further supported by a user interface that keeps track of the mappings, links and brushes. The proposed approach is evaluated using three data sets.
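The link/back-link mechanics can be sketched in a few lines; the tables, keys, and mapping below are invented for illustration.

```python
# Two data tables linked by a mapping: each visit belongs to one patient.
patients = {"p1": "clinic_A", "p2": "clinic_A", "p3": "clinic_B"}
visits = {"v1": "p1", "v2": "p1", "v3": "p2", "v4": "p3"}  # visit -> patient

def link(patient_brush):
    """Map a brush on the patients table onto the visits table."""
    return {v for v, p in visits.items() if p in patient_brush}

def back_link(visit_brush):
    """Inverse mapping: propagate a visits brush back to patients."""
    return {visits[v] for v in visit_brush}

# Brush patients in clinic_A ...
brushed_patients = {p for p, c in patients.items() if c == "clinic_A"}
# ... the link maps the brush onto the visits table ...
brushed_visits = link(brushed_patients)            # {v1, v2, v3}
# ... the analyst refines the brush in the destination table ...
refined_visits = {v for v in brushed_visits if v != "v3"}
# ... and the back-link determines the refined brush in the origin table.
print(back_link(refined_visits))                   # {'p1'}
```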
The Visual Computer | 2018
Ivan Gogić; Martina Manhart; Igor S. Pandžić; Jörgen Ahlberg
Facial expression recognition applications demand accurate and fast algorithms that can run in real time on platforms with limited computational resources. We propose an algorithm that bridges the gap between precise but slow methods and fast but less precise methods. The algorithm combines gentle boost decision trees and neural networks. The gentle boost decision trees are trained to extract highly discriminative feature vectors (local binary features) for each basic facial expression around distinct facial landmark points. These sparse binary features are concatenated and used to jointly optimize facial expression recognition through a shallow neural network architecture. The joint optimization improves the recognition rates of difficult expressions such as fear and sadness. Furthermore, extensive experiments in both within- and cross-database scenarios have been conducted on relevant benchmark data sets for facial expression recognition: CK+, MMI, JAFFE, and SFEW 2.0. The proposed method (LBF-NN) compares favorably with state-of-the-art algorithms while achieving an order of magnitude improvement in execution time.
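A rough sketch of the two-stage idea, assuming scikit-learn: an ensemble of shallow trees (standing in for the paper's gentle boost trees trained per landmark) produces sparse binary leaf indicators, which a shallow network then classifies jointly. Data and dimensions are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))     # stand-in for landmark-patch features
y = rng.integers(0, 6, size=1000)   # six basic expressions

# Stage 1: shallow trees map each sample to discriminative leaf indices.
forest = RandomForestClassifier(n_estimators=20, max_depth=4).fit(X, y)
leaves = forest.apply(X)            # shape: (n_samples, n_trees)

# Stage 2: one-hot (local binary) encoding, concatenated across trees,
# yields a sparse binary feature vector per sample.
enc = OneHotEncoder().fit(leaves)
binary_features = enc.transform(leaves)

# Stage 3: a shallow network jointly optimizes expression recognition.
head = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300)
head.fit(binary_features, y)
print("train accuracy:", head.score(binary_features, y))
```

The speed advantage comes from the same structure: tree traversals and a single sparse matrix-vector product are cheap at inference time compared with deep feature extractors.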
Proceedings of the 3rd Symposium on Facial Analysis and Animation | 2012
Nenad Markuš; Miroslav Frljak; Igor S. Pandžić; Jörgen Ahlberg; Robert Forchheimer
Face tracking is an extensively studied field. Nevertheless, it remains a challenge to build a robust and efficient face tracker, especially for mobile devices. This extended abstract briefly describes our implementation of a high-performance, multi-platform face and facial feature tracking system. The main characteristics of our approach are that the tracker is fully automatic and works with the majority of faces without any manual initialization. It is robust, resistant to rapid changes in pose and facial expression, does not suffer from drifting, and has only modest computational cost. The tracker runs in real time on mobile devices.