
Publications


Featured research published by Goranka Zoric.


Signal Processing | 2006

Real-time language independent lip synchronization method using a genetic algorithm

Goranka Zoric; Igor S. Pandžić

Lip synchronization is the determination of mouth and tongue motion during speech. It is widely used in multimedia production, and real-time implementation opens up application possibilities in multimodal interfaces. We present an implementation of real-time, language-independent lip synchronization based on the classification of the speech signal, represented by MFCC vectors, into visemes using neural networks (NNs). Our implementation improves real-time lip synchronization by using a genetic algorithm to obtain a near-optimal NN topology. Automatic NN configuration with genetic algorithms eliminates the need for tedious manual NN design by trial and error and considerably improves the viseme classification results. Moreover, by directly using visemes as the basic unit of classification, computational overhead is reduced, since only visemes are used for the animation of the face. The results were obtained through comprehensive validation of the system using three different evaluation methods, two objective and one subjective. They indicate very good lip synchronization quality under real-time conditions and for different languages, making the method suitable for a wide range of applications.
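The classification pipeline the abstract describes, MFCC frames fed to a neural network that outputs viseme classes, can be sketched as follows. All sizes, weights, and the random input frames are illustrative assumptions, not the paper's trained model:

```python
import numpy as np

# Hypothetical setup: 12-dim MFCC frames classified into a small set of
# visemes by a one-hidden-layer network (forward pass only; a real system
# would train the weights on labeled speech).
N_MFCC, N_HIDDEN, N_VISEMES = 12, 8, 6
rng = np.random.default_rng(0)

W1 = rng.normal(scale=0.1, size=(N_MFCC, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(scale=0.1, size=(N_HIDDEN, N_VISEMES))
b2 = np.zeros(N_VISEMES)

def classify_frame(mfcc):
    """Map one MFCC vector to a viseme index."""
    h = np.tanh(mfcc @ W1 + b1)
    scores = h @ W2 + b2
    return int(np.argmax(scores))

# One viseme per audio frame drives the mouth shape of the animated face.
frames = rng.normal(size=(5, N_MFCC))   # stand-in for real MFCC frames
visemes = [classify_frame(f) for f in frames]
```

Classifying directly into visemes, rather than phonemes, is what keeps the per-frame work small: the animation layer consumes the viseme index as-is.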


international conference on multimedia and expo | 2005

A Real-Time Lip SYNC System Using a Genetic Algorithm for Automatic Neural Network Configuration

Goranka Zoric; Igor S. Pandzic

In this paper we present a new method for mapping natural speech to lip-shape animation in real time. The speech signal, represented by MFCC vectors, is classified into viseme classes using neural networks. The topology of the neural networks is automatically configured using genetic algorithms. This eliminates the need for tedious manual neural network design by trial and error and considerably improves the viseme classification results. The method is suitable for both real-time and offline applications.
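The automatic topology configuration can be illustrated with a toy genetic algorithm. The genome here (a single hidden-layer size) and the fitness function are invented stand-ins; in the paper, fitness would be the viseme classification performance of an actually trained network:

```python
import random

random.seed(0)

def fitness(hidden_units):
    # Placeholder objective: pretend 17 hidden units is near-optimal.
    # A real fitness function would train the NN and measure accuracy.
    return -(hidden_units - 17) ** 2

def evolve(generations=30, pop_size=10):
    """Select the fitter half each generation and mutate it to refill the population."""
    pop = [random.randint(1, 64) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        children = [max(1, s + random.randint(-3, 3)) for s in survivors]
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
```

Because the best survivor is always carried forward, the search never loses the best topology found so far, which is what makes trial-and-error manual design unnecessary.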


Multiagent and Grid Systems | 2008

Integrating embodied conversational agent components with a generic framework

Hung-Hsuan Huang; Aleksandra Cerekovic; Kateryna Tarasenko; Vjekoslav Levacic; Goranka Zoric; Igor S. Pandzic; Yukiko I. Nakano; Toyoaki Nishida

Embodied Conversational Agents (ECAs) are computer-generated life-like characters that interact with human users in face-to-face conversations. To achieve natural multi-modal conversations, ECA systems are sophisticated and require numerous building components, which makes them difficult for an individual research group to develop. To address this problem, we are developing an approach that connects those components through a Generic ECA (GECA) framework. GECA is composed of a blackboard-model based platform, a high-level protocol, and a set of APIs meant to ease the development of component wrappers. With such a generic ECA framework, we expect rapid ECA system prototyping and easier sharing of research results and collaboration between ECA researchers. This paper describes the basic concepts of the framework, an initial implementation, and an evaluation of actually using it to build a realistic ECA application.
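A minimal sketch of the blackboard idea behind such a framework, with invented message types and components; the real GECA platform defines a high-level protocol and wrapper APIs that are not reproduced here:

```python
from collections import defaultdict

class Blackboard:
    """Toy blackboard: components publish typed messages to a shared board
    and subscribe to the message types they consume."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, msg_type, handler):
        self.subscribers[msg_type].append(handler)

    def publish(self, msg_type, payload):
        for handler in self.subscribers[msg_type]:
            handler(payload)

board = Blackboard()
log = []
# Hypothetical wiring: a speech-recognition wrapper publishes recognized
# text; an animation wrapper consumes it without knowing who produced it.
board.subscribe("input.speech.text", lambda text: log.append(f"animate: {text}"))
board.publish("input.speech.text", "hello")
```

Decoupling producers from consumers this way is what lets heterogeneous components from different groups be wrapped and swapped independently.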


intelligent virtual agents | 2006

[HUGE]: universal architecture for statistically based HUman GEsturing

Karlo Smid; Goranka Zoric; Igor S. Pandzic

We introduce a universal architecture for a statistically based HUman GEsturing (HUGE) system for producing and using statistical models of facial gestures based on any kind of inducement. As inducement we consider any kind of signal that occurs in parallel with the production of gestures in human behaviour and that may have a statistical correlation with the occurrence of gestures, e.g. spoken text, the audio signal of speech, bio-signals, etc. The correlation between the inducement signal and the gestures is first used to build a statistical model of gestures from a training corpus consisting of sequences of gestures and the corresponding inducement data sequences. In the runtime phase, raw, previously unknown inducement data is used to trigger (induce) the real-time gestures of the agent based on the previously constructed statistical model. We present the general architecture and implementation issues of our system, and further clarify it through two case studies. We believe this universal architecture is useful for experimenting with various kinds of potential inducement signals and their features, and for exploring the correlation of such signals or features with gesturing behaviour.
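The two phases can be sketched with a toy count-based model. The inducement symbols, gesture labels, and corpus below are assumptions for illustration only:

```python
import random
from collections import Counter, defaultdict

random.seed(1)

# Training corpus: parallel sequence of (inducement symbol, gesture) pairs.
corpus = [
    ("stressed_word", "nod"), ("pause", "blink"), ("stressed_word", "eyebrow_raise"),
    ("stressed_word", "nod"), ("pause", "blink"), ("pause", "none"),
]

# Phase 1: the statistical model is P(gesture | inducement), here from counts.
counts = defaultdict(Counter)
for inducement, gesture in corpus:
    counts[inducement][gesture] += 1

def trigger_gesture(inducement):
    """Phase 2: sample a gesture for raw, previously unseen inducement data."""
    gestures = counts[inducement]
    total = sum(gestures.values())
    weights = [c / total for c in gestures.values()]
    return random.choices(list(gestures), weights)[0]

g = trigger_gesture("stressed_word")
```

The architecture deliberately fixes nothing about what the inducement is; swapping the symbols for audio-derived or bio-signal features leaves the two-phase structure unchanged.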


international conference on telecommunications | 2007

Towards an Embodied Conversational Agent Talking in Croatian

Aleksandra Cerekovic; Hung-Hsuan Huang; Goranka Zoric; Kateryna Tarasenko; Vjekoslav Levacic; Igor S. Pandzic; Yukiko I. Nakano; Toyoaki Nishida

The advancement of transportation makes the world more and more internationalized and increases the frequency of communication between people who come from different cultures. Differences in their conversations go beyond the languages they speak to the non-verbal behaviors they express while talking. To improve the abilities of embodied conversational agents (ECAs) in interacting with human users, we are working on an ECA-based application: a tour guide for the city of Dubrovnik that serves visitors in Japanese, in Croatian, and, for general Western cultures, in English. This paper presents the overall architecture and explains a possible extension to another culture. It describes the issues we met while making the agent talk in Croatian and proposes solutions.


Multimedia Tools and Applications | 2011

On creating multimodal virtual humans--real time speech driven facial gesturing

Goranka Zoric; Robert Forchheimer; Igor S. Pandzic

Because of the extensive use of different computing devices, human-computer interaction design nowadays moves towards creating user-centric interfaces. This means incorporating the different modalities that humans use in everyday communication. Virtual humans who look and behave believably fit perfectly into the concept of designing interfaces in a more natural, effective, and socially oriented way. In this paper we present a novel method for automatic speech-driven facial gesturing for virtual humans, capable of real-time performance. The facial gestures included are various nods and head movements, blinks, eyebrow gestures, and gaze. The mapping from speech to facial gestures is based on prosodic information obtained from the speech signal and is realized using a hybrid approach of Hidden Markov Models, rules, and global statistics. Further, we test the method using an application prototype: a system for speech-driven facial gesturing suitable for virtual presenters. Subjective evaluation of the system confirmed that the synthesized facial movements are consistent and time-aligned with the underlying speech, and thus provide natural behavior of the whole face.
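A fragment of the rule layer in such a hybrid mapping might look like the following. The thresholds, feature names, and gesture labels are illustrative assumptions, not the paper's actual model, and the HMM and global-statistics layers are omitted:

```python
# Per-frame prosodic features (energy and a pitch-change proxy) are
# thresholded to decide whether a facial gesture should fire on that frame.
def gesture_for_frame(energy, pitch_delta):
    if energy > 0.8 and pitch_delta > 0.1:
        return "head_nod"   # accented syllable with rising pitch
    if energy < 0.1:
        return "blink"      # silence / pause
    return None             # no gesture triggered on this frame

energies = [0.9, 0.05, 0.5]
pitch_deltas = [0.2, 0.0, -0.1]
gestures = [gesture_for_frame(e, d) for e, d in zip(energies, pitch_deltas)]
# gestures == ["head_nod", "blink", None]
```

In the full hybrid approach, rules like these would be combined with HMM output and global gesture statistics so that the timing and frequency of gestures match natural behavior rather than firing on every threshold crossing.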


Multimodal Signals: Cognitive and Algorithmic Issues | 2009

Towards Facial Gestures Generation by Speech Signal Analysis Using HUGE Architecture

Goranka Zoric; Karlo Smid; Igor S. Pandzic

In our current work we concentrate on finding correlations between the speech signal and the occurrence of facial gestures. The motivation behind this work is the computer-generated human counterpart, the embodied conversational agent (ECA). In order to be a believable human representative, it is important for an ECA to implement facial gestures in addition to verbal and emotional displays. The information needed to generate facial gestures is extracted from speech prosody by analyzing natural speech in real time. This work is based on the previously developed HUGE architecture for statistically based facial gesturing and extends our previous work on automatic real-time lip sync.


Journal on Multimodal User Interfaces | 2007

An agent based multicultural tour guide system with nonverbal user interface

Hung-Hsuan Huang; Kateryna Tarasenko; Toyoaki Nishida; Aleksandra Cerekovic; Vjekoslav Levacic; Goranka Zoric; Igor S. Pandzic; Yukiko I. Nakano

The advancement of traffic and computer networks makes the world more and more internationalized and increases the frequency of communication between people who speak different languages and show different nonverbal behaviors. To improve the communication of embodied conversational agent (ECA) systems with their human users, their capability to cover cultural differences has become important. Various excellent ECA systems have been developed and proposed previously; however, cross-cultural communication issues are seldom addressed by researchers. This paper describes a short-term project aiming to explore the possibility of rapidly building multicultural and multimodal ECA interfaces for a tour guide system by using a generic framework to connect their functional blocks.


international conference on telecommunications | 2005

Automatic lip sync and its use in the new multimedia services for mobile devices

Goranka Zoric; Igor S. Pandzic

In this paper we present a new method for mapping natural speech to lip-shape animation in real time. The speech signal, represented by MFCC vectors, is classified into viseme classes using neural networks. The topology of the neural networks is automatically configured using genetic algorithms. This eliminates the need for tedious manual neural network design by trial and error and considerably improves the viseme classification results. The method works in both real-time and offline modes and is suitable for various applications. Based on the described lip-sync system, we propose new multimedia services for mobile devices.


Journal of Multimedia | 2006

Automated Gesturing for Virtual Characters: Speech-driven and Text-driven Approaches

Goranka Zoric; Karlo Smid; Igor S. Pandzic

We present two methods for automatic facial gesturing of graphically embodied animated agents. In the first, a conversational agent is driven by speech in an automatic lip-sync process: by analyzing the speech input, lip movements are determined from the speech signal. The second method provides a virtual speaker capable of reading plain English text and rendering it as speech accompanied by the appropriate facial gestures. The proposed statistical model for generating a virtual speaker's facial gestures can also be applied as an addition to the lip synchronization process in order to obtain speech-driven facial gesturing. In this case, the statistical model is triggered by the prosody of the input speech instead of lexical analysis of the input text.
