Karlo Smid
Ericsson Nikola Tesla
Publications
Featured research published by Karlo Smid.
Proceedings of Computer Animation 2002 (CA 2002) | 2002
Karlo Smid; Igor S. Pandzic
Talking virtual characters are graphical simulations of real or imaginary persons capable of human-like behaviour, most importantly talking and gesturing. Coupled with artificial intelligence (AI) techniques, virtual characters are expected to represent the ultimate abstraction of a human-computer interface, one where the computer looks, talks and acts like a human. Such an interface would combine audio/video analysis and synthesis techniques with AI, dialogue management and a vast knowledge base in order to respond quasi-intelligently to the user through speech, gesture and even mood. While this goal still lies some way in the future, we present an architecture that reaches towards it while also aiming at practical applications in the nearer term. Our architecture is aimed specifically at the Web. It involves a talking virtual character capable of engaging in a fairly meaningful conversation with a user who types in the input.
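The abstract describes a pipeline from typed user input through dialogue management to synchronized speech and gesture output. A minimal sketch of that flow is given below; the keyword-matching dialogue manager, the knowledge base and all names are invented for illustration and are not the paper's implementation.

```python
# Hypothetical sketch of the described pipeline: typed user input goes through
# a dialogue manager, and the reply is annotated with facial gestures for the
# talking virtual character. All names and rules here are illustrative only.
import random

# Toy knowledge base mapping keywords to canned replies; a real system would
# use AI/dialogue-management techniques over a much larger knowledge base.
KNOWLEDGE_BASE = {
    "hello": "Hello! How can I help you today?",
    "weather": "I cannot see outside, but I hope it is sunny.",
}

GESTURES = ["nod", "eyebrow_raise", "head_tilt"]

def dialogue_manager(user_input: str) -> str:
    """Pick a reply by simple keyword matching."""
    for keyword, reply in KNOWLEDGE_BASE.items():
        if keyword in user_input.lower():
            return reply
    return "I am not sure I understood that. Could you rephrase?"

def annotate_with_gestures(reply: str) -> list[tuple[str, str]]:
    """Attach a (possibly empty) gesture to each spoken word of the reply."""
    annotated = []
    for word in reply.split():
        gesture = random.choice(GESTURES) if random.random() < 0.2 else ""
        annotated.append((word, gesture))
    return annotated

if __name__ == "__main__":
    reply = dialogue_manager("hello there")
    for word, gesture in annotate_with_gestures(reply):
        print(word, f"[{gesture}]" if gesture else "")
```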
intelligent virtual agents | 2006
Karlo Smid; Goranka Zoric; Igor S. Pandzic
We introduce a universal architecture for a statistically based HUman GEsturing (HUGE) system for producing and using statistical models of facial gestures based on any kind of inducement. As inducement we consider any kind of signal that occurs in parallel with the production of gestures in human behaviour and that may be statistically correlated with the occurrence of gestures, e.g. the text that is spoken, the audio signal of speech, bio-signals, etc. The correlation between the inducement signal and the gestures is first used to build a statistical model of gestures from a training corpus consisting of gesture sequences and the corresponding inducement data sequences. In the runtime phase, raw, previously unseen inducement data is used to trigger (induce) the agent's real-time gestures based on the previously constructed statistical model. We present the general architecture and implementation issues of our system, and further clarify it through two case studies. We believe this universal architecture is useful for experimenting with various kinds of potential inducement signals and their features, and for exploring the correlation of such signals or features with gesturing behaviour.
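The two-phase idea, estimating a statistical gesture model from a training corpus and then sampling it on unseen inducement data at runtime, can be illustrated with a small sketch. The feature names, corpus and conditional-frequency model below are assumptions made for illustration, not the HUGE system's actual model.

```python
# Minimal sketch of the training/runtime split behind a HUGE-style system:
# estimate P(gesture | inducement feature) by counting over a parallel corpus,
# then sample that distribution on new inducement data. Corpus is invented.
import random
from collections import Counter, defaultdict

# Training corpus: parallel (inducement feature, gesture) observations.
corpus = [
    ("stressed_word", "eyebrow_raise"),
    ("stressed_word", "head_nod"),
    ("pause", "blink"),
    ("stressed_word", "eyebrow_raise"),
    ("pause", "none"),
]

# Training phase: conditional frequency counts stand in for the model.
model: dict[str, Counter] = defaultdict(Counter)
for feature, gesture in corpus:
    model[feature][gesture] += 1

def induce_gesture(feature: str) -> str:
    """Runtime phase: sample a gesture from P(gesture | feature)."""
    counts = model.get(feature)
    if not counts:
        return "none"  # unseen inducement feature: trigger no gesture
    gestures, weights = zip(*counts.items())
    return random.choices(gestures, weights=weights)[0]

# Previously unseen inducement data triggers gestures in real time.
for feature in ["stressed_word", "pause", "silence"]:
    print(feature, "->", induce_gesture(feature))
```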
Multimodal Signals: Cognitive and Algorithmic Issues | 2009
Goranka Zoric; Karlo Smid; Igor S. Pandzic
In our current work we concentrate on finding the correlation between the speech signal and the occurrence of facial gestures. The motivation behind this work is the computer-generated human correspondent, the embodied conversational agent (ECA). For an ECA to be a believable human representative, it must implement facial gestures in addition to verbal and emotional displays. The information needed to generate facial gestures is extracted from speech prosody by analyzing natural speech in real time. This work is based on the previously developed HUGE architecture for statistically based facial gesturing and extends our previous work on automatic real-time lip sync.
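Extracting prosodic cues from the raw speech signal is the step this abstract hinges on. The sketch below computes two simple frame-level features, energy and zero-crossing rate (a crude pitch proxy), and maps them to gestures; the thresholds and the gesture rule are illustrative assumptions, not the authors' analysis.

```python
# Hedged sketch: simple prosodic features per audio frame, used to trigger
# facial gestures. Thresholds and rules are invented for illustration.
import numpy as np

SAMPLE_RATE = 16_000
FRAME = 512  # ~32 ms analysis frame

def prosodic_features(frame: np.ndarray) -> tuple[float, float]:
    """Return (energy, zero-crossing rate) for one audio frame."""
    energy = float(np.mean(frame ** 2))
    # Each sign flip between consecutive samples contributes 2 to the sum.
    zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2)
    return energy, zcr

def gesture_for(energy: float, zcr: float) -> str:
    """Toy rule: loud frames raise an eyebrow, near-silence triggers a blink.
    A finer rule could use zcr to separate voiced speech from noise."""
    if energy > 0.1:
        return "eyebrow_raise"
    if energy < 0.01:
        return "blink"
    return "none"

# Simulate one loud voiced frame (220 Hz tone) and one near-silent frame.
t = np.linspace(0, FRAME / SAMPLE_RATE, FRAME, endpoint=False)
loud = 0.8 * np.sin(2 * np.pi * 220 * t)
quiet = 0.001 * np.random.randn(FRAME)

for name, frame in [("loud", loud), ("quiet", quiet)]:
    e, z = prosodic_features(frame)
    print(f"{name}: energy={e:.4f} zcr={z:.3f} -> {gesture_for(e, z)}")
```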
Journal of Multimedia | 2006
Goranka Zoric; Karlo Smid; Igor S. Pandzic
We present two methods for automatic facial gesturing of graphically embodied animated agents. In the first, a conversational agent is driven by speech in an automatic lip-sync process: by analyzing the speech input, lip movements are determined from the speech signal. The second method provides a virtual speaker capable of reading plain English text and rendering it as speech accompanied by appropriate facial gestures. The proposed statistical model for generating the virtual speaker's facial gestures can also be applied as an addition to the lip-synchronization process in order to obtain speech-driven facial gesturing. In that case the statistical model is triggered by the input speech prosody instead of a lexical analysis of the input text.
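The two modes the abstract names, speech-driven lip sync and a text-reading virtual speaker, can be contrasted in a short sketch. The phoneme and viseme inventories and the lexical emphasis rule below are simplified assumptions for illustration, not the paper's tables or model.

```python
# Illustrative sketch of the two modes: (a) lip sync maps recognized phonemes
# to mouth shapes (visemes); (b) a virtual speaker reads plain text and adds
# gestures chosen by a lexical cue. Inventories and rules are invented.
import random

# (a) Lip sync: a toy phoneme-to-viseme table.
PHONEME_TO_VISEME = {
    "AA": "open",      # as in "father"
    "M": "closed",     # bilabial closure
    "F": "lip_teeth",  # labiodental
    "IY": "spread",    # as in "see"
}

def lip_sync(phonemes: list[str]) -> list[str]:
    """Map a phoneme sequence (e.g. from speech analysis) to visemes."""
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]

# (b) Virtual speaker: lexical analysis of plain text triggers gestures.
def speak_with_gestures(text: str) -> list[tuple[str, str]]:
    """Attach a gesture to words a lexical analysis might mark as emphasized."""
    out = []
    for word in text.split():
        emphasized = word.istitle() or word.endswith("!")
        gesture = "head_nod" if emphasized and random.random() < 0.7 else ""
        out.append((word, gesture))
    return out

print(lip_sync(["M", "AA", "M", "IY"]))
print(speak_with_gestures("The Weather is really Nice today!"))
```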
Lecture Notes in Computer Science | 2006
Goranka Zoric; Karlo Smid; Igor S. Pandzic
In this paper we present our recent results in automatic facial gesturing of graphically embodied animated agents. In the first method, a conversational agent is driven by speech in an automatic lip-sync process: by analyzing the speech input, lip movements are determined from the speech signal. The second method provides a virtual speaker capable of reading plain English text and rendering it as speech accompanied by appropriate facial gestures. The proposed statistical model for generating the virtual speaker's facial gestures can also be applied as an addition to the lip-synchronization process in order to obtain speech-driven facial gesturing. In that case the statistical model is triggered by the input speech prosody instead of a lexical analysis of the input text.
active media technology | 2005
Goranka Zoric; Karlo Smid; Igor S. Pandzic
We present two methods for automatic facial gesturing of graphically embodied animated agents. In the first, a conversational agent is driven by speech in an automatic lip-sync process: by analyzing the speech input, lip movements are determined from the speech signal. The second method provides a virtual speaker capable of reading plain English text and rendering it as speech accompanied by appropriate facial gestures. The proposed statistical model for generating the virtual speaker's facial gestures can also be applied as an addition to the lip-synchronization process in order to obtain speech-driven facial gesturing. In that case the statistical model is triggered by the input speech prosody instead of a lexical analysis of the input text.
ISPA 2005. Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, 2005. | 2005
Goranka Zoric; Karlo Smid; Igor S. Pandzic
We present two methods for automatic facial gesturing of graphically embodied animated agents. In the first, a conversational agent is driven by speech in an automatic lip-sync process: by analyzing the speech input, lip movements are determined from the speech signal. The second method provides a virtual speaker capable of reading plain English text and rendering it as speech accompanied by appropriate facial gestures. The proposed statistical model for generating the virtual speaker's facial gestures can also be applied as an addition to the lip-synchronization process in order to obtain speech-driven facial gesturing. In that case the statistical model is triggered by the input speech prosody instead of a lexical analysis of the input text.
intelligent virtual agents | 2008
Aleksandra Cerekovic; Goranka Zoric; Karlo Smid; Igor S. Pandzic
In this work we concentrate on finding the correlation between the speech signal and the occurrence of facial gestures, with the goal of creating believable virtual humans. We propose a method for implementing facial gestures as a valuable part of human behavior and communication. The information needed to generate the facial gestures is extracted from speech prosody by analyzing natural speech in real time. This work is based on the previously developed HUGE architecture for statistically based facial gesturing and extends our previous work on automatic real-time lip sync.
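Where the sketch after the 2009 abstract above focused on feature extraction, the real-time aspect here suggests a streaming loop: each analyzed frame immediately triggers (or not) a gesture via a pre-trained statistical model. The event names and model probabilities below are made up for illustration.

```python
# Rough sketch, under invented names, of a real-time trigger loop: prosodic
# events extracted frame by frame from live speech drive gestures through a
# previously trained statistical model (cf. the HUGE sketch above).
import random
from typing import Iterator

# Assumed pre-trained model: P(gesture | prosodic event). Values are made up.
MODEL = {
    "pitch_rise": {"eyebrow_raise": 0.6, "none": 0.4},
    "energy_peak": {"head_nod": 0.5, "none": 0.5},
    "silence": {"blink": 0.3, "none": 0.7},
}

def prosodic_events(frames: list[str]) -> Iterator[str]:
    """Stand-in for real-time prosody analysis of an incoming audio stream."""
    yield from frames

def trigger(event: str) -> str:
    """Sample a gesture from the model's conditional distribution."""
    dist = MODEL.get(event, {"none": 1.0})
    gestures, probs = zip(*dist.items())
    return random.choices(gestures, weights=probs)[0]

# Each analyzed frame yields a (possibly empty) gesture immediately, keeping
# the agent's face in sync with the live speech signal.
for event in prosodic_events(["pitch_rise", "energy_peak", "silence"]):
    print(event, "->", trigger(event))
```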
Lecture Notes in Artificial Intelligence | 2006
Karlo Smid; Goranka Zoric; Igor S. Pandžić
Archive | 2007
Goranka Zoric; Karlo Smid; Igor S. Pandžić