Andrey Ronzhin
Russian Academy of Sciences
Publications
Featured research published by Andrey Ronzhin.
Speech Communication | 2014
Alexey Karpov; Konstantin Markov; Irina S. Kipyatkova; Daria Vazhenina; Andrey Ronzhin
Speech is the most natural way of human communication, and convenient, efficient human-computer interaction requires state-of-the-art spoken language technology. Research in this area has traditionally focused on a few major languages, such as English, French, Spanish, Chinese and Japanese, while other languages, particularly Eastern European ones, have received much less attention. Recently, however, research activity on speech technologies for Czech, Polish, Serbo-Croatian and Russian has been steadily increasing. In this paper, we describe our efforts to build a large-vocabulary automatic speech recognition (ASR) system for the Russian language. Russian is a synthetic and highly inflected language with a large number of roots and affixes, which greatly reduces the performance of ASR systems designed using traditional approaches. In our work, we have paid special attention to the specifics of the Russian language when developing the acoustic, lexical and language models. A special software tool for pronunciation lexicon creation was developed. For the acoustic model, we investigated a combination of knowledge-based and statistical approaches to create several different phoneme sets, the best of which was determined experimentally. For the language model (LM), we introduced a new method that combines syntactic and statistical analysis of the training text data in order to build better n-gram models. Evaluation experiments were performed using two different Russian speech databases and an internally collected text corpus. Among the phoneme sets we created, the one with 47 phonemes achieved the fewest word-level recognition errors, so we used it in the subsequent language modeling evaluations. Experiments with a 204,000-word vocabulary ASR system were performed to compare standard statistical n-gram LMs with language models created using our syntactico-statistical method. The results demonstrate that the proposed language modeling approach reduces word recognition errors.
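For readers unfamiliar with n-gram modeling, the sketch below shows a minimal statistical bigram LM with add-one smoothing, the kind of standard baseline the syntactico-statistical method is compared against. It is not the authors' method, and the toy corpus is an assumption for illustration.

```python
# Minimal sketch of a standard statistical bigram language model with
# add-one smoothing. NOT the paper's syntactico-statistical method;
# the corpus below is a toy example.
from collections import Counter
import math

corpus = [
    ["<s>", "speech", "recognition", "for", "russian", "</s>"],
    ["<s>", "russian", "speech", "corpus", "</s>"],
]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(p for sent in corpus for p in zip(sent, sent[1:]))
V = len(unigrams)  # vocabulary size for smoothing

def bigram_logprob(w1, w2):
    """Add-one smoothed log P(w2 | w1)."""
    return math.log((bigrams[(w1, w2)] + 1) / (unigrams[w1] + V))

def sentence_logprob(sent):
    """Total log-probability of a sentence under the bigram model."""
    return sum(bigram_logprob(a, b) for a, b in zip(sent, sent[1:]))

print(sentence_logprob(["<s>", "russian", "speech", "recognition", "</s>"]))
```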
NEW2AN '09 and ruSMART '09 Proceedings of the 9th International Conference on Smart Spaces and Next Generation Wired/Wireless Networking and Second Conference on Smart Spaces | 2009
Andrey Ronzhin; V. Yu. Budkov
A multimodal user interface capable of perceiving the speech, movements, poses and gestures of meeting participants in order to determine their needs provides a natural and intuitively understandable way of interacting with the developed intelligent meeting room. The room's awareness of the participants' spatial positions, current activities, roles in the current event, and preferences helps to predict their intentions and needs more accurately. The integration of Nokia mobile phones with the sensor network and smart services allows users to control effectors, audio/video equipment and other facilities from outside the room. Several scenarios of multimodal interaction with the room, as well as issues of adapting the user interface to the limitations of mobile phone browsers, are discussed.
international conference on speech and computer | 2013
Andrey Ronzhin; Victor Budkov
The main stage of speaker diarization is the detection of the time labels at which the speaker changes. Most approaches to the speaker turn detection problem focus on processing an audio signal captured in a single channel and are applied to archived recordings. Recently, speaker diarization has begun to be considered from a multimodal point of view. In this paper we outline modern methods of audio and video signal processing and of personal data analysis for multimodal speaker diarization. The proposed PARAD-R software for Russian speech analysis currently implements audio-based speaker diarization and will be enhanced based on advances in multimodal situation analysis in a meeting room.
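One widely used single-channel speaker turn detection technique is the Bayesian Information Criterion (BIC): a change point is hypothesized where two separate Gaussian models of adjacent windows fit the data better than one model of the joined window. The sketch below illustrates that idea with synthetic features; it is not the PARAD-R implementation.

```python
# Illustrative BIC-based speaker change test on two adjacent segments
# of acoustic features (e.g. MFCC frames). Not the PARAD-R code.
import numpy as np

def delta_bic(X, Y, penalty=1.0):
    """Positive value suggests a speaker change between segments X and Y."""
    Z = np.vstack([X, Y])
    d = Z.shape[1]
    def half_n_logdet(S):  # (n/2) * log|cov(S)|
        cov = np.cov(S, rowvar=False) + 1e-6 * np.eye(d)
        return 0.5 * len(S) * np.linalg.slogdet(cov)[1]
    # complexity penalty for using two Gaussians instead of one
    p = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(len(Z))
    return half_n_logdet(Z) - half_n_logdet(X) - half_n_logdet(Y) - penalty * p

rng = np.random.default_rng(0)
spk1 = rng.normal(0.0, 1.0, size=(200, 13))  # pseudo-features, speaker A
spk2 = rng.normal(3.0, 1.5, size=(200, 13))  # pseudo-features, speaker B
print(delta_bic(spk1, spk2) > 0)             # True: change detected
print(delta_bic(spk1[:100], spk1[100:]) > 0) # False: same speaker
```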
ruSMART/NEW2AN'10 Proceedings of the Third conference on Smart Spaces and next generation wired, and 10th international conference on Wireless networking | 2010
Andrey Ronzhin; Victor Budkov; Alexey Karpov
Web-based collaboration using wireless devices with multimedia playback capabilities is a viable alternative to traditional face-to-face meetings, and e-meetings are popular in business because of their cost savings. To engage quickly and effectively in a meeting, a remote user should be able to perceive all events in the meeting room and have the same capabilities as the participants inside it. The technological framework of the developed intelligent meeting room implements a multichannel audio-visual system for participant activity detection and automatically composes the relevant multimedia content for a remote mobile user. The developed web-based application for remote interaction with the equipment of the intelligent meeting room and for organizing e-meetings was tested with Nokia mobile phones.
international conference on human computer interaction | 2011
Alexey Karpov; Andrey Ronzhin; Irina S. Kipyatkova
In this paper, we present a bi-modal user interface aimed both at assisting persons without hands or with physical disabilities of the hands or arms, and at contactless HCI for able-bodied users. A user can manipulate a virtual mouse pointer by moving his or her head and can communicate with the computer verbally, giving speech commands instead of using standard input devices. Speech is a very useful modality for referring to objects and actions on objects, whereas head pointing gestures and motion are a powerful modality for indicating spatial locations. The bi-modal interface integrates a tri-lingual system for multichannel audio signal processing and automatic recognition of voice commands in English, French and Russian, as well as a vision-based head detection and tracking system. It processes natural speech and head pointing movements in parallel and fuses both information streams into a unified multimodal command, in which each modality carries its own semantic information: the head position provides the 2D pointer coordinates, while the speech signal provides the control command. Testing of the bi-modal user interface and comparison with contact-based pointing interfaces were carried out according to the ISO 9241-9 methodology.
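A minimal sketch of the late-fusion step described above: each recognized voice command is paired with the head pointer position closest in time. The function names, timestamps and command set are illustrative assumptions, not the paper's interface.

```python
# Toy late fusion of the two modalities: head position supplies 2D
# pointer coordinates, speech supplies the action. Names and values
# below are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class MultimodalCommand:
    action: str  # from the speech channel, e.g. "click"
    x: float     # from the head tracking channel
    y: float

def fuse(speech_events, head_track, max_lag=0.3):
    """Pair each voice command with the head position closest in time
    (within max_lag seconds)."""
    commands = []
    for t_cmd, action in speech_events:
        t, (x, y) = min(head_track, key=lambda s: abs(s[0] - t_cmd))
        if abs(t - t_cmd) <= max_lag:
            commands.append(MultimodalCommand(action, x, y))
    return commands

head_track = [(0.00, (0.10, 0.20)), (0.10, (0.40, 0.55)), (0.20, (0.42, 0.56))]
speech_events = [(0.18, "click")]
print(fuse(speech_events, head_track))  # click at (0.42, 0.56)
```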
international conference on universal access in human-computer interaction | 2014
Alexey Karpov; Andrey Ronzhin
In this paper, we present a universal assistive technology with multimodal input and multimedia output interfaces. The conceptual model and the software-hardware architecture of the universal assistive technology, with its levels and components, are described. The architecture includes five interconnected levels: computer hardware, system software, application software for digital signal processing, application software for human-computer interfaces, and software for assistive information technologies. The universal assistive technology offers several multimodal systems and interfaces to people with disabilities: an audio-visual Russian speech recognition system (AVSR), a “Talking head” synthesis system (text-to-audiovisual speech), a “Signing avatar” synthesis system (visual sign language synthesis), the ICANDO multimodal system (a hands-free PC control system), and the control system of an assistive smart space.
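The five levels can be pictured as a simple stack; the sketch below lists them with example components per level. The level names follow the abstract, while the lower-level components are illustrative assumptions.

```python
# The five architecture levels from the abstract, bottom to top, as a
# plain data structure. Components on the lower levels are assumptions;
# the top level lists the systems named in the abstract.
ASSISTIVE_STACK = [
    ("computer hardware", ["microphones", "cameras", "displays"]),
    ("system software", ["OS", "device drivers"]),
    ("application software of digital signal processing",
     ["speech recognition", "head tracking"]),
    ("application software of human-computer interfaces",
     ["multimodal input", "multimedia output"]),
    ("software of assistive information technologies",
     ["AVSR", "Talking head", "Signing avatar", "ICANDO",
      "smart space control"]),
]

for depth, (level, components) in enumerate(ASSISTIVE_STACK, start=1):
    print(f"Level {depth}: {level} -> {', '.join(components)}")
```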
NEW2AN'11/ruSMART'11 Proceedings of the 11th international conference and 4th international conference on Smart spaces and next generation wired/wireless networking | 2011
Victor Budkov; Alexander L. Ronzhin; Sergey V. Glazkov; Andrey Ronzhin
Context awareness is one of the key issues in the development of a content management system for meeting recording and teleconference support. Analyzing participant behavior, including position in the meeting room, speech activity, face direction, and use of a projector or whiteboard, allows the content management system to select the relevant multimedia streams for recording. The system operation diagram, which explains the meaningful events used for context prediction and the equipment actions taken during the meeting, is considered. Experimental results show that the graphical content was selected correctly during 97% of the total meeting time.
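A toy sketch of what such context-driven stream selection could look like as a rule-based policy; the event names and priorities are assumptions, not the paper's actual rules.

```python
# Illustrative rule-based selection of the stream to record from the
# current set of meeting events. Event names and priorities are assumed.
def select_stream(events):
    """Pick which multimedia stream to record for the current moment."""
    if "projector_active" in events or "whiteboard_active" in events:
        return "graphical-content"      # slides or whiteboard view
    if "speech_activity" in events:
        return "active-speaker-camera"  # camera aimed at the talker
    return "overview-camera"            # wide shot of the room

print(select_stream({"speech_activity"}))                      # speaker cam
print(select_stream({"speech_activity", "projector_active"}))  # slides
```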
International Conference on Interactive Collaborative Robotics | 2016
Andrey Ronzhin; Anton I. Saveliev; Oleg Basov; Sergey Solyonyj
In this paper, we propose a conceptual model of a cyberphysical environment based on a new approach to the distribution of sensor, network, computing, information-control and service tasks among mobile robots, embedded devices, mobile client devices, stationary service equipment, and cloud computing and information resources. The task of structural-parametric synthesis of the corresponding cyberphysical system is formalized, and integer programming methods are used to solve it.
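As a rough illustration of the kind of integer programming formulation involved, the sketch below brute-forces a toy task-to-node assignment that minimizes total cost subject to capacity constraints; all task names, capacities and costs are invented for illustration and do not come from the paper.

```python
# Toy task distribution as 0/1 assignment: each task goes to exactly one
# node; minimize total cost subject to per-node capacity. Brute force
# stands in for a real integer programming solver.
from itertools import product

tasks = {"sensing": 2, "planning": 3, "streaming": 4}  # task loads (assumed)
nodes = {"robot": 5, "embedded": 3, "cloud": 100}      # capacities (assumed)
cost = {"robot": 1, "embedded": 2, "cloud": 5}         # cost per load unit

best = None
for assign in product(nodes, repeat=len(tasks)):       # one node per task
    load = {n: 0 for n in nodes}
    for (task, demand), node in zip(tasks.items(), assign):
        load[node] += demand
    if all(load[n] <= nodes[n] for n in nodes):        # capacity constraint
        total = sum(load[n] * cost[n] for n in nodes)  # objective
        if best is None or total < best[0]:
            best = (total, dict(zip(tasks, assign)))

print(best)  # cheapest feasible assignment
```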
international conference on ultra modern telecommunications | 2010
V.Yu. Budkov; Maria Prischepa; Andrey Ronzhin; Alexey Karpov
A multimodal human-computer interface connects a user and a computer in a natural manner. In order to determine the user's position and capture speech queries, the robot uses a set of web cameras, a microphone array, and technologies for distant speech recognition and face tracking. Several methods of interaction between users and computers are available, such as manual entry, verbal dialogue, and visual and emotional communication. The interaction model of the proposed mobile informational robot has three main parts: beginning of the dialogue, task execution, and completion of the dialogue. Further development of the model aims to add gesture recognition technology to support a particular group of disabled users.
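The three-stage interaction model can be read as a small finite-state machine; the sketch below is an illustrative rendering with assumed trigger names, not the robot's actual dialogue manager.

```python
# Illustrative finite-state rendering of the three dialogue stages:
# beginning of the dialogue, task execution, completion of the dialogue.
# Event names are assumptions.
TRANSITIONS = {
    ("idle", "user_detected"):     "dialogue_start",  # greeting stage
    ("dialogue_start", "query"):   "task_execution",
    ("task_execution", "query"):   "task_execution",  # follow-up queries
    ("task_execution", "done"):    "dialogue_end",
    ("dialogue_end", "user_left"): "idle",
}

def step(state, event):
    """Advance the dialogue; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

state = "idle"
for event in ["user_detected", "query", "done", "user_left"]:
    state = step(state, event)
    print(event, "->", state)
```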
NEW2AN | 2012
Andrey Ronzhin; Anton I. Saveliev; Victor Budkov
The paper describes the peculiarities of communication between users and devices in an intelligent environment. The main problems are the heterogeneity of hardware and software encountered when integrating mobile devices, and the variability of natural signals encountered when developing multimodal user interfaces. Modern context-aware applications take into account the user's preferences and abilities and adapt to the conditions of the physical environment and of the available computing and network resources. The capabilities of Android-based mobile devices that can be used to acquire context attributes are analyzed.
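As a toy illustration of such context-driven adaptation, the sketch below picks an output modality from a few context attributes a mobile device can measure; the attribute names and thresholds are assumptions, not taken from the paper.

```python
# Illustrative adaptation decision from acquired context attributes
# (ambient noise, battery, bandwidth). Names and thresholds are assumed.
def choose_modality(context):
    """Pick an output modality from measured context attributes."""
    if context["ambient_noise_db"] > 70:
        return "visual"  # too loud for audible speech output
    if context["battery_percent"] < 15 or context["bandwidth_kbps"] < 100:
        return "text"    # conserve power and network traffic
    return "speech"

print(choose_modality({"ambient_noise_db": 45,
                       "battery_percent": 80,
                       "bandwidth_kbps": 5000}))  # -> speech
```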