Victor Budkov
Russian Academy of Sciences
Publication
Featured research published by Victor Budkov.
international conference on speech and computer | 2013
Andrey Ronzhin; Victor Budkov
The main stage of speaker diarization is the detection of the time labels at which the speaker changes. Most approaches to the speaker turn detection problem focus on processing an audio signal captured in a single channel and are applied to archived recordings. Recently, speaker diarization has come to be considered from a multimodal point of view. In this paper we outline modern methods of audio and video signal processing and personification data analysis for multimodal speaker diarization. The proposed PARAD-R software for Russian speech analysis has been implemented for audio speaker diarization and will be enhanced based on advances in multimodal situation analysis in a meeting room.
ruSMART/NEW2AN'10 Proceedings of the Third conference on Smart Spaces and next generation wired, and 10th international conference on Wireless networking | 2010
Andrey Ronzhin; Victor Budkov; Alexey Karpov
Web-based collaboration using wireless devices with multimedia playback capabilities is a viable alternative to traditional face-to-face meetings. E-meetings are popular in businesses because of their cost savings. To engage quickly and effectively in the meeting, the remote user should be able to perceive all events in the meeting room and have the same capabilities as the participants inside. The technological framework of the developed intelligent meeting room implements a multichannel audio-visual system for participant activity detection and automatically composes relevant multimedia content for the remote mobile user. The developed web-based application for remote user interaction with the equipment of the intelligent meeting room and for the organization of E-meetings was tested with Nokia mobile phones.
NEW2AN'11/ruSMART'11 Proceedings of the 11th international conference and 4th international conference on Smart spaces and next generation wired/wireless networking | 2011
Victor Budkov; Alexander L. Ronzhin; Sergey V. Glazkov; Andrey Ronzhin
Context awareness is one of the key issues in the development of a content management system for meeting recording and teleconference support. An analysis of participant behavior, including position in the meeting room, speech activity, face direction, and usage of a projector or whiteboard, allows the content management system to select relevant multimedia streams for recording. The system operation diagram, which explains the meaningful events for context prediction and the equipment actions during the meeting, is considered. Experimental results have shown that the graphical content was selected correctly for 97% of the total meeting time.
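The kind of context-driven selection described in this abstract can be illustrated with a minimal rule-based sketch. The rule priorities, the stream names, and the `RoomContext` fields below are hypothetical and are not taken from the paper; they only show how detected events could map to a choice of graphical and video streams.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RoomContext:
    """Hypothetical snapshot of events detected in the meeting room."""
    projector_active: bool = False
    whiteboard_in_use: bool = False
    active_speaker_seat: Optional[int] = None  # None when nobody is speaking


def select_streams(ctx: RoomContext) -> Tuple[str, str]:
    """Return (graphical_stream, video_stream) for the current context.

    The priorities are illustrative assumptions: whiteboard activity
    overrides the projector, and a known speaker seat overrides the overview.
    """
    if ctx.whiteboard_in_use:
        graphical = "whiteboard_camera"
    elif ctx.projector_active:
        graphical = "projector_slides"
    else:
        graphical = "room_overview"

    if ctx.active_speaker_seat is not None:
        video = f"seat_camera_{ctx.active_speaker_seat}"
    else:
        video = "room_overview"
    return graphical, video


if __name__ == "__main__":
    ctx = RoomContext(projector_active=True, active_speaker_seat=7)
    print(select_streams(ctx))
```

A real system would derive these flags from the audio-visual analysis rather than set them by hand, and would likely smooth decisions over time to avoid rapid switching between streams.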
NEW2AN | 2012
Andrey Ronzhin; Anton I. Saveliev; Victor Budkov
Peculiarities of communication between users and devices in an intelligent environment are described in the paper. The main problems are the heterogeneity of hardware and software when integrating mobile devices, as well as the variability of natural signals when developing multimodal user interfaces. Modern context-aware applications take into account the preferences and abilities of the user and adapt to the conditions of the physical environment and to the available computing and network resources. The means of Android-based mobile devices that could be used to acquire context attributes are analyzed.
international conference on speech and computer | 2017
Denis Ivanko; Alexey Karpov; Dmitry Ryumin; Irina S. Kipyatkova; Anton I. Saveliev; Victor Budkov; Dmitriy Ivanko; M. Železný
The purpose of this study is to develop a robust audio-visual speech recognition system and to investigate the influence of high-speed video data on the recognition accuracy of continuous Russian speech under different noisy conditions. The developed experimental setup and the collected multimodal database allow us to explore the impact of high-speed video recordings at various frame rates, from the standard 25 fps up to a high-speed 200 fps. At the moment there is no research that objectively reflects the dependence of speech recognition accuracy on the video frame rate, and there are no relevant audio-visual databases for model training. In this paper, we try to fill this gap for continuous Russian speech. Our evaluation experiments show an increase in absolute recognition accuracy of up to 3% and prove that using the high-speed JAI Pulnix camera at 200 fps achieves better recognition results under different acoustically noisy conditions.
Conference on Smart Spaces | 2015
Oleg Basov; Andrey Ronzhin; Victor Budkov; Igor Saitov
The article reviews existing methods for diagnosing the falsity of transmitted information. A conclusion is drawn on the expediency of implementing this function in polymodal infocommunication systems. A method is proposed for determining the falsity of multimodal information transmitted during a communication act by means of these systems. Common tendencies in the dynamics of subscribers’ non-verbal behavior parameters are formulated. Based on factor analysis and multiple regression analysis, the factors depending on such dynamics are identified. Based on this research, a conclusion is drawn on the possibility of diagnosing the falsity of transmitted information during interpersonal communication between subscribers, and a decision rule is formulated.
international conference on information and communication security | 2013
Andrey Ronzhin; Victor Budkov; Irina S. Kipyatkova
The main goal in developing systems for the formal logging of activities is to automate the entire process of transcribing participants’ speech. In this paper we outline modern methods of audio and video signal processing and personification data analysis for multimodal speaker diarization. The proposed PARAD-R software for Russian speech analysis has been implemented for audio speaker diarization and will be enhanced based on advances in multimodal situation analysis in a meeting room.
International Conference on Interactive Collaborative Robotics | 2016
Nikita Pavluk; Arseniy Ivin; Victor Budkov; Andrey Kodyakov; Andrey Ronzhin
An overview of existing anthropomorphic robots and an analysis of the servomechanisms and bearing parts involved in the assembly of robot legs are presented. We propose an option for constructing the legs of the robot Antares, which is under development. A two-motor layout used in the knee ensures higher joint power along with independent interaction with the neighboring upper and lower leg joints when bending. To reduce the electrical load on the main battery of the robot, the upper legs are provided with a mounting pad for additional batteries that power the servos. Direct control of the servos is also carried out through sub-controllers, which are responsible for all six motors installed in the joints of the robot legs.
international conference on human-computer interaction | 2013
Alexander L. Ronzhin; Andrey Ronzhin; Victor Budkov
Space-time context structurization is one of the key issues in the development of an automatic audiovisual monitoring system for meeting support and analysis of participants’ behavior in a smart space. An analysis of accumulated data about a participant’s behavior, including position in the meeting room, speech activity, and face direction, allows the monitoring system to generate a participant profile, which is then used to predict his/her behavior in subsequent meetings. The profile is also used to adjust audio and video recording and the model for controlling multimedia devices in the smart room. In the experiment, the main attention was paid to speaker localization in the chair zone with 32 predefined positions.
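Localization to a fixed set of seats, as in the experiment with 32 predefined positions, can be pictured as snapping a position estimate to the nearest known chair. The sketch below assumes a hypothetical 4 x 8 chair grid, made-up spacing values, and a rejection threshold `max_dist`; none of these come from the paper.

```python
import math


def make_chair_grid(rows=4, cols=8, x0=1.0, y0=1.0, dx=0.6, dy=0.9):
    """Build a hypothetical chair layout as (x, y) coordinates in metres."""
    return [(x0 + c * dx, y0 + r * dy) for r in range(rows) for c in range(cols)]


def nearest_chair(estimate, chairs, max_dist=0.5):
    """Return the index of the chair closest to `estimate`.

    Returns None when even the closest chair is farther than `max_dist`,
    i.e. the estimate is treated as lying outside the chair zone.
    """
    best = min(range(len(chairs)), key=lambda i: math.dist(estimate, chairs[i]))
    if math.dist(estimate, chairs[best]) > max_dist:
        return None
    return best


if __name__ == "__main__":
    chairs = make_chair_grid()
    print(nearest_chair((1.05, 1.0), chairs))   # estimate close to the first chair
    print(nearest_chair((10.0, 10.0), chairs))  # estimate far from every chair
```

Accumulating the returned seat indices per participant over several meetings is one simple way such estimates could feed a participant profile.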
international conference on information and communication security | 2011
Andrey Ronzhin; Alexander L. Ronzhin; Victor Budkov
The issue of automatic selection of the current active speaker among more than thirty participants located in a medium-sized meeting room is considered. Techniques of video tracking and sound source localization are implemented for recording AVI files of speaker remarks in the developed smart meeting room. Video streams from five cameras are processed to register participants in fixed chair positions, to track the main speaker based on histogram comparison, and to detect faces with an AdaBoost-trained cascade classifier. Multichannel sound source localization based on the GCC-PHAT method is used to estimate the speaker position with four microphone arrays. At an SNR of 18 dB, the sound source localization rate was about 97% and the fine RMSE was below 0.23 m.
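For readers unfamiliar with GCC-PHAT, the following is a minimal sketch of time-delay estimation between two microphone signals, not the authors' implementation. It uses a naive DFT to stay dependency-free (an FFT library would normally be used), and the function names, `max_lag`, and the synthetic test signal are assumptions made for illustration.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, adequate for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_delay(sig, ref, max_lag):
    """Estimate the lag (in samples) of `sig` relative to `ref`.

    A positive result means `sig` arrives later than `ref`.
    """
    n = len(sig) + len(ref)  # zero-pad so circular correlation acts as linear
    spec_sig = dft(list(sig) + [0.0] * (n - len(sig)))
    spec_ref = dft(list(ref) + [0.0] * (n - len(ref)))

    # PHAT weighting: keep only the phase of the cross-power spectrum.
    cross = []
    for a, b in zip(spec_sig, spec_ref):
        c = a * b.conjugate()
        mag = abs(c)
        cross.append(c / mag if mag > 1e-12 else 0j)

    cc = [v.real for v in dft(cross, inverse=True)]

    # Negative lags are stored at the end of the circular correlation.
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = cc[lag % n]
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


if __name__ == "__main__":
    rng = random.Random(0)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(32)]
    sig = [0.0] * 5 + ref  # the same noise burst arriving 5 samples later
    print(gcc_phat_delay(sig, ref, max_lag=10))
```

Dividing the estimated lag by the sampling rate gives the time difference of arrival for one microphone pair; combining such delays across several pairs or arrays is what allows a source position to be estimated.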
Collaboration
Tomsk State University of Control Systems and Radio-electronics