Publications


Featured research published by Alexander L. Ronzhin.


NEW2AN'11/ruSMART'11: Proceedings of the 11th International Conference on Next Generation Wired/Wireless Networking and the 4th International Conference on Smart Spaces | 2011

Event-driven content management system for smart meeting room

Victor Budkov; Alexander L. Ronzhin; Sergey V. Glazkov; Andrey Ronzhin

Context awareness is one of the key issues in the development of a content management system for meeting recording and teleconference support. Analysis of participant behavior, including position in the meeting room, speech activity, face direction, and usage of a projector or whiteboard, allows the content management system to select the relevant multimedia streams for recording. A system operation diagram explaining meaningful events for context prediction and equipment actions during the meeting is considered. Experimental results have shown that the graphical content was selected correctly for 97% of the total meeting time.
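
A minimal sketch of the kind of rule-based stream selection the abstract describes, assuming hypothetical event fields (projector and whiteboard usage, the zone of the active speaker); the paper's actual event model and camera names are not reproduced here.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MeetingEvent:
    """Hypothetical snapshot of the meeting context at one moment."""
    projector_active: bool
    whiteboard_active: bool
    active_speaker_zone: Optional[int]  # zone index where speech is detected, or None

def select_streams(event: MeetingEvent) -> List[str]:
    """Pick which multimedia streams to record for the current context.
    The rules are illustrative only, not the paper's event-driven logic."""
    streams = []
    if event.projector_active:
        streams.append("projector_capture")
    if event.whiteboard_active:
        streams.append("whiteboard_camera")
    if event.active_speaker_zone is not None:
        streams.append(f"ptz_camera_zone_{event.active_speaker_zone}")
    if not streams:
        streams.append("overview_camera")  # fall back to a wide shot of the room
    return streams

# Example: a speaker in zone 3 is talking while the projector is in use.
print(select_streams(MeetingEvent(True, False, 3)))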


International Conference on Speech and Computer | 2016

HAVRUS Corpus: High-Speed Recordings of Audio-Visual Russian Speech

Vasilisa Verkhodanova; Alexander L. Ronzhin; Irina S. Kipyatkova; Denis Ivanko; Alexey Karpov; M. Železný

In this paper we present a hardware-software complex for the collection of audio-visual speech databases with a high-speed camera and a dynamic microphone. We describe the architecture of the developed software as well as some details of the collected database of Russian audio-visual speech, HAVRUS. The developed software provides synchronization and fusion of the audio and video channels and accounts for a natural property of human speech: the asynchrony between the audio and visual speech modalities. The collected corpus comprises recordings of 20 native speakers of Russian and is intended for further research and experiments on audio-visual Russian speech recognition.
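
As a rough illustration of the audio-video alignment such a recording tool must perform, the sketch below maps high-speed video frame indices to audio sample ranges from nominal rates; the concrete frame rate, sample rate, and synchronization method used for HAVRUS are assumptions here.

def frame_to_sample_range(frame_idx: int,
                          video_fps: float = 200.0,  # assumed high-speed camera rate
                          audio_rate: int = 44100):  # assumed microphone sample rate
    """Return the (start, end) audio sample indices covered by one video frame,
    assuming both streams start at the same instant (no channel offset)."""
    start_t = frame_idx / video_fps
    end_t = (frame_idx + 1) / video_fps
    return int(round(start_t * audio_rate)), int(round(end_t * audio_rate))

# Frame 1000 of a 200 fps recording covers 5.000-5.005 s of audio:
print(frame_to_sample_range(1000))  # approximately (220500, 220720)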


ruSMART/NEW2AN'10: Proceedings of the 3rd Conference on Smart Spaces and the 10th International Conference on Next Generation Wired/Wireless Networking | 2010

A video monitoring model with a distributed camera system for the smart space

Alexander L. Ronzhin; Maria Prischepa; Alexey Karpov

The developed video monitoring model, based on a distributed camera system, is intended to automate the registration of meeting participants. The problem of detecting the positions and faces of participants simultaneously present in the room is solved by the joint use of wide-angle, omni-directional, and PTZ cameras mounted on the walls and ceiling of the room, together with face detection applied to the zones where participants' faces are likely to appear. Photo quality was estimated by the pointing accuracy of the PTZ camera and the diagonal size of a face in the captured image. The deviation of a face from the frame center averaged 8%, and the face area was no less than 7% of the image. It was also observed that the mean registration time increases as the number of participants grows and the illumination level decreases. Further development of the model is aimed at automatically capturing the main speaker, discussions, votes, and other key phases of events.
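
A small sketch of the two quality metrics the abstract reports, the deviation of the face from the frame center and the face area as a share of the image, assuming a face is given as an axis-aligned bounding box; the normalization by the frame diagonal is an assumption, and the detection and PTZ control themselves are not shown.

def face_quality_metrics(face_box, frame_w, frame_h):
    """face_box = (x, y, w, h) in pixels; returns (center_deviation, area_fraction).

    center_deviation: distance of the face center from the frame center,
    relative to the frame diagonal (0 = perfectly centered).
    area_fraction: face area divided by the whole image area."""
    x, y, w, h = face_box
    face_cx, face_cy = x + w / 2.0, y + h / 2.0
    frame_cx, frame_cy = frame_w / 2.0, frame_h / 2.0
    diag = (frame_w ** 2 + frame_h ** 2) ** 0.5
    deviation = ((face_cx - frame_cx) ** 2 + (face_cy - frame_cy) ** 2) ** 0.5 / diag
    area_fraction = (w * h) / float(frame_w * frame_h)
    return deviation, area_fraction

# A 300x300 face slightly off-center in a 1280x720 frame:
dev, area = face_quality_metrics((560, 180, 300, 300), 1280, 720)
print(f"deviation = {dev:.1%}, face area = {area:.1%} of the image")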


Journal on Multimodal User Interfaces | 2011

Automatic fingersign-to-speech translation system

Marek Hrúz; Pavel Campr; Erinç Dikici; Ahmet Alp Kindiroglu; Z. Krňoul; Alexander L. Ronzhin; Hasim Sak; Daniel Schorno; Hulya Yalcin; Lale Akarun; Oya Aran; Alexey Karpov; Murat Saraclar; M. Železný

The aim of this paper is to support communication between two people, one hearing impaired and one visually impaired, by converting speech to fingerspelling and fingerspelling to speech. Fingerspelling is a subset of sign language that uses finger signs to spell letters of the spoken or written language. We aim to convert fingerspelled words to speech and vice versa. Different spoken and sign languages, such as English, Russian, Turkish and Czech, are considered.


International Conference on Speech and Computer | 2015

Algorithms for Low Bit-Rate Coding with Adaptation to Statistical Characteristics of Speech Signal

Anton I. Saveliev; Oleg Basov; Andrey Ronzhin; Alexander L. Ronzhin

The article reviews general trends in speech coding algorithms based on linear prediction. The task of adapting a speech codec to the statistical characteristics of its coding parameters is formulated and solved, and the main procedures for forming these parameters are examined. The results of experimental studies of the developed adaptive low bit-rate coding algorithms are presented, demonstrating improved quality of the reconstructed speech in comparison with algorithms based on the FS1015, FS1017 and FS1016 standards and full-rate GSM.
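
The linear-prediction analysis underlying such codecs can be sketched compactly; below is a standard autocorrelation-method LPC analysis with the Levinson-Durbin recursion in NumPy, not the adaptive codec of the paper (the order, windowing, and framing are illustrative).

import numpy as np

def lpc_coefficients(frame: np.ndarray, order: int = 10) -> np.ndarray:
    """Estimate LPC coefficients a[1..order] for one speech frame via the
    autocorrelation method and the Levinson-Durbin recursion.
    The predictor is x[n] ~= -sum_k a[k] * x[n - k]."""
    windowed = frame * np.hamming(len(frame))
    # Autocorrelation for lags 0..order
    r = np.array([np.dot(windowed[:len(windowed) - k], windowed[k:])
                  for k in range(order + 1)])
    a = np.zeros(order)
    err = r[0]
    for i in range(order):
        acc = r[i + 1] + np.dot(a[:i], r[1:i + 1][::-1])
        k = -acc / err                        # reflection coefficient
        a_new = a.copy()
        a_new[i] = k
        a_new[:i] = a[:i] + k * a[:i][::-1]   # update lower-order coefficients
        a = a_new
        err *= (1.0 - k * k)                  # remaining prediction error
    return a

# Toy usage: analyze a 20 ms frame of a synthetic vowel-like signal at 8 kHz.
fs = 8000
t = np.arange(int(0.02 * fs)) / fs
frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 600 * t)
print(lpc_coefficients(frame, order=10))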


International Conference on Universal Access in Human-Computer Interaction | 2015

Automatic Analysis of Speech and Acoustic Events for Ambient Assisted Living

Alexey Karpov; Alexander L. Ronzhin; Irina S. Kipyatkova

We present a prototype of an ambient assisted living (AAL) environment with multimodal user interaction. In our research, the AAL environment is a studio room of over 60 square meters that has several tables, chairs and a sink, and is equipped with four stationary microphones and two omni-directional video cameras. In this paper, we focus mainly on audio signal processing techniques for monitoring the assistive smart space and recognizing speech and non-speech acoustic events in order to automatically analyze the human's activities and detect possible emergency situations with the user (when urgent help is needed). Acoustic modeling in our audio recognition system is based on single-order Hidden Markov Models with Gaussian Mixture Models. The recognition vocabulary includes 12 non-speech acoustic events for different types of human activities plus 5 useful spoken commands (keywords), including a subset of alarm audio events. We have collected an audio-visual corpus containing about 1.3 hours of audio data from 5 testers, who performed the proposed test scenarios, and carried out practical experiments with the system, the results of which are reported in this paper.
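
A much-simplified sketch of the statistical classification idea behind such a recognizer: one Gaussian mixture model per acoustic event class over MFCC features, scored by log-likelihood. The paper's system uses HMMs on top of these models and a specific 12+5-item vocabulary; the feature settings, mixture size, and class labels below are placeholders.

import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(path: str, sr: int = 16000) -> np.ndarray:
    """Load an audio file and return an (n_frames, 13) MFCC matrix."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

def train_event_models(training_files: dict) -> dict:
    """training_files maps an event label (e.g. 'door_slam', 'help_keyword')
    to a list of audio file paths; one GMM is fitted per label."""
    models = {}
    for label, paths in training_files.items():
        feats = np.vstack([mfcc_features(p) for p in paths])
        models[label] = GaussianMixture(n_components=8, covariance_type='diag').fit(feats)
    return models

def classify_event(path: str, models: dict) -> str:
    """Return the label whose GMM gives the highest average log-likelihood."""
    feats = mfcc_features(path)
    return max(models, key=lambda label: models[label].score(feats))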


International Conference on Speech and Computer | 2014

Algorithms for Acceleration of Image Processing at Automatic Registration of Meeting Participants

Alexander L. Ronzhin; Irina V. Vatamaniuk; Andrey Ronzhin; M. Železný

The aim of the research is to develop algorithms that accelerate image processing during automatic registration of meeting participants, based on blurriness estimation and face recognition procedures. The data captured by the video registration system in the intelligent meeting room are used to calculate the variation of face sizes in captured images as well as to evaluate face recognition methods. The results show that the LBP method has the highest recognition accuracy (79.5%), while the PCA method has the lowest false alarm rate (1.3%). The blur estimation procedure allowed the registration system to exclude 22% of photos of insufficient quality; as a result, the speed of the whole system was significantly increased.
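
A common way to implement the kind of blurriness check mentioned here is the variance of the Laplacian; the sketch below uses OpenCV and a hypothetical threshold, and is an assumption about one reasonable implementation rather than the paper's exact procedure.

import cv2

def blurriness_score(image_bgr) -> float:
    """Variance of the Laplacian: low values indicate a blurry image."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def is_sharp_enough(image_bgr, threshold: float = 100.0) -> bool:
    """Keep a participant photo only if it passes the (hypothetical) sharpness threshold."""
    return blurriness_score(image_bgr) >= threshold

# Usage: filter out blurred registration photos before face recognition.
# img = cv2.imread("participant_0001.jpg")
# if is_sharp_enough(img):
#     ...pass the photo to the LBP/PCA face recognizer...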


International Conference on Human-Computer Interaction | 2013

Methodology of Facility Automation Based on Audiovisual Analysis and Space-Time Structuring of Situation in Meeting Room

Alexander L. Ronzhin; Andrey Ronzhin; Victor Budkov

Space-time context structuring is one of the key issues in the development of an automatic audiovisual monitoring system for meeting support and for the analysis of participants' behavior in a smart space. Analysis of the accumulated data about a participant's behavior, including position in the meeting room, speech activity, and face direction, allows the monitoring system to generate a participant profile, which is further used to predict his or her behavior in subsequent meetings. It is also used to adjust the audio and video recording and the multimedia device control model in the smart room. In the experiments, the main attention was paid to speaker localization in the chair zone with 32 predefined positions.
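
One small piece of the described structuring, snapping an estimated speaker location to the nearest of the predefined chair positions, can be sketched as follows; the coordinates and grid layout are placeholders, not the room geometry from the paper.

import numpy as np

def nearest_chair_position(estimate_xy, chair_positions) -> int:
    """Return the index of the predefined chair position closest to an
    estimated (x, y) speaker location in the room's floor coordinates."""
    chairs = np.asarray(chair_positions, dtype=float)   # shape (n_chairs, 2)
    d = np.linalg.norm(chairs - np.asarray(estimate_xy, dtype=float), axis=1)
    return int(np.argmin(d))

# Placeholder layout: a 4 x 8 grid of 32 chair positions spaced 0.8 m apart.
chairs = [(0.8 * col, 0.8 * row) for row in range(4) for col in range(8)]
print(nearest_chair_position((1.7, 0.9), chairs))  # index of the nearest seat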


International Conference on Information and Communication Security | 2011

Audiovisual speaker localization in medium smart meeting room

Andrey Ronzhin; Alexander L. Ronzhin; Victor Budkov

The issue of automatically selecting the current active speaker among more than thirty participants located in a medium-sized meeting room is considered. Techniques of video tracking and sound source localization are implemented for recording AVI files of speaker remarks in the developed smart meeting room. Video processing of the streams from five cameras serves for registration of participants in fixed chair positions and for tracking the main speaker based on histogram comparison and an AdaBoost cascade classifier for face detection. Multichannel sound source localization based on the GCC-PHAT method is used to estimate the speaker position with four microphone arrays. In the 18 dB SNR case, the sound source localization rate was about 97% and the fine RMSE was below 0.23 m.
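
GCC-PHAT itself is compact enough to sketch: the cross-power spectrum of two microphone signals is whitened by its magnitude and inverse-transformed, and the peak gives the time difference of arrival. The sketch below estimates a single TDOA between one microphone pair; the array geometry and the mapping from delays to a room position used in the paper are omitted, and the interpolation factor is an illustrative choice.

import numpy as np

def gcc_phat_tdoa(sig: np.ndarray, ref: np.ndarray, fs: int,
                  max_tau: float = None, interp: int = 4) -> float:
    """Estimate the time difference of arrival (seconds) of `sig` relative to `ref`
    using the GCC-PHAT (phase transform) method."""
    n = sig.shape[0] + ref.shape[0]
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-15                   # phase transform weighting
    cc = np.fft.irfft(R, n=interp * n)       # interpolated cross-correlation
    max_shift = interp * n // 2
    if max_tau is not None:
        max_shift = min(int(interp * fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(interp * fs)

# Toy check: the same noise burst delayed by 20 samples at 16 kHz.
fs = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(4096)
sig = np.roll(ref, 20)
print(gcc_phat_tdoa(sig, ref, fs))  # ~20 / 16000 = 1.25 ms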


International Conference on Pattern Recognition | 2010

Multimodal Human Computer Interaction with MIDAS Intelligent Infokiosk

Alexey Karpov; Andrey Ronzhin; Irina S. Kipyatkova; Alexander L. Ronzhin; Lale Akarun

In this paper, we present an intelligent information kiosk called MIDAS (Multimodal Interactive-Dialogue Automaton for Self-service), including its hardware and software architecture and the stages of deployment of its speech recognition and synthesis technologies. MIDAS uses the Wizard of Oz (WOZ) methodology, which allows an expert to correct speech recognition results and control the dialogue flow. User statistics of multimodal human-computer interaction (HCI) have been analyzed for the operation of the kiosk in the automatic and automated modes. The infokiosk offers information about the structure and staff of laboratories and about the location and phone numbers of departments and employees of the institution. The multimodal user interface provides a touch screen, natural speech input, and head and manual gestures, for both ordinary and physically handicapped users.

Collaboration


Top co-authors of Alexander L. Ronzhin:

Andrey Ronzhin (Russian Academy of Sciences)
Alexey Karpov (Russian Academy of Sciences)
Anton I. Saveliev (Russian Academy of Sciences)
Victor Budkov (Russian Academy of Sciences)
M. Železný (University of West Bohemia)
Hulya Yalcin (Istanbul Technical University)
Denis Ivanko (Russian Academy of Sciences)