Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Irina S. Kipyatkova is active.

Publication


Featured research published by Irina S. Kipyatkova.


Speech Communication | 2014

Large vocabulary Russian speech recognition using syntactico-statistical language modeling

Alexey Karpov; Konstantin Markov; Irina S. Kipyatkova; Daria Vazhenina; Andrey Ronzhin

Speech is the most natural way of human communication, and achieving convenient and efficient human-computer interaction requires state-of-the-art spoken language technology. Research in this area has traditionally focused on several major languages, such as English, French, Spanish, Chinese, or Japanese, while other languages, particularly Eastern European ones, have received much less attention. Recently, however, research activity on speech technologies for Czech, Polish, Serbo-Croatian, and Russian has been steadily increasing. In this paper, we describe our efforts to build a large-vocabulary automatic speech recognition (ASR) system for Russian. Russian is a synthetic, highly inflected language with a large number of roots and affixes, which greatly reduces the performance of ASR systems designed using traditional approaches. In our work, we paid special attention to the specifics of the Russian language when developing the acoustic, lexical, and language models. A special software tool for pronunciation lexicon creation was developed. For the acoustic model, we investigated a combination of knowledge-based and statistical approaches to create several different phoneme sets, the best of which was determined experimentally. For the language model (LM), we introduced a new method that combines syntactic and statistical analysis of the training text data in order to build better n-gram models. Evaluation experiments were performed using two different Russian speech databases and an internally collected text corpus. Among the phoneme sets we created, the one that achieved the fewest word-level recognition errors contained 47 phonemes, so we used it in the subsequent language modeling evaluations. Experiments with a 204-thousand-word vocabulary were performed to compare standard statistical n-gram LMs with language models created using our syntactico-statistical method. The results demonstrate that the proposed language modeling approach reduces word recognition errors.
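
The abstract mentions a dedicated tool for pronunciation lexicon creation but does not publish it. As a purely illustrative sketch of what rule-based grapheme-to-phoneme conversion for Russian can look like, here is a toy Python example; the letter-to-phoneme rules below are made up and far simpler than the 47-phoneme set the paper reports.

```python
# Toy rule-based grapheme-to-phoneme mapping for a few Russian letters.
# Illustrative assumption only; not the authors' lexicon-creation tool.
G2P = {"м": "m", "а": "a", "т": "t", "ь": ""}  # toy subset of rules

def transcribe(word: str) -> str:
    """Map a lowercase Cyrillic word to a toy phoneme string."""
    return " ".join(filter(None, (G2P.get(ch, "?") for ch in word)))

print(transcribe("мать"))  # -> "m a t" (palatalization ignored in this toy)
```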


international conference on human computer interaction | 2011

An assistive bi-modal user interface integrating multi-channel speech recognition and computer vision

Alexey Karpov; Andrey Ronzhin; Irina S. Kipyatkova

In this paper, we present a bi-modal user interface intended both to assist persons without hands or with physical disabilities of the hands or arms, and to provide contactless HCI for able-bodied users. A person can move a virtual mouse pointer by moving his/her head and communicate with a computer verbally, giving speech commands instead of using standard input devices. Speech is a very useful modality for referring to objects and actions on objects, whereas head pointing is a powerful modality for indicating spatial locations. The bi-modal interface integrates a tri-lingual system for multi-channel audio signal processing and automatic recognition of voice commands in English, French, and Russian, as well as a vision-based head detection and tracking system. It processes natural speech and head pointing movements in parallel and fuses both information streams into a unified multimodal command, where each modality conveys its own semantic information: the head position provides 2D pointer coordinates, while the speech signal yields control commands. The bi-modal user interface was tested and compared with contact-based pointing interfaces following the methodology of ISO 9241-9.
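
The abstract does not detail the fusion code; the following minimal sketch, with hypothetical names, shows one way the two streams could be merged into a unified command: head tracking supplies the 2D pointer coordinates and the recognizer supplies the action verb.

```python
# Illustrative late fusion of the two modalities described above.
# All names here are assumptions for illustration, not the authors' API.
from dataclasses import dataclass

@dataclass
class MultimodalCommand:
    action: str  # from the speech recognizer, e.g. "click", "open"
    x: float     # pointer coordinates from head tracking
    y: float

def fuse(speech_command: str, head_position: tuple[float, float]) -> MultimodalCommand:
    """Merge both information streams into one unified command."""
    x, y = head_position
    return MultimodalCommand(action=speech_command, x=x, y=y)

print(fuse("click", (412.0, 233.5)))
```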


international conference on speech and computer | 2016

HAVRUS Corpus: High-Speed Recordings of Audio-Visual Russian Speech

Vasilisa Verkhodanova; Alexander L. Ronzhin; Irina S. Kipyatkova; Denis Ivanko; Alexey Karpov; M. Železný

In this paper we present a software-hardware complex for collecting audio-visual speech databases with a high-speed camera and a dynamic microphone. We describe the architecture of the developed software as well as some details of HAVRUS, the collected database of Russian audio-visual speech. The software synchronizes and fuses the audio and video channels and accounts for a natural property of human speech: the asynchrony of the audio and visual modalities. The collected corpus comprises recordings of 20 native speakers of Russian and is intended for further research and experiments on audio-visual Russian speech recognition.
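
The synchronization mechanism itself is not spelled out in the abstract. As a rough illustration of timestamp-based alignment that tolerates a bounded audio-visual asynchrony, consider this toy sketch; the tolerance value and data layout are assumptions, not the complex's actual design.

```python
# Pair audio frames with the temporally nearest video frames, skipping
# pairs whose time offset exceeds the allowed asynchrony (illustrative).
def align(audio_frames, video_frames, max_offset=0.1):
    """audio_frames/video_frames: lists of (timestamp_sec, data)."""
    pairs = []
    for ta, a in audio_frames:
        tv, v = min(video_frames, key=lambda fv: abs(fv[0] - ta))
        if abs(tv - ta) <= max_offset:
            pairs.append((a, v))
    return pairs

audio = [(0.00, "a0"), (0.02, "a1")]
video = [(0.005, "v0"), (0.010, "v1"), (0.025, "v2")]  # 200 fps -> 5 ms apart
print(align(audio, video))  # -> [('a0', 'v0'), ('a1', 'v2')]
```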


Proceedings of the 2012 Joint International Conference on Human-Centered Computer Environments | 2012

State-of-the-art speech recognition technologies for Russian language

Daria Vazhenina; Irina S. Kipyatkova; Konstantin Markov; Alexey Karpov

In this paper, we present a review of the latest developments in Russian speech recognition research. Although the underlying speech technology is mostly language-independent, differences between languages in structure and grammar have a substantial effect on recognition performance. Russian has a complicated word formation system, characterized by a high degree of inflection and a flexible word order. This greatly reduces the predictive power of conventional language models and consequently increases the error rate. The current statistical approach to speech recognition requires large amounts of both speech and text data. Several Russian speech databases exist, and their descriptions are given in this paper. In addition, we describe and compare several speech recognition systems developed in Russia as well as in other countries. Finally, we suggest some promising directions for further research in Russian speech technology.


international conference on speech and computer | 2017

Using a High-Speed Video Camera for Robust Audio-Visual Speech Recognition in Acoustically Noisy Conditions

Denis Ivanko; Alexey Karpov; Dmitry Ryumin; Irina S. Kipyatkova; Anton I. Saveliev; Victor Budkov; Dmitriy Ivanko; M. Železný

The purpose of this study is to develop a robust audio-visual speech recognition system and to investigate the influence of high-speed video data on the recognition accuracy of continuous Russian speech under different noisy conditions. The developed experimental setup and the collected multimodal database allow us to explore the impact of high-speed video recordings at various frame rates, from the standard 25 frames per second (fps) up to 200 fps. At present there is no research objectively characterizing the dependence of speech recognition accuracy on the video frame rate, and no relevant audio-visual databases exist for model training. In this paper, we try to fill this gap for continuous Russian speech. Our evaluation experiments show an increase in absolute recognition accuracy of up to 3% and demonstrate that using the high-speed JAI Pulnix camera at 200 fps achieves better recognition results under different acoustically noisy conditions.
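
One simple way to study the frame-rate dependence the authors describe is to derive lower-rate streams from the 200 fps recordings by uniform subsampling and run recognition on each. The sketch below is an illustrative assumption, not the authors' pipeline.

```python
# Derive a lower-frame-rate stream from a 200 fps recording by keeping
# every k-th frame (works when dst_fps divides src_fps evenly).
def subsample(frames, src_fps=200, dst_fps=25):
    step = src_fps // dst_fps
    return frames[::step]

frames_200fps = list(range(200))                # one second of video at 200 fps
print(len(subsample(frames_200fps, 200, 25)))   # -> 25 frames
print(len(subsample(frames_200fps, 200, 100)))  # -> 100 frames
```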


international conference on speech and computer | 2014

A Framework for Recording Audio-Visual Speech Corpora with a Microphone and a High-Speed Camera

Alexey Karpov; Irina S. Kipyatkova; M. Železný

In this paper, we present a novel software framework for recording audio-visual speech corpora with a high-speed video camera (JAI Pulnix RMC 6740) and a dynamic microphone (Oktava MK-012). The architecture of the developed framework for recording an audio-visual Russian speech corpus is described. It synchronizes and fuses the audio and video data captured by the independent sensors. The software automatically detects voice activity in the audio signal and stores only speech fragments, discarding non-informative signals. It also accounts for and processes the natural asynchrony of the audio and visual speech modalities.
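
The abstract says the framework stores only speech fragments; a classic way to do this is energy-based voice activity detection. The sketch below is a minimal illustration with an assumed threshold and frame size, not the authors' actual detector.

```python
# Minimal energy-threshold voice activity detector: keep only frames whose
# RMS energy exceeds a threshold. Threshold and frame layout are assumptions.
import numpy as np

def speech_fragments(samples, rate=16000, frame_ms=20, threshold=0.01):
    """Return (start, end) sample indices of frames classified as speech."""
    frame_len = rate * frame_ms // 1000
    fragments = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if np.sqrt(np.mean(frame ** 2)) > threshold:
            fragments.append((start, start + frame_len))
    return fragments

noise = np.random.randn(16000).astype(np.float32) * 0.001            # quiet noise
noise[4000:8000] += np.random.randn(4000).astype(np.float32) * 0.1   # "speech" burst
print(speech_fragments(noise)[:3])
```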


international conference on speech and computer | 2015

A Comparison of RNN LM and FLM for Russian Speech Recognition

Irina S. Kipyatkova; Alexey Karpov

In this paper, we describe research on a recurrent neural network (RNN) language model (LM) for N-best list rescoring in automatic continuous Russian speech recognition and compare it with a factored language model (FLM). We tried RNNs with different numbers of units in the hidden layer. For FLM creation, we used five linguistic factors: word, lemma, stem, part-of-speech, and morphological tag. All models were trained on a text corpus of 350M words. We also linearly interpolated the RNN LM and the FLM with the baseline 3-gram LM. We achieved a relative WER reduction of 8% using the FLM and of 14% using the RNN LM with respect to the baseline model.
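
As a rough illustration of the rescoring setup (not the authors' code), the sketch below re-ranks an N-best list by adding a linear interpolation of two LM probabilities to the acoustic score; the weights and scores are made-up examples.

```python
# N-best rescoring with linearly interpolated LM probabilities (illustrative).
import math

def rescore(nbest, lam=0.5):
    """nbest: list of (hypothesis, acoustic_logp, ngram_logp, rnn_logp).
    Interpolate the two LM probabilities in the linear domain, then re-rank."""
    def total(hyp):
        _, ac, ngram, rnn = hyp
        lm = math.log(lam * math.exp(ngram) + (1.0 - lam) * math.exp(rnn))
        return ac + lm
    return sorted(nbest, key=total, reverse=True)

nbest = [
    ("hypothesis one", -120.0, -35.0, -30.0),
    ("hypothesis two", -118.0, -40.0, -42.0),
]
print(rescore(nbest)[0][0])  # best hypothesis after rescoring
```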


international conference on information and communication security | 2013

PARAD-R: Speech analysis software for meeting support

Andrey Ronzhin; Victor Budkov; Irina S. Kipyatkova

The main goal of developing systems for formal meeting logging is to automate the entire process of transcribing participants' speech. In this paper we outline modern methods of audio and video signal processing and personification data analysis for multimodal speaker diarization. The proposed PARAD-R software for Russian speech analysis is currently implemented for audio-based speaker diarization and will be enhanced using advances in multimodal situation analysis in a meeting room.
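
The abstract does not describe PARAD-R's diarization algorithm. One common approach is to cluster per-segment speaker embeddings, sketched below with synthetic data; KMeans and the embedding layout are illustrative assumptions, not the PARAD-R internals.

```python
# Toy speaker diarization: cluster per-segment embeddings into speakers.
import numpy as np
from sklearn.cluster import KMeans

def diarize(embeddings: np.ndarray, n_speakers: int) -> np.ndarray:
    """Assign each speech segment's embedding to one of n_speakers clusters."""
    return KMeans(n_clusters=n_speakers, n_init=10).fit_predict(embeddings)

rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 0.1, (5, 8)),   # 5 segments from speaker A
                 rng.normal(1, 0.1, (5, 8))])  # 5 segments from speaker B
print(diarize(emb, 2))  # segments grouped into two speaker labels
```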


text speech and dialogue | 2010

Client and speech detection system for intelligent infokiosk

Andrey Ronzhin; Alexey Karpov; Irina S. Kipyatkova; Miloš Železný

Timely attraction of a client and detection of his/her speech in real noisy conditions are the main difficulties in deploying speech and multimodal interfaces in information kiosks. A combination of sound source localization, voice activity detection, and face detection technologies makes it possible to determine the coordinates of the client's mouth and to extract the boundaries of the speech signal within the kiosk dialogue area. A talking head model based on audio-visual speech synthesis greets the client as soon as his/her face is captured in the video monitoring area, in order to attract him/her to the information service before he/she leaves the interaction area. Face tracking is also used to turn the talking head toward the client, which significantly improves the naturalness of the interaction. The developed infokiosk, installed in the institute hall, provides information about the structure and staff of the laboratories. Statistics on human-kiosk interaction were accumulated over the last six months of 2009.
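
As a toy illustration of how the three detectors described above could be combined, the gate below accepts a speech segment only when a face is present, voice activity is detected, and the localized sound direction falls inside the dialogue zone; the azimuth bounds are invented for the example.

```python
# Illustrative gating of a speech segment by three detector outputs.
def accept_segment(face_detected: bool, voice_active: bool,
                   source_azimuth_deg: float, zone=(-30.0, 30.0)) -> bool:
    """True only if all modalities agree the client is speaking in the zone."""
    lo, hi = zone
    return face_detected and voice_active and lo <= source_azimuth_deg <= hi

print(accept_segment(True, True, 12.0))  # -> True
print(accept_segment(True, True, 55.0))  # -> False (outside dialogue zone)
```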


Procedia Computer Science | 2016

Automatic Technologies for Processing Spoken Sign Languages

Alexey Karpov; Irina S. Kipyatkova; Milos Zelezny

Sign languages are a natural means of verbal communication for deaf and hard-of-hearing people. There is no universal sign language; almost every country has its own national sign language and fingerspelling alphabet. Sign languages use visual-kinetic cues for human-to-human communication, combining hand gestures with lip articulation and facial expressions. They also possess a special grammar that is quite different from that of speech-based spoken languages. Sign languages are spoken (silently) by a hundred million deaf people all over the world; the most widespread include American (ASL), Chinese, Brazilian, Russian, and British Sign Languages, and there are almost 140 such languages according to the Ethnologue. They have no natural written form, and there is a huge lack of electronic resources for them, in particular vocabularies, audio-visual databases, and automatic recognition and synthesis systems. Thus, sign languages may be considered non-written, under-resourced spoken languages. In this paper, we present a computer system for text-to-sign-language synthesis for the Russian and Czech Sign Languages.

Collaboration


Dive into Irina S. Kipyatkova's collaborations.

Top Co-Authors

Alexey Karpov (Russian Academy of Sciences)

Andrey Ronzhin (Russian Academy of Sciences)

Miloš Železný (University of West Bohemia)

Anton I. Saveliev (Russian Academy of Sciences)

Denis Ivanko (Russian Academy of Sciences)

Victor Budkov (Russian Academy of Sciences)

Dmitry Ryumin (Russian Academy of Sciences)