Marek Hrúz
University of West Bohemia
Publications
Featured research published by Marek Hrúz.
Journal on Multimodal User Interfaces | 2008
Oya Aran; Ismail Ari; Lale Akarun; Erinç Dikici; Siddika Parlak; Murat Saraclar; Pavel Campr; Marek Hrúz
The objective of this study is to automatically extract annotated sign data from broadcast news recordings for the hearing impaired. These recordings are an excellent source for automatically generating annotated data: in news for the hearing impaired, the speaker signs with her hands as she talks, and corresponding sliding text is superimposed on the video. The video of the signer can be segmented with the help of either the speech alone or both the speech and the text, generating segmented and annotated sign videos. We call this application Signiary, and aim to use it as a sign dictionary where users enter a word as text and retrieve videos of the related sign. The application can also be used to automatically create annotated sign databases for training recognizers.
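A minimal sketch of the segmentation idea, assuming word-level timestamps are already available from a speech recognizer; the function name, padding value, frame rate and example alignment are illustrative stand-ins, not the paper's:

```python
# Minimal sketch: turning word-level speech alignments into sign video clips.
# The alignments, fps and padding are hypothetical inputs; the paper obtains
# the timings from a speech recognizer run on the broadcast audio.

def clip_for_word(word, alignments, fps=25.0, pad=0.25):
    """Return the (start_frame, end_frame) of a sign clip for `word`.

    `alignments` maps a word to its (start_sec, end_sec) in the audio;
    `pad` widens the window, since signs may lead or lag the speech.
    """
    start_sec, end_sec = alignments[word]
    start = max(0, int((start_sec - pad) * fps))
    end = int((end_sec + pad) * fps)
    return start, end

alignments = {"pocasi": (12.4, 13.1)}  # hypothetical recognizer output
print(clip_for_word("pocasi", alignments))  # -> (303, 333)
```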
Pattern Recognition and Image Analysis | 2012
Ahmet Alp Kindiroglu; Hulya Yalcin; Oya Aran; Marek Hrúz; Pavel Campr; Lale Akarun; Alexey Karpov
This paper presents the design and evaluation of a multi-lingual fingerspelling recognition module designed for an information terminal. Through the use of multimodal input and output methods, the information terminal acts as a communication medium between deaf and blind people. The system converts fingerspelled words to speech and vice versa using fingerspelling recognition, fingerspelling synthesis, speech recognition and speech synthesis in the Czech, Russian and Turkish languages. We describe an adaptive skin-color-based fingersign recognition system with close to real-time performance and present recognition results on 88 different letters signed by five different signers, using more than four hours of training and test videos.
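A minimal sketch of the skin-color segmentation step such a recognizer typically starts from, using OpenCV with fixed YCrCb thresholds as a stand-in for the paper's adaptive skin model:

```python
# Minimal sketch of skin-color segmentation in YCrCb with fixed thresholds;
# the paper's system adapts the skin model per signer instead.
import cv2
import numpy as np

def skin_mask(frame_bgr):
    """Return a binary mask of skin-colored pixels in a BGR frame."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    # Commonly used Cr/Cb bounds for skin; an adaptive system would
    # re-estimate these from detected face pixels instead.
    lower = np.array([0, 133, 77], dtype=np.uint8)
    upper = np.array([255, 173, 127], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower, upper)
    # Clean up small noise before contour-based hand detection.
    kernel = np.ones((5, 5), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```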
International Conference on Acoustics, Speech, and Signal Processing | 2017
Marek Hrúz; Zbyněk Zajíc
The aim of this paper is to propose a speaker change detection technique based on a Convolutional Neural Network (CNN) and to evaluate its contribution to the performance of a speaker diarization system for telephone conversations. For comparison we used an i-vector based speaker diarization system. The baseline speaker change detection uses the Generalized Likelihood Ratio (GLR) metric. Experiments were conducted on the English part of the CallHome corpus. Our proposed CNN speaker change detection outperformed the GLR approach, reducing the Equal Error Rate by 46% relative. The final results on the speaker diarization system indicate that CNN-based speaker change detection is beneficial, yielding a relative improvement in diarization error rate of 28%.
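A minimal sketch of a CNN scoring short spectrogram windows for a speaker change, written in PyTorch; the library choice, architecture and input size are assumptions for illustration, not the paper's exact network:

```python
# Minimal sketch (PyTorch, assumed) of a CNN that maps a short spectrogram
# window to a change score in [0, 1]; all sizes are illustrative.
import torch
import torch.nn as nn

class ChangeDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Input: 1 x 64 mel bands x 100 frames -> 32 x 16 x 25 after pooling.
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 16 * 25, 1), nn.Sigmoid()
        )

    def forward(self, x):
        # x: (batch, 1, 64, 100); output is high near a change point.
        return self.head(self.features(x))

scores = ChangeDetector()(torch.randn(8, 1, 64, 100))
print(scores.shape)  # torch.Size([8, 1])
```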
Pattern Recognition and Image Analysis | 2012
Marek Hrúz; Jana Trojanová; M. Železný
In this paper we focus on appearance features, particularly Local Binary Patterns, describing the manual component of sign language. We compare the performance of these features with geometric moments describing the trajectory and shape of the hands. Since the non-manual component is also very important for sign recognition, we localize facial landmarks via an Active Shape Model combined with a landmark detector that increases the robustness of model fitting. We test the recognition performance of individual features and their combinations on a database consisting of 11 signers and 23 signs with several repetitions. Local Binary Patterns outperform the geometric moments. When the features are combined, we achieve a recognition rate of up to 99.75% for signer-dependent tests and 57.54% for signer-independent tests.
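A minimal sketch of a Local Binary Pattern appearance descriptor for a cropped hand image, using scikit-image; the P, R and histogram settings are illustrative choices, not the paper's:

```python
# Minimal sketch: a uniform-LBP histogram as an appearance descriptor
# for a grayscale hand crop; parameters are illustrative.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_descriptor(gray_hand, P=8, R=1):
    """Uniform LBP histogram of a grayscale hand crop."""
    codes = local_binary_pattern(gray_hand, P, R, method="uniform")
    # The 'uniform' mapping yields P + 2 distinct codes.
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

print(lbp_descriptor(np.random.randint(0, 256, (64, 64))).shape)  # (10,)
```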
Journal on Multimodal User Interfaces | 2011
Marek Hrúz; Pavel Campr; Erinç Dikici; Ahmet Alp Kindiroglu; Z. Krňoul; Alexander L. Ronzhin; Hasim Sak; Daniel Schorno; Hulya Yalcin; Lale Akarun; Oya Aran; Alexey Karpov; Murat Saraclar; M. Železný
The aim of this paper is to support communication between two people, one hearing impaired and one visually impaired, by converting speech to fingerspelling and fingerspelling to speech. Fingerspelling is a subset of sign language that uses finger signs to spell letters of the spoken or written language. We aim to convert fingerspelled words to speech and vice versa. Different spoken and sign languages, namely English, Russian, Turkish and Czech, are considered.
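A minimal sketch of the fingerspelling-synthesis direction, where a recognized word is spelled out letter by letter; the sign clips and the mapping below are hypothetical placeholders, not the system's actual resources:

```python
# Minimal sketch: spell a word as a sequence of per-letter hand-sign clips.
# The mapping is a hypothetical stand-in for a language's hand alphabet.
SIGNS = {"a": "sign_a.avi", "h": "sign_h.avi", "o": "sign_o.avi"}

def spell(word):
    """Return the sign clips to play, one per letter, skipping unknowns."""
    return [SIGNS[ch] for ch in word.lower() if ch in SIGNS]

print(spell("AHOJ"))  # ['sign_a.avi', 'sign_h.avi', 'sign_o.avi']
```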
International Conference on Speech and Computer | 2016
Marek Hrúz; Marie Kunešová
This paper presents an approach to detecting speaker changes in telephone conversations. The speaker change problem is cast as a classification problem. We use a Convolutional Neural Network to analyze short audio segments. The network acts as a regressor: it outputs higher values for segments that are more likely to contain a speaker change, and the decision about a segment is made by thresholding the regressed value. The experiments show that the Convolutional Neural Network outperforms a baseline system based on the Bayesian Information Criterion and behaves very well on previously unseen data produced by previously unheard speakers.
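A minimal sketch of the thresholding step, turning per-segment regression scores into change points; the local-maximum rule and the threshold value are illustrative additions, not taken from the paper:

```python
# Minimal sketch: pick change points as local peaks above a threshold.
import numpy as np

def change_points(scores, threshold=0.5):
    """Indices whose score exceeds `threshold` and peaks locally."""
    scores = np.asarray(scores)
    peaks = []
    for i in range(1, len(scores) - 1):
        if (scores[i] >= threshold
                and scores[i] >= scores[i - 1]
                and scores[i] > scores[i + 1]):
            peaks.append(i)
    return peaks

print(change_points([0.1, 0.2, 0.8, 0.6, 0.1, 0.7, 0.9, 0.4]))  # [2, 6]
```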
International Conference on Signal Processing | 2010
Zdeněk Krňoul; Marek Hrúz; Pavel Campr
In this paper we focus on the potential correlation between the manual and the non-manual component of sign language. This information is useful for sign language analysis, recognition and synthesis; we are mainly concerned with its application to sign synthesis. First we extract features that represent the manual and non-manual components. We present a simple but robust hand tracking method that yields a 2D trajectory representing a portion of the manual component, while the head is tracked via an Active Appearance Model. We introduce initial experiments to reveal the relationship between these features. The procedure is verified on a corpus of isolated signs from Czech Sign Language. The results imply that the components of sign language are correlated; the most correlated signals are the vertical movements of the head and hands.
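A minimal sketch of the kind of correlation measurement described, on synthetic head and hand trajectories; the signals below are made up purely for illustration:

```python
# Minimal sketch: correlation between the vertical (y) head and hand
# trajectories of one sign, here on synthetic data.
import numpy as np

t = np.linspace(0, 1, 50)
head_y = np.sin(2 * np.pi * t)                               # tracked head height
hand_y = np.sin(2 * np.pi * t) + 0.1 * np.random.randn(50)   # noisy hand height

r = np.corrcoef(head_y, hand_y)[0, 1]
print(f"head/hand vertical correlation: {r:.2f}")  # close to 1.0
```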
Text, Speech and Dialogue | 2011
Marek Hrúz; Z. Krňoul; Pavel Campr; Luděk Müller
This paper deals with a novel automatic categorization of signs used in sign language dictionaries. The categorization provides additional information about lexical signs captured in video files. We design a new method for automatic parameterization of these video files and for categorization of the signs from the extracted information. The method incorporates advanced image processing for detection and tracking of the hands and head of the signing person in the input image sequences. For hand tracking we developed an algorithm based on object detection and discriminative probability models; for head tracking we use an active appearance model, a very powerful method for detecting and tracking the human face. We specify feasible conditions of the model that enable the extracted parameters to be used for basic categorization of the non-manual component. We introduce an experiment with automatic categorization determining, among others, the symmetry, location and contact of the hands, the shape of the mouth, and closed eyes. The primary result of the experiment is the categorization of more than 200 signs, together with a discussion of problems and future extensions.
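A minimal sketch of one such categorization rule, flagging hand contact when the tracked hand centroids come close; the distance threshold is a hypothetical value, not the paper's:

```python
# Minimal sketch: declare hand contact when the two hand centroids come
# closer than a (hypothetical) pixel threshold in any frame.
import numpy as np

def hands_in_contact(left_xy, right_xy, max_dist=30.0):
    """left_xy, right_xy: (frames, 2) arrays of hand centroids."""
    dists = np.linalg.norm(np.asarray(left_xy) - np.asarray(right_xy), axis=1)
    return bool((dists < max_dist).any())

left = [[100, 200], [120, 210], [150, 215]]
right = [[300, 200], [200, 210], [160, 215]]
print(hands_in_contact(left, right))  # True: final distance is 10 px
```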
International Conference on Interactive Collaborative Robotics | 2016
Ivan Gruber; Miroslav Hlaváč; Marek Hrúz; M. Železný; Alexey Karpov
This paper presents an analysis and comparison of datasets of images of human faces with annotated facial keypoints, which are important in human-machine interaction. The datasets are divided into two groups according to the external conditions of the subject: datasets captured in laboratory conditions and in-the-wild data. Moreover, a quick review of state-of-the-art methods for keypoint detection is provided. Existing methods are categorized into three groups according to their approach to the problem: top-down, bottom-up, and their combination.