Jakub Kanis | Researchain

Publication

Featured research published by Jakub Kanis.

Text, Speech and Dialogue | 2010

Comparison of different lemmatization approaches through the means of information retrieval performance

Jakub Kanis; Lucie Skorkovská

This paper presents a quantitative performance analysis of two different approaches to lemmatizing Czech text data. The first is based on a manually prepared dictionary of lemmas and a set of derivation rules, while the second infers the dictionary and the rules automatically from training data. The comparison is done by evaluating the mean Generalized Average Precision (mGAP) of the lemmatized documents and search queries in a set of information retrieval (IR) experiments. Such a method allows an efficient and fairly reliable comparison of lemmatization performance, since correct lemmatization has proven crucial for IR effectiveness in highly inflected languages. Moreover, the proposed indirect comparison of the lemmatizers circumvents the need for manually lemmatized test data, which are hard to obtain and also suffer from incompatible sets of lemmas across different systems.
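The indirect comparison can be illustrated with a minimal sketch: run the same retrieval experiment once per lemmatizer and compare the resulting scores. Assumptions: the paper uses mGAP, but this sketch swaps in standard mean average precision (MAP) with binary relevance; the ranking is a toy lemma-overlap score, and `dictionary_lemmatizer`, `DOCS`, `QUERIES`, and `QRELS` are hypothetical toy data, not the paper's lemmatizers or collection.

```python
from typing import Callable, Dict, List, Set

Lemmatizer = Callable[[str], str]

def rank(query: str, docs: Dict[str, str], lemmatize: Lemmatizer) -> List[str]:
    """Rank document ids by the number of distinct query lemmas they contain."""
    q = {lemmatize(t) for t in query.lower().split()}
    scores = {doc_id: len(q & {lemmatize(t) for t in text.lower().split()})
              for doc_id, text in docs.items()}
    # Higher overlap first; ties broken by id so the ranking is deterministic.
    return sorted(docs, key=lambda i: (-scores[i], i))

def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Binary-relevance average precision (a stand-in for the paper's mGAP)."""
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(queries, docs, qrels, lemmatize: Lemmatizer) -> float:
    aps = [average_precision(rank(q, docs, lemmatize), qrels[qid]) for qid, q in queries.items()]
    return sum(aps) / len(aps)

# Hypothetical toy collection: d1 is irrelevant, d2 and d3 mention "hrad" (castle).
DOCS = {"d1": "řeka teče údolím", "d2": "hrad stojí na kopci", "d3": "o hradu se vypráví"}
QUERIES = {"q1": "hradem"}
QRELS = {"q1": {"d2", "d3"}}

# Hypothetical lemmatizer backed by a tiny full form -> lemma dictionary.
LEMMAS = {"hradem": "hrad", "hradu": "hrad"}

def dictionary_lemmatizer(token: str) -> str:
    return LEMMAS.get(token, token)

print(mean_average_precision(QUERIES, DOCS, QRELS, lambda t: t))            # no lemmatization
print(mean_average_precision(QUERIES, DOCS, QRELS, dictionary_lemmatizer))  # with lemmatization
```

The better lemmatizer is simply the one with the higher retrieval score, so no manually lemmatized gold data is needed; only relevance judgments for the queries.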

Text, Speech and Dialogue | 2005

Automatic lemmatizer construction with focus on OOV words lemmatization

Jakub Kanis; Luděk Müller

This paper deals with the automatic construction of a lemmatizer from a Full Form – Lemma (FFL) training dictionary and with the lemmatization of new words unseen in the FFL dictionary, i.e. out-of-vocabulary (OOV) words. Three methods are introduced for lemmatizing three kinds of OOV words: missing full forms, unknown words, and compound words. These methods were tested on Czech test data. The best result (recall: 99.3 %, precision: 75.1 %) was achieved by a combination of these methods. A lexicon-free lemmatizer based on the method for lemmatizing unknown words (the lemmatization patterns method) is also introduced.
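To make the idea of learning lemmatization rules from an FFL dictionary concrete, here is a minimal sketch of a suffix-rewrite lemmatizer: in-vocabulary words are looked up directly, and OOV words are lemmatized by the most frequent rule attached to the longest word ending seen in training. This is a generic suffix-pattern technique chosen for illustration, not necessarily the paper's lemmatization patterns method; `learn_patterns`, `max_context`, and the toy Czech pairs are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Rule = Tuple[str, str]  # (suffix removed from the full form, suffix added to form the lemma)

def common_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def learn_patterns(ffl_pairs: Iterable[Tuple[str, str]], max_context: int = 3) -> Dict[str, Counter]:
    """Map word endings seen in the FFL dictionary to counts of suffix rewrite rules."""
    patterns: Dict[str, Counter] = defaultdict(Counter)
    for form, lemma in ffl_pairs:
        p = common_prefix_len(form, lemma)
        rule: Rule = (form[p:], lemma[p:])
        # Key the rule by the changed suffix plus 0..max_context preceding characters.
        for ctx in range(max_context + 1):
            patterns[form[max(0, p - ctx):]][rule] += 1
    return patterns

def lemmatize(word: str, dictionary: Dict[str, str], patterns: Dict[str, Counter]) -> str:
    if word in dictionary:  # in-vocabulary: plain lookup
        return dictionary[word]
    for start in range(len(word)):  # OOV: try the longest known ending first
        ending = word[start:]
        if ending in patterns:
            cut, add = patterns[ending].most_common(1)[0][0]
            # Every key ends with its rule's removed suffix, so the cut is always valid.
            return word[: len(word) - len(cut)] + add
    return word  # no pattern applies: return the word unchanged

# Hypothetical toy FFL dictionary.
FFL = [("ženy", "žena"), ("ženou", "žena"), ("ryby", "ryba"), ("hradu", "hrad"), ("hradem", "hrad")]
DICTIONARY = dict(FFL)
PATTERNS = learn_patterns(FFL)

print(lemmatize("knihy", DICTIONARY, PATTERNS))    # OOV word, rule y -> a
print(lemmatize("stromem", DICTIONARY, PATTERNS))  # OOV word, rule em -> (empty)
```

Dropping `DICTIONARY` (passing an empty dict) turns this into a lexicon-free lemmatizer that relies on the learned patterns alone.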

International Conference on Machine Learning | 2007

Czech text-to-sign speech synthesizer

Z. Krňoul; Jakub Kanis; M. Železný; Luděk Müller

Recent progress in developing the Czech–Sign Speech synthesizer is presented. The current goal is to improve the automatic synthesis system so that it produces accurate Sign Speech. The synthesis system converts written text into an animation of an artificial human model (avatar). This involves translating text into sign phrases and converting them into the avatar animation. The animation is composed of movements and deformations of the segments of the hands, the head, and the face. The system was evaluated in two initial perceptual tests, which indicate that the designed synthesis system is capable of producing intelligible Sign Speech.

Text, Speech and Dialogue | 2006

Czech-Sign speech corpus for semantic based machine translation

Jakub Kanis; J. Zahradil; F. Jurčíček; Luděk Müller

This paper describes progress in the development of a human–human dialogue corpus for machine translation of spoken language. We have chosen a semantically annotated corpus of phone calls to a train timetable information center. The calls consist of inquiries about the callers' train travel plans. The corpus dialogue act tags incorporate abstract semantic meaning. We have enriched part of the corpus with Sign Speech translations and proposed methods for automatic machine translation from Czech to Sign Speech using the semantic annotation contained in the corpus.

Text, Speech and Dialogue | 2009

Advances in Czech – Signed Speech Translation

Jakub Kanis; Luděk Müller

This article describes advances in Czech–Signed Speech translation. A phrase extraction method for the log-linear model using a new criterion based on the minimal loss principle was introduced and evaluated against two other criteria. The performance of the phrase table extracted with the introduced method was compared with that of two other phrase tables, one extracted manually and one automatically. A new criterion for evaluating the semantic agreement of translations was also introduced.
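For context, the log-linear model into which extracted phrase tables feed is usually written in the standard form below, where $f$ is the source (Czech) sentence, $e$ a candidate target (Signed Speech) sentence, $h_m$ the feature functions (for example phrase translation and language model scores), and $\lambda_m$ their weights. This is the generic formulation, given as background; the paper's minimal loss extraction criterion itself is not reproduced here.

```latex
p(e \mid f) = \frac{\exp\left(\sum_{m=1}^{M} \lambda_m h_m(e, f)\right)}
                   {\sum_{e'} \exp\left(\sum_{m=1}^{M} \lambda_m h_m(e', f)\right)},
\qquad
\hat{e} = \arg\max_{e} \sum_{m=1}^{M} \lambda_m h_m(e, f)
```

Because the normalizing denominator does not depend on $e$, the decision rule only needs the weighted feature sum.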

International Conference on Interactive Collaborative Robotics | 2018

Improvements in 3D Hand Pose Estimation Using Synthetic Data

Jakub Kanis; Dmitry Ryumin; Z. Krňoul

Neural networks currently outperform earlier approaches to hand pose estimation. However, achieving such results requires a large amount of appropriate training data, and acquiring real hand pose data is a time- and resource-consuming process. One possible solution is to use synthetic training data. We introduce a method to generate synthetic depth images of the hand that closely match real images. We extend previous approaches to modeling depth image data by using a 3D scan of the subject's hand and a hand pose prior given by the real data distribution. We found that combining the synthetic data with real training data can result in better performance.
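One way to read "a hand pose prior given by the real data distribution" is sketched below: fit a distribution to real pose vectors, sample new poses from it, and mix them with the real data. Assumptions (not from the paper): a pose is a flat list of joint angles, the prior is a set of independent per-joint Gaussians clipped to the observed range, and `fit_prior`, `sample_poses`, and `mixed_training_set` are hypothetical names. Rendering each sampled pose into a depth image from the 3D hand scan is outside the scope of this sketch.

```python
import random
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Pose = List[float]  # hypothetical: one joint angle (radians) per hand joint
JointPrior = Tuple[float, float, float, float]  # (mean, std, min, max) of one joint

def fit_prior(real_poses: Sequence[Pose]) -> List[JointPrior]:
    """Per-joint statistics of the real pose distribution (joints treated as independent)."""
    joints = list(zip(*real_poses))
    return [(mean(j), pstdev(j), min(j), max(j)) for j in joints]

def sample_poses(prior: List[JointPrior], n: int, rng: random.Random) -> List[Pose]:
    poses = []
    for _ in range(n):
        # Draw each joint from its Gaussian, then clip to the range seen in real data.
        poses.append([min(hi, max(lo, rng.gauss(mu, sigma))) for mu, sigma, lo, hi in prior])
    return poses

def mixed_training_set(real_poses: Sequence[Pose], n_synthetic: int, seed: int = 0):
    """Combine real poses with synthetic ones sampled from the fitted prior, shuffled."""
    rng = random.Random(seed)
    synthetic = sample_poses(fit_prior(real_poses), n_synthetic, rng)
    data = [(list(p), "real") for p in real_poses] + [(p, "synthetic") for p in synthetic]
    rng.shuffle(data)
    return data

REAL = [[0.0, 0.5], [0.2, 0.7], [0.4, 0.6]]  # toy two-joint poses
for pose, source in mixed_training_set(REAL, n_synthetic=4):
    print(source, [round(a, 3) for a in pose])
```

A real hand model would need correlated joints (for example a full-covariance or learned prior); independent Gaussians are used here only to keep the sketch short and dependency-free.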

Text, Speech and Dialogue | 2016

Digging Language Model – Maximum Entropy Phrase Extraction

Jakub Kanis

This work introduces our maximum entropy phrase extraction method for the Czech–English translation task. Two corpora and language models of different sizes were used to explore the potential of the maximum entropy phrase extraction method and of phrase table content optimization. Additionally, two maximum entropy estimation criteria were compared with the state-of-the-art phrase extraction method. For domain-oriented translation, maximum entropy phrase extraction significantly improves translation precision.
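As background for the comparison with standard extraction, the sketch below shows the conventional alignment-consistent phrase pair extraction from phrase-based SMT: a source span and the target span it links to form a pair only if no word inside either span is aligned outside the other. Assumptions: that this matches the baseline used in the paper is our guess; the sketch also omits the usual extension over unaligned boundary words, and the maximum entropy method itself is not shown. The sentence pair and alignment are a hypothetical toy example.

```python
from typing import List, Set, Tuple

Alignment = Set[Tuple[int, int]]  # (source index, target index) word links

def extract_phrases(src: List[str], tgt: List[str], alignment: Alignment,
                    max_len: int = 3) -> Set[Tuple[str, str]]:
    """Return phrase pairs consistent with the word alignment (no boundary extension)."""
    pairs = set()
    for s1 in range(len(src)):
        for s2 in range(s1, min(len(src), s1 + max_len)):
            # Target positions linked to the source span [s1, s2].
            linked = [t for (s, t) in alignment if s1 <= s <= s2]
            if not linked:
                continue
            t1, t2 = min(linked), max(linked)
            if t2 - t1 + 1 > max_len:
                continue
            # Consistency: no word in the target span may link outside the source span.
            if any(t1 <= t <= t2 and not s1 <= s <= s2 for (s, t) in alignment):
                continue
            pairs.add((" ".join(src[s1:s2 + 1]), " ".join(tgt[t1:t2 + 1])))
    return pairs

SRC = "já mám rád kávu".split()
TGT = "i like coffee".split()
ALIGN = {(0, 0), (1, 1), (2, 1), (3, 2)}  # "mám rád" together aligns to "like"

for pair in sorted(extract_phrases(SRC, TGT, ALIGN)):
    print(pair)
```

Note that "mám" alone is never paired with "like", because "rád" also links to that target word; this consistency constraint is what determines the phrase table content before any scoring.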

International Conference on Speech and Computer | 2016

Toward Sign Language Motion Capture Dataset Building

Z. Krňoul; Pavel Jedlička; Jakub Kanis; M. Železný

The article deals with a recording procedure for building motion datasets, mainly for sign language synthesis systems. Data gloves and two types of optical motion capture are considered as sources of sign language data for training more natural and acceptable body movements of signing avatars. A summary of state-of-the-art technologies gives an overview of their possibilities and limiting factors for sign language recording. Combining these motion capture technologies overcomes the existing difficulties of the complex task of recording both the manual and non-manual components of sign language. The result is a recording procedure for simultaneous motion capture of a signing subject, enabling further research into the as yet unexplored phenomenon of sign language production by humans.

Lecture Notes in Computer Science | 2008