Michal Fapso
Brno University of Technology
Publications
Featured research published by Michal Fapso.
spoken language technology workshop | 2008
Igor Szöke; Lukas Burget; Jan Cernocky; Michal Fapso
This paper deals with a comparison of sub-word based methods for the spoken term detection (STD) task and phone recognition. Sub-word units are needed to search for out-of-vocabulary words. We compared words, phones, and multigrams. The maximal length and pruning of multigrams were investigated first; then two constrained methods of multigram training were proposed. We evaluated on the NIST STD06 dev-set CTS data. The conclusion is that the proposed method improves phone accuracy by more than 9% relative and STD accuracy by more than 7% relative.
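The abstract does not detail the multigram segmentation itself. As a rough, hypothetical illustration of the underlying idea (splitting a phone string into variable-length units under a probability model), here is a minimal Viterbi-style sketch; the `multigram_probs` table, the `max_len` limit, and all names are assumptions for illustration, not the paper's actual constrained training:

```python
import math

def segment_multigrams(phones, multigram_probs, max_len=3):
    """Viterbi segmentation of a phone string into multigrams:
    pick the split maximizing the product of multigram probabilities."""
    n = len(phones)
    # best[end] = (best log-prob of phones[:end], backpointer to split start)
    best = [(float("-inf"), None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for length in range(1, min(max_len, end) + 1):
            unit = tuple(phones[end - length:end])
            p = multigram_probs.get(unit)
            if p is None:
                continue
            score = best[end - length][0] + math.log(p)
            if score > best[end][0]:
                best[end] = (score, end - length)
    # Backtrace from the end of the string to recover the segmentation.
    units, pos = [], n
    while pos > 0:
        prev = best[pos][1]
        if prev is None:
            return None  # no full segmentation exists under this model
        units.append(tuple(phones[prev:pos]))
        pos = prev
    return list(reversed(units))
```

With a toy model where the bigram units "h e" and "l o" are much more probable than single phones, the segmenter prefers the longer units.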
ACM Transactions on Information Systems | 2012
Javier Tejedor; Michal Fapso; Igor Szöke; Jan Cernocký; Frantisek Grezl
This article investigates query-by-example (QbE) spoken term detection (STD), in which the query is not entered as text but selected in speech data or spoken. Two feature extractors based on neural networks (NN) are introduced: the first producing phone-state posteriors and the second making use of a compressive NN layer. They are combined with three different QbE detectors: the Gaussian mixture model/hidden Markov model (GMM/HMM) and dynamic time warping (DTW) detectors both work on continuous feature vectors, while the third, based on weighted finite-state transducers (WFST), processes phone lattices. QbE STD is compared to two standard STD systems with text queries: acoustic keyword spotting and WFST-based search of phone strings in phone lattices. The results are reported on four languages (Czech, English, Hungarian, and Levantine Arabic) using standard metrics: equal error rate (EER) and two versions of the popular figure of merit (FOM). Language-dependent and language-independent cases are investigated; the latter is particularly interesting for scenarios lacking the standard resources to train speech recognition systems. While the DTW and GMM/HMM approaches produce the best results in the language-dependent setup, depending on the target language, the GMM/HMM approach performs best in the language-independent setup. As far as WFSTs are concerned, they are promising as they allow for indexing and fast search.
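Of the three QbE detectors, DTW is the simplest to sketch: it aligns the query's feature frames against the utterance's frames and scores the match by the accumulated frame-to-frame distance. A minimal, hypothetical Python sketch follows, using cosine distance over posterior-like vectors and a length-normalized cost; the distance measure and normalization in the actual system may differ:

```python
import math

def cosine_distance(a, b):
    """Distance between two feature frames (e.g. phone-posterior vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def dtw(query, utterance):
    """Dynamic time warping: minimal accumulated distance when aligning
    the query frames against the utterance frames."""
    Q, U = len(query), len(utterance)
    INF = float("inf")
    # cost[i][j] = best cost of aligning query[:i] with utterance[:j]
    cost = [[INF] * (U + 1) for _ in range(Q + 1)]
    cost[0][0] = 0.0
    for i in range(1, Q + 1):
        for j in range(1, U + 1):
            d = cosine_distance(query[i - 1], utterance[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],       # stretch the query
                                 cost[i][j - 1],       # stretch the utterance
                                 cost[i - 1][j - 1])   # advance both
    # Normalize by query length so scores are comparable across queries.
    return cost[Q][U] / Q
```

A low score means the query matches the utterance region well; sliding this alignment over a longer recording yields candidate detections.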
international conference on machine learning | 2007
Igor Szöke; Michal Fapso; Martin Karafiát; Lukas Burget; Frantisek Grezl; Petr Schwarz; Ondřej Glembek; Pavel Matějka; Jiří Kopecký; Jan Cernocký
The paper presents the Brno University of Technology (BUT) system for indexing and search of speech, combining LVCSR and a phonetic approach. It gives a complete description of the individual building blocks of the system, from signal processing through the recognizers, indexing, and search to the normalization of detection scores. It also describes the data used in the first edition of the NIST Spoken Term Detection (STD) evaluation. The results are presented on three US-English conditions (meetings, broadcast news, and conversational telephone speech) in terms of detection error trade-off (DET) curves and the term-weighted value (TWV) metric defined by NIST.
Proceedings of the 2010 international workshop on Searching spontaneous conversational speech | 2010
Javier Tejedor; Igor Szöke; Michal Fapso
Query-by-example (QbE) spoken term detection (STD) is necessary for low-resource scenarios where training material is scarce and word-based speech recognition systems cannot be employed. We present two novel contributions to QbE STD: the first introduces several criteria to select the optimal example to use as the query throughout the search system; the second presents a novel feature-level example combination to construct a more robust query for the search. Experiments on within-language and cross-lingual QbE STD setups show, for both setups, a significant improvement when the query is selected according to an optimal criterion rather than randomly, and a significant improvement when several examples are combined to build the input query compared with using the single best example. They also show performance comparable to that of a state-of-the-art acoustic keyword spotting system.
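The abstract does not name the selection criteria. One plausible criterion, shown here purely as an assumed illustration, is to pick the "medoid" example: the one with the smallest average distance to the other examples of the same term under some pairwise distance (e.g. a DTW score between the two examples' feature sequences):

```python
def select_query_example(examples, pairwise_distance):
    """Pick the example closest on average to all other examples of the
    same spoken term, under a caller-supplied pairwise distance."""
    best, best_score = None, float("inf")
    for i, ex in enumerate(examples):
        others = [e for j, e in enumerate(examples) if j != i]
        if not others:
            return ex  # a single example is trivially optimal
        score = sum(pairwise_distance(ex, o) for o in others) / len(others)
        if score < best_score:
            best, best_score = ex, score
    return best
```

The intuition: an example near the "center" of the term's acoustic variants is less likely to be an outlier pronunciation, so it generalizes better as a query.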
spoken language technology workshop | 2010
Igor Szöke; Jan Cernocky; Michal Fapso; J. Zizka
This paper describes an innovative web-based browser for video recordings of lectures, built on speech and image processing technologies. The aim of this project is to simplify access to information that is spread across video recordings. This is achieved mainly by coupling to the speech search engine and by the ability to navigate quickly through an automatically generated list of the slides presented. The reader is briefly acquainted with the technological background of the browser; the emphasis is on the use of the browser from the user's point of view.
spoken language technology workshop | 2010
Igor Szöke; Frantisek Grezl; Jan Cernocky; Michal Fapso; Tomas Cipr
The paper deals with the development of an acoustic keyword spotter (KWS) meeting the requirements of a real user from the security community. While the basic scheme of the KWS is relatively standard, it uses novel features derived by a hierarchy of neural networks and score normalization trained to maximize a user-like evaluation metric. The results are reported on a selection of Czech conversational telephone speech (CTS), radio, and read data.
international conference radioelektronika | 2007
Jan Cernocky; Lukas Burget; Petr Schwarz; Pavel Matejka; Martin Karafiát; Ondrej Glembek; Jiri Kopecky; Igor Szöke; Michal Fapso; Frantisek Grezl; Valiantsina Hubeika; Ilya Oparin
This paper describes search-in-speech techniques developed in the Speech@FIT research group at FIT BUT over the last couple of years. It concentrates on spoken term detection (STD) and presents our system for the NIST STD 2006 evaluation in detail. It also briefly mentions our systems for speaker and language recognition.
international conference on computational linguistics | 2006
Michal Fapso; Pavel Smrž; Petr Schwarz; Igor Szöke; Milan Schwarz; Jan Cernocký; Martin Karafiát; Lukas Burget
This paper describes a system, designed and implemented by our group, for efficient storage, indexing, and search in collections of spoken documents that takes advantage of automatic speech recognition. As the quality of current speech recognizers is not sufficient for a great many applications, it is necessary to index the ambiguous output of the recognition, i.e., the acyclic graphs of word hypotheses known as recognition lattices. The standard methods known from text-based systems therefore cannot be applied directly. The paper discusses an optimized indexing system for efficient search in this complex and large data structure. The search engine works as a server; the meeting browser JFerret, developed within the European AMI project, is used as a client to browse search results.
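The core idea of indexing time-stamped word hypotheses from lattices, rather than a single best transcript, can be sketched with a simple inverted index. This is a hypothetical minimal illustration; the actual system indexes full recognition lattices with posterior scores and considerably more machinery:

```python
from collections import defaultdict

# A lattice hypothesis is modeled here as a tuple:
#   (word, start_time, end_time, score)
# where `score` stands in for the hypothesis confidence from the lattice.

def build_index(documents):
    """Map each word to all its (doc_id, start, end, score) occurrences
    across the hypothesis lists of all spoken documents."""
    index = defaultdict(list)
    for doc_id, hypotheses in documents.items():
        for word, start, end, score in hypotheses:
            index[word].append((doc_id, start, end, score))
    return index

def search(index, term):
    """Return all hits for a term, best-scoring hypotheses first."""
    return sorted(index.get(term, []), key=lambda hit: -hit[3])
```

Because alternative (non-best) lattice hypotheses are indexed too, a term misrecognized in the single best transcript can still be retrieved, at the cost of a larger index.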
MLMI | 2005
Igor Szöke; Petr Schwarz; Pavel Matejka; Lukas Burget; Martin Karafiát; Michal Fapso; Jan Cernocky
conference of the international speech communication association | 2005
Igor Szöke; Petr Schwarz; Pavel Matejka; Lukas Burget; Martin Karafiát; Michal Fapso; Jan Cernocký