Svatava Škodová | Researchain

Publication

Featured research published by Svatava Škodová.

International Conference on Telecommunications | 2012

Post-processing of the recognized speech for web presentation of large audio archive

Marek Bohac; Karel Blavka; Michaela Kuchařová; Svatava Škodová

This paper deals with the post-processing phase of automatic transcription of spoken documents stored in the large Czech Radio audio archive (containing hundreds of thousands of recordings). The ultimate goal of the project is to transcribe them and to allow public access to their content. In this paper we focus on methods and algorithms for unsupervised post-processing of automatically recognized recordings. The post-processing is adapted to the needs of the web presentation of the archive. So far it has been used to process about 60,000 audio documents. We present the overall structure of the system as well as its core modules: the speech recognition engine, the speaker diarization module, and final text processing. Special attention is paid to punctuation. The punctuation accuracy is evaluated and compared to human use. In the final part of the paper we propose further improvements and ideas for future research.

Text, Speech and Dialogue | 2013

On the Quantitative and Qualitative Speech Changes of the Czech Radio Broadcasts News within Years 1969–2005

Michaela Kuchařová; Svatava Škodová; Ladislav Seps; Václav Lábus; Jan Nouza; Marek Bohac

In this paper we introduce the quantitative and qualitative characteristics of the Czech Radio Broadcasts News during a period of significant political and social changes in the Czech Republic (1969–2005). The research is mainly focused on the quantitative features of speech that can be determined from the results of an automatic speech recognition system. We describe the archive transcription system used and selected characteristics of the macro- and micro-structure of the Radio Broadcasts News, namely: changes in the ratio of studio to out-of-studio speech; the distribution of speakers by sex and by role (moderators and guest speakers); changes in the use of signature tunes (including jingles); the approximate use of introductory and closing phrases specific to the time periods; changes in speech rate; average silence length; the ratio of coordinating to subordinating conjunctions; and the most frequent semantic words. The data sample consists of 6,580 hours of news broadcasting and 48,721,952 lexical words.

Text, Speech and Dialogue | 2014

Study on Phrases Used for Semi-automatic Text-Based Speakers Names Extraction in the Czech Radio Broadcasts News

Michaela Kuchařová; Svatava Škodová; Ladislav Seps; Marek Bohac

In this paper we introduce a methodology for extending the speakers’ database used in the automatic transcription of spoken documents stored in the largest Czech Radio audio archive. We address one issue in the conversion of speech to written text: the automatic detection of speakers and their names. We work with a subset of the archive that consists of 8,020 hours of broadcast news and 58,914,179 words from the years 1968–2011. Thousands of speakers’ names occur during the period, so their identification needs to be automatic or semi-automatic. Another investigated issue that leads to the extension of the speakers’ database is the co-occurrence of a speaker’s name in a specific phrase in the text transcription with a speaker change in the audio recording.

Text, Speech and Dialogue | 2012

Discretion of Speech Units for the Text Post-processing Phase of Automatic Transcription (in the Czech Language)

Svatava Škodová; Michaela Kuchařová; Ladislav Seps

In this paper we introduce an experiment aimed at improving the text post-processing phase of automatic transcription of spoken documents stored in the large Czech Radio audio archive. This archive contains the largest collection of spoken documents recorded during the last 90 years. The underlying aim of the project is to transcribe a part of the audio archive and store the transcription in a database in which it will be possible to search for and retrieve information. The value of the search is that one can find information on two linguistic levels: in the written form and in the spoken form. This dual storage makes retrieval more convenient and substantially extends the ways the information can be used. One of the important issues in converting speech to written text is the automatic delimitation of speech units and sentences/clauses in the final text processing, which is connected with punctuation, itself important for convenient reading of the transcribed texts. For this reason we decided to test Czech native speakers’ perception of speech and their need for punctuation in the transcribed texts. We compared their results with the punctuation added by the automatic system. The results should serve to train a program for automatic discretion of speech units and correct insertion of punctuation. For the experiment we prepared a sample of texts spoken by typologically varied speakers (30 minutes of speech; 5,247 words). These automatically transcribed texts were given to 59 respondents, whose task was to supply punctuation. We used two special tools to run the experiment: NanoTrans, which the respondents used to supply the punctuation, and Transcription Viewer, written especially for this probe, for viewing and comparing the respondents’ and the machine’s performance. In the text we give detailed information about these comparisons. In the final part of the paper we propose further improvements and ideas for future research.

Bohemistyka | 2018

Co lze „vyčíst z ruky” aneb somatické frazémy v proměnách času

Jasňa Pacovská; Svatava Škodová; Václav Lábus

Archive | 2017

CzeSL Grammatical Error Correction Dataset (CzeSL-GEC)

Karel Šebesta; Zuzanna Bedřichová; Kateřina Šormová; Barbora Štindlová; Milan Hrdlička; Tereza Hrdličková; Jiří Hana; Vladimír Petkevič; Tomáš Jelínek; Svatava Škodová; Petr Janeš; Kateřina Lundáková; Hana Skoumalová; Šimon Sládek; Piotr Pierscieniak; Dagmar Toufarová; Milan Straka; Alexandr Rosen; Jakub Náplava; Marie Poláčková

Archive | 2015

Preliminária k moderní mluvnici češtiny

Oldřich Uličný; Ondřej Bláha; Jasňa Pacovská; Eva Hájková; Soňa Schneiderová; Denisa Bordag; Robert Dittmann; Ivo Matinec; Helena Stranjik; Marie Bořkovcová; Alena Macurová; Ivo Vasiljev; Patrik Mitter; Vladimír Petkevič; Zdena Palková; Barbora Štěpánková; Petr Kaderka; Martin Prošek; Martin Beneš; Martina Smejkalová; Markéta Ziková; Jan Křivan; Miloslav Vondráček; Václav Lábus; Svatava Škodová

Studie z aplikované lingvistiky | 2014

Mluvené slovo v pořadech Českého rozhlasu

Svatava Škodová; Michaela Kuchařová

Archive | 2014

AKCES 5 (CzeSL-SGT) Release 2

Karel Šebesta; Zuzanna Bedřichová; Kateřina Šormová; Barbora Štindlová; Milan Hrdlička; Tereza Hrdličková; Jiří Hana; Vladimír Petkevič; Tomáš Jelínek; Svatava Škodová; Marie Poláčková; Petr Janeš; Kateřina Lundáková; Hana Skoumalová; Šimon Sládek; Piotr Pierscieniak; Dagmar Toufarová; Michal Richter; Milan Straka; Alexandr Rosen

Archive | 2014