Publication


Featured research published by Dawid Skurzok.


multimedia and ubiquitous engineering | 2010

Polish N-Grams and Their Correction Process

Bartosz Ziółko; Dawid Skurzok; Małgorzata Michalska

Word n-gram statistics collected from over 1 300 000 000 words are presented. Even though they were collected from various good sources, they contain several types of errors. The paper focuses on the process of partly supervised correction of the n-grams. The types of errors are described, as well as our software, which allows fast and efficient corrections.


Archive | 2011

N-Grams Model for Polish

Bartosz Ziółko; Dawid Skurzok

N-grams are very popular in automatic speech recognition (ASR) systems (Young et al., 2005), (Lamere et al., 2004), (Whittaker & Woodland, 2003), (Hirsimaki et al., 2009). They have been found to be the most effective models for several languages. The n-grams we calculated will be used for the language model of a large-vocabulary Polish ASR system and in other external applications, the first of them being the SnapKeys virtual keyboard. Our earlier results and the process of collecting the statistics have already been described (Ziolko, Skurzok & Ziolko, 2010). In this chapter we describe the complete model and its applications. Creating a large-vocabulary model of Polish is a difficult task because there are fewer Polish text corpora than English ones. What is more, Polish is highly inflected, in contrast to English. The rich morphology causes difficulties in training language models due to data sparsity. Much more text data must be used for inflected languages than for positional ones to achieve a model of the same efficiency (Whittaker & Woodland, 2003).


Artificial Intelligence and Applications | 2010

Word N-Grams for Polish

Bartosz Ziółko; Dawid Skurzok; Mariusz Ziółko

A large collection of word n-gram statistics for Polish is described. Some details of the text analysis algorithm, which supports processing data on computer clusters, are presented as well. Corpora with a total size of 267 030 267 words were used. The problems encountered due to the special Polish characters are described, as well as the impact of the rich morphology of Polish on this type of statistics. The most common n-grams are presented and commented on. This is the first publication of such statistics for Polish.
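
For readers unfamiliar with the term, the following is a minimal sketch of how word n-gram counts can be gathered from raw text. It is an illustration only and not the authors' cluster-based implementation; the function name `word_ngrams`, the tokenisation rule, and the sample sentence are assumptions made for this example.

```python
import re
from collections import Counter


def word_ngrams(text, n):
    """Count word n-grams in `text` (hypothetical helper, illustration only)."""
    # [^\W\d_]+ matches runs of Unicode letters, so Polish characters
    # such as ą, ł and ż stay inside a token instead of splitting it.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    # Slide a window of length n over the token list.
    # Note: this simple version lets n-grams span sentence boundaries.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


sample = "Ala ma kota. Ala ma psa, a kot ma Alę."
bigrams = word_ngrams(sample, 2)
print(bigrams.most_common(3))
```

Even in this tiny sample the inflected forms "kota" and "kot" are counted as different words, which hints at why rich morphology leads to sparse statistics.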


Pacific Voice Conference (PVC), 2014 XXII Annual | 2014

Statistics of diphones and triphones presence on the word boundaries in the Polish language. Applications to ASR

Bartosz Ziółko; Piotr Żelasko; Dawid Skurzok

Recognition of continuous speech is one of the major challenges in automatic speech recognition (ASR), especially in phonetically complex languages (e.g. Polish). To improve ASR of the Polish language, we obtained phoneme statistics to locate diphones and triphones within running speech sequences. We found that these clusters are more likely to occur at word boundaries than within words. Our research identified the most frequently appearing diphones and triphones in the natural speech corpus (Corpora) and we normalized these data for the Polish language at large. The results can be used in various ASR applications, e.g. by the speech recognizer module to improve word boundary recognition, or to recognize non-dictionary words (e.g. proper names) embedded in a natural sentence.


international conference on audio, language and image processing | 2012

Confidence measure by substring comparison for automatic speech recognition

Bartosz Ziółko; Tomasz Jadczyk; Dawid Skurzok; Mariusz Ziółko

Two possible confidence measures for automatic speech recognition are presented, along with the results of tests in which they were applied. One of them is widely known and is based on comparing the strongest hypothesis with an average of the next few hypotheses. We found that it is not efficient in all cases, which is why we developed our own method based on comparison of substrings. The new algorithm was found useful in real applications of a spoken dialogue system, in a module that asks the user to repeat a phrase or declares that it was not recognised. The method was designed for the Polish language, which is morphologically rich. The method is tuned to situations in which there are several similar utterances in a dictionary.
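
The widely known measure mentioned in this abstract can be sketched roughly as follows. The abstract gives no formula, so the plain difference used here, the parameter `k`, the function name `nbest_margin_confidence`, and the threshold are all assumptions made for illustration; the authors' substring-based method is not reproduced.

```python
def nbest_margin_confidence(scores, k=3):
    """Margin between the best hypothesis score and the mean of the next k.

    Hypothetical helper: `scores` are recogniser scores where higher is
    better; the plain difference is an assumed form of the comparison.
    """
    ranked = sorted(scores, reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two hypotheses to compare")
    best = ranked[0]
    runners_up = ranked[1:1 + k]  # may hold fewer than k items
    return best - sum(runners_up) / len(runners_up)


# Illustrative scores only, not taken from the paper.
confidence = nbest_margin_confidence([10, 6, 4, 2, 1], k=3)
ASSUMED_THRESHOLD = 3.0  # hypothetical value that would be tuned on real data
ask_to_repeat = confidence < ASSUMED_THRESHOLD
```

A dialogue module could use a flag such as `ask_to_repeat` to decide whether to request that the phrase be repeated.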


international conference on audio, language and image processing | 2010

Speech modelling using phoneme segmentation and modified weighted Levenshtein distance

Bartosz Ziółko; Jakub Gałka; Dawid Skurzok

A method of choosing a word hypothesis from the dictionary of a speech recognition system is presented. The method applies a modified weighted Levenshtein distance for better accuracy. The distance is computed between the string of phonemes received from a classifier and the phonetic transcriptions of dictionary entries. It allows the speech classification task to be carried out efficiently.
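
The general idea of picking a dictionary word by weighted edit distance can be sketched as below. This is a generic weighted Levenshtein distance, not the authors' modified variant: the substitution weights, the set `SIMILAR`, the toy dictionary, and all function names are hypothetical and chosen only for illustration.

```python
def weighted_levenshtein(source, target, sub_cost, ins_cost=1.0, del_cost=1.0):
    """Edit distance between two phoneme sequences with custom substitution costs."""
    # prev[j] holds the distance between the processed prefix of `source`
    # and the first j symbols of `target`.
    prev = [j * ins_cost for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        curr = [i * del_cost]
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + del_cost,                                 # delete s
                curr[j - 1] + ins_cost,                             # insert t
                prev[j - 1] + (0.0 if s == t else sub_cost(s, t)),  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Assumed weights: acoustically similar phones are cheaper to confuse.
SIMILAR = {frozenset(("s", "z")), frozenset(("t", "d")), frozenset(("p", "b"))}


def example_sub_cost(a, b):
    return 0.5 if frozenset((a, b)) in SIMILAR else 1.0


def best_word(recognized, dictionary):
    """Return the dictionary word whose transcription is closest to `recognized`."""
    return min(
        dictionary,
        key=lambda w: weighted_levenshtein(recognized, dictionary[w], example_sub_cost),
    )


toy_dictionary = {"kot": list("kot"), "kos": list("kos"), "dom": list("dom")}
choice = best_word(list("kod"), toy_dictionary)
```

With these assumed weights, the classifier output k-o-d is closest to "kot", because confusing "d" with "t" costs less than the other substitutions.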


multimedia and ubiquitous engineering | 2013

Edit Distance Comparison Confidence Measure for Speech Recognition

Dawid Skurzok; Bartosz Ziółko

A new possible confidence measure for automatic speech recognition is presented, along with the results of tests in which it was applied. A classical method based on comparing the strongest hypothesis with an average of the next few hypotheses was used as a ground truth. Details of our own method, based on comparison of edit distances, are described together with test results. It was found useful in a spoken dialogue system, as a module that asks the user to repeat a phrase or declares that it was not recognised. The method was designed for the Polish language, which is morphologically rich.


international multi-conference on computing in global information technology | 2010

Speech Modelling Based on Phone Statistics

Bartosz Ziółko; Dawid Skurzok; Jakub Gałka; Mariusz Ziółko

Statistics of Polish phones, biphones and triphones were collected from several corpora. The paper presents a summary of the data and some statistical phenomena, including the frequency distribution of biphones and triphones. A model applying these statistics in speech recognition is presented as well.


ieee international conference semantic computing | 2010

Automatic Speech Recognition System Based on Wavelet Analysis

Mariusz Ziółko; Jakub Gałka; Bartosz Ziółko; Tomasz Jadczyk; Dawid Skurzok; Jan Wicijowski

We demonstrate an automatic speech recognition system for Polish continuous speech. As most of the progress in the field has been made for English, a few layers of our system differ from popular approaches. These elements of our system could be successfully ported to other languages that share some features with Polish: the speech contains many high-frequency phones (fricatives and plosives), and the language is highly inflected and non-positional.


conference of the international speech communication association | 2011

Automatic Speech Recognition System Dedicated for Polish

Mariusz Ziółko; Jakub Gałka; Bartosz Ziółko; Tomasz Jadczyk; Dawid Skurzok; Mariusz Masior

Collaboration


Dive into Dawid Skurzok's collaboration.

Top Co-Authors

Bartosz Ziółko

AGH University of Science and Technology

Tomasz Jadczyk

AGH University of Science and Technology

Jakub Gałka

AGH University of Science and Technology

Mariusz Ziółko

AGH University of Science and Technology

Mariusz Mąsior

AGH University of Science and Technology

Ireneusz Gawlik

AGH University of Science and Technology

Jan Wicijowski

AGH University of Science and Technology

Mariusz Masior

AGH University of Science and Technology

Małgorzata Michalska

AGH University of Science and Technology
