Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Jan Zelinka is active.

Publication


Featured research published by Jan Zelinka.


Text, Speech and Dialogue | 2010

Adaptation of a feedforward artificial neural network using a linear transform

Jan Trmal; Jan Zelinka; Luděk Müller

In this paper we present a novel method for adaptation of a multi-layer perceptron neural network (MLP ANN). Nowadays, the adaptation of an ANN is usually done as an incremental retraining of either a subset or the complete set of the ANN parameters. However, since the amount of adaptation data is sometimes quite small, such an approach has a fundamental drawback: during retraining, the network parameters can easily be overfitted to the new data. There are certainly techniques that help overcome this problem (early stopping, cross-validation), but applying them leads to a more complex and possibly more data-hungry training procedure. The proposed method approaches the problem from a different perspective. We use the fact that in many cases we have additional knowledge about the problem, and such knowledge can be used to limit the dimensionality of the adaptation problem. We applied the proposed method to speaker adaptation of a phoneme recognizer based on TRAPS (Temporal Patterns) parameters. We exploited the fact that the employed TRAPS parameters are constructed from the log-outputs of a mel-filter bank and, by reformulating the first-layer weight matrix adaptation problem as a mel-filter bank output adaptation problem, we were able to significantly limit the number of free variables. Adaptation using the proposed method resulted in a substantial improvement of phoneme recognizer accuracy.
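
A minimal sketch of the idea described above (not the authors' code): instead of retraining the full first-layer weight matrix, only a small linear transform applied to each frame's log filter-bank outputs is adapted, which drastically reduces the number of free parameters. All dimensions and names below are illustrative assumptions.

```python
import numpy as np

# Illustrative dimensions (assumptions, not taken from the paper):
n_banks = 23          # mel-filter bank outputs per frame
context = 31          # frames stacked into a TRAPS-like long-temporal input
hidden = 500

rng = np.random.default_rng(0)

# A speaker-independent first layer, trained beforehand and kept frozen.
W1 = rng.standard_normal((hidden, n_banks * context)) * 0.01
b1 = np.zeros(hidden)

# Instead of adapting all of W1 (hidden * n_banks * context parameters),
# adapt one small transform A, b acting on each frame's filter-bank vector:
#   x_adapted = A @ x + b     (only n_banks**2 + n_banks free parameters)
A = np.eye(n_banks)
b = np.zeros(n_banks)

def first_layer(frames):
    """frames: (context, n_banks) log filter-bank outputs of one input patch."""
    adapted = frames @ A.T + b          # per-frame linear transform
    x = adapted.reshape(-1)             # stack into the long-temporal vector
    return np.tanh(W1 @ x + b1)

# During adaptation only A and b would be updated (e.g. by gradient descent
# on the adaptation data), while W1, b1 and the rest of the network stay fixed.
```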


International Conference on Statistical Language and Speech Processing | 2017

A Regularization Post Layer: An Additional Way How to Make Deep Neural Networks Robust

Jan Vaněk; Jan Zelinka; Daniel Soutner; Josef Psutka

Neural networks (NNs) are prone to overfitting, especially deep neural networks in cases where the training data are not abundant. There are several techniques which help prevent overfitting, e.g., L1/L2 regularization, unsupervised pre-training, early stopping, dropout, bootstrapping, or aggregation of cross-validation models. In this paper, we propose a regularization post-layer that may be combined with prior techniques and brings additional robustness to the NN. We trained the regularization post-layer in the cross-validation (CV) aggregation scenario: we used the CV held-out folds to train an additional neural network post-layer that boosts the network's robustness. We tested various post-layer topologies and compared the results with other regularization techniques. As a benchmark task, we selected TIMIT phone recognition, a well-known and still popular task where the training data are limited and the regularization techniques used play a key role. However, the regularization post-layer is a general method, and it may be employed in any classification task.
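
A hedged sketch of the CV aggregation scenario described above: K base networks are each trained on K-1 folds, their predictions on the held-out folds are collected, and a small post-layer is fit on those held-out outputs. The toy data, the tiny MLPs, and the logistic-regression post-layer are illustrative stand-ins, not the paper's topology.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((600, 20))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.2 * rng.standard_normal(600) > 0).astype(int)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
base_models, heldout_scores, heldout_labels = [], [], []

for train_idx, val_idx in kf.split(X):
    # Base network trained on the K-1 training folds.
    nn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0)
    nn.fit(X[train_idx], y[train_idx])
    base_models.append(nn)
    # Its predictions on the held-out fold become training data for the post-layer.
    heldout_scores.append(nn.predict_proba(X[val_idx]))
    heldout_labels.append(y[val_idx])

# Regularization post-layer: here a simple logistic regression over base outputs.
post = LogisticRegression()
post.fit(np.vstack(heldout_scores), np.concatenate(heldout_labels))

def predict(x_new):
    # At test time, average the base networks' outputs and pass them through
    # the post-layer that was trained only on held-out predictions.
    avg = np.mean([m.predict_proba(x_new) for m in base_models], axis=0)
    return post.predict(avg)
```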


International Conference on Statistical Language and Speech Processing (SLSP) | 2015

Neural-Network-Based Spectrum Processing for Speech Recognition and Speaker Verification

Jan Zelinka; Jan Vaněk; Luděk Müller

In this paper, neural networks are applied as feature extractors for a speech recognition system and a speaker verification system. Long-temporal features with delta coefficients, mean normalization and variance normalization are applied when the neural-network-based feature extraction is trained together with a neural-network-based voice activity detector and a neural-network-based acoustic model for speech recognition. In speaker verification, the acoustic model is replaced with a score computation. The performance of our speech recognition system was evaluated on the British English speech corpus WSJCAM0, and the performance of our speaker verification system was evaluated on our Czech speech corpus.


International Conference on Speech and Computer | 2014

Convolutional Neural Network for Refinement of Speaker Adaptation Transformation

Zbyněk Zajíc; Jan Zelinka; Jan Vaněk; Luděk Müller

The aim of this work is to propose a refinement of the shift-MLLR (shift Maximum Likelihood Linear Regression) adaptation of an acoustic model in the case of a limited amount of adaptation data, which can lead to ill-conditioned transformation matrices. We try to suppress the influence of badly estimated transformation parameters by utilizing an Artificial Neural Network (ANN), specifically a Convolutional Neural Network (CNN) with a bottleneck layer at the end. A badly estimated shift-MLLR transformation is propagated through the ANN (suitably trained beforehand), and the output of the net is used as the new, refined transformation. To train the ANN, well-conditioned and badly conditioned shift-MLLR transformations are used as its outputs and inputs, respectively.
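
A minimal illustrative sketch of the refinement idea: a small network with a narrow bottleneck is trained to map shift vectors estimated from too little data (inputs) to well-estimated ones (targets), and at adaptation time a new ill-conditioned shift is passed through the trained net. The synthetic data, the feature dimension, and the plain dense regressor are assumptions; the paper uses a convolutional network.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
dim = 39                     # e.g. MFCC + delta feature dimension (assumption)

# Training pairs (synthetic stand-ins): shifts estimated from plenty of data
# as regression targets, noisy "badly conditioned" shifts as inputs.
good_shifts = rng.standard_normal((2000, dim))
bad_shifts = good_shifts + 0.5 * rng.standard_normal((2000, dim))

# Small refinement net with a narrow bottleneck layer, as the abstract mentions.
refiner = MLPRegressor(hidden_layer_sizes=(64, 8), max_iter=500, random_state=0)
refiner.fit(bad_shifts, good_shifts)

# At adaptation time, an ill-conditioned shift estimated from limited data
# is replaced by the network's refined output.
new_bad_shift = rng.standard_normal((1, dim))
refined_shift = refiner.predict(new_bad_shift)
```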


International Conference on Speech and Computer | 2014

On a Hybrid NN/HMM Speech Recognition System with a RNN-Based Language Model

Daniel Soutner; Jan Zelinka; Luděk Müller

In this paper, we present a new NN/HMM speech recognition system with an NN-based acoustic model and an RNN-based language model. The employed neural-network-based acoustic model computes posteriors for the states of context-dependent acoustic units. A recurrent neural network with the maximum entropy extension was used as the language model. This hybrid NN/HMM system was compared with our previous hybrid NN/HMM system equipped with a standard n-gram language model. In our experiments, we also compared it to a standard GMM/HMM system. The system performance was evaluated on a British English speech corpus and compared with some previous work.
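
As a clarifying sketch of the general hybrid NN/HMM mechanism (the standard recipe, not code from the paper): the network's state posteriors are divided by the state priors to obtain scaled likelihoods for the HMM decoder, and the language model score is combined log-linearly during search. The function names and weights below are illustrative.

```python
import numpy as np

def posteriors_to_scaled_loglik(posteriors, state_priors, floor=1e-10):
    """Standard hybrid NN/HMM conversion: p(x|s) is proportional to p(s|x) / p(s).

    posteriors:   (T, S) NN outputs per frame, rows sum to 1
    state_priors: (S,)   relative frequencies of the CD states in training data
    """
    scaled = np.maximum(posteriors, floor) / np.maximum(state_priors, floor)
    return np.log(scaled)

def combine_scores(acoustic_loglik, lm_logprob, lm_weight=12.0, word_penalty=0.0):
    """During decoding, acoustic and (RNN) language model log-scores are combined
    log-linearly; lm_weight and word_penalty are tuned on development data."""
    return acoustic_loglik + lm_weight * lm_logprob + word_penalty
```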


Text, Speech and Dialogue | 2018

Deep Learning and Online Speech Activity Detection for Czech Radio Broadcasting

Jan Zelinka

In this paper, enhancements of online speech activity detection (SAD) are presented. Our proposed approach combines standard signal processing methods and modern deep-learning methods, which allows simultaneous training of the detector's parts that are usually trained or designed separately. In our SAD, an NN-based early score computation system, an NN-based score smoothing system and the proposed online decoding system were incorporated into one training process. Besides the CNN and DNN, spectral flux and spectral variance features are also investigated. The proposed approach was tested on a Czech Radio broadcasting corpus. The corpus was used to investigate supervised as well as semi-supervised machine learning.
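
A small sketch of an online decision stage of the kind described above: a per-frame speech score is smoothed and turned into speech/non-speech decisions with hysteresis, so the decision can be made frame by frame without look-ahead. The exponential smoothing and the thresholds are illustrative choices, not the paper's exact decoder.

```python
import numpy as np

def online_sad(frame_scores, alpha=0.9, on_thr=0.6, off_thr=0.4):
    """Online speech activity detection over a stream of per-frame scores.

    frame_scores:   iterable of NN speech scores in [0, 1], one per frame
    alpha:          exponential smoothing factor (score smoothing stage)
    on_thr/off_thr: hysteresis thresholds (online decoding stage)
    """
    smoothed, speaking, decisions = 0.0, False, []
    for s in frame_scores:
        smoothed = alpha * smoothed + (1.0 - alpha) * s
        if speaking and smoothed < off_thr:
            speaking = False
        elif not speaking and smoothed > on_thr:
            speaking = True
        decisions.append(speaking)
    return decisions

# Example: scores as they might come from an NN-based early score computation.
scores = np.concatenate([np.full(50, 0.1), np.full(100, 0.9), np.full(50, 0.2)])
labels = online_sad(scores)
```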


International Conference on Statistical Language and Speech Processing | 2018

A Comparison of Adaptation Techniques and Recurrent Neural Network Architectures

Jan Vaněk; Josef Michalek; Jan Zelinka; Josef Psutka

Recently, recurrent neural networks have become state-of-the-art in acoustic modeling for automatic speech recognition. Long short-term memory (LSTM) units are the most popular ones. However, alternative units such as the gated recurrent unit (GRU) and its modifications have outperformed LSTM in some publications. In this paper, we compared five neural network (NN) architectures with various adaptation and feature normalization techniques. We evaluated feature-space maximum likelihood linear regression, five variants of i-vector adaptation and two variants of cepstral mean normalization. Most adaptation and normalization techniques were developed for feed-forward NNs and, according to the results in this paper, not all of them also work with RNNs. For the experiments, we chose the well-known and available TIMIT phone recognition task. Phone recognition is much more sensitive to the quality of the acoustic model than a large-vocabulary task with a complex language model. We also published open-source scripts to easily replicate the results and to help continue the development.
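
One of the compared normalization techniques, cepstral mean normalization (CMN), is simple enough to illustrate directly. A hedged sketch of two common variants (the exact variants compared in the paper are not specified here) follows.

```python
import numpy as np

def cmn_per_utterance(features):
    """Subtract the mean cepstral vector computed over a single utterance.

    features: (T, D) matrix of cepstral features for one utterance.
    """
    return features - features.mean(axis=0, keepdims=True)

def cmn_per_speaker(utterances):
    """Subtract one mean computed over all utterances of the same speaker.

    utterances: list of (T_i, D) matrices belonging to one speaker.
    """
    mean = np.vstack(utterances).mean(axis=0, keepdims=True)
    return [u - mean for u in utterances]
```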


International Conference on Speech and Computer | 2017

Neural Network Speaker Descriptor in Speaker Diarization of Telephone Speech.

Zbyněk Zajíc; Jan Zelinka; Luděk Müller

In this paper, we investigate an approach to speaker representation for a diarization system that clusters short telephone conversation segments (produced by the same speaker). The proposed approach applies a neural-network-based descriptor that replaces the usual i-vector descriptor in state-of-the-art diarization systems. The comparison of these two techniques was done on the English part of the CallHome corpus. The final results indicate the superiority of the i-vector approach, although our proposed descriptor brings additional information. Thus, the combined descriptor represents a speaker in a segment for diarization purposes with a lower diarization error (almost a 20% relative improvement compared with the i-vector alone).
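
A hedged sketch of how a combined segment descriptor could be used for clustering in diarization: the i-vector and the NN-based descriptor of each segment are length-normalized and concatenated, and segments are then merged by agglomerative clustering on cosine distance. The dimensions, the concatenation scheme and the clustering threshold are assumptions for illustration, not the paper's configuration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def combined_descriptor(ivector, nn_descriptor):
    """Length-normalize both descriptors and concatenate them."""
    iv = ivector / np.linalg.norm(ivector)
    nd = nn_descriptor / np.linalg.norm(nn_descriptor)
    return np.concatenate([iv, nd])

def cluster_segments(descriptors, distance_threshold=0.5):
    """Agglomerative clustering of segment descriptors on cosine distance."""
    X = np.vstack(descriptors)
    Z = linkage(pdist(X, metric="cosine"), method="average")
    return fcluster(Z, t=distance_threshold, criterion="distance")

# Example with random stand-in descriptors for 10 segments.
rng = np.random.default_rng(0)
descs = [combined_descriptor(rng.standard_normal(100), rng.standard_normal(64))
         for _ in range(10)]
labels = cluster_segments(descs)
```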


Text, Speech and Dialogue | 2015

Simultaneously Trained NN-Based Acoustic Model and NN-Based Feature Extractor

Jan Zelinka; Jan Vaněk; Luděk Müller

This paper demonstrates how standard feature extraction methods such as PLP can be successfully replaced by a neural network and how methods such as mean normalization, variance normalization and delta coefficients can be simultaneously utilized in a neural-network-based acoustic model. Our experiments show that this replacement is significantly beneficial. Moreover, in our experiments, a neural-network-based voice activity detector was also employed and trained simultaneously with the neural-network-based feature extraction and the neural-network-based acoustic model. The system performance was evaluated on the British English speech corpus WSJCAM0.
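
This paper and the related spectrum-processing papers above and below share the joint-training idea. A minimal sketch, assuming a PyTorch-style stack in which a feature-extraction network feeds an acoustic-model network and both are optimized by a single loss (the voice activity detector branch and normalization steps are omitted for brevity), could look like this; all layer sizes are illustrative.

```python
import torch
import torch.nn as nn

n_fft_bins, n_feats, n_states = 257, 40, 1000   # illustrative sizes

# Feature extractor replacing PLP: maps a spectrum frame to learned features.
feature_net = nn.Sequential(
    nn.Linear(n_fft_bins, 256), nn.ReLU(),
    nn.Linear(256, n_feats), nn.Tanh(),
)

# Acoustic model: maps learned features to context-dependent state scores.
acoustic_net = nn.Sequential(
    nn.Linear(n_feats, 512), nn.ReLU(),
    nn.Linear(512, n_states),
)

model = nn.Sequential(feature_net, acoustic_net)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One joint training step on a batch of spectrum frames and state targets:
spectra = torch.randn(32, n_fft_bins)          # stand-in data
targets = torch.randint(0, n_states, (32,))
loss = criterion(model(spectra), targets)      # gradients flow through both nets
optimizer.zero_grad()
loss.backward()
optimizer.step()
```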


International Conference on Speech and Computer | 2015

On Deep and Shallow Neural Networks in Speech Recognition from Speech Spectrum

Jan Zelinka; Petr Salajka; Luděk Müller

This paper demonstrates how usual feature extraction methods such as PLP can be successfully replaced by a neural network and how signal processing methods such as mean normalization, variance normalization and delta coefficients can be successfully utilized when an NN-based feature extraction and an NN-based acoustic model are used simultaneously. The importance of deep NNs is also investigated. The system performance was evaluated on the British English speech corpus WSJCAM0.

Collaboration


Dive into Jan Zelinka's collaborations.

Top Co-Authors

Luděk Müller, University of West Bohemia
Jan Trmal, University of West Bohemia
Jan Vaněk, University of West Bohemia
Josef Psutka, University of West Bohemia
Daniel Soutner, University of West Bohemia
Jakub Kanis, University of West Bohemia
Jan Romportl, University of West Bohemia