Peter Pocta
Multimedia University
Publication
Featured research published by Peter Pocta.
integrated network management | 2015
Tobias Hoßfeld; Lea Skorin-Kapov; Yoram Haddad; Peter Pocta; Vasilios A. Siris; Andrej Zgank; Hugh Melvin
Over the last decade or so, significant research has focused on defining the Quality of Experience (QoE) of multimedia systems and identifying the key factors that collectively determine it. Some consensus thus exists as to the role of System Factors, Human Factors and Context Factors. In this paper, the notion of context is broadened to include information gleaned from simultaneous out-of-band channels, such as social network trend analytics, that can be used, if interpreted in a timely manner, to help further optimise QoE. A case study involving simulation of HTTP adaptive streaming (HAS) and load balancing in a content distribution network (CDN) in a flash crowd scenario is presented with encouraging results.
The International Conference on Digital Technologies 2013 | 2013
Miroslava Mrvova; Peter Pocta
In this article, an evolutionary algorithm known as Genetic Programming (GP) was used to design a parametric speech quality estimation model. GP is one of the machine learning techniques now employed in quality estimation. A set of quality-affecting parameters served as the input to the GP-based estimation model, which estimates the quality of synthesized speech transmitted over an IP channel (VoIP environment). The performance results obtained by the designed model confirm the good properties of genetic programming, namely good accuracy and generalization ability; this makes it a promising approach to quality estimation of this type of speech in the corresponding environment. The developed model can help network operators and service providers in the planning phase or early development stage of telecommunication services based on synthesized speech.
quality of multimedia experience | 2013
Andrew Hines; Peter Pocta; Hugh Melvin
This paper undertakes a detailed comparative analysis of PESQ and VISQOL model behaviour when tested against speech samples modified through playout delay adjustments. The adjustments are typical (in extent and magnitude) of those introduced by VoIP jitter buffer algorithms. Furthermore, the analysis examines the impact of adjustment location as well as speaker factors on the MOS scores predicted by both models, and seeks to determine whether both models correctly predict the impact on quality perceived by the end user, as established in earlier subjective tests. The earlier results showed speaker voice preference and potentially wideband experience dominating subjective tests more than playout delay adjustment duration or location. By design, PESQ and VISQOL do not qualify speaker voice difference, which reduces their correlation with the subjective tests. In addition, it was found that PESQ scores are impacted by playout delay adjustments, and thus the impact of playout delay adjustments on the quality perceived by the end user is not well modelled. On the other hand, the VISQOL model is better at predicting the impact of playout delay adjustments on the quality perceived by the user, but there are still some discrepancies in the predicted scores. The reasons for those discrepancies are particularly analysed and discussed.
international conference radioelektronika | 2010
Peter Pocta; Matúš Bilšák; Jana Rouseková
This paper measures the impact of fragmentation threshold tuning on speech quality and background traffic throughput in mixed voice/data transmission over WLANs (IEEE 802.11b). The ITU-T G.729AB encoding scheme is deployed in this study, and the Distributed Internet Traffic Generator (D-ITG) is used to generate the background traffic. The primary goal of the generated background traffic is to affect the speech transmission by changing network performance parameters of the VoIP connection, such as jitter (delay variation) and packet loss. In general, those parameters have a significant impact on the overall speech quality perceived by the user. The speech quality and the performance of the background traffic are assessed by means of the PESQ algorithm and the Wireshark network analyzer, respectively. The experiment shows that fragmentation threshold tuning can significantly degrade both speech quality and the performance of background traffic.
international conference radioelektronika | 2016
Jozef Polacky; Peter Pocta; Roman Jarina
The paper considers the influence of packet loss on remote speaker verification in a Voice over IP (VoIP) environment. Lossy speech coding and packet loss account for a significant part of speech degradation in the VoIP environment. As the extent of the packet loss impact is closely related to the type of speech coder used to transmit the speech data, different transmission conditions along with different speech codecs are investigated here. The speaker verification system used in this experimental study is based on a probabilistic GMM-UBM approach. In this paper, speaker verification accuracy is evaluated against the level of packet loss in narrowband and wideband communication channels.
5th ISCA/DEGA Workshop on Perceptual Quality of Systems (PQS 2016) | 2016
Mohannad Alahmadi; Yusuf Cinar; Hugh Melvin; Peter Pocta
Real-time communication (RTC) applications like VoIP ideally require networks that support the necessary quality of service (QoS), whereas in reality network impairments such as latency, jitter and packet loss exist. In order to cope with jitter and delay, some VoIP applications employ time-scale modification, or warping, in the jitter buffer, which adjusts the rate of playout while controlling the pitch to minimize Mouth-to-Ear (M2E) delay whilst preserving speech intelligibility and quality. In this paper, we first investigate the extent to which time-scaling occurs using WebRTC [1] VoIP clients over Wi-Fi networks with different levels of congestion. We then assess the impact of such time-scaling, both subjectively via an expert listening test and objectively using POLQA, on the quality experienced by the end user, and review the correlation between
2016 ELEKTRO | 2016
Jozef Polacky; Peter Pocta; Roman Jarina
Automatic verification of a person's identity from their voice is a part of modern telecommunication services. In order to execute a verification task, a speech signal has to be transmitted to a remote server. The performance of the verification system can therefore be influenced by various distortions that occur when a speech signal is transmitted through a communication channel. This paper studies the effect of state-of-the-art wideband (WB) speech codecs on the performance of automatic speaker verification in the context of a channel/codec mismatch between enrollment and test utterances. The speaker verification system is based on the GMM-UBM method. The results show that the EVS codec provides the best performance over all the scenarios investigated in this study. Moreover, deploying the G.729.1 codec in the training process of the verification system provides the best equal error rate in the fully codec-mismatched scenario. However, the differences between the equal error rates reported for all of the codecs involved in this scenario are mostly nonsignificant.
Digital Technologies (DT), 2014 10th International Conference on | 2014
Jozef Polacky; Peter Pocta
This paper analyses the internal parameters of the P.563 non-intrusive quality prediction model, which together form the model's overall quality prediction, for natural and synthesized speech degraded by packet loss (independent and dependent losses) and speech coding (ITU-T G.711 codec, ITU-T G.729AB codec and iLBC codec). The main aim of this paper is to identify the dominant internal parameters of the P.563 model for all the investigated codecs and clp parameters by conducting two-way analysis of variance (ANOVA) tests on all internal parameters of the model. The identified dominant internal parameters will be used further in an investigation of the non-monotonic behavior of the P.563 model predictions in this context, reported for the ITU-T G.729AB codec in [6].
Computer Standards & Interfaces | 2013
Peter Pocta; Jan Holub
This paper investigates the behavior of PESQ (Perceptual Evaluation of Speech Quality; also known as ITU-T Recommendation P.862) under independent and dependent loss conditions from a speech activity parameter perspective. The results show that an increase in the amount of speech in the reference signal (expressed by the activity parameter) may increase PESQ's sensitivity to changes in packet loss as well as improve its prediction accuracy. On the other hand, the human brain seems to be somewhat less sensitive than PESQ to the loss of some parts of words. The reasons for those findings are particularly discussed.
Acta Acustica United With Acustica | 2009
Peter Pocta; Jan Holub; Miroslava Mrvova
In this work, we experimentally study how the behaviour of PESQ predictions varies with a reference signal characteristic. In particular, we investigate the impact of different Active-Speech-Ratios on speech quality prediction in a simulated VoIP environment from both an objective and a subjective testing point of view. Because ITU-T Recommendation P.862.3 defines this reference signal characteristic only very broadly, its impact on speech quality prediction merits more in-depth investigation. We assess the variability of PESQ's predictions with respect to Active-Speech-Ratio and network conditions, as well as their accuracy, by comparing the predictions with subjective assessments.