Artur Janicki
Warsaw University of Technology
Publication
Featured research published by Artur Janicki.
text speech and dialogue | 2011
Artur Janicki; Tomasz Staroszczyk
We proposed using support vector machines (SVMs) to recognize speakers from signals transcoded with different speech codecs. Experiments with SVM-based text-independent speaker classification using a linear GMM supervector kernel were presented for six different codecs and uncoded speech. Both matched (the same codec for creating speaker models and for testing) and mismatched conditions were investigated. SVMs provided high speaker recognition accuracy, but required a higher number of Gaussian mixtures than the baseline GMM-UBM system. In mismatched conditions the Speex codec was shown to perform best for creating robust speaker models.
International Journal of Applied Mathematics and Computer Science | 2013
Jan Rybka; Artur Janicki
This paper describes a study of emotion recognition based on speech analysis. The introduction to the theory contains a review of emotion inventories used in various studies of emotion recognition as well as the speech corpora applied, methods of speech parametrization, and the most commonly employed classification algorithms. In the current study the EMO-DB speech corpus and three selected classifiers, the k-Nearest Neighbor (k-NN), the Artificial Neural Network (ANN) and Support Vector Machines (SVMs), were used in experiments. SVMs turned out to provide the best classification accuracy of 75.44% in the speaker-dependent mode, that is, when speech samples from the same speaker were included in the training corpus. Various speaker-dependent and speaker-independent configurations were analyzed and compared. Emotion recognition in speaker-dependent conditions usually yielded higher accuracy than a comparable speaker-independent configuration. The improvement was especially noticeable if the base recognition ratio of a given speaker was low. Happiness and anger, as well as boredom and neutrality, proved to be the pairs of emotions most often confused.
Perceptual and Motor Skills | 2012
Zuzanna Górska; Artur Janicki
This study investigated whether it is possible to train a machine to discriminate levels of extraversion based on handwriting variables. Support vector machines (SVMs) were used as a learning algorithm. Handwriting of 883 people (404 men, 479 women) was examined. Extraversion was measured using the Polish version of the NEO-Five Factor Inventory. The handwriting samples were described by 48 variables. The support vector machines were separately trained and tested for each sex, using 10-fold cross-validation. Good recognition accuracy (around .7) was achieved for 10 handwriting variables, different for men and women. The results suggest the existence of a relationship between handwriting elements and extraversion.
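The 10-fold cross-validation mentioned above can be illustrated with a minimal sketch. It only shows how samples might be partitioned into folds for each sex separately; the function name `k_fold_indices`, the shuffling seed, and the way the remainder is distributed are assumptions made for illustration, and the SVM training step and the 48 handwriting variables are not reproduced here.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed: reproducible folds
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# Group sizes taken from the abstract: 404 men and 479 women,
# each group split independently.
for label, n in (("men", 404), ("women", 479)):
    sizes = [len(test) for _, test in k_fold_indices(n, k=10)]
    print(label, "test-fold sizes:", sizes)
```

Each sample lands in exactly one test fold, so every handwriting sample is used for testing once and for training nine times.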
Security and Communication Networks | 2016
Artur Janicki
This paper presents an improved version of a steganographic algorithm for IP telephony called HideF0. It is based on approximating the F0 parameter, which is responsible for conveying information about the pitch of the speech signal. The bits saved due to simplification of the pitch contour are used for the hidden transmission. In our experiments, the proposed method was applied to the narrowband Speex codec working in five different modes, with bitrates between 5,950 bps and 24,600 bps. We showed that HideF0 was able to create hidden channels with steganographic bandwidths of around 200 bps at the expense of a steganographic cost of between 0.5 and 0.7 MOS, depending on the Speex mode. Because of placing the approximation flag in the voice packet header, the improved version of the proposed algorithm yielded a significantly lower decrease in speech quality, when compared with the original version of HideF0. In addition, for low bitrates of the hidden channel, i.e., below ca. 50 bps, it was able to operate without introducing any steganographic cost.
text speech and dialogue | 2012
Artur Janicki
This paper investigates the impact of non-speech sounds on the performance of speaker recognition. Various experiments were conducted to check what the accuracy of speaker classification would be if non-speech sounds, such as breaths, were removed from the training and/or testing speech. Experiments were run using the GMM-UBM algorithm and speech taken from the TIMIT speech corpus, either original or transcoded using the G.711 or GSM 06.10 codecs. The results show a remarkable contribution of non-speech sounds to the overall speaker recognition performance.
international symposium on communications, control and signal processing | 2008
Artur Janicki; Piotr Meus; Maciej Topczewski
The paper shows how to take advantage of pronunciation variation when constructing a speech synthesis system for Polish, so that even a small speech corpus can be sufficient to produce intelligible and good quality speech. The system uses a unit selection algorithm based directly on linguistic features of the input text, without using a prosody model. The proposed target and concatenation cost functions are described. Results of intelligibility tests and a cost comparison between the canonical and the best selected pronunciation variant are presented. They show that, for a small corpus, pronunciation variation modeling can increase the effectiveness of unit selection.
Security and Communication Networks | 2016
Artur Janicki; Federico Alegre; Nicholas W. D. Evans
This paper analyses the threat of replay spoofing or presentation attacks in the context of automatic speaker verification. As relatively high-technology attacks, speech synthesis and voice conversion, which have thus far received far greater attention in the literature, are probably beyond the means of the average fraudster. The implementation of replay attacks, in contrast, requires neither specific expertise nor sophisticated equipment. Replay attacks are thus likely to be the most prolific in practice, while their impact is relatively under-researched. The work presented here aims to compare at a high level the threat of replay attacks with those of speech synthesis and voice conversion. The comparison is performed using strictly controlled protocols and with six different automatic speaker verification systems including a state-of-the-art iVector/probabilistic linear discriminant analysis system. Experiments show that low-effort replay attacks present at least a comparable threat to speech synthesis and voice conversion. The paper also describes and assesses two replay attack countermeasures. A relatively new approach based on the local binary pattern analysis of speech spectrograms is shown to outperform a competing approach based on the detection of far-field recordings.
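To make the idea of local binary pattern (LBP) analysis of a spectrogram more concrete, here is a minimal sketch of the basic textbook 3x3 LBP operator applied to a small 2-D array. The abstract does not specify the LBP variant, the spectrogram settings, or the classifier, so the neighbour ordering, the `>=` comparison, the toy values in `toy_spectrogram`, and the function names are all assumptions for illustration.

```python
def lbp_codes(image):
    """Return 8-neighbour LBP codes for the interior cells of a 2-D list."""
    # Neighbour offsets, clockwise starting from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            centre = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as large as the centre.
                if image[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes

def lbp_histogram(codes):
    """Normalised 256-bin histogram of LBP codes, usable as a feature vector."""
    hist = [0] * 256
    for row in codes:
        for code in row:
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist]

# Hypothetical toy "spectrogram" (rows: frequency bins, columns: frames).
toy_spectrogram = [
    [5, 1, 5, 2, 7],
    [1, 3, 1, 4, 2],
    [5, 1, 5, 3, 6],
    [2, 6, 1, 8, 1],
]
features = lbp_histogram(lbp_codes(toy_spectrogram))
print(len(features), "histogram bins")
```

A histogram of this kind summarises local texture in the time-frequency plane, which is the sort of representation a detector could then classify as genuine or replayed speech.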
Journal of Homeland Security and Emergency Management | 2014
Artur Janicki; Wojciech Mazurczyk; Krzysztof Szczypiorski
Cyber criminalists' interest in the incorporation of steganography into the armory of rogue hackers is on the rise, and information hiding techniques are becoming the new black among Black Hats. In this paper we focus on analyzing the efficiency of the recently proposed IP telephony information hiding method called transcoding steganography (TranSteg), which enables hidden communication with a high steganographic bandwidth while retaining good voice quality. Specifically, we focus on analyzing which voice codecs would be the most favourable for TranSteg to minimize the negative influence on voice quality while maximizing the obtained steganographic bandwidth and limiting the risk of detection.
Multimedia Tools and Applications | 2017
Artur Janicki
This article addresses the problem of anti-spoofing protection in an automatic speaker verification (ASV) system. An improved version of a previously proposed spoofing countermeasure is presented. The presented method is based on the analysis of linear prediction error that results from both short- and long-term prediction of the input speech signal. It was observed that non-natural speech signals, i.e., synthetic or converted speech, were predicted in a different way than genuine speech. Therefore, in contrast to the classical linear prediction analysis, where usually only the prediction coefficients are analyzed, in the proposed approach the residual (error) signals were examined. During this analysis, 23 prediction parameters were extracted, such as the energy of the prediction error, prediction gains and temporal parameters related to the prediction error signals. Various binary classifiers were evaluated to separate the human and spoof classes; support vector machines with a radial basis function kernel (SVM-RBF) yielded the best results. When tested on the corpora provided for the ASVspoof 2015 Challenge, the proposed countermeasure returned better results than the previous version of the algorithm and, in most cases, than the baseline spoofing detector based on local binary patterns (LBP). It is hoped that the proposed method can be part of a generalized spoofing countermeasure helping to increase the security of ASV systems.
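Two of the quantities named above, the energy of the prediction error and the prediction gain, can be illustrated with a minimal sketch of short-term linear prediction. It uses the standard autocorrelation method with the Levinson-Durbin recursion on a synthetic second-order autoregressive signal. The signal, the prediction order, and all function names are assumptions for illustration; long-term prediction, the remaining parameters, and the SVM-RBF classifier are not covered.

```python
import math
import random

def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a, err): a[0] = 1 and residual e[n] = sum_j a[j] * x[n - j]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # updated prediction error energy
    return a, err

def lp_residual(x, a):
    """Prediction error signal, treating samples before the start as zero."""
    order = len(a) - 1
    return [sum(a[j] * x[n - j] for j in range(order + 1) if n - j >= 0)
            for n in range(len(x))]

def prediction_gain_db(x, residual):
    """Ratio of signal energy to residual energy, in decibels."""
    signal_energy = sum(v * v for v in x)
    residual_energy = sum(v * v for v in residual)
    return 10.0 * math.log10(signal_energy / residual_energy)

# Synthetic, strongly correlated test signal (an assumed AR(2) process).
rng = random.Random(0)
x = [0.0, 0.0]
for _ in range(400):
    x.append(1.6 * x[-1] - 0.81 * x[-2] + rng.gauss(0.0, 1.0))
signal = x[2:]

a, err = levinson_durbin(autocorrelation(signal, 2), 2)
residual = lp_residual(signal, a)
gain = prediction_gain_db(signal, residual)
print("coefficients:", a)
print("prediction gain [dB]:", gain)
```

A signal that is easy to predict gives a low residual energy and a high prediction gain, so features of this kind can reflect how "predictable" a speech signal is.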
Security and Communication Networks | 2016
Wojciech Mazurczyk; Krzysztof Szczypiorski; Artur Janicki; Hui Tian
As the production, storage, and exchange of information become more extensive and important in the functioning of societies, the problem of protecting information from unintended and undesired usage becomes more complex. In modern societies, protection of information involves many interdependent technological and policy issues related to information confidentiality, integrity, anonymity, authenticity, utility, etc. Information hiding techniques are receiving much attention today. Digital audio, video, and images are increasingly furnished with distinguishing but imperceptible marks, which may contain a hidden copyright notice or serial number or even help to prevent unauthorized copying directly. Digital watermarking and steganography may protect information, conceal secrets, or serve as core primitives in digital rights management schemes. Alongside the previously mentioned types of digital media steganography, network steganography, a part of information hiding focused on modern networks, is currently attracting increased interest. It is a method of hiding secret data in users' normal data transmissions. Steganographic techniques arise and evolve with the development of network protocols and mechanisms and are expected to be used in secret communication or information sharing. It has become a hot topic because of the proliferation of information networks and multimedia services in networks and social networks. The purposes of information hiding applications vary: possible uses can fall into the category of legal actions or illicit activity. Frequently, the illegal aspect is accentuated, ranging from criminal communication, through information leakage from protected systems and cyber weapon exchange, up to industrial espionage.
Recently discovered malware such as Hammertoss or Stegoloader utilizes various information hiding techniques for botnet purposes to enable covert communication over the C&C (Command and Control) channel. This makes detection of such malware even more difficult, and it also poses a serious challenge to investigators. On the other side of the spectrum lie legitimate uses, which include circumvention of web censorship and surveillance, computer forensics (tracing and identification), and copyright protection (e.g., watermarking images). In this special issue, we are delighted to present a selection of nine papers, which, in our opinion, will contribute to the enhancement of knowledge in information hiding. The collection of high-quality research papers provides a