Janez Žibert
University of Ljubljana
Publications
Featured research published by Janez Žibert.
International Journal of Speech Technology | 2003
Jerneja Gros; Simon Dobrišek; Janez Žibert; Nikola Pavešić
This paper presents the Slovene-language spoken resources that were acquired at the Laboratory of Artificial Perception, Systems and Cybernetics (LUKS) at the Faculty of Electrical Engineering, University of Ljubljana over the past ten years. The resources consist of:
• isolated-spoken-word corpora designed for phonetic research of the Slovene spoken language;
• read-speech corpora from dialogues relating to air-flight information;
• isolated-word corpora designed for studying Slovene spoken diphthongs;
• Slovene diphone corpora used for text-to-speech synthesis systems;
• a weather-forecast speech database, as an attempt to capture radio and television broadcast news in the Slovene language; and
• read- and spontaneous-speech corpora used to study the effects of the psychophysical condition of the speakers on their speech characteristics.
All the resources are accompanied by relevant text transcriptions, lexicons and various segmentation labels. The read-speech corpora relating to the air-flight information domain are also annotated prosodically and semantically. The words in the orthographic transcription were automatically tagged with their lemma and morphosyntactic description. Many of these speech resources are freely available for basic research purposes in speech technology and linguistics. In this paper we describe all the resources in more detail and give a brief description of their use in the spoken-language technology products developed at LUKS.
EURASIP Journal on Advances in Signal Processing | 2006
Janez Žibert; Nikola Pavešić
This work assesses different approaches for speech and non-speech segmentation of audio data and proposes a new, high-level representation of audio signals based on phoneme recognition features suitable for speech/non-speech discrimination tasks. Unlike previous model-based approaches, where speech and non-speech classes were usually modeled by several models, we develop a representation where just one model per class is used in the segmentation process. For this purpose, four measures based on consonant-vowel pairs obtained from different phoneme speech recognizers are introduced and applied in two different segmentation-classification frameworks. The segmentation systems were evaluated on different broadcast news databases. The evaluation results indicate that the proposed phoneme recognition features are better than the standard mel-frequency cepstral coefficients and posterior probability-based features (entropy and dynamism). The proposed features proved to be more robust and less sensitive to different training and unforeseen conditions. Additional experiments with fusion models based on cepstral and the proposed phoneme recognition features produced the highest scores overall, which indicates that the most suitable method for speech/non-speech segmentation is a combination of low-level acoustic features and high-level recognition features.
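The consonant-vowel measures above can be illustrated with a toy sketch. The abstract does not give the exact formulas, so the vowel set, the segment format and the specific statistics below are assumptions chosen only to show the general idea of deriving speech/non-speech evidence from a phoneme recognizer's output:

```python
# Illustrative sketch (not the paper's exact measures): derive simple
# consonant-vowel (CV) statistics from a phoneme recognizer's output.

VOWELS = {"a", "e", "i", "o", "u"}  # assumed vowel inventory, for illustration

def cv_features(segments):
    """segments: list of (phoneme, duration_sec) pairs from a recognizer.
    Returns a dict of CV-based measures over the whole segment."""
    total = sum(d for _, d in segments) or 1.0
    vowel_time = sum(d for p, d in segments if p in VOWELS)
    # Count consonant<->vowel transitions (CV pairs) along the sequence:
    pairs = sum(
        1
        for (p1, _), (p2, _) in zip(segments, segments[1:])
        if (p1 in VOWELS) != (p2 in VOWELS)
    )
    return {
        "vowel_ratio": vowel_time / total,       # share of time spent in vowels
        "cv_pair_rate": pairs / total,           # CV alternations per second
        "mean_phone_dur": total / max(len(segments), 1),
    }

# Speech-like input alternates consonants and vowels at a plausible rate;
# non-speech decoded as a few long, non-alternating phonemes would not.
feats = cv_features([("g", 0.08), ("o", 0.12), ("v", 0.07), ("o", 0.11),
                     ("r", 0.06), ("i", 0.10)])
```

The appeal of such features, as the abstract argues, is that they sit above the acoustics: any phoneme recognizer that outputs a labelled segmentation can feed them, which is what makes them less sensitive to unforeseen acoustic conditions than raw cepstral features.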
Text, Speech and Dialogue | 2006
Janez Žibert
We introduce a new method for discriminating speech and non-speech segments in audio signals based on the transcriptions produced by phoneme recognizers. Four measures based on consonant-vowel and voiced-unvoiced pairs obtained from different phoneme speech recognizers are proposed. They were constructed to be recognizer- and language-independent and can be applied in different segmentation-classification frameworks. The segmentation systems were evaluated on different broadcast-news datasets consisting of more than 60 hours of multilingual BN shows. The results of these evaluations illustrate the robustness of the proposed features in comparison to MFCC and posterior-probability-based features. The overall frame accuracies of the proposed approaches ranged from 95% to 98% and remained stable across different test conditions and different phoneme recognizers.
Artificial Organs | 2017
J. M. Kališnik; Eva Hrovat; Alenka Hrastovec; Janez Žibert; Aleš Jerin; Milan Skitek; Giuseppe Santarpino; Tomislav Klokočovnik
Acute kidney injury (AKI) is a frequent complication after cardiac surgery using cardiopulmonary bypass (CPB). In the hope of achieving an earlier, more reliable characterization of AKI, we tested the utility of neutrophil gelatinase-associated lipocalin (NGAL) and cystatin C (CysC), in addition to standard creatinine, for the early detection of AKI after cardiac surgery using CPB. Forty-one patients met the inclusion criteria. Arterial blood samples collected after the induction of general anesthesia were used as baseline; further sampling occurred at CPB termination, 2 h after CPB, and on the first and second day after surgery. According to the AKIN classification, 18 patients (44%) developed AKI (AKI1-2 groups) and 23 (56%) did not (non-AKI group). The groups were similar regarding demographics and operative characteristics. CysC levels differed already preoperatively (non-AKI vs. AKI2, P = 0.045; AKI1 vs. AKI2, P = 0.011), while postoperatively the AKI2 group differed from the non-AKI group on the first day and the AKI1 group on the second (P = 0.004 and P = 0.021, respectively). NGAL and creatinine showed a significant difference already 2 h after CPB between the AKI2 and non-AKI groups and later, on the first postoperative day, between the AKI1 and AKI2 groups (P = 0.028 and P = 0.014, respectively). This study shows a similar performance of early plasma creatinine and NGAL in patients with preserved preoperative renal function. It demonstrates that creatinine, as well as NGAL, differentiates subsets of patients developing AKI of a clinically more advanced grade as early as 2 h after CPB, also when used singly and uncombined.
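As a hedged illustration of the AKIN grading used above, the serum-creatinine criteria can be sketched as follows. This is not the authors' code: the thresholds follow the published AKIN consensus criteria, and the urine-output criteria are omitted for brevity:

```python
# Sketch of the AKIN serum-creatinine staging criteria (illustrative;
# urine-output criteria omitted). Creatinine values are in mg/dL.

def akin_stage(baseline_cr, current_cr, abs_rise_48h):
    """Return AKIN stage 0-3 from baseline and current creatinine plus
    the absolute rise within 48 h."""
    ratio = current_cr / baseline_cr
    # Stage 3: >3x baseline, or >=4.0 mg/dL with an acute rise >=0.5 mg/dL
    if ratio > 3.0 or (current_cr >= 4.0 and abs_rise_48h >= 0.5):
        return 3
    # Stage 2: >2x to 3x baseline
    if ratio > 2.0:
        return 2
    # Stage 1: 1.5x baseline, or an absolute rise >=0.3 mg/dL within 48 h
    if ratio >= 1.5 or abs_rise_48h >= 0.3:
        return 1
    return 0

stage = akin_stage(baseline_cr=1.0, current_cr=1.6, abs_rise_48h=0.6)  # stage 1
```

A patient whose creatinine rises from 0.9 to 2.1 mg/dL (a 2.3-fold increase) would grade as AKIN stage 2 under the same sketch.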
BioID_MultiComm'09 Proceedings of the 2009 joint COST 2101 and 2102 international conference on Biometric ID management and multimodal communication | 2009
Rok Gajšek; Vitomir Štruc; Simon Dobrišek; Janez Žibert; Nikola Pavešić
The paper presents our initial attempts at building an audio-video emotion recognition system. Both the audio and video sub-systems are discussed, and a description of the database of spontaneous emotions is given. The task of labelling the recordings from the database according to different emotions is discussed, and the measured agreement between multiple annotators is presented. Instead of focusing on prosody in audio emotion recognition, we evaluate the possibility of using linear transformations (CMLLR) as features. The classification results from the audio and video sub-systems are combined using sum-rule fusion, and the increase in recognition performance when using both modalities is presented.
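The sum-rule fusion mentioned above can be sketched in a few lines: each sub-system produces per-class posterior scores, the scores are summed class-by-class, and the top-scoring emotion wins. The emotion labels and score values below are invented for illustration:

```python
# Minimal sketch of sum-rule fusion of two classifiers' posteriors.
# Labels and probabilities are illustrative, not from the paper.

def sum_rule_fusion(audio_post, video_post):
    """Each argument maps emotion label -> posterior probability.
    Returns the label with the highest summed score."""
    fused = {e: audio_post[e] + video_post[e] for e in audio_post}
    return max(fused, key=fused.get)

audio = {"neutral": 0.5, "anger": 0.3, "joy": 0.2}
video = {"neutral": 0.2, "anger": 0.6, "joy": 0.2}
decision = sum_rule_fusion(audio, video)  # strong video evidence tips it to "anger"
```

The sum rule is popular for this kind of late fusion because it is robust to estimation noise in either modality: one confident, correct sub-system can outvote a weakly wrong one.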
Brain and Language | 2016
Veronika Rutar Gorišek; Vlasta Zupanc Isoski; Aleš Belič; Christina Manouilidou; Blaž Koritnik; Jure Bon; Nuška Pečarič Meglič; Matej Vrabec; Janez Žibert; Grega Repovš; Janez Zidar
Broca's region and the adjacent cortex presumably take part in working memory (WM) processes. Electrophysiologically, these processes are reflected in synchronized oscillations. We present the first study exploring the effects of a stroke causing Broca's aphasia on these processes, and specifically on synchronized functional WM networks. We used high-density EEG and coherence analysis to map WM networks in ten Broca's patients and ten healthy controls during a verbal WM task. Our results demonstrate that a stroke resulting in Broca's aphasia also alters two distinct WM networks. These theta and gamma functional networks likely reflect the executive and the phonological processes, respectively. The striking imbalance between task-related theta synchronization and desynchronization in Broca's patients might represent a disrupted balance between task-positive and WM-irrelevant functional networks. There is a complete disintegration of the left fronto-centroparietal gamma network in Broca's patients, which could reflect the damaged phonological loop.
Text, Speech and Dialogue | 2008
Rok Gajšek; Janez Žibert
In this article we evaluate different techniques of acoustic modeling for speech recognition in the case of limited audio resources. The objective was to build different sets of acoustic models: the first was trained on a small set of telephone speech recordings, and the other was trained on a bigger database of broadband speech recordings and later adapted to a different audio environment. Different adaptation methods (MLLR, MAP) were examined in combination with different parameterization features (MFCC, PLP, RPLP). We show that using adaptation methods, which are mainly used for speaker-adaptation purposes, can increase the robustness of speech recognition in cases of mismatched training and working acoustic-environment conditions.
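The core of MAP adaptation mentioned above can be shown for a single Gaussian mean. Real systems adapt the means of HMM-GMM states dimension by dimension; the relevance factor tau and the data below are illustrative only:

```python
# Sketch of the MAP update for one Gaussian mean (the standard
# relevance-factor form; tau and the data are illustrative).

def map_adapt_mean(prior_mean, adaptation_data, tau=10.0):
    """mu_map = (tau * mu_prior + sum(x)) / (tau + N).
    With little adaptation data the prior (broadband-trained) mean
    dominates; with much data the estimate moves to the sample mean."""
    n = len(adaptation_data)
    return (tau * prior_mean + sum(adaptation_data)) / (tau + n)

# Broadband-trained mean (0.0) pulled halfway toward telephone-channel
# data whose sample mean is 2.0, because N == tau here:
mu = map_adapt_mean(prior_mean=0.0, adaptation_data=[2.0] * 10, tau=10.0)
```

This interpolation behaviour is exactly why MAP suits the paper's setting: the small telephone corpus nudges a well-trained broadband model toward the new channel without discarding the prior statistics, while MLLR instead applies shared linear transforms when even less data is available.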
Automatika: Journal for Control, Measurement, Electronics, Computing and Communications | 2016
Tadej Justin; Janez Žibert
Nowadays, human-computer interaction (HCI) can also be achieved with voice user interfaces (VUIs). To enable devices to communicate with humans by speech in the user's own language, low-cost language portability is often discussed and analysed. One of the most time-consuming parts of the language-adaptation process for VUI-capable applications is the acquisition of target-language speech data. Such data are further used in the development of VUI subsystems, especially speech-recognition and speech-production systems. A tempting idea for bypassing this long-term process of data acquisition is to design and develop automatic algorithms that can extract similar target-language acoustics from speech databases in other languages. This paper focuses on cross-lingual phoneme mapping between an under-resourced and a well-resourced language. It proposes a novel automatic phoneme-mapping technique that is adopted from the speaker-verification field. Such a phoneme mapping is further used in the development of an HMM-based speech-synthesis system for the under-resourced language. The synthesised utterances are evaluated in a subjective evaluation and compared, using the expert-knowledge cross-language method, against a baseline speech synthesis built from the under-resourced data alone. The results reveal that combining data from the well-resourced and under-resourced languages with the proposed phoneme-mapping technique can improve the quality of under-resourced-language speech synthesis.
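The idea of cross-lingual phoneme mapping can be illustrated with a nearest-neighbour sketch: represent each phoneme by a mean acoustic vector and map each under-resourced-language phoneme to the acoustically closest well-resourced phoneme. The paper's actual technique, adopted from speaker verification, is more involved; the 2-D "centroids" below are invented for illustration:

```python
# Toy illustration of cross-lingual phoneme mapping by acoustic
# similarity. Vectors are invented; real systems would use statistics
# of per-phoneme acoustic models in a high-dimensional feature space.
import math

def nearest_phoneme(target_centroid, source_centroids):
    """source_centroids: dict phoneme -> mean feature vector.
    Returns the source phoneme closest to the target centroid."""
    return min(
        source_centroids,
        key=lambda p: math.dist(target_centroid, source_centroids[p]),
    )

well_resourced = {"e": (1.0, 0.2), "o": (-0.8, 0.5), "s": (0.1, -1.4)}
mapping = nearest_phoneme((0.9, 0.1), well_resourced)  # closest to "e"
```

Once every target phoneme has a source counterpart, the well-resourced language's recordings can be relabelled and pooled with the small target-language corpus to train the HMM synthesis voice, which is the data-combination step the abstract reports as improving synthesis quality.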
Text, Speech and Dialogue | 2008
Boštjan Vesnicer; Janez Žibert; Elmar Nöth
We present an objective evaluation method for prosody modeling in an HMM-based Slovene speech-synthesis system. The method is based on the results of the automatic recognition of syntactic-prosodic boundary positions and accented words in the synthetic speech. We show that the recognition results closely match the prosodic notations labeled by a human expert on the natural-speech counterpart that was used to train the speech-synthesis system. The recognition rate of the prosodic events is proposed as an objective evaluation measure for the quality of the prosodic modeling in the speech-synthesis system. The results of the proposed evaluation method are also in accordance with previous subjective listening assessments, where high scores for naturalness were observed for this type of speech synthesis.
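The proposed recognition-rate measure amounts to scoring automatically recognized prosodic-event positions against the expert labels. A minimal sketch, with invented word-level labels (the abstract does not specify the exact scoring formula, so precision/recall/F1 over event positions is an assumption):

```python
# Sketch of scoring recognized prosodic events (boundaries, accents)
# against expert reference labels; word indices below are invented.

def prosodic_event_f1(reference, hypothesis):
    """Both arguments: sets of word indices carrying a prosodic event.
    Returns the F1 score between hypothesis and reference."""
    tp = len(reference & hypothesis)
    precision = tp / len(hypothesis) if hypothesis else 0.0
    recall = tp / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Recognizer found events on words 2, 5, 8; expert marked words 2, 5, 9:
f1 = prosodic_event_f1(reference={2, 5, 9}, hypothesis={2, 5, 8})
```

The attraction of such a measure, per the abstract, is that it tracks subjective listening scores while being fully automatic, so prosody quality can be monitored across synthesis-system revisions without repeated listening tests.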
Atmospheric Environment | 2016
Janez Žibert; Jure Cedilnik; Jure Pražnikar