Sotiris Karabetsos
Technological Educational Institute of Athens
Publications
Featured research published by Sotiris Karabetsos.
IEEE Transactions on Consumer Electronics | 2009
Sotiris Karabetsos; Pirros Tsiakoulis; Aimilios Chalamandaris; Spyros Raptis
Nowadays, unit-selection-based text-to-speech technology is the mainstream approach for near-natural speech synthesis systems. However, this comes at the expense of increased computational requirements. This work describes design and implementation approaches for efficiently integrating this technology into computational environments with limited resources, such as mobile devices, without considerable degradation of speech quality. In particular, the paper mainly addresses database reduction, acoustic inventory compression and minimization of the runtime computational load. Both objective and subjective assessments confirm the effectiveness of these approaches in constructing a general-purpose embedded unit selection TTS system and in reducing the computational requirements while maintaining high speech quality.
IEEE Photonics Technology Letters | 2012
Sotiris Karabetsos; Evangelos Pikasis; Thomas Nikas; Athanase Nassiopoulos; Dimitris Syvridis
In this letter, the discrete Fourier transform-spread discrete multitone (DFT-spread DMT) modulation scheme is used for the first time in laser-based short-range optical transmission over a 100-m, 1-mm step-index plastic optical fiber link, and its superior performance compared with the standard DMT scheme is demonstrated. DFT-spread DMT combines the advantages of both single-carrier and multicarrier transmission with a lower peak-to-average power ratio. Under the same experimental conditions, DFT-spread DMT achieves a lower bit-error rate than standard DMT for transmission rates in the 1-Gb/s range.
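The difference between the two schemes is a single extra transform at the transmitter. The following is a minimal, noise-free sketch, not the authors' implementation. It assumes 16-QAM symbols, localized mapping onto subcarriers 1..M of a Hermitian-symmetric frame of N = 2M + 2 bins (so the output is real-valued, as intensity modulation requires), and naive DFTs instead of an FFT library. All function names are hypothetical. The script prints the mean PAPR of both schemes over random frames, so the lower-PAPR claim can be checked under these simplified assumptions.

```python
import cmath
import math
import random


def dft(x):
    """Naive N-point DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def idft(X):
    """Naive N-point inverse DFT with 1/N scaling."""
    n_pts = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / n_pts) for k in range(n_pts)) / n_pts
            for n in range(n_pts)]


def hermitian_frame(values):
    """Put M values on bins 1..M of an N = 2M + 2 frame and mirror their
    conjugates onto bins N-1..M+2, so the IDFT output is real-valued."""
    m = len(values)
    n_pts = 2 * m + 2
    frame = [0j] * n_pts  # DC (bin 0) and Nyquist (bin M+1) stay empty
    for k, v in enumerate(values, start=1):
        frame[k] = v
        frame[n_pts - k] = v.conjugate()
    return frame


def dmt_tx(qam):
    """Standard DMT: QAM symbols go directly onto the subcarriers."""
    return [s.real for s in idft(hermitian_frame(qam))]


def dft_spread_dmt_tx(qam):
    """DFT-spread DMT: an extra M-point DFT spreads every QAM symbol over
    all M subcarriers before the Hermitian-symmetric IDFT."""
    return [s.real for s in idft(hermitian_frame(dft(qam)))]


def dft_spread_dmt_rx(signal, m):
    """Ideal (noise-free, flat-channel) receiver: N-point DFT, keep bins
    1..M, then undo the spreading with an M-point inverse DFT."""
    bins = dft(signal)[1:m + 1]
    return idft(bins)


def papr_db(signal):
    """Peak-to-average power ratio of a real-valued frame, in dB."""
    power = [v * v for v in signal]
    return 10 * math.log10(max(power) / (sum(power) / len(power)))


if __name__ == "__main__":
    random.seed(1)
    levels = (-3, -1, 1, 3)
    m, trials = 16, 200
    papr_dmt, papr_spread = [], []
    for _ in range(trials):
        qam = [complex(random.choice(levels), random.choice(levels)) for _ in range(m)]
        papr_dmt.append(papr_db(dmt_tx(qam)))
        papr_spread.append(papr_db(dft_spread_dmt_tx(qam)))
    print(f"mean PAPR, DMT:            {sum(papr_dmt) / trials:.2f} dB")
    print(f"mean PAPR, DFT-spread DMT: {sum(papr_spread) / trials:.2f} dB")
```

The size of any PAPR gap in this toy setting depends on the assumed subcarrier mapping and frame size. It should not be read as a reproduction of the letter's experimental results.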
IEEE Signal Processing Letters | 2010
Sotiris Karabetsos; Pirros Tsiakoulis; Aimilios Chalamandaris; Spyros Raptis
This letter introduces one-class classification as a framework for calculating the spectral join cost in unit selection speech synthesis. Instead of quantifying the spectral cost with a single distance measure, it adopts a data-driven approach that exploits the natural similarity of consecutive speech frames in the speech database. Each pair of consecutive frames is jointly represented as a vector of spectral distance measures, and these vectors provide the training data for the one-class classifier. At synthesis runtime, speech units are selected based on the scores derived from the classifier. Experimental results provide evidence of the effectiveness of the proposed method, which clearly outperforms the conventional approaches currently employed.
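The training idea can be sketched in a few lines. Only naturally consecutive frame pairs are used to fit the model, and a candidate join is scored by how unlike those pairs it looks. This is an illustration under stated assumptions, not the letter's method. The three distance measures (Euclidean, L1, cosine) are hypothetical choices, and a diagonal-Gaussian model scored by squared Mahalanobis distance is swapped in as the simplest one-class classifier. The letter's actual classifier and features may differ. All names are hypothetical.

```python
import math


def distance_vector(a, b):
    """Hypothetical set of spectral distance measures between two frames:
    Euclidean, Manhattan (L1) and cosine distance."""
    euclid = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    manhattan = sum(abs(x - y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = 1.0 - sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    return [euclid, manhattan, cosine]


class GaussianOneClass:
    """Minimal one-class model: a diagonal Gaussian fitted only to
    'natural' examples; the score is the squared Mahalanobis distance."""

    def __init__(self, eps=1e-12):
        self.eps = eps  # keeps variances strictly positive
        self.mean = []
        self.var = []

    def fit(self, samples):
        n, dims = len(samples), len(samples[0])
        self.mean = [sum(s[d] for s in samples) / n for d in range(dims)]
        self.var = [sum((s[d] - self.mean[d]) ** 2 for s in samples) / n + self.eps
                    for d in range(dims)]
        return self

    def score(self, sample):
        # Low score means the pair looks like a natural consecutive-frame pair.
        return sum((x - m) ** 2 / v for x, m, v in zip(sample, self.mean, self.var))


def train_join_model(frames):
    """Train only on pairs of frames that are consecutive in the database."""
    pairs = [distance_vector(frames[t], frames[t + 1]) for t in range(len(frames) - 1)]
    return GaussianOneClass().fit(pairs)


def join_cost(model, left_last_frame, right_first_frame):
    """Spectral join cost for concatenating two candidate units."""
    return model.score(distance_vector(left_last_frame, right_first_frame))


if __name__ == "__main__":
    # Toy 'spectral' frames that evolve smoothly over time.
    frames = [[math.sin(0.1 * t + i) for i in range(4)] for t in range(50)]
    model = train_join_model(frames)
    print("natural join:", round(join_cost(model, frames[10], frames[11]), 2))
    print("abrupt join: ", round(join_cost(model, frames[5], frames[30]), 2))
```

In a full synthesizer this score would replace a single-distance join cost inside the usual unit selection search. The sketch above covers only the scoring step.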
Hellenic Conference on Artificial Intelligence | 2014
Pirros Tsiakoulis; Sotiris Karabetsos; Aimilios Chalamandaris; Spyros Raptis
This paper presents an overview of the Text-to-Speech synthesis system developed at the Institute for Language and Speech Processing (ILSP), focusing on the key issues in the design of the system components. The system currently fully supports three languages (Greek, English and Bulgarian) and is designed to be as language- and speaker-independent as possible. Experimental results are also presented, showing that the system produces high-quality synthetic speech in terms of naturalness and intelligibility. The system was recently ranked among the top three systems worldwide in terms of achieved quality for the English language at the international Blizzard Challenge 2013 workshop.
IEEE Transactions on Consumer Electronics | 2008
Spiros Mikroulis; Sotiris Karabetsos; Evagelos Pikasis; Athanase Nassiopoulos
The increasing demand for high data rates in wireless networks makes radio-over-fiber (RoF) an excellent candidate for the physical-layer infrastructure of future broadband communication systems. In this work, the performance of an intensity-modulation direct-detection (IM-DD) RoF system using orthogonal frequency division multiplexing (OFDM) is thoroughly investigated. The investigation emphasizes transmitter impairments and the extent to which these limitations are compensated by the typical demodulation techniques employed in OFDM technology. It is shown that acceptable performance is achieved up to a 5.8 GHz carrier frequency by properly adjusting the RF amplitude level of the laser current.
International Conference on Signal and Image Processing Applications | 2009
Aimilios Chalamandaris; Pirros Tsiakoulis; Sotiris Karabetsos; Spyros Raptis
In a Text-to-Speech system based on time-domain techniques that employ pitch-synchronous manipulation of the speech waveforms, one of the most important factors affecting output quality is how the analysis points of the speech signal, i.e. the analysis pitchmarks, are estimated. In this paper we present our methodology for calculating the pitchmarks of a speech waveform. It is a pitchmark detection algorithm that, after thorough experimentation and comparison with other algorithms, proves to perform better with our TD-PSOLA-based (Time-Domain Pitch-Synchronous Overlap-Add) Text-to-Speech synthesizer.
USAB '09 Proceedings of the 5th Symposium of the Workgroup Human-Computer Interaction and Usability Engineering of the Austrian Computer Society on HCI and Usability for e-Inclusion | 2009
Aimilios Chalamandaris; Spyros Raptis; Pirros Tsiakoulis; Sotiris Karabetsos
Blind people and, more generally, print-impaired people are often restricted to using their own computers, most often enhanced with expensive screen reading programs, in order to access the web, and only in the form that each screen reading program allows. In this paper we present SpellCast Navi, a tool intended for people with visual impairments that attempts to combine the advantages of both customized and generic web enhancement tools. It consists of a generically designed engine and a set of case-specific filters. It can run on a typical web browser and computer without the need to install any additional application locally. It acquires and parses the content of web pages and converts bilingual text into synthetic speech using a high-quality speech synthesizer. It also supports a set of common functionalities such as navigation through hotkeys, audible navigation lists and more. Because it uses a post-hoc approach based on a-priori information about each website's layout, audible presentation of and navigation through the website are more intuitive and more efficient than with a typical screen reading application. SpellCast Navi poses no requirements on web pages and introduces no overhead to the design and development of a website, as it functions as a hosted proxy service.
Panhellenic Conference on Informatics | 2016
Pirros Tsiakoulis; Spyros Raptis; Sotiris Karabetsos; Aimilios Chalamandaris
This work explores affective word ratings as an auxiliary target cost for unit-selection-based concatenative speech synthesis. The method requires neither task-specific crafted corpora nor additional annotations, making it ideal for found data. Following the general philosophy of our text-to-speech system, the approach does not enforce any explicit prosodic model. Instead, affect information is implicitly modeled via its contribution to the unit selection cost function. The auxiliary affective feature vector comprises continuous ratings in three dimensions (valence, arousal and dominance), extracted at the word level via state-of-the-art sentiment analysis techniques. In this case study, the speech data consist of several professionally produced children's audiobooks totaling about 5 hours of speech. The affective dimensions are shown to correlate well with acoustic/prosodic features extracted from the speech data, highlighting their utility for affective speech synthesis. This is further confirmed via a preference listening test between the baseline and the affective voice.
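The phrase "implicitly modeled via its contribution to the unit-selection cost function" can be made concrete with a small sketch. The word-level (valence, arousal, dominance) ratings enter only as an extra weighted term in the target cost, and no prosody targets are predicted. This is a sketch under stated assumptions. The weighted Euclidean distance, the default weights, the `Unit` fields and all function names are hypothetical. For brevity it picks a unit for a single slot and ignores join costs and the usual search over the whole utterance.

```python
import math
from dataclasses import dataclass


@dataclass
class Unit:
    """A candidate unit from the speech database (hypothetical fields)."""
    unit_id: str
    base_target_cost: float  # cost from the usual linguistic/prosodic features
    vad: tuple               # (valence, arousal, dominance) of its source word


def affective_cost(target_vad, unit_vad, weights=(1.0, 1.0, 1.0)):
    """Weighted Euclidean distance in valence-arousal-dominance space."""
    return math.sqrt(sum(w * (t - u) ** 2 for w, t, u in zip(weights, target_vad, unit_vad)))


def total_target_cost(unit, target_vad, affect_weight=0.5):
    """Auxiliary affective term added to the existing target cost; it only
    biases which unit is chosen, without any explicit prosodic model."""
    return unit.base_target_cost + affect_weight * affective_cost(target_vad, unit.vad)


def pick_unit(candidates, target_vad, affect_weight=0.5):
    """Choose the candidate with the lowest total target cost for one slot."""
    return min(candidates, key=lambda u: total_target_cost(u, target_vad, affect_weight))


if __name__ == "__main__":
    calm = Unit("calm", 1.0, (0.5, 0.2, 0.5))
    excited = Unit("excited", 1.1, (0.8, 0.9, 0.6))
    target = (0.8, 0.9, 0.6)  # ratings of the word being synthesized
    print(pick_unit([calm, excited], target, affect_weight=0.0).unit_id)
    print(pick_unit([calm, excited], target, affect_weight=0.5).unit_id)
```

With the affective weight set to zero, the baseline cost alone picks the "calm" unit. With a positive weight, the unit whose source word has a matching affect rating wins.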
International Conference on Human-Computer Interaction | 2014
Spyros Raptis; Aimilios Chalamandaris; Pirros Tsiakoulis; Sotiris Karabetsos
A system is presented that offers a set of complementary services based on text-to-speech technology. The services and the underlying system that supports them are described. These services include: (a) a service for automatic document-to-speech conversion via e-mail, (b) an open library of audio books, and (c) a dynamic audio news service. The system seeks to maximize the availability and the social impact of text-to-speech technology, making its benefits widely available to the public through open services that address important daily needs of persons with visual impairments and reading difficulties.
IEEE Photonics Technology Letters | 2012
Evangelos Pikasis; Sotiris Karabetsos; Nikos Raptis; Dimitris Syvridis
In this letter, the code division multiple access discrete multitone (CDMA-DMT) modulation scheme is experimentally investigated for transmission over an intensity-modulation direct-detection link of 100 m of 1-mm polymethyl methacrylate step-index plastic optical fiber. CDMA-DMT is a multicarrier modulation scheme that combines the merits of both CDMA and DMT. In it, CDMA is used to spread the data symbols in the frequency domain with orthogonal spreading codes. It is experimentally shown that, over the same experimental configuration, CDMA-DMT achieves a lower bit-error rate than conventional DMT for transmission rates in the range of 1 Gbps.
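The spreading step described above can be illustrated with a short sketch. A group of L data symbols is multiplied by L mutually orthogonal codes, and the summed chips occupy L subcarriers, so every symbol is spread over every subcarrier. This is an illustration under assumptions, not the letter's implementation. Walsh-Hadamard codes of length 8 are an assumed choice of orthogonal codes, the channel is ideal, and the subsequent DMT stage (Hermitian-symmetric IFFT) is omitted. All function names are hypothetical.

```python
def walsh_hadamard(order):
    """Sylvester construction of an order x order Hadamard matrix; its rows are
    mutually orthogonal +/-1 spreading codes (order must be a power of two)."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def spread(symbols, codes):
    """Frequency-domain spreading: chip k, carried on subcarrier k, is
    sum_i symbols[i] * codes[i][k], so every symbol occupies every subcarrier."""
    length = len(codes[0])
    return [sum(s * code[k] for s, code in zip(symbols, codes)) for k in range(length)]


def despread(chips, codes):
    """Correlate with each code; orthogonality cancels the other symbols."""
    length = len(chips)
    return [sum(c * ck for c, ck in zip(chips, code)) / length for code in codes]


if __name__ == "__main__":
    codes = walsh_hadamard(8)
    qam = [1 + 1j, -1 + 3j, 3 - 1j, -3 - 3j, 1 - 3j, -1 - 1j, 3 + 3j, -3 + 1j]
    chips = spread(qam, codes)  # these 8 values would fill 8 DMT subcarriers
    print(despread(chips, codes))
```

Because each symbol is spread across all subcarriers, the effect of a weak subcarrier is shared among the symbols rather than concentrated on one. This is a common motivation for frequency-domain spreading, though the letter's own explanation of the gain may differ.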