Francesc Alías
La Salle University
Publications
Featured research published by Francesc Alías.
IEEE Transactions on Multimedia | 2012
Xavier Valero; Francesc Alías
In the context of non-speech audio recognition and classification for multimedia applications, it becomes essential to have a set of features able to accurately represent and discriminate among audio signals. Mel frequency cepstral coefficients (MFCC) have become a de facto standard for audio parameterization. Taking as a basis the MFCC computation scheme, the Gammatone cepstral coefficients (GTCCs) are a biologically inspired modification employing Gammatone filters with equivalent rectangular bandwidth bands. In this letter, the GTCCs, which have been previously employed in the field of speech research, are adapted for non-speech audio classification purposes. Their performance is evaluated on two audio corpora of 4 h each (general sounds and audio scenes), following two cross-validation schemes and four machine learning methods. According to the results, classification accuracies are significantly higher when employing GTCC rather than other state-of-the-art audio features. As a detailed analysis shows, with a similar computational cost, the GTCC are more effective than MFCC in representing the spectral characteristics of non-speech audio signals, especially at low frequencies.
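The GTCC scheme summarized above keeps the MFCC pipeline (filterbank energies, logarithm, DCT) but swaps the mel triangles for Gammatone filters spaced on an ERB scale. The sketch below illustrates that idea in plain Python. It is not the authors' implementation. The Glasberg and Moore ERB formulas, the 4th-order filter with b = 1.019, the unit gain at each centre frequency, the truncated impulse response, and the defaults for `n_filters`, `n_coeffs`, `f_low` and `ir_len` are all common choices assumed here. Every function name is hypothetical.

```python
import math

def hz_to_erb_rate(f_hz):
    """Glasberg & Moore ERB-rate (ERB-number) scale (assumed formula)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth (Hz) of the auditory filter centred at f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def center_frequencies(n_filters, f_low, f_high):
    """Centre frequencies equally spaced on the ERB-rate scale."""
    lo, hi = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (hi - lo) / (n_filters - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_filters)]

def gammatone_ir(fc, fs, n_samples, order=4, b=1.019):
    """Truncated Gammatone impulse response t^(n-1) exp(-2 pi b ERB(fc) t) cos(2 pi fc t)."""
    bw = b * erb_bandwidth(fc)
    ir = []
    for k in range(n_samples):
        t = k / fs
        ir.append(t ** (order - 1) * math.exp(-2 * math.pi * bw * t)
                  * math.cos(2 * math.pi * fc * t))
    # Scale so the (truncated) filter has unit gain at its centre frequency.
    re_part = sum(v * math.cos(2 * math.pi * fc * k / fs) for k, v in enumerate(ir))
    im_part = sum(v * math.sin(2 * math.pi * fc * k / fs) for k, v in enumerate(ir))
    gain = math.hypot(re_part, im_part) or 1.0
    return [v / gain for v in ir]

def filter_energy(frame, ir):
    """Energy of the frame filtered by ir (direct convolution, output cut to frame length)."""
    energy = 0.0
    for n in range(len(frame)):
        acc = 0.0
        for k in range(min(n + 1, len(ir))):
            acc += ir[k] * frame[n - k]
        energy += acc * acc
    return energy

def dct_cepstrum(band_energies, n_coeffs):
    """DCT-II of the log band energies, as in the MFCC scheme."""
    log_e = [math.log(max(e, 1e-12)) for e in band_energies]
    n = len(log_e)
    return [sum(v * math.cos(math.pi * k * (j + 0.5) / n) for j, v in enumerate(log_e))
            for k in range(n_coeffs)]

def gtcc(frame, fs, n_filters=32, n_coeffs=13, f_low=50.0, f_high=None, ir_len=256):
    """GTCC-style coefficients for one (already windowed) frame; hypothetical helper."""
    f_high = fs / 2 if f_high is None else f_high
    fcs = center_frequencies(n_filters, f_low, f_high)
    energies = [filter_energy(frame, gammatone_ir(fc, fs, ir_len)) for fc in fcs]
    return dct_cepstrum(energies, n_coeffs)
```

Direct convolution in pure Python is slow. A practical implementation would use FFT-based filtering or an IIR Gammatone approximation, but the feature definition stays the same.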
IEEE Transactions on Audio, Speech, and Language Processing | 2013
T. Trilla; Francesc Alías
Current research to improve state-of-the-art Text-To-Speech (TTS) synthesis studies both the processing of input text and the ability to render natural expressive speech. Focusing on the former as a front-end task in the production of synthetic speech, this article investigates the proper adaptation of a Sentiment Analysis procedure (positive/neutral/negative) that can then be used as an input feature for expressive speech synthesis. To this end, we evaluate different combinations of textual features and classifiers to determine the most appropriate adaptation procedure. The effectiveness of this scheme for Sentiment Analysis is evaluated using the SemEval-2007 dataset and a Twitter corpus, chosen for their affective nature and their sentence-level granularity, which is appropriate for an expressive TTS scenario. The experiments conducted validate the proposed procedure with respect to the state of the art for Sentiment Analysis.
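To show what one "combination of textual features and classifier" for three-way sentiment looks like, here is a minimal sketch that pairs bag-of-words features with a multinomial Naive Bayes classifier using Laplace smoothing. The abstract does not say this is the combination the authors selected. The class, the tokenizer and the toy training sentences are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case word tokens (hypothetical, deliberately simple)."""
    return re.findall(r"[a-z']+", text.lower())

class MultinomialNB:
    """Bag-of-words multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            # log prior + sum of smoothed log likelihoods of known tokens
            score = math.log(self.doc_counts[c] / n_docs)
            denom = self.total_words[c] + self.alpha * v
            for tok in tokenize(text):
                if tok in self.vocab:
                    score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best
```

Usage: fit on labelled sentences, then call `predict` on each input sentence. The predicted polarity can then serve as an input feature for the expressive TTS back end.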
Non-Linear Speech Processing | 2009
Ignasi Iriondo; Santiago Planet; Joan-Claudi Socoró; Elisa Martínez; Francesc Alías; Carlos Monzo
This paper presents an automatic system able to enhance expressiveness in speech corpora recorded from acted or stimulated speech. The system is trained with the results of a subjective evaluation carried out on a reduced set of the original corpus. Once the system has been trained, it is able to check the complete corpus and perform an automatic pruning of the unclear utterances, i.e. with expressive styles which are different from the intended corpus. The content which most closely matches the subjective classification remains in the resulting corpus. An expressive speech corpus in Spanish, designed and recorded for speech synthesis purposes, has been used to test the presented proposal. The automatic refinement has been applied to the whole corpus and the result has been validated with a second subjective test.
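The pruning step described above (train on the subjectively evaluated subset, then discard utterances whose automatic classification disagrees with the intended style) can be sketched with a nearest-centroid classifier. The abstract does not name the classifier, so nearest-centroid is an assumption here. So are the feature vectors and the function names. Real prosodic features would also need normalisation before distances are meaningful.

```python
import math

def centroids(features, labels):
    """Mean feature vector per expressive style (trained on the subjectively labelled subset)."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def nearest_style(x, cents):
    """Style whose centroid is closest (Euclidean) to feature vector x."""
    return min(cents, key=lambda y: math.dist(x, cents[y]))

def prune_corpus(corpus, cents):
    """corpus: (utt_id, features, intended_style) triples; returns (kept_ids, discarded_ids)."""
    kept, discarded = [], []
    for utt_id, x, intended in corpus:
        (kept if nearest_style(x, cents) == intended else discarded).append(utt_id)
    return kept, discarded
```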
IEEE Transactions on Audio, Speech, and Language Processing | 2008
Francesc Alías; Xavier Sevillano; Joan Claudi Socoró; Xavier Gonzalvo
This paper is a contribution to the recent advancements in the development of high-quality next generation text-to-speech (TTS) synthesis systems. Two of the hottest research topics in this area are oriented towards the improvement of speech expressiveness and flexibility of synthesis. In this context, this paper presents a new TTS strategy called multidomain TTS (MD-TTS) for synthesizing among different domains. Although the multidomain philosophy has been widely applied in spoken language systems, few research efforts have been conducted to extend it to the TTS field. To do so, several proposals are described in this paper. First, a text classifier (TC) is included in the classic TTS architecture in order to automatically conduct the selection of the most appropriate domain for synthesizing the input text. In contrast to classic topic text classification tasks, the MD-TTS TC should not only consider the contents of text but also its structure. To this end, this paper introduces a new text modeling scheme based on an associative relational network, which represents texts as a directional weighted word-based graph. The conducted experiments validate the proposal in terms of both objective (TC efficiency) and subjective (perceived synthetic speech quality) evaluation criteria.
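The following is a simplified stand-in for representing a text as a directional weighted word graph so that both content and structure matter. Node weights are word frequencies, and directed edge weights count consecutive word pairs. Similarity mixes the cosine similarity of nodes with that of edges. The abstract does not give the actual associative relational network weighting, so this bigram-based weighting, the `edge_weight` mix and all names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def build_graph(text):
    """(node_weights, edge_weights): word counts and directed word-to-next-word counts."""
    tokens = tokenize(text)
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))

def merge(graphs):
    """Sum several text graphs into one domain graph."""
    nodes, edges = Counter(), Counter()
    for n, e in graphs:
        nodes.update(n)
        edges.update(e)
    return nodes, edges

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def graph_similarity(g1, g2, edge_weight=0.5):
    """Mix content (node) and structure (edge) similarity."""
    return (1 - edge_weight) * cosine(g1[0], g2[0]) + edge_weight * cosine(g1[1], g2[1])

def classify_domain(text, domain_texts):
    """Pick the domain whose merged graph is most similar to the input text's graph."""
    g = build_graph(text)
    scores = {d: graph_similarity(g, merge(build_graph(t) for t in ts))
              for d, ts in domain_texts.items()}
    return max(scores, key=scores.get)
```

The edge term is what lets two texts with the same words in a different order receive different scores, which is the point of modelling structure as well as contents.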
Genetic and Evolutionary Computation Conference | 2006
Xavier Llorà; Kumara Sastry; Francesc Alías; David E. Goldberg; Michael Welge
This paper introduces visual-analytic techniques to aggregate, summarize, and visualize the information generated during interactive evolutionary processes. Special visualizations were prepared of the user-provided partial ordering of solutions, the induced synthetic fitness surrogates, and the model of user preferences. The proposed visual-analytic techniques point out potential pitfalls, strengths, and possible improvements in a non-trivial case study where the hierarchical tournament selection scheme of an active interactive genetic algorithm is replaced by an incremental selection scheme. Visual analytics provided an intuitive reasoning environment that unveiled important properties that greatly affect the performance of active interactive genetic algorithms and that could not otherwise have been easily revealed.
Journal of the Acoustical Society of America | 2013
Marc Arnela; Oriol Guasch; Francesc Alías
One of the key effects to model in voice production is the acoustic radiation of sound waves emanating from the mouth. Three-dimensional numerical simulations naturally account for it, and can include all geometrical details of the head, by extending the computational domain beyond the vocal tract. Despite this advantage, the head geometry is often approximated for simplicity, and impedance load models are still used to reduce the computational cost. In this work, the impact of some of these simplifications on radiation effects is examined for vowel production in the frequency range 0-10 kHz by comparison with radiation from a realistic head. As a result, recommendations are given on their validity depending on whether high-frequency energy (above 5 kHz) should be taken into account or not.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2006
Xavier Sevillano; Germán Cobo; Francesc Alías; Joan Claudi Socoró
The performance of document clustering systems depends on employing optimal text representations, which are not only difficult to determine beforehand, but also may vary from one clustering problem to another. As a first step towards building robust document clusterers, a strategy based on feature diversity and cluster ensembles is presented in this work. Experiments conducted on a binary clustering problem show that our method is robust to near-optimal model order selection and able to detect constructive interactions between different document representations in the test bed.
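One common way to build a cluster ensemble from partitions obtained with diverse document representations is evidence accumulation. Count how often each pair of documents shares a cluster (the co-association matrix), then link the pairs that co-occur in more than a threshold fraction of partitions. The abstract does not state which consensus function was used, so this technique, the 0.5 threshold and the names below are assumptions for illustration. The approach is also insensitive to label permutations across partitions, which matters when combining independent clusterers.

```python
from itertools import combinations

def co_association(partitions):
    """Fraction of partitions in which each pair of objects shares a cluster."""
    n, m = len(partitions[0]), len(partitions)
    co = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0 / m
    return co

def consensus_partition(partitions, threshold=0.5):
    """Connected components of the graph linking pairs with co-association > threshold."""
    co = co_association(partitions)
    n = len(co)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```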
Non-Linear Speech Processing | 2007
Ignasi Iriondo; Santiago Planet; Joan-Claudi Socoró; Francesc Alías
This paper presents the validation of the expressiveness of an acted oral corpus produced to be used in speech synthesis. Firstly, an objective validation has been conducted by means of automatic emotion identification techniques using statistical features extracted from the prosodic parameters of speech. Secondly, a listening test has been performed with a subset of utterances. The relationship between both objective and subjective evaluations is analyzed and the obtained conclusions can be useful to improve the following steps related to expressive speech synthesis.
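As a rough illustration of "statistical features extracted from the prosodic parameters of speech", the sketch below summarises an F0 contour and an energy contour by their mean, standard deviation, minimum, maximum and range. It concatenates them into one vector that an emotion classifier could consume. The exact feature set in the paper is not given in the abstract. The choice of statistics, the convention that F0 = 0 marks unvoiced frames, and the function names are assumptions.

```python
import statistics

def contour_stats(values):
    """Mean, population std, min, max and range of a prosodic contour."""
    lo, hi = min(values), max(values)
    return [statistics.fmean(values), statistics.pstdev(values), lo, hi, hi - lo]

def utterance_features(f0, energy):
    """Concatenate F0 (voiced frames only) and energy statistics into one vector."""
    voiced_f0 = [v for v in f0 if v > 0]  # assumes 0 marks unvoiced frames
    return contour_stats(voiced_f0) + contour_stats(energy)
```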
International Conference on Acoustics, Speech, and Signal Processing | 2006
Francesc Alías; Xavier Llorà; Lluís Formiga; Kumara Sastry; David E. Goldberg
The quality of corpus-based text-to-speech systems depends on the accuracy of the unit selection process, which in turn relies on the cost function definition. This function should capture the user's perceptual preferences when selecting synthesis units, which is a very difficult task. This paper continues our previous work on fusing human judgements with the cost function by means of interactive weight tuning. The application of active interactive genetic algorithms mitigates user fatigue by improving user consistency. As a result, the obtained weights generate more natural synthetic speech when compared to previous objective and subjective proposals.
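To make concrete what the tuned weights act on, here is a generic sketch of weighted unit selection. Each candidate unit gets a target cost (a weighted sum of target subcosts), each adjacent pair of units gets a concatenation cost (a weighted sum of join subcosts), and a Viterbi search returns the cheapest sequence. The vectors `w_t` and `w_c` are the kind of weights an interactive tuning procedure would adjust. The subcost functions, the toy pitch-only units and the function names are hypothetical and not taken from the paper.

```python
def select_units(candidates, target_cost, concat_cost, w_t, w_c):
    """Viterbi search minimising weighted target + concatenation costs.

    candidates[i]     -> list of candidate units for target position i
    target_cost(i, u) -> list of target subcosts for unit u at position i
    concat_cost(u, v) -> list of concatenation subcosts for joining u then v
    """
    def weighted(ws, cs):
        return sum(w * c for w, c in zip(ws, cs))

    # best[i][j] = (cheapest cost of a path ending in candidates[i][j], backpointer)
    best = [{j: (weighted(w_t, target_cost(0, u)), None)
             for j, u in enumerate(candidates[0])}]
    for i in range(1, len(candidates)):
        layer = {}
        for j, u in enumerate(candidates[i]):
            prev_j, prev_cost = min(
                ((k, best[i - 1][k][0]
                  + weighted(w_c, concat_cost(candidates[i - 1][k], u)))
                 for k in best[i - 1]),
                key=lambda kv: kv[1])
            layer[j] = (prev_cost + weighted(w_t, target_cost(i, u)), prev_j)
        best.append(layer)

    # Backtrack from the cheapest final unit.
    j = min(best[-1], key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1], total
```

Usage: with units reduced to a single pitch value, targets `[100, 120]`, candidates `[[95, 130], [118, 200]]`, absolute pitch differences as the only subcosts, `w_t = [1.0]` and `w_c = [0.5]`, the search picks `[95, 118]` at a total cost of 18.5. Changing the weights changes which sequence wins, which is why their tuning matters perceptually.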
Fuzzy Sets and Systems | 2012
Xavier Sevillano; Francesc Alías; Joan Claudi Socoró
Consensus clustering, i.e. the task of combining the outcomes of several clustering systems into a single partition, has lately attracted the attention of researchers in the unsupervised classification field, as it allows the creation of clustering committees that can be applied with multiple interesting purposes, such as knowledge reuse or distributed clustering. However, little attention has been paid to the development of algorithms, known as consensus functions, especially designed for consolidating the outcomes of multiple fuzzy (or soft) clustering systems into a single fuzzy partition, despite the fact that fuzzy clustering is far more informative than its crisp counterpart, as it provides information regarding the degree of association between objects and clusters that can be helpful for deriving richer descriptive data models. For this reason, this paper presents a set of fuzzy consensus functions capable of creating soft consensus partitions by fusing a collection of fuzzy clusterings. Our proposals base clustering combination on a cluster disambiguation process followed by the application of positional and confidence voting techniques. The modular design of these algorithms makes it possible to sequence their constituting steps in different manners, which makes it possible to derive versions of the proposed consensus functions optimized from a computational standpoint. The proposed consensus functions have been evaluated in terms of the quality of the consensus partitions they deliver and in terms of their running time on multiple benchmark data sets. A comparison against several representative state-of-the-art consensus functions reveals that our proposals constitute an appealing alternative for conducting fuzzy consensus clustering, as they are capable of yielding high quality consensus partitions at a low computational cost.
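The two stages named above can be sketched for small numbers of clusters. Cluster disambiguation relabels each fuzzy clustering so that its clusters line up with a reference. Voting then fuses the aligned membership matrices. In this sketch, confidence voting averages membership degrees, and positional voting gives each cluster Borda-style points from its rank in each object's membership vector. The alignment criterion (sum of membership products), the brute-force search over k! permutations, the Borda scoring and all names are assumptions for illustration, not the published consensus functions. For larger k, the Hungarian algorithm would replace the permutation search.

```python
from itertools import permutations

def align(reference, clustering):
    """Permute clustering's columns to maximise agreement with reference (brute force)."""
    k = len(reference[0])

    def agreement(perm):
        return sum(ref_row[c] * row[perm[c]]
                   for ref_row, row in zip(reference, clustering) for c in range(k))

    best = max(permutations(range(k)), key=agreement)
    return [[row[best[c]] for c in range(k)] for row in clustering]

def disambiguate(clusterings):
    """Align every membership matrix (n objects x k clusters) to the first one."""
    ref = clusterings[0]
    return [ref] + [align(ref, u) for u in clusterings[1:]]

def confidence_voting(clusterings):
    """Soft consensus: average aligned membership degrees per object and cluster."""
    aligned = disambiguate(clusterings)
    m, n, k = len(aligned), len(aligned[0]), len(aligned[0][0])
    return [[sum(u[i][c] for u in aligned) / m for c in range(k)] for i in range(n)]

def positional_voting(clusterings):
    """Soft consensus from rank points (top cluster gets k-1), normalised per object; k >= 2."""
    aligned = disambiguate(clusterings)
    m, n, k = len(aligned), len(aligned[0]), len(aligned[0][0])
    total = m * k * (k - 1) / 2
    result = []
    for i in range(n):
        points = [0.0] * k
        for u in aligned:
            ranked = sorted(range(k), key=lambda c: u[i][c], reverse=True)
            for rank, c in enumerate(ranked):
                points[c] += k - 1 - rank
        result.append([p / total for p in points])
    return result
```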