Sérgio Paulo | Researchain

Publication

Featured research published by Sérgio Paulo.

Processing of the Portuguese Language | 2008

DIXI - A Generic Text-to-Speech System for European Portuguese

Sérgio Paulo; Luís C. Oliveira; Carlos Mendes; Luís Figueira; Renato Cassaca; Céu Viana; Helena Moniz

This paper describes a new generic text-to-speech synthesis system, developed in the scope of the Tecnovoz Project. Although it was primarily targeted at speech synthesis in European Portuguese, its modular architecture and flexible components allow its use for different languages. We also provide a survey of the development of the language resources needed by the TTS system.

Proceedings of 2002 IEEE Workshop on Speech Synthesis | 2002

Multilevel annotation of speech signals using weighted finite state transducers

Sérgio Paulo; Luís C. Oliveira

The purpose of this work was to develop a set of tools that automate the multilevel annotation of speech signals while preserving the alignments between the different levels of each utterance's linguistic representation. Our goal is to build speech databases with multilevel relational annotations, using speech from non-professional speakers, that can be used to develop concatenative text-to-speech synthesizers or to train and test statistical models. The method is based on the linguistic analysis, performed by a TTS system, of the transcription of the spoken material. The predicted phone sequence is then compared with the sequence produced by the speaker. The problem of aligning these two sequences is solved in a language-independent way using Weighted Finite State Transducers. After the alignment, a re-synchronization procedure is applied to the remaining levels to bring them into agreement with the spoken utterance.
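
The comparison of the predicted and spoken phone sequences can be illustrated with a minimal sketch. It does not use Weighted Finite State Transducers; instead it swaps in a plain weighted edit-distance alignment computed by dynamic programming, which finds a lowest-cost alignment of the same kind under simple assumed costs. The function name `align_phones`, the cost parameters, and the example phone symbols are all hypothetical and do not come from the paper.

```python
def align_phones(predicted, spoken, sub_cost=1.0, indel_cost=1.0):
    """Align two phone sequences with a weighted edit distance.

    Returns (total_cost, pairs), where each pair is
    (predicted_phone_or_None, spoken_phone_or_None).
    The costs are illustrative assumptions, not values from the paper.
    """
    n, m = len(predicted), len(spoken)

    def step(i, j):
        # Cost of pairing predicted[i-1] with spoken[j-1].
        return 0.0 if predicted[i - 1] == spoken[j - 1] else sub_cost

    # cost[i][j] = cheapest alignment of predicted[:i] with spoken[:j].
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + indel_cost
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + step(i, j),  # match or substitution
                cost[i - 1][j] + indel_cost,      # predicted phone not spoken
                cost[i][j - 1] + indel_cost,      # spoken phone not predicted
            )

    # Trace back from the end to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + step(i, j):
            pairs.append((predicted[i - 1], spoken[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or cost[i][j] == cost[i - 1][j] + indel_cost):
            pairs.append((predicted[i - 1], None))
            i -= 1
        else:
            pairs.append((None, spoken[j - 1]))
            j -= 1
    pairs.reverse()
    return cost[n][m], pairs


# Hypothetical example: the speaker drops the middle phone.
total, pairs = align_phones(["o", "l", "a"], ["o", "a"])
```

In this sketch, a pair with `None` on the spoken side marks a predicted phone that the speaker did not produce, which is the kind of mismatch a later re-synchronization step would need to resolve.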

International Conference Natural Language Processing | 2004

Automatic Phonetic Alignment and Its Confidence Measures

Sérgio Paulo; Luís C. Oliveira

In this paper, we propose the use of an HMM-based phonetic aligner together with a speech-synthesis-based one to improve the accuracy of the global alignment system. We also present a measure, independent of phone duration, for evaluating the accuracy of automatic annotation tools. In the second part of the paper, we propose and evaluate some new confidence measures for phonetic annotation.

Multimedia Tools and Applications | 2016

Automating live and batch subtitling of multimedia contents for several European languages

Aitor Álvarez; Carlos Mendes; Matteo Raffaelli; Tiago Luís; Sérgio Paulo; Nicola Piccinini; Haritz Arzelus; João Paulo Neto; Carlo Aliprandi; Arantza del Pozo

The demand for subtitling of multimedia content has grown quickly in recent years, especially after the adoption of the new European audiovisual legislation, which requires multimedia content to be made accessible to all. As a result, TV channels have been moved to produce subtitles for a high percentage of their broadcast content. Consequently, the market has been seeking subtitling alternatives that are more productive than the traditional manual process. The large effort dedicated by the research community to the development of Large Vocabulary Continuous Speech Recognition (LVCSR) over the last decade has resulted in significant improvements in multimedia transcription, making it the most powerful technology for automatic intralingual subtitling. This article contains a detailed description of the live and batch automatic subtitling applications developed by the SAVAS consortium for several European languages, based on proprietary LVCSR technology specifically tailored to subtitling needs, together with the results of their quality evaluation.

Processing of the Portuguese Language | 2003

Improving the accuracy of the speech synthesis based phonetic alignment using multiple acoustic features

Sérgio Paulo; Luís C. Oliveira

The phonetic alignment of spoken utterances for speech research is commonly performed by HMM-based speech recognizers in forced-alignment mode, but training the phonetic segment models requires considerable amounts of annotated data. When no such material is available, a possible solution is to synthesize the same phonetic sequence and align the resulting speech signal with the spoken utterances. However, without a careful choice of the acoustic features used in this procedure, it can perform poorly when applied to continuous speech utterances. In this paper, we propose a new method to select the best features to use in the alignment procedure for each pair of phonetic segment classes. The results show that this selection considerably reduces the segment boundary location errors.
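
The idea of aligning a synthesized signal with a spoken one can be sketched as follows. The abstract does not name the alignment algorithm, so this example assumes dynamic time warping (DTW) over per-frame feature vectors with a Euclidean distance. The functions `dtw_path` and `map_boundary` and the toy one-dimensional features are hypothetical, and the per-class feature selection proposed in the paper is not modelled.

```python
import math


def dtw_path(synth, spoken):
    """Return one optimal DTW path as a list of (synth_frame, spoken_frame).

    Both inputs are non-empty sequences of equal-length feature tuples.
    DTW and the Euclidean distance are assumptions made for illustration.
    """
    n, m = len(synth), len(spoken)
    # acc[i][j] = cheapest accumulated distance aligning the first i
    # synthetic frames with the first j spoken frames.
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(synth[i - 1], spoken[j - 1])
            acc[i][j] = d + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])

    # Trace back from the last frame pair to the first one.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def map_boundary(path, synth_frame):
    """Map a known boundary frame in the synthetic signal to the first
    spoken frame aligned with it."""
    return min(j for i, j in path if i == synth_frame)


# Toy features: the synthetic phone boundary is at frame 2, while the
# speaker holds the first phone one frame longer.
synth = [(0.0,), (0.0,), (1.0,), (1.0,)]
spoken = [(0.0,), (0.0,), (0.0,), (1.0,)]
path = dtw_path(synth, spoken)
boundary = map_boundary(path, 2)
```

Because the segment boundaries of the synthetic signal are known from the synthesizer, mapping them through the warping path yields boundary estimates for the spoken utterance. The quality of those estimates depends on the features fed to the distance function, which is the choice the paper addresses.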

Conference of the International Speech Communication Association | 2003