Publication


Featured research published by Fernando Batista.


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Bilingual Experiments on Automatic Recovery of Capitalization and Punctuation of Automatic Speech Transcripts

Fernando Batista; Helena Moniz; Isabel Trancoso; Nuno J. Mamede

This paper focuses on the tasks of recovering capitalization and punctuation marks from texts without that information, such as spoken transcripts produced by automatic speech recognition systems. These two practical rich transcription tasks were performed using the same discriminative approach, based on maximum entropy, suitable for on-the-fly usage. Reported experiments were conducted over both Portuguese and English broadcast news data. Both force-aligned and automatic transcripts were used, making it possible to measure the impact of speech recognition errors. Capitalized words and named entities are intrinsically related, and are influenced by time variation effects. For that reason, the so-called language dynamics have been addressed for the capitalization task. Language adaptation results indicate, for both languages, that the capitalization performance is affected by the temporal distance between the training and testing data. Regarding the punctuation task, this paper covers the three most frequent punctuation marks: full stop, comma, and question mark. Different methods were explored for improving the baseline results for full stop and comma. The first uses punctuation information extracted from large written corpora. The second applies different levels of linguistic structure, including lexical, prosodic, and speaker-related features. Comma detection improved significantly with the first method, indicating that it depends more on lexical features. The second method provided even better results, for both languages and both punctuation marks, with the best results being achieved mainly for full stop. As for question marks, there is a small gain, but differences are not very significant, due to the relatively small number of question marks in the corpora.


Speech Communication | 2008

Recovering capitalization and punctuation marks for automatic speech recognition: Case study for Portuguese broadcast news

Fernando Batista; Diamantino Caseiro; Nuno J. Mamede; Isabel Trancoso

This paper presents a study on recovering punctuation marks and capitalization information from European Portuguese broadcast news speech transcriptions. Different approaches were tested for capitalization, both generative and discriminative, using finite state transducers automatically built from language models, and maximum entropy models. Several resources were used, including lexica, written newspaper corpora, and speech transcriptions. Finite state transducers produced the best results for written newspaper corpora, but the maximum entropy approach also proved to be a good choice, suitable for the capitalization of speech transcriptions and allowing straightforward on-the-fly capitalization. Evaluation results are presented both for written newspaper corpora and for broadcast news speech transcriptions. The frequency of each punctuation mark in broadcast news (BN) speech transcriptions was analyzed for three different languages: English, Spanish, and Portuguese. The punctuation task was performed using a maximum entropy modeling approach, which combines different types of information, both lexical and acoustic. The contribution of each feature was analyzed individually, and separate results for each focus condition are given, making it possible to analyze the performance differences between planned and spontaneous speech. All results were evaluated on speech transcriptions of a Portuguese broadcast news corpus. The benefits of enriching speech recognition with punctuation and capitalization are shown in an example, illustrating the effects of the described experiments on spoken texts.


Artificial Intelligence: Methodology, Systems, and Applications | 2006

Cooking an ontology

Ricardo Ribeiro; Fernando Batista; Joana Paulo Pardal; Nuno J. Mamede; H. Sofia Pinto

An effective solution to the problem of extending a dialogue system to new knowledge domains requires a clear separation between the knowledge and the system. Because ontologies are used to conceptualize information, they can serve as a means to improve the separation between the dialogue system and the domain information. This paper presents the development of an ontology for the cooking domain, to be integrated into a dialogue system. The ontology comprises four main modules covering the key concepts of the cooking domain (actions, food, recipes, and utensils) and three auxiliary modules (units and measures, equivalencies, and plate types).


IEEE International Conference on Fuzzy Systems | 2014

Twitter Topic Fuzzy Fingerprints

Hugo Rosa; Fernando Batista; João Paulo Carvalho

In this paper we propose to approach the subject of Twitter Topic Detection using a new technique called Topic Fuzzy Fingerprints. A comparison is made with two popular text classification techniques, Support Vector Machines (SVM) and k-Nearest Neighbours (kNN). Preliminary results show that Twitter Topic Fuzzy Fingerprints outperforms the other two techniques, achieving better Precision and Recall while still being much faster, which is an essential feature when processing large volumes of streaming data.


Speech Communication | 2014

Speaking style effects in the production of disfluencies

Helena Moniz; Fernando Batista; Ana Isabel Mata; Isabel Trancoso

This work explores speaking style effects in the production of disfluencies. University lectures and map-task dialogues are analyzed in order to evaluate whether the prosodic strategies used when uttering disfluencies vary across speaking styles. Our results show that the distribution of disfluency types is not arbitrary across lectures and dialogues. Moreover, although there is a statistically significant cross-style strategy of prosodic contrast marking (pitch and energy increases) between the region to repair and the repair of fluency, this strategy is displayed differently depending on the specific speech task. The overall patterns observed in the lectures, with regularities ascribed to speaker and disfluency types, do not hold with the same strength for the dialogues, due to underlying specificities of the communicative purposes. The tempo patterns found for both speech tasks also confirm their distinct behaviour, evidencing the more dynamic tempo characteristics of dialogues. In university lectures, prosodic cues are given to the listener both for the units inside disfluent regions and between these and the adjacent contexts. This suggests a stronger prosodic contrast marking of disfluency–fluency repair when compared to dialogues, as if teachers were monitoring the different regions (the introduction to a disfluency, the disfluency itself, and the beginning of the repair) and demarcating them in very contrastive ways.


IEEE International Conference on Fuzzy Systems | 2012

A critical survey on the use of Fuzzy Sets in Speech and Natural Language Processing

João Paulo Carvalho; Fernando Batista; Luísa Coheur

This paper shows that the use and applications of Fuzzy Sets (FS) in Speech and Natural Language Processing (SNLP) have seen a steady decline, to the point where FS are virtually unknown or unappealing to most researchers currently working in the SNLP field. It tries to find the reasons behind this decline and proposes some guidelines on what could be done to reverse it and give FS a relevant role in SNLP.


Meeting of the Association for Computational Linguistics | 2008

Language Dynamics and Capitalization using Maximum Entropy

Fernando Batista; Nuno J. Mamede; Isabel Trancoso

This paper studies the impact of written language variations and the way they affect the capitalization task over time. A discriminative approach, based on maximum entropy models, is proposed to perform capitalization, taking the language changes into consideration. The proposed method makes it possible to use large corpora for training. The evaluation is performed over newspaper corpora using different testing periods. The achieved results reveal a strong relation between the capitalization performance and the elapsed time between the training and testing data periods.


IEEE International Conference on Fuzzy Systems | 2015

Twitter gender classification using user unstructured information

Marco Vicente; Fernando Batista; João Paulo Carvalho

This paper describes an approach to automatically detect the gender of Twitter users, based only on clues provided by their profile information in an unstructured form. A number of features that capture phenomena specific to Twitter users are proposed and evaluated on a dataset of about 242K English-language users. Different supervised and unsupervised approaches are used to assess the performance of the proposed features, including Naive Bayes variants, Logistic Regression, Support Vector Machines, Fuzzy c-Means clustering, and K-means. An unsupervised approach based on Fuzzy c-Means proved to be very suitable for this task, returning the correct gender for about 96% of the users.


Joint IFSA World Congress and NAFIPS Annual Meeting | 2013

Towards intelligent mining of public social networks' influence in society

João Paulo Carvalho; Vasco Calais Pedro; Fernando Batista

This paper presents an overview of a proposed framework for the intelligent mining of the influence of public social networks in society. The framework consists of a data collection platform and a set of intelligent social-data processing modules. These modules identify relevant trending social topics and relevant actors within those topics, and trace back in time the evolution of those topics in the public social networks. The framework creates quantitative and qualitative indicators that can be used to analyze the role and the influence of the social networks and of their main actors in society.


Intonational Grammar in Ibero-Romance: Approaches across linguistic subfields | 2016

Stylistic variation in the intonation of European Portuguese teenagers and adults

Ana Isabel Mata; Helena Moniz; Fernando Batista

Studies of emerging prosody from the word to the phrase, integrating various sources of evidence, are scarce, and our understanding of the pathways of prosodic development is still very limited. An investigation of emerging intonation and prosodic phrasing was undertaken on the basis of production data on intonation and duration patterns from the speech of two European Portuguese children between 1;00 and 2;04. The results show that both the development of intonation and phrasing were found to precede the onset of combinatorial speech, and to coincide in time with critical points in lexical development. Prosodic phrasing evolved in three steps, by the unfolding of key prosodic levels. Implications of these results are discussed in relation to early prosodic development across languages.

Collaboration


Dive into Fernando Batista's collaborations.

Top Co-Authors

Isabel Trancoso

Instituto Superior Técnico

Nuno J. Mamede

Technical University of Lisbon