Thiago Alexandre Salgueiro Pardo
University of São Paulo
Publications
Featured research published by Thiago Alexandre Salgueiro Pardo.
Archive | 2003
Jorge Baptista; Nuno J. Mamede; Sara Candeias; Ivandré Paraboni; Thiago Alexandre Salgueiro Pardo; Maria das Graças Volpe Nunes
This paper reports findings from an analysis of errors made by an automatic speech recogniser trained and tested on the speech of European Portuguese children aged 3 to 10. As expected, we were able to identify frequent pronunciation error patterns in the children's speech, and we were able to correlate some of these pronunciation error patterns with automatic speech recognition errors. The findings reported in this paper are of phonetic interest, and they will also be useful for improving the performance of automatic speech recognisers aimed at children in the target population of the study.

This book constitutes the refereed proceedings of the 11th International Workshop on Computational Processing of the Portuguese Language, PROPOR 2014, held in São Carlos, Brazil, in October 2014. The 14 full papers and 19 short papers presented in this volume were carefully reviewed and selected from 63 submissions. The papers are organized in topical sections named: speech language processing and applications; linguistic description, syntax and parsing; ontologies, semantics and lexicography; corpora and language resources; and natural language processing, tools and applications.
document engineering | 2008
Sandra Maria Aluísio; Lucia Specia; Thiago Alexandre Salgueiro Pardo; Erick Galani Maziero; Renata Pontin de Mattos Fortes
In this paper we investigate the main linguistic phenomena that can make texts complex and how they could be simplified. We focus on a corpus analysis of simple account texts available on the web for Brazilian Portuguese and propose simplification strategies for this language. This study illustrates the need for text simplification to make information accessible to low-literacy readers and potentially to people with other cognitive disabilities. It also highlights characteristics of simplification for Portuguese that may differ from other languages. This study is the first step towards building Brazilian Portuguese text simplification systems. One scenario in which these systems could be used is reading electronic texts produced, e.g., by the Brazilian government or by relevant news agencies.
processing of the portuguese language | 2003
Thiago Alexandre Salgueiro Pardo; Lucia Helena Machado Rino; Maria das Graças Volpe Nunes
This paper presents a new extractive approach to automatic summarization based on the gist of the source text. The gist-based system, called GistSumm (GIST SUMMarizer), uses the gist as a guideline to identify and select text segments to include in the final extract. The automatically produced extracts have been evaluated with respect to gist preservation and textuality.
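The abstract does not say how the gist is identified. As a rough illustration only, the sketch below assumes a word-frequency (keyword) criterion. Under that assumption, the gist sentence is the one whose words are most frequent across the text. Other sentences are added if they share at least one word with the gist and score above average. The function name `gist_extract`, the scoring, and the selection rule are all assumptions for illustration. This is not GistSumm's actual implementation.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens; a crude stand-in for real tokenization/stemming."""
    return re.findall(r"\w+", text.lower())


def gist_extract(sentences: list[str], max_sentences: int = 3) -> list[str]:
    """Hypothetical gist-based extractor (not the GistSumm implementation)."""
    # Assumed keyword-style scoring: a sentence's score is the sum of the
    # document-wide frequencies of its words.
    freq = Counter(w for s in sentences for w in tokenize(s))
    scores = [sum(freq[w] for w in tokenize(s)) for s in sentences]

    # Assumed gist criterion: the highest-scoring sentence.
    gist_idx = max(range(len(sentences)), key=lambda i: scores[i])
    gist_words = set(tokenize(sentences[gist_idx]))
    avg = sum(scores) / len(scores)

    # Candidates share at least one word with the gist and score above average.
    candidates = [
        i for i in range(len(sentences))
        if i != gist_idx and scores[i] > avg
        and gist_words & set(tokenize(sentences[i]))
    ]
    candidates.sort(key=lambda i: scores[i], reverse=True)

    # Keep the best candidates, then restore source order for readability.
    chosen = sorted([gist_idx] + candidates[: max_sentences - 1])
    return [sentences[i] for i in chosen]
```

The sketch returns the selected sentences in source order. This is a cheap way to keep the extract readable, which relates to the textuality criterion used in the evaluation.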
international conference on design of communication | 2009
Willian Massami Watanabe; Arnaldo Candido Junior; Vinícius Rodrigues de Uzêda; Renata Pontin de Mattos Fortes; Thiago Alexandre Salgueiro Pardo; Sandra Maria Aluísio
Text is the primary media content available on Web sites and applications. However, this heavy use of text creates an accessibility barrier for those who cannot read fluently in their mother tongue, owing to both text length and linguistic complexity. To offer an accessible alternative to these readers, shorter and simplified versions of the text content should be provided. With that in mind, this paper introduces Facilita, an assistive technology that helps lower-literacy users understand the text content of Web applications. Facilita automatically generates accessible content from Web pages using summarization and simplification techniques. Interface design requirements also matter, since Facilita's target audience (the functionally illiterate) is often classified as computer illiterate as well. Thus, interaction and user interface design were developed considering the limitations and skills of the functionally illiterate.
workshop on innovative use of nlp for building educational applications | 2009
Arnaldo Candido; Erick Galani Maziero; Lucia Specia; Caroline Gasperin; Thiago Alexandre Salgueiro Pardo; Sandra Maria Aluísio
In this paper we investigate the task of text simplification for Brazilian Portuguese. Our purpose is threefold. First, we introduce a simplification tool for this language and its underlying development methodology. Second, we present an online authoring system for simplified text based on that tool. Finally, we discuss the potential of such technology for education. The resources and tools we present are new for Portuguese and innovative in many respects compared with previous initiatives for other languages.
brazilian symposium on multimedia and the web | 2008
Erick Galani Maziero; Thiago Alexandre Salgueiro Pardo; Ariani Di Felippo; Bento Carlos Dias-da-Silva
In this paper, we describe TeP 2.0 (Electronic Thesaurus for Brazilian Portuguese), which stores sets of synonym and antonym word forms. Specifically, we present the lexical database and the Web interface of TeP 2.0.
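A thesaurus that stores synonym sets and antonymy between them can be modeled as a small in-memory index. The class below is a hypothetical sketch. The names `Thesaurus`, `add_synset` and `add_antonymy` are illustrative. Treating antonymy as a link between whole synsets is also an assumption. The sketch is not the TeP 2.0 database schema or Web interface.

```python
from collections import defaultdict


class Thesaurus:
    """Toy synonym/antonym store (hypothetical; not the TeP 2.0 schema)."""

    def __init__(self) -> None:
        self.synsets: list[frozenset[str]] = []
        # A word form may belong to several synsets (polysemy).
        self.index: dict[str, set[int]] = defaultdict(set)
        # Assumed model: antonymy links whole synsets, not single words.
        self.antonym_links: dict[int, set[int]] = defaultdict(set)

    def add_synset(self, words) -> int:
        """Store a set of synonymous word forms and return its id."""
        members = frozenset(words)
        sid = len(self.synsets)
        self.synsets.append(members)
        for w in members:
            self.index[w].add(sid)
        return sid

    def add_antonymy(self, a: int, b: int) -> None:
        """Record a symmetric antonymy link between two synsets."""
        self.antonym_links[a].add(b)
        self.antonym_links[b].add(a)

    def synonyms(self, word: str) -> set[str]:
        """All word forms sharing a synset with `word`, excluding itself."""
        result: set[str] = set()
        for sid in self.index.get(word, ()):
            result |= self.synsets[sid]
        result.discard(word)
        return result

    def antonyms(self, word: str) -> set[str]:
        """All word forms in synsets linked by antonymy to a synset of `word`."""
        result: set[str] = set()
        for sid in self.index.get(word, ()):
            for other in self.antonym_links.get(sid, ()):
                result |= self.synsets[other]
        return result
```

For example, a synset {"rápido", "veloz", "ligeiro"} linked to {"lento", "vagaroso"} lets a lookup of "veloz" return both its synonyms and its antonyms.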
International Journal of Modern Physics C | 2008
Diego R. Amancio; Lucas Antiqueira; Thiago Alexandre Salgueiro Pardo; Luciano da Fontoura Costa; Osvaldo Novais Oliveira; Maria das Graças Volpe Nunes
Complex networks have been increasingly used in text analysis, including in connection with natural language processing tools, as important text features appear to be captured by the topology and dynamics of the networks. Following previous work applying complex network concepts to text quality measurement, summary evaluation, and author characterization, we now focus on machine translation (MT). In this paper we assess the possible representation of texts as complex networks to evaluate cross-linguistic issues inherent in manual and machine translation. We show that translations of different quality generated by MT tools can be distinguished from their manual counterparts by means of metrics such as in-degree (ID) and out-degree (OD), clustering coefficient (CC), and shortest paths (SP). For instance, we demonstrate that the average OD in networks of automatic translations consistently exceeds the values obtained for manual ones, and that the CC values of source texts are not preserved in manual translations but are preserved in good automatic translations. This probably reflects the text rearrangements humans perform during manual translation. We envisage that such findings could lead to better MT tools and automatic evaluation metrics.
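To make the metrics concrete, the sketch below builds a directed word-adjacency network, with an edge from each word to the word that follows it. It then computes three things:

- in- and out-degrees;
- the average clustering coefficient of the undirected version;
- the average shortest-path length over reachable pairs.

The construction is an assumption. Any preprocessing the authors used, such as stopword removal or lemmatization, is omitted, and all function names are hypothetical. In this unweighted sketch the average in-degree always equals the average out-degree, since both equal edges divided by nodes.

```python
import re
from collections import deque


def word_network(text: str) -> dict[str, set[str]]:
    """Directed word-adjacency network: edge u -> v when word v follows u."""
    words = re.findall(r"\w+", text.lower())
    succ: dict[str, set[str]] = {w: set() for w in words}
    for u, v in zip(words, words[1:]):
        if u != v:  # skip self-loops from immediately repeated words
            succ[u].add(v)
    return succ


def degrees(succ):
    """Return (in-degree, out-degree) dictionaries for every node."""
    out_deg = {u: len(vs) for u, vs in succ.items()}
    in_deg = dict.fromkeys(succ, 0)
    for vs in succ.values():
        for v in vs:
            in_deg[v] += 1
    return in_deg, out_deg


def avg_clustering(succ) -> float:
    """Average clustering coefficient of the undirected version of the network."""
    nbrs = {u: set() for u in succ}
    for u, vs in succ.items():
        for v in vs:
            nbrs[u].add(v)
            nbrs[v].add(u)
    total = 0.0
    for u, ns in nbrs.items():
        k = len(ns)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for a in ns for b in ns if a < b and b in nbrs[a])
        total += 2 * links / (k * (k - 1))
    return total / len(nbrs)


def avg_shortest_path(succ) -> float:
    """Mean directed shortest-path length over all reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for src in succ:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in succ[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0
```

Comparing these values between a source text, its manual translation and its machine translation follows the spirit of the study. A real comparison would need the authors' exact network construction.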
ACM Transactions on Speech and Language Processing | 2010
Vinícius Rodrigues de Uzêda; Thiago Alexandre Salgueiro Pardo; Maria das Graças Volpe Nunes
Motivated by governmental, commercial and academic interests, and by the growing amount of information, mainly online, the area of automatic text summarization has seen a growing body of research and products, which has led to a very large number of summarization methods. In this paper, we present a comprehensive comparative evaluation of the main automatic text summarization methods based on Rhetorical Structure Theory (RST), which are claimed to be among the best. We compare our results to those of two other groups:

- superficial summarizers, which belong to a paradigm with severe limitations;
- hybrid methods combining RST and superficial methods.

We also test voting systems and machine learning techniques trained on RST features. We run experiments for English and Brazilian Portuguese and compare the results obtained with manually and automatically parsed texts. Our results systematically show that all RST methods have comparable overall performance and that they outperform most of the superficial methods. Machine learning techniques achieved high accuracy in classifying text segments worth including in the summary, but they did not produce more informative summaries than the regular RST methods.
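The abstract mentions voting systems without describing them. One simple form, assumed here purely for illustration, is a vote over the segments each summarizer selects. The function below keeps the segments chosen by the most summarizers and breaks ties by position in the text. The name `vote_segments` and the tie-breaking rule are hypothetical, and the paper's voting scheme may differ.

```python
from collections import Counter


def vote_segments(selections: list[set[int]], summary_size: int) -> list[int]:
    """Combine segment selections from several summarizers by vote count.

    Each element of `selections` is the set of segment indices chosen by one
    summarizer (hypothetical input format).
    """
    votes = Counter(seg for sel in selections for seg in sel)
    # Rank by number of votes (descending), breaking ties by earlier position.
    ranked = sorted(votes, key=lambda seg: (-votes[seg], seg))
    # Return the winning segments in source order.
    return sorted(ranked[:summary_size])
```

With three summarizers selecting {0, 2, 5}, {0, 3, 5} and {0, 2, 4}, a three-segment summary keeps segments 0, 2 and 5.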
international conference on design of communication | 2008
Sandra Maria Aluísio; Lucia Specia; Thiago Alexandre Salgueiro Pardo; Erick Galani Maziero; Helena de Medeiros Caseli; Renata Pontin de Mattos Fortes
In this paper we investigate the main linguistic phenomena that can make texts complex and how they could be simplified. We focus on a corpus analysis of simple account texts available on the web for Brazilian Portuguese (BP). This study illustrates the need for text simplification to make information accessible to poor readers and to people with cognitive disabilities. It also highlights features of simplification for BP that may differ from other languages. Moreover, we propose simplification strategies and a Simplification Annotation Editor. This study is the first step towards building BP text simplification systems. One scenario in which these systems could be used is reading electronic texts produced, e.g., by the Brazilian government or by news agencies.
international conference on communications, circuits and systems | 2006
Thiago Alexandre Salgueiro Pardo; Lucas Antiqueira; M. das Gracas Nunes; Osvaldo Novais Oliveira; L. da F. Costa
The ability to access embedded knowledge makes complex networks extremely promising for natural language processing, which normally requires deep knowledge representation that is not accessible with first-order statistics. In this paper, we demonstrate that features of complex networks, which have been shown to correlate with text quality, can be used to evaluate summaries. The metrics are the average degree, the clustering coefficient, and the extent to which the dynamics of network growth deviates from a straight line. These metrics were found to be much smaller for high-quality manual summaries and larger for automatic summaries, pointing to a loss of quality, as expected. We also discuss the comparative performance of automatic summarizers.
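The third metric is the least standard of the three. One plausible reading, offered here only as an assumption, works in three steps:

- start with every distinct word as an isolated node;
- add word associations (adjacent-word edges) in reading order, recording the number of connected components after each addition;
- measure how far that curve strays from the straight line joining its first and last points.

The definitions in `component_curve` and `deviation_from_line` are hypothetical and may not match the authors' exact formulation.

```python
import re


def component_curve(text: str) -> list[int]:
    """Number of connected components after each adjacent-word edge is added
    in reading order, starting from all distinct words as isolated nodes
    (assumed formulation)."""
    words = re.findall(r"\w+", text.lower())
    parent = {w: w for w in words}  # union-find forest; each word starts alone

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = len(parent)
    curve = [components]  # value before any edge is added
    for u, v in zip(words, words[1:]):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge merges two components
            parent[ru] = rv
            components -= 1
        curve.append(components)
    return curve


def deviation_from_line(curve: list[int]) -> float:
    """Mean absolute gap between the curve and the straight line joining its
    first and last points (hypothetical definition)."""
    n = len(curve) - 1
    if n <= 0:
        return 0.0
    start, end = curve[0], curve[-1]
    return sum(
        abs(c - (start + (end - start) * i / n)) for i, c in enumerate(curve)
    ) / len(curve)
```

Under this formulation, each first occurrence of a word merges one component, while repeated words leave the count unchanged. A text that introduces new words at a steady rate therefore yields a nearly straight curve and a deviation close to zero.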